The Semantic Web Lab (SWlab) at the University of Zakho has reached a milestone with the publication of a research paper in IEEE. The paper, titled “Clustering Documents based on Semantic Similarity using HAC and K-Mean Algorithms,” presents an approach to document clustering that uses semantic analysis to improve accuracy.
Authored by Karwan Jacksi, Rowaida Kh. Ibrahim, Subhi R. M. Zeebaree, Rizgar R. Zebari, and Mohammed A. M. Sadeeq, the research addresses the limitations of traditional document clustering methods that primarily rely on statistical features and syntactic analysis.
The proposed method clusters documents by semantic similarity in four stages:
- Summary extraction: The researchers retrieved document summaries from Wikipedia and IMDB.
- Semantic analysis: The summaries were then processed using the Natural Language Toolkit (NLTK) dictionary to capture the underlying semantic meaning of the text.
- Vector space modeling: The processed data was transformed into a vector space model using TF-IDF, a widely used technique for weighting the importance of terms within a document.
- Clustering algorithms: The final stage applied two clustering algorithms, Hierarchical Agglomerative Clustering (HAC) and K-means, to group documents by their semantic similarity.
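The last two stages can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, skips the NLTK-based semantic step, tokenizes on whitespace, and assumes a smoothed IDF formula, average linkage for HAC, and caller-supplied seed documents for K-means. The function names and the four sample summaries are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokens
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumed smoothed IDF: log((1 + n) / (1 + df)) + 1.
        vec = {t: (c / len(tokens)) * (math.log((1 + n) / (1 + df[t])) + 1)
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors (a dot product)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def hac(vectors, k):
    """Average-linkage agglomerative clustering; returns k lists of doc indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                sim = sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the most similar pair
    return clusters


def kmeans(vectors, seeds, iterations=20):
    """K-means with cosine similarity; `seeds` are indices of the initial centroids.

    Real implementations usually pick seeds randomly or with k-means++.
    """
    centroids = [dict(vectors[s]) for s in seeds]
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)  # adds the weights term by term
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm > 0:
                centroids[c] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical summaries: two about space travel, two about crime.
docs = [
    "space ship crew explores distant planet",
    "alien planet space crew battle",
    "detective solves murder in city",
    "city detective hunts murder suspect",
]
vectors = tfidf_vectors(docs)
print(hac(vectors, k=2))              # [[0, 1], [2, 3]]
print(kmeans(vectors, seeds=[0, 2]))  # [0, 0, 1, 1]
```

On this toy input both algorithms recover the same two groups. On real summaries the two methods can disagree, which is why a side-by-side comparison of their output is informative.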
The results of the two algorithms were compared, analyzed, and presented on an interactive webpage for easy visualization and exploration. This approach has the potential to improve the accuracy and effectiveness of document organization and retrieval in various domains.
This publication in IEEE is a testament to the high-quality research conducted by the SWlab at the University of Zakho and its dedication to advancing the field of semantic web technologies.
For further details and access to the full paper, please visit: