SWlab Researchers Present Novel Document Clustering Approach at ICOASE 2020

The Semantic Web Lab (SWlab) at the University of Zakho has published a research paper on document clustering at the 2020 International Conference on Advanced Science and Engineering (ICOASE).

The paper, titled “Semantic Document Clustering using K-means algorithm and Ward’s Method,” authored by Niyaz M Salih and Karwan Jacksi, addresses the critical challenge of organizing and retrieving information from the ever-growing volume of textual data available today.

The research introduces a novel approach to document clustering that leverages semantic analysis to group documents based on their underlying meaning rather than just surface-level similarities. The key steps of the proposed method include:

  1. Data Collection and Preprocessing: Extracting summaries from IMDB and Wikipedia datasets, followed by tokenization and stemming to prepare the text for analysis.
  2. Vector Space Model: Constructing a vector space model using TF-IDF to represent the importance of terms within each document.
  3. Semantic Analysis: Utilizing the NLTK dictionary and WordNet to identify and analyze the semantic relationships between words and concepts.
  4. Clustering Algorithms: Employing both K-means and Ward’s method to group documents based on their semantic similarity.
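The steps above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the paper's pipeline relies on NLTK and WordNet, while this sketch omits stemming and the WordNet-based semantic step and only shows TF-IDF vectors being clustered by K-means and by Ward's method. The function names, the tiny stop-word list, the sample summaries, and the deterministic farthest-first seeding for K-means are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "in", "on", "and", "of", "to"}

def tokenize(text):
    # Lowercase, keep alphabetic runs, drop stop words (stemming omitted).
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in STOPWORDS]

def tfidf_vectors(docs):
    # Build L2-normalized TF-IDF vectors using a smoothed IDF.
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iters=100):
    # Deterministic farthest-first seeding (an assumption; random seeding is more common).
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assign each document to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = mean(members)
    return labels

def ward(vectors, k):
    # Agglomerative clustering: repeatedly merge the pair of clusters whose
    # merge increases within-cluster variance the least (Ward's criterion).
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a = [vectors[x] for x in clusters[i]]
                b = [vectors[x] for x in clusters[j]]
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(mean(a), mean(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(vectors)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

# Made-up plot summaries standing in for the IMDB/Wikipedia data.
docs = [
    "A detective hunts a killer in the city",
    "The detective and the killer meet in the city",
    "A spaceship crew explores a distant planet",
    "The crew lands the spaceship on the planet",
]
vocab, vectors = tfidf_vectors(docs)
km_labels = kmeans(vectors, k=2)
ward_labels = ward(vectors, k=2)
print(km_labels, ward_labels)
```

On this toy input both methods should place the two crime summaries in one cluster and the two space summaries in the other. A semantic step such as the one the paper describes would additionally relate words that share meaning but not spelling, which plain TF-IDF cannot do.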

The results were visualized on an interactive website that shows how the resulting clusters relate to one another.

The work demonstrates the potential of semantic analysis for improving the accuracy and effectiveness of information retrieval systems.

Publication details for the full paper:

  • Publication: 2020 International Conference on Advanced Science and Engineering (ICOASE)
  • Authors: Niyaz M Salih, Karwan Jacksi
  • Pages: 1-6
  • Publisher: IEEE

Keywords: Semantic Similarity, Document Clustering, K-means, Ward’s Method, TF-IDF, NLTK, WordNet, Semantic Web, University of Zakho, SWlab
