Comparison of similarity measure for web document clustering 

Author & Affiliation:
Gunjan Ansari (gunjan_ansari@yahoo.co.in)
Department of Information Technology, JSS Academy of Technical Education, C-20/1, Sector-62, Noida
Keywords:
Comparison, similarity measure, document clustering
Issue Date:
December 2009
Abstract:

With the rapid growth of the World Wide Web (WWW), organizing the vast number of online documents on the web according to their topic has become a critical issue. Search engines in particular benefit from grouping similar documents, because doing so can improve their performance when a query is submitted to the system. Clustering is useful for both taxonomy design and similarity search of documents in such a domain.

Similarity or distance measures play an important role in the performance of clustering algorithms. This paper compares three term-based similarity measures for web document clustering: Euclidean distance, the cosine measure and the Jaccard measure. Each measure is used to compute the similarity between documents, and the documents are then grouped into clusters with the k-means clustering algorithm. The clusters produced with each measure are evaluated using the overall similarity. Tested on web data, we observe that the Euclidean measure outperforms the other similarity measures in clustering accuracy.
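The pipeline the abstract describes can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the paper's implementation: documents are plain term-weight vectors (for example TF-IDF), the Jaccard measure is taken in its extended (Tanimoto) form for weighted vectors, cosine and Jaccard similarities are turned into distances as 1 - sim so that all three measures can drive the same k-means loop, and "overall similarity" is assumed to be the size-weighted average pairwise cosine similarity inside each cluster. All function names are hypothetical.

```python
import math
import random


def euclidean_distance(a, b):
    # Straight-line distance between two term-weight vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    # Cosine of the angle between a and b; 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard_similarity(a, b):
    # Assumption: extended (Tanimoto) Jaccard for weighted vectors,
    # a.b / (|a|^2 + |b|^2 - a.b); reduces to set Jaccard for 0/1 vectors.
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom else 0.0


# k-means needs a dissimilarity; similarities become 1 - sim (an assumption).
DISTANCES = {
    "euclidean": euclidean_distance,
    "cosine": lambda a, b: 1.0 - cosine_similarity(a, b),
    "jaccard": lambda a, b: 1.0 - jaccard_similarity(a, b),
}


def centroid(vectors):
    # Component-wise mean of a non-empty list of vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(docs, k, measure, max_iter=100, seed=0):
    """Plain k-means over term vectors with the chosen distance; returns one label per doc."""
    dist = DISTANCES[measure]
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(docs, k)]  # random initial centroids
    labels = [None] * len(docs)
    for _ in range(max_iter):
        # Assignment step: each document joins its nearest centroid (ties go to the lower index).
        new_labels = [min(range(k), key=lambda c: dist(d, centers[c])) for d in docs]
        if new_labels == labels:
            break  # converged: no document changed cluster
        labels = new_labels
        # Update step: recompute centroids, keeping the old one if a cluster empties.
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def overall_similarity(docs, labels):
    # Assumption: size-weighted average of the mean pairwise cosine similarity
    # within each cluster (self-pairs included); higher means tighter clusters.
    total = 0.0
    for c in set(labels):
        members = [d for d, lab in zip(docs, labels) if lab == c]
        pair_sum = sum(cosine_similarity(x, y) for x in members for y in members)
        total += len(members) * pair_sum / len(members) ** 2
    return total / len(docs)
```

For example, with `docs = [[3, 0, 1, 0], [2, 0, 2, 0], [0, 4, 0, 1], [0, 3, 0, 2]]`, calling `kmeans(docs, 2, m)` for each of `"euclidean"`, `"cosine"` and `"jaccard"` separates the first two documents from the last two, and `overall_similarity(docs, labels)` then yields one comparable score per measure. Because the initial centroids are random, a real comparison would normally average over several seeds.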

Pages:
981-988
ISSN:
2319-8044 (Online) - 2231-346X (Print)
Source:
DOI:
jusps-A

Cite this article as:

G. Ansari, "Comparison of similarity measure for web document clustering", Journal of Ultra Scientist of Physical Sciences, Volume 21, Issue 3, Page Number 981-988, 2018


Available from: https://www.ultrascientist.org/paper/1288/

Ansari Education And Research Society