Abstract

Impact of Similarity Measures on Causal Relation Based Feature Selection Method for Clustering Maritime Accident Reports

Santosh Tirunagari, Maria Hanninen, Guggilla Abhishek, Kaarle Stahlberg, and Pentti Kujala

Unsupervised document clustering is an automated process in which documents are analyzed based on their similarity. In this paper, we propose a new feature selection method based on causal relations to classify maritime accident reports in an unsupervised manner. We also compare the impact of different similarity measures on the proposed feature selection method. Based on the analysis, we conclude that the proposed feature selection method outperforms the conventional method, a difference we attribute to the curse of dimensionality. The similarity measures also perform better with the proposed feature selection method. In the analysis, we compared the Correlation, Cosine, Spearman, Bray-Curtis, Euclidean, City-block, Squared-Euclidean, Standardized Euclidean, and Chebyshev similarity measures. Correlation and Cosine produced the best results, followed by Spearman and Bray-Curtis. The rest did not produce good results with the maritime accident reports used in our analysis. Interestingly, Chi-Square gave good results with the proposed method in our analysis.
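To make the comparison concrete, the following is a minimal sketch of how several of the measures named above can be computed between two documents represented as feature vectors. It is an illustration only, not the authors' implementation: the function names, the `MEASURES` table, and the example vectors `doc_a` and `doc_b` are hypothetical, and the sketch assumes non-zero, non-constant vectors of equal length. Each function returns a distance, so smaller values mean more similar documents.

```python
import math

# Illustrative only: hypothetical helper names, not the paper's code.
# All functions assume equal-length, non-zero (and, for correlation,
# non-constant) numeric vectors.

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def correlation_distance(u, v):
    """1 minus the Pearson correlation: cosine distance of mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([a - mean_u for a in u], [b - mean_v for b in v])

def euclidean_distance(u, v):
    """Straight-line distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cityblock_distance(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def chebyshev_distance(u, v):
    """Largest absolute coordinate difference."""
    return max(abs(a - b) for a, b in zip(u, v))

def bray_curtis_distance(u, v):
    """Sum of absolute differences divided by sum of absolute sums."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(abs(a + b) for a, b in zip(u, v))

MEASURES = {
    "correlation": correlation_distance,
    "cosine": cosine_distance,
    "bray-curtis": bray_curtis_distance,
    "euclidean": euclidean_distance,
    "city-block": cityblock_distance,
    "chebyshev": chebyshev_distance,
}

def compare_measures(u, v):
    """Return every distance in MEASURES for one pair of vectors."""
    return {name: measure(u, v) for name, measure in MEASURES.items()}

# Hypothetical term-frequency vectors for two accident reports.
doc_a = [2, 0, 1, 3]
doc_b = [1, 1, 0, 3]

if __name__ == "__main__":
    for name, value in compare_measures(doc_a, doc_b).items():
        print(f"{name:12s} {value:.4f}")
```

Correlation and Cosine depend only on the direction of the vectors, while Euclidean, City-block, and Chebyshev also depend on their magnitude, which is one plausible reason the two groups can behave differently on documents of varying length.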
