An Efficient K-means Seeding Algorithm

Omar Kettani

Abstract


This study presents a novel initialization algorithm for k-means clustering that sorts the points of a given dataset by both their angles and their norms. The proposed method aims to improve the speed and accuracy of clustering by providing a more informed starting point for the iterative optimization process. In extensive experiments on a variety of datasets, the proposed algorithm significantly reduced convergence time and produced higher-quality solutions, measured by the average Silhouette index, than traditional methods. The results indicate that the proposed initialization technique is a promising alternative for clustering tasks in various domains.
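The abstract does not spell out the procedure, so the following Python sketch is only an illustration of the general idea, not the author's algorithm. It assumes that angles and norms are measured relative to the dataset centroid, that the angle is taken in the plane of the first two coordinates, and that seeds are the means of k contiguous runs of the sorted points. The function names `angle_norm_seeds` and `lloyd` are hypothetical.

```python
import numpy as np


def angle_norm_seeds(X, k):
    """Hypothetical seeding: sort points by (angle, norm), return k run means.

    Assumes X has at least two columns and at least k rows.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Assumption: the angle is measured in the plane of the first two coordinates.
    angles = np.arctan2(centered[:, 1], centered[:, 0])
    norms = np.linalg.norm(centered, axis=1)
    # np.lexsort treats the LAST key as primary: sort by angle, break ties by norm.
    order = np.lexsort((norms, angles))
    # Assumption: split the sorted order into k contiguous runs of near-equal size.
    runs = np.array_split(order, k)
    return np.vstack([X[idx].mean(axis=0) for idx in runs])


def lloyd(X, centers, n_iter=100):
    """Plain Lloyd iterations started from the given centers (n_iter >= 1)."""
    X = np.asarray(X, dtype=float)
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Calling `lloyd(X, angle_norm_seeds(X, k))` runs k-means from these seeds. Because the seeding is deterministic, repeated runs on the same data give the same result, unlike random initialization.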






Copyright (c) 2023 Journal of Electrical Engineering, Electronics, Control and Computer Science

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.