Research on Big Data Classification Based on K-Means Bayes Algorithm in Cloud Storage Environment

Changhong Wu

DOI: 10.38007/Proceedings.0000878

Author(s)

Changhong Wu

Corresponding Author

Changhong Wu

Abstract

The cloud storage environment provides broader space for storing and retrieving massive amounts of big data, but it also places higher demands on the performance of data classification algorithms. To address the low accuracy and poor convergence of existing big data classification methods, this paper proposes a big data classification method based on a K-means Bayes algorithm. Bayesian theory is used to calculate the posterior probability of the data set, and the K-means algorithm is used to improve the generalization performance of the Bayesian model in small-sample settings. Cluster centers are selected, the Euclidean distance from the current data set to each cluster center is calculated, and the missing-value index of the data set is extracted, achieving accurate classification of the target big data set. Simulation results show that the proposed method achieves higher accuracy than traditional classification methods and that its algorithm model converges better.
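The abstract does not state exactly how the K-means and Bayes stages are coupled, so the following is only a minimal sketch of one plausible reading, not the author's method. It assumes that missing entries (`None`) are filled from the nearest K-means cluster center, with Euclidean distance measured over the observed features only, and that a Gaussian naive Bayes model then supplies the posterior class probabilities. The function names (`kmeans`, `impute`, `fit_gaussian_nb`, `posterior`), the Gaussian likelihood, the variance floor of 1e-6 and the toy data are all hypothetical choices made for illustration.

```python
import math
import random

# Illustrative sketch only: the K-means imputation + Gaussian naive Bayes
# coupling below is an assumption, not the paper's published algorithm.

def euclidean(a, b, dims=None):
    """Euclidean distance over the given feature indices (all features by default)."""
    dims = range(len(a)) if dims is None else dims
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in dims))

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on complete rows; returns k cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: euclidean(p, centers[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers

def impute(row, centers):
    """Fill None entries from the nearest center, measured on observed features only."""
    observed = [i for i, v in enumerate(row) if v is not None]
    nearest = min(centers, key=lambda c: euclidean(row, c, observed))
    return [nearest[i] if v is None else v for i, v in enumerate(row)]

def fit_gaussian_nb(X, y):
    """Per-class prior, feature means and variances (hypothetical 1e-6 floor)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def posterior(model, x):
    """Normalized posterior P(class | x) via Bayes' rule, computed in log space."""
    logp = {}
    for label, (prior, means, variances) in model.items():
        ll = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        logp[label] = ll
    top = max(logp.values())
    z = sum(math.exp(v - top) for v in logp.values())
    return {label: math.exp(v - top) / z for label, v in logp.items()}

# Toy usage with made-up data: two well-separated classes, one missing feature.
train = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [5.0, 5.2], [5.1, 4.9], [4.8, 5.0]]
labels = ["a", "a", "a", "b", "b", "b"]
centers = kmeans(train, k=2)
model = fit_gaussian_nb(train, labels)
query = impute([5.0, None], centers)
probs = posterior(model, query)
print(max(probs, key=probs.get))  # prints "b"
```

Imputing from the nearest center keeps the distance computation limited to the features that are actually present, which is one simple way to let the cluster structure inform the Bayesian step when values are missing; other couplings (for example, training one Bayes model per cluster) would be equally consistent with the abstract.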

Keywords

Cloud Storage Environment; K-Means; Bayes; Euclidean Distance; Missing Value