Classification Algorithms in Data Mining

Classification Algorithms 

Classification is the process of assigning a class label to data whose label is not known. Its main goal is to predict those class labels.

The classification process has two stages. In the learning stage, a classification algorithm analyzes training data to build a model, for example a set of classification rules. In the classification stage, the model predicts labels for test data, and those predictions are compared with the known labels to estimate the accuracy of the classification.
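The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy records, the 25% test fraction, and the helper names (`split_train_test`, `learn_threshold`, `accuracy`) are all hypothetical, and the "model" is a deliberately simple cut-off on one numeric feature rather than any of the algorithms described below.

```python
import random

def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle labeled records and split them into training and test sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def learn_threshold(train):
    """Learning stage: place a cut-off midway between the two class means.

    Assumes exactly two classes and a single numeric feature.
    """
    means = {}
    for label in {y for _, y in train}:
        values = [x for x, y in train if y == label]
        means[label] = sum(values) / len(values)
    low, high = sorted(means, key=means.get)
    threshold = (means[low] + means[high]) / 2
    return lambda x: high if x > threshold else low

def accuracy(model, test):
    """Classification stage: fraction of held-out records labeled correctly."""
    return sum(model(x) == y for x, y in test) / len(test)

# Hypothetical toy data: (feature value, class label)
records = [(1.0, "small"), (1.5, "small"), (2.0, "small"), (2.2, "small"),
           (7.8, "large"), (8.0, "large"), (8.5, "large"), (9.0, "large")]

train, test = split_train_test(records)
model = learn_threshold(train)          # stage 1: learn from training data
print(accuracy(model, test))            # stage 2: evaluate on test data
```

The test records are kept out of the learning stage so that the accuracy figure reflects how the model behaves on data it has not seen.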

Several types of classification algorithms are in common use, including:

  • Decision Trees

The decision tree method is a classification method that represents patterns or knowledge as a tree structure. Trees are built in a top-down, recursive, divide-and-conquer manner. The resulting model works like a series of conditional tests: classification starts at a single root node, which branches in two or more directions according to the value of an attribute, and each branch leads to further tests until a leaf holding the predicted class is reached. Widely used decision tree algorithms include ID3 (Iterative Dichotomiser 3), which selects attributes by information gain; C4.5, which uses gain ratio; and CART (Classification and Regression Trees), which uses the Gini index.
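To make the three attribute-selection measures concrete, here is a minimal sketch of information gain, gain ratio, and the Gini index for nominal attributes. The toy weather rows and all function names are hypothetical. One simplification to note: `gini_of_split` scores a multi-way split on every attribute value, whereas CART itself builds binary splits.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def partition(rows, labels, attribute):
    """Group class labels by the value each row has for the attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    return groups

def information_gain(rows, labels, attribute):
    """Measure used by ID3: entropy before the split minus weighted entropy after."""
    groups = partition(rows, labels, attribute)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(rows, labels, attribute):
    """Measure used by C4.5: information gain divided by the split information."""
    split_info = entropy([row[attribute] for row in rows])
    return information_gain(rows, labels, attribute) / split_info if split_info else 0.0

def gini_of_split(rows, labels, attribute):
    """Weighted Gini impurity after splitting on the attribute (lower is better)."""
    groups = partition(rows, labels, attribute)
    return sum(len(g) / len(labels) * gini(g) for g in groups.values())

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rainy", "windy": "no"}, {"outlook": "rainy", "windy": "yes"}]
labels = ["play", "play", "stay", "stay"]

for attribute in ("outlook", "windy"):
    print(attribute,
          information_gain(rows, labels, attribute),
          gain_ratio(rows, labels, attribute),
          gini_of_split(rows, labels, attribute))
```

A tree builder would pick the attribute with the highest gain (or gain ratio), or the lowest weighted Gini impurity, split the data on it, and repeat the same procedure on each branch.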

  • Bayesian Classification or Naive Bayes

Naive Bayes is a classification method based on probability and statistics, rooted in Bayes’ theorem: it predicts the probability of each class from what has been observed in past data. It is called naive because it assumes that the attributes are independent of one another given the class. An advantage of this method is that it needs only a small amount of training data to estimate the parameters used in the classification process. Naive Bayes is often reasonably accurate despite that assumption, and it is well suited to classifying nominal data sets.
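As an illustration of how the parameters are estimated from counts, the following is a minimal sketch of Naive Bayes for nominal attributes. The toy rows and the function names are hypothetical, and the add-one (Laplace) smoothing is an assumption added here so that attribute values never seen with a class do not produce a zero probability; it is not mentioned in the description above.

```python
from collections import Counter, defaultdict
from math import log

def train_naive_bayes(rows, labels):
    """Count classes, and attribute values per class, from nominal training data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute, label) -> counts of values
    domains = defaultdict(set)            # attribute -> set of values seen
    for row, label in zip(rows, labels):
        for attribute, value in row.items():
            value_counts[(attribute, label)][value] += 1
            domains[attribute].add(value)
    return class_counts, value_counts, domains

def predict(model, row):
    """Return the class with the highest log posterior score for the row."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = log(count / total)                      # log prior P(class)
        for attribute, value in row.items():
            seen = value_counts[(attribute, label)][value]
            # Add-one smoothed estimate of P(value | class) (an assumption here)
            score += log((seen + 1) / (count + len(domains[attribute])))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data with two nominal attributes
rows = [{"outlook": "sunny", "windy": "no"},
        {"outlook": "sunny", "windy": "yes"},
        {"outlook": "overcast", "windy": "no"},
        {"outlook": "rainy", "windy": "yes"},
        {"outlook": "rainy", "windy": "no"}]
labels = ["play", "play", "play", "stay", "stay"]

model = train_naive_bayes(rows, labels)
print(predict(model, {"outlook": "rainy", "windy": "yes"}))
print(predict(model, {"outlook": "sunny", "windy": "no"}))
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow when there are many attributes; the ranking of the classes is unchanged.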

  • K-Nearest Neighbor (k-NN)

K-Nearest Neighbor classifies an object based on the training data closest to it. The algorithm computes the distance from a new data point to the stored training examples, selects the k nearest ones, and assigns the class held by the majority of those neighbors. It is memory-based: it keeps the training data rather than building an explicit model. k-NN is widely used by researchers because it is simple and can give good accuracy, although prediction becomes slower as the amount of stored data grows.
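The distance-then-vote procedure can be sketched as follows. This is a minimal brute-force version under stated assumptions: Euclidean distance, numeric feature tuples, k = 3, and hypothetical toy points and function name (`knn_predict`). It requires Python 3.8 or later for `math.dist`.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8

def knn_predict(training, query, k=3):
    """Label the query point by majority vote among its k nearest training points."""
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: (feature vector, class label)
training = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
            ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]

print(knn_predict(training, (1.1, 1.0)))
print(knn_predict(training, (5.1, 5.0)))
```

Because every prediction sorts the whole training set by distance, this sketch is only practical for small data; larger data sets usually call for an index structure or an approximate search.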

