Classification

Gopal Khadka
5 min read · Jun 8, 2024


https://www.paperswithcode.com/task/classification/latest

Classification is the process of grouping similar data items into a single group and separating dissimilar data items into different groups. For example: dividing students into different grade classes (A+, A, B+, …) based on their obtained marks. In the context of data mining, it is the process of finding a model capable of distinguishing data into different classes based on their similarities and dissimilarities.

It is used in almost all areas, such as medicine (grouping patients based on their symptoms), sales (categorizing customers), finance (categorizing loan applicants), and so on.

Classification process

First, we gather an unclassified dataset (structured or unstructured) and then pass it to a classifier (the model) to get a classified dataset. Various classification algorithms can be used for this purpose, and the appropriate one can be chosen based on your requirements.

Learning and Testing of Classification

The steps involved in learning and testing of classification are given below:

  1. Data preparation
    Gather the necessary data and pre-process it if needed. Split the data into training and testing sets.
  2. Model selection
    Choose an appropriate algorithm based on your requirements (e.g. logistic regression, decision tree, random forest). Tune the hyperparameters properly for effective and optimal performance of the model.
  3. Model training
    Fit the classification model to the training data so that it learns the underlying patterns and insights.
  4. Model evaluation
    Evaluate the model using metrics like accuracy, along with its efficiency and speed.
  5. Model optimization
    If the results of the model are not satisfactory, optimize it through tasks like feature engineering, choosing a different algorithm, or tuning the hyperparameters further.
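The five steps above can be sketched with scikit-learn; the dataset and algorithm chosen here are illustrative assumptions, not the only valid choices:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Data preparation: gather the data and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# 2. Model selection: a decision tree with a tuned hyperparameter
model = DecisionTreeClassifier(max_depth=3, random_state=42)

# 3. Model training: fit the model to the training data
model.fit(X_train, y_train)

# 4. Model evaluation: score the model on held-out test data
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.2f}")

# 5. Model optimization: if the accuracy is unsatisfactory, revisit
# the features, the hyperparameters, or the choice of algorithm.
```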

Decision Tree Induction

Decision Tree

A decision tree is a tree in which each branch node represents a choice among a number of alternatives and each terminal node represents a decision or classification. In classification, a decision tree is a classifier that classifies an instance by starting from the root node and following branches until a terminal node is reached.

Terminologies

  1. Root node: It is the starting node of the decision tree, which gets split into further nodes.
  2. Decision node: It is a node that splits further into sub-nodes.
  3. Terminal node: It is a node that can’t be split further into sub-nodes.
  4. Splitting: It is the process of dividing a node into sub-nodes.
  5. Pruning: It is the process of removing sub-nodes from a decision node, typically to reduce overfitting. It is the opposite of splitting.
  6. Child node: It is a sub-node created by splitting a node (its parent node).

Example

https://www.vrogue.co/post/decision-tree-classification-in-python-shishir-kant-singh

From the above figure, we can see how the training data is turned into a decision tree model and how decisions are made based on it. A decision tree also helps visualize the data, which leads to a better understanding of it.
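Tree induction like the figure shows can be reproduced in a few lines; the toy student-marks data below is a made-up assumption for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy training data: [hours_studied, classes_attended] -> fail (0) / pass (1)
X = [[1, 2], [2, 1], [3, 6], [8, 9], [9, 7], [10, 10]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# export_text prints the induced tree: splits from the root node
# down to the terminal (leaf) nodes.
print(export_text(tree, feature_names=["hours_studied", "classes_attended"]))

# Classifying a new instance walks from the root to a leaf.
print(tree.predict([[7, 8]]))
```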

Merits of decision tree

  1. Interpretability: Transparent and easy to interpret model
  2. Robustness: Handles both numerical and categorical features/inputs and is relatively insensitive to outliers
  3. Versatility: Can perform both classification and regression tasks

Demerits of decision tree

  1. Overfitting: A decision tree may overfit the data, especially when the tree grows too deep
  2. Bias towards features with a larger number of unique values
  3. Instability: small changes in the training data can produce a very different tree
  4. Resource-intensive and expensive to train on large datasets due to the cost of evaluating many candidate splits
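The overfitting demerit can be demonstrated by comparing a fully grown tree with a depth-limited (pruned) one; the synthetic noisy dataset below is an illustrative assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: the label depends on one feature, plus 15% label noise
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] > 0).astype(int)
flip = rng.random(300) < 0.15
y[flip] = 1 - y[flip]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree memorizes the training set, noise included;
# limiting max_depth is a simple form of pruning.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)

print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```

The deep tree scores perfectly on training data but worse on unseen data, which is the overfitting pattern described above.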

Bayesian Classification

https://www.dataminingapps.com/2019/09/bayesian-networks-lessons-learned-from-the-past/

A Bayesian network is a powerful supervised machine learning model that makes predictions using the principles of Bayes’ Theorem. It is a probabilistic graphical model that represents knowledge about an uncertain domain, where each node corresponds to a random variable and each edge represents a conditional dependency between the corresponding variables.

Bayes’ Theorem

It is a fundamental concept in probability that describes the relationship between the conditional probabilities of two events: P(A|B) = P(B|A) · P(A) / P(B).

https://towardsdatascience.com/bayes-rule-with-a-simple-and-practical-example-2bce3d0f4ad0
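The theorem can be worked through numerically; the medical-test probabilities below are made-up figures chosen only to illustrate the arithmetic:

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
# A = "has disease", B = "test is positive" (illustrative numbers)
p_disease = 0.01             # prior P(A)
p_pos_given_disease = 0.95   # likelihood P(B|A)
p_pos_given_healthy = 0.05   # false-positive rate P(B|not A)

# Total probability of a positive test: P(B)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: P(A|B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))
```

Despite the accurate test, the posterior is only about 16%, because the prior probability of disease is so low.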

Bayesian classification predicts the label of an instance by calculating the posterior probability of each class given the instance’s features. The classifier is trained on a labelled dataset containing a set of features along with their corresponding labels.

From this data, the classifier estimates the prior probability of each class and the conditional probabilities of the features given each class. For a new instance, it calculates the posterior probability of each class using Bayes’ Theorem. The class with the highest posterior probability is then assigned as the predicted label.

This classification often assumes that the features are independent of each other given the class label (known as the “naive” assumption), which makes the calculation more efficient and easy.
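A naive Bayesian classifier of this kind is available off the shelf; the sketch below uses scikit-learn's GaussianNB, with the dataset choice being an illustrative assumption:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Training estimates the class priors and the per-feature
# conditional distributions (assumed Gaussian and independent).
nb = GaussianNB().fit(X_tr, y_tr)

# predict() assigns the class with the highest posterior;
# predict_proba() exposes the posterior probabilities themselves.
print(nb.score(X_te, y_te))
print(nb.predict_proba(X_te[:1]).round(3))
```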

Rule Based Classification

Rule-based classification is a straightforward approach to building a classification model in which predictions are made from a set of pre-defined rules.
In rule-based classification, the rules are either manually defined by domain experts or extracted from the training data. They usually take the form of “IF-THEN” statements, where the antecedent (IF part) describes conditions on the feature values and the consequent (THEN part) gives the predicted class.
Even though this kind of classification is interpretable and flexible, it can’t handle complex relationships.
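A minimal sketch of such IF-THEN rules, using the loan-applicant example from earlier; the thresholds and field names are made-up assumptions a domain expert might supply:

```python
def classify_loan(applicant: dict) -> str:
    """Rule-based classifier: return 'approve' or 'reject'."""
    # IF income is high AND debt is low THEN approve
    if applicant["income"] >= 50_000 and applicant["debt_ratio"] < 0.3:
        return "approve"
    # IF credit score is excellent THEN approve
    if applicant["credit_score"] >= 750:
        return "approve"
    # Default rule: reject anything not covered by the rules above
    return "reject"

print(classify_loan({"income": 60_000, "debt_ratio": 0.2, "credit_score": 680}))
print(classify_loan({"income": 30_000, "debt_ratio": 0.5, "credit_score": 600}))
```

Note how the antecedents test feature values and the consequents name a class, and how a default rule is needed for instances no explicit rule covers.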

Linear Classification

https://www.paperswithcode.com/task/classification/latest

Linear classification uses a linear model to make binary or multi-class predictions. As shown in the above figure for binary classification, the values above the threshold (the line) are classified as positive and the values below it are classified as negative.

This type of classification is simple, efficient and easy to implement, but it is sensitive to feature scaling and can only capture simple decision boundaries. It is therefore suitable only for well-behaved classification problems where the decision boundary can be approximated by a linear function.
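A common linear classifier is logistic regression, which learns exactly such a linear decision boundary; the synthetic two-feature dataset below is a purely illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic two-class data with two informative features
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_informative=2, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

clf = LogisticRegression().fit(X_tr, y_tr)

# The learned coefficients and intercept define the threshold line:
# points on one side are predicted positive, the other side negative.
print("test accuracy:", clf.score(X_te, y_te))
print("coefficients:", clf.coef_, "intercept:", clf.intercept_)
```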
