## Machine Learning: Introduction to Naive Bayes Classifier in Python

Have you ever wondered how email service providers filter spam? How social media platforms perform sentiment analysis, or how news channels classify text? In machine learning, a simple but surprisingly powerful algorithm called Naive Bayes can help with all of these. So let's get started with Naive Bayes.

## What is Naive Bayes Classifier?

In machine learning, Naive Bayes belongs to the family of probabilistic classification algorithms. As a warm-up with probability: flip two coins and find the probability of getting two heads, where the sample space is **{HH, HT, TH, TT}** (H is for Head and T is for Tail). $$ P(\text{Getting Two Heads}) = 1/4 $$

Naive Bayes assumes that every feature in a class is independent of every other feature in that class, so each feature contributes to the probability independently. For example, consider Person as a class with first name and last name as its properties: the probability of the first name being "A" is independent of whatever the last name is. As another example, an object might be predicted to be a lemon if it is round in shape, about one inch in diameter and yellow in color. In reality these features are correlated, but Naive Bayes lets each property contribute independently, which is why it is called "naive".

## Bayes' Theorem and its use

Bayes' theorem gives the conditional probability of an event A given that another event B has occurred. In other words, it describes the probability of an event based on prior knowledge of conditions related to that event.

Bayes' theorem has the following equation: $$ P(A|B) = P(B|A)P(A)/P(B) $$ where

- P(A) = probability of event A; this is the prior probability
- P(B) = probability of event B; this is the evidence (the predictor's prior probability)
- P(B|A) = conditional probability of event B given A; this is the likelihood
- P(A|B) = conditional probability of event A given B; this is the posterior probability

Let **SH** be the event *second coin is head* and **FT** be the event *first coin is tail*.
So the probability of the second coin being head given that the first is tail can be written as

$$ P(SH|FT) $$
$$ = \frac{P(FT|SH)P(SH)}{P(FT)} $$
$$ = \frac{P(\text{first coin being tail given second is head}) P(\text{second is head})}{ P(\text{first is tail})} $$
$$ = \frac{\frac{1}{2} * \frac{1}{2}} {\frac{1}{2}} $$
$$ = \frac{1}{2} $$
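The coin calculation above can be checked in a few lines of Python (a minimal sketch; the variable names are my own):

```python
# Bayes' theorem for two independent fair coin flips:
# P(SH|FT) = P(FT|SH) * P(SH) / P(FT)
p_sh = 0.5           # P(second coin is head)
p_ft = 0.5           # P(first coin is tail)
p_ft_given_sh = 0.5  # the coins are independent, so P(FT|SH) = P(FT)

p_sh_given_ft = p_ft_given_sh * p_sh / p_ft
print(p_sh_given_ft)  # → 0.5
```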

### Example of Bayes' Theorem

From a deck of cards, find the probability that a picked card is a king given that it is a face card. This can be represented as $$ P(King|Face) $$.

There are 52 cards in total in a deck, of which 12 are face cards: king, queen and jack in each of the four suits (club, diamond, heart and spade). So the sample space has n = 52 cards, containing a set of 12 face cards, which in turn contains a subset of 4 king cards. That means 4 out of the 12 face cards are kings; in other words, 1/3 of the cards are kings given that they are face cards.

According to Bayes' theorem $$ P(King|Face) = \frac{P(Face|King) P(King)}{P(Face)} $$

4 of the 52 cards are kings:
$$ P(King) = \frac{4}{52} = \frac{1}{13} $$

12 of the 52 cards are face cards:
$$ P(Face) = \frac{12}{52} = \frac{3}{13} $$

Probability of being a face card given king:
$$ P(Face|King) = 1 $$
Here we have the prior knowledge that every king is a face card, so this probability is always 1.

Putting all in Bayes' formula:
$$ P(King|Face) = \frac{P(Face|King) P(King)}{P(Face)} $$
$$ = \frac{1 * \frac{1}{13}}{\frac{3}{13}} $$
$$ = \frac{1}{3} $$
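The card example works out the same way in Python; using `fractions.Fraction` keeps the arithmetic exact (a minimal sketch):

```python
from fractions import Fraction

# Bayes' theorem: P(King|Face) = P(Face|King) * P(King) / P(Face)
p_king = Fraction(4, 52)         # 4 kings among 52 cards
p_face = Fraction(12, 52)        # 12 face cards among 52 cards
p_face_given_king = Fraction(1)  # every king is a face card

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)  # → 1/3
```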

### Verifying the result

By the definition of conditional probability, $$ P(King|Face) = \frac{P(King \cap Face)}{P(Face)} $$ Counting directly from the deck, $$ P(King \cap Face) = \frac{4}{52} $$ and $$ P(Face) = \frac{12}{52} $$ so $$ P(King|Face) = \frac{4/52}{12/52} = \frac{4}{12} = \frac{1}{3} $$ which agrees with the result from Bayes' theorem.

## Understanding Naive Bayes Prediction using Bayes' Theorem

Let's see step by step how Naive Bayes will work for the following problem statement.
**Predict whether a person will buy a product given the specific day, discount and free delivery using Naive Bayes Classifier.**

Here is the data: *Day* can be Weekday/Weekend/Holiday, *Discount* can be Yes/No, and *Free Delivery* can be Yes/No.

Day | Discount | Free Delivery | Buy |
---|---|---|---|
Weekday | Yes | Yes | Yes |
Weekday | Yes | Yes | No |
Weekday | No | No | No |
Holiday | Yes | Yes | Yes |
Weekend | Yes | Yes | Yes |
Holiday | Yes | Yes | Yes |
Weekend | Yes | No | Yes |
Weekday | Yes | Yes | No |
Weekend | Yes | Yes | Yes |
Holiday | Yes | Yes | Yes |
Holiday | No | Yes | Yes |
Holiday | No | No | No |
Weekend | Yes | Yes | Yes |
Holiday | Yes | Yes | Yes |
Weekend | Yes | Yes | Yes |
Weekday | Yes | Yes | Yes |
Holiday | No | No | No |
Weekday | Yes | Yes | Yes |
Holiday | Yes | Yes | Yes |
Weekend | No | No | No |
Weekday | Yes | No | Yes |
Holiday | Yes | Yes | Yes |
Weekend | Yes | Yes | Yes |
Weekday | Yes | Yes | Yes |
Holiday | No | Yes | Yes |
Weekend | No | No | No |
Weekday | Yes | Yes | No |
Holiday | Yes | Yes | Yes |
Weekday | Yes | Yes | Yes |
Holiday | No | Yes | Yes |
Based on the above dataset, let's build the frequency and likelihood tables for each attribute.

**Frequency table and Likelihood table for Discount and Buy**

Discount | Buy = Yes | Buy = No | Total |
---|---|---|---|
Yes | 19 | 3 | 22 |
No | 3 | 5 | 8 |
Total | 22 | 8 | 30 |

So P(Discount = Yes | Buy) = 19/22, P(Discount = Yes | No Buy) = 3/8 and P(Discount = Yes) = 22/30.

**Frequency table and Likelihood table for Free Delivery and Buy**

Free Delivery | Buy = Yes | Buy = No | Total |
---|---|---|---|
Yes | 20 | 3 | 23 |
No | 2 | 5 | 7 |
Total | 22 | 8 | 30 |

So P(Free Delivery = Yes | Buy) = 20/22, P(Free Delivery = Yes | No Buy) = 3/8 and P(Free Delivery = Yes) = 23/30.

**Frequency table and Likelihood table for Day and Buy**

Day | Buy = Yes | Buy = No | Total |
---|---|---|---|
Weekday | 6 | 4 | 10 |
Weekend | 6 | 2 | 8 |
Holiday | 10 | 2 | 12 |
Total | 22 | 8 | 30 |

So P(Day = Holiday | Buy) = 10/22, P(Day = Holiday | No Buy) = 2/8 and P(Day = Holiday) = 12/30.
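If you'd rather not count by hand, the numbers behind these tables can be reproduced from the dataset with Python's `collections.Counter` (a minimal sketch; the tuple encoding of the table rows is my own):

```python
from collections import Counter

# Each row of the dataset above as (day, discount, free_delivery, buy)
data = [
    ("Weekday", "Yes", "Yes", "Yes"), ("Weekday", "Yes", "Yes", "No"),
    ("Weekday", "No", "No", "No"),    ("Holiday", "Yes", "Yes", "Yes"),
    ("Weekend", "Yes", "Yes", "Yes"), ("Holiday", "Yes", "Yes", "Yes"),
    ("Weekend", "Yes", "No", "Yes"),  ("Weekday", "Yes", "Yes", "No"),
    ("Weekend", "Yes", "Yes", "Yes"), ("Holiday", "Yes", "Yes", "Yes"),
    ("Holiday", "No", "Yes", "Yes"),  ("Holiday", "No", "No", "No"),
    ("Weekend", "Yes", "Yes", "Yes"), ("Holiday", "Yes", "Yes", "Yes"),
    ("Weekend", "Yes", "Yes", "Yes"), ("Weekday", "Yes", "Yes", "Yes"),
    ("Holiday", "No", "No", "No"),    ("Weekday", "Yes", "Yes", "Yes"),
    ("Holiday", "Yes", "Yes", "Yes"), ("Weekend", "No", "No", "No"),
    ("Weekday", "Yes", "No", "Yes"),  ("Holiday", "Yes", "Yes", "Yes"),
    ("Weekend", "Yes", "Yes", "Yes"), ("Weekday", "Yes", "Yes", "Yes"),
    ("Holiday", "No", "Yes", "Yes"),  ("Weekend", "No", "No", "No"),
    ("Weekday", "Yes", "Yes", "No"),  ("Holiday", "Yes", "Yes", "Yes"),
    ("Weekday", "Yes", "Yes", "Yes"), ("Holiday", "No", "Yes", "Yes"),
]

buy = Counter(row[3] for row in data)                # class counts
day_buy = Counter((row[0], row[3]) for row in data)  # joint Day/Buy counts

print(buy)                          # Counter({'Yes': 22, 'No': 8})
print(day_buy[("Holiday", "Yes")])  # → 10
```

The same pattern with `row[1]` or `row[2]` gives the Discount and Free Delivery tables.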

From the likelihood table for Day, the probability that the day is holiday given that the person buys is P(Day = Holiday | Buy) = 10/22 ≈ 0.45. Similarly, P(Free Delivery = Yes | Buy) = 20/22 ≈ 0.90.

Now calculate the probability of Buy given that the day is holiday, discount is yes and free delivery is yes. Let A be the event Buy and B be the joint event day is holiday, discount is yes and free delivery is yes. Substituting into Bayes' theorem, with the naive assumption that the three features are independent (so both the likelihood and the evidence factor into products):

P(A|B)

= P(Buy| Day = Holiday, Discount = Yes, Free Delivery = Yes)

= P(Day = Holiday|Buy) * P(Discount = Yes|Buy) * P(Free Delivery = Yes|Buy) * P(Buy) / (P(Day = Holiday) * P(Discount = Yes) * P(Free Delivery = Yes))

= ((10/22) * (19/22) * (20/22) * (22/30)) / ((12/30) * (22/30) * (23/30))

= 1.1637

Note that this value is greater than 1. Because of the naive independence assumption, the denominator is only an approximation of P(B), so these values are relative scores rather than true probabilities; they become probabilities once normalized.

Similarly, calculate the probability of No Buy given that the day is holiday, discount is yes and free delivery is yes. Here A is the event No Buy and B is the same joint event.

P(A|B)

= P(No Buy| Day = Holiday, Discount = Yes, Free Delivery = Yes)

= P(Day = Holiday|No Buy) * P(Discount = Yes|No Buy) * P(Free Delivery = Yes|No Buy) * P(No Buy) / (P(Day = Holiday) * P(Discount = Yes) * P(Free Delivery = Yes))

= ((2/8) * (3/8) * (3/8) * (8/30)) / ((12/30) * (22/30) * (23/30))

= 0.0417

To get the final likelihoods of buying or not buying given that the day is holiday, discount is yes and free delivery is yes, normalize each of the two scores above by their sum.

Sum of scores = 1.1637 + 0.0417 = 1.2054

So the likelihood of Buy is P(Buy | Day = Holiday, Discount = Yes, Free Delivery = Yes) = 1.1637/1.2054 = 0.9654

So the likelihood of No Buy is P(No Buy | Day = Holiday, Discount = Yes, Free Delivery = Yes) = 0.0417/1.2054 = 0.0346

We can conclude that people are more likely to buy on a holiday when they get a discount and free delivery.
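The whole calculation can be reproduced in a few lines of Python. Since the evidence term P(B) is the same for both Buy and No Buy, it cancels during normalization, so this sketch omits it:

```python
# Scores proportional to the posterior: prior * product of likelihoods.
# The shared evidence P(B) cancels when we normalize, so it is omitted.
score_buy = (10/22) * (19/22) * (20/22) * (22/30)  # likelihoods * P(Buy)
score_nobuy = (2/8) * (3/8) * (3/8) * (8/30)       # likelihoods * P(No Buy)

total = score_buy + score_nobuy
print(round(score_buy / total, 4))    # → 0.9654
print(round(score_nobuy / total, 4))  # → 0.0346
```

Normalizing the raw scores gives the same 0.9654 and 0.0346 as the hand calculation above.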

## Uses of Naive Bayes

So how can Naive Bayes be used in real life or in industry? Check out some of the real-life scenarios given below:

- News Categorization
- Spam Filtering
- Fraud Detection
- Medical Diagnosis
- Weather Prediction
- Object identification or Face recognition

## Advantages and Disadvantages of Naive Bayes

### Advantages

- It is easy to implement
- It is not very sensitive to irrelevant features
- It needs less training data
- It can handle both continuous and discrete data
- It is highly scalable with number of predictors and data points
- It is a fast algorithm, so it can be used for real-time predictions

### Disadvantages

- It treats each feature as independent, which can be misleading when applied to features that are actually dependent
- In text classification or filtering, it ignores the order of the words, so it cannot work well where word order carries importance
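To close, here is how the whole procedure could be wrapped in a small reusable classifier. This is a minimal sketch of my own design, with no smoothing (so an unseen feature value gets probability zero; real libraries such as scikit-learn's `CategoricalNB` add Laplace smoothing to avoid that):

```python
from collections import Counter

class CategoricalNaiveBayes:
    """Minimal categorical Naive Bayes, mirroring the hand calculation above."""

    def fit(self, X, y):
        self.n = len(y)
        self.class_counts = Counter(y)
        # counts[i][(value, label)] = how often feature i took `value` in class `label`
        self.counts = [Counter() for _ in range(len(X[0]))]
        for row, label in zip(X, y):
            for i, value in enumerate(row):
                self.counts[i][(value, label)] += 1
        return self

    def predict_proba(self, x):
        scores = {}
        for label, n_label in self.class_counts.items():
            score = n_label / self.n  # prior P(label)
            for i, value in enumerate(x):
                score *= self.counts[i][(value, label)] / n_label  # likelihood
            scores[label] = score
        total = sum(scores.values())  # normalize; the evidence term cancels
        return {label: s / total for label, s in scores.items()}

# The dataset from the worked example: day, discount, free delivery -> buy
raw = """Weekday Yes Yes Yes;Weekday Yes Yes No;Weekday No No No;Holiday Yes Yes Yes;
Weekend Yes Yes Yes;Holiday Yes Yes Yes;Weekend Yes No Yes;Weekday Yes Yes No;
Weekend Yes Yes Yes;Holiday Yes Yes Yes;Holiday No Yes Yes;Holiday No No No;
Weekend Yes Yes Yes;Holiday Yes Yes Yes;Weekend Yes Yes Yes;Weekday Yes Yes Yes;
Holiday No No No;Weekday Yes Yes Yes;Holiday Yes Yes Yes;Weekend No No No;
Weekday Yes No Yes;Holiday Yes Yes Yes;Weekend Yes Yes Yes;Weekday Yes Yes Yes;
Holiday No Yes Yes;Weekend No No No;Weekday Yes Yes No;Holiday Yes Yes Yes;
Weekday Yes Yes Yes;Holiday No Yes Yes"""
rows = [r.split() for r in raw.replace("\n", "").split(";")]
X = [r[:3] for r in rows]
y = [r[3] for r in rows]

model = CategoricalNaiveBayes().fit(X, y)
proba = model.predict_proba(["Holiday", "Yes", "Yes"])
print(round(proba["Yes"], 4))  # → 0.9654
```

The model reproduces the 0.9654 / 0.0346 split from the worked example; with smoothing enabled, the numbers would shift slightly but the predicted class would stay the same.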