Project #11
Spam Classification using NLP
Spam classification applies Natural Language Processing (NLP) to label emails or text messages as spam or not spam. The text of each message is analyzed to extract features that help separate the two classes, such as the presence of certain words or phrases, the length of the message, and the use of capital letters. These features are then used to train a machine learning model that classifies new messages. The technique is widely used in email filtering systems and messaging applications to keep unwanted messages out of the user's inbox.
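As a rough illustration of the kind of features described above, the following sketch computes message length, word count, keyword hits, and the share of capital letters for a single message. The function name `extract_features` and the keyword list `SPAM_KEYWORDS` are hypothetical and chosen only for this example. They are not taken from the project itself.

```python
import re

# Hypothetical keyword list, chosen only for illustration.
SPAM_KEYWORDS = ("free", "win", "prize", "urgent", "claim")


def extract_features(message: str) -> dict:
    """Return a few hand-crafted features for one message."""
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", message.lower())
    # Alphabetic characters, used to measure how much of the text is capitalized.
    letters = [ch for ch in message if ch.isalpha()]
    uppercase = [ch for ch in letters if ch.isupper()]
    return {
        "length": len(message),
        "word_count": len(words),
        "keyword_hits": sum(word in SPAM_KEYWORDS for word in words),
        "uppercase_ratio": len(uppercase) / len(letters) if letters else 0.0,
    }


example = extract_features("WIN a FREE prize now")
print(example)
```

In practice, hand-crafted features like these are often combined with, or replaced by, bag-of-words or TF-IDF vectors, so this should be read as one possible starting point rather than the project's actual feature set.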
Summary
Data Set - spam.csv on Kaggle. The dataset consists of 5572 rows and 2 columns. The first column contains the text of the message, and the second column contains the label indicating whether the message is spam or not spam. The dataset is divided into a training set, used to fit the models, and a test set, used to evaluate them on messages they have not seen during training. Several machine learning algorithms are trained and compared, including Naive Bayes, Support Vector Machines, and Random Forest. Performance is evaluated using metrics such as accuracy, precision, recall, and F1 score.
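The train, predict, and evaluate workflow described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's implementation: it covers only Naive Bayes (multinomial, with add-one smoothing), uses a tiny hand-written toy dataset in place of spam.csv, and assumes the label values "spam" and "ham". Support Vector Machines and Random Forest would normally come from a library such as scikit-learn and are not shown. All function names here are hypothetical.

```python
import math
from collections import Counter

# Toy stand-in for the real dataset (assumption: labels are "spam" / "ham").
data = [
    ("win a free prize now", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("free entry claim your prize", "spam"),
    ("see you at home tonight", "ham"),
    ("urgent claim your free cash", "spam"),
    ("can you call me later", "ham"),
    ("free prize waiting claim now", "spam"),
    ("lunch at home later today", "ham"),
]
train, test = data[:6], data[6:]  # simple hold-out split


def tokenize(text):
    return text.lower().split()


def train_naive_bayes(examples):
    """Count words per class; these counts define the model."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log posterior score."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        score = math.log(class_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word in vocabulary:  # words unseen in training are skipped
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


def evaluate(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


model = train_naive_bayes(train)
predictions = [predict(model, text) for text, _ in test]
metrics = evaluate([label for _, label in test], predictions)
print(predictions, metrics)
```

Scores on a toy set this small say nothing about performance on the real data. The sketch only shows how the pieces fit together. Precision and recall are reported for the spam class because, in a spam filter, a legitimate message wrongly flagged as spam (a false positive) is often more costly than a missed spam message.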