Fake News Classifier

Jay Prakash Thakur
5 min read · May 4, 2021

Building a classifier to detect fake news

Objective: Given a news article, classify whether it is real or fake news.

Dataset: I have used this fake-and-real-news dataset from Kaggle (linked in the References).

So let's start.

Let's import the libraries we'll need.
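
A minimal sketch of the core imports, assuming pandas and NumPy for the data handling (model-specific imports appear alongside each model below):

```python
import numpy as np   # numeric helpers
import pandas as pd  # dataframes for the news articles
```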

Load the dataset
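
A sketch of the loading step, assuming the Kaggle dataset's two files, Fake.csv and True.csv (the 0 = fake / 1 = real label encoding is my choice here):

```python
# The Kaggle dataset ships as two CSVs: one of fake articles, one of real ones
fake = pd.read_csv("Fake.csv")
true = pd.read_csv("True.csv")

# Attach labels (0 = fake, 1 = real) and stack into a single dataframe
fake["label"] = 0
true["label"] = 1
df = pd.concat([fake, true], ignore_index=True)

print(df.shape)
print(df.head())
```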

Let’s create a word cloud
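
One way to do this is with the wordcloud package (an assumption; the post doesn't name the library), visualising the most frequent words in the fake articles:

```python
import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Concatenate all fake-news article text into one string
fake_text = " ".join(df[df["label"] == 0]["text"].astype(str))

wc = WordCloud(width=800, height=400, background_color="white",
               max_words=200).generate(fake_text)
plt.figure(figsize=(10, 5))
plt.imshow(wc, interpolation="bilinear")
plt.axis("off")
plt.show()
```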

Preprocess the data
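
A typical pipeline for this step, sketched here with NLTK stop-word removal and Porter stemming (the exact steps in the original notebook may differ):

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords")
stop_words = set(stopwords.words("english"))
stemmer = PorterStemmer()

def clean_text(text):
    # Keep letters only, lowercase, drop stop words, stem what's left
    text = re.sub(r"[^a-zA-Z]", " ", str(text)).lower()
    words = [stemmer.stem(w) for w in text.split() if w not in stop_words]
    return " ".join(words)

df["clean_text"] = df["text"].apply(clean_text)
```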

Let’s divide our dataset into train & test
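
Splitting and vectorising, assuming an 80/20 split and TF-IDF features (both are my assumptions; a plain CountVectorizer would slot in the same way):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    df["clean_text"], df["label"], test_size=0.2, random_state=42)

# Turn the cleaned text into TF-IDF vectors (vocabulary capped at 5000 terms)
vectorizer = TfidfVectorizer(max_features=5000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)
```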

Let's implement the models

Logistic Regression
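
A sketch with scikit-learn's LogisticRegression on the TF-IDF features from above (defaults apart from a raised iteration cap):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

lr = LogisticRegression(max_iter=1000)
lr.fit(X_train_vec, y_train)
print("Logistic Regression accuracy:",
      accuracy_score(y_test, lr.predict(X_test_vec)))
```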

Decision Tree Classifier
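
The same pattern with a decision tree (default hyperparameters assumed):

```python
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

dt = DecisionTreeClassifier(random_state=42)
dt.fit(X_train_vec, y_train)
print("Decision Tree accuracy:",
      accuracy_score(y_test, dt.predict(X_test_vec)))
```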

Random Forest Classifier
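
And with a random forest (100 trees is an assumed, common default):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train_vec, y_train)
print("Random Forest accuracy:",
      accuracy_score(y_test, rf.predict(X_test_vec)))
```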

Naive Bayes
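
Multinomial Naive Bayes, which suits the non-negative TF-IDF features:

```python
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

nb = MultinomialNB()
nb.fit(X_train_vec, y_train)
print("Multinomial NB accuracy:",
      accuracy_score(y_test, nb.predict(X_test_vec)))
```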

ANN
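
The post doesn't show the network or name the framework, so here is a minimal Keras sketch of a small feed-forward ANN with dropout on the same TF-IDF features (architecture, epochs and batch size are all assumptions):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(X_train_vec.shape[1],)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dropout(0.3),              # regularisation against overfitting
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Dense conversion is fine at 5000 features; watch memory on bigger vocabularies
model.fit(X_train_vec.toarray(), y_train.values,
          epochs=5, batch_size=64, validation_split=0.1)
```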

Experiment

Hyperparameter tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process itself, for example the learning rate or the batch size.
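
For example, a small grid search over two random-forest hyperparameters with scikit-learn's GridSearchCV (the grid values are purely illustrative):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [None, 20, 50],
}
grid = GridSearchCV(RandomForestClassifier(random_state=42),
                    param_grid, cv=3, scoring="accuracy")
grid.fit(X_train_vec, y_train)
print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)
```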

Overfitting is when our model performs poorly on unseen data: it gives high accuracy on the training data but noticeably lower accuracy on the test data.
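
A quick way to spot it is to compare train and test accuracy; a large gap suggests overfitting (illustrated here with the decision tree trained above):

```python
from sklearn.metrics import accuracy_score

train_acc = accuracy_score(y_train, dt.predict(X_train_vec))
test_acc = accuracy_score(y_test, dt.predict(X_test_vec))
# e.g. 1.00 on train vs 0.90 on test would be a clear sign of overfitting
print(f"train: {train_acc:.3f}  test: {test_acc:.3f}")
```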

Model Accuracies

Final accuracy: 99.60%

My Contribution

I have followed this to understand the approach, but visualised & implemented it on my own. I also trained & checked the accuracy of Logistic Regression, Decision Tree Classifier, Random Forest Classifier and Naive Bayes (Multinomial NB) models.

Challenges & solutions

Text preprocessing was a major challenge, and implementing the different models & getting good accuracy out of them was a challenge too.

I went through many articles to understand text preprocessing and to find ways to improve the accuracy.

References

https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

https://scikit-learn.org/stable/modules/tree.html

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

https://scikit-learn.org/stable/modules/naive_bayes.html

https://www.kaggle.com/mehmetlaudatekman/detailed-fake-news-classification-with-pytorch-98

http://datamine.unc.edu/jupyter/notebooks/Text%20Mining%20Modules/(1)%20Text%20Preprocessing.ipynb

https://en.wikipedia.org/wiki/Hyperparameter_optimization

https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html

http://faculty.cas.usf.edu/mbrannick/regression/Logistic.html

https://christophm.github.io/interpretable-ml-book/logistic.html

https://www.kaggle.com/prashant111/decision-tree-classifier-tutorial

http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/lguo/decisionTree.html

https://www.kaggle.com/prashant111/random-forest-classifier-tutorial

https://www.kdnuggets.com/2020/06/naive-bayes-algorithm-everything.html

https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c

https://medium.com/the-theory-of-everything/understanding-activation-functions-in-neural-networks-9491262884e0

https://en.wikipedia.org/wiki/Activation_function

https://towardsdatascience.com/everything-you-need-to-know-about-activation-functions-in-deep-learning-models-84ba9f82c253

https://medium.com/@amarbudhiraja/https-medium-com-amarbudhiraja-learning-less-to-learn-better-dropout-in-deep-machine-learning-74334da4bfc5

https://www.kaggle.com/akarsh1/fakenews-classification-using-ml-and-deep-learning

https://www.kaggle.com/sukanyabag/fake-news-classifiernlp

https://www.kaggle.com/pinkychauhan/fakenewsclassifierusingnltk-sklearn

Find the git repo here.

Find the Kaggle notebook here.

Thanks for reading.
