Fake News Classifier
Building a classifier to detect fake news.
Objective — Given a news article, classify whether it is true or fake news.
Dataset — I have used the Fake and Real News dataset from Kaggle (linked in the References).
So let's start.
Let's import the dataset.
Load the dataset
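The Kaggle dataset ships as two CSV files, Fake.csv and True.csv. A minimal loading sketch (the inline two-row DataFrames below are stand-ins with the same columns, so the snippet runs without the actual files):

```python
import pandas as pd

# In the real project: fake = pd.read_csv("Fake.csv"); true = pd.read_csv("True.csv")
# Tiny inline stand-ins with the same shape so the sketch is self-contained.
fake = pd.DataFrame({"title": ["Shocking claim!"], "text": ["A made-up story."]})
true = pd.DataFrame({"title": ["Budget passed"], "text": ["Parliament passed the budget."]})

fake["label"] = 0  # 0 = fake news
true["label"] = 1  # 1 = true news

# Combine both classes into one frame and shuffle the rows.
df = pd.concat([fake, true], ignore_index=True)
df = df.sample(frac=1, random_state=42).reset_index(drop=True)
print(df.shape)
```

Labelling each source file before concatenating is what lets a single `label` column serve as the classification target later.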
Let’s create a word cloud
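A word cloud is just a rendering of word frequencies (the `wordcloud` library's `WordCloud().generate(text)` does the drawing). The counting underneath can be sketched with the standard library; the text here is a made-up sample:

```python
from collections import Counter

# Hypothetical sample text standing in for the concatenated article bodies.
text = "breaking news fake story fake claim news report"
freqs = Counter(text.split())
print(freqs.most_common(3))
```

The most frequent words are the ones drawn largest in the cloud.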
Preprocess the data
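Typical preprocessing for this task: lowercase the text, strip punctuation and digits, and drop stopwords. A minimal sketch with a tiny hand-picked stopword set (the real project likely uses NLTK's full list):

```python
import re

# Tiny illustrative stopword set; NLTK's english list is much larger.
STOPWORDS = {"the", "a", "an", "is", "was", "to", "of", "and", "in"}

def clean(text: str) -> str:
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep letters only
    tokens = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(tokens)

print(clean("The Budget was passed in 2021!"))  # -> "budget passed"
```

Stemming or lemmatisation could be added as a further step, at the cost of extra processing time.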
Let’s divide our dataset into train & test
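The split can be done with scikit-learn's `train_test_split`; the placeholder lists below stand in for the cleaned articles and their labels:

```python
from sklearn.model_selection import train_test_split

# Placeholder corpus and labels standing in for the cleaned articles.
texts = [f"article number {i}" for i in range(10)]
labels = [0, 1] * 5

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)
print(len(X_train), len(X_test))  # 8 2
```

`stratify=labels` keeps the fake/true ratio the same in both splits, and `random_state` makes the split reproducible.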
Let’s implement models
Logistic Regression
Decision Tree Classifier
Random Forest Classifier
Naive Bayes
ANN
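Each model in the list above can be trained on TF-IDF features in a few lines of scikit-learn. The sketch below is illustrative, not the project's actual code: the corpus is synthetic, and the ANN is approximated by scikit-learn's `MLPClassifier` (the original ANN may well be a Keras or PyTorch network):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier

# Tiny synthetic corpus: label 1 = "true"-style text, 0 = "fake"-style text.
texts = [
    "parliament passed the budget bill",
    "senate approved the trade agreement",
    "court upheld the election result",
    "aliens secretly control the government",
    "miracle cure doctors do not want you to know",
    "celebrity clone spotted on the moon",
]
labels = [1, 1, 1, 0, 0, 0]

X = TfidfVectorizer().fit_transform(texts)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Multinomial NB": MultinomialNB(),
    "ANN (MLP)": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=42),
}

for name, model in models.items():
    model.fit(X, labels)
    print(f"{name}: train accuracy = {model.score(X, labels):.2f}")
```

Because all five share the scikit-learn `fit`/`score` interface, a single loop is enough to compare them on the same feature matrix.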
Experiment
Hyperparameter Tuning — the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, for example the learning rate or the batch size.
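One common way to tune hyperparameters is a cross-validated grid search. As a sketch (the synthetic data and the grid of regularization strengths `C` are illustrative, not the project's actual search):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the TF-IDF feature matrix.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},  # illustrative grid of regularization strengths
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

`GridSearchCV` fits the model once per parameter combination per fold and keeps the combination with the best mean cross-validation score.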
Overfitting — when our model performs poorly on unseen data, i.e. it gives high accuracy on the training data but noticeably lower accuracy on the test data, it is said to be overfitting.
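A quick way to see overfitting, unrelated to this project's data: fit an unpruned decision tree on random labels. It memorises the training set perfectly, yet scores near chance on the held-out set:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)  # random labels: there is nothing real to learn

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

print("train accuracy:", tree.score(X_tr, y_tr))  # 1.0 - the tree memorises
print("test accuracy:", tree.score(X_te, y_te))   # near 0.5 - chance level
```

Comparing train and test accuracy like this is exactly the check used to spot overfitting in the models above.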
Model Accuracies
Final Accuracy — 99.60%
My Contribution
I followed this to understand the approach, but visualised & implemented everything on my own. I also trained and checked the accuracy of Logistic Regression, Decision Tree Classifier, Random Forest Classifier, and Naive Bayes (Multinomial NB) models.
Challenges & Solutions
Text preprocessing was a major challenge; implementing the different models and getting good accuracy out of them was a challenge too.
I went through many articles to understand text preprocessing and to find ways of improving accuracy.
References
https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset
https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
https://scikit-learn.org/stable/modules/tree.html
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
https://scikit-learn.org/stable/modules/naive_bayes.html
https://www.kaggle.com/mehmetlaudatekman/detailed-fake-news-classification-with-pytorch-98
http://datamine.unc.edu/jupyter/notebooks/Text%20Mining%20Modules/(1)%20Text%20Preprocessing.ipynb
https://en.wikipedia.org/wiki/Hyperparameter_optimization
https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html
http://faculty.cas.usf.edu/mbrannick/regression/Logistic.html
https://christophm.github.io/interpretable-ml-book/logistic.html
https://www.kaggle.com/prashant111/decision-tree-classifier-tutorial
http://mines.humanoriented.com/classes/2010/fall/csci568/portfolio_exports/lguo/decisionTree.html
https://www.kaggle.com/prashant111/random-forest-classifier-tutorial
https://www.kdnuggets.com/2020/06/naive-bayes-algorithm-everything.html
https://towardsdatascience.com/naive-bayes-classifier-81d512f50a7c
https://en.wikipedia.org/wiki/Activation_function
https://www.kaggle.com/akarsh1/fakenews-classification-using-ml-and-deep-learning
https://www.kaggle.com/sukanyabag/fake-news-classifiernlp
https://www.kaggle.com/pinkychauhan/fakenewsclassifierusingnltk-sklearn
Find the git repo here.
Find the Kaggle notebook here.
Thanks for reading.