Charlene Chambliss | Full-Stack AI Engineer

As part of my efforts to learn in public earlier in my data science journey, I wrote this article on an end-to-end analysis of a dataset of news headlines. (Apologies, I can’t find the original dataset, but I got it from the UCI ML Repository.)

The article includes:

Preprocessing/cleaning the text data, using NLTK
Using word2vec to create word and title embeddings, then visualizing them as clusters using t-SNE
Visualizing the relationship between title sentiment and article popularity
Attempting to predict article popularity from the embeddings and other available features, using XGBoost (gradient-boosted trees)
Using model stacking (ensembling) to improve the performance of the popularity model (this step was not successful, but was still a valuable experiment!)
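
To make the first two steps a little more concrete, here is a minimal sketch of cleaning a title and turning it into a single title embedding. It is not the article's actual code. It assumes a tiny hard-coded stopword list in place of NLTK's, toy 2-dimensional vectors in place of trained word2vec embeddings, and simple averaging of word vectors as the way to build a title embedding (one common approach, not necessarily the one used in the article). The names `clean_title`, `title_embedding`, `STOPWORDS`, and `TOY_VECTORS` are hypothetical.

```python
import re

# Assumption: a tiny illustrative stopword list; the article used NLTK for cleaning.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "to", "for", "and", "is"}


def clean_title(title):
    """Lowercase the title, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [t for t in tokens if t not in STOPWORDS]


def title_embedding(tokens, word_vectors, dim):
    """Average the vectors of known tokens; return all zeros if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Assumption: toy 2-dimensional vectors standing in for trained word2vec embeddings.
TOY_VECTORS = {
    "economy": [1.0, 0.0],
    "grows": [0.0, 1.0],
    "markets": [1.0, 1.0],
}

tokens = clean_title("The Economy Grows in Europe!")
embedding = title_embedding(tokens, TOY_VECTORS, dim=2)
print(tokens)
print(embedding)
```

Words without a vector (here, "europe") are skipped rather than treated as zeros, so they do not pull the average toward the origin. With real embeddings, the same averaged vectors could be fed to t-SNE for plotting or used as features for a model such as XGBoost.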

The full text of the article (with code snippets and a link to the Jupyter Notebook) is here.