(For source code refer to https://github.com/donglinchen/text_classification/blob/master/look_inside_model_scikit_learn.ipynb)
Previously I wrote an article on how to train a text classification model. Sometimes I get questions like: how does the model predict one category/class over another? What are the important features the model uses for prediction?
To answer these questions, let’s dissect the model we built and look inside its parts to gain insights.
First, let’s quickly train a text classification model again using scikit-learn's TfidfVectorizer and SGDClassifier, on BBC news article data that you can download from Kaggle:
import pandas as pd
from sklearn.model_selection import train_test_split
df = pd.read_csv('bbc-text.csv')
X_train, X_test, y_train, y_test = train_test_split(df['text'], df['category'], test_size=.2…
Text classification has been widely used in real-world business processes like email spam detection, support ticket classification, or content recommendation based on text topics.
Thanks to popular machine learning and deep learning libraries like scikit-learn, PyTorch, and TensorFlow, we can readily build text classification models.
I was interested in learning how the three frameworks compare to each other, but I had not found working examples that build text classifiers using all of them and compare their performance. Here I am going to build multi-class text classifiers using the above popular libraries and see how they…
Travelers around the world have been using Airbnb for rental accommodations to save money or look for different experiences.
As a software engineer who loves travel and is currently venturing into the data science field, I am interested in analyzing the Airbnb datasets to find answers to the following 4 questions, and to build a price model to predict the rental price for a potential property to be listed on Airbnb.
1. What kinds of property types were available in the listings, and what was the availability of those property types?
2. What were the listed prices in Boston and Seattle? …