Python - TextBlob - Create Text Classifier

TextBlob is a Python (2 and 3) library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more.

TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob objects as if they were Python strings that learned how to do Natural Language Processing.

I have used this library to create a custom classifier. I have not provided the data I used; instead, I am using the sample data provided on the TextBlob site. In my experience, this library gives very good results, better than those I obtained from tools such as the IBM Watson NLC API. It is also open source.

Installation:

Using pip:

pip install -U textblob
python -m textblob.download_corpora

Using Conda:

conda install -c https://conda.anaconda.org/sloria textblob
python -m textblob.download_corpora

TextBlob is now installed, so we can start building the text classifier.

1. Import required libraries

import os

path = "your dir path"
os.chdir(path)

from textblob.classifiers import NaiveBayesClassifier

2. Load Train and Test Data

train = [
    ('I love this sandwich.', 'pos'),
    ('This is an amazing place!', 'pos'),
    ('I feel very good about these beers.', 'pos'),
    ('This is my best work.', 'pos'),
    ("What an awesome view", 'pos'),
    ('I do not like this restaurant', 'neg'),
    ('I am tired of this stuff.', 'neg'),
    ("I can't deal with this", 'neg'),
    ('He is my sworn enemy!', 'neg'),
    ('My boss is horrible.', 'neg')
]

test = [
    ('The beer was good.', 'pos'),
    ('I do not enjoy my job', 'neg'),
    ("I ain't feeling dandy today.", 'neg'),
    ("I feel amazing!", 'pos'),
    ('Gary is a friend of mine.', 'pos'),
    ("I can't believe I'm doing this.", 'neg')
]
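The lists above are hard-coded sample data. If your own labelled examples live in a CSV file, they can be turned into the same list of (text, label) tuples with the standard library. This is a minimal sketch, assuming a hypothetical two-column layout with a header row `text,label`; the helper name `load_labelled_rows` and the inline sample are illustrative and not part of TextBlob.

```python
import csv
import io


def load_labelled_rows(file_obj):
    """Read (text, label) tuples from a CSV file object with 'text' and 'label' columns."""
    reader = csv.DictReader(file_obj)
    return [(row["text"].strip(), row["label"].strip()) for row in reader]


# Hypothetical CSV content, inlined here so the sketch runs on its own.
sample_csv = io.StringIO(
    "text,label\n"
    "I love this sandwich.,pos\n"
    '"I do not like this restaurant",neg\n'
)

rows = load_labelled_rows(sample_csv)
print(rows)
```

With a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to the helper in place of `sample_csv`.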

3. Create Classifier

cl = NaiveBayesClassifier(train)
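Before training, each text has to be turned into features. A common choice for this kind of classifier is one true/false feature per vocabulary word, recording whether the text contains it. The sketch below illustrates that idea in plain Python; it assumes lowercasing and whitespace tokenization, which is a simplification and not TextBlob's own tokenizer, and `tokenize` and `contains_features` are hypothetical names.

```python
import string


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation (a simplification)."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return {w for w in words if w}


def contains_features(document, vocabulary):
    """Map each vocabulary word to True/False depending on whether the document contains it."""
    tokens = tokenize(document)
    return {f"contains({word})": (word in tokens) for word in sorted(vocabulary)}


train_sample = [
    ("I love this sandwich.", "pos"),
    ("My boss is horrible.", "neg"),
]

# The vocabulary is every distinct word seen in the training texts.
vocabulary = set()
for text, _label in train_sample:
    vocabulary |= tokenize(text)

features = contains_features("I love my boss", vocabulary)
print(features)
```

Words that never appeared in the training texts produce no feature at all, which is one reason a small training set limits what the classifier can pick up on.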

4. Check with some random samples

cl.classify("Their burgers are amazing")
# "pos"

cl.classify("I don't like their pizza.")
# "neg"

5. Check accuracy for test data

cl.accuracy(test)

# 0.8333333333333334
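Accuracy here is the fraction of test examples whose predicted label matches the true label, so 0.8333 corresponds to 5 correct out of 6. A small sketch of that calculation follows, assuming any callable that maps a text to a label. The `toy_classify` keyword rule is a hypothetical stand-in used only to exercise the function; it is not the trained model and its score is not the one reported above.

```python
def accuracy(classify, labelled_examples):
    """Return the share of examples for which classify(text) equals the true label."""
    correct = sum(1 for text, label in labelled_examples if classify(text) == label)
    return correct / len(labelled_examples)


def toy_classify(text):
    """Hypothetical keyword rule used only to exercise accuracy()."""
    negative_cues = ("not", "can't", "ain't")
    return "neg" if any(cue in text.lower() for cue in negative_cues) else "pos"


test = [
    ("The beer was good.", "pos"),
    ("I do not enjoy my job", "neg"),
    ("I ain't feeling dandy today.", "neg"),
    ("I feel amazing!", "pos"),
    ("Gary is a friend of mine.", "pos"),
    ("I can't believe I'm doing this.", "neg"),
]

print(accuracy(toy_classify, test))
print(accuracy(lambda text: "pos", test))  # baseline that always answers "pos"
```

Comparing against a trivial baseline such as "always pos" is a quick way to check that a reported accuracy reflects something learned, especially on a test set this small.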

How the Naive Bayes Algorithm Works

A classifier based on the Naive Bayes algorithm. In order to find the probability for a label, this algorithm first uses the Bayes rule to express P(label|features) in terms of P(label) and P(features|label):

$$P(\text{label} \mid \text{features}) = \frac{P(\text{label}) \, P(\text{features} \mid \text{label})}{P(\text{features})}$$

The algorithm then makes the "naive" assumption that all features are independent, given the label:

$$P(\text{label} \mid \text{features}) = \frac{P(\text{label}) \, P(f_1 \mid \text{label}) \cdots P(f_n \mid \text{label})}{P(\text{features})}$$

Rather than computing P(features) explicitly, the algorithm just calculates the numerator for each label and normalizes them so they sum to one:

$$P(\text{label} \mid \text{features}) = \frac{P(\text{label}) \, P(f_1 \mid \text{label}) \cdots P(f_n \mid \text{label})}{\sum_{l} P(l) \, P(f_1 \mid l) \cdots P(f_n \mid l)}$$

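The normalization step can be made concrete with a few lines of plain Python. The sketch below multiplies P(label) by one P(word|label) term per word in the input and then divides each score by the sum over labels. It is an illustration under stated assumptions, not the library's implementation: it uses whitespace tokenization, add-one smoothing, and only the words present in the input, and the names `train_counts` and `posterior` are hypothetical.

```python
from collections import Counter, defaultdict


def train_counts(examples):
    """Count labels and, per label, how many training texts contain each word."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        label_counts[label] += 1
        for word in set(text.lower().split()):
            word_counts[label][word] += 1
    return label_counts, word_counts


def posterior(text, label_counts, word_counts):
    """Return P(label | words) by normalizing P(label) * product of P(word | label)."""
    total = sum(label_counts.values())
    scores = {}
    for label, n_label in label_counts.items():
        score = n_label / total  # P(label)
        for word in set(text.lower().split()):
            # P(word present | label) with add-one smoothing (an assumption of this sketch).
            score *= (word_counts[label][word] + 1) / (n_label + 2)
        scores[label] = score
    norm = sum(scores.values())  # stands in for the explicit P(features) term
    return {label: score / norm for label, score in scores.items()}


examples = [
    ("amazing place", "pos"),
    ("amazing view", "pos"),
    ("horrible boss", "neg"),
    ("horrible place", "neg"),
]

label_counts, word_counts = train_counts(examples)
probs = posterior("amazing view", label_counts, word_counts)
print(probs)
```

Smoothing keeps a word that was never seen with a given label from forcing that label's score to zero, which matters a great deal with training sets as small as the one in this post.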
That's it. We have created a simple text classifier that can sort short texts such as tweets into two categories, pos and neg.

References:

http://textblob.readthedocs.io/en/dev/classifiers.html


About Author

Dattatray Shinde has over 12 years of experience in software design, development, and maintenance of web-based applications, having worked in the healthcare, insurance, e-commerce, and learning management system domains. He has over 6 years of experience as a data scientist, working mainly on predictive analytics, survey analytics, and risk analytics platforms.
