Stemming
Overview
Time: min
Objectives
What is Stemming?
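Stemming is the process of reducing tokens to a root form (the stem), usually by stripping or rewriting suffixes. For example, studying and studied are both reduced to the same stem. The stem is not always a dictionary word: NLTK's Porter stemmer maps both of these to studi rather than study. Two stemmers commonly used in Python, both provided by NLTK, are: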
- Porter Stemming
- Lancaster Stemming
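Of the two, Lancaster stemming is more aggressive: it has roughly twice as many rules as the Porter stemmer and tends to over-stem, trimming words so heavily that the stem can be hard to recognise or that unrelated words end up sharing a stem.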
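To make over-stemming concrete, here is a minimal sketch of a toy suffix stripper. It is not the Porter or the Lancaster algorithm: the rule lists `MILD_RULES` and `AGGRESSIVE_RULES` and the function `toy_stem` are hypothetical names invented for this illustration. It only shows how a larger rule set, applied repeatedly, can collapse words with different meanings into one stem.

```python
# Toy suffix stripper for illustration only.
# These rules are made up; they are NOT the Porter or Lancaster rule sets.
MILD_RULES = ["ing", "ed", "s"]
AGGRESSIVE_RULES = MILD_RULES + ["ity", "al", "e"]


def toy_stem(word, suffixes, repeat=False, min_stem_len=3):
    """Strip the longest matching suffix; keep stripping if repeat is True."""
    ordered = sorted(suffixes, key=len, reverse=True)
    while True:
        for suffix in ordered:
            # Only strip if at least min_stem_len characters remain.
            if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
                word = word[: -len(suffix)]
                break
        else:
            return word  # no rule matched, so we are done
        if not repeat:
            return word  # single-pass mode: stop after one rule


words = ["studying", "universe", "university", "universal"]
for word in words:
    mild = toy_stem(word, MILD_RULES)
    aggressive = toy_stem(word, AGGRESSIVE_RULES, repeat=True)
    print(word + ": " + mild + " / " + aggressive)
```

With the mild rules, universe, university and universal stay distinct. With the aggressive rules they all collapse to the same short stem, which is the kind of behaviour meant by over-stemming.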
Porter Stemming
from nltk.stem import PorterStemmer

# Create the stemmer once and reuse it for every word
pst = PorterStemmer()
stm = ["giving", "given", "given", "gave"]
for word in stm:
    # Print each word next to its Porter stem
    print(word + ": " + pst.stem(word))
Lancaster Stemming
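from nltk.stem import LancasterStemmer

# Create the stemmer once and reuse it for every word
lst = LancasterStemmer()
stm = ["giving", "given", "given", "gave"]
for word in stm:
    # Print each word next to its Lancaster stem
    print(word + ": " + lst.stem(word))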
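The two stemmers are easier to compare when their output is printed side by side. The following is a small sketch that assumes NLTK is installed; `compare_stemmers` is a hypothetical helper written for this lesson, not part of NLTK.

```python
from nltk.stem import LancasterStemmer, PorterStemmer


def compare_stemmers(words):
    """Return {word: (porter_stem, lancaster_stem)} for each input word."""
    porter = PorterStemmer()
    lancaster = LancasterStemmer()
    return {word: (porter.stem(word), lancaster.stem(word)) for word in words}


results = compare_stemmers(["giving", "given", "gave"])
for word, (porter_stem, lancaster_stem) in results.items():
    # One row per word: original, Porter stem, Lancaster stem
    print(word + ": porter=" + porter_stem + ", lancaster=" + lancaster_stem)
```

Run it on your own word list to see where the stemmers disagree. Because both work by rewriting suffixes, an irregular form such as gave is generally not reduced to the same stem as giving.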
Key Points