This lesson is still being designed and assembled (Pre-Alpha version)

Stop Words

Overview

Time: min
Objectives

What are stop words?

Commonly used words such as “the”, “a”, “at”, “for”, “above”, “on”, “is”, and “all” are called stop words. When processing text, we often remove these words because they carry little meaning on their own and rarely affect the outcome of the analysis. Which words count as stop words depends heavily on the language. The NLTK library for Python provides a corpus called “stopwords” that holds predefined stop word lists for a number of languages.

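import nltk
from nltk import word_tokenize
from nltk.corpus import stopwords

# Download the stop word lists and the tokenizer models used by word_tokenize.
nltk.download('stopwords')
nltk.download('punkt')      # tokenizer models for older NLTK releases
nltk.download('punkt_tab')  # tokenizer models for newer NLTK releases

stop_words = set(stopwords.words('english'))
text = '''The UIC Library Digital Scholarship Hub is a facility that is available to support students, 
staff and faculty with digital scholarship and humanities experimental research and instruction. 
The Hub provides technology, data and individual consultations to encourage creative, 
innovative and non-traditional research and development. '''
# Lowercase before tokenizing, because the NLTK stop word list is lowercase.
tokens = word_tokenize(text.lower())
# Keep only the tokens that are not stop words (punctuation tokens remain).
filtered_tokens = [token for token in tokens if token not in stop_words]
print(filtered_tokens)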
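
The same idea can be sketched without NLTK, which may help show why stop word removal depends on the language. This is only an illustrative sketch: the `STOP_WORDS` dictionary and the `remove_stop_words` function are hypothetical names, and the word lists are small hand-picked samples, not the lists that NLTK ships. Unlike the example above, this sketch also drops punctuation, because it tokenizes with a simple regular expression.

```python
import re

# Hypothetical, hand-picked stop word lists for illustration only;
# real lists (such as the ones in NLTK) are much longer.
STOP_WORDS = {
    "english": {"the", "a", "is", "that", "to", "and", "with", "for", "on", "at"},
    "german": {"der", "die", "das", "ist", "und", "ein", "eine", "zu", "mit"},
}


def remove_stop_words(text, language="english"):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # \w+ matches runs of letters, digits and underscores, so punctuation is dropped.
    tokens = re.findall(r"\w+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS[language]]


print(remove_stop_words("The Hub is a facility that is available to support students."))
print(remove_stop_words("Die Bibliothek ist ein Ort zu lernen.", language="german"))
# "die" is a stop word in the German sample list but a content word in English.
print(remove_stop_words("The die is cast"))
```

The last call keeps “die” because it is filtered against the English list; the same token would be removed if the German list were used.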

Key Points