This lesson is still being designed and assembled (Pre-Alpha version)

Term Document Matrix

Overview

Time: min
Objectives

What is a Term Document Matrix?

A term-document matrix maps a collection of n documents into the vector space model. Terms label one axis of the matrix and documents the other, and each entry records how many times a given term occurs in a given document, so every document becomes a vector of term counts. In other words, it creates a numerical representation of the documents. Representing text as a numerical structure is a common starting point for text mining and analytics tasks such as search and ranking, building taxonomies, categorization, measuring document similarity, and text-based machine learning.
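To make the idea concrete, here is a minimal pure-Python sketch that builds a small term-document matrix without the textmining library. The helper names `tokenize` and `term_document_matrix` and the three sample sentences are illustrative assumptions. The tokenizer simply lowercases the text and keeps runs of letters and digits, with no stopword removal or stemming. In this sketch rows are terms and columns are documents.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase, keep runs of letters/digits only.
    # Real pipelines often also remove stopwords or apply stemming.
    return re.findall(r"[a-z0-9]+", text.lower())


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts terms[i] in docs[j]."""
    counts = [Counter(tokenize(d)) for d in docs]      # one bag of words per document
    terms = sorted(set().union(*counts))                 # vocabulary across all documents
    matrix = [[c[t] for c in counts] for t in terms]     # Counter returns 0 for absent terms
    return terms, matrix


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
terms, matrix = term_document_matrix(docs)
for term, row in zip(terms, matrix):
    print(f"{term:>5}", row)
```

Each printed row holds one term's counts across the three documents; for example, `the` occurs twice in each of the first two documents and not at all in the third.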

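import os
import pandas as pd
import textmining

# post_corpus is the list of cleaned document strings built earlier in the lesson
tdm = textmining.TermDocumentMatrix()  # start with an empty term-document matrix
for doc in post_corpus:
    tdm.add_doc(doc)  # tokenize the document and add its term counts to the matrix
print(type(tdm))
os.chdir("../working")
# cutoff=1 keeps every term that appears in at least one document
tdm.write_csv("TDM_DataFrame.csv", cutoff=1)

def buildMatrix(document_list):
    """Build a term-document matrix from document_list and write it to CSV."""
    print("building matrix...")
    tdm = textmining.TermDocumentMatrix()
    for doc in document_list:
        tdm.add_doc(doc)
    # write the matrix to a CSV file so it can be loaded as a DataFrame
    tdm.write_csv(r'path\matrix.csv', cutoff=1)

# load the saved matrix into pandas and inspect it
df = pd.read_csv("TDM_DataFrame.csv")
print(df.head(20))
print(df.shape)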
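Once every document is a vector of term counts, the document-similarity use case mentioned at the start of this lesson can be sketched with cosine similarity. This is a minimal standard-library sketch, not part of the textmining library: `count_vector` and `cosine_similarity` are hypothetical helpers, the sample sentences are made up, and the tokenizer uses the same simplifying lowercase-and-split assumption.

```python
import math
import re
from collections import Counter


def count_vector(text):
    # Assumed tokenizer: lowercase alphanumeric runs, no stemming.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())  # only shared terms contribute
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty document as dissimilar to everything
    return dot / (norm_a * norm_b)


docs = ["the cat sat on the mat", "the dog sat on the log", "cats and dogs"]
vectors = [count_vector(d) for d in docs]
print(round(cosine_similarity(vectors[0], vectors[1]), 3))  # prints 0.75
print(cosine_similarity(vectors[0], vectors[2]))            # prints 0.0
```

The second pair scores 0.0 even though both documents mention cats, because the tokenizer treats `cat` and `cats` as different terms; this is one reason pipelines often apply stemming before building the matrix.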

Key Points