Term frequency - inverse document frequency vectorization methods
Structs§
- FittedTfIdfVectorizer - Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of texts and scales them by the inverse document frequency defined by the method. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.
- TfIdfVectorizer - Similar to CountVectorizer, but instead of just counting the term frequency of each vocabulary entry in each given document, it computes the term frequency times the inverse document frequency, thus giving more importance to entries that appear many times but in only some documents. The weight function can be adjusted by setting the appropriate method. This struct provides the same string processing customizations described in CountVectorizer.
Enums§
- TfIdfMethod - Methods for computing the inverse document frequency of a vocabulary entry
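The weighting described above (term count times inverse document frequency, with a selectable method) can be sketched in plain Rust. This is a minimal, standalone illustration and does not use this crate's API: the names `IdfMethod`, `fit`, and `transform` are hypothetical, tokenization is reduced to lowercase whitespace splitting, and the two formulas are common IDF variants assumed for the example, not confirmed to be the ones this module implements.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical IDF variants, for illustration only.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum IdfMethod {
    /// ln((1 + n) / (1 + df)) + 1
    Smooth,
    /// ln(n / df) + 1
    NonSmooth,
}

impl IdfMethod {
    /// Inverse document frequency of an entry seen in `doc_freq` of `n_docs` documents.
    fn idf(self, n_docs: usize, doc_freq: usize) -> f64 {
        let (n, df) = (n_docs as f64, doc_freq as f64);
        match self {
            IdfMethod::Smooth => ((1.0 + n) / (1.0 + df)).ln() + 1.0,
            IdfMethod::NonSmooth => (n / df).ln() + 1.0,
        }
    }
}

/// Learns a vocabulary (entry -> column index, in sorted order) and the
/// number of documents each entry appears in.
fn fit(docs: &[&str]) -> (HashMap<String, usize>, Vec<usize>) {
    let mut words = BTreeSet::new();
    for doc in docs {
        for w in doc.split_whitespace() {
            words.insert(w.to_lowercase());
        }
    }
    let vocab: HashMap<String, usize> =
        words.into_iter().enumerate().map(|(i, w)| (w, i)).collect();

    let mut doc_freq = vec![0usize; vocab.len()];
    for doc in docs {
        // Deduplicate per document so each entry is counted once per document.
        let seen: BTreeSet<String> =
            doc.split_whitespace().map(|w| w.to_lowercase()).collect();
        for w in seen {
            doc_freq[vocab[&w]] += 1;
        }
    }
    (vocab, doc_freq)
}

/// Counts vocabulary entries in each document and scales each count by the
/// entry's IDF. Words outside the fitted vocabulary are ignored.
fn transform(
    docs: &[&str],
    vocab: &HashMap<String, usize>,
    doc_freq: &[usize],
    n_fit_docs: usize,
    method: IdfMethod,
) -> Vec<Vec<f64>> {
    docs.iter()
        .map(|doc| {
            let mut row = vec![0.0; vocab.len()];
            for w in doc.split_whitespace() {
                if let Some(&i) = vocab.get(&w.to_lowercase()) {
                    row[i] += 1.0;
                }
            }
            for (i, v) in row.iter_mut().enumerate() {
                *v *= method.idf(n_fit_docs, doc_freq[i]);
            }
            row
        })
        .collect()
}

fn main() {
    let docs = ["the cat sat", "the dog sat", "the cat ran"];
    let (vocab, doc_freq) = fit(&docs);
    let rows = transform(&docs, &vocab, &doc_freq, docs.len(), IdfMethod::Smooth);
    for (doc, row) in docs.iter().zip(&rows) {
        println!("{doc:?} -> {row:?}");
    }
}
```

In this sketch, "the" occurs in every document and so receives the smallest weight, while "dog" occurs in only one and is weighted more heavily, which is the effect the TfIdfVectorizer description refers to.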