pub struct CountVectorizer { /* private fields */ }
Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of documents. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.
Implementations
impl CountVectorizer
pub fn params() -> CountVectorizerParams
Construct a new set of parameters
pub fn force_tokenizer_function_redefinition(
    &mut self,
    tokenizer: fn(&str) -> Vec<&str>,
)
pub fn transform<T: ToString, D: Data<Elem = T>>(
    &self,
    x: &ArrayBase<D, Ix1>,
) -> Result<CsMat<usize>>
Given a sequence of n documents, produces a sparse array of size (n, vocabulary_entries) where column j of row i is the number of occurrences of vocabulary entry j in the document of index i. Vocabulary entry j is the string at the j-th position in the vocabulary. If a vocabulary entry was not encountered in a document, then the corresponding cell in the sparse matrix will be set to None.
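The row and column layout described above can be illustrated with a small standard-library sketch. This is not the crate's implementation: the function name count_occurrences, the SparseRow alias, and the whitespace tokenizer are assumptions made for illustration, and the real method returns a CsMat<usize> rather than a vector of maps.

```rust
use std::collections::BTreeMap;

/// One sparse row: column index -> count. A column with no entry was never
/// encountered in that document, so looking it up yields `None`.
type SparseRow = BTreeMap<usize, usize>;

/// Counts vocabulary occurrences per document (hypothetical helper).
/// Assumption: tokens are whitespace-separated and matched exactly; the
/// tokenizer and normalization used by the real vectorizer may differ.
fn count_occurrences(vocabulary: &[&str], documents: &[&str]) -> Vec<SparseRow> {
    documents
        .iter()
        .map(|doc| {
            let mut row = SparseRow::new();
            for token in doc.split_whitespace() {
                // Column j is the position of the token in the vocabulary.
                if let Some(j) = vocabulary.iter().position(|entry| *entry == token) {
                    *row.entry(j).or_insert(0) += 1;
                }
            }
            row
        })
        .collect()
}
```

With the vocabulary ["apple", "banana", "cherry"] and the document "apple banana apple", row 0 of this sketch holds a count of 2 in column 0 and 1 in column 1, and has no entry for column 2.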
pub fn transform_files<P: AsRef<Path>>(
    &self,
    input: &[P],
    encoding: EncodingRef,
    trap: DecoderTrap,
) -> Result<CsMat<usize>>
Given a sequence of n file names, produces a sparse array of size (n, vocabulary_entries) where column j of row i is the number of occurrences of vocabulary entry j in the document contained in the file of index i. Vocabulary entry j is the string at the j-th position in the vocabulary. If a vocabulary entry was not encountered in a document, then the corresponding cell in the sparse matrix will be set to None.
The files will be read using the specified encoding, and any sequence unrecognized by the encoding will be handled according to trap.
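To make the role of trap more concrete, the following standard-library sketch contrasts two ways of handling byte sequences that the encoding does not recognize: failing outright, or substituting a replacement character. It is only an analogy restricted to UTF-8; the Trap enum and decode_utf8 function are hypothetical names and are not part of this crate or of the decoder it relies on.

```rust
/// How to treat bytes that are not valid in the chosen encoding.
/// Hypothetical stand-in for the `trap` argument, limited to two policies.
enum Trap {
    /// Stop and report an error at the first invalid sequence.
    Strict,
    /// Substitute U+FFFD for each invalid sequence and keep going.
    Replace,
}

/// Decodes UTF-8 bytes under the given policy (illustrative only).
fn decode_utf8(bytes: &[u8], trap: Trap) -> Result<String, String> {
    match trap {
        Trap::Strict => String::from_utf8(bytes.to_vec())
            .map_err(|e| format!("invalid UTF-8: {}", e)),
        Trap::Replace => Ok(String::from_utf8_lossy(bytes).into_owned()),
    }
}
```

Under a strict policy a single malformed file makes the whole call fail, whereas a replacing policy lets counting proceed at the cost of possibly altered tokens.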
pub fn vocabulary(&self) -> &Vec<String>
Contains all vocabulary entries, in the same order used by the transform methods.
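Because the vocabulary order matches the column order of the transform output, a term's column can be found from its position in the returned vector. A minimal sketch of such a lookup follows; column_index is a hypothetical helper, not a method of this type, and it takes a plain slice so that it runs without the crate.

```rust
use std::collections::HashMap;

/// Builds a term -> column lookup from a vocabulary slice (hypothetical helper).
/// The column of a term is its position in the vocabulary.
fn column_index(vocabulary: &[String]) -> HashMap<&str, usize> {
    vocabulary
        .iter()
        .enumerate()
        .map(|(j, term)| (term.as_str(), j))
        .collect()
}
```

Passing the result of vocabulary() to a helper like this would avoid a linear scan each time a term's column is needed.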
Trait Implementations
impl Clone for CountVectorizer
fn clone(&self) -> CountVectorizer
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations
impl !Freeze for CountVectorizer
impl !RefUnwindSafe for CountVectorizer
impl Send for CountVectorizer
impl !Sync for CountVectorizer
impl Unpin for CountVectorizer
impl UnwindSafe for CountVectorizer
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T
where
    T: Clone,
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
impl<T> Pointable for T
impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
fn to_subset(&self) -> Option<SS>
The inverse inclusion map: attempts to construct self from the equivalent element of its superset.
fn is_in_subset(&self) -> bool
Checks if self is actually part of its subset T (and can be converted to it).
unsafe fn to_subset_unchecked(&self) -> SS
Use with care! Same as self.to_subset but without any property checks. Always succeeds.
fn from_subset(element: &SS) -> SP
The inclusion map: converts self to the equivalent element of its superset.