Crate linfa_preprocessing

§Preprocessing

§The Big Picture

linfa-preprocessing is a crate in the linfa ecosystem, an effort to create a toolkit for classical Machine Learning implemented in pure Rust, akin to Python’s scikit-learn.

§Current state

linfa-preprocessing provides a pure Rust implementation of the following (a usage sketch follows the list):

  • Standard scaling
  • Min-max scaling
  • Max abs scaling
  • Normalization (l1, l2 and max norm)
  • Count vectorization
  • Term frequency - inverse document frequency (TF-IDF) count vectorization
  • Whitening
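
A minimal sketch of the fit/transform workflow for standard scaling. The `LinearScaler::standard()` constructor and the dataset construction below are assumptions about the current API and may differ slightly between versions:

```rust
// Sketch only: scale each feature to zero mean and unit variance.
// Constructor and trait names are assumptions; see the crate docs for
// the exact signatures.
use linfa::traits::{Fit, Transformer};
use linfa::DatasetBase;
use linfa_preprocessing::linear_scaling::LinearScaler;
use ndarray::array;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Three samples, two features.
    let dataset = DatasetBase::from(array![[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]);

    // Fit: learn the per-feature mean and standard deviation.
    let scaler = LinearScaler::standard().fit(&dataset)?;

    // Transform: apply the learned scaling to the same (or new) data.
    let scaled = scaler.transform(dataset);
    println!("{:?}", scaled.records());
    Ok(())
}
```

The other scalers follow the same fit/transform pattern with a different constructor.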

Re-exports§

pub use error::PreprocessingError;
pub use error::Result;

Modules§

error
Error definitions for preprocessing
linear_scaling
Linear Scaling methods
norm_scaling
Sample normalization methods
tf_idf_vectorization
Term frequency - inverse document frequency vectorization methods
whitening
Methods for uncorrelating data (a usage sketch follows this list)
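
As a sketch of what the whitening module provides: fit a whitening transform on a dataset, then apply it so the transformed features are approximately uncorrelated with unit variance. The `Whitener::pca()` constructor below is an assumption about the module's API:

```rust
// Sketch only: PCA whitening of a small dataset. The `Whitener::pca()`
// constructor is an assumption; check the whitening module docs for the
// exact entry points.
use linfa::traits::{Fit, Transformer};
use linfa::DatasetBase;
use linfa_preprocessing::whitening::Whitener;
use ndarray::array;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dataset = DatasetBase::from(array![
        [1.0, 2.0, 3.0],
        [4.0, 5.0, 7.0],
        [9.0, 6.0, 2.0],
        [3.0, 3.0, 3.0],
    ]);

    // Fit: estimate the whitening matrix from the data.
    let whitener = Whitener::pca().fit(&dataset)?;

    // Transform: the whitened records have approximately identity covariance.
    let whitened = whitener.transform(dataset);
    println!("{:?}", whitened.records());
    Ok(())
}
```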

Macros§

column_for_word

Structs§

CountVectorizer
Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of documents. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.
CountVectorizerParams
Count vectorizer: learns a vocabulary from a sequence of documents (or file paths) and maps each vocabulary entry to an integer value, producing a CountVectorizer that can be used to count the occurrences of each vocabulary entry in any sequence of documents. Alternatively, a user-specified vocabulary can be used for fitting. A brief usage sketch follows this list.
CountVectorizerValidParams
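
A minimal sketch of how these types are typically used together: build parameters, fit them on a collection of documents to obtain a CountVectorizer, then transform documents into per-document term counts. The builder entry point, the document container and the accessor methods below are assumptions about the API:

```rust
// Sketch only: learn a vocabulary from two documents and count term
// occurrences. Builder and accessor names are assumptions; see the
// struct docs for the exact signatures.
use linfa_preprocessing::CountVectorizer;
use ndarray::array;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let documents = array![
        "the quick brown fox".to_string(),
        "the lazy dog and the fox".to_string(),
    ];

    // Fit: build the vocabulary and assign each entry a column index.
    let vectorizer = CountVectorizer::params().fit(&documents)?;

    // Transform: one row per document, one column per vocabulary entry,
    // holding the number of occurrences of that entry in that document.
    let counts = vectorizer.transform(&documents);
    println!("vocabulary: {:?}", vectorizer.vocabulary());
    println!("counts: {:?}", counts);
    Ok(())
}
```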

Enums§

Tokenizer