§Preprocessing
§The Big Picture
linfa-preprocessing is a crate in the linfa ecosystem, an effort to create a toolkit for classical Machine Learning implemented in pure Rust, akin to Python’s scikit-learn.
§Current state
linfa-preprocessing provides a pure Rust implementation of:
- Standard scaling
- Min-max scaling
- Max-abs scaling
- Normalization (l1, l2 and max norm)
- Count vectorization
- Term frequency - inverse document frequency count vectorization
- Whitening
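As a rough illustration of the three linear scaling methods listed above, here is a minimal sketch in plain Rust that uses only the standard library. It is not the crate's API: the function names `standard_scale`, `min_max_scale` and `max_abs_scale` are hypothetical, each operates on a single feature column, and the use of the population standard deviation and a [0, 1] target range are assumptions made for this example.

```rust
/// Standard scaling (sketch): subtract the column mean and divide by the
/// population standard deviation. A constant column maps to all zeros.
fn standard_scale(column: &[f64]) -> Vec<f64> {
    let n = column.len() as f64;
    let mean = column.iter().sum::<f64>() / n;
    let variance = column.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    column
        .iter()
        .map(|x| if std_dev > 0.0 { (x - mean) / std_dev } else { 0.0 })
        .collect()
}

/// Min-max scaling (sketch): map the column linearly onto [0, 1].
fn min_max_scale(column: &[f64]) -> Vec<f64> {
    let min = column.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = column.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let range = max - min;
    column
        .iter()
        .map(|x| if range > 0.0 { (x - min) / range } else { 0.0 })
        .collect()
}

/// Max-abs scaling (sketch): divide by the largest absolute value, so the
/// result lies in [-1, 1] and zeros stay zero.
fn max_abs_scale(column: &[f64]) -> Vec<f64> {
    let max_abs = column.iter().fold(0.0_f64, |m, x| m.max(x.abs()));
    column
        .iter()
        .map(|x| if max_abs > 0.0 { x / max_abs } else { 0.0 })
        .collect()
}
```

For instance, `min_max_scale(&[2.0, 4.0, 6.0])` yields `[0.0, 0.5, 1.0]`. Max-abs scaling does not shift the data, which is why it is often preferred for sparse inputs.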
Re-exports§
pub use error::PreprocessingError;
pub use error::Result;
Modules§
- error - Error definitions for preprocessing
- linear_scaling - Linear Scaling methods
- norm_scaling - Sample normalization methods
- tf_idf_vectorization - Term frequency - inverse document frequency vectorization methods
- whitening - Methods for uncorrelating data
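To make the difference between linear scaling and sample normalization concrete, the following is a minimal sketch of per-sample normalization under the l1, l2 and max norms, written against the standard library only. The `Norm` enum and the `normalize_sample` function are hypothetical names introduced for this example and are not taken from the `norm_scaling` module; leaving an all-zero sample unchanged is also an assumption.

```rust
/// Hypothetical norm selector for this sketch; not the crate's type.
#[derive(Clone, Copy)]
enum Norm {
    L1,
    L2,
    Max,
}

/// Rescale one sample (one row of features) so that its chosen norm is 1.
/// An all-zero sample is returned unchanged to avoid dividing by zero.
fn normalize_sample(sample: &[f64], norm: Norm) -> Vec<f64> {
    let size = match norm {
        // Sum of absolute values.
        Norm::L1 => sample.iter().map(|x| x.abs()).sum::<f64>(),
        // Euclidean length.
        Norm::L2 => sample.iter().map(|x| x * x).sum::<f64>().sqrt(),
        // Largest absolute value.
        Norm::Max => sample.iter().fold(0.0_f64, |m, x| m.max(x.abs())),
    };
    if size == 0.0 {
        return sample.to_vec();
    }
    sample.iter().map(|x| x / size).collect()
}
```

Unlike the linear scalers, which compute statistics per feature across all samples, this works on each sample independently, so it needs no fitting step.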
Macros§
Structs§
- CountVectorizer - Counts the occurrences of each vocabulary entry, learned during fitting, in a sequence of documents. Each vocabulary entry is mapped to an integer value that is used to index the count in the result.
- CountVectorizerParams
- CountVectorizerValidParams - Count vectorizer: learns a vocabulary from a sequence of documents (or file paths) and maps each vocabulary entry to an integer value, producing a CountVectorizer that can be used to count the occurrences of each vocabulary entry in any sequence of documents. Alternatively a user-specified vocabulary can be used for fitting.
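The fit-then-count flow described for the count vectorizer can be sketched in plain Rust as follows. This is only an illustration of the idea, not the crate's implementation: `fit_vocabulary` and `count_occurrences` are hypothetical helpers, and tokenizing by whitespace with lowercasing is an assumption made to keep the example short.

```rust
use std::collections::BTreeMap;

/// Fitting (sketch): assign each distinct token an integer index in order of
/// first appearance. Tokens are whitespace-separated and lowercased.
fn fit_vocabulary(documents: &[&str]) -> BTreeMap<String, usize> {
    let mut vocabulary: BTreeMap<String, usize> = BTreeMap::new();
    for doc in documents {
        for token in doc.split_whitespace() {
            let next_index = vocabulary.len();
            // Only inserts when the token has not been seen before.
            vocabulary.entry(token.to_lowercase()).or_insert(next_index);
        }
    }
    vocabulary
}

/// Transforming (sketch): for each document, count how often every vocabulary
/// entry occurs. The entry's integer index selects the position in the row;
/// tokens outside the vocabulary are ignored.
fn count_occurrences(
    vocabulary: &BTreeMap<String, usize>,
    documents: &[&str],
) -> Vec<Vec<usize>> {
    documents
        .iter()
        .map(|doc| {
            let mut counts = vec![0usize; vocabulary.len()];
            for token in doc.split_whitespace() {
                if let Some(&index) = vocabulary.get(&token.to_lowercase()) {
                    counts[index] += 1;
                }
            }
            counts
        })
        .collect()
}
```

Fitting on `["one two", "two three two"]` maps `one` to 0, `two` to 1 and `three` to 2. Counting `"two two four"` against that vocabulary then gives the row `[0, 2, 0]`, since `four` was never learned.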