Hello Igniters,

I want to give an overview of all the features and major improvements in the ML module for this release.
Let me start with one of our main features for this release: *TensorFlow integration* <https://issues.apache.org/jira/browse/IGNITE-8670>

This integration allows us to use Apache Ignite as a data source for TensorFlow. It also allows creating and maintaining TensorFlow clusters over Apache Ignite and submitting TF jobs to those clusters. More details are in the related umbrella ticket.

Also, for this release we have some new algorithms:
* Random forest <https://issues.apache.org/jira/browse/IGNITE-8840>
* Gradient boosted trees <https://issues.apache.org/jira/browse/IGNITE-7149>
* Logistic regression [binary <https://issues.apache.org/jira/browse/IGNITE-8403>] [multi-class <https://issues.apache.org/jira/browse/IGNITE-8511>]
* ANN <https://issues.apache.org/jira/browse/IGNITE-9261>

New features related to data preprocessing:
* Pipeline <https://issues.apache.org/jira/browse/IGNITE-9158>
* L1, L2 normalization <https://issues.apache.org/jira/browse/IGNITE-8663>
* Data filtering for new datasets <https://issues.apache.org/jira/browse/IGNITE-8666>
* Encoding categorical features [OneHotEncoder <https://issues.apache.org/jira/browse/IGNITE-8680>] [OneOfKEncoder <https://issues.apache.org/jira/browse/IGNITE-8664>]
* Imputer and Binarizer <https://issues.apache.org/jira/browse/IGNITE-8567>
* MaxAbsScaler <https://issues.apache.org/jira/browse/IGNITE-9285>
* Dataset splitting <https://issues.apache.org/jira/browse/IGNITE-8667>

New features for model validation:
* Model estimator <https://issues.apache.org/jira/browse/IGNITE-8669>
* k-fold cross-validation <https://issues.apache.org/jira/browse/IGNITE-8668>
* Param grid for tuning hyper-parameters in cross-validation <https://issues.apache.org/jira/browse/IGNITE-8924>

Other features and improvements:
* Model updating <https://issues.apache.org/jira/browse/IGNITE-9387>
* ML tutorial <https://issues.apache.org/jira/browse/IGNITE-8741>
* Optional indexing for decision trees <https://issues.apache.org/jira/browse/IGNITE-9064>
* Learning context for trainers (local parallelization and logging of the training process) <https://issues.apache.org/jira/browse/IGNITE-8981>
* Unification of the API for feature extractors <https://issues.apache.org/jira/browse/IGNITE-8907>
* Several tickets for removing old unused classes and improving code coverage and examples [1 <https://issues.apache.org/jira/browse/IGNITE-9124>] [2 <https://issues.apache.org/jira/browse/IGNITE-9297>] [3 <https://issues.apache.org/jira/browse/IGNITE-9146>] [4 <https://issues.apache.org/jira/browse/IGNITE-9316>] [5 <https://issues.apache.org/jira/browse/IGNITE-9348>] [6 <https://issues.apache.org/jira/browse/IGNITE-8450>]

Sincerely,
Yuriy Babak