Glenn Wiebe created IGNITE-12849:
------------------------------------
Summary: Add New BinaryObject Vectorizer for SparseVectors and
Integer Coordinates
Key: IGNITE-12849
URL: https://issues.apache.org/jira/browse/IGNITE-12849
Project: Ignite
Issue Type: New Feature
Components: ml
Affects Versions: 2.8
Reporter: Glenn Wiebe
Fix For: 2.8
A. DenseVector-based BinaryObjectVectorizer
When using existing caches as a source of Datasets, the BinaryObjectVectorizer
is used.
The existing BinaryObjectVectorizer only supports the creation of a
SparseVector.
The LUDecomposition utility, which supports Gaussian factorization for models
like GMM, has a "singularity indicator". A SparseVector's null handling sets a
matrix column calculation to zero (0.0), which is below the minimum check value
(1e-11), so the matrix is reported as singular.
This null handling of the SparseVector restricts the use of some algorithms,
such as Gaussian Mixture Models, where any Vector dimension that is null will
incorrectly signal that a matrix is singular.
It would be great if we could:
- Have a BinaryObjectVectorizer that produces a DenseVector to eliminate this
singularity trigger and enable use of the GMM Trainer.
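
To make the singularity trigger concrete, here is a minimal sketch of a pivot
check during Gaussian elimination. It is not Ignite's LUDecomposition code: the
class SingularitySketch and the method isSingular are hypothetical names used
only for illustration, and the only value taken from this issue is the 1e-11
threshold. It shows how a column that is entirely zero produces a pivot below
the threshold.

```java
import java.util.Arrays;

/** Hypothetical sketch, not Ignite's LUDecomposition: only illustrates the pivot check. */
final class SingularitySketch {
    /** Minimum pivot magnitude; the value 1e-11 is the one quoted in this issue. */
    static final double MIN_PIVOT = 1e-11;

    /** Returns true if elimination with partial pivoting meets a pivot below MIN_PIVOT. */
    static boolean isSingular(double[][] input) {
        int n = input.length;

        // Work on a copy so the caller's matrix is left untouched.
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++)
            a[i] = Arrays.copyOf(input[i], n);

        for (int col = 0; col < n; col++) {
            // Partial pivoting: pick the row with the largest absolute value in this column.
            int pivot = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col]))
                    pivot = row;

            // A column of zeros (e.g. a dimension that was always null) fails here.
            if (Math.abs(a[pivot][col]) < MIN_PIVOT)
                return true;

            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++)
                    a[row][k] -= factor * a[col][k];
            }
        }

        return false;
    }
}
```

For example, isSingular(new double[][] {{2, 0}, {1, 0}}) returns true because
the second column is all zeros, while a matrix with no empty column, such as
{{2, 1}, {1, 3}}, passes the check.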
B. CacheBasedDatasets not treated as Temporary Cache
When using a cache-based dataset, the close() method destroys the Ignite cache.
This means that there is no ability to re-use the data loaded into this dataset.
It would be great if we could:
- Not destroy the Ignite Cache holding the dataset on close (of one step in an
ML processing flow)
- Allow for "attaching" to this prior, pre-calculated dataset in subsequent use.
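
The behaviour requested above could look roughly like the following sketch. It
does not use any Ignite API: ReusableDatasetSketch, its CACHES map (a stand-in
for the cluster's named caches), the keepOnClose flag and the attach() method
are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a cache-backed dataset; none of these names are Ignite APIs. */
final class ReusableDatasetSketch implements AutoCloseable {
    /** Stand-in for the named caches held by the cluster. */
    static final Map<String, double[][]> CACHES = new HashMap<>();

    private final String cacheName;
    private final boolean keepOnClose;

    private ReusableDatasetSketch(String cacheName, boolean keepOnClose) {
        this.cacheName = cacheName;
        this.keepOnClose = keepOnClose;
    }

    /** Loads rows into a new named cache and returns a dataset over it. */
    static ReusableDatasetSketch create(String cacheName, double[][] rows, boolean keepOnClose) {
        CACHES.put(cacheName, rows);
        return new ReusableDatasetSketch(cacheName, keepOnClose);
    }

    /** Attaches to a cache left behind by an earlier step instead of reloading the data. */
    static ReusableDatasetSketch attach(String cacheName, boolean keepOnClose) {
        if (!CACHES.containsKey(cacheName))
            throw new IllegalStateException("No dataset cache named " + cacheName);
        return new ReusableDatasetSketch(cacheName, keepOnClose);
    }

    double[][] rows() {
        return CACHES.get(cacheName);
    }

    /** Destroys the backing cache only when the dataset was not marked for reuse. */
    @Override public void close() {
        if (!keepOnClose)
            CACHES.remove(cacheName);
    }
}
```

In this sketch, a first step would call create("ds", rows, true) and close the
dataset without losing the data, and a later step would call attach("ds", false)
to reuse it and clean up when finished.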
C. Vector Visibility
Vectors (unlike other value types, e.g. BinaryObjects) are not visible in
standard mechanisms, like the Ignite Web Console, where the toString() method
does not present any information about the embedded vector values.
It would be great if we could:
- Have a Vector.toString() method implementation that presents some
information about what is actually in the Vector.
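
As a rough illustration of the kind of output meant here, the sketch below
prints the size and the first few values of a vector. PrintableVectorSketch and
MAX_SHOWN are hypothetical names, and the class is a plain array wrapper, not
Ignite's Vector interface; the exact output format is an assumption.

```java
/** Hypothetical dense vector wrapper; not Ignite's Vector interface. */
final class PrintableVectorSketch {
    /** Maximum number of values printed before the output is truncated. */
    static final int MAX_SHOWN = 5;

    private final double[] values;

    PrintableVectorSketch(double... values) {
        this.values = values.clone();
    }

    /** Prints the vector size and up to MAX_SHOWN leading values. */
    @Override public String toString() {
        StringBuilder sb = new StringBuilder("Vector[size=")
            .append(values.length)
            .append(", values=[");

        int shown = Math.min(values.length, MAX_SHOWN);
        for (int i = 0; i < shown; i++) {
            if (i > 0)
                sb.append(", ");
            sb.append(values[i]);
        }

        // Mark truncation so long vectors stay readable in a console.
        if (values.length > shown)
            sb.append(", ...");

        return sb.append("]]").toString();
    }
}
```

With this sketch, new PrintableVectorSketch(1.0, 2.5).toString() yields
"Vector[size=2, values=[1.0, 2.5]]".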
I have implemented the above items and have used them at a customer site where
I needed these capabilities (or at least they dramatically reduced the cost and
increased the value of the solution).
It would be great if the community were supportive of this expansion and
improvement of the Ignite ML library.
Thanks,
Glenn