Great suggestion! Are you ready to make a PR for any of the suggested ideas?

Tue, 31 Mar 2020, 20:26 Glenn Wiebe (Jira) <j...@apache.org>:

> Glenn Wiebe created IGNITE-12849:
> ------------------------------------
>
>              Summary: Add New BinaryObject Vectorizer for SparseVectors
>                       and Integer Coordinates
>                  Key: IGNITE-12849
>                  URL: https://issues.apache.org/jira/browse/IGNITE-12849
>              Project: Ignite
>           Issue Type: New Feature
>           Components: ml
>     Affects Versions: 2.8
>             Reporter: Glenn Wiebe
>              Fix For: 2.8
>
>
> A. DenseVector-based BinaryObjectVectorizer
> When using existing caches as a source of Datasets, the
> BinaryObjectVectorizer is used.
> The existing BinaryObjectVectorizer only supports the creation of a
> SparseVector.
> The LUDecomposition utility that supports Gaussian factorization for
> models like GMM has a "Singularity indicator". A SparseVector and its
> null handling will set a matrix column calculation to zero (0.0), which
> is below the minimum check value (1e-11) and thus indicates that a matrix
> is not square.
>
> This null handling of the SparseMatrix will restrict the use of some
> algorithms like Gaussian Mixture Models, where any Vector dimension that
> is null will incorrectly signal that a matrix is not square.
>
> It would be great if we could:
> - Have a BinaryObjectVectorizer that uses a DenseMatrix to eliminate this
>   singularity trigger and enable use of the GMM Trainer.
>
> B. CacheBasedDatasets not treated as Temporary Cache
> When using a cache-based dataset, the close() method destroys the Ignite
> cache. This means that there is no ability to re-use the data loaded into
> this dataset.
>
> It would be great if we could:
> - Not destroy the Ignite Cache holding the dataset on close (of one step
>   in an ML processing flow)
> - Allow for "attaching" to this prior, pre-calculated dataset in
>   subsequent use.
>
> C. Vector Visibility
> Vectors (unlike other value types, e.g. BinaryObjects) are not visible in
> standard mechanisms like the Ignite Web Console, where the toString()
> method does not present any information about the embedded vector values.
>
> It would be great if we could:
> - Have a Vector.toString() method implementation that presents some
>   information about what is actually in the Vector.
>
> I have implemented the above items and have used them at a customer where
> I needed these capabilities (or at least they dramatically reduced the
> cost and increased the value of the solution).
>
> It would be great if the community was supportive of this
> expansion/improvement of the Ignite ML library.
>
> Thanks,
> Glenn
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
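
For reference on item A, here is a minimal plain-Java sketch of how a zero-filled dimension can trip a pivot threshold during LU decomposition. It is not Ignite's LUDecomposition code: the class name SingularitySketch, the method isSingular, and the sample matrices are hypothetical, and only the 1e-11 threshold is taken from the issue text.

```java
import java.util.Arrays;

public class SingularitySketch {
    /** Threshold quoted in the issue; a pivot smaller than this counts as singular. */
    static final double MIN_PIVOT = 1e-11;

    /** Runs LU elimination with partial pivoting and reports whether any pivot falls below MIN_PIVOT. */
    public static boolean isSingular(double[][] input) {
        int n = input.length;

        // Work on a copy so the caller's matrix is left untouched.
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++)
            a[i] = Arrays.copyOf(input[i], n);

        for (int col = 0; col < n; col++) {
            // Partial pivoting: pick the row with the largest absolute value in this column.
            int pivotRow = col;
            for (int row = col + 1; row < n; row++)
                if (Math.abs(a[row][col]) > Math.abs(a[pivotRow][col]))
                    pivotRow = row;

            // A column that is entirely zero (for example, a dimension that was never set)
            // leaves no usable pivot.
            if (Math.abs(a[pivotRow][col]) < MIN_PIVOT)
                return true;

            double[] tmp = a[col];
            a[col] = a[pivotRow];
            a[pivotRow] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = a[row][col] / a[col][col];
                for (int k = col; k < n; k++)
                    a[row][k] -= factor * a[col][k];
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical covariance-like matrix where every dimension carries data.
        double[][] full = {{2.0, 0.5, 0.1}, {0.5, 1.0, 0.2}, {0.1, 0.2, 1.5}};

        // Same shape, but the second dimension is all zeros, as when missing values default to 0.0.
        double[][] emptyDimension = {{2.0, 0.0, 0.1}, {0.0, 0.0, 0.0}, {0.1, 0.0, 1.5}};

        System.out.println(isSingular(full));           // false
        System.out.println(isSingular(emptyDimension)); // true
    }
}
```

Whether a dense vectorizer avoids this in practice depends on how it fills missing fields, so the sketch only shows the failure mode described in the issue, not the proposed fix.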