Great suggestion! Are you ready to open a PR for any of the suggested ideas?

Tue, 31 Mar 2020, 20:26 Glenn Wiebe (Jira) <j...@apache.org>:

> Glenn Wiebe created IGNITE-12849:
> ------------------------------------
>
>              Summary: Add New BinaryObject Vectorizer for SparseVectors
> and Integer Coordinates
>                  Key: IGNITE-12849
>                  URL: https://issues.apache.org/jira/browse/IGNITE-12849
>              Project: Ignite
>           Issue Type: New Feature
>           Components: ml
>     Affects Versions: 2.8
>             Reporter: Glenn Wiebe
>              Fix For: 2.8
>
>
> A. DenseVector-based BinaryObjectVectorizer
> When using existing caches as a source of Datasets, the
> BinaryObjectVectorizer is used.
> The existing BinaryObjectVectorizer only supports the creation of a
> SparseVector.
> The LUDecomposition utility that supports Gaussian factorization for
> models like GMM has a "Singularity indicator". With a SparseVector, null
> handling sets a matrix column calculation to zero (0.0), which is below
> the minimum check value (1e-11) and thus indicates that the matrix is not
> square.
>
> This null handling of the SparseMatrix will restrict the use of some
> algorithms like Gaussian Mixture Models where any Vector dimension that is
> null will incorrectly signal that a matrix is not square.
>
> It would be great if we could:
> - Have a BinaryObjectVectorizer that uses a DenseMatrix to eliminate this
> singularity trigger and enable use of GMM Trainer.
>
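
On item A, a minimal sketch of the effect described above, written in plain Java and not taken from Ignite's LUDecomposition. It assumes a square matrix and uses Gaussian elimination with partial pivoting; the class name, method name, and sample matrices are made up for illustration, and only the 1e-11 threshold comes from the issue text. A dimension that is missing in every row ends up as an all-zero column, so the best available pivot for that column falls below the threshold.

```java
/** Illustrative only: a stand-in for a singularity check, not Ignite's LUDecomposition. */
class SingularityDemo {
    /** Minimum pivot magnitude, as quoted in the issue text. */
    static final double MIN_PIVOT = 1e-11;

    /**
     * Runs Gaussian elimination with partial pivoting on a copy of a square matrix.
     * Returns true as soon as the largest available pivot in a column is below eps.
     */
    static boolean isSingular(double[][] input, double eps) {
        int n = input.length;
        double[][] a = new double[n][];
        for (int i = 0; i < n; i++)
            a[i] = input[i].clone();

        for (int col = 0; col < n; col++) {
            // Pick the row with the largest absolute value in this column.
            int pivotRow = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(a[r][col]) > Math.abs(a[pivotRow][col]))
                    pivotRow = r;

            if (Math.abs(a[pivotRow][col]) < eps)
                return true;

            double[] tmp = a[col];
            a[col] = a[pivotRow];
            a[pivotRow] = tmp;

            // Eliminate the entries below the pivot.
            for (int r = col + 1; r < n; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c < n; c++)
                    a[r][c] -= factor * a[col][c];
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[][] full = {{2, 1, 0}, {1, 3, 1}, {0, 1, 4}};
        // Same shape, but the second dimension is absent (zero) in every row.
        double[][] zeroColumn = {{2, 0, 0}, {1, 0, 1}, {0, 0, 4}};

        System.out.println("full matrix singular: " + isSingular(full, MIN_PIVOT));
        System.out.println("zero-column matrix singular: " + isSingular(zeroColumn, MIN_PIVOT));
    }
}
```

The check cannot tell a real 0.0 from a missing value, so the singularity signal appears whichever vector type produced the zero.
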
> B. CacheBasedDatasets not treated as Temporary Cache
> When using a cache-based dataset, the close() method destroys the Ignite
> cache. This means that there is no ability to re-use the data loaded into
> this dataset.
>
> It would be great if we could:
> - Not destroy the Ignite Cache holding the dataset on close (of one step
> in an ML processing flow)
> - Allow for "attaching" to this prior, pre-calculated dataset in
> subsequent use.
>
> C. Vector Visibility
> Vectors (unlike other value types, e.g. BinaryObjects) are not visible in
> standard mechanisms, like the Ignite Web Console, where the toString()
> method does not present any information about the embedded vector values.
>
> It would be great if we could:
> - have a Vector.toString() method implementation that presented some
> information about what is actually in the Vector.
>
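
On item C, a rough sketch of what a more informative toString() could look like. PreviewVector is a hypothetical stand-alone class, not Ignite's Vector interface, and the output format and the limit of five printed values are assumptions chosen for illustration.

```java
import java.util.StringJoiner;

/** Hypothetical stand-in for a vector type; not part of Ignite. */
class PreviewVector {
    /** Maximum number of values printed before the output is truncated (assumed limit). */
    static final int MAX_SHOWN = 5;

    private final double[] values;

    PreviewVector(double... values) {
        this.values = values.clone();
    }

    /** Prints the size and the first few values so the content is visible in tools and logs. */
    @Override
    public String toString() {
        StringJoiner joiner = new StringJoiner(
            ", ", "PreviewVector[size=" + values.length + ", values=[", "]]");

        int shown = Math.min(values.length, MAX_SHOWN);
        for (int i = 0; i < shown; i++)
            joiner.add(Double.toString(values[i]));

        // Mark truncation instead of dumping very long vectors.
        if (values.length > shown)
            joiner.add("...");

        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(new PreviewVector(1.0, 2.5));
        System.out.println(new PreviewVector(1, 2, 3, 4, 5, 6, 7));
    }
}
```

Truncating the output keeps UI tools such as the Web Console readable for high-dimensional vectors.
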
> I have implemented the above items and have used them at a customer where
> I needed these capabilities (or at least it dramatically reduced the cost
> and increased the value of the solution).
>
> It would be great if the community was supportive of this
> expansion/improvement of the Ignite ML library.
>
> Thanks,
>   Glenn
>
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>
