In particular, I'd like to add a toString() method for vectors. What information should it show? What do you think?
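To make the question concrete, here is a minimal sketch of the kind of summary a vector toString() could print: the size, the number of non-zero entries, and a truncated preview of the values. It is only an illustration under stated assumptions: VectorPreview and describe are hypothetical names, nothing here is taken from the Ignite ML API, and it works on a plain double[] rather than on an Ignite Vector.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of Ignite ML: illustrates what a
// Vector.toString() implementation might report.
public final class VectorPreview {
    private VectorPreview() {}

    /** Builds a short, human-readable summary of a vector's contents. */
    public static String describe(double[] values, int maxShown) {
        // Count non-zero entries, which hints at how sparse the vector is.
        int nonZero = 0;
        for (double v : values) {
            if (v != 0.0) {
                nonZero++;
            }
        }

        // Show at most maxShown leading values so large vectors stay readable.
        int limit = Math.min(values.length, Math.max(maxShown, 0));
        StringJoiner shown = new StringJoiner(", ", "[", "]");
        for (int i = 0; i < limit; i++) {
            shown.add(Double.toString(values[i]));
        }
        if (values.length > limit) {
            shown.add("..."); // marks that the preview was truncated
        }

        return "Vector[size=" + values.length
            + ", nonZero=" + nonZero
            + ", values=" + shown + "]";
    }
}
```

Capping the preview keeps the output usable in tools that render many rows at once; whether the cap should be configurable is an open design choice.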

Tue, Mar 31, 2020, 20:41 Alexey Zinoviev <zaleslaw....@gmail.com>:

> Great suggestion! Are you ready to make a PR for any of the suggested ideas?
>
> Tue, Mar 31, 2020, 20:26 Glenn Wiebe (Jira) <j...@apache.org>:
>
>> Glenn Wiebe created IGNITE-12849:
>> ------------------------------------
>>
>>              Summary: Add New BinaryObject Vectorizer for SparseVectors
>>                       and Integer Coordinates
>>                  Key: IGNITE-12849
>>                  URL: https://issues.apache.org/jira/browse/IGNITE-12849
>>              Project: Ignite
>>           Issue Type: New Feature
>>           Components: ml
>>     Affects Versions: 2.8
>>             Reporter: Glenn Wiebe
>>              Fix For: 2.8
>>
>>
>> A. DenseVector-based BinaryObjectVectorizer
>> When using existing caches as a source of Datasets, the
>> BinaryObjectVectorizer is used.
>> The existing BinaryObjectVectorizer only supports the creation of a
>> SparseVector.
>> The LUDecomposition utility that supports Gaussian factorization for
>> models like GMM has a "Singularity indicator" for which a SparseVector and
>> its null handling will set a matrix column calculation to be zero/0.0, which
>> is below the minimum check value (1e-11) and thus indicates a matrix is not
>> square.
>>
>> This null handling of the SparseMatrix will restrict the use of some
>> algorithms like Gaussian Mixture Models, where any Vector dimension that is
>> null will incorrectly signal that a matrix is not square.
>>
>> It would be great if we could:
>> - Have a BinaryObjectVectorizer that uses a DenseMatrix to eliminate this
>> singularity trigger and enable use of GMM Trainer.
>>
>> B. CacheBasedDatasets not treated as Temporary Cache
>> When using a cache-based dataset, the close() method destroys the Ignite
>> cache. This means that there is no ability to re-use the data loaded into
>> this dataset.
>>
>> It would be great if we could:
>> - Not destroy the Ignite Cache holding the dataset on close (of one step
>> in an ML processing flow)
>> - Allow for "attaching" to this prior, pre-calculated dataset in
>> subsequent use.
>>
>> C. Vector Visibility
>> Vectors (unlike other value types, e.g. BinaryObjects) are not visible in
>> standard mechanisms, like the Ignite Web Console, where the toString()
>> method does not present any information about the embedded vector values.
>>
>> It would be great if we could:
>> - Have a Vector.toString() method implementation that presents some
>> information about what is actually in the Vector.
>>
>> I have implemented the above items and have used them at a customer where
>> I needed these capabilities (or at least it dramatically reduced the cost
>> and increased the value of the solution).
>>
>> It would be great if the community was supportive of this
>> expansion/improvement of the Ignite ML library.
>>
>> Thanks,
>> Glenn
>>
>> --
>> This message was sent by Atlassian Jira
>> (v8.3.4#803005)