In particular, I'd like to add a toString() method for vectors. What
information should it include? What do you think?
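
As a starting point for discussion, here is a minimal sketch of what such a
summary might report: the size, the count of non-zero entries, and a bounded
preview of the leading values. VectorSummary and describe() are hypothetical
names for illustration, not part of the Ignite API, and the sketch works on a
plain double[] rather than on Ignite's Vector interface.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Apache Ignite. */
final class VectorSummary {
    private VectorSummary() {}

    /**
     * Renders a short, bounded description of a vector: its size, the number
     * of non-zero entries, and at most maxShown leading values.
     */
    static String describe(double[] values, int maxShown) {
        int nonZero = 0;
        for (double v : values) {
            if (v != 0.0) nonZero++;
        }

        StringBuilder sb = new StringBuilder();
        sb.append("Vector[size=").append(values.length)
          .append(", nonZero=").append(nonZero)
          .append(", values=[");

        // Cap the preview so a very long vector still prints on one line.
        int shown = Math.max(0, Math.min(values.length, maxShown));
        for (int i = 0; i < shown; i++) {
            if (i > 0) sb.append(", ");
            sb.append(String.format(Locale.ROOT, "%.4f", values[i]));
        }
        if (values.length > shown) {
            if (shown > 0) sb.append(", ");
            sb.append("...");
        }
        return sb.append("]]").toString();
    }
}
```

Capping the preview keeps the output readable in a console for wide vectors,
while the non-zero count hints at sparsity without printing every entry.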

Tue, 31 Mar 2020, 20:41 Alexey Zinoviev <zaleslaw....@gmail.com>:

> Great suggestion! Are you ready to make a PR for any of the suggested ideas?
>
>
> Tue, 31 Mar 2020, 20:26 Glenn Wiebe (Jira) <j...@apache.org>:
>
>> Glenn Wiebe created IGNITE-12849:
>> ------------------------------------
>>
>>              Summary: Add New BinaryObject Vectorizer for SparseVectors
>> and Integer Coordinates
>>                  Key: IGNITE-12849
>>                  URL: https://issues.apache.org/jira/browse/IGNITE-12849
>>              Project: Ignite
>>           Issue Type: New Feature
>>           Components: ml
>>     Affects Versions: 2.8
>>             Reporter: Glenn Wiebe
>>              Fix For: 2.8
>>
>>
>> A. DenseVector-based BinaryObjectVectorizer
>> When using existing caches as a source of Datasets, the
>> BinaryObjectVectorizer is used.
>> The existing BinaryObjectVectorizer only supports the creation of a
>> SparseVector.
>> The LUDecomposition utility that supports Gaussian factorization for
>> models like GMM has a "Singularity indicator". With a SparseVector, null
>> handling sets a matrix column calculation to zero (0.0), which is below
>> the minimum check value (1e-11) and thus indicates that a matrix is not
>> square.
>>
>> This null handling of the SparseMatrix will restrict the use of some
>> algorithms like Gaussian Mixture Models where any Vector dimension that is
>> null will incorrectly signal that a matrix is not square.
>>
>> It would be great if we could:
>> - Have a BinaryObjectVectorizer that uses a DenseMatrix to eliminate this
>> singularity trigger and enable use of GMM Trainer.
>>
>> B. CacheBasedDatasets not treated as Temporary Cache
>> When using a cache-based dataset, the close() method destroys the Ignite
>> cache. This means that there is no ability to re-use the data loaded into
>> this dataset.
>>
>> It would be great if we could:
>> - Not destroy the Ignite Cache holding the dataset on close (of one step
>> in an ML processing flow)
>> - Allow for "attaching" to this prior, pre-calculated dataset in
>> subsequent use.
>>
>> C. Vector Visibility
>> Vectors (unlike other value types, e.g. BinaryObjects) are not visible in
>> standard mechanisms, like the Ignite Web Console, where the toString()
>> method does not present any information about the embedded vector values.
>>
>> It would be great if we could:
>> - have a Vector.toString() method implementation that presented some
>> information about what is actually in the Vector.
>>
>> I have implemented the above items and have used them at a customer where
>> I needed these capabilities (or at least it dramatically reduced the cost
>> and increased the value of the solution).
>>
>> It would be great if the community was supportive of this
>> expansion/improvement of the Ignite ML library.
>>
>> Thanks,
>>   Glenn
>>
>>
>>
>>
>> --
>> This message was sent by Atlassian Jira
>> (v8.3.4#803005)
>>
>
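
On item A in the quoted issue, the effect of an all-zero column can be shown
with a generic LU elimination that flags any pivot below a threshold. This is
a sketch under assumptions, not Ignite's LUDecomposition code: the class and
method names are hypothetical, the input is assumed to be square, and using
1e-11 as the pivot cutoff simply follows the value quoted in the issue.

```java
/** Hypothetical, generic pivot check for illustration; not Ignite code. */
final class SingularityCheck {
    /** Cutoff taken from the value quoted in the issue (an assumption here). */
    static final double TOO_SMALL = 1e-11;

    private SingularityCheck() {}

    /** Returns true if Gaussian elimination on a square matrix hits a tiny pivot. */
    static boolean isSingular(double[][] a) {
        int n = a.length;
        double[][] lu = new double[n][];
        for (int i = 0; i < n; i++) lu[i] = a[i].clone(); // keep the input intact

        for (int col = 0; col < n; col++) {
            // Partial pivoting: choose the largest entry at or below the diagonal.
            int pivot = col;
            for (int row = col + 1; row < n; row++) {
                if (Math.abs(lu[row][col]) > Math.abs(lu[pivot][col])) pivot = row;
            }

            // A column of zeros (e.g. missing values read as 0.0) lands here.
            if (Math.abs(lu[pivot][col]) < TOO_SMALL) return true;

            double[] tmp = lu[pivot];
            lu[pivot] = lu[col];
            lu[col] = tmp;

            // Eliminate the entries below the pivot.
            for (int row = col + 1; row < n; row++) {
                double factor = lu[row][col] / lu[col][col];
                for (int k = col; k < n; k++) lu[row][k] -= factor * lu[col][k];
            }
        }
        return false;
    }
}
```

For example, a 3x3 matrix whose middle column is entirely 0.0 is reported as
singular, while the same check passes for a matrix with independent columns.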
