[
https://issues.apache.org/jira/browse/SPARK-17950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
AbderRahman Sobh updated SPARK-17950:
-------------------------------------
Description:
What changes were proposed in this pull request?
Added a __getattr__ method to SparseVector, like the one DenseVector already
has, except that it delegates to a SciPy sparse representation instead of
keeping a dense array in self.array at all times.
This allows functions to be applied to the values of an entire SparseVector in
the same direct way that users interact with DenseVectors.
For example, you can call SparseVector.mean() to average the values in the
entire vector.
Note: Behavior differs slightly from DenseVector because the calls go to SciPy
rather than NumPy. However, most of the commonly used functions (sum, mean,
max, etc.) are available in both packages.
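The delegation idea described above can be sketched roughly as follows. This
is not the code from the patch: ToySparseVector and its _as_scipy helper are
hypothetical names made up for illustration, and the sketch assumes NumPy and
SciPy are installed. It only shows how __getattr__ can forward unknown
attribute lookups to a SciPy sparse matrix built on demand.

```python
import numpy as np
from scipy.sparse import csr_matrix


class ToySparseVector:
    """Hypothetical stand-in for a sparse vector; not pyspark's SparseVector."""

    _FIELDS = ("size", "indices", "values")

    def __init__(self, size, indices, values):
        self.size = int(size)
        self.indices = np.asarray(indices, dtype=np.int32)
        self.values = np.asarray(values, dtype=np.float64)

    def _as_scipy(self):
        # Build a 1 x size CSR matrix on demand; nothing dense is stored.
        rows = np.zeros(len(self.indices), dtype=np.int32)
        return csr_matrix((self.values, (rows, self.indices)),
                          shape=(1, self.size))

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup has failed.
        # Refuse private names and our own fields to avoid infinite recursion
        # (for example while an instance is being copied or unpickled).
        if name.startswith("_") or name in self._FIELDS:
            raise AttributeError(name)
        # Forward everything else (sum, mean, max, ...) to the SciPy matrix.
        return getattr(self._as_scipy(), name)


if __name__ == "__main__":
    v = ToySparseVector(4, [1, 3], [2.0, 6.0])
    # Implicit zeros count toward the mean, as they would for a dense vector.
    print(v.sum(), v.mean(), v.max())
```

Because the lookup is forwarded to SciPy, only methods that SciPy sparse
matrices provide are reachable this way, which is the source of the slight
differences from DenseVector mentioned in the note.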
How was this patch tested?
Manual testing on local machine.
Passed ./python/run-tests
No UI changes.
was:
Simply added the `__getattr__` to SparseVector that DenseVector has, but calls
self.toArray() instead of storing a vector all the time in self.array
This allows for use of numpy functions on the values of a SparseVector in the
same direct way that users interact with DenseVectors.
i.e. you can simply call SparseVector.mean() to average the values in the
entire vector.
Component/s: ML
> Match SparseVector behavior with DenseVector
> --------------------------------------------
>
> Key: SPARK-17950
> URL: https://issues.apache.org/jira/browse/SPARK-17950
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib, PySpark
> Affects Versions: 2.0.1
> Reporter: AbderRahman Sobh
> Priority: Minor
> Original Estimate: 0h
> Remaining Estimate: 0h