What data type would I use for a pyarrow SparseCSRMatrix in a schema? I need to build a table with rows which include a field of this type. I don't see a related example in the test module. I'm doing something like:
schema = pyarrow.schema(fields, metadata=metadata)
table = pyarrow.Table.from_arrays(table_data, schema=schema)
where fields is a list of tuples of the form (field_name, pyarrow_type), e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a SparseCSRMatrix field? Or will this not work?
Thanks,
David
On 7/1/2022 9:18 AM, Rok Mihevc wrote:
We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are perhaps the most extensive description of what is doable: https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
Rok
On Fri, Jul 1, 2022 at 5:38 PM dl via user <[email protected]> wrote:
So, I guess this is supported in 8.0.0. I can do this:
import numpy as np
import pyarrow as pa
from scipy.sparse import csr_matrix

a = np.random.rand(100)
a[a < .9] = 0.0
s = csr_matrix(a)
arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)

Now, how do I use that to build a pyarrow table? Stay tuned...
On 7/1/2022 8:19 AM, dl wrote:
I find pyarrow.SparseCSRMatrix mentioned here. But how do I use that? Is there documentation for that class?
On 7/1/2022 7:47 AM, dl wrote:
Hi,
I'm trying to understand support for sparse tensors in Arrow. It looks like there is "experimental" support in the C++ API. When was this introduced? I see Cython sparse array classes in the code base here. Can these be accessed using the Python API? Are they included in the 8.0.0 release? Is there any other support for sparse arrays/tensors in the Python API? Are there good examples for any of this, in particular for using the 8.0.0 Python API to create sparse tensors?
Thanks,
David
