Re: Using arrow for sparse data

2020-08-21 Thread Rok Mihevc
Hi Niranda, There are some examples in the tests: https://github.com/apache/arrow/blob/master/cpp/src/arrow/sparse_tensor_test.cc#L187 and https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py If you have more questions, just ask. Questions are good input for documentation

Re: [C++] Computation functions in Apache Arrow

2021-07-25 Thread Rok Mihevc
On Thu, Jul 22, 2021 at 6:54 PM Weston Pace wrote: > > Does arrow support matrix operations? > > [...] > > On the other hand, there has been some interest in the past in > representing tensors as a logical data type in Arrow. A rank 2 tensor > is either the same as a matrix or very similar to a m

Re: [Question][Python] Columns with Limited Value Set

2022-01-05 Thread Rok Mihevc
Hey Sam, Did you consider DictionaryArray? (https://arrow.apache.org/docs/python/data.html#dictionary-arrays) Its to_pandas will return a pd.Categorical. Rok On Wed, Jan 5, 2022 at 3:35 PM Sam Davis wrote: > > Hi, > > I'm looking at defining a schema for a table where one of the values is > inh

Re: [Question][Python] Columns with Limited Value Set

2022-01-05 Thread Rok Mihevc
How big are your dictionaries typically? What are your upper and lower bounds? On Wed, Jan 5, 2022 at 10:22 PM David Li wrote: > > Ah, thank you for the clarification. Indeed, Arrow dictionaries don't make > the dictionary part of the schema itself (and the format even allows for > dictionaries

Re: [c++] Tensor features

2022-05-31 Thread Rok Mihevc
Hi Fabian, I'm not aware of any plans to add tensor compute functions at the moment. There was recently a discussion [1] that boiled down to: try UDFs if you want to stay in Arrow or do the compute in numpy/pytorch/tensorflow/... - moving is zero-copy but of course adds additional dependency. [1]

Re: support for sparse tensors

2022-07-01 Thread Rok Mihevc
We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are perhaps the most extensive description of what is doable: https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py Rok On Fri, Jul 1, 2022 at 5:38 PM dl via user wrote: > So, I guess this is support

Re: [Python] iloc equivalent for selection by position and setting values?

2022-07-04 Thread Rok Mihevc
I believe currently updating array values is not possible by design. Using the approach Michael pointed out you can create a new array to replace the old one. See this discussion [1] for more nuance. Rok [1] https://lists.apache.org/thread/kph2sk0nqc0yfcb39dmjmh3ljg4dpyfx On Mon, Jul 4, 2022 at

Re: support for sparse tensors

2022-07-06 Thread Rok Mihevc
rrow.schema(fields, metadata=metadata) > table = pyarrow.Table.from_arrays(table_data, schema=schema) > > where fields is a list of tuples of the form (field_name, pyarrow_type), > e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a > SparseCSRMatrix field? Or will this no

Re: support for sparse tensors

2022-07-06 Thread Rok Mihevc
nstead of the custom > three field representation. Is that possible? Incidentally, the shape of > the csr_matrix is typically (1,N) where N may vary for different records. > But I don't think "typically (1,N)" matters. It would work with variable > shape (M,N). The shape field ha

Re: support for sparse tensors

2022-07-06 Thread Rok Mihevc
I don't think "typically (1,N)" matters. It would work with variable > shape (M,N). The shape field has type pyarrow.List with value_type = > pyarrow.int32(). > > > On 7/6/2022 2:53 PM, Rok Mihevc wrote: > > Hey David, > > I don't think Table is designed in

Re: support for sparse tensors

2022-07-06 Thread Rok Mihevc
om > three field representation. Is that possible? Incidentally, the shape of > the csr_matrix is typically (1,N) where N may vary for different records. > But I don't think "typically (1,N)" matters. It would work with variable > shape (M,N). The shape field has type

Re: support for sparse tensors

2022-07-07 Thread Rok Mihevc
> Thanks. That helps. > > Can SparseCSRMatrix be used the way I'm trying to use it, as a field value > in a table? I think that would need a DataType associated with it to give > the field. > > On 7/6/2022 6:25 PM, Rok Mihevc wrote: > > arrow_sparse_csr_matrix.to

Re: ExtensionArray Examples

2022-07-08 Thread Rok Mihevc
Hey Michael, https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_extension_type.py might have the material you need. Rok On Fri, Jul 8, 2022 at 10:23 PM Michael wrote: > I'm trying to create some ExtensionArrays in pandas and pyarrow but having > trouble figuring out the rela

Re: support for sparse tensors

2022-07-13 Thread Rok Mihevc
indices) and building the pyarrow table using a schema > with the types of these fields and table data with a separate list for each > field (and each list having one entry per input record). I was hoping I > could use a single pyarrow.SparseCSRMatrix field instead of the custom > th

Re: [C++] Working with dates before the epoch

2024-04-08 Thread Rok Mihevc
Here's an example of how Arrow uses date.h to get day/month/year from epoch time [1]. [1] https://github.com/apache/arrow/blob/main/cpp/src/arrow/compute/kernels/scalar_temporal_unary.cc#L261-L269 Rok On Mon, Apr 8, 2024 at 1:54 PM David Li wrote: > The C++ library vendors a backport of C++20'

Re: [DISCUSS] Apache Arrow Meetup in Europe

2025-03-06 Thread Rok Mihevc
+1 would attend and help with organisation. On Thu, Mar 6, 2025 at 5:57 PM Alenka Frim wrote: > +1 from me too, great idea - would definitely like to attend and help > with organisation! > > On Thu, Mar 6, 2025 at 17:31, Raúl Cumplido wrote: > > > +1, sounds like a great idea. I

Re: [QUESTION][Parquet][Encryption] Checksum Flow for Parquet Modular Encryption

2025-02-27 Thread Rok Mihevc
Yet another good resource would be parquet encryption docs [1]. Search for "integrity" to see how AES-GCM is used to ensure it. [1] https://parquet.apache.org/docs/file-format/data-pages/encryption/ Rok On Thu, Feb 27, 2025 at 8:22 PM Felipe Oliveira Carvalho < felipe...@gmail.com> wrote: > Fur