Your problem is probably old hardware, specifically an older CPU. The pip
builds rely on the POPCNT instruction (which I think arrived alongside SSE4.2?).
I'm pretty sure you are right that you can compile from source and be OK.
It's a performance / portability tradeoff that has to be made when
packaging prebuilt binaries.
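To check whether an older CPU is the culprit, one option on x86 Linux is to look for the relevant feature flags in `/proc/cpuinfo`. A minimal sketch, assuming x86 Linux; the helper names are hypothetical, and treating `popcnt` and `sse4_2` as the flags the wheels need is an assumption based on the guess above, not a documented requirement.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from /proc/cpuinfo-style text (first CPU only)."""
    for line in cpuinfo_text.splitlines():
        # On x86 Linux the features sit on lines like "flags : fpu ... popcnt ..."
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, required=("popcnt", "sse4_2")):
    """List the assumed-required flags that this CPU does not advertise."""
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("missing:", missing_flags(f.read()) or "none")
    except OSError:
        print("/proc/cpuinfo not available (not Linux?)")
```

An empty result does not prove the binary is compatible; it only rules out these two flags.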
On Mon,
Hello,
I was taking a look at pyarrow in my off hours. I was trying to write
a partitioned dataset based on the birthdays example in the pyarrow
cookbook. However, when I run the script, no data is written and an "Illegal
Instruction" message is printed to the screen; no exception is raised. I inst
Thank you very much for the helpful response, Alenka. It makes the partitioning
system, and how I should be interacting with it, much clearer. I’m
in the process of re-processing my dataset to use integers for the date
partitioning while still using strings for the site identifiers. I do
> You are looking for a row-wise mean, right? I don't think there's an API
> for that in pyarrow.compute.
Right, I don't think this is in there today either. The C++ compute
infrastructure itself can create functions that run on record batches
(instead of just arrays). An example of this is dro
Hi Antonio,
Sorry, I think I misunderstood your question. You are looking for a row-wise
mean, right? I don't think there's an API for that in pyarrow.compute.
My bad.
You could call `add` for each column and compute the mean manually (this
would be a vectorized operation column-wise). But this
Hi Niranda,
On Mon, Jan 24, 2022 at 2:41 PM Niranda Perera
wrote:
> Did you try using `pyarrow.compute` options? Inside that batch iterator
> loop you can call the compute mean function and then call the add_column
> method for record batches.
>
I cannot find how to pass multiple columns to be
Hi Antonio,
Did you try using `pyarrow.compute` options? Inside that batch iterator
loop you can call the compute mean function and then call the add_column
method for record batches.
The latest Arrow code base might have support for 'projection', which
could do this without having to iterate th
Hi list,
I am looking for a way to add a new column to an existing table that is
computed as the sum/mean of other columns. From the docs, I understand
that pyarrow compute functions operate on arrays (i.e. columns), but I
cannot find whether it is possible to aggregate across columns in some way.
In