Never mind, I realized I can use pyarrow.compute.invert. Thank you again
for the super fast answer.
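For reference, a minimal sketch of the pattern (the mask below just stands in
for whatever boolean ChunkedArray the filtering step actually produces):

    import pyarrow as pa
    import pyarrow.compute as pc

    # A boolean ChunkedArray, e.g. the output of pc.is_in on a uuid column
    mask = pa.chunked_array([[True, False], [False, True]])

    # pc.invert negates the mask without a round-trip through NumPy
    keep = pc.invert(mask)
    print(keep.to_pylist())  # [False, True, True, False]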
> On 4 Nov 2020, at 15:13, Niklas B wrote:
>
> Thank you! This looks awesome. Any good way to invert the ChunkedArray? I
> know I can cast to NumPy (experimental) and do it there
> https://issues.apache.org/jira/browse/ARROW-9164, and
> we should maybe also try to inject the option keywords in the function
> docstring.
>
> Best,
> Joris
>
> On Wed, 4 Nov 2020 at 14:14, Niklas B wrote:
>
>> Hi,
>>
>> I’m trying in Python to (without reading e
Hi,
I’m trying in Python to filter out certain rows (based on uuid-strings) without
reading the entire parquet file into memory. My approach is to read each row
group, then try to filter it without casting it to pandas (since that’s
expensive for data frames with lots of strings in them). Looking in
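In rough code, the approach I have in mind is something like the sketch below
(the file names, the `uuid` column name and the list of ids to drop are
placeholders, and it assumes a pyarrow recent enough to have Table.filter):

    import pyarrow as pa
    import pyarrow.compute as pc
    import pyarrow.parquet as pq

    ids_to_drop = pa.array(["some-uuid", "another-uuid"])  # placeholder values

    pf = pq.ParquetFile("input.parquet")
    writer = None
    for i in range(pf.num_row_groups):
        # Read one row group at a time so the whole file never sits in memory
        table = pf.read_row_group(i)
        # Build the boolean mask on the Arrow side, without converting to pandas
        mask = pc.invert(pc.is_in(table.column("uuid"), value_set=ids_to_drop))
        filtered = table.filter(mask)
        if writer is None:
            writer = pq.ParquetWriter("output.parquet", filtered.schema)
        writer.write_table(filtered)
    if writer is not None:
        writer.close()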
Hi,
I’ve been working (together with the PyPy team) on getting Arrow to build on
PyPy3. I’m not looking for full feature coverage, but specifically for getting
it to work with pandas read_parquet/to_parquet, which it now does. There were a
few roadblocks, solved by the awesome Matti Picus on the PyPy side.
>   1  2  1
>   2  4  1
>
> So it's still more manual work than just specifying a DNF filter, but normally
> all the necessary building blocks are available (the goal is certainly to use
> those building blocks in a more general query engine that works for both
> in-memory tables as well as files on disk).
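To make the building blocks concrete, a small sketch (the column names `a` and
`b` are made up; it assumes a pyarrow version with Table.filter):

    import pyarrow as pa
    import pyarrow.compute as pc

    table = pa.table({"a": [1, 2, 3, 4], "b": ["x", "y", "x", "z"]})

    # Combine per-column predicates with and_/or_, then filter the table
    mask = pc.and_(pc.greater(table["a"], 1), pc.equal(table["b"], "x"))
    filtered = table.filter(mask)
    print(filtered.to_pydict())  # {'a': [3], 'b': ['x']}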
Hi,
I have an in-memory dataset from Plasma that I need to filter before running
`to_pandas()`. It’s a very text-heavy dataset with a lot of rows and columns
(only about 30% of which are relevant to any given operation). Now I know that
you can use DNF filters to filter a parquet file before reading it into memory;
is there an equivalent way to filter an in-memory table?
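One option that comes close to the DNF-filter experience is an expression-based
filter through pyarrow.dataset; this is only a sketch, and it assumes a newer
pyarrow where ds.dataset() can wrap an in-memory Table:

    import pyarrow as pa
    import pyarrow.dataset as ds

    # Placeholder table standing in for the Table reconstructed from Plasma
    table = pa.table({"uuid": ["a", "b", "c"], "text": ["foo", "bar", "baz"]})

    # Wrap the in-memory table as a dataset, filter with an expression,
    # then convert only the surviving rows to pandas
    dataset = ds.dataset(table)
    filtered = dataset.to_table(filter=ds.field("uuid") == "b")
    df = filtered.to_pandas()
    print(df)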
We rely heavily on Plasma (we use Ray as well, but also Plasma independent of
Ray). I’ve started a thread on the Ray dev list to see if Ray's Plasma can be
used standalone outside of Ray as well. That would allow those of us who use
Plasma to move to a standalone “ray plasma” when/if it’s removed from Arrow.
> [3]
> https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/memory.pxi#L156
>
> On Tue, Sep 15, 2020 at 8:46 AM Niklas B wrote:
>
>>
First of all: thank you so much for all the hard work on Arrow; it’s an awesome
project.
Hi,
I'm trying to write a large parquet file to disk (larger than memory) using
PyArrow's ParquetWriter and write_table, but even though the file is written
incrementally to disk, it still appears to keep the data in memory.
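For context, the write loop is essentially this shape (the file name, schema
and the generated batches are placeholders):

    import pyarrow as pa
    import pyarrow.parquet as pq

    schema = pa.schema([("uuid", pa.string()), ("value", pa.int64())])

    # Each write_table call appends a row group to the file on disk,
    # so in principle only one batch needs to be materialised at a time
    with pq.ParquetWriter("big.parquet", schema) as writer:
        for i in range(1_000):  # placeholder: batches arrive incrementally
            batch = pa.table(
                {"uuid": [f"id-{i}-{j}" for j in range(10_000)],
                 "value": list(range(10_000))},
                schema=schema,
            )
            writer.write_table(batch)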