Hi Abe -- you may have to open a JIRA for a documentation improvement
and/or a bug fix for this. I don't know off-hand. Copying the dev@ list.

- Wes

On Tue, May 21, 2019 at 12:05 PM Abraham Elmahrek <a...@apache.org> wrote:
>
> Folks
>
> Does anyone know how to do the following with filters for ParquetDataset 
> (DNF): A ⋀ B ⋀ (C ⋁ D)?
>
> I've tried the following without luck:
>
>> dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
>>     ("col", ">=", "<>"),
>>     ("col", "<=", "<>"),
>>     [[("col", "=", "<>")], [("col", "=", "<>")]]
>> ])
>
>
> Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"), C = ("col", "=", 
> "<>"), and D = ("col", "=", "<>").
>
> In the above example, I get the following error:
>>
>>   File 
>> "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py",
>>  line 961, in __init__
>>     filters = _check_filters(filters)
>>   File 
>> "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py",
>>  line 93, in _check_filters
>>     for col, op, val in conjunction:
>> ValueError: not enough values to unpack (expected 3, got 2)
>
>
> Abe
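
For reference, one way this might be expressed is to distribute the
expression into full DNF first: the filters argument to pq.ParquetDataset
accepts either a flat list of 3-tuples (implicitly AND-ed together) or a
list of lists of 3-tuples (an OR of AND-groups), but not a mix of the two
nesting levels in one list, which is what _check_filters is rejecting in
the traceback above. A minimal sketch along those lines, keeping the "<>"
placeholders from the original message:

    import pyarrow.parquet as pq
    import s3fs

    # Placeholder predicates, mirroring the message above
    # ("<>" stands in for the real values):
    A = ("col", ">=", "<>")
    B = ("col", "<=", "<>")
    C = ("col", "=", "<>")
    D = ("col", "=", "<>")

    # A AND B AND (C OR D) distributed into DNF:
    # (A AND B AND C) OR (A AND B AND D)
    dataset = pq.ParquetDataset(
        "<>",                              # placeholder dataset path
        filesystem=s3fs.S3FileSystem(),
        filters=[
            [A, B, C],
            [A, B, D],
        ],
    )
    table = dataset.read()

Each inner list is AND-ed and the outer list OR-s the groups, so repeating
A and B in both groups is how the shared conjunction is expressed.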
