Hi Abe -- you may have to open a JIRA about a documentation improvement and/or bug fix for this. I don't know offhand. Copying the dev@ list.
- Wes

On Tue, May 21, 2019 at 12:05 PM Abraham Elmahrek <a...@apache.org> wrote:
>
> Folks,
>
> Does anyone know how to do the following with filters for ParquetDataset
> (DNF): A ⋀ B ⋀ (C ⋁ D)?
>
> I've tried the following without luck:
>
>> dataset = pq.ParquetDataset("<>", filesystem=s3fs.S3FileSystem(), filters=[
>>     ("col", ">=", "<>"),
>>     ("col", "<=", "<>"),
>>     [[("col", "=", "<>")], [("col", "=", "<>")]]
>> ])
>
> Where A = ("col", ">=", "<>"), B = ("col", "<=", "<>"),
> C = ("col", "=", "<>"), and D = ("col", "=", "<>").
>
> In the above example, I get the following error:
>
>> File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 961, in __init__
>>     filters = _check_filters(filters)
>> File "/opt/miniconda/envs/flatiron-cron/lib/python3.6/site-packages/pyarrow-0.13.0-py3.6-linux-x86_64.egg/pyarrow/parquet.py", line 93, in _check_filters
>>     for col, op, val in conjunction:
>> ValueError: not enough values to unpack (expected 3, got 2)
>
> Abe
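For the archives: the `filters` argument is expected in disjunctive normal form, i.e. a list of conjunctions that are OR'd together, where each conjunction is a list of (column, op, value) tuples. The unpack error above comes from mixing a nested list into what is parsed as a single conjunction. A minimal sketch of the DNF shape for A ⋀ B ⋀ (C ⋁ D), distributed into (A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D) -- the dataset path, column names, and values below are hypothetical stand-ins for the "<>" placeholders in the original message:

    import s3fs
    import pyarrow.parquet as pq

    # A ⋀ B ⋀ (C ⋁ D) rewritten in DNF as (A ⋀ B ⋀ C) ⋁ (A ⋀ B ⋀ D).
    # Outer list: OR'd conjunctions; inner lists: AND'd (column, op, value) tuples.
    # "col", "part_col", "lo", "hi", "c", "d" are placeholder names/values.
    filters = [
        [("col", ">=", "lo"), ("col", "<=", "hi"), ("part_col", "=", "c")],  # A ⋀ B ⋀ C
        [("col", ">=", "lo"), ("col", "<=", "hi"), ("part_col", "=", "d")],  # A ⋀ B ⋀ D
    ]

    dataset = pq.ParquetDataset(
        "my-bucket/path/to/dataset",          # hypothetical S3 path
        filesystem=s3fs.S3FileSystem(),
        filters=filters,
    )

Note that in pyarrow 0.13 these filters are applied against partition keys of a partitioned dataset, not against arbitrary column values in the row groups.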