> Gerlando,
> > >
> > > AFAIK Parquet does not yet support indexing. I believe it does store
> > > min/max values at the row group (or maybe it's the page) level, which
> > > may help eliminate large "swaths" of data depending on how the actual
> > > data values corresponding to a search predicate are distributed across
> > > large Parquet files.
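For illustration, a minimal sketch of how those row-group statistics can be
consulted from Python with pyarrow; the file name, column position, and
predicate bounds below are made up for the example:

    import pyarrow.parquet as pq

    pf = pq.ParquetFile("logs.parquet")   # hypothetical file name
    lo, hi = 1000, 2000                   # example predicate: 1000 <= ts <= 2000
    col = 0                               # assume the ts column is first

    matching = []
    for i in range(pf.num_row_groups):
        stats = pf.metadata.row_group(i).column(col).statistics
        # If the row group's [min, max] range cannot overlap the predicate,
        # skip it without reading any of its data pages.
        if stats is not None and stats.has_min_max \
                and (stats.max < lo or stats.min > hi):
            continue
        matching.append(pf.read_row_group(i))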
>
> I have an interest in the future of indexing within the native Parquet
> structure as
> > > but you can query the data as it
> > > arrives.
> > >
> > > Then, later, say once per day, you can consolidate the files into a few
> > > big files. The only trick is the race condition of doing the
> > > consolidation
> > > while running queries. Not su
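A minimal sketch of that daily consolidation step, assuming pyarrow and
hypothetical incoming/ and consolidated/ directories; writing the big file
before deleting the small ones softens the race, since a concurrent query
may see some rows twice but never lose them:

    import glob
    import os

    import pyarrow as pa
    import pyarrow.parquet as pq

    small_files = sorted(glob.glob("incoming/*.parquet"))  # hypothetical layout
    big = pa.concat_tables([pq.read_table(f) for f in small_files])

    # Write the consolidated file first, then remove the small inputs;
    # readers racing this loop may briefly see duplicate rows.
    pq.write_table(big, "consolidated/2018-05-18.parquet")
    for f in small_files:
        os.remove(f)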
Hi,
I'm looking for a way to store huge amounts of logging data in the cloud
from about 100 different data sources, each producing about 50MB/day (so
it's something like 5GB/day).
The target storage would be S3 object storage, for cost-efficiency
reasons.
I would like to be able to store (i.e. a
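One way such a layout is commonly sketched with pyarrow plus s3fs is to
write each batch as partitioned Parquet, one directory per source and day;
the bucket name, partition columns, and sample records below are
assumptions for the example, not anything prescribed by the thread:

    import pyarrow as pa
    import pyarrow.parquet as pq
    import s3fs

    fs = s3fs.S3FileSystem()

    # Hypothetical batch of records from one source.
    table = pa.table({
        "source":  ["sensor-01", "sensor-01"],
        "day":     ["2018-05-18", "2018-05-18"],
        "message": ["boot ok", "temp high"],
    })

    # Partitioning by source and day keeps each ~50MB daily chunk in its
    # own prefix and lets later readers prune whole partitions.
    pq.write_to_dataset(
        table,
        root_path="my-log-bucket/logs",   # hypothetical bucket/prefix
        partition_cols=["source", "day"],
        filesystem=fs,
    )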
Gerlando Falauto created ARROW-2616:
---------------------------------------

             Summary: Cross-compiling Pyarrow
                 Key: ARROW-2616
                 URL: https://issues.apache.org/jira/browse/ARROW-2616
             Project: Apache Arrow
          Issue Type: Bug