This is an interesting idea. For S3 multipart uploads one might run into
limitations pretty quickly (only 10k parts appear to be supported, and all but
the last are expected to be at least 5 MB, if I read their docs correctly [1]).
[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html
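As a rough back-of-the-envelope check (plain Python, not tied to any SDK, and
assuming the 10k-part / 5 MB figures above are right), the part count rather
than the 5 MB floor is what drives part size once objects pass ~50 GB:

    # Sketch: pick a part size that keeps an object of `total_bytes` within
    # the quoted limits: at most 10,000 parts, and at least 5 MiB for every
    # part except the last.
    MIN_PART_SIZE = 5 * 1024 * 1024   # 5 MiB floor for all but the last part
    MAX_PARTS = 10_000

    def choose_part_size(total_bytes: int) -> int:
        """Smallest part size that fits the whole object into MAX_PARTS parts."""
        needed = -(-total_bytes // MAX_PARTS)   # ceiling division
        return max(MIN_PART_SIZE, needed)

    # Beyond ~50 GB (10,000 * 5 MiB) the part count dominates: a 1 TiB object
    # already needs parts of roughly 105 MiB.
    print(round(choose_part_size(1 << 40) / (1024 * 1024), 1), "MiB per part")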
I'd suggest a new write pattern: write the columns page at a time to
separate files, then use a second process to concatenate the columns and
append the footer. Odds are you would do better than OS swapping and take
memory requirements down to page size times field count (see the sketch
below).
In S3 I believe you could
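A minimal sketch of that pattern (plain Python, hypothetical file names, and a
placeholder footer rather than anything Parquet-specific): each column appends
its pages to its own spill file, and a second pass streams those files into the
final output before the footer goes on, so only about one page per column is
resident at a time.

    import shutil
    from pathlib import Path

    def write_pages(column_files: list[Path], produce_page):
        """Append one page per column; memory stays ~ page_size * column count."""
        for col_idx, path in enumerate(column_files):
            with path.open("ab") as f:
                f.write(produce_page(col_idx))

    def assemble(output: Path, column_files: list[Path], footer: bytes):
        """Second pass: concatenate the column files, then append the footer."""
        with output.open("wb") as out:
            for path in column_files:
                with path.open("rb") as part:
                    shutil.copyfileobj(part, out)   # streamed copy, no full read
            out.write(footer)

    # Example wiring (hypothetical): two columns, two pages each, 16-byte pages.
    cols = [Path(f"col-{i}.pages") for i in range(2)]
    for _ in range(2):
        write_pages(cols, lambda i: bytes(16))
    assemble(Path("table.bin"), cols, footer=b"FOOTER")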
For reference, the doc (from eight years ago) I meant to link in my initial
message was:
https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
On Sat, Jul 11, 2020, 11:24 AM Wes McKinney wrote:
> On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau
> wrote:
On Sat, Jul 11, 2020 at 4:10 AM Rémi Dettai wrote:
>
> Hi Micah,
>
> Thanks for the answer! But it seems your email got split in half in some
> way ;-)
>
> My use case mainly focuses on aggregations (with group by), and after
> fighting quite a bit with the allocators I ended up thinking that it
On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau wrote:
>
> On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote:
>
> > I would also be interested in having a reusable serialized format for
> > filter- and projection-like expressions. I think trying to go so far
> > as full logical query plans suitab
On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote:
> I would also be interested in having a reusable serialized format for
> filter- and projection-like expressions. I think trying to go so far
> as full logical query plans suitable for building a SQL engine is
> perhaps a bit too far but we coul
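Purely as an illustration of the kind of reusable representation being
discussed (this is not an existing Arrow format, and the field and operator
names are made up), a filter-plus-projection expression could be serialized as
a small nested structure:

    import json

    # Illustrative only: a projection plus "fare_amount > 10 AND
    # passenger_count == 2", encoded as plain nested dicts.
    expr = {
        "projection": ["passenger_count", "fare_amount"],
        "filter": {
            "op": "and",
            "args": [
                {"op": ">",  "args": [{"field": "fare_amount"},     {"literal": 10}]},
                {"op": "==", "args": [{"field": "passenger_count"}, {"literal": 2}]},
            ],
        },
    }

    payload = json.dumps(expr)           # language-agnostic wire form
    assert json.loads(payload) == expr   # round-trips losslessly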
Hi Micah,
Yes, those files are read correctly. We test against them.
I was trying to generate gold files based on 0.17.1 so that I could debug
against those; I'll work on that in the coming days.
On Sat, 11 Jul 2020, 05:58 Micah Kornfield wrote:
> Hi Neville,
> Thanks for the update. One question
Arrow Build Report for Job nightly-2020-07-11-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-11-0
Failed Tasks:
- centos-8-aarch64:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-11-0-travis-centos-8-aarch64
- conda-linux
Hi Micah,
Thanks for the answer! But it seems your email got split in half in some
way ;-)
My use case mainly focuses on aggregations (with group by), and after
fighting quite a bit with the allocators I ended up thinking that it might
not be worth it materializing the raw data as arrow tables i
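Reading that point as "aggregate batch by batch instead of materializing
everything up front", a toy sketch (plain Python, hypothetical names) of a
streaming group-by would be:

    from collections import defaultdict

    def grouped_sum(batches):
        """Consume (key, value) batches one at a time; keep only running totals."""
        totals = defaultdict(int)
        for batch in batches:             # e.g. one row group / record batch at a time
            for key, value in batch:
                totals[key] += value
        return dict(totals)

    print(grouped_sum([[("a", 1), ("b", 2)], [("a", 3)]]))   # {'a': 4, 'b': 2}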