This is an interesting idea. For S3 multipart uploads one might run into
limitations pretty quickly (only 10k parts appear to be supported, and all but
the last are expected to be at least 5 MB, if I read their docs correctly [1]).
[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html
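As a rough back-of-the-envelope check (plain Python, not tied to any SDK, and
assuming the 10k-part / 5 MB figures above are right), the part count rather
than the 5 MB floor is what drives part size once objects pass ~50 GB:

    # Sketch: pick a part size that keeps an object of `total_bytes` within
    # the quoted limits: at most 10,000 parts, and at least 5 MiB for every
    # part except the last.
    MIN_PART_SIZE = 5 * 1024 * 1024   # 5 MiB floor for all but the last part
    MAX_PARTS = 10_000

    def choose_part_size(total_bytes: int) -> int:
        """Smallest part size that fits the whole object into MAX_PARTS parts."""
        needed = -(-total_bytes // MAX_PARTS)   # ceiling division
        return max(MIN_PART_SIZE, needed)

    # Beyond ~50 GB (10,000 * 5 MiB) the part count dominates: a 1 TiB object
    # already needs parts of roughly 105 MiB.
    print(round(choose_part_size(1 << 40) / (1024 * 1024), 1), "MiB per part")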
I'd suggest a new write pattern: write the columns page at a time to
separate files, then use a second process to concatenate the columns and
append the footer. Odds are you would do better than OS swapping and take
memory requirements down to page size times field count (see the sketch
below).
In S3 I believe you could
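A minimal sketch of that pattern (plain Python, hypothetical file names, and a
placeholder footer rather than anything Parquet-specific): each column appends
its pages to its own spill file, and a second pass streams those files into the
final output before the footer goes on, so only about one page per column is
resident at a time.

    import shutil
    from pathlib import Path

    def write_pages(column_files: list[Path], produce_page):
        """Append one page per column; memory stays ~ page_size * column count."""
        for col_idx, path in enumerate(column_files):
            with path.open("ab") as f:
                f.write(produce_page(col_idx))

    def assemble(output: Path, column_files: list[Path], footer: bytes):
        """Second pass: concatenate the column files, then append the footer."""
        with output.open("wb") as out:
            for path in column_files:
                with path.open("rb") as part:
                    shutil.copyfileobj(part, out)   # streamed copy, no full read
            out.write(footer)

    # Example wiring (hypothetical): two columns, two pages each, 16-byte pages.
    cols = [Path(f"col-{i}.pages") for i in range(2)]
    for _ in range(2):
        write_pages(cols, lambda i: bytes(16))
    assemble(Path("table.bin"), cols, footer=b"FOOTER")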
For reference, the doc (from eight years ago) I meant to link in my initial
message was:
https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/edit
On Sat, Jul 11, 2020, 11:24 AM Wes McKinney wrote:
> On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau
> wrote:
On Sat, Jul 11, 2020 at 4:10 AM Rémi Dettai wrote:
>
> Hi Micah,
>
> Thanks for the answer! But it seems your email got split in half in some
> way ;-)
>
> My use case mainly focuses on aggregations (with group by), and after
> fighting quite a bit with the allocators I ended up thinking that it
On Sat, Jul 11, 2020 at 11:55 AM Jacques Nadeau wrote:
>
> On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote:
>
> > I would also be interested in having a reusable serialized format for
> > filter- and projection-like expressions. I think trying to go so far
> > as full logical query plans suitab
On Mon, Jul 6, 2020 at 2:45 PM Wes McKinney wrote:
> I would also be interested in having a reusable serialized format for
> filter- and projection-like expressions. I think trying to go so far
> as full logical query plans suitable for building a SQL engine is
> perhaps a bit too far but we coul
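Purely as an illustration of the kind of reusable representation being
discussed (this is not an existing Arrow format, and the field and operator
names are made up), a filter-plus-projection expression could be serialized as
a small nested structure:

    import json

    # Illustrative only: a projection plus "fare_amount > 10 AND
    # passenger_count == 2", encoded as plain nested dicts.
    expr = {
        "projection": ["passenger_count", "fare_amount"],
        "filter": {
            "op": "and",
            "args": [
                {"op": ">",  "args": [{"field": "fare_amount"},     {"literal": 10}]},
                {"op": "==", "args": [{"field": "passenger_count"}, {"literal": 2}]},
            ],
        },
    }

    payload = json.dumps(expr)           # language-agnostic wire form
    assert json.loads(payload) == expr   # round-trips losslessly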
Hi Micah,
Yes, those files are read correctly. We test against them.
I was trying to generate gold files based on 0.17.1 so that I could debug
against those; I'll work on that in the coming days.
On Sat, 11 Jul 2020, 05:58 Micah Kornfield wrote:
> Hi Neville,
> Thanks for the update. One question
Arrow Build Report for Job nightly-2020-07-11-0
All tasks:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-11-0
Failed Tasks:
- centos-8-aarch64:
URL:
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-07-11-0-travis-centos-8-aarch64
- conda-linux
Hi Micah,
Thanks for the answer! But it seems your email got split in half in some
way ;-)
My use case mainly focuses on aggregations (with group by), and after
fighting quite a bit with the allocators I ended up thinking that it might
not be worth it materializing the raw data as arrow tables i
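Reading that point as "aggregate batch by batch instead of materializing
everything up front", a toy sketch (plain Python, hypothetical names) of a
streaming group-by would be:

    from collections import defaultdict

    def grouped_sum(batches):
        """Consume (key, value) batches one at a time; keep only running totals."""
        totals = defaultdict(int)
        for batch in batches:             # e.g. one row group / record batch at a time
            for key, value in batch:
                totals[key] += value
        return dict(totals)

    print(grouped_sum([[("a", 1), ("b", 2)], [("a", 3)]]))   # {'a': 4, 'b': 2}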