Re: [DISCUSS] Release Python Datafusion 0.3.0

2021-07-21 Thread Andrew Lamb
I think it is a great idea to release the Python bindings. In terms of binary / source releases, one approach that could also work would be: 1. sign / vote on a source release of DataFusion as a whole; 2. build and push the binaries based on that approved source (much like the various Linux distributions

Re: [DISCUSS] Release Python Datafusion 0.3.0

2021-07-21 Thread Neal Richardson
Sounds good to me. I'd recommend that you document the release process, whenever it is agreed upon, on https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide (or document it somewhere and link to it there). Neal On Wed, Jul 21, 2021 at 8:35 AM Andrew Lamb wrote: > I think i

Arrow sync call July 21 at 12:00 US/Eastern, 16:00 UTC

2021-07-21 Thread Ian Cook
Hi all, Our biweekly sync call is today at 12:00 noon Eastern time. For today's call, let's please use this Google Meet URL (different from the usual one): https://meet.google.com/ebp-tczo-xjn All are welcome to join. Notes will be shared with the mailing list afterward. Thanks, Ian

Re: C++ parquet::TypedColumnReader::ReadBatchSpaced() replacement?

2021-07-21 Thread Adam Hooper
Hi Micah, Thank you for this wonderful description. You've solved my problem exactly. Responses inline: > "ReadBatchSpaced() in a loop is faster than reading an entire record > > batch." > > Could you elaborate on this? What code path were you using for reading > record batches that was slower?

How Pandas/Perspective represent table pivots in arrow

2021-07-21 Thread Michael Lavina
Hello Apache Arrow Team, I am looking at ways my company can create an SDK that can share Apache Arrow data while preserving table pivots. I was looking at how Pandas and Perspective do it, and it seems like, for row_pivots, Pandas just sorts the data into a flat Arrow structure. Perspective actu

Re: C++ parquet::TypedColumnReader::ReadBatchSpaced() replacement?

2021-07-21 Thread Micah Kornfield
If dictionary-encoded data is specifically a concern, we've added new experimental APIs, which should be in the next release, that allow retrieving dictionary data as indexes + dictionaries (ReadBatchWithDictionary) instead of denormalizing them as ReadBatch does. -Micah On Wed, Jul 21, 2021 at
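To make the "indexes + dictionaries" versus "denormalized" distinction in Micah's reply concrete, here is a minimal sketch of the data model only. It assumes nothing about the real Parquet C++ or Arrow APIs: the `DictionaryColumn`, `denormalize` and `encode` names are hypothetical, and the code does not call `ReadBatchWithDictionary` or `ReadBatch`.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DictionaryColumn:
    """Hypothetical model of a dictionary-encoded column (not the Parquet/Arrow API)."""
    indices: List[int]      # one integer per row, pointing into `dictionary`
    dictionary: List[str]   # each distinct value stored exactly once


def denormalize(col: DictionaryColumn) -> List[str]:
    """Materialize one value per row, repeating strings that occur multiple times."""
    return [col.dictionary[i] for i in col.indices]


def encode(values: List[str]) -> DictionaryColumn:
    """Build indices + dictionary from plain values, assigning codes in first-seen order."""
    positions: Dict[str, int] = {}
    dictionary: List[str] = []
    indices: List[int] = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return DictionaryColumn(indices, dictionary)


col = DictionaryColumn(indices=[0, 1, 0, 0, 2], dictionary=["red", "green", "blue"])
print(denormalize(col))  # ['red', 'green', 'red', 'red', 'blue']
```

Keeping the indices and dictionary separate means each distinct string is held once and downstream code can work on the small integer codes; denormalizing trades that away for a flat array of values.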