Re: Arrow sync call October 12 at 12:00 US/Eastern, 16:00 UTC

2022-10-13 Thread Ian Cook
Attendees:
- Vibhatha Abeykoon
- Anja Boskovic
- Ian Cook
- Will Jones
- David Li
- Rok Mihevc
- Percy Triveño Aucahuasi
- Joris Van den Bossche
- Jacob Wujciak

Discussion:

DELTA_BINARY_PACKED encoder for Parquet
- Rok looking for help debugging the PR [1]
- Tests are failing on some architectu

Re: Question about pyarrow.substrait.run_query

2022-10-13 Thread Li Jin
Thank you Weston!

On Thu, Oct 13, 2022 at 1:05 AM Weston Pace wrote:
> 1. Yes.
> 2. I was going to say yes but...on closer examination...it appears
> that it is not applying backpressure.
>
> The SinkNode accumulates batches in a queue and applies backpressure.
> I thought we were using a sink n
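To illustrate the general idea of a queue that applies backpressure, here is a minimal Python sketch. It is not Acero's SinkNode and makes no claim about how Acero implements this; it only assumes a bounded `queue.Queue`, where a producer that gets ahead of the consumer blocks on `put()` until space frees up. The names `run_pipeline`, `num_batches`, and `max_queued` are hypothetical.

```python
import queue
import threading


def run_pipeline(num_batches, max_queued):
    """Push num_batches items through a bounded queue.

    Returns (received_items, peak_queue_depth_seen_by_consumer).
    """
    q = queue.Queue(maxsize=max_queued)
    received = []
    peak = 0

    def producer():
        for i in range(num_batches):
            # put() blocks while the queue is full; that blocking is the
            # backpressure in this sketch.
            q.put(i)
        q.put(None)  # sentinel: no more batches

    t = threading.Thread(target=producer)
    t.start()
    while True:
        peak = max(peak, q.qsize())
        item = q.get()
        if item is None:
            break
        received.append(item)
    t.join()
    return received, peak


if __name__ == "__main__":
    items, peak = run_pipeline(num_batches=20, max_queued=4)
    print(len(items), "batches received; peak queue depth", peak)
```

Without the `maxsize` bound the producer would never block and the queue could grow without limit, which is the situation "not applying backpressure" describes.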

Re: Substrait consumer for custom data sources

2022-10-13 Thread Li Jin
We did some work around this recently and think a small change is needed to allow users to override this default provider. I will explain in more detail: (1) Since the variable is defined as static in the substrait/options.h file, each translation unit will have a separate copy of the

Re: Substrait consumer for custom data sources

2022-10-13 Thread Weston Pace
> Does that sound like a reasonable way to do this?

It's not ideal. I may be assuming here but I think your problem is more that there is no way to more flexibly describe a source in python and less that you need to change the default. For example, if you could do something like this (in pyarrow

Re: Substrait consumer for custom data sources

2022-10-13 Thread Li Jin
> I may be assuming here but I think your problem is more that there is no way to more flexibly describe a source in python and less that you need to change the default.

Correct.

> For example, if you could do something like this (in pyarrow) would it work?

I could try to see if that works. I fee

Re: Substrait consumer for custom data sources

2022-10-13 Thread Li Jin
Weston - was trying the pyarrow approach you suggested:

> def custom_source(endpoint):
>     return pc.Declaration("my_custom_source", create_my_custom_options())

(1) I didn't see "Declaration" under pyarrow.compute - which package is this under?
(2) What Python object should I return with create_my_

Re: Substrait consumer for custom data sources

2022-10-13 Thread Li Jin
Going back to the default_exec_factory_registry idea, I think ultimately we may want a registration API that looks like:

"""
MetaRegistry* registry = compute::default_meta_registry();
registry->RegisterNamedTableProvider(...);
registry->exec_factory_registry()->AddFactory("my_custom_node", MakeMyC

Re: Parser for expressions

2022-10-13 Thread Sasha Krassovsky
Hi everyone,

I’d be fine with switching it to add(x, y). I’ll look into round-trip support; I imagine we can massage the ToString implementation a bit as well to make it easier to parse back. Did anyone have opinions about the syntax for FieldRefs or Scalars? Scalars of the form $type:value ma
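As a rough illustration of what parsing the call-style syntax add(x, y) could look like, here is a small recursive-descent sketch in Python. It is not the proposed Arrow parser. It assumes, purely for illustration, that a bare identifier is a field reference and that scalars are out of scope, since the FieldRef and Scalar syntax is still being discussed. The names `tokenize`, `parse`, and the `("call", ...)` / `("field", ...)` tuples are hypothetical.

```python
import re

# A token is an identifier or one of the punctuation characters ( ) ,
_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[(),])")
_PUNCT = {"(", ")", ","}


def tokenize(text):
    """Split text into identifier and punctuation tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def _parse_expr(tokens, i):
    """Parse one expression starting at tokens[i]; return (node, next_index)."""
    if i >= len(tokens) or tokens[i] in _PUNCT:
        raise ValueError("expected a name")
    name = tokens[i]
    i += 1
    if i < len(tokens) and tokens[i] == "(":
        i += 1
        args = []
        if i < len(tokens) and tokens[i] == ")":
            return ("call", name, args), i + 1
        while True:
            arg, i = _parse_expr(tokens, i)
            args.append(arg)
            if i < len(tokens) and tokens[i] == ",":
                i += 1
                continue
            if i < len(tokens) and tokens[i] == ")":
                return ("call", name, args), i + 1
            raise ValueError("expected ',' or ')'")
    # Assumption: a bare identifier is treated as a field reference.
    return ("field", name), i


def parse(text):
    """Parse a whole expression string into nested tuples."""
    tokens = tokenize(text)
    node, i = _parse_expr(tokens, 0)
    if i != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return node


if __name__ == "__main__":
    print(parse("add(x, multiply(y, z))"))
```

Round-trip support would additionally need a printer that emits the same syntax from the tuple form, so that parsing the printed string gives back an equal tree.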

Re: Substrait consumer for custom data sources

2022-10-13 Thread Li Jin
After some struggle we finally managed to connect our internal data source to Acero and execute a data load via pyarrow.substrait.run_query()! We did end up temporarily modifying the substrait/options.h source code locally and making kDefaultNamedTableProvider extern/global. But since this doesn't