Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-03 Thread Jeroen van Straten
It's not so much about whether or not we *can*, but about whether or not we *should* generate and expose these files. Fundamentally, Substrait aims to be a means to connect different systems together by standardizing some interchange format. Currently that happens to be based on protobuf, so *one*

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-04 Thread Jeroen van Straten
s to protobuf classes is not Arrow's > job. You can probably take the upstream (i.e. Substrait's) protobuf > definitions and compile them yourself, using whatever settings required > by your project. > > Regards > > Antoine. > > > Le 03/07/2022 à 21:16, Jeroen

Re: [C++] Question about substrait dependency in C++

2022-07-18 Thread Jeroen van Straten
Hi, I'm not sure I completely understand what you're trying to do, but if lack of internet access is the only problem, I think you should just be able to override the URL it tries to download by setting the ARROW_SUBSTRAIT_URL environment variable to some local file:// URL. I think it should work
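
For illustration, here is a minimal Python sketch of the override suggested above: it turns a local archive path into a file:// URL and puts it in ARROW_SUBSTRAIT_URL for a child build process. The archive path and the helper name `substrait_build_env` are hypothetical; only the ARROW_SUBSTRAIT_URL variable comes from the message, and whether the build picks it up as expected is untested here.

```python
import os
from pathlib import Path


def substrait_build_env(archive_path, base_env=None):
    """Return a copy of the environment with ARROW_SUBSTRAIT_URL set to a file:// URL."""
    env = dict(os.environ if base_env is None else base_env)
    # resolve() makes the path absolute, which as_uri() requires.
    env["ARROW_SUBSTRAIT_URL"] = Path(archive_path).resolve().as_uri()
    return env


# Hypothetical location of a pre-downloaded Substrait source archive.
env = substrait_build_env("third_party/substrait.tar.gz")
print(env["ARROW_SUBSTRAIT_URL"])
```

The returned mapping could then be passed as `env=` to `subprocess.run` when invoking the build, so the current shell's environment is left untouched.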

Re: [C++] Question about substrait dependency in C++

2022-07-18 Thread Jeroen van Straten
> wouldn't recommend doing these auto-generation and build steps on your > own, > > as you'd have to change the Arrow C++ build system, and specifically > > "cpp/cmake_modules/ThirdpartyToolchain.cmake", to enable this. > > > > > > Yaron. >

Re: [C++] Question about substrait dependency in C++

2022-07-18 Thread Jeroen van Straten
022 at 21:32, Li Jin wrote: > Thanks both. This appears to work! > > With regard to linking, I also have libsubstrait.a under build/debug, but > not in dist/lib - I suppose maybe the substrait classes are statically > linked into libarrow.so? > > Li > > On

Re: [DISCUSS] Policies for Substrait extensions

2022-04-19 Thread Jeroen van Straten
> At the moment there is a version at [2] which I will propose be the > official implementation for the Apache Arrow project (although it > needs a tiny bit of cleanup to remove a comment reference to C++). > Assuming the discussion doesn't raise any significant concerns in the > next week or so I'

Re: [C++] output field names in Arrow Substrait

2022-04-20 Thread Jeroen van Straten
From a Substrait perspective, it would be up to Ibis to convert the column names to the correct indices. This can and should be transparent to the end user; front-ends already need to know what the schemas are in order to produce a valid Substrait plan, so Ibis should have all the information it n
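
As a rough sketch of the name-to-index conversion described above, a front-end that already knows the input schema could resolve column names like this. The function name `names_to_indices` and the example schema are hypothetical, and the sketch assumes column names are unique; it does not use the Ibis or Substrait APIs.

```python
def names_to_indices(schema_names, selected):
    """Map selected column names to their positions in the input schema."""
    # Assumes unique column names; with duplicates the last position wins.
    positions = {name: i for i, name in enumerate(schema_names)}
    missing = [name for name in selected if name not in positions]
    if missing:
        raise KeyError(f"columns not in schema: {missing}")
    return [positions[name] for name in selected]


# Hypothetical schema known to the front-end when it builds the plan.
schema = ["id", "price", "quantity"]
indices = names_to_indices(schema, ["quantity", "id"])
print(indices)
```

The user keeps working with names; only the emitted plan carries the indices.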

Re: [C++] output field names in Arrow Substrait

2022-04-20 Thread Jeroen van Straten
umes substrait I'd > >> imagine there would be similar problems because schema is part of Spark > >> operators too and I don't think Substrait cannot convince Spark to go > >> index based. > >> I'd imagine during deserialization of the subtrai