FWIW, I filed an RFC issue here, along with a prototype implementation,
sample usage code, and its console output:

https://github.com/apache/arrow/issues/12618

On Sun, Mar 13, 2022 at 10:43 AM Gavin Ray <ray.gavi...@gmail.com> wrote:

>> Generally, the preferred pattern is one VectorSchemaRoot that
>> gets reloaded each time.  So an API like "df.loadVectorSchemaRoot(root)"
>> probably makes more sense but we can iterate on this.
>>
>
> Could you expand on what exactly you mean by this?
>
> I'm still a bit unclear on the best practices for sending
> the Arrow response in Flight, and it seems like an important point.
>
>
>> ... creating a new contrib module that maps
>> from java objects (just like there are JDBC and Avro ones) seems
>> worthwhile.  If you are interested in contributing something like this I
>> think a short design doc would be worthwhile.
>>
>
> Where would be the best place to post this?
>
> I was thinking of GitHub issues, but I am GitHub-centric, so I'm
> not sure whether JIRA or the mailing list would be better.
>
> Thanks, Micah!
>
>
> On Sun, Mar 13, 2022 at 12:46 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hi Gavin,
>>
>> > Just curious whether there is any interest/intention of possibly making
>> > a higher level API around the basic FlightSQL one?
>>
>>
>> IIUC, I don't think this is an issue with Flight but one with generic
>> conversion of data into Arrow.  I don't think anyone is actively
>> working on something like this, but creating a new contrib module that
>> maps from java objects (just like there are JDBC and Avro ones) seems
>> worthwhile.  If you are interested in contributing something like this I
>> think a short design doc would be worthwhile.
>>
>> > VectorSchemaRoot root = df.toVectorSchemaRoot();
>> > listener.setVectorSchemaRoot(root);
>> > listener.sendVectorSchemaRootContents();
>>
>>
>> A small nit.  Generally, the preferred pattern is one VectorSchemaRoot
>> that gets reloaded each time.  So an API like
>> "df.loadVectorSchemaRoot(root)" probably makes more sense but we can
>> iterate on this.  This wasn't commonly understood when some of the other
>> contrib modules were developed.
>>
>> Cheers,
>> Micah
>>
>>
>> On Sat, Mar 12, 2022 at 12:15 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>>
>> > While trying to implement and introduce the idea of adopting FlightSQL,
>> > the largest challenge was the API itself.
>> >
>> > I know it's meant to be low-level. But I found that most of the
>> > development time went into code that converts between row-based data
>> > (i.e. Map<String, Object>) with Java types and columnar data with
>> > Arrow types.
>> >
>> > I'm likely in the minority position here -- I know that Arrow and
>> > FlightSQL users are largely looking at transferring large volumes of
>> > data and servicing OLAP-type workloads.
>> > But the thing that excites me most about FlightSQL isn't its
>> > performance (always nice to have), but that it's a language-agnostic
>> > standard for data access.
>> >
>> > That has broad implications -- for all kinds of data-access workloads
>> > and business use cases.
>> >
>> > The challenge is that when advocating for it and presenting a
>> > proof-of-concept, rather than what a developer might expect to see,
>> > something like:
>> >
>> > // FlightSQL handler code
>> > List<Map<String, Object>> results = ....;
>> > results.add(Map.of("id", 1, "name", "Person 1"));
>> > return results;
>> >
>> > a significant portion of the code is in Arrow-specific implementation
>> > details: creating a VectorSchemaRoot and FieldVectors, de-serializing
>> > the results on the client, etc.
>> >
>> > Just curious whether there is any interest/intention of possibly making
>> > a higher level API around the basic FlightSQL one?
>> > Maybe something closer to the traditional notion of a row-based
>> > "DataFrame" or "Table", like:
>> >
>> > DataFrame df = new DataFrame();
>> > df.addColumn("id", ArrowTypes.Int);
>> > df.addColumn("name", ArrowTypes.VarChar);
>> > df.addRow(Map.of("id", 1, "name", "Person 1"));
>> > VectorSchemaRoot root = df.toVectorSchemaRoot();
>> > listener.setVectorSchemaRoot(root);
>> > listener.sendVectorSchemaRootContents();
>> >
>>
>
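To make the two ideas in the quoted thread concrete (converting row-based
Map<String, Object> data to columns, and reloading one container per batch
rather than allocating a new one each time), here is a rough sketch in plain
Java. It deliberately uses no Arrow classes: ColumnBatch and its load() method
are hypothetical stand-ins for a VectorSchemaRoot and something like the
suggested "df.loadVectorSchemaRoot(root)", and this is not the prototype from
the issue linked above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a columnar batch; NOT an Arrow class. */
final class ColumnBatch {
    // One growable list per column, kept in declaration order.
    private final Map<String, List<Object>> columns = new LinkedHashMap<>();
    private int rowCount = 0;

    ColumnBatch(List<String> columnNames) {
        for (String name : columnNames) {
            columns.put(name, new ArrayList<>());
        }
    }

    /** Clears the previous contents and refills the same columns from the rows. */
    void load(List<Map<String, Object>> rows) {
        for (List<Object> column : columns.values()) {
            column.clear();
        }
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, List<Object>> entry : columns.entrySet()) {
                // A key missing from the row is stored as null.
                entry.getValue().add(row.get(entry.getKey()));
            }
        }
        rowCount = rows.size();
    }

    List<Object> column(String name) {
        return columns.get(name);
    }

    int rowCount() {
        return rowCount;
    }
}

final class RowToColumnDemo {
    public static void main(String[] args) {
        // Allocate the container once, then reload it for each batch of rows.
        ColumnBatch batch = new ColumnBatch(List.of("id", "name"));

        Map<String, Object> row1 = Map.of("id", 1, "name", "Person 1");
        Map<String, Object> row2 = Map.of("id", 2, "name", "Person 2");
        batch.load(List.of(row1, row2));
        System.out.println(batch.column("id") + " " + batch.column("name"));

        Map<String, Object> row3 = Map.of("id", 3, "name", "Person 3");
        batch.load(List.of(row3));
        System.out.println(batch.column("id") + " " + batch.column("name"));
    }
}
```

In a real Flight handler the reloaded container would be the single
VectorSchemaRoot handed to the listener, with each load followed by a send;
the details of that wiring are left out here because they depend on the
Arrow APIs rather than on this sketch.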
