FWIW, I filed an RFC issue here, along with a prototype implementation and sample usage + console output code:
https://github.com/apache/arrow/issues/12618

On Sun, Mar 13, 2022 at 10:43 AM Gavin Ray <ray.gavi...@gmail.com> wrote:

>> Generally, the preferred pattern is one VectorSchemaRoot that
>> gets reloaded each time. So an API like "df.loadVectorSchemaRoot(root)"
>> probably makes more sense but we can iterate on this.
>
> Could you expand on what exactly you mean by this?
>
> I'm still a bit blurry on the best practices behind sending
> the Arrow response in Flight, and it seems like an important point.
>
>> ... creating a new contrib module that maps
>> from java objects (just like there are JDBC and Avro ones) seems
>> worthwhile. If you are interested in contributing something like this I
>> think a short design doc would be worthwhile.
>
> Where would be the best place to post this?
>
> I was thinking about GitHub issues, but I am GitHub-centric;
> not sure if JIRA or the mailing list would be better.
>
> Thanks, Micah!
>
> On Sun, Mar 13, 2022 at 12:46 AM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
>> Hi Gavin,
>>
>> > Just curious whether there is any interest/intention of possibly making
>> > a higher level API around the basic FlightSQL one?
>>
>> IIUC, I don't think this is an issue with Flight but one with generic
>> conversion of data into Arrow. I don't think anyone is actively
>> working on something like this, but creating a new contrib module that
>> maps from java objects (just like there are JDBC and Avro ones) seems
>> worthwhile. If you are interested in contributing something like this I
>> think a short design doc would be worthwhile.
>>
>> > VectorSchemaRoot root = df.toVectorSchemaRoot();
>> > listener.setVectorSchemaRoot(root);
>> > listener.sendVectorSchemaRootContents();
>>
>> A small nit. Generally, the preferred pattern is one VectorSchemaRoot
>> that gets reloaded each time. So an API like "df.loadVectorSchemaRoot(root)"
>> probably makes more sense but we can iterate on this. This wasn't
>> commonly understood when some of the other contrib modules were developed.
>>
>> Cheers,
>> Micah
>>
>> On Sat, Mar 12, 2022 at 12:15 PM Gavin Ray <ray.gavi...@gmail.com> wrote:
>>
>> > While trying to implement and introduce the idea of adopting FlightSQL,
>> > the largest challenge was the API itself.
>> >
>> > I know it's meant to be low-level. But I found that most of the
>> > development time was in code to convert to/from row-based data
>> > (i.e. Map<String, Object>) and Java types, and columnar data +
>> > Arrow types.
>> >
>> > I'm likely in the minority position here -- I know that Arrow and
>> > FlightSQL users are largely looking at transferring large volumes of
>> > data and servicing OLAP-type workloads.
>> > But the thing that excites me most about FlightSQL isn't its
>> > performance (always nice to have), but that it's a language-agnostic
>> > standard for data access.
>> >
>> > That has broad implications -- for all kinds of data-access workloads
>> > and business use cases.
>> >
>> > The challenge is that in trying to advocate for it, when presenting a
>> > proof-of-concept, rather than what a developer might expect to see,
>> > something like:
>> >
>> > // FlightSQL handler code
>> > List<Map<String, Object>> results = ....;
>> > results.add(Map.of("id", 1, "name", "Person 1"));
>> > return results;
>> >
>> > A significant portion of the code is in Arrow-specific implementation
>> > details: creating a VectorSchemaRoot, FieldVector, de-serializing the
>> > results on the client, etc.
>> >
>> > Just curious whether there is any interest/intention of possibly making
>> > a higher level API around the basic FlightSQL one?
>> >
>> > Maybe something closer to the traditional notion of a row-based
>> > "DataFrame" or "Table", like:
>> >
>> > DataFrame df = new DataFrame();
>> > df.addColumn("id", ArrowTypes.Int);
>> > df.addColumn("name", ArrowTypes.VarChar);
>> > df.addRow(Map.of("id", 1, "name", "Person 1"));
>> > VectorSchemaRoot root = df.toVectorSchemaRoot();
>> > listener.setVectorSchemaRoot(root);
>> > listener.sendVectorSchemaRootContents();
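
For readers following the thread, here is a minimal plain-Java sketch of the two ideas discussed above: pivoting row maps into columns, and reloading one long-lived container per batch instead of building a new one each time (the shape suggested by "df.loadVectorSchemaRoot(root)"). It deliberately uses no Arrow classes. The class RowBatchLoader and the methods newColumns and loadInto are hypothetical names invented for illustration; a Map of Lists only stands in for a VectorSchemaRoot and its FieldVectors, so this shows the pattern rather than the Arrow API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not part of Arrow or FlightSQL.
class RowBatchLoader {

    // Create the reusable column container once, in a fixed column order.
    static Map<String, List<Object>> newColumns(List<String> columnNames) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        for (String name : columnNames) {
            columns.put(name, new ArrayList<>());
        }
        return columns;
    }

    // Clear the existing columns and refill them from a batch of row maps.
    // The container is reused across batches rather than reallocated.
    static void loadInto(Map<String, List<Object>> columns,
                         List<Map<String, Object>> rows) {
        for (List<Object> column : columns.values()) {
            column.clear();
        }
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, List<Object>> entry : columns.entrySet()) {
                // A key missing from a row becomes null in that column.
                entry.getValue().add(row.get(entry.getKey()));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<Object>> columns = newColumns(List.of("id", "name"));

        loadInto(columns, List.of(
            Map.of("id", 1, "name", "Person 1"),
            Map.of("id", 2, "name", "Person 2")));
        System.out.println(columns); // {id=[1, 2], name=[Person 1, Person 2]}

        // Second batch goes into the same container.
        loadInto(columns, List.of(Map.of("id", 3, "name", "Person 3")));
        System.out.println(columns); // {id=[3], name=[Person 3]}
    }
}
```

In a real Flight handler the container would be an Arrow VectorSchemaRoot and each load would be followed by sending the batch; those calls are left out here because the sketch is limited to the standard library.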