Re: [ANNOUNCE] New Arrow PMC chair: Andy Grove

2023-11-27 Thread Gavin Ray
Yay, congrats Andy! Well-deserved! On Mon, Nov 27, 2023 at 9:13 AM Kevin Gurney wrote: > Congratulations, Andy! > > From: Raúl Cumplido > Sent: Monday, November 27, 2023 8:58 AM > To: dev@arrow.apache.org > Subject: Re: [ANNOUNCE] New Arrow PMC chair: Andy Grov

Re: [DISCUSS] Protocol for exchanging Arrow data over REST APIs

2023-11-18 Thread Gavin Ray
I know that myself and a number of folks I work with would be interested in this. gRPC is a bit of a barrier for a lot of services. Having a spec for doing Arrow over HTTP APIs would be solid. In my opinion, it doesn't necessarily need to be REST-ful. Something like JSON-RPC might fit well with

Re: Apache Arrow | Graph Algorithms & Data Structures

2023-06-30 Thread Gavin Ray
This isn't particularly efficient, but could you do something like this? https://replit.com/@GavinRay97/EnlightenedRichAdministration#main.py On Fri, Jun 30, 2023 at 1:10 PM Aldrin wrote: > > But I found out very quickly that I won't be able to... using only > Apache Arrow without resorting to

Re: DISCUSS: [FlightSQL] Catalog support

2022-11-30 Thread Gavin Ray
Just to chime in on this, one thing I'm curious about is whether there will be support for user-defined catalog/schema hierarchy depth? This comment that James made does seem reasonable to me > scheme://:/path-1/path-2/.../path-n Trino/Presto does a similar thing (jdbc:trino://localhost:8080/tpch

Re: [Rust][Blog] Fast and Memory Efficient Multi-Column Sorts

2022-11-07 Thread Gavin Ray
This is awesome, thanks for sharing! I was at the All Things Open conference recently, and Influx had a booth there. I went over to try to ask about the IOx/Datafusion stuff but unfortunately nobody at the booth knew anything about the technical details. Maybe next time =) On Mon, Nov 7, 2022 a

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-22 Thread Gavin Ray
wouldn't be able to > concatenate the parts together without knowing a safe separator to use. > > On Thu, Sep 22, 2022, at 14:23, Gavin Ray wrote: > > Wait, what happens if a datasource's spec allows dots as valid > identifiers? > > > > On Thu, Sep 22, 2022 a

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-22 Thread Gavin Ray
Wait, what happens if a datasource's spec allows dots as valid identifiers? On Thu, Sep 22, 2022 at 2:22 PM Gavin Ray wrote: > Ah okay, yeah that's a reasonable angle too haha > > > On Thu, Sep 22, 2022 at 1:59 PM David Li wrote: > >> Frankly it was from a "

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-22 Thread Gavin Ray
sets, so there's relatively little overhead. (In particular, > there's not an extra allocation per array; there's just an overall > allocation of a bitmap/offsets buffer.) > > On Thu, Sep 22, 2022, at 13:46, Gavin Ray wrote: > > I suppose you're thinking from a mem

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-22 Thread Gavin Ray
> want to work with the full hierarchy) be too much trouble? > > On Thu, Sep 22, 2022, at 13:14, Gavin Ray wrote: > > Antoine, I can't comment on the Go code (not qualified) but to me, the > > "verification" test > > examples look like a mixture between JDBC an

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-22 Thread Gavin Ray
Antoine, I can't comment on the Go code (not qualified) but to me, the "verification" test examples look like a mixture between JDBC and Java FlightSQL driver usage, and seem solid. There was one reservation I had about the ability to handle datasource namespacing that I brought up early on in the

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-21 Thread Gavin Ray
+1 (non-binding/I'm not important) On Wed, Sep 21, 2022 at 11:40 AM David Li wrote: > Hello, > > We have been discussing [1] standard interfaces for Arrow-based database > access and have been working on implementations of the proposed interfaces > [2], all under the name "ADBC". This proposal a

Re: Request for help with node/yarn in Docker image

2022-09-17 Thread Gavin Ray
(I omitted the part where you'd need to run the "apt-get install nginx" above in the last, single-file Docker build, whoops) That would of course go after the "COPY --from=ui-build" and before the CMD/ENTRYPOINT 👍 On Sat, Sep 17, 2022 at 9:46 PM Gavin Ray wrote: > Hey

Re: Request for help with node/yarn in Docker image

2022-09-17 Thread Gavin Ray
Hey Andy, Happy to be useful in some way, I have a fair amount of experience here. Since you already have a Dockerfile next to this one that is building the React app and serving it on NGINX: "/workspaces/arrow-ballista/dev/docker/ballista-scheduler-ui.dockerfile" You can just copy the built ass

Re: [VOTE] Substrait for Flight SQL

2022-09-16 Thread Gavin Ray
: > >>> > >>> My vote continues to be +1 > >>> > >>> On Thu, Sep 8, 2022 at 11:44 AM Neal Richardson < > neal.p.richard...@gmail.com> > >>> wrote: > >>> > >>> > +1 > >>> > > >>>

Re: [VOTE] Substrait for Flight SQL

2022-09-08 Thread Gavin Ray
nimum 3 binding votes here but it turns out I can't > count and I make three. > > On Thu, Sep 8, 2022, at 12:14, Gavin Ray wrote: > > If non-PMC can vote, I'll also give a huge +1 > > > > On Thu, Sep 8, 2022 at 11:34 AM Matthew Topol > > > wrote: > >

Re: [VOTE] Substrait for Flight SQL

2022-09-08 Thread Gavin Ray
If non-PMC can vote, I'll also give a huge +1 On Thu, Sep 8, 2022 at 11:34 AM Matthew Topol wrote: > I'm not PMC but I'll give a +1 (non-binding) vote. I like the idea of > integrating Substrait plans into Flight SQL if possible and it aligns > with the arrow-adbc work. > > On Thu, Sep 8 2022 at

Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-05 Thread Gavin Ray
Well-earned mate! On Mon, Sep 5, 2022 at 6:09 PM Sasha Krassovsky wrote: > Congratulations Weston!! Very well deserved! > > > > On Sep 5, 2022, at 11:04 AM, Ian Joiner wrote: > > > > Congrats Weston! > > > > On Mon, Sep 5, 2022 at 1:56 AM Sutou Kouhei wrote: > > > >> The Project Management Com

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Gavin Ray
> there are scalar api functions that can be logically used to process rows of data, but they are executed on columnar batches of data. > As mentioned previously it is better to have an API that applies row level transformations than to have an intermediary row level memory format. Another way of

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Gavin Ray
This is essentially the same idea as the proposal here I think -- row/map-based representation & conversion functions for ease of use: [RFC] [Java] Higher-level "DataFrame"-like API. Lower barrier to entry, increase adoption/audience and productivity. · Issue #12618 · apache/arrow (github.com)

Re: [FlightSql] Spark Flight SQL

2022-07-23 Thread Gavin Ray
This sounds pretty darn nifty! I don't have much of value to offer, but the idea sounds like a great one to me =) On Sat, Jul 23, 2022 at 5:18 PM Tornike Gurgenidze wrote: > David, thank you for the reply. > > I recently managed to find the time to get back to the repo. I thought I > would post

Re: Arrow sync call July 20 at 12:00 US/Eastern, 16:00 UTC

2022-07-20 Thread Gavin Ray
tion is > correct. > > [1]: > https://github.com/apache/arrow-adbc/blob/cf43e0cc2ae15ad0ce669b531d475ee218698100/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/JdbcStatement.java#L160 > > -David > > On Wed, Jul 20, 2022, at 14:22, Gavin Ray wrote: > > Th

Re: Arrow sync call July 20 at 12:00 US/Eastern, 16:00 UTC

2022-07-20 Thread Gavin Ray
That JDBC PreparedStatement binding utility looks super useful! I had one question about the behavior of it, if that's alright: The doc says: "Each call to next() will bind parameters > from the next row of data, and then the application can execute the > statement, call addBatch(), etc. as desir

Re: Arrow Flight usage with graph databases

2022-07-20 Thread Gavin Ray
> > We considered the option to analyze data to build a schema on the fly, > however it will be quite an expensive operation which will not allow us to > get performance benefits from using Arrow Flight. I'm not sure if you'll be able to avoid generating a schema on the fly, if it's anything like

Re: Arrow sync call June 8 at 12:00 US/Eastern, 16:00 UTC

2022-06-09 Thread Gavin Ray
substrait/ > [6] https://github.com/voltrondata/substrait-r > [7] http://github.com/substrait-io/substrait-validator > [8] > https://docs.google.com/document/d/1t7NrC76SyxL_OffATmjzZs2xcj1owdUsIF2WKL_Zw1U/ > [9] https://github.com/substrait-io/substrait/tree/main/extensions > > >

Re: Arrow sync call June 8 at 12:00 US/Eastern, 16:00 UTC

2022-06-08 Thread Gavin Ray
Thanks Ian -- can I ask whether there was any discussion of note that happened around Arrow + Substrait stuff? On Wed, Jun 8, 2022 at 5:31 PM Ian Cook wrote: > Attendees: > > Ian Cook > Raúl Cumplido > Alenka Frim > Ian Joiner > Will Jones > Jorge Leitão > David Li > Rok Mihevc > Ashish Paliwal

Re: [DISC] Improving Arrow's database support

2022-06-01 Thread Gavin Ray
This sounds great, but I had one question: I read the initial ADBC proposal, and it mentioned that OLTP was not a targeted use case. If this work is intended to take on the role of a sort of standard ABI/SDK, does that mean that building OLTP-oriented drivers/tooling with it is off the table? On Wed,

Re: Datafusion's Java binding is available in Maven Central

2022-05-16 Thread Gavin Ray
On that note, you should be able to use the "jextract" tool from Project Panama to auto-generate the glue code and types if you have C headers: panama-foreign/panama_jextract.md at foreign-jextract · openjdk/panama-foreign (github.com)

Re: Datafusion's Java binding is available in Maven Central

2022-05-16 Thread Gavin Ray
This is awesome, thank you! On Mon, May 16, 2022 at 6:30 AM Jiayu Liu wrote: > Thanks for the question Antoine, > > So far the data is copied over (not IPC per-se, since it's the same > process), because I haven't found time (and motivation) to migrate to > Arrow C interface just yet. > > A next

Re: June 23 virtual conference to highlight work in the Arrow ecosystem

2022-05-13 Thread Gavin Ray
Super neat, saw the announcement post on Twitter and signed up the other day! If folks would find it interesting, I could do a short talk on a use-case for FlightSQL (and Substrait). The gist of it is having a central API that allows users/vendors to write "plugins" to register new data sources: [

Re: Arrow sync call May 11 at 12:00 US/Eastern, 16:00 UTC

2022-05-13 Thread Gavin Ray
I agree with this as well, and it's also along the lines of what I was trying to propose here: "[RFC] [Java] Higher-level "DataFrame"-like API. Lower barrier to entry, increase adoption/audience and productivity." https://github.com/apache/arrow/issues/12618 It would be really nice if there was

Re: [Rust] Enable GitHub discussions for Rust projects?

2022-05-04 Thread Gavin Ray
How does voting on ASF mailing lists work? I assume random people don't get votes. If so, consider this email an informal voice of support -- otherwise +1 from me =) On Wed, May 4, 2022 at 11:40 AM Matthew Turner wrote: > +1 on enabling GitHub discussions for both arrow-rs and datafusion. I > t

Re: Designing standards for "sandboxed" Arrow user-defined functions [was Re: User defined "Arrow Compute Function"]

2022-04-26 Thread Gavin Ray
Antoine, sandboxing comes into play from two places: 1) The WASM specification itself, which puts bounds on the types of behaviors possible 2) The implementation of the WASM bytecode interpreter chosen, like Jorge mentioned in the comment above The wasmtime docs have a pretty solid section cove

Re: Designing standards for "sandboxed" Arrow user-defined functions [was Re: User defined "Arrow Compute Function"]

2022-04-25 Thread Gavin Ray
Sounds like a fantastic idea, and WASM seems a natural choice. You get the ability to opt into IO if you want/need to, with WASI, but by default you can rest assured about worst-case consequences being contained. On Mon, Apr 25, 2022 at 4:20 PM Wes McKinney wrote: > I was going to reply to this

Re: [DISCUSS] A book about Apache Arrow

2022-04-20 Thread Gavin Ray
Nevermind, I'm a bit slow -- the ToC is at the bottom of the Amazon description. On Wed, Apr 20, 2022 at 2:17 PM Gavin Ray wrote: > Sorry to derail the thread a bit -- the book looks great and based on the > author/reviewers & summary I've ordered a copy. > Not a bi

Re: [DISCUSS] A book about Apache Arrow

2022-04-20 Thread Gavin Ray
Sorry to derail the thread a bit -- the book looks great and based on the author/reviewers & summary I've ordered a copy. Not a big deal but just curious whether there's a preview available/table-of-contents though? I didn't see one on either Amazon or Packt. Thanks =) On Wed, Apr 20, 2022 at 1:2

Re: Arrow in HPC

2022-04-07 Thread Gavin Ray
Congrats! On Thu, Apr 7, 2022 at 1:35 PM David Li wrote: > Just as an update: thanks to Yibo for the reviews; we've merged an initial > implementation that will be available in Arrow 8.0.0 (if built from > source). There's definitely more work to do: > > ARROW-10787 [C++][Flight] DoExchange does

Re: [DISCUSS] "Naming" the Arrow C++ execution engine subproject?

2022-03-29 Thread Gavin Ray
"Arrow Compute Engine" sounds quite nice to me, tbh. Agreeing with the points made above about ACE being difficult to google, and AQE being a loaded term in query engines already. On Tue, Mar 29, 2022 at 10:07 AM Andy Grove wrote: > Just my 2 cents on this. If you were to call it ACE, I would ma

Re: [FlightSQL] Higher-level facade API to increase adoption/audience? Or does this belong as a personal project

2022-03-13 Thread Gavin Ray
FWIW, I filed an RFC issue here, along with a prototype implementation and sample usage + console output code: https://github.com/apache/arrow/issues/12618 On Sun, Mar 13, 2022 at 10:43 AM Gavin Ray wrote: > Generally, the preferred pattern is one VectorSchemaRoot that >> gets relo

Re: [FlightSQL] Higher-level facade API to increase adoption/audience? Or does this belong as a personal project

2022-03-13 Thread Gavin Ray
.loadVectorSchemaRoot(root)" > probably makes more sense but we can iterate on this. This wasn't commonly > understood when some of the other contrib modules were developed. > > Cheers, > Micah > > > On Sat, Mar 12, 2022 at 12:15 PM Gavin Ray wrote: > > >

[FlightSQL] Higher-level facade API to increase adoption/audience? Or does this belong as a personal project

2022-03-12 Thread Gavin Ray
While trying to implement and introduce the idea of adopting FlightSQL, the largest challenge was the API itself. I know it's meant to be low-level. But I found that most of the development time was in code to convert to/from row-based data (IE Map) and Java types, and columnar data + Arrow types.

Re: Flight/FlightSQL Optimization for Small Results?

2022-03-08 Thread Gavin Ray
Thank you for doing this, left a few questions on the GH issue. I would adopt this proposal as soon as it makes it into nightlies (or possibly earlier if it's just a matter of regenerating the proto definitions) The operation flow would be like this, or what would it look like? Client ---> GetFli

Re: [Rust] DataFusion + Substrait

2022-03-07 Thread Gavin Ray
Incredibly exciting! Following along eagerly =) On Mon, Mar 7, 2022 at 11:31 AM Andy Grove wrote: > I created a new repo in the datafusion-contrib GitHub org over the weekend > with a starting point for supporting DataFusion as both a producer and > consumer of Substrait plans. > > https://githu

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Thread Gavin Ray
The optimization proposed is for Flight. Once/if that gets > accepted and implemented, Flight SQL servers could then use it to optimize > GetCatalogs: they would return a FlightInfo that has the data embedded. So > yes, all the methods should get support for this once things get worked o

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Thread Gavin Ray
at 10:46 AM David Li wrote: > (responses inline) > > On Mon, Mar 7, 2022, at 10:37, Gavin Ray wrote: > >> > >> Another contributor is currently working on some Java > >> tutorials/documentation so any feedback would be helpful. > > > > > > A

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Thread Gavin Ray
> issue? Seems something might have changed and we should be prepared to fix > it. (Flight/Java does a lot of poking at internal APIs to try to avoid > copies.) > > Thanks, > David > > On Mon, Mar 7, 2022, at 09:48, Gavin Ray wrote: > > Ah brilliant! Yeah, Websockets (o

Re: [FlightSQL] "flightsql-kotlin" submodule for Kotlin protobuf/gRPC codegen?

2022-03-07 Thread Gavin Ray
> On Mon, Mar 7, 2022, at 09:13, Gavin Ray wrote: > > I'm curious whether folks think it would be reasonable to upstream an > > optional Kotlin submodule that uses the Kotlin code generator for > FlightSQL? > > > > Or would this be better off as a personal repository

Re: [FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Thread Gavin Ray
String so you could use them to try to implement > your own protocol using HTTP, yes. > > [1]: https://github.com/apache/arrow/pull/12465 > > -David > > On Mon, Mar 7, 2022, at 09:24, Gavin Ray wrote: > > Due to the current implementation status of FlightSQL (C++/Rust/JVM o

[FlightSQL] Non-gRPC interop (IE REST) possible with SerializeToString() [C++] / serialize() [Java]?

2022-03-07 Thread Gavin Ray
Due to the current implementation status of FlightSQL (C++/Rust/JVM only) I am trying to see whether it's possible to allow FlightSQL over something like HTTP/REST so that arbitrary languages can be used. In the codebase, I saw these (and their deserialize counterparts): /// \brief Get the wir

[FlightSQL] "flightsql-kotlin" submodule for Kotlin protobuf/gRPC codegen?

2022-03-07 Thread Gavin Ray
I'm curious whether folks think it would be reasonable to upstream an optional Kotlin submodule that uses the Kotlin code generator for FlightSQL? Or would this be better off as a personal repository? The Rust FlightSQL API is a fair bit nicer due to the syntax. The Kotlin Protobuf plugin produce

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

2022-03-06 Thread Gavin Ray
ry to standardize them. > > > > I think what you are doing should be reasonable. You may not need _all_ > of > > the capabilities in Flight SQL for this (e.g. all the various metadata > > calls, or prepared statements, perhaps) but I don't see why it wouldn't >

Re: Is 7.0.0 release missing the Java arrow-flight POM?

2022-03-06 Thread Gavin Ray
Hey all, I wanted to start prototyping a project with FlightSQL, so I have written a script to extract from the nightlies and published the assets from 03/03 on my personal Github. You can use this repo as a Gradle/Maven repository if you want, while we wait for the next release containing the Fl

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

2022-03-04 Thread Gavin Ray
is a property in GetSqlInfo. > >> 2. What happens to client code written prior to changing the command > type > >> to be a oneOf field? Same for servers. > >> More generally, how should backward compatibility work, and what should > >> happen if a client

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

2022-03-03 Thread Gavin Ray
te: > > > >> In the same way that you could write an ODBC driver that takes in text > >> that's not SQL, you could write a Flight SQL server that takes in text > >> that's JSON. > >> Flight SQL doesn't parse the query, so you could create comman

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

2022-03-03 Thread Gavin Ray
//substrait.io/ > > Which is being worked on by several people, including Arrow community > members. > > It might be interesting to generalize Flight SQL to include support for > Substrait. I'm curious what your application, if you're able to share more. > > -David

[FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

2022-03-03 Thread Gavin Ray
Hiya, I am drafting a proposal for a way to enable services to express data compute operations to each other. However I think it'll be difficult to get buy-in if the only representation for queries is as SQL strings. Is there any kind of lower-level API that can be used to express operations? I

Re: Arrow sync call March 2 at 12:00 US/Eastern, 17:00 UTC

2022-03-02 Thread Gavin Ray
parsing we want to do, which again depends on intended size of > inlined data. > > -Micah > > On Wed, Mar 2, 2022 at 10:22 AM Gavin Ray wrote: > >> Particularly curious about the small-results FlightSQL optimizations and >> general FlightSQL developments, if there was

Re: Arrow sync call March 2 at 12:00 US/Eastern, 17:00 UTC

2022-03-02 Thread Gavin Ray
you would like more context on? > > On Wed, Mar 2, 2022 at 10:10 AM Gavin Ray wrote: > > > Was this recorded by any chance? No worries if not. > > > > On Wed, Mar 2, 2022 at 9:58 AM Alessandro Molina < > > alessan...@ursacomputing.com> wrote: > > >

Re: Arrow sync call March 2 at 12:00 US/Eastern, 17:00 UTC

2022-03-02 Thread Gavin Ray
Was this recorded by any chance? No worries if not. On Wed, Mar 2, 2022 at 9:58 AM Alessandro Molina < alessan...@ursacomputing.com> wrote: > Attendees: > > > Alessandro Molina > > Micah Kornfield > > David Li > > Joris Van Den Bossche > > > > Discussion: > > > Flight SQL Optimization for Small R

Re: [PROPOSAL] New Proposals for FlightSQL

2022-02-24 Thread Gavin Ray
My opinion isn't worth much, but any extra metadata and utility methods/classes to work with them are incredibly useful for tools that do dynamic/programmatic generation of UI's or codegen. Imagining a service that takes a FlightSQL connection and generates a web UI for CRUD dynamically using the

[FlightSQL] Flight as a cross-language JDBC driver?

2022-02-22 Thread Gavin Ray
done this (Chesterton's Fence) Thank you =) Gavin Ray.