Re: [Flight Extension] Request for Comments

2021-06-02 Nate Bauernfeind
The thread isn't stale, and this is an appropriate question. Caveat: I have not yet finished applying the feedback from this thread. So, some of what I say below is not yet reflected in the OSS offering (nor is it reflected in the existing main branch of the barrage repo). IMO there are two kinds

Re: [Flight Extension] Request for Comments

2021-06-01 Paul Whalen
Hopefully this thread isn't too stale to pick back up with an open ended question. What interface would a Barrage client library expose? With Flight, application code cares about RecordBatches, but with Barrage it seems as though a client library ought to handle the updating of the table and expo
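One possible shape for the client-side interface Paul asks about, sketched in Python under loud assumptions: the library hides the wire messages and exposes a locally maintained table plus change listeners. `TickingTable`, `Update`, and the remove/add/modify application order are hypothetical names and choices for illustration only, not Barrage's actual client API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

@dataclass
class Update:
    """One change set (hypothetical shape, not Barrage's real message type)."""
    removed: List[int] = field(default_factory=list)          # row keys to drop
    added: Dict[int, Row] = field(default_factory=dict)       # new row key -> full row
    modified: Dict[int, Row] = field(default_factory=dict)    # row key -> changed columns only

class TickingTable:
    """Hypothetical client-side table that a Barrage client library might expose."""

    def __init__(self) -> None:
        self._rows: Dict[int, Row] = {}
        self._listeners: List[Callable[[Update], None]] = []

    def add_listener(self, fn: Callable[[Update], None]) -> None:
        self._listeners.append(fn)

    def apply(self, update: Update) -> None:
        # Assumed order: removes, then adds, then modifies.
        for key in update.removed:
            del self._rows[key]
        for key, row in update.added.items():
            self._rows[key] = dict(row)
        for key, cols in update.modified.items():
            self._rows[key].update(cols)
        # Listeners are notified only after the whole update has been applied.
        for fn in self._listeners:
            fn(update)

    def snapshot(self) -> Dict[int, Row]:
        return {k: dict(v) for k, v in self._rows.items()}

# Usage: application code sees table state and change notifications, not record batches.
t = TickingTable()
seen: List[List[int]] = []
t.add_listener(lambda u: seen.append(sorted(u.added)))
t.apply(Update(added={1: {"sym": "A", "px": 10.0}, 2: {"sym": "B", "px": 20.0}}))
t.apply(Update(removed=[1], modified={2: {"px": 21.5}}))
```

The design choice here is that applications subscribe to a table rather than to a stream of batches, which is the distinction Paul draws between Flight and Barrage.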

Re: [Flight Extension] Request for Comments

2021-03-09 Micah Kornfield
> > As for schema evolution, I agree with what Micah proposes as a first step. > That would again add some overhead, perhaps. As for feasibility, at least > on the C++/Python side, I think there would be a decent amount of > refactoring needed, and there's also the question of how to expose this in

Re: [Flight Extension] Request for Comments

2021-03-09 David Li
There's not really any convention for the app_metadata field or any of the other application-defined fields (e.g. DoAction, Criteria). That said, I wouldn't necessarily worry about conflicting with other projects - if a client connects to a Barrage service, presumably it knows what to expect. An

Re: [Flight Extension] Request for Comments

2021-03-08 Micah Kornfield
> > You know what? This is actually a nicer solution than I am giving it > credit for. I've been trying to think about how to handle the > Integer.MAX_VALUE limit that arrow strongly suggests to maintain > compatibility with Java, while still respecting the need to apply an update > atomically. Fo

Re: [Flight Extension] Request for Comments

2021-03-08 Nate Bauernfeind
>note that FlightData already has a separate app_metadata field That is an interesting point; are there any conventions on how to use the app_metadata compatibly without stepping on other ideas/projects doing the same? It would be convenient for the server to verify that the client is making the r

Re: [Flight Extension] Request for Comments

2021-03-08 David Li
Hey - pretty much, I think. I'd just like to note that FlightData already has a separate app_metadata field, for metadata on top of any Arrow-level data, so you could ship the Barrage metadata alongside the first record batch, without having to modify anything about the record batch itself, and
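David's point is that Barrage-specific metadata can travel in FlightData's `app_metadata` bytes next to an unmodified record batch (pyarrow, for example, exposes `write_with_metadata(batch, buf)` on its Flight stream writers). The byte layout below, including the `BRGX` tag and the version field, is purely hypothetical; it only illustrates how a leading tag and version could let a receiver recognize the format, which bears on the convention question raised in the thread.

```python
import struct
from typing import Tuple

MAGIC = b"BRGX"  # hypothetical 4-byte tag identifying this metadata format
VERSION = 1      # hypothetical format version
# Little-endian, no padding: 4-byte tag, u16 version, u32 add-batch count, u32 mod-batch count.
_HEADER = struct.Struct("<4sHII")

def encode_app_metadata(num_add_batches: int, num_mod_batches: int) -> bytes:
    """Pack a (hypothetical) Barrage-style header into bytes for app_metadata."""
    return _HEADER.pack(MAGIC, VERSION, num_add_batches, num_mod_batches)

def decode_app_metadata(buf: bytes) -> Tuple[int, int]:
    """Unpack the header, rejecting bytes that were not written in this format."""
    if len(buf) < _HEADER.size:
        raise ValueError("app_metadata too short for this format")
    magic, version, n_add, n_mod = _HEADER.unpack_from(buf)
    if magic != MAGIC:
        raise ValueError("app_metadata is not in this format")
    if version != VERSION:
        raise ValueError(f"unsupported metadata version {version}")
    return n_add, n_mod

buf = encode_app_metadata(num_add_batches=2, num_mod_batches=1)
print(len(buf), decode_app_metadata(buf))  # prints: 14 (2, 1)
```

A real protocol would more likely use a schema-driven encoding such as FlatBuffers or Protocol Buffers; `struct` is used here only to keep the sketch self-contained.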

Re: [Flight Extension] Request for Comments

2021-03-05 Nate Bauernfeind
Eww. I didn't specify why I had two sets of record batches. Slightly revised: Are you suggesting this pattern of messages per incremental update? - FlightData with [the new] metadata header that includes added/removed/modified information, the number of add record batches, and the number of modifi
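The message pattern Nate describes (a metadata header announcing the added/removed/modified information and how many add and modify record batches follow) means a client has to buffer until every announced batch has arrived before applying the update. A minimal sketch of that buffering, assuming hypothetical `UpdateHeader` and `Batch` types rather than Barrage's real message classes:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class UpdateHeader:
    # Hypothetical header fields, modeled on the pattern described in the thread.
    removed: List[int]        # row keys removed by this update
    num_add_batches: int      # record batches of added rows that follow
    num_mod_batches: int      # record batches of modified rows that follow

@dataclass
class Batch:
    kind: str                 # "add" or "mod" (hypothetical tag)
    rows: List[tuple]

Complete = Tuple[List[int], List[Batch], List[Batch]]

class UpdateAssembler:
    """Buffers one header plus its record batches; returns the update only when complete."""

    def __init__(self) -> None:
        self._header: Optional[UpdateHeader] = None
        self._adds: List[Batch] = []
        self._mods: List[Batch] = []

    def on_header(self, header: UpdateHeader) -> Optional[Complete]:
        if self._header is not None:
            raise ValueError("previous update is still incomplete")
        self._header, self._adds, self._mods = header, [], []
        return self._maybe_complete()  # an update may announce zero batches

    def on_batch(self, batch: Batch) -> Optional[Complete]:
        if self._header is None:
            raise ValueError("record batch arrived before its header")
        if batch.kind == "add":
            self._adds.append(batch)
        elif batch.kind == "mod":
            self._mods.append(batch)
        else:
            raise ValueError(f"unknown batch kind {batch.kind!r}")
        return self._maybe_complete()

    def _maybe_complete(self) -> Optional[Complete]:
        h = self._header
        if len(self._adds) > h.num_add_batches or len(self._mods) > h.num_mod_batches:
            raise ValueError("more batches than the header announced")
        if len(self._adds) < h.num_add_batches or len(self._mods) < h.num_mod_batches:
            return None  # keep buffering; nothing is applied yet
        self._header = None
        return h.removed, self._adds, self._mods  # safe to apply atomically now

# Usage: one header announcing 2 add batches and 1 modify batch.
asm = UpdateAssembler()
results = [asm.on_header(UpdateHeader(removed=[7], num_add_batches=2, num_mod_batches=1))]
results.append(asm.on_batch(Batch("add", [(1, "A")])))
results.append(asm.on_batch(Batch("mod", [(3, "C")])))
results.append(asm.on_batch(Batch("add", [(2, "B")])))
```

Because the header carries the batch counts, the client never has to guess where one update ends and the next begins.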

Re: [Flight Extension] Request for Comments

2021-03-05 Nate Bauernfeind
> It seems that atomic application could also be something controlled in metadata (i.e. this is batch 1 or X)? You know what? This is actually a nicer solution than I am giving it credit for. I've been trying to think about how to handle the Integer.MAX_VALUE limit that arrow strongly suggests to

Re: [Flight Extension] Request for Comments

2021-03-05 Micah Kornfield
> > And then having two sets of buffers, is the same as having two record > batches, albeit you need both sets to be delivered together, as noted. It seems that atomic application could also be something controlled in metadata (i.e. this is batch 1 or X)? The schema evolution question is interes
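Micah's idea of marking a batch as "1 of X" in metadata also bears on the Integer.MAX_VALUE row limit Nate mentions for Java compatibility: one large logical update can be split into several record batches and reassembled before being applied atomically. A sketch, assuming a hypothetical `Chunk` with `update_id`/`seq`/`total` fields and a deliberately tiny `max_rows` for the demo:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

@dataclass(frozen=True)
class Chunk:
    update_id: int   # hypothetical: which logical update this chunk belongs to
    seq: int         # 0-based position ("batch seq+1 of total")
    total: int       # number of chunks in the logical update
    rows: tuple

def split_update(update_id: int, rows: Sequence, max_rows: int) -> List[Chunk]:
    """Split one logical update so that no chunk holds more than max_rows rows."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    pieces = [tuple(rows[i:i + max_rows]) for i in range(0, len(rows), max_rows)] or [()]
    return [Chunk(update_id, i, len(pieces), p) for i, p in enumerate(pieces)]

class Reassembler:
    """Holds chunks until every chunk of a logical update has arrived."""

    def __init__(self) -> None:
        self._pending: Dict[int, Dict[int, Chunk]] = {}

    def on_chunk(self, chunk: Chunk) -> Optional[tuple]:
        parts = self._pending.setdefault(chunk.update_id, {})
        parts[chunk.seq] = chunk
        if len(parts) < chunk.total:
            return None  # incomplete: apply nothing yet
        del self._pending[chunk.update_id]
        # All chunks present: concatenate in sequence order and apply as one unit.
        return tuple(row for seq in range(chunk.total) for row in parts[seq].rows)

# Usage: 7 rows with a limit of 3 per batch, delivered out of order.
chunks = split_update(42, list(range(7)), max_rows=3)
r = Reassembler()
outputs = [r.on_chunk(c) for c in reversed(chunks)]
```

In practice `max_rows` would sit just under the implementation's batch limit; the small value here only makes the splitting visible.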

Re: [Flight Extension] Request for Comments

2021-03-05 David Li
(responses inline) On Thu, Mar 4, 2021, at 17:26, Nate Bauernfeind wrote: > Regarding the BarrageRecordBatch: > > I have been concatenating them; it’s one batch with two sets of arrow > payloads. They don’t have separate metadata headers; the update is to be > applied atomically. I have only stud

Re: [Flight Extension] Request for Comments

2021-03-04 Nate Bauernfeind
Regarding the BarrageRecordBatch: I have been concatenating them; it’s one batch with two sets of arrow payloads. They don’t have separate metadata headers; the update is to be applied atomically. I have only studied the Java Arrow Flight implementation, and I believe it is usable maybe with some

Re: [Flight Extension] Request for Comments

2021-03-04 David Li
Re: the multiple batches, that makes sense. In that case, depending on how exactly the two record batches are laid out, I'd suggest considering a Union of Struct columns (where a Struct is essentially interchangeable with a record batch or table) - that would let you encode two distinct record b

Re: [Flight Extension] Request for Comments

2021-03-03 Nate Bauernfeind
> if each payload has two batches with different purposes [...] The purposes of the payloads are slightly different, however they are intended to be applied atomically. If there are guarantees by the table operation generating the updates then those guarantees are only valid on each boundary of a

Re: [Flight Extension] Request for Comments

2021-03-03 Paul Whalen
I'm not an Arrow contributor (perhaps one day!) but as a close follower and user of the project for the last six months (Arrow Flight specifically), I kind of jumped out of my chair when I saw this today. It's *exactly* what my team is looking for and something I have been close to building mysel

Re: [Flight Extension] Request for Comments

2021-03-03 David Li
Ah okay, thank you for clarifying! In that case, if each payload has two batches with different purposes - might it make sense to just make that two different payloads, and set a flag/enum in the metadata to indicate how to interpret the batch? Then you'd be officially the same as Arrow Flight :

Re: [Flight Extension] Request for Comments

2021-03-03 Nate Bauernfeind
Thanks for the interest =). > However, if I understand right, you're sending data without a fixed schema [...] The dataset does have a known schema ahead of time, which is similar to Flight. However, as you point out, the subscription can change which columns it is interested in without re-acquir

Re: [Flight Extension] Request for Comments

2021-03-03 David Li
Hey Nate, Thanks for sharing this & for the detailed docs and writeup. I think your use case is interesting, but I'd like to clarify a few things. I would say Arrow Flight doesn't try to impose a particular model, but I agree that Barrage does things that aren't easily doable with Flight. Fligh

[Flight Extension] Request for Comments

2021-03-03 Nate Bauernfeind
Hello, My colleagues at Deephaven Data Labs and I have been addressing problems at the intersection of data-driven applications, data science, and updating (/ticking) data for some years. Deephaven has a query engine that supports updating tabular data via a protocol that communicates precise cha