The thread isn't stale, and this is an appropriate question.
Caveat: I have not yet finished applying the feedback from this thread, so
some of what I say below is not yet reflected in the OSS offering (nor is
it reflected in the existing main branch of the Barrage repo).
IMO there are two kinds
Hopefully this thread isn't too stale to pick back up with an open-ended
question. What interface would a Barrage client library expose? With
Flight, application code cares about RecordBatches, but with Barrage it
seems as though a client library ought to handle the updating of the table
and expo
>
> As for schema evolution, I agree with what Micah proposes as a first step.
> That would again add some overhead, perhaps. As for feasibility, at least
> on the C++/Python side, I think there would be a decent amount of
> refactoring needed, and there's also the question of how to expose this in
There's not really any convention for the app_metadata field or any of the
other application-defined fields (e.g. DoAction, Criteria). That said, I
wouldn't necessarily worry about conflicting with other projects - if a client
connects to a Barrage service, presumably it knows what to expect. An
>
> You know what? This is actually a nicer solution than I am giving it
> credit for. I've been trying to think about how to handle the
> Integer.MAX_VALUE limit that Arrow strongly suggests to maintain
> compatibility with Java, while still respecting the need to apply an update
> atomically.
Fo
> note that FlightData already has a separate app_metadata field
That is an interesting point; are there any conventions on how to use the
app_metadata compatibly without stepping on other ideas/projects doing the
same? It would be convenient for the server to verify that the client is
making the r
Hey - pretty much, I think. I'd just like to note that FlightData already has a
separate app_metadata field, for metadata on top of any Arrow-level data, so
you could ship the Barrage metadata alongside the first record batch, without
having to modify anything about the record batch itself, and
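To make that concrete, a minimal pyarrow sketch of the consuming side could look like the following; the endpoint, ticket bytes, and helper functions are placeholders for illustration, not anything Barrage actually defines:

    import pyarrow.flight as flight

    client = flight.FlightClient("grpc://localhost:8815")           # placeholder endpoint
    reader = client.do_get(flight.Ticket(b"example-subscription"))  # placeholder ticket

    while True:
        try:
            # Each FlightStreamChunk carries both .data and .app_metadata
            chunk = reader.read_chunk()
        except StopIteration:
            break
        if chunk.app_metadata is not None:
            # A Barrage-style header (added/removed/modified row sets, batch
            # counts, ...) would be decoded here; the encoding is up to the service.
            handle_update_metadata(chunk.app_metadata.to_pybytes())  # hypothetical helper
        if chunk.data is not None:
            apply_record_batch(chunk.data)                           # hypothetical helper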
Eww. I didn't specify why I had two sets of record batches. Slightly
revised:
Are you suggesting this pattern of messages per incremental update?
- FlightData with [the new] metadata header that includes
added/removed/modified information, the number of add record batches, and
the number of modifi
> It seems that atomic application could also be something controlled in
> metadata (i.e. this is batch 1 of X)?
You know what? This is actually a nicer solution than I am giving it credit
for. I've been trying to think about how to handle the Integer.MAX_VALUE
limit that Arrow strongly suggests to
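As a rough sketch of that "batch 1 of X" idea (assuming, purely for illustration, a marker of two little-endian uint32s in app_metadata), a client could buffer the parts and only apply the update once the last one arrives, keeping the application atomic even when a single update is split to stay under the 2^31 element limit:

    import struct

    pending = []

    def on_chunk(chunk):
        """chunk: a pyarrow.flight.FlightStreamChunk with .data and .app_metadata."""
        # Invented marker: two little-endian uint32s, (part_index, total_parts).
        part, total = struct.unpack("<II", chunk.app_metadata.to_pybytes())
        pending.append(chunk.data)
        if part + 1 == total:
            apply_atomically(pending)  # hypothetical: swap in the new table state in one step
            pending.clear()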
>
> And then having two sets of buffers, is the same as having two record
> batches, albeit you need both sets to be delivered together, as noted.
It seems that atomic application could also be something controlled in
metadata (i.e. this is batch 1 of X)?
The schema evolution question is interes
(responses inline)
On Thu, Mar 4, 2021, at 17:26, Nate Bauernfeind wrote:
> Regarding the BarrageRecordBatch:
>
> I have been concatenating them; it’s one batch with two sets of Arrow
> payloads. They don’t have separate metadata headers; the update is to be
> applied atomically. I have only stud
Regarding the BarrageRecordBatch:
I have been concatenating them; it’s one batch with two sets of Arrow
payloads. They don’t have separate metadata headers; the update is to be
applied atomically. I have only studied the Java Arrow Flight
implementation, and I believe it is usable maybe with some
Re: the multiple batches, that makes sense. In that case, depending on how
exactly the two record batches are laid out, I'd suggest considering a Union of
Struct columns (where a Struct is essentially interchangeable with a record
batch or table) - that would let you encode two distinct record b
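For illustration, a dense Union of Struct columns in pyarrow could carry two logically separate row sets inside a single record batch; the field names and the add/modify split below are assumptions, not the Barrage layout:

    import pyarrow as pa

    adds = pa.StructArray.from_arrays(
        [pa.array([1, 2]), pa.array(["x", "y"])], names=["id", "val"])
    mods = pa.StructArray.from_arrays(
        [pa.array([7]), pa.array(["z"])], names=["id", "val"])

    # type id 0 -> "add" rows, 1 -> "mod" rows; offsets index into the chosen child
    types = pa.array([0, 0, 1], type=pa.int8())
    offsets = pa.array([0, 1, 0], type=pa.int32())
    union = pa.UnionArray.from_dense(types, offsets, [adds, mods],
                                     field_names=["add", "mod"])

    batch = pa.RecordBatch.from_arrays([union], names=["update"])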
> if each payload has two batches with different purposes [...]
The purposes of the payloads are slightly different; however, they are
intended to be applied atomically. If there are guarantees by the table
operation generating the updates then those guarantees are only valid on
each boundary of a
I'm not an Arrow contributor (perhaps one day!) but as a close follower and
user of the project for the last six months (Arrow Flight specifically), I
kind of jumped out of my chair when I saw this today. It's *exactly* what
my team is looking for and something I have been close to building mysel
Ah okay, thank you for clarifying! In that case, if each payload has two
batches with different purposes - might it make sense to just make that two
different payloads, and set a flag/enum in the metadata to indicate how to
interpret the batch? Then you'd be officially the same as Arrow Flight :
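A hedged sketch of that flag/enum idea, assuming (purely for illustration) a one-byte tag at the start of app_metadata and hypothetical staging helpers:

    from enum import IntEnum

    class PayloadKind(IntEnum):
        ADD_ROWS = 0
        MODIFY_ROWS = 1

    def dispatch(chunk):
        """chunk: a pyarrow.flight.FlightStreamChunk."""
        kind = PayloadKind(chunk.app_metadata.to_pybytes()[0])
        if kind is PayloadKind.ADD_ROWS:
            stage_added_rows(chunk.data)     # hypothetical helpers; the staged parts
        else:                                # are still applied atomically afterwards
            stage_modified_rows(chunk.data)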
Thanks for the interest =).
> However, if I understand right, you're sending data without a fixed
> schema [...]
The dataset does have a known schema ahead of time, which is similar to
Flight. However, as you point out, the subscription can change which
columns it is interested in without re-acquir
Hey Nate,
Thanks for sharing this & for the detailed docs and writeup. I think your use
case is interesting, but I'd like to clarify a few things.
I would say Arrow Flight doesn't try to impose a particular model, but I agree
that Barrage does things that aren't easily doable with Flight. Fligh
Hello,
My colleagues at Deephaven Data Labs and I have been addressing problems at
the intersection of data-driven applications, data science, and updating
(/ticking) data for some years.
Deephaven has a query engine that supports updating tabular data via a
protocol that communicates precise cha