Re: Arrow & plasma - java sample to store complex objects

Gérard Dupont Thu, 09 Aug 2018 09:06:41 -0700

Thanks to both of you for the prompt replies. Indeed, I had figured out that
the current Java API is a "work in progress". Still, it is quite promising.


I managed to code a workable solution with ByteArrayOutputStream, tapping
directly into the Plasma API. It is not particularly efficient, but it works.
I'll have a look at BufferAllocator and also the Spark SQL parts. I'm not sure
I'm at the level to contribute, but I'll look into it.
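
The byte-array approach described above can be sketched roughly as follows.
This is a minimal illustration under stated assumptions, not the actual code:
`ByteStore`, `InMemoryStore` and `Sample` are hypothetical names, and the
in-memory map only stands in for a Plasma client so that the example runs
without a plasma_store process. With the real client, the put/get calls would
go through ObjectStoreLink instead, and the 20-byte id used here is likewise
just an assumption about the id size.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

public class ByteArrayStoreSketch {

    /** Hypothetical byte-level store interface (NOT the real Plasma API). */
    interface ByteStore {
        void put(byte[] objectId, byte[] value);
        byte[] get(byte[] objectId);
    }

    /** In-memory stand-in for an object store, keyed by the id's content. */
    static class InMemoryStore implements ByteStore {
        private final Map<ByteBuffer, byte[]> objects = new HashMap<>();

        @Override
        public void put(byte[] objectId, byte[] value) {
            // Copy both arrays so later mutation by the caller cannot corrupt the store.
            objects.put(ByteBuffer.wrap(objectId.clone()), value.clone());
        }

        @Override
        public byte[] get(byte[] objectId) {
            byte[] value = objects.get(ByteBuffer.wrap(objectId));
            return value == null ? null : value.clone();
        }
    }

    /** Hypothetical data object; any Serializable class would do. */
    static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final double[] values;

        Sample(String name, double[] values) {
            this.name = name;
            this.values = values;
        }
    }

    /** Serializes an object to a byte array using standard Java serialization. */
    static byte[] serialize(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds an object from bytes produced by serialize(). */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ByteStore store = new InMemoryStore();
        byte[] id = new byte[20]; // assumed id size, for illustration only
        id[0] = 1;

        // Every put and get copies the whole object, which is why this is not efficient.
        store.put(id, serialize(new Sample("initial-state", new double[] {1.0, 2.0, 3.0})));
        Sample restored = (Sample) deserialize(store.get(id));
        System.out.println(restored.name + " has " + restored.values.length + " values");
    }
}
```

Each round trip serializes and copies the full object, so nothing is shared
zero-copy between processes; that is the cost the Arrow-based route is meant
to avoid.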

On another note, I've also looked into the Ray project, which uses Plasma and
offers a nice Java API (I'm not sure about its maturity, though).

Thanks again,
Cheers,
gdupont

On Tue, 7 Aug 2018 at 17:19, Jacques Nadeau <jacq...@apache.org> wrote:

> Bleeding edge is probably an understatement. We need someone to implement
> https://issues.apache.org/jira/browse/ARROW-2892 before this is really
> feasible without a copy. You could do it with glue code today and a copy from
> the memory used in the current Plasma Java client to the memory used in
> BufferAllocator (and vice versa).
>
> On Mon, Aug 6, 2018 at 4:19 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>
> > hi Gerard,
> >
> > This is the right place to ask questions. The Slack channel was closed
> > (see prior discussions on the mailing list); few Java developers were
> > on Slack anyway so it wouldn't have been a good place to get help.
> >
> > Using Java with Plasma is very bleeding edge territory. I don't know
> > if anyone has an example yet of using Plasma with end-to-end Arrow
> > columnar read and write. I would say it's definitely the domain of
> > developers working on the Java codebase to build out support tooling
> > for these workflows. We'd be glad to have you involved.
> >
> > For general Arrow workflows, I would recommend looking at the Arrow
> > conversion paths in the Spark SQL codebase. There we have record
> > batches being streamed to Python and then results received back on the
> > JVM side.
> >
> > - Wes
> >
> > On Mon, Aug 6, 2018 at 10:19 AM, Gérard Dupont <ger.dup...@gmail.com>
> > wrote:
> > > Hi,
> > > Not sure this is the right channel for a "user" oriented question but
> the
> > > Slack channel on Heroku seems to be down...
> > >
> > > TL;DR: is there some hidden tutorial/java samples to store complex data
> > > objects in arrow and access (put/get) with plasma? I'm currently
> > exploring
> > > the unit test from the java part of the source, but it's not really
> > > obvious...
> > >
> > > So, I'm starting with arrow and actually I went in for the plasma
> object
> > > store, which should address an issue I currently have: sharing objects
> > > between processes on multiple servers to serve as initial starting point
> > in
> > > computations.
> > >
> > > The computation parts are already done since the need for distribution
> > just
> > > recently emerged, and I'm trying to see if I can port the data object
> > > within arrow to distribute them over plasma. So far so good: I can
> launch
> > > the plasma_store and access it through the ObjectStoreLink API. But
> it's
> > > really on the byte array level. Any advice or best practice on how to
> > > convert existing data model to arrow compliant one? Should I look into
> > the
> > > Arrow Schema example?
> > >
> > > Thanks for any pointer.
> > >
> > > Cheers,
> > > --
> > > Gérard Dupont
> >
>


-- 
Gérard Dupont
