For Java, you can start by looking at this entry point:

https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/store/parquet/columnreaders/DeprecatedParquetVectorizedReader.java

Something that might be simpler as a first read is the half-complete Avro
reader we have (we don't currently support this functionality, but it
should give you some ideas):

https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/store/avro/AvroRecordReader.java

Or JSON:

https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/vector/complex/fn/JsonReader.java

I suggest working directly with memory or vectors for the highest
performance (as opposed to going through the ComplexWriter facade). What I
typically suggest is to start by getting flat types working with vectors
directly. From there, look to extend support to complex types using the
ComplexWriter, optimizing for specific patterns where it makes sense.
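
To make the "flat types first" idea concrete, here is a minimal sketch of
what a nullable 32-bit integer column looks like when you write to its
buffers directly. It uses only java.nio and is not the Arrow Java API:
FlatIntColumn and its methods are hypothetical names made up for
illustration. It does assume the Arrow-style layout of a validity bitmap
(one bit per row, least-significant bit first) next to a little-endian
fixed-width data buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of Arrow or Dremio.
final class FlatIntColumn {
    private final ByteBuffer validity; // 1 bit per row, LSB first; 1 = value present
    private final ByteBuffer data;     // 4 bytes per row, little-endian
    private final int capacity;

    FlatIntColumn(int capacity) {
        this.capacity = capacity;
        // Direct buffers start zero-filled, so every row is initially null.
        this.validity = ByteBuffer.allocateDirect((capacity + 7) / 8);
        this.data = ByteBuffer.allocateDirect(capacity * 4).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Write the value at its fixed offset and mark the row as present.
    void set(int row, int value) {
        checkRow(row);
        int byteIndex = row >>> 3;
        validity.put(byteIndex, (byte) (validity.get(byteIndex) | (1 << (row & 7))));
        data.putInt(row * 4, value);
    }

    // Clear the validity bit; the data bytes are left untouched.
    void setNull(int row) {
        checkRow(row);
        int byteIndex = row >>> 3;
        validity.put(byteIndex, (byte) (validity.get(byteIndex) & ~(1 << (row & 7))));
    }

    boolean isNull(int row) {
        checkRow(row);
        return (validity.get(row >>> 3) & (1 << (row & 7))) == 0;
    }

    int get(int row) {
        if (isNull(row)) {
            throw new IllegalStateException("row " + row + " is null");
        }
        return data.getInt(row * 4);
    }

    private void checkRow(int row) {
        if (row < 0 || row >= capacity) {
            throw new IndexOutOfBoundsException("row " + row);
        }
    }

    public static void main(String[] args) {
        FlatIntColumn col = new FlatIntColumn(3);
        col.set(0, 42);
        col.set(2, -7); // row 1 is never set, so it stays null
        for (int row = 0; row < 3; row++) {
            System.out.println(row + ": " + (col.isNull(row) ? "null" : col.get(row)));
        }
    }
}
```

A reader for a flat Avro int field would, under these assumptions, decode
each record and call set or setNull once per row, with no per-value object
allocation.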

For a sense of how one might interact directly with memory, you could look
at something like this:

https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/copier/FieldBufferCopier.java
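
As a rough illustration of the kind of offset arithmetic involved (this is
a generic sketch, not a reproduction of what FieldBufferCopier does), the
snippet below copies selected rows of a fixed-width column from one buffer
to another by computing byte offsets by hand. FixedWidthCopy, copySelected
and the int[] selection list are hypothetical names chosen for the example,
and it uses plain java.nio buffers rather than Arrow's own buffer classes.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class; not part of Arrow or Dremio.
final class FixedWidthCopy {
    // Copies the rows listed in `selection` from `source` into a new buffer.
    // Each row occupies `width` bytes; output row i holds source row selection[i].
    static ByteBuffer copySelected(ByteBuffer source, int[] selection, int width) {
        ByteBuffer target = ByteBuffer.allocateDirect(selection.length * width)
                .order(source.order());
        for (int i = 0; i < selection.length; i++) {
            int srcOffset = selection[i] * width;
            int dstOffset = i * width;
            // Absolute get/put: neither buffer's position is modified.
            for (int b = 0; b < width; b++) {
                target.put(dstOffset + b, source.get(srcOffset + b));
            }
        }
        return target;
    }
}
```

The byte-by-byte inner loop is there for clarity; a real implementation
would more likely copy whole words or use bulk memory copies.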



On Thu, Nov 16, 2017 at 2:32 PM, Lewis John McGibbney <lewi...@apache.org>
wrote:

> Hi Jacques,
> Can you point me to where I get started e.g. with the converter?
> Where does the Parquet --> Arrow one currently exist?
> Thank you
>
> On 2017-11-16 10:42, Jacques Nadeau <jacq...@apache.org> wrote:
> > Welcome Lewis!
> >
> > The use case you outline makes a lot of sense for Arrow to help out
> > with. We don't yet have an AVRO <> Arrow converter written but it is
> > something that would be great to have. We'd all be happy to help if you're
> > interested in taking this on. The new improvements to the Arrow Java APIs
> > (just merged and will be available in 0.8.0) have made this substantially
> > nicer/easier.
> >
> > On Thu, Nov 16, 2017 at 10:36 AM, Lewis John McGibbney <lewi...@apache.org>
> > wrote:
> >
> > > Hi Folks,
> > >
> > > We've been working on GORA (Generic Object Representation using Avro) for
> > > some years now. https://gora.apache.org
> > >
> > > The framework provides an in-memory data model and persistence for big
> > > data. Gora supports persisting to column stores, key value stores, document
> > > stores, distributed in-memory key/value stores, in-memory data grids,
> > > in-memory caches, distributed multi-model stores, and hybrid in-memory
> > > architectures.
> > > I am interested in seeing how Arrow can be used within GORA and would
> > > appreciate some input from the community here.
> > >
> > > In GORA we maintain the concept of modeling object structure through use
> > > of a JSON (Avro) schema similar to what we see within the Arrow schema
> > > design as documented at http://arrow.apache.org/docs/metadata.html#schemas.
> > > This is encouraging, and although I am still learning about Arrow, it seems
> > > that it would be a very nice fit for GORA to consider leveraging as a
> > > common data format/layer.
> > >
> > > I have a few questions, as follows:
> > > 1) Is Avro supported as an Arrow system?
> > > 2) If so, is there any mechanism(s) to transition/copy data which has been
> > > written using Gora (with Avro as the underlying data format) to Arrow
> > > memory format? I see there is mention of "Projects can share functionality
> > > (eg, Parquet-to-Arrow reader)", is such functionality available for Avro?
> > >
> > > Let's start with the above, I can take it from there.
> > > Thanks in advance for any heads up folks,
> > > Lewis
> > >
> >
>
