Ideal, I'll let you know how we get on.
Thank you

On 2017-11-16 14:45, Jacques Nadeau <jacq...@apache.org> wrote: 
> For java, you can start by looking at this entry point:
> 
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/store/parquet/columnreaders/DeprecatedParquetVectorizedReader.java
> 
> Something that might actually be easier as an initial understanding
> (simpler) is looking at the half complete Avro reader we have (we don't
> actually currently support this functionality but it should give you some
> ideas).
> 
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/store/avro/AvroRecordReader.java
> 
> Or Json:
> 
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/vector/complex/fn/JsonReader.java
> 
> I suggest working directly with memory or vectors for highest performance
> (as opposed to the ComplexWriter facade). Typically what I suggest is start
> by getting flat types working with vectors directly. From there, look to
> enhance support to also include complex types using the ComplexWriter,
> optimizing for specific patterns as makes sense.
> 
> For a sense of how one might interact directly with memory, you could look
> at something like this:
> 
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/copier/FieldBufferCopier.java
> 
> 
> 
> On Thu, Nov 16, 2017 at 2:32 PM, Lewis John McGibbney <lewi...@apache.org>
> wrote:
> 
> > Hi Jacques,
> > Can you point me to where I should get started, e.g. with the converter?
> > Where does the Parquet --> Arrow one currently exist?
> > Thank you
> >
> > On 2017-11-16 10:42, Jacques Nadeau <jacq...@apache.org> wrote:
> > > Welcome Lewis!
> > >
> > > The use case you outline makes a lot of sense for Arrow to help out
> > > with. We don't yet have an AVRO <> Arrow converter written but it is
> > > something that would be great to have. We'd all be happy to help if you're
> > > interested in taking this on. The new improvements to the Arrow Java APIs
> > > (just merged and will be available in 0.8.0) have made this substantially
> > > nicer/easier.
> > >
> > > On Thu, Nov 16, 2017 at 10:36 AM, Lewis John McGibbney <lewi...@apache.org>
> > > wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > We've been working on GORA (Generic Object Representation using Avro)
> > > > for some years now. https://gora.apache.org
> > > >
> > > > The framework provides an in-memory data model and persistence for big
> > > > data. Gora supports persisting to column stores, key value stores,
> > > > document stores, distributed in-memory key/value stores, in-memory data
> > > > grids, in-memory caches, distributed multi-model stores, and hybrid
> > > > in-memory architectures.
> > > > I am interested in seeing how Arrow can be used within GORA and would
> > > > appreciate some input from the community here.
> > > >
> > > > In GORA we maintain the concept of modeling object structure through use
> > > > of a JSON (Avro) schema similar to what we see within the Arrow schema
> > > > design as documented at
> > > > http://arrow.apache.org/docs/metadata.html#schemas.
> > > > This is encouraging, and although I am still learning about Arrow, it
> > > > seems that it would be a very nice fit for GORA to consider leveraging as
> > > > a common data format/layer.
> > > >
> > > > A few questions I have are as follows:
> > > > 1) Is Avro supported as an Arrow system?
> > > > 2) If so, is there any mechanism(s) to transition/copy data which has been
> > > > written using Gora (with Avro as the underlying data format) to Arrow
> > > > memory format? I see there is mention of "Projects can share functionality
> > > > (eg, Parquet-to-Arrow reader)"; is such functionality available for Avro?
> > > >
> > > > Let's start with the above, I can take it from there.
> > > > Thanks in advance for any heads up folks,
> > > > Lewis
> > > >
> > >
> >
> 
