Ideal, I'll let you know how we get on. Thank you
On 2017-11-16 14:45, Jacques Nadeau <jacq...@apache.org> wrote:
> For Java, you can start by looking at this entry point:
>
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/store/parquet/columnreaders/DeprecatedParquetVectorizedReader.java
>
> Something that might actually be easier as an initial understanding
> (simpler) is looking at the half-complete Avro reader we have (we don't
> actually currently support this functionality but it should give you some
> ideas).
>
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/store/avro/AvroRecordReader.java
>
> Or JSON:
>
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/exec/vector/complex/fn/JsonReader.java
>
> I suggest working directly with memory or vectors for highest performance
> (as opposed to the ComplexWriter facade). Typically what I suggest is to
> start by getting flat types working with vectors directly. From there, look
> to enhance support to also include complex types using the ComplexWriter,
> optimizing for specific patterns as makes sense.
>
> For a sense of how one might interact directly with memory, you could look
> at something like this:
>
> https://github.com/dremio/dremio-oss/blob/master/sabot/kernel/src/main/java/com/dremio/sabot/op/copier/FieldBufferCopier.java
>
> On Thu, Nov 16, 2017 at 2:32 PM, Lewis John McGibbney <lewi...@apache.org>
> wrote:
>
> > Hi Jacques,
> > Can you point me to where I get started, e.g. with the converter?
> > Where does the Parquet --> Arrow one currently exist?
> > Thank you
> >
> > On 2017-11-16 10:42, Jacques Nadeau <jacq...@apache.org> wrote:
> > > Welcome Lewis!
> > >
> > > The use case you outline makes a lot of sense for Arrow to help out
> > > with. We don't yet have an AVRO <> Arrow converter written but it is
> > > something that would be great to have. We'd all be happy to help if
> > > you're interested in taking this on. The new improvements to the Arrow
> > > Java APIs (just merged and will be available in 0.8.0) have made this
> > > substantially nicer/easier.
> > >
> > > On Thu, Nov 16, 2017 at 10:36 AM, Lewis John McGibbney
> > > <lewi...@apache.org> wrote:
> > >
> > > > Hi Folks,
> > > >
> > > > We've been working on GORA (Generic Object Representation using Avro)
> > > > for some years now. https://gora.apache.org
> > > >
> > > > The framework provides an in-memory data model and persistence for big
> > > > data. Gora supports persisting to column stores, key value stores,
> > > > document stores, distributed in-memory key/value stores, in-memory data
> > > > grids, in-memory caches, distributed multi-model stores, and hybrid
> > > > in-memory architectures.
> > > > I am interested in seeing how Arrow can be used within GORA and would
> > > > appreciate some input from the community here.
> > > >
> > > > In GORA we maintain the concept of modeling object structure through
> > > > use of a JSON (Avro) schema similar to what we see within the Arrow
> > > > schema design as documented at
> > > > http://arrow.apache.org/docs/metadata.html#schemas.
> > > > This is encouraging and although I am still learning about Arrow it
> > > > seems that it would be a very nice fit for GORA to consider leveraging
> > > > as a common data format/layer.
> > > >
> > > > A few questions I have are as follows:
> > > > 1) Is Avro supported as an Arrow system?
> > > > 2) If so, is there any mechanism(s) to transition/copy data which has
> > > > been written using Gora (with Avro as the underlying data format) to
> > > > Arrow memory format? I see there is mention of "Projects can share
> > > > functionality (eg, Parquet-to-Arrow reader)", is such functionality
> > > > available for Avro?
> > > >
> > > > Let's start with the above, I can take it from there.
> > > > Thanks in advance for any heads up folks,
> > > > Lewis
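
As a rough illustration of the "get flat types working first" advice in the thread, here is a minimal sketch of pivoting row-oriented records into a single nullable int column. It assumes quite a lot: plain Java arrays stand in for Arrow buffers, records are modeled as `Map<String, Object>` rather than Avro records, and the class and method names (`FlatIntColumnSketch`, `fromRecords`) are hypothetical. No Arrow or Avro API is used, so treat it as a picture of the layout (a validity bitmap plus a fixed-width value buffer), not as a converter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: one flat, nullable int column built from row records.
 * Plain arrays stand in for Arrow buffers; this is not the Arrow Java API.
 */
final class FlatIntColumnSketch {
    final byte[] validity; // bit i is set when row i holds a value
    final int[] values;    // slot i is meaningful only when bit i is set
    final int valueCount;  // number of rows, including nulls

    private FlatIntColumnSketch(byte[] validity, int[] values, int valueCount) {
        this.validity = validity;
        this.values = values;
        this.valueCount = valueCount;
    }

    /** Pivots the named field of every record into a columnar layout. */
    static FlatIntColumnSketch fromRecords(List<Map<String, Object>> records, String field) {
        int n = records.size();
        byte[] validity = new byte[(n + 7) / 8]; // one bit per row, rounded up to bytes
        int[] values = new int[n];
        for (int i = 0; i < n; i++) {
            Object v = records.get(i).get(field);
            // Assumption: a missing field or a non-Integer value is stored as null.
            if (v instanceof Integer) {
                values[i] = (Integer) v;
                validity[i >> 3] |= (byte) (1 << (i & 7)); // least-significant-bit ordering
            }
        }
        return new FlatIntColumnSketch(validity, values, n);
    }

    boolean isNull(int i) {
        return (validity[i >> 3] & (1 << (i & 7))) == 0;
    }

    int get(int i) {
        if (isNull(i)) {
            throw new IllegalStateException("row " + i + " is null");
        }
        return values[i];
    }
}
```

A real converter would presumably write into Arrow vectors and read from Avro records instead, and would need one such path per flat type before moving on to complex types through something like the ComplexWriter mentioned above.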