Thanks. This work has been pushed back a bit because I need to get existing PRs into better shape. Hopefully 1.9 will have been released by the time I pick it up (otherwise I would lean towards forking for the time being as well).

On Tue, Mar 12, 2019 at 3:01 AM Uwe L. Korn <uw...@xhochy.com> wrote:

> Hello Micah,
>
> > Uwe, I'm not sure I understand what type of support/help you are thinking
> > of. Could you elaborate a little bit more before I reach out?
>
> I would help them with the same build system improvement we have done in
> the recent time (and are currently) doing in Arrow for C++. Nothing I
> would explicitly advertise on their ML, only as a heads up if there are
> Avro people on the Arrow ML. I cannot give any commitments on this but this
> would be probably some high gain for not so much work if that is currently
> hindering releases or adoption.
>
> Uwe
>
> > -Micah
> >
> > On Tue, Mar 5, 2019 at 4:53 PM Wes McKinney <wesmck...@gmail.com> wrote:
> >
> > > I am OK with that, but if we find ourselves making compromises that
> > > affect performance or memory efficiency (where possibly invasive
> > > refactoring may be required) perhaps we should reconsider option #3.
> > >
> > > On Tue, Mar 5, 2019 at 11:29 AM Uwe L. Korn <uw...@xhochy.com> wrote:
> > >
> > > > I'm leaning a bit towards 1) but I would love to get some input from
> > > > the Avro community as 1) depends also on their side as we will submit
> > > > some patches upstream that need to be reviewed and someday also
> > > > released.
> > > >
> > > > Are AVRO committers subscribed here or should we reach out to them on
> > > > their ML? Given that we are quite active in the C++ space currently, I
> > > > feel that we can contribute quite some infrastructure in building and
> > > > packaging that we do eitherway for Arrow. This might be quite helpful
> > > > for a project. We have seen with Parquet where much of the development
> > > > is just happening as it is part of Arrow. (Not suggesting to merge/fork
> > > > the Avro codebase but just to apply some of the best practices we
> > > > learned while building Arrow).
> > > >
> > > > Uwe
> > > >
> > > > On Tue, Mar 5, 2019, at 4:57 PM, Wes McKinney wrote:
> > > > > I'd be +0.5 in favor of forking in this particular case. Since Avro
> > > > > is not vectorized (unlike Parquet and ORC) I suspect it may be more
> > > > > difficult to get the best performance using a general purpose API
> > > > > versus one that is more specialized to producing Arrow record
> > > > > batches. Given that has been relatively light C++ development
> > > > > activity in Apache Avro and no releases for 2 years it does give me
> > > > > pause.
> > > > >
> > > > > We might want to look at Impala's Avro scanner, they are doing some
> > > > > LLVM IR cross-compilation also (they're using the Avro C++ library
> > > > > though)
> > > > >
> > > > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner-ir.cc
> > > > > https://github.com/apache/impala/blob/master/be/src/exec/hdfs-avro-scanner.cc
> > > > >
> > > > > On Tue, Mar 5, 2019 at 1:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > > >
> > > > > > I'm looking at incorporating Avro in Arrow C++ [1]. It seems that
> > > > > > the Avro C++ library APIs have improved from the last release.
> > > > > > However, it is not clear when a new release will be available (I
> > > > > > asked on the JIRA Item for the next release [2] and received no
> > > > > > response).
> > > > > >
> > > > > > I was wondering if there is a policy governing using other Apache
> > > > > > projects or how people felt about the following options:
> > > > > > 1. Depend on a specific git commit through the third-party library
> > > > > > system.
> > > > > > 2. Copy the necessary source code temporarily to our project, and
> > > > > > change to using the next release when it is available.
> > > > > > 3. Fork the code we need (the main benefit I see here is being able
> > > > > > to refactor it to avoid having to deal with exceptions, easier
> > > > > > integration with our IO system and one less 3rd party dependency to
> > > > > > deal with).
> > > > > > 4. Wait on the 1.9 release before proceeding.
> > > > > >
> > > > > > Thanks,
> > > > > > Micah
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/ARROW-1209
> > > > > > [2] https://issues.apache.org/jira/browse/AVRO-2250