@Wes, Uwe: Thank you!
@Brian: no procedure required :)
Thanks for your feedback. We're happy to hear more about SAS integration.
Feel free to send a blurb to the list.

On Tue, Sep 6, 2016 at 9:51 PM, Uwe Korn <uw...@xhochy.com> wrote:

> Hello,
>
> I'm also in favour of switching the dependency direction between Parquet
> and Arrow as this would avoid a lot of duplicate code in both projects as
> well as parquet-cpp profiting from functionality that is available in Arrow.
>
> @wesm: go ahead with the JIRAs and I'll add comments or will pick some of
> them up.
>
> Cheers
>
> Uwe
>
> On 07.09.16 04:41, Wes McKinney wrote:
>
>> hi Julien,
>>
>> It makes sense to move the Parquet support for Arrow into Parquet
>> itself and invert the dependency. I had thought that the coupling to
>> Arrow C++'s IO subsystem might be tighter, but the connection between
>> memory allocators and file abstractions is fairly simple:
>>
>> https://github.com/apache/arrow/blob/master/cpp/src/arrow/parquet/io.h
>>
>> I'll open appropriate JIRAs and Uwe and I can coordinate on the
>> refactoring.
>>
>> The exposure of the Parquet functionality in Python should stay inside
>> Arrow for now, but mainly because it would make developing the Python
>> side of things much more difficult if we split things up right now.
>>
>> - Wes
>>
>> On Tue, Sep 6, 2016 at 8:27 PM, Brian Bowman <brian.bow...@sas.com> wrote:
>>
>>> Forgive me if interposing my first post for the Apache Arrow project on
>>> this thread is incorrect procedure.
>>>
>>> What Julien proposes with each storage layer producing Arrow Record
>>> Batches is exactly how I envision it working and would certainly make Arrow
>>> integration with SAS much more palatable. This is likely true for other
>>> storage layer providers as well.
>>>
>>> Brian Bowman (SAS)
>>>
>>> On Sep 6, 2016, at 7:52 PM, Julien Le Dem <jul...@dremio.com> wrote:
>>>
>>>> Thanks Wes,
>>>> No worries, I know you are on top of those things.
>>>> On a side note, I was wondering if the arrow-parquet integration should be
>>>> in Parquet instead.
>>>> Parquet would depend on Arrow and not the other way around.
>>>> Arrow provides the API and each storage layer (Parquet, Kudu, Cassandra,
>>>> ...) provides a way to produce Arrow Record Batches.
>>>> thoughts?
>>>>
>>>> On Tue, Sep 6, 2016 at 3:37 PM, Wes McKinney <wesmck...@gmail.com> wrote:
>>>>
>>>>> hi Julien,
>>>>>
>>>>> I'm very sorry about the inconvenience with this and the delay in
>>>>> getting it sorted out. I will triage this evening by disabling the
>>>>> Parquet tests in Arrow until we get the current problems under
>>>>> control. When we re-enable the Parquet tests in Travis CI I agree we
>>>>> should pin the version SHA.
>>>>>
>>>>> - Wes
>>>>>
>>>>> On Tue, Sep 6, 2016 at 5:30 PM, Julien Le Dem <jul...@dremio.com> wrote:
>>>>>
>>>>>> The Arrow cpp travis-ci build is broken right now because it depends on
>>>>>> parquet-cpp which has changed in an incompatible way. [1] [2] (or so it
>>>>>> looks to me)
>>>>>> Since parquet-cpp is not released yet it is totally fine to make
>>>>>> incompatible API changes.
>>>>>> However, we may want to pin the Arrow to Parquet dependency (on a git sha?)
>>>>>> to prevent cross project changes from breaking the master build.
>>>>>> Since I'm not one of the core cpp dev on those projects I mainly want to
>>>>>> start that conversation rather than prescribe a solution. Feel free to take
>>>>>> this as a straw man and suggest something else.
>>>>>>
>>>>>> [1] https://travis-ci.org/apache/arrow/jobs/156080555
>>>>>> [2] https://github.com/apache/arrow/blob/2d8ec789365f3c0f82b1f22d76160d5af150dd31/ci/travis_before_script_cpp.sh
>>>>>>
>>>>>> --
>>>>>> Julien
>>>>
>>>> --
>>>> Julien

--
Julien