For the case in question, ORC can extend its shim module to do exactly
this. Let me take a look at the async ORC patch to see what needs to
happen. BTW, Steve, I'll be at Buzzwords for your talk.

It may make sense to have such a library, but ORC already has that
mechanism.

.. Owen

On Thu, Jun 1, 2023 at 2:09 PM Ayush Saxena <ayush...@gmail.com> wrote:

> That sounds good
>
> On Thu, 1 Jun 2023 at 18:09, Steve Loughran <ste...@cloudera.com.invalid>
> wrote:
>
> > hadoop-api-shim ?
> >
> > On Thu, 1 Jun 2023 at 04:07, Ayush Saxena <ayush...@gmail.com> wrote:
> >
> > > +1, for the new repo.
> > >
> > > The name sounds fine, but it would be good to have a “hadoop-”
> > > prefix; we have that for almost all of the subprojects/modules.
> > >
> > > Would hadoop-shims, hadoop-shims-api, or something along similar lines
> > > work?
> > >
> > > -Ayush
> > >
> > > > On 01-Jun-2023, at 1:18 AM, Steve Loughran <ste...@cloudera.com.invalid>
> > > > wrote:
> > > >
> > > > > I want to create a new repository for a shim library that allows
> > > > > previous releases to access the more recent hadoop filesystem APIs.
> > > > > Currently the open source implementations of parquet and ORC can't use
> > > > > vectored io, in particular, even though we can in Cloudera. Providing a
> > > > > shim opens them up to all *and* gets the APIs more broadly
> > > > > stressed/tested.
> > > > > This needs to be in its own repository, not just for rapid initial
> > > > > release, but because it is designed to be built against as old a version
> > > > > of hadoop as we can reasonably support, which IMO means hadoop 3.1.0+. I
> > > > > know parquet still wants to build against 2.8.x, but to claim support
> > > > > for hadoop 2 means "build and test on java7", which is unrealistic in
> > > > > 2023.
> > > >
> > > > > Initial WiP implementation, which works with 3.1.0 and tests against
> > > > > others:
> > > > > https://github.com/steveloughran/fs-api-shim
> > > > > The complexity is in testing this: I have contract tests which then
> > > > > need to be executed on every supported hadoop release, which will need
> > > > > a separate module for each one.
> > > >
> > > > > I can create the repo easily enough; I would just like approval. And
> > > > > is the name OK?
> > > >
> > > > steve
> > >
> >
>
