Re: Planned Support for ORC Dataset?

Wes McKinney Thu, 12 Dec 2019 14:49:37 -0800

FWIW, the incremental effort of adding new data formats to the C++
Datasets API should be relatively low. I think we should even document,
in broad terms, how users can define their own data sources or file
formats.


On Wed, Dec 11, 2019 at 4:19 PM Neal Richardson
<[email protected]> wrote:
>
> Hi William,
> ORC is part of the C++ Datasets grand vision: see
> https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit#heading=h.22aikbvt54fv.
> That said, I don't think anyone in the Arrow community is currently
> prioritizing work on ORC, and we'd welcome contributions in that area.
>
> For a view of what open issues we have for ORC (at least for C++), see
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20in%20(%22C%2B%2B%22%2C%20%22C%2B%2B%20-%20Dataset%22)%20AND%20text%20~%20ORC%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC,
> though that's surely not an exhaustive list of ORC-related features one
> could want.
>
> Neal
>
> On Wed, Dec 11, 2019 at 12:49 PM William Callaghan <[email protected]>
> wrote:
>
> > Hi there,
> >
> > Not sure if this is the appropriate place, but I did some searching
> > and could not find anything regarding support for ORC datasets. I see
> > that Parquet datasets are supported (where a dataset can contain multiple
> > Parquet files), but I do not see this for ORC (only the ability to read a
> > single ORC file, not multiple or nested ORC files, i.e. a directory with
> > subdirectories (indices) with corresponding ORC files underneath).
> >
> > I'm wondering, does Arrow currently have support for nested ORC structures?
> > If not, is this planned?
> >
> > Thank you.
> > Regards,
> > William
> >
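
For readers who need a stopgap for the layout William describes (a directory with subdirectories, each holding ORC files), a minimal sketch follows. It is not the Datasets API and not anything proposed in this thread. The helper names `find_orc_files` and `read_orc_tree` are hypothetical, and the sketch assumes every ORC file ends in `.orc`. The reader is passed in as a callable, so the file discovery itself uses only the standard library.

```python
import os
from typing import Callable, List, TypeVar

T = TypeVar("T")


def find_orc_files(root: str) -> List[str]:
    """Return the path of every *.orc file under root, sorted for a stable order.

    Assumption: ORC files are identified by their ".orc" extension only.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".orc"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def read_orc_tree(root: str, read_one: Callable[[str], T]) -> List[T]:
    """Apply a caller-supplied single-file reader to each ORC file under root."""
    return [read_one(path) for path in find_orc_files(root)]
```

With pyarrow installed, `read_one` could be `lambda p: pyarrow.orc.ORCFile(p).read()`, which reads a single ORC file into a table, and the resulting list could be combined with `pyarrow.concat_tables`, assuming all files share one schema. This reads everything eagerly and ignores any partition information encoded in the directory names.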
