I question the value of adding the ORC format. The format is fragmented:
the main tool that writes it (Hive) writes a version of the format (ACID
v2) that can't be consumed by systems that use only the ORC libraries
(since they don't support ACID). If you want to consume that data, you have
to depend on internal Hive code (which is written only in Java).

On Thu, Dec 12, 2019 at 2:49 PM Wes McKinney <wesmck...@gmail.com> wrote:

> FWIW, the incremental effort of adding new data formats to the C++
> Datasets API should be relatively low. I think we should even document
> in broad terms how users can define their own data sources or file
> formats.
>
> On Wed, Dec 11, 2019 at 4:19 PM Neal Richardson
> <neal.p.richard...@gmail.com> wrote:
> >
> > Hi William,
> > ORC is part of the C++ Datasets grand vision; see
> > https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit#heading=h.22aikbvt54fv
> > That said, I don't think anyone in the Arrow community is currently
> > prioritizing work on ORC, and we'd welcome contributions in that area.
> >
> > For a view of what open issues we have for ORC (at least for C++), see
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20in%20(%22C%2B%2B%22%2C%20%22C%2B%2B%20-%20Dataset%22)%20AND%20text%20~%20ORC%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
> > (though that's surely not an exhaustive list of ORC-related features one
> > could want).
> >
> > Neal
> >
> > On Wed, Dec 11, 2019 at 12:49 PM William Callaghan <wcal...@gmail.com>
> > wrote:
> >
> > > Hi there,
> > >
> > > Not sure if this is the appropriate place, but I had done some searching
> > > and could not find anything with regard to supporting ORC datasets. I see
> > > that Parquet datasets are supported (where a dataset could contain
> > > multiple Parquet files), but I do not see this for ORC (only the ability
> > > to read a single ORC file, not multiple files or nested ORCs, i.e., a
> > > directory with subdirectories (indices) that have corresponding ORC
> > > files underneath).
> > >
> > > I'm wondering: does Arrow currently have support for nested ORC
> > > structures? If not, is this planned?
> > >
> > > Thank you.
> > > Regards,
> > > William
> > >
>
