Within Iceberg it would take a bit of effort; we would need custom readers
at a minimum, even if we only wanted read-only support. I think the main
complexity would be designing the specific readers for the platforms you
want to use, like Spark or Flink; the actual metadata handling would
probably be pretty straightforward. I would size it as at least a
several-week project, and I'm not sure we would want to support it in
OSS Iceberg.
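
To make that a bit more concrete, here is a purely hypothetical sketch
(these are not existing Iceberg extension points; the names are made up
for illustration). Roughly, something has to turn each RCFile data file
into rows matching the table schema, and each engine integration (Spark,
Flink, ...) would then wrap that in its own reader, while the manifest
and metadata handling stays as it is today:

    import java.io.Closeable;
    import java.util.Iterator;

    // Hypothetical only -- not an actual Iceberg interface. It marks the
    // per-engine piece of work: produce rows from one RCFile data file.
    interface RcFileRowReader extends Closeable, Iterator<Object[]> {
      // Object[] stands in for the engine-specific row type
      // (InternalRow in Spark, RowData in Flink, ...).
    }

    // Hypothetical factory an engine integration would call while planning
    // a scan, e.g. opening the file with Hive's RCFile reader and applying
    // the column projection. Manifest and metadata handling are untouched.
    interface RcFileReaderFactory {
      RcFileRowReader open(String filePath, int[] projectedColumnIds);
    }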

On Wed, Sep 29, 2021 at 10:40 AM 袁尤军 <wdyuanyou...@163.com> wrote:

> Thanks for the suggestion. We need to evaluate the cost of converting
> the format, as those Hive tables have been there for many years, so
> petabytes of data would need to be reformatted.
>
> Also, do you think it is possible to develop support for a new format?
> How costly would it be?
>
> Sent from my iPhone
>
> > On Sep 29, 2021, at 9:34 PM, Russell Spitzer <russell.spit...@gmail.com> wrote:
> >
> > There is no plan I am aware of for using RCFiles directly in Iceberg.
> > While we could work to support other file formats, I don't think RCFile
> > is very widely used compared to ORC and Parquet (Iceberg has native
> > support for those formats).
> >
> > My suggestion for conversion would be to do a CTAS statement in Spark
> > and have the table completely converted over to Parquet (or ORC). This
> > is probably the simplest way.
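> >
> > As a rough sketch (catalog, database, and table names below are just
> > placeholders, and this assumes Spark 3.x with the Iceberg runtime on
> > the classpath and an Iceberg catalog configured), the conversion could
> > look something like:
> >
> >     import org.apache.spark.sql.SparkSession;
> >
> >     public class RcFileToIceberg {
> >       public static void main(String[] args) {
> >         // Hive support so Spark can read the existing RCFile-backed
> >         // Hive table through the session catalog.
> >         SparkSession spark = SparkSession.builder()
> >             .appName("rcfile-to-iceberg")
> >             .enableHiveSupport()
> >             .getOrCreate();
> >
> >         // CTAS: scan the RCFile table and write a new Iceberg table
> >         // whose data files are Parquet.
> >         spark.sql(
> >             "CREATE TABLE iceberg_catalog.db.events "
> >                 + "USING iceberg "
> >                 + "TBLPROPERTIES ('write.format.default' = 'parquet') "
> >                 + "AS SELECT * FROM spark_catalog.db.events_rcfile");
> >
> >         spark.stop();
> >       }
> >     }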
> >
> >> On Sep 29, 2021, at 7:01 AM, yuan youjun <yuanyou...@gmail.com> wrote:
> >>
> >> Hi community,
> >>
> >> I am exploring ways to evolve existing Hive tables (RCFile) into a
> >> data lake. However, I found that Iceberg (and also Hudi and Delta
> >> Lake) does not support RCFile. So my questions are:
> >> 1. Is there any plan (or is it possible) to support RCFile in the
> >> future, so we can manage those existing data files without
> >> re-formatting?
> >> 2. If there is no such plan, do you have any suggestion for migrating
> >> RCFiles into Iceberg?
> >>
> >> Thanks
> >> Youjun
>
>
>
