Re: support of RCFile

2021-09-29 Thread yuan youjun
That’s exactly what we need. > On Sep 30, 2021, at 9:58 AM, Jacques Nadeau wrote: > > I actually wonder if file formats should be an extension API so someone can > implement a file format but do it without any changes in Iceberg core (I don't > think this is possible today). Let's say one wanted to create a pr

Re: [DISCUSS] Spark version support strategy

2021-09-29 Thread Steven Wu
Wing, sorry, my earlier message probably misled you. I was speaking my personal opinion on Flink version support. On Tue, Sep 28, 2021 at 8:03 PM Wing Yew Poon wrote: > Hi OpenInx, > I'm sorry I misunderstood the thinking of the Flink community. Thanks for > the clarification. > - Wing Yew > > >

Re: support of RCFile

2021-09-29 Thread Jacques Nadeau
I actually wonder if file formats should be an extension API so someone can implement a file format but do it without any changes in Iceberg core (I don't think this is possible today). Let's say one wanted to create a proprietary format but use Iceberg semantics (not me). Could we make it such that o

Re: support of RCFile

2021-09-29 Thread yuan youjun
Hi Ryan and Russell, thanks very much for your response. Well, I want the ACID and row-level update capabilities that Iceberg provides. I believe a data lake is a better way to manage our datasets than Hive. I also want our transition from Hive to a data lake to be as smooth as possible, which means: 1

Re: support of RCFile

2021-09-29 Thread Ryan Blue
Youjun, what are you trying to do? If you have existing tables in an incompatible format, you may just want to leave them as they are for historical data. It depends on why you want to use Iceberg. If you want to be able to query larger ranges of that data because you've clustered across files by

Re: support of RCFile

2021-09-29 Thread Russell Spitzer
Within Iceberg it would take a bit of effort; we would need custom readers at a minimum if we just wanted read-only support. I think the main complexity would be designing the specific readers for the platform you want to use like "Spark" or "Flink", the actual metadata handling and suc

Re: support of RCFile

2021-09-29 Thread 袁尤军
Thanks for the suggestion. We need to evaluate the cost of converting the format, as those Hive tables have been there for many years, so PB of data would need to be reformatted. Also, do you think it is possible to develop support for a new format? How costly is it? Sent from my iPhone > On Sep 29, 2021, at 9:34 PM, Russe

Re: support of RCFile

2021-09-29 Thread Russell Spitzer
There is no plan that I am aware of to use RCFiles directly in Iceberg. While we could work to support other file formats, I don't think RCFile is very widely used compared to ORC and Parquet (Iceberg has native support for these formats). My suggestion for conversion would be to do a CTAS statement in Sp

support of RCFile

2021-09-29 Thread yuan youjun
Hi community, I am exploring ways to evolve existing Hive tables (RCFile) into a data lake. However, I found out that Iceberg (or Hudi, Delta Lake) does not support RCFile. So my questions are: 1. Is there any plan (or is it possible) to support RCFile in the future? So we can manage those exist