Re: What's the time to expose iceberg format v2 to end users ?

Ryan Blue Tue, 22 Dec 2020 17:06:04 -0800

Thanks, Yan!

To summarize that doc a bit, the main blockers are:
* Finish updating the spec for NaN counters and behavior
* Fix the issue with partition transforms and values before 1970 (#1680)
* Partition evolution: Add lastPartitionFieldId to table metadata and
update docs
* Add order id column to manifest files
* Track the schema of each snapshot


Only the last one is a somewhat large task, but even that should be fairly
quick. I think we can take care of all of these in the first couple of months
of 2021, after the 0.11.0 release is out.
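On the pre-1970 item (#1680): the sketch below does not reproduce that issue. It only illustrates, as an assumption about where such bugs usually come from, a common rounding pitfall in epoch-based transforms. Iceberg's `day` transform is defined as days since 1970-01-01. Integer division that truncates toward zero (as `/` does in Java and C) maps a timestamp in the last day before the epoch to day 0 instead of day -1. `to_micros`, `day_truncating` and `day_floor` are hypothetical names made up for this example.

```python
from datetime import datetime, timedelta, timezone

MICROS_PER_DAY = 86_400_000_000  # 24 * 60 * 60 * 1_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_micros(ts: datetime) -> int:
    """Microseconds since 1970-01-01T00:00:00Z (negative before the epoch)."""
    return (ts - EPOCH) // timedelta(microseconds=1)

def day_truncating(micros: int) -> int:
    """Emulates C/Java-style integer '/', which rounds toward zero."""
    q = abs(micros) // MICROS_PER_DAY
    return q if micros >= 0 else -q

def day_floor(micros: int) -> int:
    """Python '//' rounds toward negative infinity, so pre-1970 values land on the right day."""
    return micros // MICROS_PER_DAY

ts = datetime(1969, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
m = to_micros(ts)           # -1_000_000
print(day_truncating(m))    # 0  -> 1970-01-01, the wrong day
print(day_floor(m))         # -1 -> 1969-12-31, the right day
```

The two functions agree for every non-negative input and at exact day boundaries, which is why this kind of bug tends to go unnoticed until pre-1970 data shows up. In Java, `Math.floorDiv` gives the flooring behavior.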

On Fri, Dec 18, 2020 at 12:59 AM OpenInx <[email protected]> wrote:

> Thanks, Yan, for the document. I will take a look at it and see what I
> can do.
>
> On Fri, Dec 18, 2020 at 3:38 AM Yan Yan <[email protected]> wrote:
>
>> Hi OpenInx,
>>
>> Thanks for bringing this up. I am currently working on the format v2
>> blocking tasks. After speaking with Ryan a while ago, I have been
>> maintaining a full list of them, with their descriptions and current
>> status, here
>> <https://docs.google.com/document/d/1FyLJyvzcZbfbjwDMEZd6Dj-LYCfrzK1zC-Bkb3OiICc/edit?usp=sharing>.
>> It covers all open issues listed in the GitHub milestone
>> <https://github.com/apache/iceberg/milestone/7> plus some others brought
>> up during community syncs. It would be great if you are interested in
>> collaborating or code reviewing!
>>
>> Everyone, please feel free to let me know, or update the doc, if you see
>> any item that is missing or described inaccurately.
>>
>> Thanks,
>> Yan
>>
>> On Wed, Dec 16, 2020 at 11:03 PM OpenInx <[email protected]> wrote:
>>
>>> Hi
>>>
>>> I'm writing to align with the community on when to expose format v2 to
>>> end users.
>>>
>>> In Iceberg format v2, we've implemented row-level deletes. They are
>>> designed for two use cases:
>>>
>>> 1. Executing a single query that updates or deletes many rows. This is a
>>> typical batch update/delete job, suitable for GDPR requests or for
>>> correcting wrong data.
>>> 2. Writing a real-time CDC/UPSERT stream into an Iceberg table, so that
>>> upper-layer compute engines can analyze the change log within minutes.
>>> Support for this in the Flink integration is almost ready in the current
>>> master branch.
>>>
>>>
>>> I'm not quite sure what the remaining blockers for Iceberg format v2
>>> are. I'd love to resolve them if there are any.
>>>
>>> Thanks.
>>>
>>
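For readers less familiar with how the row-level deletes in the CDC/UPSERT use case behave, here is a simplified, hypothetical in-memory model of equality deletes. It assumes the v2 spec's rule, as I understand it, that an equality delete applies only to data rows committed with a strictly smaller sequence number. An upsert can then be written in one commit as an equality delete for the key plus an insert of the new row. The delete hides the older version, while the new row, which shares the delete's sequence number, survives. `Row`, `EqualityDelete`, `live_rows` and `upsert` are illustrative names, not Iceberg APIs. Real readers also handle position deletes, partitions and file-level filtering.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Row:
    key: int     # hypothetical primary-key column
    value: str
    seq: int     # sequence number of the commit that added the row

@dataclass(frozen=True)
class EqualityDelete:
    key: int     # deletes rows whose key equals this value...
    seq: int     # ...but only rows committed with a strictly smaller seq

def live_rows(rows: List[Row], deletes: List[EqualityDelete]) -> List[Row]:
    """Return the rows not removed by any equality delete (strictly-less rule)."""
    return [
        r for r in rows
        if not any(d.key == r.key and r.seq < d.seq for d in deletes)
    ]

def upsert(rows: List[Row], deletes: List[EqualityDelete],
           key: int, value: str, seq: int) -> None:
    """Model an upsert as one commit: an equality delete plus an insert at the same seq."""
    deletes.append(EqualityDelete(key, seq))
    rows.append(Row(key, value, seq))

rows: List[Row] = [Row(1, "a", seq=1), Row(2, "b", seq=1)]
deletes: List[EqualityDelete] = []
upsert(rows, deletes, key=1, value="a2", seq=2)
print(live_rows(rows, deletes))
# [Row(key=2, value='b', seq=1), Row(key=1, value='a2', seq=2)]
```

Using "strictly less" rather than "less than or equal" is what lets the delete and the insert share one commit without the new row deleting itself.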

-- 
Ryan Blue
Software Engineer
Netflix
