I have updated the PR with Java implementation for StaticDataTask. As I
updated the PR description, we can add support for other task types
incrementally to keep the PRs relatively small.

[3] https://github.com/apache/iceberg/pull/9728

On Wed, Feb 14, 2024 at 4:24 PM Jack Ye <yezhao...@gmail.com> wrote:

> Nice, just saw that.
>
> We are adding the definitions as a part of ttps://
> github.com/apache/iceberg/pull/9695, we can help review the PRs listed
> here and then update the OpenAPI spec accordingly.
>
> -Jack
>
> On Wed, Feb 14, 2024 at 4:12 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> @Ryan, the JSON serialization is also used by Flink for checkpoint state.
>> so it is not purely a REST API thing.
>>
>> @Jack, Ryan also had the same suggestion in the PR comment. I have
>> updated the naming
>>
>> On Wed, Feb 14, 2024 at 4:08 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> > It would fail if the FileScanTask is some other implementation (like
>>> StaticDataTask).
>>> Actually we faced exactly the same issue, and we have an internal patch
>>> to fix the parser for that. +1 for the proposal.
>>>
>>> For the type names, can we come up with a different name from "
>>> base-file-task"? "base" is very Java abstract class specific. In fact,
>>> the StaticDataTask is not really scanning a file anyway, maybe we
>>> should just call these like file-scan-task, data-task, etc.?
>>>
>>> Best,
>>> Jack Ye
>>>
>>>
>>> On Wed, Feb 14, 2024 at 4:01 PM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> Thanks, Steven! Looks like the right direction to add other task types
>>>> with their own serialization.
>>>>
>>>> I hadn't realized that these were in the table spec and not just the
>>>> REST spec. What do you think about keeping JSON serialization that isn't
>>>> part of table metadata in the REST spec? I'm actually pretty happy with
>>>> OpenAPI for defining our JSON structures, so I think this would be easier
>>>> in the REST spec. I would also consider an OpenAPI extension to the table
>>>> spec for JSON objects since it is pretty easy to work with and does a good
>>>> job defining the metadata.
>>>>
>>>> Ryan
>>>>
>>>> On Wed, Feb 14, 2024 at 3:48 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> The first linked reference is the PR for spec update.
>>>>>
>>>>> [3] https://github.com/apache/iceberg/pull/9728
>>>>>
>>>>> On Wed, Feb 14, 2024 at 3:36 PM Steven Wu <stevenz...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> We just ran out of time and didn't get a chance to discuss this in
>>>>>> the community sync meeting today. Hence, I am raising the discussion 
>>>>>> here.
>>>>>>
>>>>>> We added JSON parsers for content file and file scan task a year ago
>>>>>> [1]. Recently, I just realized the implementation only handles
>>>>>> BaseFileScanTask. It would fail if the FileScanTask is some other
>>>>>> implementation (like StaticDataTask).
>>>>>>
>>>>>> Eduard, Anton, and I have been discussing a solution in issue-9597
>>>>>> [2]. We reached a consensus that we need to define a new `task-type` enum
>>>>>> field to indicate the implementation class/type [3]. For backward
>>>>>> compatibility, the lack of this new `task-type` field should be
>>>>>> interpreted as `base-file-task`.
>>>>>>
>>>>>> Since this is a spec change, Anton suggested more visibility. Hence I
>>>>>> am starting this discussion thread.
>>>>>>
>>>>>> [1] https://github.com/apache/iceberg/pull/6934
>>>>>> [2] https://github.com/apache/iceberg/issues/9597
>>>>>> [3] https://github.com/apache/iceberg/pull/9728
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>

Reply via email to