I have updated the PR with a Java implementation for StaticDataTask. As noted in the updated PR description, we can add support for other task types incrementally to keep the PRs relatively small.
[3] https://github.com/apache/iceberg/pull/9728

On Wed, Feb 14, 2024 at 4:24 PM Jack Ye <yezhao...@gmail.com> wrote:

> Nice, just saw that.
>
> We are adding the definitions as a part of
> https://github.com/apache/iceberg/pull/9695, we can help review the PRs
> listed here and then update the OpenAPI spec accordingly.
>
> -Jack
>
> On Wed, Feb 14, 2024 at 4:12 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> @Ryan, the JSON serialization is also used by Flink for checkpoint state,
>> so it is not purely a REST API thing.
>>
>> @Jack, Ryan also had the same suggestion in the PR comment. I have
>> updated the naming.
>>
>> On Wed, Feb 14, 2024 at 4:08 PM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> > It would fail if the FileScanTask is some other implementation (like StaticDataTask).
>>> Actually we faced exactly the same issue, and we have an internal patch
>>> to fix the parser for that. +1 for the proposal.
>>>
>>> For the type names, can we come up with a different name from
>>> "base-file-task"? "base" is very Java abstract class specific. In fact,
>>> the StaticDataTask is not really scanning a file anyway, maybe we
>>> should just call these like file-scan-task, data-task, etc.?
>>>
>>> Best,
>>> Jack Ye
>>>
>>>
>>> On Wed, Feb 14, 2024 at 4:01 PM Ryan Blue <b...@tabular.io> wrote:
>>>
>>>> Thanks, Steven! Looks like the right direction to add other task types
>>>> with their own serialization.
>>>>
>>>> I hadn't realized that these were in the table spec and not just the
>>>> REST spec. What do you think about keeping JSON serialization that isn't
>>>> part of table metadata in the REST spec? I'm actually pretty happy with
>>>> OpenAPI for defining our JSON structures, so I think this would be easier
>>>> in the REST spec. I would also consider an OpenAPI extension to the table
>>>> spec for JSON objects since it is pretty easy to work with and does a good
>>>> job defining the metadata.
>>>>
>>>> Ryan
>>>>
>>>> On Wed, Feb 14, 2024 at 3:48 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> The first linked reference is the PR for spec update.
>>>>>
>>>>> [3] https://github.com/apache/iceberg/pull/9728
>>>>>
>>>>> On Wed, Feb 14, 2024 at 3:36 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>
>>>>>> We just ran out of time and didn't get a chance to discuss this in
>>>>>> the community sync meeting today. Hence, I am raising the discussion
>>>>>> here.
>>>>>>
>>>>>> We added JSON parsers for content file and file scan task a year ago
>>>>>> [1]. Recently, I realized the implementation only handles
>>>>>> BaseFileScanTask. It would fail if the FileScanTask is some other
>>>>>> implementation (like StaticDataTask).
>>>>>>
>>>>>> Eduard, Anton, and I have been discussing a solution in issue-9597
>>>>>> [2]. We reached a consensus that we need to define a new `task-type` enum
>>>>>> field to indicate the implementation class/type [3]. For backward
>>>>>> compatibility, the lack of this new `task-type` field should be
>>>>>> interpreted as `base-file-task`.
>>>>>>
>>>>>> Since this is a spec change, Anton suggested more visibility. Hence I
>>>>>> am starting this discussion thread.
>>>>>>
>>>>>> [1] https://github.com/apache/iceberg/pull/6934
>>>>>> [2] https://github.com/apache/iceberg/issues/9597
>>>>>> [3] https://github.com/apache/iceberg/pull/9728
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>
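
For readers following the thread, the `task-type` dispatch proposed above could look roughly like the sketch below. It is an illustration only, not the code in the PR: the parser functions and the returned dictionaries are hypothetical stand-ins, `base-file-task` is the default named in the original proposal, and `data-task` is one of the names suggested in the discussion, so the names that end up in the spec may differ.

```python
import json

# Default named in the original proposal; the final spec name may differ.
DEFAULT_TASK_TYPE = "base-file-task"


def parse_file_scan_task(node):
    # Hypothetical stand-in for the existing file scan task parser.
    return {"kind": "file-scan", "fields": node}


def parse_data_task(node):
    # Hypothetical stand-in for a StaticDataTask parser.
    return {"kind": "data", "fields": node}


# "data-task" is a name suggested in the thread, used here as an assumption.
PARSERS = {
    DEFAULT_TASK_TYPE: parse_file_scan_task,
    "data-task": parse_data_task,
}


def parse_task(text):
    node = json.loads(text)
    # A missing task-type is read as the legacy type for backward compatibility.
    task_type = node.get("task-type", DEFAULT_TASK_TYPE)
    parser = PARSERS.get(task_type)
    if parser is None:
        raise ValueError(f"Unsupported task type: {task_type}")
    return parser(node)
```

With this shape, JSON written before the field existed still parses as a file scan task, and each new task type only needs one more entry in the lookup table, which fits the plan to add task types incrementally.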