> It would fail if the FileScanTask is some other implementation (like
StaticDataTask).
Actually we faced exactly the same issue, and we have an internal patch to
fix the parser for that. +1 for the proposal.

For the type names, can we come up with a different name from "
base-file-task"? "base" is very Java abstract class specific. In fact,
the StaticDataTask
is not really scanning a file anyway, maybe we should just call these like
file-scan-task, data-task, etc.?

Best,
Jack Ye


On Wed, Feb 14, 2024 at 4:01 PM Ryan Blue <b...@tabular.io> wrote:

> Thanks, Steven! Looks like the right direction to add other task types
> with their own serialization.
>
> I hadn't realized that these were in the table spec and not just the REST
> spec. What do you think about keeping JSON serialization that isn't part of
> table metadata in the REST spec? I'm actually pretty happy with OpenAPI for
> defining our JSON structures, so I think this would be easier in the REST
> spec. I would also consider an OpenAPI extension to the table spec for JSON
> objects since it is pretty easy to work with and does a good job defining
> the metadata.
>
> Ryan
>
> On Wed, Feb 14, 2024 at 3:48 PM Steven Wu <stevenz...@gmail.com> wrote:
>
>> The first linked reference is the PR for spec update.
>>
>> [3] https://github.com/apache/iceberg/pull/9728
>>
>> On Wed, Feb 14, 2024 at 3:36 PM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> We just ran out of time and didn't get a chance to discuss this in the
>>> community sync meeting today. Hence, I am raising the discussion here.
>>>
>>> We added JSON parsers for content file and file scan task a year ago
>>> [1]. Recently, I just realized the implementation only handles
>>> BaseFileScanTask. It would fail if the FileScanTask is some other
>>> implementation (like StaticDataTask).
>>>
>>> Eduard, Anton, and I have been discussing a solution in issue-9597 [2].
>>> We reached a consensus that we need to define a new `task-type` enum field
>>> to indicate the implementation class/type [3]. For backward compatibility,
>>> the lack of this new `task-type` field should be interpreted as
>>> `base-file-task`.
>>>
>>> Since this is a spec change, Anton suggested more visibility. Hence I am
>>> starting this discussion thread.
>>>
>>> [1] https://github.com/apache/iceberg/pull/6934
>>> [2] https://github.com/apache/iceberg/issues/9597
>>> [3] https://github.com/apache/iceberg/pull/9728
>>>
>>
>
> --
> Ryan Blue
> Tabular
>

Reply via email to