A RESTCatalog server that offers server-side scan planning does not prohibit 
a client that both reads and writes from using that API. So why shouldn't 
server-side scan planning return the same result as client-side scan planning, 
given the same input parameters? If server-side scan planning performs better 
than client-side scan planning, clients could take advantage of it. Your 
thoughts?

Thanks,
Limin

From: Ryan Blue <rdb...@gmail.com>
Reply-To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>
Date: Tuesday, August 12, 2025 at 6:45 PM
To: "dev@iceberg.apache.org" <dev@iceberg.apache.org>
Subject: Re: [QUESTION] Rest catalog TableScan API response's data_file has no 
'metadataLocation'

The spec doesn't keep track of the manifest or position in the manifest where a 
data file is stored because that information is determined by writing a data 
file into a manifest. It's useful to keep track of that information for writes, 
but it isn't necessarily correct by the time the write commits so it is a hint 
only.
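
To illustrate what "a hint only" means in practice, here is a minimal sketch in Python. It is not Iceberg's implementation: the function name, the dictionary layout, and the file names are all hypothetical. The idea is that a recorded manifest location is tried first, and a full search is the fallback when the hint is missing or stale.

```python
from typing import Dict, Optional, Set


def find_manifest(file_path: str,
                  hint: Optional[str],
                  manifests: Dict[str, Set[str]]) -> Optional[str]:
    """Return the manifest location that contains file_path, or None.

    manifests is a hypothetical mapping of manifest location to the set
    of data file paths that manifest currently lists.
    """
    # Try the hinted manifest first; the hint may be absent or out of date.
    if hint is not None and file_path in manifests.get(hint, set()):
        return hint
    # Fall back to checking every manifest.
    for location, files in manifests.items():
        if file_path in files:
            return location
    return None


# Made-up example data.
manifests = {"m1.avro": {"a.parquet"}, "m2.avro": {"b.parquet"}}
good_hint = find_manifest("b.parquet", "m2.avro", manifests)
stale_hint = find_manifest("b.parquet", "m1.avro", manifests)
no_hint = find_manifest("b.parquet", None, manifests)
```

All three lookups resolve to the same manifest; the hint only changes how much work is needed to get there, which is why its absence affects efficiency rather than correctness.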

For server-side scan planning, I'm not sure that it makes sense to add this to 
the REST protocol. It seems odd to me that a client that can read and write 
metadata would use server-side planning. I typically think of the use case for 
server-side scan planning as primarily supporting cases where the client is 
either incapable of planning itself (like a simple client in a new language) or 
not allowed to read metadata files for security reasons. It is possible that 
server-side planning could be used in other cases, but I wouldn't expect it to 
be used by writers. And if it were, I think it's fine that the hints are not 
present.

Ryan

On Tue, Aug 12, 2025 at 11:46 AM Ma, Limin <l...@akamai.com.invalid> wrote:
Hi All,

/v1/{prefix}/namespaces/{namespace}/tables/{table}/plan
Response:
{
  "file-scan-tasks": [
    {
      "data-file": {
        "file-path": "string",
        ...
      }
    }
  ]
}
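
As a rough illustration of what a client sees, the following Python sketch parses a response shaped like the one above and collects the data file paths. The sample body and its path are made up for the example, and only the "file-scan-tasks", "data-file", and "file-path" keys shown above are assumed.

```python
import json
from typing import List

# Hypothetical response body; only the keys shown in the example above
# are assumed, and the path is invented for illustration.
response_body = """
{
  "file-scan-tasks": [
    {"data-file": {"file-path": "s3://bucket/table/data/00000-0.parquet"}}
  ]
}
"""


def data_file_paths(body: str) -> List[str]:
    """Return the file-path of every data file in a plan response."""
    plan = json.loads(body)
    return [task["data-file"]["file-path"]
            for task in plan.get("file-scan-tasks", [])]


paths = data_file_paths(response_body)
print(paths)
```

Nothing in this shape tells the client which manifest a data file came from, which is the gap the question below is about.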



The spec does not indicate that "data-file" has a 'metadataLocation' property. 
But with a plain local Iceberg TableScan, DataFile (extends ContentFile) 
objects have 'metadataLocation' populated, which MergingSnapshotProducer's 
filterManager uses to efficiently identify the relevant manifest files for 
filtering.

Is there a reason the REST spec does not support that, or will it be 
considered for support in future versions?

Thanks,
Limin
