Re: [DISCUSS] REST: Scan Planning mode

Russell Spitzer Wed, 28 Jan 2026 07:55:06 -0800

>
> Prior to the introduction of CATALOG_ONLY tables, reading a table
> implicitly required that the full table metadata be accessible to readers.
> This made it possible to migrate a table between catalog implementations by
> simply pointing */v1/{prefix}/namespaces/{namespace}/register* to the
> existing metadata.json, assuming the appropriate user privileges were in
> place.



This actually hasn’t been the case for quite a while across several vendors
(though not the one I work at; we still expose full metadata). Nothing
prevents a catalog from shipping Iceberg metadata that does not strictly
represent the table, and several vendors already do so. Properties,
snapshots, or even the table itself can redirect to another representation
of the same table, leaving no way to recover a true “ground truth” view via
the REST API. I’m also aware of folks shipping different versions of the
metadata or exposing what is essentially a read-only metadata.json layered
on top of a table in another format. So I think the ship has largely sailed
on relying on metadata as a guaranteed canonical view.

I do think it’s still important to preserve *portability*, or at least to
make it clear to end users whether or not their tables will be portable.
With that in mind, I was wondering if we should introduce an explicit
catalog export command that is essentially the inverse of register. Unlike
loadTable, it would be required to produce the path of a metadata.json that
represents the entire Iceberg table without modification.

That would give catalogs a clear way to signal whether they support
“unregistering” a table in a way that lets it be used in another system. We
could also scope permissions for this functionality so that only specific
users are allowed to perform an export.
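
To make the idea concrete, here is a minimal sketch of how an export operation could pair with register. It is only an illustration under stated assumptions: SketchCatalog, export_table, the exporters set, and both exception types are hypothetical names, and none of them exist in the REST spec or in any Iceberg library today.

```python
from dataclasses import dataclass, field


class ExportNotSupported(Exception):
    """Hypothetical: the catalog cannot produce a faithful metadata.json."""


class ExportForbidden(Exception):
    """Hypothetical: the caller lacks the export privilege."""


@dataclass
class SketchCatalog:
    # Every field here is an assumption made for illustration only.
    supports_export: bool = True
    tables: dict = field(default_factory=dict)   # identifier -> metadata.json path
    exporters: set = field(default_factory=set)  # principals allowed to export

    def register_table(self, identifier: str, metadata_location: str) -> None:
        # Stand-in for the existing register operation:
        # adopt a table from an existing metadata.json.
        self.tables[identifier] = metadata_location

    def export_table(self, identifier: str, principal: str) -> str:
        # Proposed inverse of register: return the path of a metadata.json
        # that represents the entire table without modification.
        if not self.supports_export:
            raise ExportNotSupported(identifier)
        if principal not in self.exporters:
            raise ExportForbidden(principal)
        return self.tables[identifier]


def migrate(source: SketchCatalog, target: SketchCatalog,
            identifier: str, principal: str) -> str:
    # Export from one catalog, then register the same metadata in another.
    location = source.export_table(identifier, principal)
    target.register_table(identifier, location)
    return location
```

In this sketch a catalog that cannot export raises ExportNotSupported, which is one way it could signal up front that its tables are not portable, and the exporters set stands in for whatever permission scoping a real catalog would apply.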



On Wed, Jan 28, 2026 at 5:42 AM Péter Váry <[email protected]>
wrote:

> > I am not sure about the concern for lock-in. Users are free to adopt any
> catalog that is spec compliant. Catalog-only tables are not the choice of
> the catalog vendor/provider; it is the choice of the table owner, made by
> users for access control.
>
> Prior to the introduction of CATALOG_ONLY tables, reading a table
> implicitly required that the full table metadata be accessible to readers.
> This made it possible to migrate a table between catalog implementations by
> simply pointing */v1/{prefix}/namespaces/{namespace}/register* to the
> existing metadata.json, assuming the appropriate user privileges were in
> place.
>
> With CATALOG_ONLY tables, this implicit requirement is removed, and no
> alternative requirement is introduced. As a result, migrating the complete
> history of a table may become impossible without performing a manual
> traversal of the plan(s) and metadata.
>
> What I am suggesting is that the ability to re‑register an Iceberg table
> with a different catalog should be an explicit requirement for a
> spec‑compliant catalog.
>
> > Also this proposal doesn't say that the write path shouldn't produce the
> metadata.json file, which is still required today to be spec compliant.
>
> The Iceberg table specification describes metadata.json and manifest
> files, but after this change a catalog could be fully compliant with the
> Iceberg REST Catalog specification while still not exposing these files in
> a way that is accessible to users. This would effectively prevent use cases
> such as migrating tables between catalogs.
>
>
> On Mon, Jan 26, 2026 at 8:33 PM Steven Wu <[email protected]> wrote:
>
>> Catching up on this thread.
>>
>> I am not sure about the concern for lock-in. Users are free to adopt any
>> catalog that is spec compliant. Catalog-only tables are not the choice of
>> the catalog vendor/provider; it is the choice of the table owner, made by
>> users for access control.
>>
>> Also this proposal doesn't say that the write path shouldn't produce the
>> metadata.json file, which is still required today to be spec compliant. It
>> is just that clients may not need to load the metadata.json (and manifest
>> list, manifest files) directly for client-side scan planning.
>>
>> I also like Dan's suggestion of not including client preference/config in
>> the spec.
>>
>> > I want to highlight that introducing "CATALOG_ONLY" planners implicitly
>> creates a new requirement for all compliant engines. Without support for
>> this, engines would be unable to read these new tables. This seems like a
>> significant change that we should call out explicitly.
>>
>> Agree with Peter that this is a significant new requirement for engines.
>> Iceberg libraries (Java or other languages) can probably hide it internally
>> in the scan planning implementation. Some engines may not use Iceberg
>> libraries. This would be a new requirement.
>>
>>
>>
>> On Tue, Jan 20, 2026 at 4:55 PM Prashant Singh <[email protected]>
>> wrote:
>>
>>> Thank you Peter, I will go ahead and find a slot that works for most of
>>> the folks interested in the discussion and put it on the dev calendar.
>>>
>>> Regarding the agenda: I would request that we keep the discussion contained
>>> to what it means to have a planning mode like catalog_only, its use cases,
>>> and its side effects. For example, read-only tables are something that can
>>> be done as of today; in fact, folks use this in production with tools
>>> such as Apache XTable (incubating) or UniForm, where one generates Iceberg
>>> metadata on top of existing data files. Having CATALOG_ONLY doesn't change
>>> much except that the fake metadata no longer needs to be written. It was
>>> fake in the first place, since an Iceberg client didn't generate it, and the
>>> catalog is already fully capable of doing that.
>>>
>>> With that being said, I will definitely put all your suggestions on the
>>> agenda. Let's discuss this more in depth to understand the feedback
>>> better. I also want to include the discussion of the types of modes. Maybe
>>> we should just keep client_only and catalog_only for now, since preference
>>> is too much for the first phase?
>>>
>>> Please let me circle back with a concrete time, meeting links, etc. I will
>>> post them here!
>>>
>>> Best,
>>> Prashant Singh
>>>
>>> On Sat, Jan 17, 2026 at 11:28 PM Péter Váry <[email protected]>
>>> wrote:
>>>
>>>> Hi Prashant,
>>>>
>>>> I agree that having a dedicated sync makes a lot of sense. I’d suggest
>>>> the following agenda items:
>>>>
>>>> 1. *Read-only tables*
>>>> During the early discussions around the File Format API, I suggested
>>>> starting with the read path, as this would allow us to integrate new data
>>>> sources more quickly. At the time, there were strong objections, with the
>>>> argument that every Iceberg table should be fully readable and writable
>>>> through Iceberg in order to be considered a “real” Iceberg table. I’m
>>>> interested to understand whether this position has changed since then.
>>>>
>>>> 2. *Table migration*
>>>> I see clear benefits in generating table metadata on the fly (e.g.,
>>>> easier integration with fast-changing systems, stricter security models,
>>>> and potential performance gains). My concern is that, if we allow this
>>>> without constraints, a fully compliant Iceberg catalog could choose not to
>>>> materialize metadata at all. This would make migration to another compliant
>>>> Iceberg catalog much harder. Openness and easy migration are major selling
>>>> points of Iceberg, and I think we should continue to enforce those values.
>>>>
>>>> 3. *Engine compatibility*
>>>> I want to highlight that introducing "CATALOG_ONLY" planners implicitly
>>>> creates a new requirement for all compliant engines. Without support for
>>>> this, engines would be unable to read these new tables. This seems like a
>>>> significant change that we should call out explicitly.
>>>>
>>>> 4. *CATALOG_ONLY tables*
>>>> If we reach agreement on the points above, I think the decision on this
>>>> topic will naturally follow.
>>>>
>>>> My current perspective on these topics:
>>>>
>>>> 1. *Read-only tables*
>>>> I like this idea, as it would allow Iceberg catalogs to more easily
>>>> expose external databases such as Delta, Lance, and others. My main
>>>> hesitation is that I’ve proposed this before and it was strongly rejected
>>>> by the community.
>>>>
>>>> 2. *Table migration*
>>>> My concern is that we may be taking incremental steps away from
>>>> Iceberg’s original position of full compliance, easy migration, and broad
>>>> compatibility, toward a more closed, catalog-bounded model. I’d like us to
>>>> step back and clearly define our core values, then enforce them in the
>>>> specification. This could be as simple as a few sentences in the
>>>> "LoadTableResponse" description requiring a way (for some users) to obtain
>>>> the full metadata JSON along with the corresponding manifest and data
>>>> files, or perhaps a dedicated migration endpoint that allows one catalog to
>>>> take over a table from another.
>>>>
>>>> 3. *Engine compatibility*
>>>> I have the sense that this “small” enum change actually introduces a
>>>> fairly large new requirement for engines, and I want to make sure we
>>>> explicitly highlight that.
>>>>
>>>> 4. *CATALOG_ONLY tables*
>>>> As above, I think our answers to the earlier questions will effectively
>>>> determine our position here.
>>>>
>>>> Overall, I like your proposal, but in a few areas it seems to move us
>>>> in a different direction from what we previously agreed on. I’d like to
>>>> understand whether the community is aligned with this new direction.
>>>>
>>>> Thanks,
>>>> Peter
>>>>
>>>>
>>>> On Thu, Jan 15, 2026, 20:34 Prashant Singh <[email protected]>
>>>> wrote:
>>>>
>>>>> Thank you for the discussion everyone,
>>>>> really appreciate all of you taking the time!
>>>>>
>>>>> Unfortunately we were not able to discuss this in the catalog sync
>>>>> this week, since we ran out of time. I was wondering if all the
>>>>> interested folks would be open to a discussion.
>>>>> I can go ahead and request one in the Iceberg calendar.
>>>>>
>>>>> Peter :
>>>>>
>>>>> > With the introduction of CATALOG_ONLY tables, storing Iceberg
>>>>> metadata files is no longer required for any operation
>>>>>
>>>>> I am not sure I fully get the concern here. The client still writes
>>>>> the manifests and manifest lists for the table and hands them to the
>>>>> catalog, which creates / tracks the metadata.json. For writes we need to
>>>>> have hold of these manifests, especially for cases such as validating
>>>>> that no new data has been inserted into the table (conflict detection);
>>>>> please refer to validateAddedDataFiles [1]. This can't be achieved by scan
>>>>> planning, at least not without breaking the existing Iceberg clients, as
>>>>> these validations are client side and based on the isolation level.
>>>>> Not storing them would make these tables unusable with clients if we
>>>>> want to write.
>>>>>
>>>>> For tables which are read only, I am not sure those tables are
>>>>> sufficient for enforcing vendor lock-in beyond what can be
>>>>> achieved today. I believe this would be circumvented, though, if we
>>>>> clarify / tighten the metadata location expectation in the spec: that it
>>>>> should exactly reflect the state of the table as committed by clients,
>>>>> i.e. it should have precise references to the manifests and manifest
>>>>> lists that the client created?
>>>>>
>>>>> With that being said, I request that everyone interested in this thread
>>>>> please let me know if you are open to a dedicated community discussion
>>>>> on this. Happy to brainstorm together and reach consensus.
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/MergingSnapshotProducer.java#L377
>>>>>
>>>>> Best,
>>>>> Prashant Singh
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 14, 2026 at 7:38 AM Péter Váry <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Dan,
>>>>>>
>>>>>> > While it is possible and may feel like it would prevent
>>>>>> interoperability, that would be easily circumvented by just copying the
>>>>>> entire contents of the table through scan/plan.
>>>>>>
>>>>>> This enables the user to recreate a snapshot of the table, but it
>>>>>> does not provide the full history or complete table metadata. It is also
>>>>>> significantly more involved than simply calling the register table
>>>>>> operation.
>>>>>>
>>>>>> > REST Catalog implementations have always been able to restrict
>>>>>> access to physical storage regardless of whether a client could load the
>>>>>> table metadata or not.
>>>>>>
>>>>>> Previously, this was primarily a matter of gaining access to the
>>>>>> underlying storage. With the introduction of CATALOG_ONLY tables, storing
>>>>>> Iceberg metadata files is no longer required for any operation.
>>>>>>
>>>>>> > there are lots of different ways closed systems can restrict access
>>>>>> already (e.g. jdbc only or proprietary APIs), so I don't feel like this
>>>>>> is changing that dynamic.
>>>>>>
>>>>>> I’m not sure I understand this. Could you please provide more details?
>>>>>>
>>>>>> The goal, as I understand it, is that if a Catalog implements the
>>>>>> Iceberg specification, migration to and from this Catalog should be
>>>>>> possible with any other Catalog that adheres to the same specification.
>>>>>> Introducing CATALOG_ONLY tables, however, feels like another step away
>>>>>> from interoperability.
>>>>>>
>>>>>> > I think the motivation behind catalog only mode is more for cases
>>>>>> where the underlying data is either in a different representation or is
>>>>>> being adapted on-the-fly.  For example, if you wanted to expose a table
>>>>>> from a database that can export data to parquet, but doesn't natively
>>>>>> support Iceberg as a format, you can hide that behind scan plan
>>>>>> interfaces.
>>>>>>
>>>>>> Using the Scan Planning interface has been optional until now, but
>>>>>> with the introduction of CATALOG_ONLY tables, it becomes mandatory. As a
>>>>>> result, compliant engines will need to implement it.
>>>>>>
>>>>>> > There may not be a full representation of the table metadata but
>>>>>> using a subset of Iceberg primitives, you can still achieve
>>>>>> interoperability (at least for read).
>>>>>>
>>>>>> In earlier discussions, we agreed that tables should not implement
>>>>>> only a subset of the Iceberg specification. This proposal seems to move
>>>>>> in a different direction. While I’m not opposed to the feature and
>>>>>> recognize the benefits of integrating non-Iceberg tables into Iceberg
>>>>>> catalogs and making them queryable by compatible engines, I believe it
>>>>>> would be useful to clarify our current understanding of the boundaries
>>>>>> and the level of feature parity we aim to maintain. Establishing this
>>>>>> would provide a consistent framework for evaluating similar proposals
>>>>>> going forward.
>>>>>>
>>>>>> This seems like a good candidate for today’s catalog sync discussion.
>>>>>>
>>>>>> Thanks,
>>>>>> Peter
>>>>>>
>>>>>> On Wed, Jan 14, 2026 at 12:23 AM Daniel Weeks <[email protected]> wrote:
>>>>>>
>>>>>>> I don't feel we should be too concerned about catalogs switching to
>>>>>>> a "catalog only" mode and not providing direct access.  While it is
>>>>>>> possible and may feel like it would prevent interoperability, that
>>>>>>> would be easily circumvented by just copying the entire contents of the
>>>>>>> table through scan/plan.  I wouldn't agree there was implied access just
>>>>>>> by having a metadata-location field either.  REST Catalog
>>>>>>> implementations have always been able to restrict access to physical
>>>>>>> storage regardless of whether a client could load the table metadata or
>>>>>>> not.  I understand the concern about lock-in, but there are lots of
>>>>>>> different ways closed systems can restrict access already (e.g. jdbc
>>>>>>> only or proprietary APIs), so I don't feel like this is changing that
>>>>>>> dynamic.
>>>>>>>
>>>>>>> I think the motivation behind catalog only mode is more for cases
>>>>>>> where the underlying data is either in a different representation or is
>>>>>>> being adapted on-the-fly.  For example, if you wanted to expose a table
>>>>>>> from a database that can export data to parquet, but doesn't natively
>>>>>>> support Iceberg as a format, you can hide that behind scan plan
>>>>>>> interfaces.  There may not be a full representation of the table
>>>>>>> metadata but using a subset of Iceberg primitives, you can still achieve
>>>>>>> interoperability (at least for read).
>>>>>>>
>>>>>>> Introducing modes is just a way to express the intent/availability
>>>>>>> for the scan plan and coordinate between the client and server, but I
>>>>>>> don't think it really affects whether a client could be prevented from
>>>>>>> reading table data directly (a catalog can do that regardless).
>>>>>>>
>>>>>>> I would add that I don't think the spec should include anything
>>>>>>> about the client modes (I added a comment to the PR on this).  The spec
>>>>>>> should only indicate what the server can return and what the
>>>>>>> expectations should be for a client.  What a client implements and what
>>>>>>> configurations it exposes is more of a client-side implementation detail
>>>>>>> and should not be part of the spec.
>>>>>>>
>>>>>>>
>>>>>>> -Dan
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 13, 2026 at 11:07 AM Prashant Singh <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Hello Peter,
>>>>>>>> Thank you for the feedback.
>>>>>>>>
>>>>>>>> IIUC, you mean that under one interpretation it could be a dummy file
>>>>>>>> which in the worst case simply does not exist? Sure, I believe we can
>>>>>>>> be explicit there to avoid this.
>>>>>>>> Note: this predates this proposal, though, and I am happy to take a
>>>>>>>> stab at being explicit here.
>>>>>>>>
>>>>>>>> > users were required to have direct read access to the metadata
>>>>>>>> files in order to plan queries on the table. That implied an access
>>>>>>>> requirement, even though it was never explicitly documented
>>>>>>>>
>>>>>>>> While the requirement is true, it's not like every user would
>>>>>>>> get credentials to do so. It was strictly based on whether the user is
>>>>>>>> authorized to read the table based on the privileges defined in the
>>>>>>>> catalog. loadTable's credential was optional, meaning that a catalog
>>>>>>>> could very well not vend any credentials despite the client
>>>>>>>> sending X-Iceberg-Access-Delegation, due to this [1], and hence it can
>>>>>>>> cut off any client if it wants to. I believe the flexibility
>>>>>>>> is there because we don't define authorization in the IRC spec. As I
>>>>>>>> said, the admin is the one who gave the catalog access to storage in
>>>>>>>> the first place, so the admin can very well revoke that access to
>>>>>>>> storage and migrate if the catalog is misbehaving by forcing every
>>>>>>>> table to come to it for planning, and can move to a different catalog
>>>>>>>> if the culprit catalog doesn't fix it.
>>>>>>>>
>>>>>>>> > Maybe we add a sentence in the spec to enforce that there should
>>>>>>>> be some users where the catalog MUST provide access to the metadata
>>>>>>>> files.
>>>>>>>>
>>>>>>>> Regarding the original feedback, there will always be an ADMIN user
>>>>>>>> who configured the catalog in the first place with the storage
>>>>>>>> permission (let's say by providing the IAM role and establishing the
>>>>>>>> trust relationship), who can get hold of the storage directly and access
>>>>>>>> those metadata files directly from storage. So some access is implicit
>>>>>>>> in that sense.
>>>>>>>>
>>>>>>>> I believe that by introducing CATALOG only mode for planning on top of
>>>>>>>> existing assumptions, we are not introducing new ways to trap end users
>>>>>>>> in vendor lock-in, and, as has always been the case, a user has a way
>>>>>>>> to walk out of it with the existing constructs.
>>>>>>>>
>>>>>>>> Please let me know what you think, considering the above.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://github.com/apache/iceberg/blob/fc434997fbc63a3f1f47481c0878073b1ccf6359/open-api/rest-catalog-open-api.yaml#L1886-L1887
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Prashant Singh
>>>>>>>>
>>>>>>>> On Tue, Jan 13, 2026 at 6:11 AM Péter Váry <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi Prashant,
>>>>>>>>>
>>>>>>>>> The specification states:
>>>>>>>>>
>>>>>>>>>> The corresponding file location of table metadata should be
>>>>>>>>>> returned in the `metadata-location` field
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> However, it does not specify that this location must be readable
>>>>>>>>> by any users. (Perhaps this is something we should revisit and clarify
>>>>>>>>> going forward.)
>>>>>>>>>
>>>>>>>>> Before the introduction of CATALOG_ONLY tables, users were
>>>>>>>>> required to have direct read access to the metadata files in order to
>>>>>>>>> plan queries on the table. That implied an access requirement, even
>>>>>>>>> though it was never explicitly documented. With the introduction of
>>>>>>>>> CATALOG_ONLY, this implicit requirement no longer applies, and we
>>>>>>>>> currently do not have an explicit requirement defined in the
>>>>>>>>> specification either.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 12, 2026 at 11:33 PM Prashant Singh <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thank you for the feedback everyone!
>>>>>>>>>>
>>>>>>>>>> Eduard: I am open to naming it _ENFORCED, or even to not having
>>>>>>>>>> _ONLY or _ENFORCED in the first place as Dan suggested here. Please
>>>>>>>>>> let me know if you are ok with that as per [1].
>>>>>>>>>>
>>>>>>>>>> Amogh: Thank you for the feedback on the _preference mode. I
>>>>>>>>>> tried to document some concrete use cases that could benefit from it
>>>>>>>>>> [2], as I believe it can provide some options for the catalog and
>>>>>>>>>> client to negotiate when they are open to it. Please let me know what
>>>>>>>>>> you think.
>>>>>>>>>>
>>>>>>>>>> Peter: I believe this kind of vendor lock-in would not be
>>>>>>>>>> possible given the model we are going after, i.e. in loadTable
>>>>>>>>>> itself we get back the metadata pointer, which is self-describing and
>>>>>>>>>> can be used to register this table in the new catalog. Also, the way
>>>>>>>>>> the catalog (IRC) in particular has been laid out, it decouples compute
>>>>>>>>>> from storage, so in the end it's the Admin user of the catalog who has
>>>>>>>>>> given the catalog the admin credential, which gets scoped down based on
>>>>>>>>>> the grants defined in the catalog. The ADMIN can simply revoke the
>>>>>>>>>> catalog's access, or can configure a new catalog with different admin
>>>>>>>>>> storage credentials.
>>>>>>>>>> I tried elaborating more on this in the PR feedback too [3].
>>>>>>>>>> Please let me know what you think.
>>>>>>>>>>
>>>>>>>>>> I will be on top of both the PR and thread moving forward !
>>>>>>>>>> Appreciate all your feedback.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#discussion_r2673087002
>>>>>>>>>> [2]
>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#discussion_r2678941794
>>>>>>>>>> [3]
>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#discussion_r2678376025
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Prashant Singh
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 9, 2026 at 10:34 PM Péter Váry <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I have a concern about some catalogs starting to make every
>>>>>>>>>>> table `CATALOG_ONLY`, which would essentially lock users to the
>>>>>>>>>>> catalog without providing a way to migrate the data to another
>>>>>>>>>>> catalog.
>>>>>>>>>>> Maybe we add a sentence in the spec to enforce that there
>>>>>>>>>>> should be some users where the catalog MUST provide access to the
>>>>>>>>>>> metadata files.
>>>>>>>>>>>
>>>>>>>>>>> WDYT?
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Jan 8, 2026, 18:38 Amogh Jahagirdar <[email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I did a pass over the PR but I guess I'm a little skeptical about
>>>>>>>>>>>> what the notion of "preferences" truly gets us in the protocol. In
>>>>>>>>>>>> case the endpoint is available but not enforced, my mental model is
>>>>>>>>>>>> to just let the client make whatever choice it wants. If a server
>>>>>>>>>>>> really thinks it's advantageous to use remote planning, I'd think
>>>>>>>>>>>> it'd just say server side planning is enforced. For the "momentary
>>>>>>>>>>>> load" case, all a client would need to do is handle the server
>>>>>>>>>>>> throttling and fall back to client side planning (I don't think the
>>>>>>>>>>>> protocol needs to expand just for that).
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jan 7, 2026 at 11:28 AM Russell Spitzer <
>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm in agreement with Prashant's current plan. I have no
>>>>>>>>>>>>> preference on the naming of "Only" vs "Enforced".
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Wed, Jan 7, 2026 at 4:42 AM Eduard Tudenhöfner <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Instead of calling it "ONLY", maybe "ENFORCED" would be a
>>>>>>>>>>>>>> better term? I think that would more naturally express the
>>>>>>>>>>>>>> behavior without having to define what "ONLY" really means.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Wed, Dec 24, 2025 at 12:05 AM Prashant Singh <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Hi everyone,*
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *JB:* Mostly yes, but it's more about what the server wants
>>>>>>>>>>>>>>> the client to do. The server can indicate if it supports a mode
>>>>>>>>>>>>>>> or not via the /v1/config endpoint at this point.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> *Russell:* Thank you for the thorough feedback! I think it
>>>>>>>>>>>>>>> is a great idea to break the optional mode into
>>>>>>>>>>>>>>> *Prefer Client | Prefer Catalog*; it really opens up a lot of
>>>>>>>>>>>>>>> interesting use cases.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For example, the server might support planning but, due to
>>>>>>>>>>>>>>> momentary load, want the client to see if it's open to
>>>>>>>>>>>>>>> planning on the client side. Similarly, an argument can be made
>>>>>>>>>>>>>>> that if the server has a table cached in memory, it would prefer
>>>>>>>>>>>>>>> that the client come to the server. Earlier, with just the
>>>>>>>>>>>>>>> optional value, we were simply falling back to server or client
>>>>>>>>>>>>>>> side planning based on whether the server supported scan
>>>>>>>>>>>>>>> planning. Now, the client can express its own overrides via
>>>>>>>>>>>>>>> catalog configs as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Based on our offline discussion, I have incorporated the
>>>>>>>>>>>>>>> feedback into the updated matrix [1] to document what the
>>>>>>>>>>>>>>> planning modes would be based on the server response and client
>>>>>>>>>>>>>>> overrides:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>    - *CLIENT_ONLY + CATALOG_ONLY* = FAIL
>>>>>>>>>>>>>>>    - *One "ONLY" + opposite "PREFERRED"* = ONLY wins
>>>>>>>>>>>>>>>    - *Both "PREFERRED"* = Client config wins
>>>>>>>>>>>>>>>    - *Client not configured* = Use server config or default
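
The matrix above can be sketched as a small resolution function. This is only a sketch under stated assumptions, not the reference implementation from the PR: the Mode enum and its string values, the resolve function, the choice of client-side planning as the default, and the handling of a server that sends no mode are all hypothetical.

```python
from enum import Enum
from typing import Optional


class Mode(Enum):
    # Hypothetical names and values; the thread has not settled on naming.
    CLIENT_ONLY = "client-only"
    CLIENT_PREFERRED = "client-preferred"
    CATALOG_PREFERRED = "catalog-preferred"
    CATALOG_ONLY = "catalog-only"


ONLY_MODES = {Mode.CLIENT_ONLY, Mode.CATALOG_ONLY}


def side(mode: Mode) -> str:
    """Return which side a mode points to: 'client' or 'catalog'."""
    if mode in (Mode.CLIENT_ONLY, Mode.CLIENT_PREFERRED):
        return "client"
    return "catalog"


def resolve(server: Optional[Mode], client: Optional[Mode],
            default: str = "client") -> str:
    """Decide where planning happens from the server and client modes."""
    if client is None:
        # Client not configured: use server config or the default.
        return side(server) if server is not None else default
    if server is None:
        # Assumption: the matrix does not cover this case, so the
        # client config is used as-is.
        return side(client)
    if server in ONLY_MODES and client in ONLY_MODES and side(server) != side(client):
        # CLIENT_ONLY + CATALOG_ONLY = FAIL
        raise ValueError(f"incompatible planning modes: {server.value} vs {client.value}")
    if server in ONLY_MODES:
        # A server-side ONLY beats an opposite client PREFERRED.
        return side(server)
    # Either the client is ONLY (ONLY wins) or both are PREFERRED
    # (client config wins); both cases resolve to the client's side.
    return side(client)
```

For example, resolve(Mode.CATALOG_ONLY, Mode.CLIENT_PREFERRED) picks catalog-side planning, while resolve(Mode.CATALOG_PREFERRED, Mode.CLIENT_PREFERRED) picks client-side planning because the client config wins when both sides only express a preference.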
>>>>>>>>>>>>>>> I will update the reference implementation soon based on
>>>>>>>>>>>>>>> this. I would love to know what other folks think!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Prashant Singh
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [1]
>>>>>>>>>>>>>>> https://github.com/apache/iceberg/pull/14867#issuecomment-3683989832
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 1:26 PM Russell Spitzer <
>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I can imagine one more
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (None - I would rename this) ClientOnly - Client can use
>>>>>>>>>>>>>>>> Catalog Planning or Local Planning
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PreferClient - Client should use local planning, but the
>>>>>>>>>>>>>>>> plan api is available for this table. I can only imagine this
>>>>>>>>>>>>>>>> would be useful for a scenario where most clients are heavy and
>>>>>>>>>>>>>>>> have the resources to do local planning (or engine distributed
>>>>>>>>>>>>>>>> planning) but you still want to support lightweight clients
>>>>>>>>>>>>>>>> which can’t really do planning themselves.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PreferCatalog - Client should use the plan API, but
>>>>>>>>>>>>>>>> credentials have been provided to enable local planning. This
>>>>>>>>>>>>>>>> is probably a transitional state as we move from clients that
>>>>>>>>>>>>>>>> only support local planning to those which can use the plan api.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> CatalogOnly - Clients are not provided with the credentials
>>>>>>>>>>>>>>>> required to read the table from the metadata.json alone. If
>>>>>>>>>>>>>>>> they do not implement the scan plan API they should fail fast,
>>>>>>>>>>>>>>>> otherwise they will fail when they attempt to load a
>>>>>>>>>>>>>>>> manifest_list file. This is used in circumstances where the
>>>>>>>>>>>>>>>> catalog is giving either file-specific credentials or is
>>>>>>>>>>>>>>>> protecting the delivered files in some way such that their
>>>>>>>>>>>>>>>> contents have been specially redacted or something like that.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I assume most catalogs will start with “ClientOnly” or
>>>>>>>>>>>>>>>> “None”.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then as catalogs begin to support the planning API we will see
>>>>>>>>>>>>>>>> most tables move to
>>>>>>>>>>>>>>>> PreferCatalog with some perhaps extremely heavy or large
>>>>>>>>>>>>>>>> tables staying as PreferClient or ClientOnly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Then catalogs with special protections may have some tables
>>>>>>>>>>>>>>>> return CatalogOnly so they can either scope credentials more
>>>>>>>>>>>>>>>> tightly or manipulate the files that the client actually has
>>>>>>>>>>>>>>>> access to in some way.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 1:09 AM Jean-Baptiste Onofré <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Prashant
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It makes sense to me. I guess we are using Catalog
>>>>>>>>>>>>>>>>> properties to indicate what the REST server supports to the
>>>>>>>>>>>>>>>>> client, right?
>>>>>>>>>>>>>>>>> I will take a look at the PR, but I like the idea.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>>>> JB
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Sat, Dec 20, 2025 at 12:53 AM Prashant Singh <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hey All,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I wanted to bring up the discussion of introducing a
>>>>>>>>>>>>>>>>>> concept of REST scan planning mode, which would help the
>>>>>>>>>>>>>>>>>> server instruct the client on how to plan the table via
>>>>>>>>>>>>>>>>>> loadTableResponse or a table-level config override.
>>>>>>>>>>>>>>>>>> There are three possible values one could think of:
>>>>>>>>>>>>>>>>>> 1. *None* : i.e. plan it on the client side; this may be because
>>>>>>>>>>>>>>>>>> the table is too small and the additional REST request would
>>>>>>>>>>>>>>>>>> add more overhead than benefit.
>>>>>>>>>>>>>>>>>> 2. *Optional* : client can choose to plan it either
>>>>>>>>>>>>>>>>>> locally or can trigger server side planning.
>>>>>>>>>>>>>>>>>> 3. *Required* : client MUST do server side planning; the
>>>>>>>>>>>>>>>>>> server could suggest this if it has better indexed the
>>>>>>>>>>>>>>>>>> Iceberg metadata, the client is running on low resources, or
>>>>>>>>>>>>>>>>>> the table is protected. Server MAY choose whatever way is
>>>>>>>>>>>>>>>>>> required to ensure the client can't bypass this, for
>>>>>>>>>>>>>>>>>> example by not vending credentials as part of loadTable and
>>>>>>>>>>>>>>>>>> only minting them as part of planning completion; this would
>>>>>>>>>>>>>>>>>> mean if the client doesn't call plan table .
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I have proactively created a pull request [1] and would love
>>>>>>>>>>>>>>>>>> to hear all your feedback, either here or in the PR directly!
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Wish you all very happy holidays; it has been great
>>>>>>>>>>>>>>>>>> working with you all.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [1] https://github.com/apache/iceberg/pull/14867
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>> Prashant Singh
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
