Hi Talat,

That is a great idea.
As I mentioned in my comments on the document, there has been an ongoing discussion regarding this in Apache Polaris. You can find more details in this PR (https://github.com/apache/polaris/pull/4613) and the related documentation (https://github.com/jbonofre/polaris/blob/12dfea48570d076d4012143e66f02e8b503c4f99/site/content/in-dev/unreleased/directories.md).

I am curious about where unstructured data support should be scoped. While Iceberg might be the right place, I wonder if the catalog, or even a third party, is more natural for managing credential vending and object access indirection.

Regards,
JB

On Fri, Jun 26, 2026 at 2:52 AM Talat Uyarer via dev <[email protected]> wrote:
> Hi everyone,
>
> I’d like to open a discussion on a new proposal to better support
> unstructured data in Iceberg.
>
> As tables increasingly need to reference unstructured objects (images,
> video, ML artifacts, PDFs) that are too large to embed, the current
> fallback is to use bare string URI columns. This has a few structural
> problems: it bypasses catalog governance (requiring engines to hold broad
> bucket-level credentials), lacks cross-engine portability, and breaks read
> determinism if the underlying object is overwritten.
>
> There is already an active proposal in the Parquet community to
> introduce a native File logical type for physical files. I've
> drafted a proposal for a FileRef type (struct<path, etag>) which is
> designed to layer directly on top of that work. While Parquet defines the
> physical columnar representation, Iceberg's FileRef handles the
> table-format layer (governance, read determinism, snapshot isolation, and
> access brokering). A physical File column in Parquet will map 1:1 to
> Iceberg's logical FileRef, ensuring a unified standard from the storage
> layer up to the catalog.
>
> The core idea is to shift the responsibility of access control to the
> Iceberg REST Catalog.
> Instead of granting compute engines direct bucket
> access, the proposal introduces a new object-access endpoint. The catalog
> brokers access by vending short-lived credentials or pre-signed URLs
> strictly for the referenced objects (validated against a new
> fileref.allowed-locations table property).
>
> You can read the full proposal draft here:
> https://s.apache.org/iceberg-fileref
>
> I would love to get your feedback on this approach.
>
> Parquet Proposal:
> https://docs.google.com/document/d/1AiwrstqkwkBoOZqgOkm9JGwSMcNeHyLR7EEj1CVqpZQ/edit?tab=t.0#heading=h.k8qyue4jj4rn
>
> Best,
> Talat Uyarer
