Re: Row-level ingestion and the REST Catalog

Russell Spitzer Tue, 28 Apr 2026 14:42:57 -0700

+1 on Drew's proposal. I'm not super keen on the Catalog itself writing
data files, but I don't have a problem with it fully crafting metadata as
in the fine-grained commit proposal. If you have a client that can speak
REST, adding a little hook to also write a data file is probably acceptable
(PyIceberg does this and writes metadata). However, one could argue that
once you have a REST client you likely have enough other Iceberg code in
your client that writing a data file isn't too difficult.


Would love to know if other folks really have a use case for a generic JSON
-> Datafile/Iceberg API.
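
To make the question concrete, here is a rough sketch of the first step such
a generic JSON path would probably need: checking incoming rows against the
table schema before anything is written. Everything in it is an assumption
for illustration. SCHEMA is a hypothetical, simplified stand-in for an
Iceberg schema, and validate_row and split_rows are made-up helper names, not
part of any REST Catalog or PyIceberg API. It uses only the Python standard
library and does not write a data file or talk to a catalog.

```python
import json
from typing import Any

# Hypothetical, simplified schema: field name -> (Python type, required).
# A real Iceberg schema also carries field IDs, nested types, and more
# primitive types; this is only enough to illustrate validation.
SCHEMA = {
    "id": (int, True),
    "name": (str, True),
    "score": (float, False),
}


def validate_row(row: dict[str, Any], schema: dict[str, tuple[type, bool]]) -> list[str]:
    """Return the problems found in one row; an empty list means it is valid."""
    errors = []
    # Reject fields the schema does not know about.
    for field in row:
        if field not in schema:
            errors.append(f"unknown field: {field}")
    for field, (expected, required) in schema.items():
        value = row.get(field)
        if value is None:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) and expected is not bool:
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
        elif expected is float and isinstance(value, int):
            continue  # JSON has one number type, so accept ints for floats.
        elif not isinstance(value, expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    return errors


def split_rows(payload: str, schema: dict[str, tuple[type, bool]]):
    """Parse a JSON array of rows and split it into (accepted, rejected)."""
    accepted, rejected = [], []
    for row in json.loads(payload):
        errors = validate_row(row, schema)
        if errors:
            rejected.append((row, errors))
        else:
            accepted.append(row)
    return accepted, rejected


if __name__ == "__main__":
    body = '[{"id": 1, "name": "a", "score": 2}, {"id": "x", "name": "b"}]'
    ok, bad = split_rows(body, SCHEMA)
    print("accepted:", ok)
    print("rejected:", bad)
```

Whether this check lives in the catalog or in a thin client-side writer is
exactly the scoping question in this thread; the sketch takes no position on
that.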

On Tue, Apr 28, 2026 at 2:28 PM Kevin Liu <[email protected]> wrote:

> Hi Gokul,
>
> Thanks for bringing this up. I do think a row-level ingestion API is
> interesting from the user's perspective, abstracting away all table/file
> details is helpful in some ways.
>
> That being said, I don't think this functionality belongs in the IRC spec.
> The REST Spec primarily focuses on metadata and catalog operations. Adding
> a "row ingestion API" means the catalog must now handle data operations in
> addition to metadata operations, which is a significant scope expansion.
>
> One alternative worth considering is a lightweight writer combined
> with the fine-grained commit proposal that Drew has been driving. The
> writer materializes rows to object storage and works with the IRC to
> commit the newly written rows. This keeps the catalog's role cleanly
> separated from data operations.
>
> Curious to hear what others think.
>
> Best,
> Kevin Liu
>
> On Mon, Apr 27, 2026 at 9:33 AM Soundararajan, Gokul <[email protected]>
> wrote:
>
>> Hi all,
>>
>> Something we've been hearing from customers is that getting rows into
>> Iceberg tables is still harder than it should be. The REST Catalog has done
>> a great job standardizing how clients manage tables and commit snapshots,
>> but the actual "send me some rows" part is left to each implementation to
>> figure out independently.
>>
>> Right now, customers either run a full compute engine to write Parquet
>> and commit, or they use whatever proprietary ingestion API their catalog
>> vendor offers. Both work, but neither is portable across catalogs the way
>> the REST Catalog made metadata operations portable.
>>
>> We've been thinking about whether there's a natural place for this in the
>> REST Catalog spec — something like a rows endpoint on a table that accepts
>> JSON or Arrow, validates against the schema, and lets the catalog handle
>> the rest. Not sure if this is the right level of abstraction for the spec
>> or if it's better left to implementations, but wanted to see if others in
>> the community have been thinking about this too.
>>
>>
>> Would love to hear if anyone else is seeing the same gap, or if there are
>> reasons this doesn't belong in the spec.
>>
>> Gokul
>>
>>
