Re: Row-level ingestion and the REST Catalog

Steven Wu Fri, 01 May 2026 11:37:53 -0700

I am not sure row-level ingestion belongs in the IRC.

The catalog should probably remain at file-level granularity.
The current IRC already supports server-side scan planning, which returns the
list of files to the client on the read path. I would also support the
server-side commit that Drew proposed previously to complete the story on
the write path.
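To make the file-level boundary concrete, here is a rough sketch under stated assumptions. Every name in it (`SCHEMA`, `DataFile`, `InMemoryCatalog`, `commit_append`, `write_data_file`) is hypothetical and is not the IRC spec or the PyIceberg API, and JSON lines stand in for Parquet on an object store. The client validates rows against the schema and writes the data file; the catalog only ever receives file-level metadata (path, record count, size) and records a new snapshot.

```python
import json
import os
import tempfile
import uuid
from dataclasses import dataclass, field

# Hypothetical toy schema: column name -> expected Python type.
SCHEMA = {"id": int, "name": str}


@dataclass
class DataFile:
    """File-level metadata the catalog sees; it carries no row contents."""
    path: str
    record_count: int
    file_size_in_bytes: int


@dataclass
class InMemoryCatalog:
    """Hypothetical stand-in for a catalog that only handles files and snapshots."""
    snapshots: list = field(default_factory=list)

    def commit_append(self, table: str, files: list) -> int:
        # Each commit creates a new snapshot listing every file committed so far.
        previous = self.snapshots[-1]["files"] if self.snapshots else []
        snapshot = {
            "snapshot_id": len(self.snapshots) + 1,
            "table": table,
            "files": previous + [f.path for f in files],
        }
        self.snapshots.append(snapshot)
        return snapshot["snapshot_id"]


def validate(row: dict) -> None:
    # Client-side check: reject rows whose columns or types do not match the schema.
    if set(row) != set(SCHEMA):
        raise ValueError(f"columns {sorted(row)} do not match schema")
    for name, typ in SCHEMA.items():
        if not isinstance(row[name], typ):
            raise TypeError(f"{name} must be {typ.__name__}")


def write_data_file(rows: list, directory: str) -> DataFile:
    # Client-side writer: validate every row first, then materialize them
    # to storage (JSON lines here, standing in for Parquet).
    for row in rows:
        validate(row)
    path = os.path.join(directory, f"{uuid.uuid4()}.jsonl")
    with open(path, "w", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
    return DataFile(path, len(rows), os.path.getsize(path))


if __name__ == "__main__":
    catalog = InMemoryCatalog()
    with tempfile.TemporaryDirectory() as tmp:
        data_file = write_data_file(
            [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], tmp
        )
        snapshot_id = catalog.commit_append("db.events", [data_file])
        print(snapshot_id, data_file.record_count)  # prints "1 2"
```

The catalog in this sketch never touches row contents, which is the separation Kevin and Russell describe below; the data path stays in the client.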


On Tue, Apr 28, 2026 at 2:43 PM Russell Spitzer <[email protected]>
wrote:

> +1 on Drew's proposal. I'm not super keen on the Catalog itself writing
> data files, but I don't have a problem with it fully crafting metadata as
> in the fine-grained commit proposal. If you have a client that can speak
> REST, adding a little hook to also write a data file is probably acceptable
> (PyIceberg does this and writes metadata.) However, one could argue that
> once you have a REST client you likely have enough other Iceberg code in
> your client that writing a data file isn't too difficult.
>
> Would love to know if other folks really have a use case for a generic
> JSON -> DataFile/Iceberg API.
>
> On Tue, Apr 28, 2026 at 2:28 PM Kevin Liu <[email protected]> wrote:
>
>> Hi Gokul,
>>
>> Thanks for bringing this up. I do think a row-level ingestion API is
>> interesting from the user's perspective; abstracting away all table/file
>> details is helpful in some ways.
>>
>> That being said, I don't think this functionality belongs in the IRC
>> spec. The REST Spec primarily focuses on metadata and catalog operations.
>> Adding a "row ingestion API" means the catalog must now handle data
>> operations in addition to metadata operations, which is a significant scope
>> expansion.
>>
>> One alternative worth considering is a lightweight writer combined
>> with the fine-grained commit proposal that Drew has been driving. The
>> writer materializes rows to object storage and works with the IRC to
>> commit the newly written rows. This keeps the catalog's role cleanly
>> separated from data operations.
>>
>> Curious to hear what others think.
>>
>> Best,
>> Kevin Liu
>>
>> On Mon, Apr 27, 2026 at 9:33 AM Soundararajan, Gokul <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> Something we've been hearing from customers is that getting rows into
>>> Iceberg tables is still harder than it should be. The REST Catalog has done
>>> a great job standardizing how clients manage tables and commit snapshots,
>>> but the actual "send me some rows" part is left to each implementation to
>>> figure out independently.
>>>
>>> Right now, customers either run a full compute engine to write Parquet
>>> and commit, or they use whatever proprietary ingestion API their catalog
>>> vendor offers. Both work, but neither is portable across catalogs the way
>>> the REST Catalog made metadata operations portable.
>>>
>>> We've been thinking about whether there's a natural place for this in
>>> the REST Catalog spec — something like a rows endpoint on a table that
>>> accepts JSON or Arrow, validates against the schema, and lets the catalog
>>> handle the rest. Not sure if this is the right level of abstraction for the
>>> spec or if it's better left to implementations, but wanted to see if others
>>> in the community have been thinking about this too.
>>>
>>>
>>> Would love to hear if anyone else is seeing the same gap, or if there
>>> are reasons this doesn't belong in the spec.
>>>
>>> Gokul
>>>
>>>
