+1 on Drew's proposal. I'm not super keen on the Catalog itself writing data files, but I don't have a problem with it fully crafting metadata as in the fine-grained commit proposal. If you have a client that can speak REST, adding a little hook to also write a data file is probably acceptable (PyIceberg does this, and it writes the metadata too). However, one could argue that once you have a REST client, you likely already have enough other Iceberg code in your client that writing a data file isn't too difficult.
Would love to know if other folks really have a use case for a generic JSON -> Datafile/Iceberg API.

On Tue, Apr 28, 2026 at 2:28 PM Kevin Liu <[email protected]> wrote:
> Hi Gokul,
>
> Thanks for bringing this up. I do think a row-level ingestion API is
> interesting from the user's perspective; abstracting away all table/file
> details is helpful in some ways.
>
> That being said, I don't think this functionality belongs in the IRC spec.
> The REST spec primarily focuses on metadata and catalog operations. Adding
> a "row ingestion API" means the catalog must now handle data operations in
> addition to metadata operations, which is a significant scope expansion.
>
> One alternative worth considering is a lightweight writer combined
> with the fine-grained commit proposal that Drew has been driving. The
> writer materializes rows to the object store and works with the IRC to
> commit the newly written rows. This keeps the catalog's role cleanly
> separated from data operations.
>
> Curious to hear what others think.
>
> Best,
> Kevin Liu
>
> On Mon, Apr 27, 2026 at 9:33 AM Soundararajan, Gokul <[email protected]>
> wrote:
>
>> Hi all,
>>
>> Something we've been hearing from customers is that getting rows into
>> Iceberg tables is still harder than it should be. The REST Catalog has done
>> a great job standardizing how clients manage tables and commit snapshots,
>> but the actual "send me some rows" part is left to each implementation to
>> figure out independently.
>>
>> Right now, customers either run a full compute engine to write Parquet
>> and commit, or they use whatever proprietary ingestion API their catalog
>> vendor offers. Both work, but neither is portable across catalogs the way
>> the REST Catalog made metadata operations portable.
>>
>> We've been thinking about whether there's a natural place for this in the
>> REST Catalog spec: something like a rows endpoint on a table that accepts
>> JSON or Arrow, validates against the schema, and lets the catalog handle
>> the rest. Not sure if this is the right level of abstraction for the spec
>> or if it's better left to implementations, but wanted to see if others in
>> the community have been thinking about this too.
>>
>> Would love to hear if anyone else is seeing the same gap, or if there are
>> reasons this doesn't belong in the spec.
>>
>> Gokul
