I am not sure row-level ingestion belongs in the IRC. The catalog should probably remain at file-level granularity. The current IRC already supports server-side scan planning, which returns the list of files to the client for the read path. I would also support the server-side commit that Drew proposed previously, which would complete the story on the write path.
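To make the file-level boundary concrete, here is a rough sketch of the split I have in mind. Everything in it is hypothetical and for illustration only: `ToyCatalog`, `write_rows`, and `DataFile` are made-up names, a dict stands in for the object store, and JSON lines stand in for Parquet. It is not the IRC API and not Drew's proposal, just the shape of "the writer handles rows, the catalog handles files".

```python
import json
import uuid
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """File-level metadata: the only thing the toy catalog ever sees."""
    path: str
    record_count: int
    size_bytes: int


@dataclass
class ToyCatalog:
    """Hypothetical stand-in for a catalog; it tracks files, never rows."""
    snapshots: list = field(default_factory=list)

    def commit_append(self, files):
        # Write path: each commit creates a new snapshot that lists
        # every live data file (previous files plus the appended ones).
        current = self.snapshots[-1] if self.snapshots else []
        self.snapshots.append(current + list(files))
        return len(self.snapshots)  # toy snapshot id, starting at 1

    def plan_scan(self):
        # Read path: return the file list of the latest snapshot.
        return list(self.snapshots[-1]) if self.snapshots else []


def write_rows(object_store, rows, prefix="data/"):
    """Hypothetical lightweight writer: materializes rows into one file.

    A real writer would produce Parquet; JSON lines keep the sketch
    dependency-free.
    """
    payload = "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")
    path = f"{prefix}{uuid.uuid4().hex}.jsonl"
    object_store[path] = payload
    return DataFile(path=path, record_count=len(rows), size_bytes=len(payload))


# The rows only ever go to the object store; the catalog receives
# file-level metadata in the commit and hands back files on scan.
object_store = {}
catalog = ToyCatalog()
data_file = write_rows(object_store, [{"id": 1}, {"id": 2}])
snapshot_id = catalog.commit_append([data_file])
planned = catalog.plan_scan()
```

In this shape a row-ingestion service can sit in front of the writer without the catalog spec having to know that rows exist.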
On Tue, Apr 28, 2026 at 2:43 PM Russell Spitzer <[email protected]> wrote:

> +1 on Drew's proposal. I'm not super keen on the Catalog itself writing
> data files, but I don't have a problem with it fully crafting metadata as
> in the fine-grained commit proposal. If you have a client that can speak
> REST, adding a little hook to also write a data file is probably acceptable
> (PyIceberg does this and writes metadata). However, one could argue that
> once you have a REST client you likely have enough other Iceberg code in
> your client that writing a data file isn't too difficult.
>
> Would love to know if other folks really have a use case for a generic
> JSON -> Datafile/Iceberg API.
>
> On Tue, Apr 28, 2026 at 2:28 PM Kevin Liu <[email protected]> wrote:
>
>> Hi Gokul,
>>
>> Thanks for bringing this up. I do think a row-level ingestion API is
>> interesting from the user's perspective; abstracting away all table/file
>> details is helpful in some ways.
>>
>> That being said, I don't think this functionality belongs in the IRC
>> spec. The REST Spec primarily focuses on metadata and catalog operations.
>> Adding a "row ingestion API" means the catalog must now handle data
>> operations in addition to metadata operations, which is a significant scope
>> expansion.
>>
>> One alternative worth considering is a lightweight writer combined
>> with the fine-grained commit proposal that Drew has been driving. The
>> writer materializes rows to the object store and works with the IRC to
>> commit the newly written rows. This keeps the catalog's role cleanly
>> separated from data operations.
>>
>> Curious to hear what others think.
>>
>> Best,
>> Kevin Liu
>>
>> On Mon, Apr 27, 2026 at 9:33 AM Soundararajan, Gokul <[email protected]>
>> wrote:
>>
>>> Hi all,
>>>
>>> Something we've been hearing from customers is that getting rows into
>>> Iceberg tables is still harder than it should be. The REST Catalog has done
>>> a great job standardizing how clients manage tables and commit snapshots,
>>> but the actual "send me some rows" part is left to each implementation to
>>> figure out independently.
>>>
>>> Right now, customers either run a full compute engine to write Parquet
>>> and commit, or they use whatever proprietary ingestion API their catalog
>>> vendor offers. Both work, but neither is portable across catalogs the way
>>> the REST Catalog made metadata operations portable.
>>>
>>> We've been thinking about whether there's a natural place for this in
>>> the REST Catalog spec — something like a rows endpoint on a table that
>>> accepts JSON or Arrow, validates against the schema, and lets the catalog
>>> handle the rest. Not sure if this is the right level of abstraction for the
>>> spec or if it's better left to implementations, but wanted to see if others
>>> in the community have been thinking about this too.
>>>
>>> Would love to hear if anyone else is seeing the same gap, or if there
>>> are reasons this doesn't belong in the spec.
>>>
>>> Gokul
