Rahil, thanks for working on this. It has some really good ideas that we hadn't considered before like a way for the service to plan how to break up the work of scan planning. I really like that idea because it makes it much easier for the service to keep memory consumption low across requests.
My primary feedback is that I think it's a little too complicated (with both sharding and pagination) and could be modified slightly so that the service doesn't need to be stateful. If the service isn't necessarily stateful then it should be easier to build implementations. To make it possible for the service to be stateless, I'm proposing that rather than creating shard IDs that are tracked by the service, the information for a shard can be sent to the client. My assumption here is that most implementations would create shards by reading the manifest list, filtering on partition ranges, and creating a shard for some reasonable size of manifest content. For example, if a table has 100MB of metadata in 25 manifests that are about 4 MB each, then it might create 9 shards with 1-4 manifests each. The service could send those shards to the client as a list of manifests to read and the client could send the shard information back to the service to get the data files in each shard (along with the original filter). There's a slight trade-off that the protocol needs to define how to break the work into shards. I'm interested in hearing if that would work with how you were planning on building the service on your end. Another option is to let the service send back arbitrary JSON that would get returned for each shard. Either way, I like that this would make it so the service doesn't need to persist anything. We could also make it so that small tables don't require multiple requests. For example, a client could call the route to get file tasks with just a filter. What do you think? Ryan On Fri, Dec 8, 2023 at 10:41 AM Chertara, Rahil <rcher...@amazon.com.invalid> wrote: > Hi all, > > My name is Rahil Chertara, and I’m a part of the Iceberg team at Amazon > EMR and Athena. I’m reaching out to share a proposal for a new Scan API > that will be utilized by the RESTCatalog. The process for table scan > planning is currently done within client engines such as Apache Spark. By > moving scan functionality to the RESTCatalog, we can integrate Iceberg > table scans with external services, which can lead to several benefits. > > For example, we can leverage caching and indexes on the server side to > improve planning performance. Furthermore, by moving this scan logic to the > RESTCatalog, non-JVM engines can integrate more easily. This all can be > found in the detailed proposal below. Feel free to comment, and add your > suggestions . > > Detailed proposal: > https://docs.google.com/document/d/1FdjCnFZM1fNtgyb9-v9fU4FwOX4An-pqEwSaJe8RgUg/edit#heading=h.cftjlkb2wh4h > > Github POC: https://github.com/apache/iceberg/pull/9252 > > Regards, > > Rahil Chertara > Amazon EMR & Athena > rcher...@amazon.com > > > -- Ryan Blue Tabular