Rahil, thanks for working on this. It has some really good ideas that we
hadn't considered before, like having the service decide how to break up
the work of scan planning. I really like that idea because it makes it much
easier for the service to keep memory consumption low across requests.

My primary feedback is that the proposal is a little too complicated (with
both sharding and pagination) and could be modified slightly so that the
service doesn't need to be stateful. If the service doesn't have to track
state, it should be easier to build implementations.

To make it possible for the service to be stateless, I'm proposing that
rather than creating shard IDs that are tracked by the service, the
information for each shard is sent to the client. My assumption here is
that most implementations would create shards by reading the manifest list,
filtering on partition ranges, and grouping a reasonable amount of manifest
content into each shard. For example, if a table has 100 MB of metadata in
25 manifests that are about 4 MB each, the service might create 9 shards
with 1-4 manifests each. The service could send those shards to the client
as lists of manifests to read, and the client could send the shard
information back (along with the original filter) to get the data files in
each shard.
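
To make that concrete, here's a rough sketch of the two round trips from
the client's side. The endpoint paths, field names, and the use of Python's
requests library are all placeholders of mine, not part of the proposal:

    import requests

    BASE = "https://catalog.example.com/v1/namespaces/db/tables/events"
    FILTER = {"type": "gt", "term": "ts", "value": "2023-12-01"}

    # Round trip 1: ask the service to split the scan into shards.
    # Each shard is just the information needed to plan it later
    # (here, a list of manifests), so the service keeps no state.
    shards = requests.post(
        f"{BASE}/plan", json={"filter": FILTER}
    ).json()["shards"]
    # e.g. [{"manifests": ["s3://bucket/meta/m1.avro", ...]}, ...]

    # Round trip 2: send each shard back, with the original filter,
    # to get the data files (file scan tasks) for that shard.
    for shard in shards:
        tasks = requests.post(
            f"{BASE}/plan/tasks",
            json={"filter": FILTER, "shard": shard},
        ).json()["file-scan-tasks"]
        for task in tasks:
            print(task["data-file"])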

There's a slight trade-off: the protocol needs to define how to break the
work into shards. I'm interested in hearing whether that would work with
how you were planning on building the service on your end. Another option
is to let the service send back arbitrary JSON that the client would return
for each shard. Either way, I like that this makes it so the service
doesn't need to persist anything. We could also make it so that small
tables don't require multiple requests; for example, a client could call
the route to get file tasks with just a filter, as in the sketch below.
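
As a sketch of that combined behavior (again with made-up routes and field
names), the client could branch on the shape of the response: file scan
tasks directly for a small table, or opaque shard blobs to echo back for a
large one:

    import requests

    BASE = "https://catalog.example.com/v1/namespaces/db/tables/events"
    FILTER = {"type": "gt", "term": "ts", "value": "2023-12-01"}

    resp = requests.post(f"{BASE}/plan", json={"filter": FILTER}).json()

    if "file-scan-tasks" in resp:
        # Small table: planning finished in a single request.
        tasks = resp["file-scan-tasks"]
    else:
        # Large table: each shard is arbitrary JSON that the client
        # echoes back verbatim; the service never persists anything.
        tasks = []
        for shard in resp["shards"]:
            shard_resp = requests.post(
                f"{BASE}/plan/tasks",
                json={"filter": FILTER, "shard": shard},
            ).json()
            tasks.extend(shard_resp["file-scan-tasks"])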

What do you think?

Ryan

On Fri, Dec 8, 2023 at 10:41 AM Chertara, Rahil <rcher...@amazon.com.invalid>
wrote:

> Hi all,
>
> My name is Rahil Chertara, and I’m a part of the Iceberg team at Amazon
> EMR and Athena. I’m reaching out to share a proposal for a new Scan API
> that will be utilized by the RESTCatalog. The process for table scan
> planning is currently done within client engines such as Apache Spark. By
> moving scan functionality to the RESTCatalog, we can integrate Iceberg
> table scans with external services, which can lead to several benefits.
>
> For example, we can leverage caching and indexes on the server side to
> improve planning performance. Furthermore, by moving this scan logic to the
> RESTCatalog, non-JVM engines can integrate more easily. This all can be
> found in the detailed proposal below. Feel free to comment and add your
> suggestions.
>
> Detailed proposal:
> https://docs.google.com/document/d/1FdjCnFZM1fNtgyb9-v9fU4FwOX4An-pqEwSaJe8RgUg/edit#heading=h.cftjlkb2wh4h
>
> Github POC: https://github.com/apache/iceberg/pull/9252
>
> Regards,
>
> Rahil Chertara
> Amazon EMR & Athena
> rcher...@amazon.com


-- 
Ryan Blue
Tabular
