I think Kevin has raised a few good points on the future of FileIO and the
maintainability of the project going forward with the AAL default-on proposal.
I think we should schedule a community sync about this.

Thank you!


On 2025/07/31 13:12:48 Steve Loughran wrote:
> On Fri, 25 Jul 2025 at 17:28, Kevin Liu <ke...@apache.org> wrote:
>
> *> I think it would be great to also make these improvements available to
> older Iceberg clients.*
>
> Use the S3A connector and turn on vector reads through Parquet and you
> currently get the same performance, about a 30% speedup in TPC benchmarks
> (I know, but what else do we have?). The S3A connector is going to move to
> making the AAL input stream the default in a future release because it's a
> better architecture overall.
>
> the vector IO stuff can also deliver speedups on Azure and abfs if their
> connectors support it. (oh and local fs too, FWIW). scatter/gather IO for
> the win.
>
> what AAL adds is format awareness, which could allow for extra
> opportunities.
>
> As an aside, it'd be really good if FileIO added an overloaded newInputFile()
> API call which passed in the file type too, so that AAL &c would know what
> type to optimise for, rather than just guess from the extension. Knowing what
> the v1 and maybe v2 schema offsets are would save that GET request on the
> footer. AAL does a GET of a range at the bottom with the goal of including
> the schema, but knowing the exact range would be better.
>
> *> BTW, we have one-off community syncs about specific topics, I would be
> interested to talk more about this as well as other FileIOs. We use the
> "Iceberg Dev Events" calendar for scheduling if there's interest.*
>
> I'd like that too.
>