I agree

Cheers,
Tushar Choudhary


On Wed, 13 Aug 2025 at 10:39 PM, Kevin Liu <kevinjq...@apache.org> wrote:

> Hey everyone,
>
> As discussed on the community sync today, there's enough interest around
> AAL, S3FileIO, and FileIO in general that we would like to schedule an ad
> hoc sync for this topic.
>
> I'll work with Michael to find a suitable time. We'll add it to the "Iceberg
> Dev Events"
> <https://iceberg.apache.org/community/#apache-iceberg-community-calendar> and
> also post on the devlist when we figure out more details.
>
> Best,
> Kevin Liu
>
> On Mon, Aug 11, 2025 at 8:00 AM Kevin Liu <kevinjq...@apache.org> wrote:
>
>> Let's bring this up in the next community sync on Wed 8/13 and see if it
>> warrants another ad hoc sync about FileIO.
>>
>> Best,
>> Kevin Liu
>>
>> On Wed, Aug 6, 2025 at 1:21 PM Stubbs, Michael
>> <michs...@amazon.co.uk.invalid> wrote:
>>
>>> I think Kevin has raised a few good points on the future of FileIO and
>>> the maintainability of the project going forward with the AAL default-on
>>> proposal.
>>>
>>> I think we should schedule a community sync about this.
>>>
>>> Thank you!
>>>
>>>
>>>
>>> On 2025/07/31 13:12:48 Steve Loughran wrote:
>>>
>>> > On Fri, 25 Jul 2025 at 17:28, Kevin Liu <ke...@apache.org> wrote:
>>>
>>> >
>>>
>>> > *> I think it would be great to also make these improvements available
>>> > to older Iceberg clients.*
>>>
>>> >
>>>
>>> > Use the S3A connector and turn on vector reads through Parquet and you
>>> > currently get the same performance, about a 30% speedup in TPC benchmarks
>>> > (I know, but what else do we have?). The S3A connector is going to move to
>>> > making the AAL input stream the default in a future release because it's a
>>> > better architecture overall.
>>>
>>> >
>>>
>>> > The vector IO stuff can also speed things up on Azure and ABFS if their
>>> > connectors support it (oh, and local fs too, FWIW). Scatter/gather IO for
>>> > the win.
>>>
>>> >
>>>
>>> > What AAL adds is format awareness, which could allow for extra
>>> > opportunities.
>>>
>>> >
>>>
>>> > As an aside, it'd be really good if FileIO added an overloaded
>>> > newInputFile() API call which passed in the file type too, so that AAL &c
>>> > would know what type to optimise for, rather than just guessing from the
>>> > extension. Knowing what the v1 and maybe v2 schema offsets are would save
>>> > that GET request on the footer. AAL does a GET of a range at the bottom
>>> > with the goal of including the schema, but knowing the exact range would
>>> > be better.
>>>
>>> >
>>>
>>> > *> BTW, we have one-off community syncs about specific topics, I would be
>>> > interested to talk more about this as well as other FileIOs. We use the
>>> > "Iceberg Dev Events" calendar for scheduling if there's interest.*
>>>
>>> >
>>>
>>> > I'd like that too.
>>>
>>
