Re: [DISCUSS] Iceberg Variant - Tracking Document & Sync Proposal

Neelesh Salian Thu, 30 Apr 2026 13:37:46 -0700

Hi folks,

I've scheduled a sync for the active Variant work, starting next week on
Thursday (May 7, 2026) at 10 AM Pacific time.
This will be a monthly sync (on the first Thursday of every month).
You can find it on the dev calendar.
Here is the calendar invite: https://calendar.app.google/b8ykdTV3EaNnVnkv8
I'll be recording the call and capturing notes in the sync document,
"Iceberg - Variant Community Update"
<https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?usp=sharing>
(Meeting Notes tab).
Thanks.

On Mon, Apr 20, 2026 at 1:49 PM Steve Loughran <[email protected]> wrote:

> Also, regarding the Rust, Go, and C++ implementations: a status update from
> each team would be great!
>
> I've been reviewing the Arrow Parquet variant work and it is all there,
> including some benchmarks and optimisations, which may put it ahead of the
> others.
>
> It also has some special handling for sorted variants, since key search
> there is straightforward. AFAIK the others don't do that, nor do I see them
> making any effort to sort fields in an object. I think sorting would be
> good, but you would have to handle the case where there are duplicate keys:
> they are allowed in the spec, and it seems they could creep in from nested
> variants. Has anyone looked at this?
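A minimal sketch of why sorted keys make lookup straightforward: with a sorted list of field names a reader can binary-search, otherwise it has to scan. This assumes field names have already been decoded into a Python list; `find_field` and its `is_sorted` flag are hypothetical and are not taken from arrow or any other implementation mentioned here. Returning the first match when duplicates exist is an assumed policy, not something stated in the thread.

```python
from bisect import bisect_left
from typing import List, Optional


def find_field(names: List[str], key: str, is_sorted: bool) -> Optional[int]:
    """Return the index of the first entry equal to `key`, or None if absent.

    `names` is a hypothetical, already-decoded list of field names.
    `is_sorted` is the caller's promise that `names` is in lexicographic
    order (Python compares str by code point, which matches UTF-8 byte order).
    """
    if is_sorted:
        # O(log n): bisect_left returns the leftmost insertion point, so when
        # duplicate keys are present the first occurrence is found.
        i = bisect_left(names, key)
        if i < len(names) and names[i] == key:
            return i
        return None
    # O(n) fallback for unsorted input; also returns the first match.
    for i, name in enumerate(names):
        if name == key:
            return i
    return None
```

For example, `find_field(["a", "b", "b", "c"], "b", True)` returns 1, the first of the two duplicates. An implementation could just as reasonably reject such input instead; that choice is exactly the duplicate-key question raised above.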
>
> Also: has anyone created malformed Parquet files with a shredded variant
> and a metadata entry of the same name? The requirement is "ignore the
> metadata one", but that's something to test. You'd have to write a shredded
> file and then edit the binary content to achieve this, or create one
> manually and put it into the parquet-testing repository under bad-data/.
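Before hand-editing a Parquet file, the expected behaviour can be pinned down as a unit test over an in-memory model. In this sketch, the shredded columns and the remaining unshredded fields are plain Python values, and `reconstruct_object` is a hypothetical helper. Treating the duplicate "metadata" entry as a residual field is an assumption. The precedence rule (the shredded value wins and the conflicting entry is ignored) comes from the thread and has not been checked against the spec text.

```python
from typing import Any, Dict, Iterable, Tuple


def reconstruct_object(shredded: Dict[str, Any],
                       residual: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Rebuild a variant object from shredded fields plus residual fields.

    Hypothetical in-memory model: `shredded` maps shredded column names to
    values; `residual` holds (name, value) pairs still stored in the
    unshredded variant. Assumed rule from the thread: on a name clash,
    the residual entry is ignored and the shredded value wins.
    """
    # Drop residual entries that collide with a shredded field.
    merged = {name: value for name, value in residual if name not in shredded}
    # Shredded values always take precedence.
    merged.update(shredded)
    return merged


def test_conflicting_entry_is_ignored() -> None:
    # Malformed input: "id" appears both as a shredded field and in the residual.
    shredded = {"id": 1}
    residual = [("id", 999), ("tag", "x")]
    assert reconstruct_object(shredded, residual) == {"id": 1, "tag": "x"}


if __name__ == "__main__":
    test_conflicting_entry_is_ignored()
```

The same assertion could later be run against each real reader once a malformed file exists under bad-data/.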
>
>
> On Mon, 20 Apr 2026 at 19:08, Qiegang Long <[email protected]> wrote:
>
>> Thanks for the doc to track the status! +1 on the dedicated sync; it
>> definitely feels like there’s a lot of work ahead before we see Variant’s
>> full potential.
>>
>> Qiegang
>>
>> On Mon, Apr 20, 2026 at 11:09 AM Steve Loughran <[email protected]>
>> wrote:
>>
>>>
>>> This is great; we need that tracker, as it is cross-project. piece of
>>> work to say "this is ready
>>>
>>> I did have an agenda item from last month's community call which didn't
>>> get through. If we can retain that open time slot, we could do a very quick
>>> summary of where we are (summary slides of Qiegang's results and mine, key
>>> outstanding issues, and next steps), then we can start that monthly session
>>> on it.
>>>
>>> Meanwhile, I have benchmark PRs in both Parquet and Iceberg which I think
>>> are ready for review; please take a look.
>>>
>>> Finally, I'm thinking about interop across the many, many variant readers
>>> out there. Has anyone explicitly cross-tested their variant
>>> implementations? What about consistent handling of invalid data? That
>>> includes iceberg-rust, parquet-cpp, and more...
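One way to approach the cross-testing question is a differential harness. It feeds the same sample files to every reader and flags any file where they disagree, including cases where one reader accepts input that another rejects. The minimal sketch below assumes each implementation can be wrapped as a Python callable from raw bytes to a decoded value. In practice that would mean language bindings or a subprocess per implementation, which is not shown. `run_reader` and `cross_check` are hypothetical names, and the two UTF-8 decoders are toy stand-ins for real variant readers.

```python
from typing import Any, Callable, Dict, List, Tuple

Reader = Callable[[bytes], Any]
Outcome = Tuple[str, Any]


def run_reader(read: Reader, data: bytes) -> Outcome:
    """Normalise one reader's behaviour to ("ok", value) or ("error", None)."""
    try:
        return ("ok", read(data))
    except Exception:
        # Only accept-vs-reject is compared; error types and messages differ
        # between implementations and are deliberately not compared.
        return ("error", None)


def cross_check(readers: Dict[str, Reader], samples: Dict[str, bytes]) -> List[str]:
    """Return the names of samples on which the readers do not all agree."""
    disagreements: List[str] = []
    for sample_name, data in samples.items():
        outcomes = [run_reader(read, data) for read in readers.values()]
        # Compare against the first outcome rather than using a set, because
        # decoded values (e.g. dicts) may not be hashable.
        if any(outcome != outcomes[0] for outcome in outcomes[1:]):
            disagreements.append(sample_name)
    return disagreements


if __name__ == "__main__":
    # Toy stand-ins for real variant readers: one strict, one lenient decoder.
    readers = {
        "strict": lambda d: d.decode("utf-8"),
        "lenient": lambda d: d.decode("utf-8", errors="replace"),
    }
    samples = {"good": b"abc", "bad": b"\xff"}
    print(cross_check(readers, samples))  # ['bad']
```

Comparing only accept-vs-reject for invalid data is a deliberate simplification. A stricter harness might also require readers to agree on which part of a file is invalid.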
>>>
>>> Steve
>>>
>>> On Sun, 19 Apr 2026 at 21:57, Neelesh Salian <[email protected]>
>>> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> The Variant umbrella issue (#10392
>>>> <https://github.com/apache/iceberg/issues/10392>) hasn't been updated
>>>> in a while, and with active work happening across multiple PRs in Iceberg,
>>>> Spark, and Parquet, it's been hard to keep track of where things stand.
>>>>
>>>> Since a few of us are actively working on variant features, I thought
>>>> it would help to put together a tracking document so the community has a
>>>> single place to see the current state, open work, and benchmark findings. I
>>>> plan to update it weekly to track the issues and PRs as they change.
>>>>
>>>> Iceberg Variant Community Document
>>>> <https://docs.google.com/document/d/1IuhLRxw1rcPD_f4jgHuGe3SwFgy7Y5wgEGvLzf6311s/edit?usp=sharing>
>>>>
>>>> The document has three tabs:
>>>>
>>>>    1. Overview - what shipped in 1.10, what's merged to main, open
>>>>    work areas, and the dependency graph across Iceberg, Spark, and Parquet
>>>>    2. Tracker - all open variant issues and PRs across Iceberg,
>>>>    Parquet-Java, Parquet-Format, and Spark with authors and status
>>>>    3. Benchmarks - summary of three independent benchmark efforts
>>>>    (details below)
>>>>
>>>> *Benchmark findings*
>>>>
>>>> Three independent benchmarks have measured variant performance. All
>>>> converge on the same picture: variant is a modest improvement over JSON
>>>> strings today (1.1-1.7x faster reads), but still 15-17x slower than typed
>>>> columns.
>>>>
>>>>    1. Qiegang Long - 14 queries on GitHub Archive, 5 configs:
>>>>    https://qlong.github.io/posts/2026-03-30-variant-early-results
>>>>    2. Steve Loughran - JMH microbenchmarks, profiler-driven
>>>>    optimization:
>>>>    https://steveloughran.github.io/benchmarking-variants/benchmarking-variants.html
>>>>    3. Neelesh Salian - Controlled baseline, 10M+100M rows, write +
>>>>    read:
>>>>    https://github.com/nssalian/iceberg/tree/iceberg-variant-benchmark/benchmark
>>>>
>>>> If you're working on variant-related changes, please chime in or let me
>>>> know and I'll add it to the tracker. Feedback on the benchmarks or anything
>>>> else is welcome.
>>>>
>>>> I've been giving variant updates during the Iceberg Spark Sync
>>>> (Tuesdays, 10 AM PT), but given that this work now spans Iceberg, Spark,
>>>> Parquet, and Flink, I think it deserves its own forum. I'd like to propose
>>>> a monthly Variant Sync: a short call where contributors can share progress,
>>>> surface blockers, and coordinate across repos. If there's interest, I'll
>>>> set one up and share an invite on this thread.
>>>>
>>>> Thanks,
>>>> Neelesh Salian.
>>>>
>>>
