The parquet module isn't required at compile time, only at test runtime.
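
To illustrate the point (this sketch is not from the thread): a core-module test can use a module that is present only on the test runtime classpath, for example through Gradle's `testRuntimeOnly` configuration, by loading it reflectively rather than referencing its classes at compile time. `RuntimeFormatLookup` and the class names used with it are hypothetical. They do not reproduce Iceberg's `InternalData` code.

```java
import java.util.Optional;

// Hypothetical sketch: resolve an implementation class by name at runtime so the
// calling module has no compile-time dependency on the module that provides it.
final class RuntimeFormatLookup {
  private RuntimeFormatLookup() {}

  // Returns the class if it is on the runtime classpath, otherwise empty.
  static Optional<Class<?>> find(String className) {
    try {
      return Optional.of(Class.forName(className));
    } catch (ClassNotFoundException e) {
      return Optional.empty();
    }
  }

  // Instantiates className through its no-arg constructor if the class is present
  // and implements the expected type. Returns empty if it is absent or of the wrong type.
  static <T> Optional<T> load(String className, Class<T> type) {
    return find(className)
        .filter(type::isAssignableFrom)
        .map(cls -> {
          try {
            return type.cast(cls.getDeclaredConstructor().newInstance());
          } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + className, e);
          }
        });
  }
}
```

In a core test, a call such as `RuntimeFormatLookup.load("org.example.parquet.HypotheticalParquetWriter", SomeCoreInterface.class)` (both names hypothetical) would return a value only when the parquet module is on the test runtime classpath. The core sources themselves would still compile without it.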

On Thu, Jun 12, 2025 at 8:37 PM Ajantha Bhat <ajanthab...@gmail.com> wrote:

> > All we have to do is add the parquet module as a test dependency,
> > working on a POC now.
>
> This will be a circular dependency on the core module. That's why I
> suggested abstracting out the test cases and executing them in a parquet
> module. Partition stats writing (as Parquet) from the core module uses
> `InternalData` and is handled the same way now. So I guess this will be
> similar work (but on a larger scale because of the test case refactoring).
>
> Let me know the results of your POC; I'm happy to collaborate on this work.
>
>
> - Ajantha
>
> On Fri, Jun 13, 2025 at 3:16 AM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> All we have to do is add the parquet module as a test dependency; I'm
>> working on a POC now. I don't think we really need to block on any other
>> projects, although I'll probably hold off on any work on the manifest list
>> since I hope it won't be needed.
>>
>> On Thu, May 29, 2025 at 8:37 PM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> I am interested in working on this proposal.
>>> I assume the approach is to use `InternalData` with the format set to
>>> `parquet`. The challenge will be the test cases: the core module cannot
>>> write Parquet metadata because that would create a circular dependency.
>>> I guess we need to abstract out the test cases in the core module and run
>>> them from the parquet module.
>>>
>>> I can work on a design doc as well, so please add me as a collaborator on
>>> the document.
>>> But should this work be done after we complete the "single file commit in
>>> v4" work, since the metadata structure could change?
>>>
>>> - Ajantha
>>>
>>> On Thu, May 29, 2025 at 11:37 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> Hi Y'all
>>>>
>>>> As discussed in the last community sync, we are beginning to gather up
>>>> folks who are interested in various efforts for Iceberg V4. To that end,
>>>> I'd like to use this thread as a gathering point for folks interested
>>>> in the metadata file format shift to Parquet. I wrote a quick abstract
>>>> to describe the purpose of this group.
>>>>
>>>> Following this, I'll be working on a full design document. If someone
>>>> already has one in prod, please let us know and we can start
>>>> discussing/working on it there.
>>>>
>>>> *Abstract: Parquet as Metadata File Format*
>>>>
>>>> Currently, the Iceberg SDK and spec use the Avro file format for all
>>>> manifest lists and manifests. The row-oriented format was selected
>>>> because it was assumed that most metadata would be read in its
>>>> entirety. This has seldom turned out to be the case, and the ability to
>>>> read single elements of the metrics would be very useful for query
>>>> planning. To address this, we propose switching the underlying manifest
>>>> format from Avro to Parquet. In V4, Avro files would still be readable,
>>>> but all new metadata files would be written in Parquet instead of Avro.
>>>>
>>>
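
As an illustration of the abstract's argument (not part of the original thread), the following toy Java model counts how many values a reader decodes when a planner needs one metric, such as a per-file record count, from every manifest entry. It assumes that in the row layout a field's position is known only after decoding the fields before it. This is a simplification: real Avro readers can skip some fields by their encoded length, and real Parquet readers also use column-chunk metadata and statistics. `ManifestLayoutDemo` and its methods are hypothetical and are not Avro or Parquet APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only, not the Avro or Parquet APIs. The single-element long[]
// passed as "decoded" is a mutable counter of values the reader touched.
final class ManifestLayoutDemo {
  private ManifestLayoutDemo() {}

  // Row-oriented: entries[i] holds all fields of entry i. Under the stated
  // assumption, every field of each row is decoded to reach the next row.
  static List<Long> readFromRows(long[][] entries, int field, long[] decoded) {
    List<Long> out = new ArrayList<>();
    for (long[] entry : entries) {
      for (int f = 0; f < entry.length; f++) {
        decoded[0]++;
        if (f == field) {
          out.add(entry[f]);
        }
      }
    }
    return out;
  }

  // Column-oriented: columns[f] holds field f for all entries, stored together,
  // so only the projected column is decoded.
  static List<Long> readFromColumns(long[][] columns, int field, long[] decoded) {
    List<Long> out = new ArrayList<>();
    for (long value : columns[field]) {
      decoded[0]++;
      out.add(value);
    }
    return out;
  }

  // Transposes row-major entries (assumed rectangular) into column-major storage.
  static long[][] toColumns(long[][] entries) {
    int fields = entries.length == 0 ? 0 : entries[0].length;
    long[][] columns = new long[fields][entries.length];
    for (int r = 0; r < entries.length; r++) {
      for (int f = 0; f < fields; f++) {
        columns[f][r] = entries[r][f];
      }
    }
    return columns;
  }
}
```

For two entries with three fields each, reading field 1 decodes all six values in the row layout but only two in the column layout. The gap grows with the number of fields per entry, which is why reading single metric elements favors a columnar format such as Parquet.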
