>
> I think the idea is quite neat -- as I understand your PR basically
> implements a change to the parquet writer that can efficiently detect
> duplication in the data and thus avoid storing it multiple times. Thank you
> for sharing it
I might be misunderstanding (I only looked at the code briefly), but
I think the idea is quite neat -- as I understand it, your PR basically
implements a change to the parquet writer that can efficiently detect
duplication in the data and thus avoid storing it multiple times. Thank you
for sharing it.
One comment I have is that I found the name "Content Defined Chunking