Thanks for identifying this issue and bringing it up here, Cheng. That's really appreciated!

@Russell Spitzer <russell.spit...@gmail.com> I do think it is an issue. If I understand it correctly, there are certain cases in Parquet 1.14.3 where we may not be reading the dictionary filter in its entirety, leading to correctness issues/data loss.

Concretely, I see we'd exercise that path in ParquetDictionaryRowGroupFilter when calling dictionaries.readDictionaryPage, and in ParquetUtil#readDictionaryPage, but someone else please double-check my read of it.
I'd advocate for going back to Parquet 1.13.1 and cutting another release?

Thanks,
Amogh Jahagirdar

On Mon, Nov 4, 2024 at 8:57 AM Russell Spitzer <russell.spit...@gmail.com> wrote:

> We are currently including 1.14.3 as a build dependency, is that an issue?
>
> On Sun, Nov 3, 2024 at 12:47 PM Cheng Pan <pan3...@gmail.com> wrote:
>
>> FYI, I just identified a Parquet data loss issue (newly introduced in
>> 1.14.0), and I confirmed it affects the Spark use cases, I’m not sure if it
>> also affects Iceberg cases.
>>
>> https://github.com/apache/parquet-java/issues/3040
>>
>> Thanks,
>> Cheng Pan
>>
>> On Oct 31, 2024, at 06:06, Russell Spitzer <russell.spit...@gmail.com>
>> wrote:
>>
>> Hey Y'all,
>>
>> I propose that we release the following RC as the official Apache Iceberg
>> 1.7.0 release.
>>
>> The commit ID is 91e04c9c88b63dc01d6c8e69dfdc8cd27ee811cc
>> * This corresponds to the tag: apache-iceberg-1.7.0-rc0
>> * https://github.com/apache/iceberg/commits/apache-iceberg-1.7.0-rc0
>> * https://github.com/apache/iceberg/tree/91e04c9c88b63dc01d6c8e69dfdc8cd27ee811cc
>>
>> The release tarball, signature, and checksums are here:
>> * https://dist.apache.org/repos/dist/dev/iceberg/apache-iceberg-1.7.0-rc0
>>
>> You can find the KEYS file here:
>> * https://dist.apache.org/repos/dist/dev/iceberg/KEYS
>>
>> Convenience binary artifacts are staged on Nexus. The Maven repository
>> URL is:
>> * https://repository.apache.org/content/repositories/orgapacheiceberg-<ID>/
>>
>> Please download, verify, and test.
>>
>> Please vote in the next 72 hours.
>>
>> [ ] +1 Release this as Apache Iceberg 1.7.0
>> [ ] +0
>> [ ] -1 Do not release this because...
>>
>> Only PMC members have binding votes, but other community members are
>> encouraged to cast non-binding votes. This vote will pass if there are
>> 3 binding +1 votes and more binding +1 votes than -1 votes.