What about JDK 8? If I remember correctly, Spark 2 was what kept us on it. Do we want to consider switching to JDK 11 for releases?
- Anton

> On Apr 20, 2023, at 2:10 AM, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>
> Thanks all for the response, much appreciated.
>
> That said, I'd love to hear from more people on this. I think it would be
> great to drop support, but I don't know how many people still use it. Is
> upgrading Hadoop a good reason to drop support for an engine? Hadoop seems
> like a minor concern to me unless it is blocking something.
>
> I noticed that we needed to bump Hadoop when we wanted to upgrade to Parquet
> 1.13.0 <https://github.com/apache/iceberg/pull/7301>. It would be nice to get
> this in since it allows for removing a workaround from the Iceberg codebase
> (see PR for details).
>
> Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are actively migrating
> to Spark-3.x and Iceberg 1.1 (or later). I do not anticipate us using
> Spark-2.4.4 with newer versions of Iceberg (>0.9).
>
> For Spark 2.4, Iceberg up to 1.2.1 is available:
> https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-2.4
>
> As for the Hadoop upgrade, I think that could be problematic for us if
> there's any non-backwards compatible API change required at compile time
> since we're still running a 2.8.x version.
>
> Thanks for raising this. I took some time today to dig into this. There is an
> effort to upgrade Hadoop <https://github.com/apache/iceberg/pull/5024> in
> Iceberg, but that's stuck on incompatibilities with Tez. Unfortunately,
> Parquet 1.13.0
> <https://github.com/apache/iceberg/actions/runs/4740904793/jobs/8417296190?pr=7301>
> doesn't compile against Hadoop 2.8.5, and bringing back support for Hadoop
> 2.8.x is going to be hard <https://github.com/apache/parquet-mr/pull/1075>.
> For Parquet, I've created a PR to run the CI against Hadoop 2.9.2
> <https://github.com/apache/parquet-mr/pull/1076> so we know when we're
> breaking compatibility.
>
> TLDR: It looks like if we want to upgrade Parquet, and other libraries in the
> future, we need to drop Hadoop 2. I'm hesitant to do that right now because
> we might exclude users that are still on older versions of Hadoop (such as
> Airbnb). Spark has announced that Hadoop 2 support will be dropped in Spark 3.5
> <https://lists.apache.org/thread/vr6bx2bmkgo4mhdspjm9g29h2c3lmrrz>. I'll
> create a PR for removing Spark 2.4 shortly because I see a consensus for
> removing that.
>
> Kind regards,
> Fokko
>
> On Wed, Apr 19, 2023 at 19:02, Anton Okolnychyi
> <aokolnyc...@apple.com.invalid> wrote:
> Yes, yes, yes!
>
> - Anton
>
>> On Apr 19, 2023, at 8:17 AM, Ryan Blue <b...@tabular.io> wrote:
>>
>> Sounds like we have consensus for removing Spark 2.4.
>>
>> Thanks, everyone!
>>
>> On Wed, Apr 19, 2023 at 12:36 AM Ajantha Bhat <ajanthab...@gmail.com> wrote:
>> +1,
>> Spark-2.4 has reached EOL
>> (https://lists.apache.org/thread/tdk7r5gx3nwrds3fg7qmp5h2jnqgc6tb and
>> https://spark.apache.org/versioning-policy.html)
>>
>> Thanks,
>> Ajantha
>>
>> On Wed, Apr 19, 2023 at 3:52 AM Edgar Rodriguez
>> <edgar.rodrig...@airbnb.com.invalid> wrote:
>> I'm generally +1 on dropping Spark 2.4 - almost everyone is moving to Spark
>> 3.x, if they haven't already.
>>
>> As for the Hadoop upgrade, I think that could be problematic for us if
>> there's any non-backwards compatible API change required at compile time
>> since we're still running a 2.8.x version.
>>
>> Cheers,
>>
>> On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang <hongyue_zh...@apple.com.invalid> wrote:
>> +1 for dropping Spark 2.4 support, and we can clean up the docs as well, such as
>> https://iceberg.apache.org/docs/latest/spark-queries/#spark-24
>>
>> Thanks,
>> Steve Zhang
>>
>>> On Apr 13, 2023, at 12:53 PM, Jack Ye <yezhao...@gmail.com> wrote:
>>>
>>> +1 for dropping 2.4 support
>>
>> --
>> Edgar R
>> Data Warehouse Infrastructure
>>
>> --
>> Ryan Blue
>> Tabular