Great question. I don't have a good idea of who is still on JDK 8. Maybe we should start another thread?

On Thu, Apr 20, 2023 at 1:05 PM Anton Okolnychyi <aokolnyc...@apple.com.invalid> wrote:

> What about JDK 8? If I remember correctly, Spark 2 was holding us back. Do
> we want to consider switching to JDK 11 for releases?
>
> - Anton
>
> On Apr 20, 2023, at 2:10 AM, Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
> Thanks all for the response, much appreciated.
>
>> That said, I'd love to hear from more people on this. I think it would be
>> great to drop support, but I don't know how many people still use it. Is
>> upgrading Hadoop a good reason to drop support for an engine? Hadoop seems
>> like a minor concern to me unless it is blocking something.
>
> I noticed that we needed to bump Hadoop when we wanted to upgrade to
> Parquet 1.13.0 <https://github.com/apache/iceberg/pull/7301>. It would be
> nice to get this in since it allows for removing a workaround from the
> Iceberg codebase (see PR for details).
>
>> Netflix is still on Spark-2.4.4 with Iceberg-0.9. We are
>> actively migrating to Spark-3.x and Iceberg 1.1 (or later). I do not
>> anticipate us using Spark-2.4.4 with newer versions of Iceberg (>0.9).
>
> For Spark 2.4, Iceberg up to 1.2.1 is available:
> https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark-2.4
>
>> As for the Hadoop upgrade, I think that could be problematic for us if
>> there's any non-backwards compatible API change required at compile time
>> since we're still running a 2.8.x version.
>
> Thanks for raising this. I took some time today to dig into this. There is
> an effort to upgrade Hadoop <https://github.com/apache/iceberg/pull/5024> in
> Iceberg, but that's stuck on incompatibilities with Tez. Unfortunately,
> Parquet 1.13.0
> <https://github.com/apache/iceberg/actions/runs/4740904793/jobs/8417296190?pr=7301>
> doesn't compile against Hadoop 2.8.5, and bringing back support for
> Hadoop 2.8.x is going to be hard
> <https://github.com/apache/parquet-mr/pull/1075>. For Parquet, I've created
> a PR to run the CI against Hadoop 2.9.2
> <https://github.com/apache/parquet-mr/pull/1076> so we know when we're
> breaking compatibility.
>
> TLDR: It looks like if we want to upgrade Parquet, and other libraries in
> the future, we need to drop Hadoop 2. I'm hesitant to do that right now
> because we might exclude users that are still on older versions of Hadoop
> (such as Airbnb). Spark has announced that Hadoop 2 will be dropped in
> Spark 3.5 <https://lists.apache.org/thread/vr6bx2bmkgo4mhdspjm9g29h2c3lmrrz>.
> I'll create a PR for removing Spark 2.4 shortly because I see a consensus
> for removing that.
>
> Kind regards,
> Fokko
>
> On Wed, Apr 19, 2023 at 19:02, Anton Okolnychyi
> <aokolnyc...@apple.com.invalid> wrote:
>
>> Yes, yes, yes!
>>
>> - Anton
>>
>> On Apr 19, 2023, at 8:17 AM, Ryan Blue <b...@tabular.io> wrote:
>>
>> Sounds like we have consensus for removing Spark 2.4.
>>
>> Thanks, everyone!
>>
>> On Wed, Apr 19, 2023 at 12:36 AM Ajantha Bhat <ajanthab...@gmail.com>
>> wrote:
>>
>>> +1,
>>> Spark-2.4 has reached EOL (
>>> https://lists.apache.org/thread/tdk7r5gx3nwrds3fg7qmp5h2jnqgc6tb and
>>> https://spark.apache.org/versioning-policy.html)
>>>
>>> Thanks,
>>> Ajantha
>>>
>>> On Wed, Apr 19, 2023 at 3:52 AM Edgar Rodriguez
>>> <edgar.rodrig...@airbnb.com.invalid> wrote:
>>>
>>>> I'm generally +1 on dropping Spark 2.4 - mostly everyone is moving to
>>>> Spark 3.x, if not already moved.
>>>>
>>>> As for the Hadoop upgrade, I think that could be problematic for us if
>>>> there's any non-backwards compatible API change required at compile time
>>>> since we're still running a 2.8.x version.
>>>>
>>>> Cheers,
>>>>
>>>> On Mon, Apr 17, 2023 at 3:50 PM Steve Zhang
>>>> <hongyue_zh...@apple.com.invalid> wrote:
>>>>
>>>>> +1 for dropping Spark 2.4 support, and we can clean up docs as well,
>>>>> such as https://iceberg.apache.org/docs/latest/spark-queries/#spark-24
>>>>>
>>>>> Thanks,
>>>>> Steve Zhang
>>>>>
>>>>> On Apr 13, 2023, at 12:53 PM, Jack Ye <yezhao...@gmail.com> wrote:
>>>>>
>>>>> +1 for dropping 2.4 support
>>>>
>>>> --
>>>> Edgar R
>>>> Data Warehouse Infrastructure
>>
>> --
>> Ryan Blue
>> Tabular

--
Ryan Blue
Tabular