Hadoop 3.2.x is the oldest of the Hadoop branch-3 release lines that still gets active security patches; one went out just last month. I would strongly recommend using it unless there are other compatibility issues (Hive?).
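For anyone wondering why this only bites on Java 11: hadoop 2.x frees the direct buffers behind its crypto streams by calling sun.nio.ch.DirectBuffer.cleaner(), and that method's return type changed from sun.misc.Cleaner to jdk.internal.ref.Cleaner in Java 9, so class files compiled against Java 8 fail to link with exactly the NoSuchMethodError quoted below. Here is a minimal sketch of the Java 9+ safe way to free a direct buffer; the class and method names are made up for illustration (this is not hadoop's actual CleanerUtil API), but it is roughly the approach HADOOP-12760 adopted:

import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;

/** Illustrative sketch only, not hadoop's real CleanerUtil. */
public final class DirectBufferFreer {

  // What hadoop 2.x effectively compiled in, which no longer links on
  // Java 9+ because cleaner() now returns jdk.internal.ref.Cleaner:
  //
  //   sun.misc.Cleaner c = ((sun.nio.ch.DirectBuffer) buf).cleaner();
  //   if (c != null) c.clean();

  // Java 9+ approach: call sun.misc.Unsafe.invokeCleaner(ByteBuffer)
  // through a MethodHandle, so nothing links against sun.misc.Cleaner.
  public static void free(ByteBuffer buf) throws Throwable {
    if (buf == null || !buf.isDirect()) {
      return;  // only direct buffers have native memory to release
    }
    Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
    Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
    theUnsafe.setAccessible(true);
    Object unsafe = theUnsafe.get(null);
    MethodHandle invokeCleaner = MethodHandles.lookup().findVirtual(
        unsafeClass, "invokeCleaner",
        MethodType.methodType(void.class, ByteBuffer.class));
    invokeCleaner.invoke(unsafe, buf);  // native memory is released here
  }
}

Note that Unsafe.invokeCleaner only exists from Java 9 onwards, so the real fix probes for it and falls back to the old cleaner() path on Java 8. Also, freeing a buffer that anything else still holds is a fast route to a JVM crash, which is why this is only ever done on buffers the caller owns. None of this is something you need to write yourself: upgrading to hadoop >= 3.2.0 gets you the fixed code paths.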
On Tue, 14 Jun 2022 at 05:31, Pralabh Kumar <pralabhku...@gmail.com> wrote:

> Hi Steve / Dev team
>
> Thanks for the help. A quick question: how can we fix the above error on
> Hadoop 3.1?
>
>    - The Spark docker file has Java 11:
>      https://github.com/apache/spark/blob/branch-3.2/resource-managers/kubernetes/docker/src/main/dockerfiles/spark/Dockerfile
>    - So if we build Spark 3.2, the Spark image will have Java 11, and if
>      we run on a Hadoop version less than 3.2 it will throw an exception.
>    - Should there be a separate docker file for Spark 3.2 with Java 8 for
>      Hadoop versions < 3.2? Spark 3.0.1 has Java 8 in its docker file,
>      which works fine in our environment (with Hadoop 3.1).
>
> Regards
> Pralabh Kumar
>
> On Mon, Jun 13, 2022 at 3:25 PM Steve Loughran <ste...@cloudera.com> wrote:
>
>> On Mon, 13 Jun 2022 at 08:52, Pralabh Kumar <pralabhku...@gmail.com> wrote:
>>
>>> Hi Dev team
>>>
>>> I have a spark32 image with Java 11 (running Spark on K8s). While
>>> reading a huge parquet file via spark.read.parquet("") I am getting the
>>> following error. The same error is mentioned in the Spark docs
>>> (https://spark.apache.org/docs/latest/#downloading), but w.r.t. Apache
>>> Arrow.
>>>
>>>    - IMHO the error is coming from Parquet 1.12.1, which is built
>>>      against Hadoop 2.10, which is not Java 11 compatible.
>>>
>> correct. see https://issues.apache.org/jira/browse/HADOOP-12760
>>
>>> Please let me know if this understanding is correct, and whether there
>>> is a way to fix it.
>>>
>> upgrade to a version of hadoop with the fix. That's any version >= hadoop
>> 3.2.0, which has been shipping since 2018.
>>
>>> java.lang.NoSuchMethodError: 'sun.misc.Cleaner sun.nio.ch.DirectBuffer.cleaner()'
>>>     at org.apache.hadoop.crypto.CryptoStreamUtils.freeDB(CryptoStreamUtils.java:41)
>>>     at org.apache.hadoop.crypto.CryptoInputStream.freeBuffers(CryptoInputStream.java:687)
>>>     at org.apache.hadoop.crypto.CryptoInputStream.close(CryptoInputStream.java:320)
>>>     at java.base/java.io.FilterInputStream.close(Unknown Source)
>>>     at org.apache.parquet.hadoop.util.H2SeekableInputStream.close(H2SeekableInputStream.java:50)
>>>     at org.apache.parquet.hadoop.ParquetFileReader.close(ParquetFileReader.java:1299)
>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:54)
>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFooterReader.readFooter(ParquetFooterReader.java:44)
>>>     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readParquetFootersInParallel$1(ParquetFileFormat.scala:467)
>>>     at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
>>>     at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
>>>     at scala.util.Success.$anonfun$map$1(Try.scala:255)
>>>     at scala.util.Success.map(Try.scala:213)
>>>     at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
>>>     at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
>>>     at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
>>>     at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
>>>     at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source)
>>>     at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source)
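Regarding the separate-Dockerfile question upthread: a second docker file should not be necessary, since the branch-3.2 Dockerfile linked above takes its base image as a build arg (an ARG java_image_tag line, if memory serves). Assuming the stock docker-image-tool.sh flags and an OpenJDK 8 base tag, something like the following should produce a Java 8 image for clusters still on Hadoop < 3.2; the registry and tag names here are placeholders:

./bin/docker-image-tool.sh \
  -r <your-registry>/spark -t 3.2-java8 \
  -b java_image_tag=8-jre-slim \
  build

That only sidesteps the cleaner bug, though; the durable fix remains the hadoop upgrade recommended above.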