Hi Evelina,

Please check whether the following compatibility and support matrices help:
Spark Scala Version Compatibility Matrix: https://community.cloudera.com/t5/Community-Articles/Spark-Scala-Version-Compatibility-Matrix/ta-p/383713
Spark and Java versions Supportability Matrix: https://community.cloudera.com/t5/Community-Articles/Spark-and-Java-versions-Supportability-Matrix/ta-p/383669
Spark Python Supportability Matrix: https://community.cloudera.com/t5/Community-Articles/Spark-Python-Supportability-Matrix/ta-p/379144

regards,
Guru

On Wed, Oct 30, 2024 at 10:03 PM Evelina Dumitrescu <evelina.dumitrescu....@gmail.com> wrote:
>
> Yes, we use org.apache.hbase.connectors.spark:hbase-spark:1.0.0.7.2.16.0-287
>
> On Wed, Oct 30, 2024 at 3:30 PM Gurunandan <gurunandan....@gmail.com> wrote:
>>
>> Hi Evelina,
>> Do you use the Spark HBase Connector (hbase-spark) as part of the unit-test setup?
>>
>> regards,
>> Guru
>>
>> On Wed, Oct 30, 2024 at 5:35 PM Evelina Dumitrescu
>> <evelina.dumitrescu....@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > TL;DR: The question is also asked here:
>> > https://stackoverflow.com/questions/79139516/incompatible-configuration-used-between-spark-and-hbasetestingutility
>> >
>> > We are using the MiniDFSCluster and MiniHBaseCluster from
>> > HBaseTestingUtility to run unit tests for our Spark jobs.
>> > The Spark configuration that we use is:
>> >
>> > conf.set("spark.sql.catalogImplementation", "hive")
>> >   .set("spark.sql.warehouse.dir", getWarehousePath)
>> >   .set("javax.jdo.option.ConnectionURL",
>> >     s"jdbc:derby:;databaseName=$getMetastorePath;create=true")
>> >   .set("shark.test.data.path", dataFilePath)
>> >   .set("hive.exec.dynamic.partition.mode", "nonstrict")
>> >   .set("spark.kryo.registrator", "CustomKryoRegistrar")
>> >   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>> >   .registerKryoClasses(Array(classOf[org.apache.hadoop.hbase.client.Result]))
>> >
>> > For the MiniDFSCluster and MiniHBaseCluster we use the default
>> > HBaseTestingUtility configuration.
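[For context, a minimal sketch of what such a test harness typically looks like when it relies on the stock HBaseTestingUtility defaults, as described above. The HBaseTestingUtility and SparkSession calls are the public APIs; the object name, helper method, and path parameters are illustrative placeholders, not taken from this thread.]

```scala
import org.apache.hadoop.hbase.HBaseTestingUtility
import org.apache.spark.sql.SparkSession

// Hypothetical test fixture: starts the mini clusters with default config
// and builds a Hive-enabled SparkSession with settings like those quoted above.
object MiniClusterFixture {
  val htu = new HBaseTestingUtility()

  def start(): Unit = {
    // startMiniCluster() brings up the MiniDFSCluster, MiniZooKeeperCluster
    // and MiniHBaseCluster using HBaseTestingUtility's default configuration.
    htu.startMiniCluster()
  }

  def stop(): Unit = htu.shutdownMiniCluster()

  // warehouseDir / metastoreDir are placeholders for the paths the test provides.
  def sparkSession(warehouseDir: String, metastoreDir: String): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.catalogImplementation", "hive")
      .config("spark.sql.warehouse.dir", warehouseDir)
      .config("javax.jdo.option.ConnectionURL",
        s"jdbc:derby:;databaseName=$metastoreDir;create=true")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .enableHiveSupport()
      .getOrCreate()
}
```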
>> > The release versions that we use are:
>> > - hbase-testing-util Cloudera CDP 2.4.6.7.2.16.0-287
>> > - Spark 2.11
>> >
>> > In our unit tests, when we try to run a Spark job that reads Hive data,
>> > we get the following exception:
>> >
>> > ```
>> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 1 times, most recent failure: Lost task 0.0 in stage 14.0 (TID 14, localhost, executor driver): java.lang.UnsupportedOperationException: Byte-buffer read unsupported by org.apache.hadoop.fs.BufferedFSInputStream
>> >     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:158)
>> >     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:154)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:546)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:516)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:510)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:459)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:371)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.footerFileMetaData$1(ParquetFileFormat.scala:370)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:374)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:352)
>> >     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
>> >     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
>> >     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>> >     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown Source)
>> >     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>> >     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>> >     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:645)
>> >     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:270)
>> >     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:262)
>> >     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
>> >     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
>> >     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
>> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
>> >     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
>> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
>> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>> >     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>> >     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:456)
>> >     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
>> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:462)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> >     at java.lang.Thread.run(Thread.java:750)
>> >
>> > Driver stacktrace:
>> >     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1935)
>> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1923)
>> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1922)
>> >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>> >     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1922)
>> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:953)
>> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:953)
>> >     at scala.Option.foreach(Option.scala:257)
>> >     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:953)
>> >     ...
>> > Cause: java.lang.UnsupportedOperationException: Byte-buffer read unsupported by org.apache.hadoop.fs.BufferedFSInputStream
>> >     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:158)
>> >     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:154)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:546)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:516)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:510)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:459)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:371)
>> > ```
>> >
>> > Is there an incompatible configuration between Spark, the MiniDFSCluster, and the MiniHBaseCluster?
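[A hedged diagnostic sketch, not a fix from this thread: BufferedFSInputStream is the stream used by Hadoop's RawLocalFileSystem, so a first step is to confirm which FileSystem implementation actually backs the warehouse path that the Parquet footer read goes through. The helper name is illustrative; the FileSystem, Path and SparkSession calls are standard APIs.]

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Prints the FileSystem class that resolves for the configured warehouse dir,
// e.g. LocalFileSystem / RawLocalFileSystem (local paths, which can surface
// BufferedFSInputStream) vs. DistributedFileSystem (the MiniDFSCluster).
def reportWarehouseFs(spark: SparkSession): Unit = {
  val warehouse = new Path(spark.conf.get("spark.sql.warehouse.dir"))
  val fs = FileSystem.get(warehouse.toUri, spark.sparkContext.hadoopConfiguration)
  println(s"warehouse=$warehouse fs=${fs.getClass.getName}")
}
```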