Hi Evelina,

Please check whether the following compatibility and support matrices help:
Spark Scala Version Compatibility Matrix: https://community.cloudera.com/t5/Community-Articles/Spark-Scala-Version-Compatibility-Matrix/ta-p/383713
Spark and Java versions Supportability Matrix: https://community.cloudera.com/t5/Community-Articles/Spark-and-Java-versions-Supportability-Matrix/ta-p/383669
Spark Python Supportability Matrix: https://community.cloudera.com/t5/Community-Articles/Spark-Python-Supportability-Matrix/ta-p/379144

regards,
Guru

On Wed, Oct 30, 2024 at 10:03 PM Evelina Dumitrescu <evelina.dumitrescu....@gmail.com> wrote:
>
> Yes, we use org.apache.hbase.connectors.spark:hbase-spark:1.0.0.7.2.16.0-287
>
> On Wed, Oct 30, 2024 at 3:30 PM Gurunandan <gurunandan....@gmail.com> wrote:
>>
>> Hi Evelina,
>> Do you use the Spark HBase Connector (hbase-spark) as part of the unit-test setup?
>>
>> regards,
>> Guru
>>
>> On Wed, Oct 30, 2024 at 5:35 PM Evelina Dumitrescu
>> <evelina.dumitrescu....@gmail.com> wrote:
>> >
>> > Hello,
>> >
>> > TL;DR: The question is also asked here:
>> > https://stackoverflow.com/questions/79139516/incompatible-configuration-used-between-spark-and-hbasetestingutility
>> >
>> > We are using the MiniDFSCluster and MiniHBaseCluster from
>> > HBaseTestingUtility to run unit tests for our Spark jobs.
>> > The Spark configuration that we use is:
>> >
>> > conf.set("spark.sql.catalogImplementation", "hive")
>> >   .set("spark.sql.warehouse.dir", getWarehousePath)
>> >   .set("javax.jdo.option.ConnectionURL",
>> >     s"jdbc:derby:;databaseName=$getMetastorePath;create=true")
>> >   .set("shark.test.data.path", dataFilePath)
>> >   .set("hive.exec.dynamic.partition.mode", "nonstrict")
>> >   .set("spark.kryo.registrator", "CustomKryoRegistrar")
>> >   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>> >   .registerKryoClasses(Array(classOf[org.apache.hadoop.hbase.client.Result]))
>> >
>> > For the MiniDFSCluster and MiniHBaseCluster we use the default
>> > HBaseTestingUtility configuration.
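[For context, a minimal sketch of what such a test harness typically looks like when it relies on the stock HBaseTestingUtility defaults, as described above. The HBaseTestingUtility and SparkSession calls are the public APIs; the object name, helper method, and path parameters are illustrative placeholders, not taken from this thread.]

```scala
import org.apache.hadoop.hbase.HBaseTestingUtility
import org.apache.spark.sql.SparkSession

// Hypothetical test fixture: starts the mini clusters with default config
// and builds a Hive-enabled SparkSession with settings like those quoted above.
object MiniClusterFixture {
  val htu = new HBaseTestingUtility()

  def start(): Unit = {
    // startMiniCluster() brings up the MiniDFSCluster, MiniZooKeeperCluster
    // and MiniHBaseCluster using HBaseTestingUtility's default configuration.
    htu.startMiniCluster()
  }

  def stop(): Unit = htu.shutdownMiniCluster()

  // warehouseDir / metastoreDir are placeholders for the paths the test provides.
  def sparkSession(warehouseDir: String, metastoreDir: String): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.catalogImplementation", "hive")
      .config("spark.sql.warehouse.dir", warehouseDir)
      .config("javax.jdo.option.ConnectionURL",
        s"jdbc:derby:;databaseName=$metastoreDir;create=true")
      .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .enableHiveSupport()
      .getOrCreate()
}
```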
>> > The release versions that we use are:
>> > - hbase-testing-util Cloudera CDP 2.4.6.7.2.16.0-287
>> > - Spark 2.11
>> >
>> > In our unit tests, when we try to run a Spark job that reads Hive data,
>> > we get the following exception:
>> >
>> > ```
>> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 1 times, most recent failure: Lost task 0.0 in stage 14.0 (TID 14, localhost, executor driver): java.lang.UnsupportedOperationException: Byte-buffer read unsupported by org.apache.hadoop.fs.BufferedFSInputStream
>> >     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:158)
>> >     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:154)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:546)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:516)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:510)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:459)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:371)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.footerFileMetaData$1(ParquetFileFormat.scala:370)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:374)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.apply(ParquetFileFormat.scala:352)
>> >     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
>> >     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
>> >     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>> >     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown Source)
>> >     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>> >     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>> >     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:645)
>> >     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:270)
>> >     at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:262)
>> >     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
>> >     at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
>> >     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
>> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
>> >     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>> >     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
>> >     at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
>> >     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>> >     at org.apache.spark.scheduler.Task.run(Task.scala:123)
>> >     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$12.apply(Executor.scala:456)
>> >     at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
>> >     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:462)
>> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>> >     at java.lang.Thread.run(Thread.java:750)
>> >
>> > Driver stacktrace:
>> >     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1935)
>> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1923)
>> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1922)
>> >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>> >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>> >     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1922)
>> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:953)
>> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:953)
>> >     at scala.Option.foreach(Option.scala:257)
>> >     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:953)
>> >     ...
>> > Cause: java.lang.UnsupportedOperationException: Byte-buffer read unsupported by org.apache.hadoop.fs.BufferedFSInputStream
>> >     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:158)
>> >     at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:154)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream$H2Reader.read(H2SeekableInputStream.java:81)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:90)
>> >     at org.apache.parquet.hadoop.util.H2SeekableInputStream.readFully(H2SeekableInputStream.java:75)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:546)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:516)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:510)
>> >     at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:459)
>> >     at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReaderWithPartitionValues$1.footerFileMetaData$lzycompute$1(ParquetFileFormat.scala:371)
>> > ```
>> >
>> > Is there an incompatible configuration between Spark, the MiniDFSCluster, and the MiniHBaseCluster?
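[A hedged diagnostic sketch, not a fix from this thread: BufferedFSInputStream is the stream used by Hadoop's RawLocalFileSystem, so a first step is to confirm which FileSystem implementation actually backs the warehouse path that the Parquet footer read goes through. The helper name is illustrative; the FileSystem, Path and SparkSession calls are standard APIs.]

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

// Prints the FileSystem class that resolves for the configured warehouse dir,
// e.g. LocalFileSystem / RawLocalFileSystem (local paths, which can surface
// BufferedFSInputStream) vs. DistributedFileSystem (the MiniDFSCluster).
def reportWarehouseFs(spark: SparkSession): Unit = {
  val warehouse = new Path(spark.conf.get("spark.sql.warehouse.dir"))
  val fs = FileSystem.get(warehouse.toUri, spark.sparkContext.hadoopConfiguration)
  println(s"warehouse=$warehouse fs=${fs.getClass.getName}")
}
```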