[ https://issues.apache.org/jira/browse/HUDI-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451322#comment-17451322 ]

Manoj Govindassamy commented on HUDI-2894:
------------------------------------------

*Next Steps:*

{noformat}
Caused by: java.io.IOException: Positional read of 4387 bytes failed at offset 0 (returned 1626)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1382)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
        at org.apache.hudi.io.storage.HoodieHFileReader.getRecordByKey(HoodieHFileReader.java:252)
        at org.apache.hudi.io.storage.HoodieFileReader.getRecordByKey(HoodieFileReader.java:49)
        at org.apache.hudi.metadata.HoodieBackedTableMetadata.readFromBaseAndMergeWithLogRecords(HoodieBackedTableMetadata.java:195)
        at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:141)
        ... 42 more
{noformat}
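
For context on what this exception means: the HBase frame at the top of the trace (HFileBlock.readAtOffset) issues a single positional read and, judging by the message, treats a short read as a hard failure, even though Hadoop's PositionedReadable contract allows read() to return fewer bytes than requested. A minimal sketch of the looping variant, for illustration only (this is not HBase's code):
{code:java}
import java.io.EOFException;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustration: a positional read may legitimately return fewer bytes than
// asked for, so a tolerant caller loops until the buffer is full (this is
// essentially what FSDataInputStream.readFully() does).
class PreadUtil {
  static void preadFully(FSDataInputStream in, long pos, byte[] buf) throws IOException {
    int done = 0;
    while (done < buf.length) {
      int n = in.read(pos + done, buf, done, buf.length - done);
      if (n < 0) {
        throw new EOFException("Hit EOF at offset " + (pos + done));
      }
      done += n;
    }
  }
}
{code}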
 

 

1. Since HFile readBlockData is hitting an S3 IO exception, and since the HFile 
being read is 14K in size while the error is thrown on the first ~4K block read, 
this is most probably a transient network blip. A retry should let the read of 
the whole HFile succeed. So, let's run with the following property set, to see 
whether the IO exceptions can be retried safely:
hbase.bulkload.retries.retryOnIOException : true
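
A minimal sketch of wiring that flag into a Spark driver, assuming the "spark.hadoop." prefix is used so the property lands in the Hadoop Configuration the HFile reader sees (whether this particular HBase property actually covers this read path is exactly what the experiment should tell us):
{code:java}
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class RetryFlagExperiment {
  public static void main(String[] args) {
    // "spark.hadoop.*" entries are copied into the Hadoop Configuration
    // on the driver and the executors.
    SparkConf sparkConf = new SparkConf()
        .setAppName("hudi-hfile-read-retry-experiment")
        .set("spark.hadoop.hbase.bulkload.retries.retryOnIOException", "true");
    try (JavaSparkContext jsc = new JavaSparkContext(sparkConf)) {
      // Sanity-check that the flag landed in the Hadoop conf:
      boolean flag = jsc.hadoopConfiguration()
          .getBoolean("hbase.bulkload.retries.retryOnIOException", false);
      System.out.println("retryOnIOException = " + flag);
      // ... run the usual Hudi read/write workload here ...
    }
  }
}
{code}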
 

2. Run a large-scale, long-running integ test to recreate this problem 
consistently and validate any potential fixes (a standalone repro sketch is below).
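
Until the full-scale run is in place, here is a minimal standalone sketch that hammers the same single positional-read pattern HFileBlock uses against an HFile on s3a and logs any short reads. The path argument is a placeholder for illustration, not a file from this ticket:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Repro sketch: repeatedly issue one positional read of the first block,
// the same call shape as HFileBlock.readAtOffset, and report short reads.
public class S3PreadRepro {
  public static void main(String[] args) throws Exception {
    // args[0]: placeholder s3a:// path to a metadata-table HFile (hypothetical)
    Path hfile = new Path(args[0]);
    FileSystem fs = hfile.getFileSystem(new Configuration());
    byte[] buf = new byte[4387]; // same request size as in the failing trace
    for (int i = 0; i < 100000; i++) {
      try (FSDataInputStream in = fs.open(hfile)) {
        int n = in.read(0L, buf, 0, buf.length);
        if (n < buf.length) {
          System.err.println("attempt " + i + ": short read, got " + n + " bytes");
        }
      }
    }
  }
}
{code}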

> Metadata table read after compaction fails in S3
> ------------------------------------------------
>
>                 Key: HUDI-2894
>                 URL: https://issues.apache.org/jira/browse/HUDI-2894
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: sivabalan narayanan
>            Priority: Blocker
>             Fix For: 0.10.0
>
>
> Once compaction in the metadata table kicks in, subsequent reads fail (hunch 
> is that reading from the base HFile fails).
>  
> {code:java}
> 21/11/30 15:35:20 INFO HoodieTableMetaClient: Loading HoodieTableMetaClient from s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output
> 21/11/30 15:35:20 ERROR HoodieROTablePathFilter: Error checking path :s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output/1970/01/04/135ac18a-db3f-4bc1-b376-960fd85a44c1-0_0-326-3529_20211130153211490.parquet, under folder: s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output/1970/01/04
> org.apache.hudi.exception.HoodieMetadataException: Failed to retrieve files in partition s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output/1970/01/04 from metadata
>         at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:124)
>         at org.apache.hudi.metadata.HoodieMetadataFileSystemView.listPartition(HoodieMetadataFileSystemView.java:65)
>         at org.apache.hudi.common.table.view.AbstractTableFileSystemView.lambda$ensurePartitionLoadedCorrectly$9(AbstractTableFileSystemView.java:290)
>         at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
>         at org.apache.hudi.common.table.view.AbstractTableFileSystemView.ensurePartitionLoadedCorrectly(AbstractTableFileSystemView.java:281)
>         at org.apache.hudi.common.table.view.AbstractTableFileSystemView.getLatestBaseFiles(AbstractTableFileSystemView.java:449)
>         at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:194)
>         at org.apache.spark.sql.execution.datasources.PathFilterWrapper.accept(InMemoryFileIndex.scala:165)
>         at org.apache.spark.util.HadoopFSUtils$.$anonfun$listLeafFiles$8(HadoopFSUtils.scala:285)
>         at org.apache.spark.util.HadoopFSUtils$.$anonfun$listLeafFiles$8$adapted(HadoopFSUtils.scala:285)
>         at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:304)
>         at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
>         at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
>         at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
>         at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
>         at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:297)
>         at scala.collection.mutable.ArrayOps$ofRef.filterImpl(ArrayOps.scala:198)
>         at scala.collection.TraversableLike.filter(TraversableLike.scala:395)
>         at scala.collection.TraversableLike.filter$(TraversableLike.scala:395)
>         at scala.collection.mutable.ArrayOps$ofRef.filter(ArrayOps.scala:198)
>         at org.apache.spark.util.HadoopFSUtils$.listLeafFiles(HadoopFSUtils.scala:285)
>         at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$6(HadoopFSUtils.scala:136)
>         at scala.collection.immutable.Stream.map(Stream.scala:418)
>         at org.apache.spark.util.HadoopFSUtils$.$anonfun$parallelListLeafFilesInternal$4(HadoopFSUtils.scala:126)
>         at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:863)
>         at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:863)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>         at org.apache.spark.scheduler.Task.run(Task.scala:131)
>         at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hudi.exception.HoodieIOException: Error merging records from metadata table for  1 key : 
>         at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:146)
>         at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:124)
>         at org.apache.hudi.metadata.BaseTableMetadata.fetchAllFilesInPartition(BaseTableMetadata.java:184)
>         at org.apache.hudi.metadata.BaseTableMetadata.getAllFilesInPartition(BaseTableMetadata.java:122)
>         ... 39 more
> Caused by: java.io.IOException: Positional read of 4387 bytes failed at offset 0 (returned 1626)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1382)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
>         at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
>         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:584)
>         at org.apache.hudi.io.storage.HoodieHFileReader.getRecordByKey(HoodieHFileReader.java:252)
>         at org.apache.hudi.io.storage.HoodieFileReader.getRecordByKey(HoodieFileReader.java:49)
>         at org.apache.hudi.metadata.HoodieBackedTableMetadata.readFromBaseAndMergeWithLogRecords(HoodieBackedTableMetadata.java:195)
>         at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:141)
>         ... 42 more
> 21/11/30 15:35:20 ERROR Executor: Exception in task 6.0 in stage 587.0 (TID 6817)
> org.apache.hudi.exception.HoodieException: Error checking path :s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output/1970/01/04/135ac18a-db3f-4bc1-b376-960fd85a44c1-0_0-326-3529_20211130153211490.parquet, under folder: s3a://aws-logs-879955751789-us-west-1/infra-resources-dev/small/emr/home/hadoop/output/1970/01/04
>         at org.apache.hudi.hadoop.HoodieROTablePathFilter.accept(HoodieROTablePathFilter.java:230)
>         at org.apache.spark.sql.execution.datasources.PathFilterWrapper.accept(InMemoryFileIndex.scala:165)
>         at org.apache.spark.util.HadoopFSUtils$.$anonfun$listLeafFiles$8(HadoopFSUtils.scala:285)
>         at org.apache.spark.util.HadoopFSUtils$.$anonfun$listLeafFiles$8$adapted(HadoopFSUtils.scala:285)
>         at scala.collection.TraversableLike.$anonfun$filterImpl$1(TraversableLike.scala:304)
>         at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
> {code}


