parisni opened a new issue, #5767:
URL: https://github.com/apache/hudi/issues/5767

   hudi 0.11.0
   spark 3.2.1
   
   I turned off automatic cleaning because its runtime was increasing linearly. Now, once there are >= 50 S3 log files to merge, I get an S3 timeout on one of them. The file exists and is not corrupted.
   
   ```
   {"hoodie.clean.async", "false"},
   {"hoodie.clean.automatic", "false"},
   ```
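
   With automatic cleaning disabled, cleaning still has to be triggered out of band. A minimal sketch of doing that from a Spark job, assuming the 0.11 write-client API (the base path and table name are placeholders, not values from this issue):

   ```java
   import org.apache.hudi.client.SparkRDDWriteClient;
   import org.apache.hudi.client.common.HoodieSparkEngineContext;
   import org.apache.hudi.config.HoodieWriteConfig;
   import org.apache.spark.api.java.JavaSparkContext;
   import org.apache.spark.sql.SparkSession;

   public class ManualClean {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder().appName("hudi-manual-clean").getOrCreate();
       JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

       // Placeholder base path / table name; cleaner policy settings would be
       // supplied the same way as in the regular writer config.
       HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder()
           .withPath("s3a://path_to_table")
           .forTable("my_table")
           .build();

       SparkRDDWriteClient client =
           new SparkRDDWriteClient(new HoodieSparkEngineContext(jsc), cfg);
       client.clean();  // schedule and execute one cleaning pass
       client.close();
     }
   }
   ```

   The standalone `org.apache.hudi.utilities.HoodieCleaner` utility from the hudi-utilities bundle can also be submitted as a separate Spark job for the same purpose.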
   
   ```
   32269086 [Driver] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Scanning log file HoodieLogFile{pathStr='s3a://path_to_table/.hoodie/metadata/files/.files-0000_00000000000000.log.47_6-1125-177677', fileLen=-1}
   32269086 [Driver] INFO  org.apache.hadoop.fs.s3a.S3AInputStream  - Switching to Random IO seek policy
   32269108 [Driver] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Reading a data block from file s3a://path_to_table/.hoodie/metadata/files/.files-0000_00000000000000.log.47_6-1125-177677 at instant 20220606083314495
   32269108 [Driver] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Merging the final data blocks
   32269108 [Driver] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Number of remaining logblocks to merge 48
   32269129 [Driver] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Number of remaining logblocks to merge 47
   32269178 [Driver] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]
   32269178 [Driver] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]
   32269178 [Driver] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]
   32269178 [Driver] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]
   32269178 [Driver] INFO  org.apache.hudi.common.util.collection.ExternalSpillableMap  - Estimated Payload size => 5128
   32269178 [Driver] INFO  org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Number of remaining logblocks to merge 46
   32269226 [Driver] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]
   32269226 [Driver] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]
   32269226 [Driver] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]
   32269226 [Driver] INFO  org.apache.hadoop.io.compress.CodecPool  - Got brand-new decompressor [.gz]

   ....

   35354466 [Driver] ERROR org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader  - Got exception when reading log file
   org.apache.hudi.exception.HoodieIOException: unable to initialize read with log file
           at org.apache.hudi.common.table.log.HoodieLogFormatReader.hasNext(HoodieLogFormatReader.java:113)
           at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:223)
           at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:192)
           at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:110)
           at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:103)
           at org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader.<init>(HoodieMetadataMergedLogRecordReader.java:63)
           at org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader.<init>(HoodieMetadataMergedLogRecordReader.java:51)
           at org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader$Builder.build(HoodieMetadataMergedLogRecordReader.java:230)
           at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:499)
           at org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:461)
           at org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:407)
           at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getOrCreateReaders$10(HoodieBackedTableMetadata.java:393)
           at java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
           at org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:393)
           at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$0(HoodieBackedTableMetadata.java:202)
           at java.util.HashMap.forEach(HashMap.java:1290)
           at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:200)
           at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:140)
           at org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:281)
           at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:111)
           at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:313)
           at org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:176)
           at org.apache.hudi.BaseHoodieTableFileIndex.loadPartitionPathFiles(BaseHoodieTableFileIndex.java:219)
           at org.apache.hudi.BaseHoodieTableFileIndex.doRefresh(BaseHoodieTableFileIndex.java:264)
           at org.apache.hudi.BaseHoodieTableFileIndex.<init>(BaseHoodieTableFileIndex.java:139)
           at org.apache.hudi.SparkHoodieTableFileIndex.<init>(SparkHoodieTableFileIndex.scala:68)
           at org.apache.hudi.HoodieFileIndex.<init>(HoodieFileIndex.scala:81)
           at org.apache.hudi.HoodieBaseRelation.fileIndex$lzycompute(HoodieBaseRelation.scala:191)
           at org.apache.hudi.HoodieBaseRelation.fileIndex(HoodieBaseRelation.scala:189)
           at org.apache.hudi.BaseFileOnlyRelation.toHadoopFsRelation(BaseFileOnlyRelation.scala:146)
           at org.apache.hudi.DefaultSource.resolveBaseFileOnlyRelation(DefaultSource.scala:228)
           at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:113)
           at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:66)
           at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
           at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
           at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:243)
           at scala.Option.map(Option.scala:230)
           at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
   ```
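
   Per the stack trace, the read fails while merging the metadata table's `files` partition log blocks inside `getAllPartitionPaths`. A possible read-side workaround (a sketch only, not a fix; the path is the placeholder from the logs) is to skip metadata-based listing for the query:

   ```java
   import org.apache.spark.sql.Dataset;
   import org.apache.spark.sql.Row;
   import org.apache.spark.sql.SparkSession;

   public class MetadataDisabledRead {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder().appName("hudi-read").getOrCreate();
       // Skip the metadata table on read so partition listing falls back to
       // direct storage listing instead of merging .hoodie/metadata log files.
       Dataset<Row> df = spark.read()
           .format("hudi")
           .option("hoodie.metadata.enable", "false")
           .load("s3a://path_to_table");
       df.show(5);
     }
   }
   ```

   This trades metadata-based listing for a direct (slower) S3 listing, so it should only sidestep the failing log merge rather than address the underlying growth of metadata log files.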

