parisni opened a new issue, #5767:
URL: https://github.com/apache/hudi/issues/5767
Hudi 0.11.0
Spark 3.2.1
I turned off automatic cleaning because its runtime was increasing linearly. Now, once there are >= 50 S3 log files to merge, I get an S3 timeout while reading the metadata log file.
That file exists and is not corrupted.
```
{"hoodie.clean.async", "false"},
{"hoodie.clean.automatic", "false"},
```
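For reference, these same settings might be passed as write options in PySpark. This is a minimal sketch, not the exact code from the report: the table name and path are placeholders, and it assumes an existing SparkSession `spark` and DataFrame `df` with the Hudi bundle on the classpath.

```python
# Hudi write options disabling automatic/async cleaning, mirroring the
# settings quoted above. Table name and path are hypothetical placeholders.
hudi_options = {
    "hoodie.table.name": "my_table",      # placeholder
    "hoodie.clean.async": "false",
    "hoodie.clean.automatic": "false",
}

# df.write.format("hudi").options(**hudi_options).mode("append").save("s3a://path_to_table")
```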
```
32269086 [Driver] INFO
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader - Scanning log
file
HoodieLogFile{pathStr='s3a://path_to_table/.hoodie/metadata/files/.files-0000_00000000000000.log.47_6-1125-177677',
fileLen=-1}
32269086 [Driver] INFO org.apache.hadoop.fs.s3a.S3AInputStream - Switching
to Random IO seek policy
32269108 [Driver] INFO
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader - Reading a
data block from file
s3a://path_to_table/.hoodie/metadata/files/.files-0000_00000000000000.log.47_6-1125-177677
at instant 20220606083314495
32269108 [Driver] INFO
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader - Merging the
final data blocks
32269108 [Driver] INFO
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader - Number of
remaining logblocks to merge 48
32269129 [Driver] INFO
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader - Number of
remaining logblocks to merge 47
32269178 [Driver] INFO org.apache.hadoop.io.compress.CodecPool - Got
brand-new decompressor [.gz]
32269178 [Driver] INFO org.apache.hadoop.io.compress.CodecPool - Got
brand-new decompressor [.gz]
32269178 [Driver] INFO org.apache.hadoop.io.compress.CodecPool - Got
brand-new decompressor [.gz]
32269178 [Driver] INFO org.apache.hadoop.io.compress.CodecPool - Got
brand-new decompressor [.gz]
32269178 [Driver] INFO
org.apache.hudi.common.util.collection.ExternalSpillableMap - Estimated
Payload size => 5128
32269178 [Driver] INFO
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader - Number of
remaining logblocks to merge 46
32269226 [Driver] INFO org.apache.hadoop.io.compress.CodecPool - Got
brand-new decompressor [.gz]
32269226 [Driver] INFO org.apache.hadoop.io.compress.CodecPool - Got
brand-new decompressor [.gz]
32269226 [Driver] INFO org.apache.hadoop.io.compress.CodecPool - Got
brand-new decompressor [.gz]
32269226 [Driver] INFO org.apache.hadoop.io.compress.CodecPool - Got
brand-new decompressor [.gz]
....
35354466 [Driver] ERROR
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader - Got exception
when reading log file
org.apache.hudi.exception.HoodieIOException: unable to initialize read with
log file
at
org.apache.hudi.common.table.log.HoodieLogFormatReader.hasNext(HoodieLogFormatReader.java:113)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:223)
at
org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:192)
at
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:110)
at
org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:103)
at
org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader.<init>(HoodieMetadataMergedLogRecordReader.java:63)
at
org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader.<init>(HoodieMetadataMergedLogRecordReader.java:51)
at
org.apache.hudi.metadata.HoodieMetadataMergedLogRecordReader$Builder.build(HoodieMetadataMergedLogRecordReader.java:230)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:499)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getLogRecordScanner(HoodieBackedTableMetadata.java:461)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.openReaders(HoodieBackedTableMetadata.java:407)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getOrCreateReaders$10(HoodieBackedTableMetadata.java:393)
at
java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1660)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getOrCreateReaders(HoodieBackedTableMetadata.java:393)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$0(HoodieBackedTableMetadata.java:202)
at java.util.HashMap.forEach(HashMap.java:1290)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:200)
at
org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:140)
at
org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:281)
at
org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:111)
at
org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:313)
at
org.apache.hudi.BaseHoodieTableFileIndex.getAllQueryPartitionPaths(BaseHoodieTableFileIndex.java:176)
at
org.apache.hudi.BaseHoodieTableFileIndex.loadPartitionPathFiles(BaseHoodieTableFileIndex.java:219)
at
org.apache.hudi.BaseHoodieTableFileIndex.doRefresh(BaseHoodieTableFileIndex.java:264)
at
org.apache.hudi.BaseHoodieTableFileIndex.<init>(BaseHoodieTableFileIndex.java:139)
at
org.apache.hudi.SparkHoodieTableFileIndex.<init>(SparkHoodieTableFileIndex.scala:68)
at
org.apache.hudi.HoodieFileIndex.<init>(HoodieFileIndex.scala:81)
at
org.apache.hudi.HoodieBaseRelation.fileIndex$lzycompute(HoodieBaseRelation.scala:191)
at
org.apache.hudi.HoodieBaseRelation.fileIndex(HoodieBaseRelation.scala:189)
at
org.apache.hudi.BaseFileOnlyRelation.toHadoopFsRelation(BaseFileOnlyRelation.scala:146)
at
org.apache.hudi.DefaultSource.resolveBaseFileOnlyRelation(DefaultSource.scala:228)
at
org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:113)
at
org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:66)
at
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:350)
at
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:274)
at
org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:243)
at scala.Option.map(Option.scala:230)
at
org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
```
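The bottom frames of the stack trace (`DataFrameReader.load` -> `DefaultSource.createRelation` -> `HoodieFileIndex`) suggest the failure is triggered by a plain Spark read, when partition listing goes through the metadata table. A hypothetical minimal reproduction, assuming a running SparkSession `spark` with the Hudi bundle available, might look like:

```python
# Inferred from the stack trace, not confirmed by the reporter: a simple read
# of the table forces FSUtils.getAllPartitionPaths to scan the metadata table's
# log files, which is where the S3 timeout occurs.
table_path = "s3a://path_to_table"  # placeholder path, matching the log output

# df = spark.read.format("hudi").load(table_path)
# df.count()  # triggers partition listing via the metadata table
```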
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]