jtmzheng commented on issue #2995:
URL: https://github.com/apache/hudi/issues/2995#issuecomment-853465482


   Update: I was able to run the metadata commands from the CLI and check 
`list-partitions` and `list-files`. Every partition appears to be present, but 
spot-checking some partitions makes it clear that not all the files are there:
   
   ```
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of log files scanned => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: MaxMemoryInBytes allowed for compaction => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of entries in MemoryBasedMap in ExternalSpillableMap => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Total size in bytes of MemoryBasedMap in ExternalSpillableMap => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Number of entries in DiskBasedMap in ExternalSpillableMap => 0
   21/06/03 00:10:32 INFO log.HoodieMergedLogRecordScanner: Size of file spilled to disk => 0
   21/06/03 00:10:32 INFO metadata.HoodieBackedTableMetadata: Opened metadata log files from [] at instant 20210524060712(dataset instant=20210524060712, metadata instant=20201216222013)
   21/06/03 00:10:32 INFO compress.CodecPool: Got brand-new decompressor [.gz]
   21/06/03 00:10:32 INFO metadata.HoodieBackedTableMetadata: Metadata read for key 2020/9/4 took [open, baseFileRead, logMerge] [94, 71, 0] ms
   21/06/03 00:10:32 INFO metadata.BaseTableMetadata: Listed file in partition from metadata: partition=2020/9/4, #files=121
   
   .f8a8f054-6d0e-43e8-9412-3dfec79d7d53-0_20210509151344.log.1_10765-6377-69863781
   .f75e9845-a3a2-4b41-b59a-8effc2ee049a-0_20210429061318.log.1_12603-1679-22268308
   .f745a448-31fa-4a95-a3c4-5f88f6c95bb2-0_20210515000505.log.2_11136-1920-26157888
   .f745a448-31fa-4a95-a3c4-5f88f6c95bb2-0_20210515000505.log.1_12428-158-1954195
   ...
   ```
   
   A listing of that partition shows 1,539 files, versus the 121 reported from 
the metadata table above.
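
To pin down exactly which files the metadata table is missing, the two listings can be diffed by file name. The following is only a rough sketch, not part of Hudi: it assumes both listings have already been saved as plain text, one entry per line (the metadata `list-files` output and a separate listing of the partition directory), and the function names `base_names` and `diff_listings` are hypothetical helpers introduced here for illustration.

```python
from pathlib import PurePosixPath


def base_names(lines):
    """Collect the file-name component of each non-empty listing line."""
    names = set()
    for line in lines:
        entry = line.strip()
        # Skip blank padding lines and the "..." elision marker.
        if not entry or entry == "...":
            continue
        # Keep only the final path component so full paths and bare names compare equal.
        names.add(PurePosixPath(entry).name)
    return names


def diff_listings(metadata_lines, filesystem_lines):
    """Return (missing_from_metadata, missing_from_filesystem) as sorted lists."""
    meta = base_names(metadata_lines)
    fs = base_names(filesystem_lines)
    return sorted(fs - meta), sorted(meta - fs)
```

Calling `diff_listings` with the lines of the two saved listings returns the names present in the partition listing but absent from the metadata listing, and vice versa. Comparing by base name is a deliberate simplification, since the two sources may print paths with different prefixes.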


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

