[ https://issues.apache.org/jira/browse/KAFKA-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930807#comment-17930807 ]
Hasil Sharma commented on KAFKA-18753:
--------------------------------------

> The remote index cache size is set to 512MB. We began with the default 1 GB,
> though that resulted in too many files and Kafka started to run out of file
> descriptors.

We increased the cache size to 1GB, which reduces the frequency of the error but does not stop it. Could there be a race condition between the remote index cache purge and an attempt to read the index as part of ~some command?

We re-ran the Kafka process with additional -XX flags to identify the exact line that caused the fatal error and found the following:

{code:java}
J 6032 c2 java.nio.DirectByteBuffer.getInt(I)I java.base@17.0.14 (28 bytes) @ 0x00007927ad2f80f1 [0x00007927ad2f80a0+0x0000000000000051]
j org.apache.kafka.storage.internals.log.OffsetIndex.relativeOffset(Ljava/nio/ByteBuffer;I)I+5
j org.apache.kafka.storage.internals.log.OffsetIndex.parseEntry(Ljava/nio/ByteBuffer;I)Lorg/apache/kafka/storage/internals/log/OffsetPosition;+11
j org.apache.kafka.storage.internals.log.OffsetIndex.parseEntry(Ljava/nio/ByteBuffer;I)Lorg/apache/kafka/storage/internals/log/IndexEntry;+3
j org.apache.kafka.storage.internals.log.AbstractIndex.binarySearch(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;II)I+30
j org.apache.kafka.storage.internals.log.AbstractIndex.indexSlotRangeFor(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;)I+126
j org.apache.kafka.storage.internals.log.AbstractIndex.smallestUpperBoundSlotFor(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;)I+8
j org.apache.kafka.storage.internals.log.OffsetIndex.lambda$fetchUpperBoundOffset$2(Lorg/apache/kafka/storage/internals/log/OffsetPosition;I)Ljava/util/Optional;+20
J 36910 c2 kafka.log.remote.RemoteLogManager.read(Lorg/apache/kafka/storage/internals/log/RemoteStorageFetchInfo;)Lorg/apache/kafka/storage/internals/log/FetchDataInfo; (624 bytes) @ 0x00007927af7d1190 [0x00007927af7cff60+0x0000000000001230]
J 37034 c2 kafka.log.remote.RemoteLogReader.call()Ljava/lang/Void; (262 bytes) @ 0x00007927af82a2e4 [0x00007927af82a1a0+0x0000000000000144]
J 27891% c2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@17.0.14 (187 bytes) @ 0x00007927ae93cf4c [0x00007927ae93c740+0x000000000000080c]
j java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@17.0.14
j java.lang.Thread.run()V+11 java.base@17.0.14
v ~StubRoutines::call_stub
V [libjvm.so+0x85aed4] JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x334
V [libjvm.so+0x85c9bc] JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x20c
V [libjvm.so+0x91ce50] thread_entry(JavaThread*, JavaThread*)+0x70
V [libjvm.so+0xee3e37] JavaThread::run()+0x127
V [libjvm.so+0xee6f61] Thread::call_run()+0xa1
V [libjvm.so+0xc54d33] thread_native_entry(Thread*)+0xe3
C [libc.so.6+0x9caa4]
{code}

Attached the in-depth error log as part of hs_err_pid1507409-redacted.log file.
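
To make the race hypothesis concrete, here is a minimal sketch of one way a lookup and a cache eviction could be kept from overlapping. It assumes the crash comes from a reader touching an index buffer after the cache has released it, which this report has not confirmed. `GuardedIndex`, `lookup`, and `close` are hypothetical names, not Kafka classes or methods, and a heap buffer stands in for the memory-mapped index file so the sketch runs anywhere. The 8-byte entry layout (4-byte relative offset, 4-byte position) mirrors the offset index format.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration only; not Kafka's OffsetIndex or RemoteIndexCache. */
final class GuardedIndex {
    // Each entry: 4-byte relative offset followed by 4-byte file position.
    private static final int ENTRY_SIZE = 8;

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final int entries;
    private ByteBuffer buffer; // stands in for a memory-mapped index file

    GuardedIndex(int[][] offsetAndPosition) {
        entries = offsetAndPosition.length;
        buffer = ByteBuffer.allocate(entries * ENTRY_SIZE);
        for (int[] e : offsetAndPosition) {
            buffer.putInt(e[0]).putInt(e[1]);
        }
    }

    /** Returns the position of the last entry whose offset is <= target, or -1 if none. */
    int lookup(int targetOffset) {
        lock.readLock().lock();
        try {
            if (buffer == null) {
                // Fail cleanly instead of reading released memory.
                throw new IllegalStateException("index already closed");
            }
            int lo = 0, hi = entries - 1, found = -1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                if (buffer.getInt(mid * ENTRY_SIZE) <= targetOffset) {
                    found = mid;
                    lo = mid + 1;
                } else {
                    hi = mid - 1;
                }
            }
            return found < 0 ? -1 : buffer.getInt(found * ENTRY_SIZE + 4);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Eviction path: waits for in-flight lookups, then releases the buffer. */
    void close() {
        lock.writeLock().lock();
        try {
            buffer = null; // a real mapped buffer would be unmapped here
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With this shape, a lookup that loses the race sees a closed index and throws `IllegalStateException` rather than reading through a stale buffer. Whether the broker's index and cache code already provide an equivalent guarantee on the eviction path is the open question in this ticket.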

> Enabling S3 Tiered Storage Causes: A fatal error has been detected by the Java Runtime Environment
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-18753
> URL: https://issues.apache.org/jira/browse/KAFKA-18753
> Project: Kafka
> Issue Type: Bug
> Components: Tiered-Storage
> Affects Versions: 3.8.1
> Environment: Current:
>   Linux 6.8.0-1021-aws #23-Ubuntu SMP Mon Dec 9 23:59:34 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
>   OpenJDK Runtime Environment Corretto-17.0.14.7.1 (17.0.14+7) (build 17.0.14+7-LTS)
> Reporter: Hasil Sharma
> Priority: Major
> Attachments: hs_err_pid1507409-redacted.log, hs_err_pid2775295 - redacted full.log
>
>
> Allowing brokers to upload to S3 as part of S3 Tiered Storage rollout
> (occasionally) results in errors shaped as below:
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # SIGSEGV (0xb) at pc=0x000075a38ea42564, pid=2775295, tid=2901446
> #
> # JRE version: OpenJDK Runtime Environment Corretto-17.0.14.7.1 (17.0.14+7) (build 17.0.14+7-LTS)
> # Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (17.0.14+7-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # J 26432 c2 org.apache.kafka.storage.internals.log.AbstractIndex.binarySearch(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;II)I (161 bytes) @ 0x000075a38ea42564 [0x000075a38ea421c0+0x00000000000003a4]
> #
> # Core dump will be written. Default location: Core dumps may be processed with "/usr/local/bin/crash-handler -b '%e' -m 1 -d /pay/crash -p '%u.%p.%t' -P '%P'" (or dumping to /pay/deploy/kafka-brokers-kafkapub-northwest-green/deploy-1737677684489251978/core.2775295)
> #
> # If you would like to submit a bug report, please visit:
> # https://github.com/corretto/corretto-17/issues/
> #
> {code}
>
> We ran into similar error with jdk11 and upgraded to jdk17, though the error
> has not stopped.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)