[ https://issues.apache.org/jira/browse/KAFKA-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17930807#comment-17930807 ]

Hasil Sharma commented on KAFKA-18753:
--------------------------------------

> The remote index cache size is set to 512MB. We began with the default 1 GB,
> but that resulted in too many files and Kafka started to run out of file
> descriptors.

We increased the cache size to 1GB. That reduces how often the error occurs
but does not stop it. Could there be a race condition between the remote index
cache purging an index and an attempt to read that same index as part of some
command?
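If the crash is the race asked about above, the usual guard is to make purging a cached index wait for in-flight readers, and to have readers check whether the entry was already purged before touching its buffer. This matters because reading a memory-mapped buffer after it has been unmapped faults in native code (a SIGSEGV in the JVM) instead of throwing a Java exception. The sketch below is a minimal illustration of that pattern, assuming one read/write lock per cached entry. `GuardedIndexEntry`, `readInt`, and `cleanup` are hypothetical names, not Kafka's `RemoteIndexCache` API, and a heap `ByteBuffer` stands in for the memory-mapped index file.

```java
import java.nio.ByteBuffer;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical cached index entry (not Kafka's API). Readers hold the read lock;
// cleanup takes the write lock, so it cannot run while any read is in progress.
final class GuardedIndexEntry {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private ByteBuffer buffer; // stands in for the MappedByteBuffer of an index file
    private boolean cleanedUp = false;

    GuardedIndexEntry(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    // Returns the 4-byte int stored at the given slot, or empty if the entry was purged.
    Optional<Integer> readInt(int slot) {
        lock.readLock().lock();
        try {
            if (cleanedUp) {
                return Optional.empty(); // never touch a buffer that may already be unmapped
            }
            return Optional.of(buffer.getInt(slot * Integer.BYTES));
        } finally {
            lock.readLock().unlock();
        }
    }

    // Called on cache eviction. Blocks until in-flight readers finish, then releases the buffer.
    void cleanup() {
        lock.writeLock().lock();
        try {
            cleanedUp = true;
            buffer = null; // real code would unmap the file here; a raw read after this could crash the JVM
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A reference count that defers unmapping until the last reader releases the entry is a common alternative that avoids blocking eviction behind long reads.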

 

We re-ran the Kafka process with additional -XX flags to identify the exact
line that caused the fatal error, and found the following:
{code:java}
J 6032 c2 java.nio.DirectByteBuffer.getInt(I)I java.base@17.0.14 (28 bytes) @ 0x00007927ad2f80f1 [0x00007927ad2f80a0+0x0000000000000051]
j  org.apache.kafka.storage.internals.log.OffsetIndex.relativeOffset(Ljava/nio/ByteBuffer;I)I+5
j  org.apache.kafka.storage.internals.log.OffsetIndex.parseEntry(Ljava/nio/ByteBuffer;I)Lorg/apache/kafka/storage/internals/log/OffsetPosition;+11
j  org.apache.kafka.storage.internals.log.OffsetIndex.parseEntry(Ljava/nio/ByteBuffer;I)Lorg/apache/kafka/storage/internals/log/IndexEntry;+3
j  org.apache.kafka.storage.internals.log.AbstractIndex.binarySearch(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;II)I+30
j  org.apache.kafka.storage.internals.log.AbstractIndex.indexSlotRangeFor(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;)I+126
j  org.apache.kafka.storage.internals.log.AbstractIndex.smallestUpperBoundSlotFor(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;)I+8
j  org.apache.kafka.storage.internals.log.OffsetIndex.lambda$fetchUpperBoundOffset$2(Lorg/apache/kafka/storage/internals/log/OffsetPosition;I)Ljava/util/Optional;+20
J 36910 c2 kafka.log.remote.RemoteLogManager.read(Lorg/apache/kafka/storage/internals/log/RemoteStorageFetchInfo;)Lorg/apache/kafka/storage/internals/log/FetchDataInfo; (624 bytes) @ 0x00007927af7d1190 [0x00007927af7cff60+0x0000000000001230]
J 37034 c2 kafka.log.remote.RemoteLogReader.call()Ljava/lang/Void; (262 bytes) @ 0x00007927af82a2e4 [0x00007927af82a1a0+0x0000000000000144]
J 27891% c2 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@17.0.14 (187 bytes) @ 0x00007927ae93cf4c [0x00007927ae93c740+0x000000000000080c]
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@17.0.14
j  java.lang.Thread.run()V+11 java.base@17.0.14
v  ~StubRoutines::call_stub
V  [libjvm.so+0x85aed4]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x334
V  [libjvm.so+0x85c9bc]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x20c
V  [libjvm.so+0x91ce50]  thread_entry(JavaThread*, JavaThread*)+0x70
V  [libjvm.so+0xee3e37]  JavaThread::run()+0x127
V  [libjvm.so+0xee6f61]  Thread::call_run()+0xa1
V  [libjvm.so+0xc54d33]  thread_native_entry(Thread*)+0xe3
C  [libc.so.6+0x9caa4]
{code}

The in-depth error log is attached as hs_err_pid1507409-redacted.log.

> Enabling S3 Tiered Storage Causes: A fatal error has been detected by the Java Runtime Environment
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-18753
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18753
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.8.1
>         Environment: Current:
> Linux 6.8.0-1021-aws #23-Ubuntu SMP Mon Dec  9 23:59:34 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
> OpenJDK Runtime Environment Corretto-17.0.14.7.1 (17.0.14+7) (build 17.0.14+7-LTS)
>            Reporter: Hasil Sharma
>            Priority: Major
>         Attachments: hs_err_pid1507409-redacted.log, hs_err_pid2775295 - redacted full.log
>
>
> Allowing brokers to upload to S3 as part of the S3 Tiered Storage rollout
> occasionally results in errors like the following:
> {code:java}
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x000075a38ea42564, pid=2775295, tid=2901446
> #
> # JRE version: OpenJDK Runtime Environment Corretto-17.0.14.7.1 (17.0.14+7) (build 17.0.14+7-LTS)
> # Java VM: OpenJDK 64-Bit Server VM Corretto-17.0.14.7.1 (17.0.14+7-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
> # Problematic frame:
> # J 26432 c2 org.apache.kafka.storage.internals.log.AbstractIndex.binarySearch(Ljava/nio/ByteBuffer;JLorg/apache/kafka/storage/internals/log/IndexSearchType;Lorg/apache/kafka/storage/internals/log/AbstractIndex$SearchResultType;II)I (161 bytes) @ 0x000075a38ea42564 [0x000075a38ea421c0+0x00000000000003a4]
> #
> # Core dump will be written. Default location: Core dumps may be processed with "/usr/local/bin/crash-handler -b '%e' -m 1 -d /pay/crash -p '%u.%p.%t' -P '%P'" (or dumping to /pay/deploy/kafka-brokers-kafkapub-northwest-green/deploy-1737677684489251978/core.2775295)
> #
> # If you would like to submit a bug report, please visit:
> #   https://github.com/corretto/corretto-17/issues/
> # {code}
>
> We ran into a similar error with JDK 11 and upgraded to JDK 17, but the error
> has persisted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
