Maurits Hartman created KAFKA-7867:
--------------------------------------

             Summary: Broker fails after corrupted page table
                 Key: KAFKA-7867
                 URL: https://issues.apache.org/jira/browse/KAFKA-7867
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 2.0.0
         Environment: openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment (build 11.0.1+13-Ubuntu-2ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.1+13-Ubuntu-2ubuntu1, mixed mode, sharing)

Ubuntu 18.10 (Cosmic)
Linux kernel 4.18.0-13
4x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
8GB RAM


            Reporter: Maurits Hartman


We noticed that one of the brokers in our cluster had stopped working: it was 
no longer leader for any of its partitions, nor was it following the other 
brokers' partitions. The Kafka process itself still appeared to be running, 
though (its log was still being appended to).

In the systemd journal we found a "corrupted page table" kernel error that had 
occurred a couple of hours earlier:
{code:java}
Jan 24 03:22:07 kafka9 kernel: kafka-request-h: Corrupted page table at address 8402805f8
Jan 24 03:22:07 kafka9 kernel: Bad pagetable: 000d [#1] SMP PTI
Jan 24 03:22:07 kafka9 kernel: CPU: 3 PID: 3025 Comm: kafka-request-h Not tainted 4.18.0-13-generic #14-Ubuntu
Jan 24 03:22:07 kafka9 kernel: Hardware name: DigitalOcean Droplet, BIOS 20171212 12/12/2017
Jan 24 03:22:07 kafka9 kernel: RIP: 0033:0x7f8878c63f4f
Jan 24 03:22:07 kafka9 kernel: Code: 00 00 0f 83 fa 05 00 00 4d 89 9f 18 01 00 00 41 0f 0d 8b 00 01 00 00 bb a8 00 05 08 49 bb 00 00 00 00 08 00 00 00 4d 8d 1c db <4d> 8b 9b b8 00 00 00 4c 89 18 c7 40 08 a8 00 05 08 c7 40 0c 00 00
Jan 24 03:22:07 kafka9 kernel: RSP: 002b:00007f7fee2924b0 EFLAGS: 00010283
Jan 24 03:22:07 kafka9 kernel: RAX: 00000000f757ba48 RBX: 00000000080500a8 RCX: 00000000f757ba28
Jan 24 03:22:07 kafka9 kernel: RDX: 00000000f757b9b8 RSI: 00000000f757b9ec RDI: 00000000005596e0
Jan 24 03:22:07 kafka9 kernel: RBP: 0000000000000000 R08: 00000000d1d961e0 R09: 0000000000559600
Jan 24 03:22:07 kafka9 kernel: R10: 0000000000559600 R11: 0000000840280540 R12: 0000000000000000
Jan 24 03:22:07 kafka9 kernel: R13: 00000000cea8da78 R14: 0000000000000001 R15: 00007f8888ac3800
Jan 24 03:22:07 kafka9 kernel: FS: 00007f7fee293700 GS: 0000000000000000
Jan 24 03:22:07 kafka9 kernel: Modules linked in: isofs nls_iso8859_1 kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev input_leds serio_raw ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_ad
Jan 24 03:22:07 kafka9 kernel: crypto_simd cryptd glue_helper psmouse virtio_blk floppy virtio_net net_failover virtio_scsi failover
Jan 24 03:22:07 kafka9 kernel: ---[ end trace 23d8fa22733ef791 ]---
Jan 24 03:22:07 kafka9 kernel: RIP: 0033:0x7f8878c63f4f
Jan 24 03:22:07 kafka9 kernel: Code: 00 00 0f 83 fa 05 00 00 4d 89 9f 18 01 00 00 41 0f 0d 8b 00 01 00 00 bb a8 00 05 08 49 bb 00 00 00 00 08 00 00 00 4d 8d 1c db <4d> 8b 9b b8 00 00 00 4c 89 18 c7 40 08 a8 00 05 08 c7 40 0c 00 00
Jan 24 03:22:07 kafka9 kernel: RSP: 002b:00007f7fee2924b0 EFLAGS: 00010283
Jan 24 03:22:07 kafka9 kernel: RAX: 00000000f757ba48 RBX: 00000000080500a8 RCX: 00000000f757ba28
Jan 24 03:22:07 kafka9 kernel: RDX: 00000000f757b9b8 RSI: 00000000f757b9ec RDI: 00000000005596e0
Jan 24 03:22:07 kafka9 kernel: RBP: 0000000000000000 R08: 00000000d1d961e0 R09: 0000000000559600
Jan 24 03:22:07 kafka9 kernel: R10: 0000000000559600 R11: 0000000840280540 R12: 0000000000000000
Jan 24 03:22:07 kafka9 kernel: R13: 00000000cea8da78 R14: 0000000000000001 R15: 00007f8888ac3800
Jan 24 03:22:07 kafka9 kernel: FS: 00007f7fee293700(0000) GS:ffff97a29fd80000(0000) knlGS:0000000000000000
Jan 24 03:22:07 kafka9 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 03:22:07 kafka9 kernel: CR2: 00000008402805f8 CR3: 00000001f9f32006 CR4: 00000000007606e0
Jan 24 03:22:07 kafka9 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 24 03:22:07 kafka9 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 24 03:22:07 kafka9 kernel: PKRU: 55555554{code}
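For anyone checking their own journals for the same symptom, here is a rough sketch (Python) that pulls the faulting thread name and address out of kernel log lines like the ones above and pairs each fault with the next CR2 value logged. The helper name find_page_table_faults and both regular expressions are illustrative assumptions based only on the lines in this report; they are not part of Kafka, systemd, or the kernel, and other kernel versions may format these messages differently.

```python
import re

# Assumed shape of the first line of the fault, taken from this report:
# "... kernel: kafka-request-h: Corrupted page table at address 8402805f8"
FAULT_RE = re.compile(
    r"kernel: (?P<comm>\S+): Corrupted page table at address (?P<addr>[0-9a-f]+)"
)
# On x86-64, CR2 holds the virtual address that triggered the page fault.
CR2_RE = re.compile(r"\bCR2: (?P<cr2>[0-9a-f]{16})")


def find_page_table_faults(lines):
    """Return one dict per fault with keys comm, address, cr2 (cr2 may be None)."""
    faults = []
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            faults.append({
                "comm": match.group("comm"),
                "address": int(match.group("addr"), 16),
                "cr2": None,
            })
            continue
        match = CR2_RE.search(line)
        # Attach the first CR2 seen after a fault line to that fault.
        if match and faults and faults[-1]["cr2"] is None:
            faults[-1]["cr2"] = int(match.group("cr2"), 16)
    return faults


# Two lines copied from the journal excerpt in this report.
sample = [
    "Jan 24 03:22:07 kafka9 kernel: kafka-request-h: Corrupted page table at address 8402805f8",
    "Jan 24 03:22:07 kafka9 kernel: CR2: 00000008402805f8 CR3: 00000001f9f32006 CR4: 00000000007606e0",
]
faults = find_page_table_faults(sample)
```

The function accepts any iterable of strings, so it could also be fed the lines of a saved kernel journal (for example the output of journalctl -k written to a file). In the excerpt above, the address in the first message and the CR2 register agree.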
Broker configuration:
{code:java}
broker.id=9
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/mnt/volume_kafka9a/kafka,/mnt/volume_kafka9b/kafka,/mnt/volume_kafka9c/kafka
num.partitions=1
num.recovery.threads.per.data.dir=4
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=2
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=30
unclean.leader.election.enable=true
log.retention.hours=168
log.segment.bytes=8388608
log.retention.check.interval.ms=300000
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
default.replication.factor=2
delete.topic.enable=true{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
