Maurits Hartman created KAFKA-7867:
--------------------------------------

Summary: Broker fails after corrupted page table
Key: KAFKA-7867
URL: https://issues.apache.org/jira/browse/KAFKA-7867
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 2.0.0
Environment: openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment (build 11.0.1+13-Ubuntu-2ubuntu1)
OpenJDK 64-Bit Server VM (build 11.0.1+13-Ubuntu-2ubuntu1, mixed mode, sharing)
Ubuntu 18.10 (Cosmic)
Linux kernel 4.18.0-13
4x Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
8GB RAM
Reporter: Maurits Hartman

We noticed that one of the brokers in our cluster was down: it was no longer the leader for its partitions, nor was it following the other broker's partitions. However, the Kafka process apparently was still running (its log was still being appended to).

In the systemd journal we found a "corrupted page table" kernel error that had occurred a couple of hours earlier:
{code:java}
Jan 24 03:22:07 kafka9 kernel: kafka-request-h: Corrupted page table at address 8402805f8
Jan 24 03:22:07 kafka9 kernel: Bad pagetable: 000d [#1] SMP PTI
Jan 24 03:22:07 kafka9 kernel: CPU: 3 PID: 3025 Comm: kafka-request-h Not tainted 4.18.0-13-generic #14-Ubuntu
Jan 24 03:22:07 kafka9 kernel: Hardware name: DigitalOcean Droplet, BIOS 20171212 12/12/2017
Jan 24 03:22:07 kafka9 kernel: RIP: 0033:0x7f8878c63f4f
Jan 24 03:22:07 kafka9 kernel: Code: 00 00 0f 83 fa 05 00 00 4d 89 9f 18 01 00 00 41 0f 0d 8b 00 01 00 00 bb a8 00 05 08 49 bb 00 00 00 00 08 00 00 00 4d 8d 1c db <4d> 8b 9b b8 00 00 00 4c 89 18 c7 40 08 a8 00 05 08 c7 40 0c 00 00
Jan 24 03:22:07 kafka9 kernel: RSP: 002b:00007f7fee2924b0 EFLAGS: 00010283
Jan 24 03:22:07 kafka9 kernel: RAX: 00000000f757ba48 RBX: 00000000080500a8 RCX: 00000000f757ba28
Jan 24 03:22:07 kafka9 kernel: RDX: 00000000f757b9b8 RSI: 00000000f757b9ec RDI: 00000000005596e0
Jan 24 03:22:07 kafka9 kernel: RBP: 0000000000000000 R08: 00000000d1d961e0 R09: 0000000000559600
Jan 24 03:22:07 kafka9 kernel: R10: 0000000000559600 R11: 0000000840280540 R12: 0000000000000000
Jan 24 03:22:07 kafka9 kernel: R13: 00000000cea8da78 R14: 0000000000000001 R15: 00007f8888ac3800
Jan 24 03:22:07 kafka9 kernel: FS: 00007f7fee293700 GS: 0000000000000000
Jan 24 03:22:07 kafka9 kernel: Modules linked in: isofs nls_iso8859_1 kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev input_leds serio_raw ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_tcpudp xt_ad
Jan 24 03:22:07 kafka9 kernel: crypto_simd cryptd glue_helper psmouse virtio_blk floppy virtio_net net_failover virtio_scsi failover
Jan 24 03:22:07 kafka9 kernel: ---[ end trace 23d8fa22733ef791 ]---
Jan 24 03:22:07 kafka9 kernel: RIP: 0033:0x7f8878c63f4f
Jan 24 03:22:07 kafka9 kernel: Code: 00 00 0f 83 fa 05 00 00 4d 89 9f 18 01 00 00 41 0f 0d 8b 00 01 00 00 bb a8 00 05 08 49 bb 00 00 00 00 08 00 00 00 4d 8d 1c db <4d> 8b 9b b8 00 00 00 4c 89 18 c7 40 08 a8 00 05 08 c7 40 0c 00 00
Jan 24 03:22:07 kafka9 kernel: RSP: 002b:00007f7fee2924b0 EFLAGS: 00010283
Jan 24 03:22:07 kafka9 kernel: RAX: 00000000f757ba48 RBX: 00000000080500a8 RCX: 00000000f757ba28
Jan 24 03:22:07 kafka9 kernel: RDX: 00000000f757b9b8 RSI: 00000000f757b9ec RDI: 00000000005596e0
Jan 24 03:22:07 kafka9 kernel: RBP: 0000000000000000 R08: 00000000d1d961e0 R09: 0000000000559600
Jan 24 03:22:07 kafka9 kernel: R10: 0000000000559600 R11: 0000000840280540 R12: 0000000000000000
Jan 24 03:22:07 kafka9 kernel: R13: 00000000cea8da78 R14: 0000000000000001 R15: 00007f8888ac3800
Jan 24 03:22:07 kafka9 kernel: FS: 00007f7fee293700(0000) GS:ffff97a29fd80000(0000) knlGS:0000000000000000
Jan 24 03:22:07 kafka9 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 24 03:22:07 kafka9 kernel: CR2: 00000008402805f8 CR3: 00000001f9f32006 CR4: 00000000007606e0
Jan 24 03:22:07 kafka9 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jan 24 03:22:07 kafka9 kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Jan 24 03:22:07 kafka9 kernel: PKRU: 55555554
{code}

Broker configuration:
{code:java}
broker.id=9
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/mnt/volume_kafka9a/kafka,/mnt/volume_kafka9b/kafka,/mnt/volume_kafka9c/kafka
num.partitions=1
num.recovery.threads.per.data.dir=4
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=2
auto.leader.rebalance.enable=true
leader.imbalance.check.interval.seconds=30
unclean.leader.election.enable=true
log.retention.hours=168
log.segment.bytes=8388608
log.retention.check.interval.ms=300000
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
default.replication.factor=2
delete.topic.enable=true
{code}
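The register dump in the kernel log is internally consistent, and the faulting address can be recomputed from it. The sketch below assumes (this decoding is not part of the original report) that the `Code:` bytes before the faulting `<4d>` decode as `mov ebx, 0x080500a8`, `movabs r11, 0x800000000` and `lea r11, [r11 + rbx*8]`, and that the faulting instruction `4d 8b 9b b8 00 00 00` is `mov r11, [r11 + 0xb8]`. Under that assumption, RBX yields exactly the R11 value in the dump, and R11 + 0xb8 yields the CR2 address the kernel reports as having a corrupted page table. The pattern resembles HotSpot JIT code decoding a compressed class pointer (base 0x800000000, shift 3, with class-data sharing enabled), but that interpretation is also an assumption. The helper names are illustrative only.

```python
# Sketch: recompute the faulting address from the register dump in KAFKA-7867.
# Assumed decoding of the Code: bytes (not stated in the report):
#   bb a8 00 05 08                 mov    ebx, 0x080500a8
#   49 bb 00 00 00 00 08 00 00 00  movabs r11, 0x800000000
#   4d 8d 1c db                    lea    r11, [r11 + rbx*8]
#   <4d> 8b 9b b8 00 00 00         mov    r11, [r11 + 0xb8]   (faulting instruction)

RBX = 0x00000000080500A8  # RBX from the register dump
R11 = 0x0000000840280540  # R11 from the register dump
CR2 = 0x00000008402805F8  # CR2: the address whose page-table walk failed

BASE = 0x800000000  # 64-bit immediate loaded by the movabs
SHIFT = 3           # SIB scale factor 8 in the lea, i.e. a left shift by 3
DISP = 0xB8         # displacement used by the faulting mov


def decoded_pointer(narrow: int, base: int = BASE, shift: int = SHIFT) -> int:
    """Compute base + (narrow << shift), which is what the lea instruction does."""
    return base + (narrow << shift)


def faulting_address(pointer: int, disp: int = DISP) -> int:
    """Effective address read by mov r11, [r11 + disp]."""
    return pointer + disp


if __name__ == "__main__":
    ptr = decoded_pointer(RBX)
    addr = faulting_address(ptr)
    print(f"r11   = {ptr:#x} (dump: {R11:#x})")
    print(f"fault = {addr:#x} (CR2:  {CR2:#x})")
```

If the decoding is right, both printed pairs match, which points at the page-table entry for 0x8402805f8 itself rather than at a bad pointer computed by the JVM.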