https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=236989
--- Comment #24 from Charles O'Donnell <c...@bus.net> ---
New development. See three notes below.

N.B. the system appears to have fully recovered. Normally I would have expected a freeze.

1. CPU alarm from a custom AWS monitor at 16:43 UTC (12:43 PM ET):

Alarm Details:
- Name: Starch CPU
- Description:
- State Change: OK -> ALARM
- Reason for State Change: Threshold Crossed: 1 datapoint [31.4 (07/05/20 16:38:00)] was greater than or equal to the threshold (30.0).
- Timestamp: Thursday 07 May, 2020 16:43:35 UTC
- AWS Account: 539612714288
- Alarm Arn: arn:aws:cloudwatch:us-east-1:539612714288:alarm:Starch CPU

Threshold:
- The alarm is in the ALARM state when the metric is GreaterThanOrEqualToThreshold 30.0 for 300 seconds.

2. Sudden jump in failed 9k mbufs between 12:00 and 13:00 ET:

===> Thu May 7 10:00:00 EDT 2020
mbuf_jumbo_page: 4096, 490945, 0, 56,45464111, 0, 0
mbuf_jumbo_9k: 9216, 145465, 7538, 450,66361278,1640, 0
mbuf_jumbo_16k: 16384, 81824, 0, 0, 0, 0, 0
dev.ena.0.queue7.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue6.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue5.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue4.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue3.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue2.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue1.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue0.rx_ring.mjum_alloc_fail: 0

===> Thu May 7 11:00:00 EDT 2020
mbuf_jumbo_page: 4096, 490945, 16, 113,45658689, 0, 0
mbuf_jumbo_9k: 9216, 145465, 7592, 397,66645310,1642, 0
mbuf_jumbo_16k: 16384, 81824, 0, 0, 0, 0, 0
dev.ena.0.queue7.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue6.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue5.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue4.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue3.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue2.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue1.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue0.rx_ring.mjum_alloc_fail: 0

===> Thu May 7 12:00:00 EDT 2020
mbuf_jumbo_page: 4096, 490945, 182, 31,45730287, 0, 0
mbuf_jumbo_9k: 9216, 145465, 7461, 259,66753693,1693, 0
mbuf_jumbo_16k: 16384, 81824, 0, 0, 0, 0, 0
dev.ena.0.queue7.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue6.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue5.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue4.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue3.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue2.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue1.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue0.rx_ring.mjum_alloc_fail: 0

===> Thu May 7 13:00:00 EDT 2020
mbuf_jumbo_page: 4096, 490945, 119, 109,46249719, 0, 0
mbuf_jumbo_9k: 9216, 145465, 7863, 207,67594999,2577, 0
mbuf_jumbo_16k: 16384, 81824, 0, 0, 0, 0, 0
dev.ena.0.queue7.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue6.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue5.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue4.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue3.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue2.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue1.rx_ring.mjum_alloc_fail: 0
dev.ena.0.queue0.rx_ring.mjum_alloc_fail: 0

3. ena0 reset at 12:43 ET:

May 7 12:43:19 s4 kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
May 7 12:43:19 s4 kernel: ena0: Trigger reset is on
May 7 12:43:19 s4 kernel: ena0: device is going DOWN
May 7 12:43:22 s4 kernel: ena0: free uncompleted tx mbuf qid 3 idx 0x319ena0: free uncompleted tx mbuf qid 7 idx 0x173
May 7 12:43:23 s4 kernel: ena0: ena0: device is going UP
May 7 12:43:23 s4 kernel: link is UP
May 7 12:45:00 s4 kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
May 7 12:45:00 s4 kernel: ena0: Trigger reset is on
May 7 12:45:00 s4 kernel: ena0: device is going DOWN
May 7 12:45:04 s4 kernel: ena0: free uncompleted tx mbuf qid 3 idx 0x102
May 7 12:45:04 s4 kernel: ena0: ena0: device is going UP
May 7 12:45:04 s4 kernel: link is UP
May 7 12:45:26 s4 kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
May 7 12:45:26 s4 kernel: ena0: Trigger reset is on
May 7 12:45:26 s4 kernel: ena0: device is going DOWN
May 7 12:45:29 s4 kernel: ena0: free uncompleted tx mbuf qid 1 idx 0x3c7ena0: free uncompleted tx mbuf qid 2 idx 0x2c5ena0: free uncompleted tx mbuf qid 6 idx 0x2abena0: free uncompleted tx mbuf qid 7 idx 0x241
May 7 12:45:30 s4 kernel:
May 7 12:45:30 s4 kernel: stray irq265
May 7 12:45:30 s4 kernel: ena0: ena0: device is going UP
May 7 12:45:30 s4 kernel: link is UP
May 7 12:46:05 s4 kernel: ena0: Keep alive watchdog timeout.
May 7 12:46:05 s4 kernel: ena0: Trigger reset is on
May 7 12:46:05 s4 kernel: ena0: device is going DOWN
May 7 12:46:07 s4 kernel: ena0: free uncompleted tx mbuf qid 1 idx 0x123ena0: free uncompleted tx mbuf qid 3 idx 0xeeena0: free uncompleted tx mbuf qid 6 idx 0x208
May 7 12:46:08 s4 kernel: ena0: ena0: device is going UP
May 7 12:46:08 s4 kernel: link is UP
May 7 12:46:36 s4 kernel: ena0: The number of lost tx completion is above the threshold (129 > 128). Reset the device
May 7 12:46:36 s4 kernel: ena0: Trigger reset is on
May 7 12:46:36 s4 kernel: ena0: device is going DOWN
May 7 12:46:37 s4 kernel: ena0: free uncompleted tx mbuf qid 0 idx 0x2c2ena0: free uncompleted tx mbuf qid 1 idx 0x135ena0: free uncompleted tx mbuf qid 2 idx 0xeeena0: free uncompleted tx mbuf qid 3 idx 0x373ena0: free uncompleted tx mbuf qid 4 idx 0x88ena0: free uncompleted t>
May 7 12:46:38 s4 kernel: ena0: ena0: device is going UP
May 7 12:46:38 s4 kernel: link is UP
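
To put a number on the jump described in note 2, the hourly mbuf_jumbo_9k rows can be diffed. The following is a minimal sketch, not part of the original report. It assumes the comma-separated columns follow the vmstat -z layout of SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP, so that the sixth value is the cumulative failure count; the names SNAPSHOTS, fail_counts, and fail_deltas are hypothetical and introduced only for illustration.

```python
# Hourly mbuf_jumbo_9k rows copied from note 2 of this comment.
SNAPSHOTS = """\
===> Thu May 7 10:00:00 EDT 2020
mbuf_jumbo_9k: 9216, 145465, 7538, 450,66361278,1640, 0
===> Thu May 7 11:00:00 EDT 2020
mbuf_jumbo_9k: 9216, 145465, 7592, 397,66645310,1642, 0
===> Thu May 7 12:00:00 EDT 2020
mbuf_jumbo_9k: 9216, 145465, 7461, 259,66753693,1693, 0
===> Thu May 7 13:00:00 EDT 2020
mbuf_jumbo_9k: 9216, 145465, 7863, 207,67594999,2577, 0
"""

# Assumption: column order is SIZE, LIMIT, USED, FREE, REQ, FAIL, SLEEP,
# so FAIL is at index 5 of the values after the zone name.
FAIL_COLUMN = 5


def fail_counts(text, zone="mbuf_jumbo_9k"):
    """Return (timestamp, cumulative FAIL count) for each snapshot of `zone`."""
    results = []
    stamp = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("===>"):
            # Snapshot header: remember the timestamp for the rows that follow.
            stamp = line[4:].strip()
        elif line.startswith(zone + ":"):
            fields = [int(f) for f in line.split(":", 1)[1].split(",")]
            results.append((stamp, fields[FAIL_COLUMN]))
    return results


def fail_deltas(counts):
    """Return (timestamp, new failures since the previous snapshot)."""
    return [
        (later[0], later[1] - earlier[1])
        for earlier, later in zip(counts, counts[1:])
    ]


if __name__ == "__main__":
    for stamp, delta in fail_deltas(fail_counts(SNAPSHOTS)):
        print(f"{stamp}: +{delta} failed 9k allocations since previous snapshot")
```

Under that column assumption, the failure counter grows by 2 and then 51 in the first two intervals and by 884 between the 12:00 and 13:00 snapshots, which is the interval containing the 12:43 resets.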