From: wenzt [mailto:[email protected]]
Sent: Wednesday, March 6, 2019 6:28 PM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems
I have tested the proxy with QMP: "{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }"
I got nothing except these logs on the PVM side:
[email protected]:colo_compare_main : secondary: unsupported packet in
[email protected]:colo_compare_main : secondary: unsupported packet in
[email protected]:colo_compare_main : secondary: unsupported packet in
[email protected]:colo_compare_main : primary: unsupported packet in
[email protected]:colo_compare_main : secondary: unsupported packet in
My guest OS is CentOS 7.5.
I did nothing but boot up the OS.
After that, generating some network I/O still produces those logs.
I did some debugging; maybe there is a parse error in parse_packet_early() that gets the wrong ETH_P_* protocol type.
Hi Zhengtao,
I think your test environment has some network issue. Can the guest get an IP address without COLO?
Or are you testing on Jiaoyuwang (China's education network)? Does a network switch do some processing at the Ethernet level (like VLAN tagging)?
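One way a packet parser can end up with the wrong ETH_P_* value, which is what the VLAN question is probing: on an 802.1Q-tagged frame, the two bytes at offset 12 hold the tag's TPID (0x8100) instead of the real EtherType, which sits 4 bytes further in. The Python sketch below only illustrates that effect. It is not QEMU's parse_packet_early(), the function names are hypothetical, and whether VLAN tagging explains the logs above is not confirmed in this thread.

```python
import struct

ETH_P_IP = 0x0800      # IPv4
ETH_P_8021Q = 0x8100   # 802.1Q VLAN tag
ETH_P_8021AD = 0x88A8  # 802.1ad (QinQ) outer tag
VLAN_TPIDS = {ETH_P_8021Q, ETH_P_8021AD}


def naive_ethertype(frame: bytes) -> int:
    """Read the EtherType at the fixed offset 12 (correct only for untagged frames)."""
    return struct.unpack_from("!H", frame, 12)[0]


def ethertype_skipping_vlan(frame: bytes):
    """Skip any 4-byte VLAN tags; return (EtherType, offset of the L3 header)."""
    offset = 12
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    while ethertype in VLAN_TPIDS:
        offset += 4  # 2-byte TPID + 2-byte TCI
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    return ethertype, offset + 2
```

A parser that relies only on the fixed offset would classify every tagged frame as non-IP, which could produce a steady stream of "unsupported packet" messages; the second function walks past one or more tags before reading the EtherType.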
On my side, the primary node's proxy reports this:
[email protected]:colo_send_message Send 'checkpoint-request' message
[email protected]:colo_receive_message Receive 'checkpoint-reply' message
{"timestamp": {"seconds": 1552455102, "microseconds": 148903}, "event": "STOP"}
[email protected]:colo_vm_state_change Change 'run' => 'stop'
[email protected]:colo_send_message Send 'vmstate-send' message
[email protected]:colo_send_message Send 'vmstate-size' message
[email protected]:colo_receive_message Receive 'vmstate-received' message
[email protected]:colo_receive_message Receive 'vmstate-loaded' message
{"timestamp": {"seconds": 1552455102, "microseconds": 277064}, "event": "RESUME"}
[email protected]:colo_vm_state_change Change 'stop' => 'run'
[email protected]:colo_compare_main : compare udp
[email protected]:colo_compare_ip_info ppkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.248.2.5, spkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.248.2.5
[email protected]:colo_compare_main : packet same and release packet
[email protected]:colo_compare_main : compare udp
[email protected]:colo_compare_ip_info ppkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.239.27.228, spkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.239.27.228
[email protected]:colo_compare_main : packet same and release packet
[email protected]:colo_compare_main : compare udp
[email protected]:colo_compare_ip_info ppkt size = 81, ip_src = 10.239.161.136, ip_dst = 172.17.6.9, spkt size = 81, ip_src = 10.239.161.136, ip_dst = 172.17.6.9
[email protected]:colo_compare_main : packet same and release packet
[email protected]:colo_compare_main : compare udp
[email protected]:colo_compare_ip_info ppkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.248.2.5, spkt size = 81, ip_src = 10.239.161.136, ip_dst = 10.248.2.5
[email protected]:colo_compare_main : packet same and release packet
[email protected]:colo_compare_main : compare icmp
[email protected]:colo_compare_ip_info ppkt size = 157, ip_src = 10.239.161.136, ip_dst = 172.17.6.9, spkt size = 157, ip_src = 10.239.161.136, ip_dst = 172.17.6.9
[email protected]:colo_compare_main : packet same and release packet
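The primary-side trace above shows one complete checkpoint handshake. A small Python sketch that pulls the COLO message names out of such trace output and reports the first step that never arrived, which can help locate where a checkpoint stalls. The expected order is read off this log only (not taken from QEMU's source), the regex assumes the `colo_send_message` / `colo_receive_message` line format shown here, and the function names are hypothetical. It uses `:=`, so Python 3.8 or later is assumed.

```python
import re

# Order of COLO messages in one checkpoint, as it appears in the trace above.
CHECKPOINT_SEQUENCE = [
    "checkpoint-request",
    "checkpoint-reply",
    "vmstate-send",
    "vmstate-size",
    "vmstate-received",
    "vmstate-loaded",
]

# Matches e.g. "colo_send_message Send 'vmstate-size' message".
MESSAGE_RE = re.compile(
    r"colo_(?:send|receive)_message (?:Send|Receive) '([a-z-]+)' message"
)


def colo_messages(trace_lines):
    """Extract COLO message names from trace-event output, in order."""
    return [m.group(1) for line in trace_lines if (m := MESSAGE_RE.search(line))]


def first_missing_step(trace_lines):
    """Return the first expected message missing (in order) from the trace, or None."""
    remaining = iter(colo_messages(trace_lines))
    for step in CHECKPOINT_SEQUENCE:
        # "in" on an iterator consumes it up to and including the match,
        # so this also checks that the steps appear in this relative order.
        if step not in remaining:
            return step
    return None
```

Feeding it the primary-side lines above returns None; if the trace stops after 'vmstate-received', it returns 'vmstate-loaded'.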
Thanks
Zhang Chen
Thanks,
Zhengtao
From: Zhang, Chen <[email protected]>
Sent: Tuesday, March 5, 2019 11:32 PM
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems
From: wenzt [mailto:[email protected]]
Sent: Thursday, February 28, 2019 10:00 AM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems
This version: https://github.com/coloft/qemu/tree/colo-v4.1-periodic-mode
This is an old version from 3 years ago; please drop it and use the upstream QEMU code.
Another question:
What is the relationship between the proxy and checkpoints?
When the PVM and SVM send different network packets, the proxy signals COLO-frame to do a checkpoint.
Do they work together? I guess we should set a longer checkpoint interval, like 20 s.
Yes, they work together. At the same time, we have a periodic checkpoint mechanism, like a timer. You can set its interval too.
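The relationship described here (a checkpoint happens either when colo-proxy sees the PVM and SVM send different packets, or when the periodic timer fires) can be modelled roughly as below. This is a hypothetical Python model, not QEMU's COLO code; the class and method names are invented for illustration, and resetting the periodic timer after a mismatch-triggered checkpoint is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointTrigger:
    """Hypothetical model of when a COLO checkpoint is requested."""

    period: float                 # seconds between periodic checkpoints
    last_checkpoint: float = 0.0  # time of the most recent checkpoint

    def on_packet_pair(self, now: float, primary: bytes, secondary: bytes) -> Optional[str]:
        """Compare one PVM packet with the matching SVM packet."""
        if primary != secondary:
            return self._checkpoint(now, "mismatch")
        return None  # same output: the primary's packet is released, no checkpoint

    def on_tick(self, now: float) -> Optional[str]:
        """Called regularly; fires a checkpoint once the period has elapsed."""
        if now - self.last_checkpoint >= self.period:
            return self._checkpoint(now, "periodic")
        return None

    def _checkpoint(self, now: float, reason: str) -> str:
        # Assumption of this model: any checkpoint restarts the periodic timer.
        self.last_checkpoint = now
        return reason
```

Under this model, a longer period (such as the 20 s suggested above) only reduces timer-driven checkpoints; mismatch-driven ones still happen as often as the two VMs' network output diverges.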
Does the proxy only work under a network workload? In my test, the proxy seems not to be working.
Yes. As the wiki says, colo-proxy compares the PVM and SVM packets to decide whether to do a checkpoint.
You can enable the COLO debug info to see the proxy's work on the primary node like this:
"{'execute': 'trace-event-set-state', 'arguments': {'name': 'colo*', 'enable': true} }"
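The same QMP command can be sent from a script instead of being typed into a monitor. A minimal Python sketch, assuming QEMU was started with a QMP listener on a Unix socket (for example `-qmp unix:/tmp/qmp-primary.sock,server,nowait`; the path is hypothetical). The greeting and `qmp_capabilities` handshake are standard QMP; the helper function names are illustrative only.

```python
import json
import socket


def qmp_command(name, arguments=None):
    """Serialize one QMP command as a single JSON line (bytes)."""
    cmd = {"execute": name}
    if arguments is not None:
        cmd["arguments"] = arguments
    return (json.dumps(cmd) + "\n").encode("utf-8")


def read_reply(reader):
    """Read JSON lines until a command reply ("return" or "error") arrives.

    QMP interleaves asynchronous events such as STOP and RESUME with the
    replies, so events are collected and returned separately.
    """
    events = []
    while True:
        line = reader.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        msg = json.loads(line)
        if "event" in msg:
            events.append(msg)
        else:
            return msg, events


def run_qmp(sock, commands):
    """Leave capabilities-negotiation mode, then run (name, arguments) pairs in order."""
    with sock.makefile("rb") as reader:
        greeting = json.loads(reader.readline())
        if "QMP" not in greeting:
            raise RuntimeError(f"unexpected greeting: {greeting}")
        replies = []
        for name, args in [("qmp_capabilities", None), *commands]:
            sock.sendall(qmp_command(name, args))
            reply, _events = read_reply(reader)
            if "error" in reply:
                raise RuntimeError(f"{name} failed: {reply['error']}")
            replies.append(reply)
        return replies
```

To use it, connect a `socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)` to the QMP socket path and call `run_qmp(sock, [("trace-event-set-state", {"name": "colo*", "enable": True})])`. The QMP reply is just an empty `return`; the trace lines themselves are printed by QEMU's trace backend.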
Thanks
Zhang Chen
From: Zhang, Chen <[email protected]>
Sent: Thursday, February 28, 2019 9:34 AM
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems
Which version?
The COLO project has always said that the PVM and SVM execute in parallel.
Thanks
Zhang Chen
From: wenzt [mailto:[email protected]]
Sent: Thursday, February 28, 2019 9:21 AM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems
But in an earlier version, I noticed that the SVM was always in inmigrate status, even while doing a checkpoint.
No operation could be performed on the SVM.
Thanks,
Zhengtao
From: Zhang, Chen <[email protected]>
Sent: Wednesday, February 27, 2019 6:45 PM
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems
From: wenzt [mailto:[email protected]]
Sent: Wednesday, February 27, 2019 6:04 PM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Re: Latest Qemu-COLO Problems
Thanks for the help!
I don't understand why we keep switching the SVM between run and stop.
Why don't we keep the SVM in inmigrate status?
Because we need to do checkpoints to sync all state between the PVM and SVM.
We can't guarantee that their state will still be the same after a while.
Thanks
Zhang Chen
Thanks,
Zhengtao
From: Zhang, Chen <[email protected]>
Sent: Tuesday, February 26, 2019 6:41 PM
To: wenzt <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems
By the way, please read the COLO wiki and use these commands to trigger failover on the secondary node:
{ 'execute': 'nbd-server-stop' }
{ "execute": "x-colo-lost-heartbeat" }
Thanks
Zhang Chen
From: Zhang, Chen
Sent: Tuesday, February 26, 2019 2:46 PM
To: 'wenzt' <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: RE: Latest Qemu-COLO Problems
Sorry for the slow response.
I have fixed this bug in this series:
https://lists.nongnu.org/archive/html/qemu-devel/2019-02/msg06920.html
Please test it.
Thanks
Zhang Chen
From: wenzt [mailto:[email protected]]
Sent: Friday, February 15, 2019 7:54 PM
To: Zhang, Chen <[email protected]>
Cc: 'qemu-discuss' <[email protected]>
Subject: Latest Qemu-COLO Problems
Hi Zhang,
I have tested COLO with qemu-3.1.0 following https://wiki.qemu.org/Features/COLO
I got these problems on the PVM:
{"timestamp": {"seconds": 1550230616, "microseconds": 644348}, "event": "STOP"}
{"timestamp": {"seconds": 1550230616, "microseconds": 719003}, "event": "RESUME"}
{"timestamp": {"seconds": 1550230616, "microseconds": 743554}, "event": "STOP"}
qemu-system-x86_64: Can't receive COLO message: Input/output error
qemu-system-x86_64: Can't receive COLO message: Input/output error
{"timestamp": {"seconds": 1550230618, "microseconds": 257209}, "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "error"}}
And on the SVM:
{"timestamp": {"seconds": 1550230616, "microseconds": 731544}, "event": "STOP"}
[email protected]:colo_vm_state_change Change 'run' => 'stop'
[email protected]:colo_send_message Send 'checkpoint-reply' message
[email protected]:colo_receive_message Receive 'vmstate-send' message
[email protected]:colo_flush_ram_cache_begin dirty_pages 18446744073708498780
[email protected]:colo_flush_ram_cache_end
[email protected]:colo_receive_message Receive 'vmstate-size' message
[email protected]:colo_send_message Send 'vmstate-received' message
{"timestamp": {"seconds": 1550230616, "microseconds": 837436}, "event": "RESUME"}
qemu-system-x86_64: block.c:5062: bdrv_detach_aio_context: Assertion `!bs->walking_aio_notifiers' failed.
Aborted (core dumped)
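The STOP and RESUME events in both logs of this first report carry microsecond timestamps, so the time each VM spent paused around the checkpoint can be read off them. A minimal Python sketch, assuming one QMP event JSON object per line as QEMU emits them; the function name is hypothetical. Timestamps are kept as integer microseconds to avoid floating-point rounding.

```python
import json


def pause_durations_us(event_lines):
    """Pair each STOP event with the next RESUME; return the pauses in microseconds."""
    pauses = []
    stopped_at = None
    for line in event_lines:
        msg = json.loads(line)
        ts = msg["timestamp"]
        t_us = ts["seconds"] * 1_000_000 + ts["microseconds"]
        if msg["event"] == "STOP":
            stopped_at = t_us
        elif msg["event"] == "RESUME" and stopped_at is not None:
            pauses.append(t_us - stopped_at)
            stopped_at = None
        # Other events (e.g. COLO_EXIT) are ignored.
    return pauses
```

For the four PVM events it returns [74655], about 75 ms between STOP and RESUME; the second STOP is followed by COLO_EXIT rather than a RESUME. For the SVM's STOP/RESUME pair it returns [105892].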