On 04/11/2013 03:15 PM, Michael S. Tsirkin wrote:
On Thu, Apr 11, 2013 at 01:49:34PM -0400, Michael R. Hines wrote:
On 04/11/2013 10:56 AM, Michael S. Tsirkin wrote:
On Thu, Apr 11, 2013 at 04:50:21PM +0200, Paolo Bonzini wrote:
On 11/04/2013 16:37, Michael S. Tsirkin wrote:
pg1 -> pin -> req -> res -> rdma -> done
pg2 -> pin -> req -> res -> rdma -> done
pg3 -> pin -> req -> res -> rdma -> done
pg4 -> pin -> req -> res -> rdma -> done
pg5 -> pin -> req -> res -> rdma -> done
It's like an assembly line, see? So while software does the registration
round-trip dance, the hardware is processing RDMA requests for the
previous chunks.
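In code, the overlap might look roughly like the sketch below. The helpers
send_register_request(), wait_register_response(), post_rdma_write() and
wait_rdma_done() are hypothetical stand-ins for the real control messages
and verbs calls; the only point is the ordering, i.e. the registration
round-trip for chunk i+1 is issued while the adapter is still moving chunk i.

/* Minimal sketch of the "assembly line" above.  The helpers are
 * hypothetical stubs, not the real protocol; only the ordering matters. */
#include <stdio.h>

static void send_register_request(int c)  { printf("reg req   chunk %d\n", c); }
static void wait_register_response(int c) { printf("reg resp  chunk %d\n", c); }
static void post_rdma_write(int c)        { printf("rdma post chunk %d\n", c); } /* posting returns before the DMA finishes */
static void wait_rdma_done(int c)         { printf("rdma done chunk %d\n", c); }

int main(void)
{
    const int nchunks = 4;

    /* Prime the pipeline: chunk 0 must be registered before anything moves. */
    send_register_request(0);
    wait_register_response(0);

    for (int i = 0; i < nchunks; i++) {
        post_rdma_write(i);                 /* hardware starts on chunk i    */
        if (i + 1 < nchunks) {
            send_register_request(i + 1);   /* control round-trip for the    */
            wait_register_response(i + 1);  /* next chunk overlaps the DMA   */
        }
        wait_rdma_done(i);                  /* reap chunk i's completion     */
    }
    return 0;
}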
Does this only affect the implementation, or also the wire protocol?
It affects the wire protocol.
I *do* believe chunked registration was a *very* useful request by
the community, and I want to thank you for convincing me to implement it.
But, with all due respect, pipelining is a "solution looking for a problem".
The problem is bad performance, isn't it?
If it wasn't we'd use chunk based all the time.
Improving the protocol does not help the behavior of any well-known
workloads, because it is based on the idea that the memory footprint of a VM
would *rapidly* grow and shrink during the steady-state iteration rounds
while the migration is taking place.
What gave you that idea? Not at all. It is based on the idea
of doing control actions in parallel with data transfers,
so that control latency does not degrade performance.
Again, this parallelization is trying to solve a problem that doesn't
exist.
As I've described before, I re-executed the worst-case memory stress hog
tests with RDMA *after* the bulk-phase round completed and determined
that RDMA throughput remained unaffected, because most of the memory
had already been registered in advance.
This simply does not happen - workloads don't behave that way. They either
grow really big or shrink really small, and they settle that way for a
reasonable amount of time before the load on the application changes at
some future point.
- Michael
What is the bottleneck for chunk-based? Can you tell me that? Find out,
and maybe you will see that pipelining helps.
Basically to me, when you describe the protocol in detail the problems
become apparent.
I think you worry too much about what the guest does, what APIs are
exposed from the migration core and the specifics of the workload. Build
a sane protocol for data transfers and layer the workload on top.
What is the point in enhancing a protocol to solve a problem that will
never manifest?
We're trying to conflate two *completely different use cases* that are
completely unrelated:
1. Static overcommit
2. Dynamic, fine-grained overcommit (at small time scales... seconds or
minutes)
#1 happens all the time. Cram a bunch of virtual machines with fixed
workloads and fixed writable working sets into the same place, and you're
good to go.
#2 never happens. Ever. It just doesn't happen, and the enhancements you've
described are trying to protect against #2, when we should really be
focused on #1.
It is not standard practice for a workload to expect high overcommit
performance in the *middle* of a relocation, and nobody in the industry
that I have met over the years has expressed any desire to do so.
Workloads just don't behave that way.
Dynamic registration does an excellent job at overcommitment for #1,
because most of the registrations are done at the very beginning and can
be further optimized to cause little or no performance loss by simply
issuing the registrations before the migration ever begins.
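For what it's worth, "issuing the registrations before the migration ever
begins" boils down to something like the sketch below: pin and register each
RAM block with the HCA up front, advertise the rkeys once, and the per-chunk
registration round-trips disappear from the transfer path. This is only an
illustration, not the actual migration code; the two malloc'd buffers stand
in for QEMU's RAMBlock list and error handling is minimal.

/* Sketch: register all guest RAM blocks before the migration starts.
 * Compile with -libverbs; requires an RDMA device and enough locked
 * memory (ulimit -l). */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) {
        fprintf(stderr, "no RDMA device found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ctx ? ibv_alloc_pd(ctx) : NULL;
    if (!pd) {
        fprintf(stderr, "failed to open device / allocate PD\n");
        return 1;
    }

    /* Stand-ins for the guest's RAM blocks. */
    size_t sizes[] = { 64 << 20, 16 << 20 };
    void *blocks[2];
    struct ibv_mr *mrs[2];

    for (int i = 0; i < 2; i++) {
        blocks[i] = malloc(sizes[i]);
        /* Pin and register the whole block up front; mrs[i]->rkey can be
         * advertised to the destination once, instead of once per chunk
         * during the iterations. */
        mrs[i] = ibv_reg_mr(pd, blocks[i], sizes[i],
                            IBV_ACCESS_LOCAL_WRITE |
                            IBV_ACCESS_REMOTE_WRITE |
                            IBV_ACCESS_REMOTE_READ);
        if (!mrs[i]) {
            perror("ibv_reg_mr");
            return 1;
        }
        printf("block %d registered, rkey=0x%x\n", i, mrs[i]->rkey);
    }

    /* ... migration RDMA-writes into these regions here ... */

    for (int i = 0; i < 2; i++) {
        ibv_dereg_mr(mrs[i]);
        free(blocks[i]);
    }
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}

The cost, of course, is that the registered blocks stay pinned for the
duration of the migration.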
Performance for #2 even with dynamic registration is excellent and I am not
experiencing any problems associated with it.
So, we're discussing a non-issue.
- Michael
Overcommit has two