Hi,
Thanks for all the valuable input provided so far. I'll try to suggest
a design based on it.
The main feedback was about the use of a new transport protocol between
repagent and rephub.
It was suggested to use some standard network storage protocol instead,
and use QMP commands for the control path.
The main idea is to use two NBD connections per protected volume:
NBD tap - protected VM is the client, rephub is the server, used to
report writes.
The tap is not a standard NBD backing - it is for replication,
meaning that it is less important than
the main image path. Tap errors are not reported to the protected VM as
IO errors.
NBD reader - protected VM is the server, rephub is the client, used for
reading the protected volume.
The NBD reader is a generic remote read capability (write could be
added as well), probably usable for various other needs.
Actually the reader will probably be more useful as a
reader/writer, but for the agent only read is required.
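To make the two connections a bit more concrete, here is a rough sketch of what goes on the wire in each direction, assuming the plain NBD request header (magic, flags, command, handle, offset, length, all big-endian). The helper names tap_write and reader_read are mine, for illustration only, and are not part of any existing code.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513  # request magic defined by the NBD protocol
NBD_CMD_READ = 0
NBD_CMD_WRITE = 1


def pack_nbd_request(cmd, handle, offset, length, flags=0):
    """Pack the 28-byte NBD request header (network byte order)."""
    return struct.pack(">IHHQQI", NBD_REQUEST_MAGIC, flags, cmd,
                       handle, offset, length)


def tap_write(handle, offset, data):
    """Hypothetical helper: a protected write reported over the NBD tap.

    The protected VM is the NBD client here, so it sends a WRITE
    request header followed by the written data.
    """
    return pack_nbd_request(NBD_CMD_WRITE, handle, offset, len(data)) + data


def reader_read(handle, offset, length):
    """Hypothetical helper: a volume read issued over the NBD reader.

    Rephub is the NBD client here, so it sends a READ request header;
    the data comes back in the NBD reply on the same connection.
    """
    return pack_nbd_request(NBD_CMD_READ, handle, offset, length)
```

Nothing new is needed on the wire for either direction; the only non-standard part is how Qemu treats tap errors.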
Here's a list of the protocol messages from the previous design and how
they're implemented in this design:
Rephub --> Repagent:
* Start protect
Will be done via QMP command.
* Read volume request
Covered by NBD reader
Repagent --> Rephub
* Protected write
Covered by NBD tap
* Report VM volumes
Isn't required in the protocol. I assume the management system
tracks the volumes.
* Read Volume Response
Covered by NBD reader (the reply travels on the same connection as the read request)
* Agent shutdown
Not covered.
The start protect scenario will look something like:
* User calls start protect for a volume
* Mgmt system (e.g. Rhev) sends QMP command to VM - start protect, with
volume details (path) and an
IP+port number for NBD tap
--> Qemu connects to the NBD tap server
* Mgmt system sends QMP command to VM - start remote reader with volume
details and port number for NBD reader.
--> Qemu starts to listen as an NBD server on that port
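As a rough illustration of the two QMP commands in the scenario above, here is a sketch that builds them in the usual QMP JSON form ("execute" plus "arguments"). The command names "start-protect" and "start-remote-reader", the argument names, the path, the address and the ports are all placeholders I made up; none of them exist in Qemu today.

```python
import json


def qmp_command(name, arguments):
    """Serialize a QMP command in the standard execute/arguments form."""
    return json.dumps({"execute": name, "arguments": arguments})


# Hypothetical command: tell Qemu to connect to the NBD tap server on rephub.
start_protect = qmp_command("start-protect", {
    "volume": "/path/to/volume.img",  # placeholder volume path
    "tap-host": "192.0.2.10",         # placeholder rephub address
    "tap-port": 10809,                # placeholder NBD tap port
})

# Hypothetical command: tell Qemu to listen as an NBD server for the reader.
start_reader = qmp_command("start-remote-reader", {
    "volume": "/path/to/volume.img",  # same placeholder volume path
    "port": 10810,                    # placeholder NBD reader port
})
```

The management system would send these over the VM's QMP socket, one pair per protected volume.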
Issues:
* As far as I understand, NBD requires socket/port per volume, which the
management system allocates. This is a little cumbersome.
The original design had a single server in the rephub - a single
port allocation, and a socket per Qemu.
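To show the kind of bookkeeping this pushes onto the management system, here is a minimal sketch of a per-volume port allocator. The class name, the volume ids and the port range are only assumptions for illustration; a real management system would have its own way of doing this.

```python
class PortAllocator:
    """Hypothetical per-volume port bookkeeping in the management system."""

    def __init__(self, first, last):
        self._free = list(range(first, last + 1))  # ports not yet handed out
        self._by_volume = {}                       # volume id -> port

    def allocate(self, volume_id):
        """Return the port for a volume, allocating one on first use."""
        if volume_id in self._by_volume:
            return self._by_volume[volume_id]
        if not self._free:
            raise RuntimeError("no free ports left in the range")
        port = self._free.pop(0)
        self._by_volume[volume_id] = port
        return port

    def release(self, volume_id):
        """Give a volume's port back when protection stops."""
        self._free.append(self._by_volume.pop(volume_id))
```

With the original design none of this would be needed, since rephub would listen on a single well-known port.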
Appreciate any comments and ideas.
Thanks,
Ori