> On Sun, Aug 30, 2020 at 7:13 PM <thomas@hoberg.net> wrote:
> 
> Using export domain is not a single click, but it is not that complicated.
> But this is good feedback anyway.
> 
> 
> I think the issue is gluster, not qemu-img.

From what I gather from your feedback, that may well be so, and I think it's a 
major concern.

I know RHV started out much like vSphere or Oracle Virtualization without HCI, 
but with separate storage and dedicated management servers. At scale, HCI is 
quite simply inefficient.

But if you have scale, you are either already a cloud yourself or going there. 
So IMHO small-lab, edge, industrial and embedded applications are *the* future 
for HCI products, and with them for oVirt. In that sense I fully subscribe to 
your view that the 'Python-GUI' is oVirt's major selling point towards 
developers, but while Ceph, NAS and SAN will most likely be managed 
professionally, the HCI stack needs to work out of the box, perfectly.

In my case I am lego-ing surplus servers into an HCI setup to use both as 
resilient storage and for POC VMs which are fire-and-forget: a host goes down, 
the VMs get restarted elsewhere, and there is no need to rush in and rewire 
things when an old host has had its final gasp.

The target model I see at the edge is much like what I have in my home lab, 
which is basically a bunch of NUCs: an Atom J5005 with 32GB and 1TB SATA at the 
low end and, now that 14nm Core CPUs are being pushed out of inventories for 
cheap, even a NUC10 i7-10710U with 64GB of RAM and 1TB of NVMe; a fault-tolerant 
cluster well below 50 watts in normal operation and with no moving parts.

In the corporate lab these are complemented by big ML servers for the main 
research, where the oVirt HCI simply adds storage and VMs for automation jobs, 
but I'd love to be able to use those big machines as oVirt compute nodes too, 
at least partially: the main workloads there run under Docker because of the 
easy GPU integration. It's not that dissimilar in the home lab, where my 
workstations (not 24/7 and often running Windows) may sometimes be added as 
compute nodes, but are not part of the HCI storage.

I'd love to string these all together via a USB3 Gluster and use the on-board 
1Gbit for the business end of the VMs, but since nobody offers a simple USB3 
peering network, I am using 2.5 or 5Gbit USB Ethernet adapters instead for the 
3-node HCI (main) and the 1-node HCI (disaster/backup/migration).
> 
> 
> How did you try? transfer via the UI is completely different than
> transfer using the python API.
Both ways, using the Python sample code from the SDK you wrote. I didn't 
measure the GUI side... it finished overnight, but the Python code echoes a 
throughput figure at the end, which was 50 MB/s in my case, while NFS typically 
reaches the 2.5Gbit Ethernet limit of 270 MB/s.

Funny that they should be so different: I keep thinking the web GUI and the 
'Python-GUI' are in lock-step, but I guess 'different' mainly refers to the 
fact that the GUI has to go through an image proxy.
> 
> From the UI, you get the image content on storage, without sparseness
> support. If you
> download 500g raw sparse disk (e.g. gluster with allocation policy
> thin) with 50g of data
> and 450g of unallocated space, you will get 50g of data, and 450g of
> zeroes. This is very
> slow. If you upload the image to another system you will upload 500g
> of data, which will
> again be very slow.
> 
> From the python API, download and upload support sparseness, so you
> will download and
> upload only 50g. Both upload and download use 4 connections, so you
> can maximize the
> throughput that you can get from the storage. From python API, you can
> convert the image
> format during download/upload automatically, for example download raw
> disk to qcow2
> image.
This comment helped me realize how different the GUI image transfers are from 
OVA, Export Domain and Python: while the former can run from wherever a GUI 
happens to run, the latter run on a node with hosted-engine capabilities, which 
implies VDSM running there and having local access to both ends of the storage.

But the critical insight was that disk images which Gluster failed to write or 
store with all the faster methods were written correctly, and worked fine, when 
they went through the GUI or the imageio proxy.

So perhaps one of the best ways to find the underlying Gluster bug is to look 
at what happens when the same image is transferred both ways.

I can't see how a bug report to the Gluster team would have a chance of 
succeeding if I attach a 500 GB disk image and ask them to find out 'why this 
image fails with qemu-img writes'...
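Comparing the two copies mechanically could narrow it down without shipping the 
whole image; a sketch, with both paths being hypothetical:

    import subprocess

    # Placeholder paths: one copy that arrived intact through the GUI /
    # imageio proxy, one written by the failing qemu-img path onto gluster.
    GOOD = "/export/tmp/disk-from-gui.raw"
    BAD = "/rhev/data-center/mnt/glusterSD/node:_vmstore/tmp/disk.raw"

    # qemu-img compare exits 0 if contents match, 1 if they differ, and
    # prints the first differing offset, so nobody has to stare at 500 GB.
    result = subprocess.run(
        ["qemu-img", "compare", GOOD, BAD], capture_output=True, text=True
    )
    print(result.stdout or result.stderr)
    if result.returncode == 1:
        # With sharding the interesting question is whether the first
        # mismatch lands on a shard boundary (64 MiB by default in oVirt).
        print("contents differ - check if the offset is a multiple of 64 MiB")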
> 
> Gluster is a challenge (as usual), since when using sharding (enabled
> by default for ovirt),
Somehow that message doesn't make the headlines on oVirt: HCI is not advertised 
as a 'niche that might sometimes work'.

HCI is built on the premise and promise that the network protocols and software 
(as well as the physical network) are more reliable than the node hardware, 
otherwise it just becomes a very expensive source of entropy.

And of course, sharding is a must in HCI with VMs, even if it breaks one of the 
major benefits of Gluster: access to the original files on the backing bricks 
in case it fouls up. In an HPC environment with hundreds of nodes and bricks I 
guess I wouldn't use it; in a 3-9 node HCI running mostly VMs, sharding and 
erasure coding are what I need to work perfectly.

I've gathered it's another team and that they now have major staffing and 
funding issues, but without the ability to manage cloud, on-premise DC and edge 
HCI deployments under a single management pane and with good interoperability, 
oVirt/RHV ceases to be a product: IMHO you can't afford that, even if it 
requires investment.
> it does not report sparseness. So even from the python API you will
> download the entire 500g.
> We can improve this using zero detection but this is not implemented yet.
Since I have VDO underneath, it might not even make such a big difference 
storage-wise, and with compression on the communication link, implementing yet 
another zero-detection layer may not yield tons of benefit. I guess what I'd 
mostly expect is an option for disk up/downloads that acts locally on the VDSM 
nodes, like the OVA and domain exports/imports.
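That said, if zero detection were added, I imagine the client side would look 
something like this minimal sketch (nothing oVirt-specific, just chunk 
scanning):

    def data_chunks(path, chunk_size=1024 * 1024):
        """Yield (offset, data) only for chunks that are not all zeroes.

        Minimal client-side zero detection: even when gluster/sharding
        reports the file as fully allocated, all-zero chunks can be
        skipped on the wire and punched as holes on the destination.
        """
        zeroes = bytes(chunk_size)
        with open(path, "rb") as image:
            offset = 0
            while True:
                chunk = image.read(chunk_size)
                if not chunk:
                    break
                # Comparing against a preallocated zero buffer is a single
                # memcmp, so the scan runs at memory/disk speed.
                if chunk != zeroes[:len(chunk)]:
                    yield offset, chunk
                offset += len(chunk)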

The other critical success element for oVirt (apart from offering something 
more reliable than a single physical host) is the ability to use it in a 
self-service manner. The 'Python-GUI' is quickly becoming the default, 
especially with the kids in the company, who no longer even know how to point 
and click a mouse and will code everything, but there are still older guys like 
me who expect to do things manually with a mouse in a GUI. So if these options 
exist, the GUI should support them.
> 
> 
> In our lab we tested upload of 100 GiB image and 10 concurrent uploads
> of 100 GiB
> images, and we measured throughput of 1 GiB/s:
> https://bugzilla.redhat.com/show_bug.cgi?id=1591439#c24
That doesn't sound so great if the network is 100Gbit ;-)
So I assume you can saturate the network, which is something I am afraid of 
doing in an edge HCI with a single network port running Gluster and everything 
else. With native 10Gbit USB3 links supporting isochronous transfers I'd feel 
safe, but with TCP/IP on Gbit...

In any case I'll do more testing, but currently that doesn't solve my problem, 
because I still need to move those VMs from the NFS domain to Gluster, and that 
fails.
> 
> I would like to understand the setup better:
> 
Currently the focus is on migrating clusters from 4.3 HCI to 4.4, with a full 
rebuild of the nodes and the VMs kept in safe storage. The official migration 
procedure doesn't seem mistake-resilient enough on a 3-node HCI Gluster.

Moving VMs between Gluster and NFS domains seems to work well enough on export, 
and imports work too, but once you move those VMs to Gluster on the target, 
qemu-img convert fails more often than not, evidently because of a Gluster bug 
that does not trigger on GUI uploads.
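Since the failure is in qemu-img convert itself, it should be reproducible 
outside oVirt, directly against the gluster mount; a sketch of roughly what a 
disk move does under the hood, with hypothetical paths:

    import subprocess

    SRC = "/export/nfs-domain/images/disk.qcow2"  # placeholder source image
    DST = "/rhev/data-center/mnt/glusterSD/node:_vmstore/tmp/disk.raw"  # placeholder

    # Convert the image onto the gluster mount. If this fails the same way
    # outside oVirt, it is a far smaller reproducer to hand to the Gluster
    # team than a 500 GB attachment.
    subprocess.run(
        ["qemu-img", "convert", "-p", "-f", "qcow2", "-O", "raw", SRC, DST],
        check=True,
    )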
> - upload or download?
both
> - disk format?
"thin provisioned" wherever I had a choice: The VMs in question are pretty much 
always about functionality not performance and not having to worry about disk 
sizes. VMs are given a large single disk, VDO, LVM_thin, QCOW2 expected to only 
what's written to.
> - disk storage?
3-node or 1-node HCI Gluster; the detachable domains are local storage exported 
via NFS and meant to be temporary, because Gluster storage doesn't move that 
easily. There is no SAN or enterprise NFS available.
> - how is storage connected to host?
PCIe or SATA SSD
> - how do you access the host (1g network? 10g?)
2.5/5/10 Gbit Ethernet
> - image format?
I tag "thin" whereever I get a choice. qemu-img info will still often report 
"raw" e.g. on export domain images.
> - image storage?
Gluster or NFS
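On the image format point above: as far as I understand, on file-based domains 
'thin provisioned' translates to a sparse raw file, so qemu-img reporting "raw" 
is expected; format and actual allocation can be told apart like this (a 
sketch, the path is a placeholder):

    import json
    import subprocess

    def image_info(path):
        """Return qemu-img's view of an image as a dict."""
        out = subprocess.run(
            ["qemu-img", "info", "--output=json", path],
            check=True, capture_output=True, text=True,
        ).stdout
        return json.loads(out)

    info = image_info("disk.img")  # placeholder path
    # 'format' is the container format (raw vs qcow2); thin provisioning on
    # file storage shows up as 'actual-size' (allocated bytes) being much
    # smaller than 'virtual-size', even though the format is raw.
    print(info["format"], info["virtual-size"], info["actual-size"])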
> 
> 
> backup domain is a partly cooked feature and it is not very useful.
> There is no reason
> to use it for moving VMs from one environment to another.
The manual is terse. I guess the only functionality at the moment is that VMs 
in backup domains don't get launched.

The attribute also seems to be just a local flag: when a domain is re-attached, 
the backup flag gets lost. I only noticed after I had successfully launched VMs 
from the 'backup' domain re-attached to the 4.4 target.
> 
> I already explained how to move vms using a data domain. Check here:
> https://lists.ovirt.org/archives/list/[email protected]/message/ULLFLFKBAW7...
> https://lists.ovirt.org/archives/list/[email protected]/message/GFOK55O5N4S...
Since HCI and Gluster are my default, I didn't pay that much attention initially.
I have tested NFS domains more and find them much easier to use, but without an 
enterprise NAS, and with HCI on both source and target, that's not a solution 
until disks can be moved from NFS to Gluster without failing in qemu-img 
convert.
> 
> I'm not sure it is documented properly, please file a documentation
> bug if we need to
> add something to the documentation.
> 
> 
> If you cloned a vm to data domain and then detach the data domain
> there is nothing to cleanup in the source system.
At least in the 4.3 GUI, clone doesn't have a target and only asks for a name: 
there is no cloning from Gluster to NFS or vice versa in the GUI. Instead I 
have to first clone (gluster-to-gluster) and then move (gluster-to-NFS) to make 
a VM movable. Perhaps that is different in Python/REST?
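If I read the API right (not verified, the exact signature is an assumption), 
disks expose a 'move' action, so the clone-then-move dance might collapse into 
a single call; a sketch with placeholder IDs and names:

    import ovirtsdk4 as sdk
    import ovirtsdk4.types as types

    connection = sdk.Connection(
        url="https://engine.example.com/ovirt-engine/api",  # placeholder
        username="admin@internal",
        password="...",
        ca_file="ca.pem",
    )
    try:
        disk_service = connection.system_service().disks_service() \
            .disk_service("disk-uuid")  # placeholder disk id
        # The 'move' action asks the engine to migrate a disk to another
        # storage domain in the same data center (the VM must be down).
        disk_service.move(storage_domain=types.StorageDomain(name="nfs-export"))
    finally:
        connection.close()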
With 4.4 the clone operation is much more elaborate and allows fine-tuning the 
'cloned' machine. But again, I don't see that I can change the storage domain 
there: there is a selection box, but it only offers the same domain as the 
clone source.

Actually that makes a lot of sense, because for VDI scenarios and the like, 
clone should be a copy-on-write operation, essentially a snapshot promoted to a 
distinct identity. So detaching tons of VMs straddling storage domains could be 
a challenge.

As far as I can tell, on 4.3 clone is simply a full copy (with sparseness 
preserved), while with 4.4 you get a 'copy with reconfiguration'. The VDI-type 
storage efficiency needs to come from VDO; it doesn't seem to be managed by 
oVirt.
> 
> 
> We have this in 4.4, try to select a VM and click "Export".
Good, so the next migration will be easier...
> 
> Nir
Hey, sorry for piling on a bit: I really do appreciate both what you have been 
creating and your support.
It's just that for a product that is almost a decade old now, it seems very 
beta right where and how I need to use it.

I am very much looking forward to next week and to hearing about the bright 
future you plan for oVirt/RHV, but in the meantime I'd like to abuse this 
opportunity to push my agenda a bit:

1. Make HCI a true focus of the product, not a Nutanix also-ran sideline. 
Perhaps even make it your daily driver in QA.
2. Find ways of fencing that do not require enterprise hardware: NUCs or 
similar could be a giant opportunity in edge deployments, with various levels 
of concentration (and higher-grade hardware) along the path towards DCs or 
clouds. Not having to switch the orchestrator API is a USP.
3. With Thunderbolt being the new USB, and Thunderbolt being PCIe, or NVMe over 
fabric, etc.: is there a way to make USB work as an HCI fabric? I use Mellanox 
host-chaining on our big boxes, and while vendors would rather sell IB switches, 
labs would rather use software. And USB is even cheaper than Ethernet, because 
four ports come free with every box, allowing for quite an HCI mesh just by 
adding cables. Gluster giving up on RDMA support (if I read that correctly) is 
the wrong way to go.