Hi Peter!

-    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created,
-                       errp);
+    fd = file_ram_open(mem_path, memory_region_name(mr), readonly, &created);
+    if (fd == -EACCES && !(ram_flags & RAM_SHARED) && !readonly) {
+        /*
+         * We can have a writable MAP_PRIVATE mapping of a readonly file.
+         * However, some operations like ftruncate() or fallocate() might fail
+         * later, let's warn the user.
+         */
+        fd = file_ram_open(mem_path, memory_region_name(mr), true, &created);
+        if (fd >= 0) {
+            warn_report("backing store %s for guest RAM (MAP_PRIVATE) opened"
+                        " readonly because the file is not writable",
+                        mem_path);

I can understand the use case, but this will be slightly unwanted,
especially since the user doesn't yet have a way to predict when it will
happen.

Users can set the file permissions accordingly, I guess. If they want the file to never be modified via QEMU, they can set it R/O.


Meanwhile, this changes the behavior. Is it a concern that someone may want
to rely on the current behavior of failing?

The scenario would be that someone passes a readonly file to "-mem-path" or "-object memory-backend-file,share=off,readonly=off", expecting that to fail, as it currently does.
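
For concreteness, that scenario corresponds to an invocation along these lines (illustrative only; the path and sizes are made up):

qemu-system-x86_64 -m 4G \
    -object memory-backend-file,id=pc.ram,size=4G,mem-path=/path/to/readonly.file,share=off,readonly=off \
    -machine memory-backend=pc.ram

Today this fails at startup because the file cannot be opened R/W; with this patch it would start and print the warning instead.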

If it now doesn't fail (and we warn instead), what would happen is:
* In file_ram_alloc() we won't even try ftruncate(), because the file
  already had a size > 0. So ftruncate() is not a concern as I now
  realize.
* fallocate might fail later. AFAICS, that only applies to
  ram_block_discard_range() (see the sketch after this list).
 -> virtio-mem performs an initial ram_block_discard_range() check and
    fails gracefully early.
 -> virtio-balloon ignores any errors
 -> ram_discard_range() in migration code fails early for postcopy in
    init_range() and loadvm_postcopy_ram_handle_discard(), handling it
    gracefully.
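
To see the failure mode concretely, here is a minimal standalone sketch (hypothetical file name; punching a hole requires an fd that is open for writing, so on an O_RDONLY fd this fails with EBADF, which is what the callers above would run into):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>      /* open(), fallocate(), FALLOC_FL_* */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    /* Any pre-sized file opened readonly; the name is just an example. */
    int fd = open(argc > 1 ? argv[1] : "guest.ram", O_RDONLY);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Roughly what ram_block_discard_range() does for file-backed RAM. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, 4096) < 0) {
        fprintf(stderr, "fallocate: %s\n", strerror(errno)); /* EBADF */
    }
    close(fd);
    return 0;
}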

So mostly nothing "bad" would happen; it might just be undesirable, and we properly warn about it.

Most importantly, we won't be corrupting/touching the original file in any case, because it is R/O.
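
A minimal sketch of that guarantee (file name made up; assumes a non-empty file):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("guest.ram", O_RDONLY);   /* example name */
    char before, after;

    if (fd < 0) {
        perror("open");
        return 1;
    }
    pread(fd, &before, 1, 0);
    /* A writable MAP_PRIVATE mapping of a readonly fd is allowed ... */
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    /* ... the first write COWs the page into anonymous memory ... */
    p[0] = before + 1;
    /* ... so the file content on disk stays untouched. */
    pread(fd, &after, 1, 0);
    printf("file byte unchanged: %s\n", before == after ? "yes" : "no");
    munmap(p, 4096);
    close(fd);
    return 0;
}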

If we really want to be careful, we could glue that behavior to compat machines. I'm not sure yet whether we really have to go down that path.

Any other alternatives? I'd like to avoid new flags where not really required.


Thinking about the current use case from a higher level, the ideal solution
seems to me that if the RAM file can be put on a file system that supports
CoW itself (like btrfs), we can snapshot that RAM file and make the snapshot
R/W for the QEMU instance. Then it'll be able to open the file here. We'll
be able to keep the interface working as before, and I assume it'll work
with fallocate or truncation too.
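
A snapshot of that kind could be taken with a reflink clone, e.g. (sketch only; file names made up, and FICLONE requires both files to live on the same reflink-capable filesystem such as btrfs):

#include <fcntl.h>
#include <linux/fs.h>   /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    int src = open("template.ram", O_RDONLY);
    int dst = open("instance.ram", O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (src < 0 || dst < 0) {
        perror("open");
        return 1;
    }
    /* Share all data blocks; writes to instance.ram COW at the fs level. */
    if (ioctl(dst, FICLONE, src) < 0) {
        perror("ioctl(FICLONE)");
        return 1;
    }
    close(src);
    close(dst);
    return 0;
}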

Would that be better instead of changing QEMU?

As I recently learned, using file-backed VMs (on real SSDs/disks, not shmem/hugetlb) is usually undesired, because the dirtied pages will constantly get written back to disk by background writeback threads, eventually resulting in bad performance and SSD wear.

So while using a COW filesystem sounds cleaner in theory, it's not applicable in practice -- unless one disables any background writeback, which has different side effects because it cannot be configured on a per-file basis.

So for VM templating, it makes sense to capture the guest RAM and store it in a file, to then use a COW (MAP_PRIVATE) mapping. Using a read-only file makes perfect sense in that scenario IMHO.

[I'm curious at what point a filesystem will actually break COW. If it's wired up to the writenotify infrastructure, it would happen when actually writing to a page, not at mmap time. I know that filesystems use writenotify for lazy allocation of disk blocks in file holes; maybe they also do that for lazy allocation of disk blocks on COW.]

Thanks!

--
Cheers,

David / dhildenb

