Zero out clusters past the end of the device. This makes restore handle
them correctly: even if it tries to write those zeros, it won't fail,
but simply ignores the out-of-bounds write to disk.

For disks that are not even 4k-aligned, there is a potential buffer
overrun in the memcpy (since a full 4k is always copied), which leaks
host memory into VMA archives. Fix this by always zeroing the affected
area in the output buffer.

Reported-by: Roland Kammerer <roland.kamme...@linbit.com>
Suggested-by: Lars Ellenberg <lars.ellenb...@linbit.com>
Signed-off-by: Stefan Reiter <s.rei...@proxmox.com>
---

Thanks again for the detailed report and reproducer! It seems Lars' idea for a
fix was indeed correct; I also added the mentioned memset so we don't leak
memory on non-4k-aligned disks, as well as some cleanup of the patch you sent.

Tested with aligned VMs (including ones with small efidisks), as well as a
specifically unaligned one, which no longer exhibits the bug (tested on a VG
with 1k extent alignment in a loop file). A hexdump of the resulting vma shows
no more memory leakage. I would of course be grateful for further testing,
especially whether it fixes the originally reported bug (the DRBD-related
issue).

 vma-writer.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/vma-writer.c b/vma-writer.c
index 06cbc02b1e..f5d2c5d23c 100644
--- a/vma-writer.c
+++ b/vma-writer.c
@@ -633,17 +633,33 @@ vma_writer_write(VmaWriter *vmaw, uint8_t dev_id, int64_t cluster_num,
 
     DPRINTF("VMA WRITE %d %zd\n", dev_id, cluster_num);
 
+    uint64_t dev_size = vmaw->stream_info[dev_id].size;
     uint16_t mask = 0;
 
     if (buf) {
         int i;
         int bit = 1;
+        uint64_t byte_offset = cluster_num * VMA_CLUSTER_SIZE;
         for (i = 0; i < 16; i++) {
             const unsigned char *vmablock = buf + (i*VMA_BLOCK_SIZE);
-            if (!buffer_is_zero(vmablock, VMA_BLOCK_SIZE)) {
+
+            // Note: If the source is not 64k-aligned, we might reach 4k blocks
+            // after the end of the device. Always mark these as zero in the
+            // mask, so the restore handles them correctly.
+            if (byte_offset < dev_size &&
+                !buffer_is_zero(vmablock, VMA_BLOCK_SIZE))
+            {
                 mask |= bit;
                 memcpy(vmaw->outbuf + vmaw->outbuf_pos, vmablock,
                        VMA_BLOCK_SIZE);
+
+                // prevent memory leakage on unaligned last block
+                if (byte_offset + VMA_BLOCK_SIZE > dev_size) {
+                    uint64_t real_data_in_block = dev_size - byte_offset;
+                    memset(vmaw->outbuf + vmaw->outbuf_pos + real_data_in_block,
+                           0, VMA_BLOCK_SIZE - real_data_in_block);
+                }
+
                 vmaw->outbuf_pos += VMA_BLOCK_SIZE;
             } else {
                 DPRINTF("VMA WRITE %zd ZERO BLOCK %d\n", cluster_num, i);
@@ -651,6 +667,7 @@ vma_writer_write(VmaWriter *vmaw, uint8_t dev_id, int64_t cluster_num,
                 *zero_bytes += VMA_BLOCK_SIZE;
             }
 
+            byte_offset += VMA_BLOCK_SIZE;
             bit = bit << 1;
         }
     } else {
@@ -676,8 +693,8 @@ vma_writer_write(VmaWriter *vmaw, uint8_t dev_id, int64_t cluster_num,
 
     if (dev_id != vmaw->vmstate_stream) {
         uint64_t last = (cluster_num + 1) * VMA_CLUSTER_SIZE;
-        if (last > vmaw->stream_info[dev_id].size) {
-            uint64_t diff = last - vmaw->stream_info[dev_id].size;
+        if (last > dev_size) {
+            uint64_t diff = last - dev_size;
             if (diff >= VMA_CLUSTER_SIZE) {
                 vma_writer_set_error(vmaw, "vma_writer_write: "
                                      "read after last cluster");
-- 
2.20.1

