Re: [Qemu-devel] [PATCH 3/5] memory: Add RAM_NONPERSISTENT flag

Eduardo Habkost Thu, 22 Jun 2017 10:28:19 -0700

On Thu, Jun 22, 2017 at 01:14:58PM +0100, Dr. David Alan Gilbert wrote:
> * Eduardo Habkost (ehabk...@redhat.com) wrote:
> > The new flag will make qemu_ram_free() discard the contents of the
> > block.  It will be used to let QEMU be configured to avoid flushing file
> > contents to disk when exiting.  As MADV_REMOVE is not always supported,
> > the new code will try MADV_DONTNEED in case MADV_REMOVE fails.
> 
> I'd like to understand what semantics you're trying to achieve and thus
> why you prefer REMOVE to DONTNEED.   If you're trying to avoid changes
> being written back then doesn't a DONTNEED get rid of any changes that
> have yet to be written?  Or are there changes that have already been
> queued that REMOVE will kill off?
>


Generally speaking, it looked to me as if REMOVE were a superset of
DONTNEED: DONTNEED will free and zero pages only on anonymous private
mappings; REMOVE will free resources and zero pages in additional cases.

One case where I think REMOVE would be useful is tmpfs when swapping
is involved: with REMOVE, the host can drop swap contents or avoid
writing memory contents to swap even if we are using a shared tmpfs
mapping.

Other filesystems might have similar cases where unnecessary I/O
operations might be performed even after madvise(MADV_DONTNEED) is
called.  MADV_REMOVE lets us simply tell the kernel to drop the data.
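
To make the difference concrete, here is a minimal sketch, assuming Linux
and a shared anonymous (shmem-backed) mapping.  byte_after_madvise() is a
hypothetical probe, not QEMU code.  Per the madvise(2) manpage, a shared
mapping is repopulated from its backing object after MADV_DONTNEED, so the
data should survive, while MADV_REMOVE should punch a hole and leave zeroes
behind (where the kernel supports it for the mapping).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical probe: fill a fresh shared mapping with a marker byte,
 * apply `advice` to it, and return the first byte read back afterwards.
 * Returns -1 if mmap() or madvise() fails.
 */
static int byte_after_madvise(int advice)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    int result;

    if (p == MAP_FAILED) {
        return -1;
    }
    memset(p, 0xAB, len);
    if (madvise(p, len, advice) != 0) {
        result = -1;          /* advice not supported for this mapping */
    } else {
        result = p[0];        /* what a later access actually sees */
    }
    munmap(p, len);
    return result;
}
```

Calling byte_after_madvise(MADV_DONTNEED) and
byte_after_madvise(MADV_REMOVE) side by side shows whether the contents
were kept or dropped on a given host.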

I'm CCing Zack Cornelius, who initially suggested MADV_REMOVE, in case
he can describe more specific use cases.


> If you're just trying to save-time in writeback, it's interesting to
> note my requirement is that by the time I exit this function the
> process of throwing away the memory contents must be complete;
> I think your requirements are a lot lazier as to when it happens.

This is a very good point.  I was assuming that REMOVE is a superset of
DONTNEED, but based on the manpage it doesn't seem to be guaranteed.
Probably I shouldn't try to reuse ram_block_discard_range() and should
instead write a separate helper for madvise(MADV_REMOVE), as the
requirements are different.
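
A rough sketch of what such a separate helper could look like, assuming it
is purely a best-effort hint issued before the block is freed.  The name
ram_drop_contents_hint() and its signature are hypothetical and are not
part of this patch.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: tell the kernel that the backing store of
 * [host, host + length) can be dropped.  This is only an optimization,
 * so callers are expected to ignore a failure.
 * Returns 0 on success or a negative errno value.
 */
static int ram_drop_contents_hint(void *host, size_t length)
{
#ifdef MADV_REMOVE
    if (madvise(host, length, MADV_REMOVE) == 0) {
        return 0;
    }
    return -errno;
#else
    (void)host;
    (void)length;
    return -ENOTSUP;      /* MADV_REMOVE not available at build time */
#endif
}
```

Unlike ram_block_discard_range(), nothing here promises that the contents
are gone by the time the call returns, which matches the lazier
requirement of the free-on-exit case.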


> > The new flag will also indicate that ram_block_discard_range() can use
> > MADV_REMOVE when discarding memory pages.  I have considered calling
> > MADV_REMOVE unconditionally (as destroying the RAM contents seems to be
> > OK every time ram_block_discard_range() is called), but for safety I
> > decided to restrict the new code to blocks having RAM_NONPERSISTENT set.
> 
> The manpage on MADV_REMOVE is confusing; it says it doesn't work on Huge
> TLB pages, but says it does work on anything that can do
> FALLOC_FL_PUNCH_HOLE - which as far as I can tell hugetlbfs does.

Yes, it's confusing.  I need to do some testing to find out if HugeTLBFS
supports MADV_REMOVE today.  But my use case is just an optimization, so
it won't be a big deal if it doesn't cover every case in the first
version.
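
The kind of test I have in mind could be sketched as below.
probe_madv_remove() is a hypothetical name; the caller is assumed to pass
an fd opened on the filesystem under test (for example a file on a
hugetlbfs mount) and a length that is a multiple of that filesystem's
page size.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical probe: map `length` bytes of `fd` shared and try
 * MADV_REMOVE on the mapping.
 * Returns 0 if the kernel accepted it, otherwise a negative errno value
 * from mmap() or madvise().
 */
static int probe_madv_remove(int fd, size_t length)
{
    int ret = 0;
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (p == MAP_FAILED) {
        return -errno;
    }
    if (madvise(p, length, MADV_REMOVE) != 0) {
        ret = -errno;     /* save errno before munmap() can change it */
    }
    munmap(p, length);
    return ret;
}
```

Note that a successful probe discards the file contents in that range, so
it should only be pointed at a scratch file.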

> 
> I've got some code in my shared-postcopy world that has this function do
> the following which is kind of similar:
> 
>         /* The logic here is messy;
>          *    madvise DONTNEED fails for hugepages
>          *    fallocate works on hugepages and shmem
>          */
>         need_madvise = (rb->page_size == qemu_host_page_size) &&
>                        (rb->fd == -1 || !(rb->flags & RAM_SHARED));
>         need_fallocate = rb->fd != -1;

This looks safer to me.  I was bothered by the missing check for
(rb->fd != -1) in the current code.
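
To spell out which combinations the two flags produce, here is a
standalone sketch of the same decision.  struct sketch_block,
SKETCH_RAM_SHARED and plan_discard() are stand-ins invented for
illustration; they are not QEMU's RAMBlock or RAM_SHARED, and the flag
value is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for QEMU's RAM_SHARED; the bit value is an assumption. */
#define SKETCH_RAM_SHARED (1u << 1)

/* Stand-in for the RAMBlock fields used by the quoted logic. */
struct sketch_block {
    int fd;               /* -1 when the block is not file-backed */
    unsigned flags;
    size_t page_size;
};

struct discard_plan {
    bool need_madvise;
    bool need_fallocate;
};

/* Mirror of the quoted need_madvise/need_fallocate computation. */
static struct discard_plan plan_discard(const struct sketch_block *rb,
                                        size_t host_page_size)
{
    struct discard_plan plan;

    /* madvise only for normal-sized pages that are anonymous or private */
    plan.need_madvise = (rb->page_size == host_page_size) &&
                        (rb->fd == -1 || !(rb->flags & SKETCH_RAM_SHARED));
    /* fallocate whenever there is a backing file */
    plan.need_fallocate = rb->fd != -1;
    return plan;
}
```

With these definitions an anonymous block gets madvise only, a shared
file-backed block gets fallocate only, and a private file-backed block
with host-sized pages gets both.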

>         if (ret == -1 && need_fallocate) {
> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>             ret = fallocate(rb->fd,
>                             FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
>                             start, length);
> #endif
>         }
>         if (need_madvise && (!need_fallocate || (ret == 0))) {

I'm confused by the (ret == 0) check here.  Do you still want to call
madvise() if fallocate() succeeded?
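
Reading the quoted condition literally, it can be isolated as below.
madvise_follows() is a hypothetical name used only to enumerate the
cases; whether the resulting behaviour is the intended one is exactly my
question.

```c
#include <stdbool.h>

/*
 * Literal transcription of the quoted condition: `ret` is the result of
 * the preceding fallocate() step (or its initial value if that step was
 * skipped).
 */
static bool madvise_follows(bool need_madvise, bool need_fallocate, int ret)
{
    return need_madvise && (!need_fallocate || ret == 0);
}
```

As written, madvise() is skipped only when it is not needed at all, or
when fallocate() was needed and failed; a successful fallocate() is still
followed by madvise() whenever need_madvise is set.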

> #if defined(CONFIG_MADVISE)
>             ret = madvise(host_startaddr, length, MADV_DONTNEED);
>             fprintf(stderr, "%s: Did madvise for %p got %d\n", __func__,
>                     host_startaddr, ret);
> #endif
>         }


Anyway, now I'm considering simply not touching
ram_block_discard_range() and adding a new helper, because the
requirements are different.  Maybe in the future we can make the two
functions share code, if we decide FALLOC_FL_PUNCH_HOLE will be useful
for RAM_NONPERSISTENT too.

(BTW, I will probably rename "persistent=no"/RAM_NONPERSISTENT to
something more explicit about data being dropped, like
"free-on-exit=yes" or "disposable=yes").

> 
> Dave
> 
> > Signed-off-by: Eduardo Habkost <ehabk...@redhat.com>
> > ---
> >  exec.c | 17 ++++++++++++++++-
> >  1 file changed, 16 insertions(+), 1 deletion(-)
> > 
> > diff --git a/exec.c b/exec.c
> > index 585d6ed6d7..a6e9ed4ece 100644
> > --- a/exec.c
> > +++ b/exec.c
> > @@ -102,6 +102,11 @@ static MemoryRegion io_mem_unassigned;
> >   */
> >  #define RAM_RESIZEABLE (1 << 2)
> >  
> > +/* RAMBlock contents are not persistent, and we can discard memory contents
> > + * when freeing the memory block.
> > + */
> > +#define RAM_NONPERSISTENT (1 << 3)
> > +
> >  #endif
> >  
> >  #ifdef TARGET_PAGE_BITS_VARY
> > @@ -2061,6 +2066,10 @@ void qemu_ram_free(RAMBlock *block)
> >          ram_block_notify_remove(block->host, block->max_length);
> >      }
> >  
> > +    if (block->flags & RAM_NONPERSISTENT) {
> > +        ram_block_discard_range(block, 0, block->max_length);
> > +    }
> > +
> >      qemu_mutex_lock_ramlist();
> >      QLIST_REMOVE_RCU(block, next);
> >      ram_list.mru_block = NULL;
> > @@ -3537,7 +3546,13 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> >              /* Note: We need the madvise MADV_DONTNEED behaviour of definitely
> >               * freeing the page.
> >               */
> > -            ret = madvise(host_startaddr, length, MADV_DONTNEED);
> > +            if (rb->flags & RAM_NONPERSISTENT) {
> > +                ret = madvise(host_startaddr, length, MADV_REMOVE);
> > +            }
> > +            /* Fallback to MADV_DONTNEED if MADV_REMOVE fails */
> > +            if (ret || !(rb->flags & RAM_NONPERSISTENT)) {
> > +                ret = madvise(host_startaddr, length, MADV_DONTNEED);
> > +            }
> >  #endif
> >          } else {
> >              /* Huge page case  - unfortunately it can't do DONTNEED, but
> > -- 
> > 2.11.0.259.g40922b1
> > 
> --
> Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



-- 
Eduardo
