On Wed 01-08-18 21:09:39, Michael Ellerman wrote:
> Michal Hocko <mho...@kernel.org> writes:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> >> Does a failure in do_migrate_range indicate that the range is unmigratable
> >> and the loop in __offline_pages should terminate and goto failed_removal?
> >> Or
> >> should we allow a certain number of retrys before we
> >> give up on migrating the range?
> >
> > Unfortunatelly not. Migration code doesn't tell a difference between
> > ephemeral and permanent failures.
>
> What's to stop an ephemeral failure happening repeatedly?

If there is a short term pin on the page that prevents the migration,
then the holder of the pin should release it and the next retry of the
migration will succeed. If the page gets freed on the way, it will not
be reallocated because the pages in the range are already isolated.

The only other reason for a migration failure that I can see is a
complete OOM, where the allocation of the migration target fails. That
is highly unlikely, and it would sooner or later trigger the OOM killer
and release some memory.

The biggest problem here is that we cannot tell ephemeral and long term
pins apart...
-- 
Michal Hocko
SUSE Labs
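
To illustrate the retry argument above, here is a minimal userspace sketch. It is not kernel code: `toy_page`, `toy_try_migrate`, `toy_offline` and the `max_retries` bound are all hypothetical names made up for this example. It only models the point that a short term pin goes away after a few attempts, while the caller sees the same failure for short and long term pins.

```c
#include <stdbool.h>

/* Hypothetical model of a page; this is NOT a kernel structure. */
struct toy_page {
	/*
	 * Number of migration attempts for which a pin stays held.
	 *  0  -> not pinned
	 * >0  -> short term pin, dropped after that many attempts
	 * <0  -> long term pin, never dropped
	 */
	int pinned_for;
};

/*
 * One migration attempt. It fails while the page is pinned and gives
 * the caller no hint whether the pin is short or long term.
 */
static bool toy_try_migrate(struct toy_page *page)
{
	if (page->pinned_for == 0)
		return true;

	/* Time passes: the holder of a short term pin gets closer to releasing it. */
	if (page->pinned_for > 0)
		page->pinned_for--;

	return false;
}

/*
 * Retry the migration up to max_retries times (an assumed bound, chosen
 * only for this sketch). Returns the number of attempts used on success,
 * or -1 if every attempt failed.
 */
static int toy_offline(struct toy_page *page, int max_retries)
{
	for (int attempt = 1; attempt <= max_retries; attempt++) {
		if (toy_try_migrate(page))
			return attempt;
	}
	return -1;
}
```

With a pin held for three attempts the fourth attempt succeeds, whereas a long term pin uses up the whole retry budget. From the return value of a single failed attempt the two cases look identical, which is the problem described above.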