https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=242427

            Bug ID: 242427
           Summary: pmap_remove() sometimes is very slow causing 10+
                    minutes long reboots
           Product: Base System
           Version: 11.3-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: b...@freebsd.org
          Reporter: p...@lysator.liu.se

I've noticed that our file servers can take a _very_ long time to reboot.
Most of that time is spent in the "shutdown" phase, after the "Uptime: " line
is printed.

After a long debugging session I've pinpointed it to ZFS freeing the "zio"
cache, which, after many levels of function calls, ends up in pmap_remove(),
where some calls take approximately 1 second. On a basically idle test
server it can take 10-20 minutes for the server to "shut down" (or even
more; the time seems to depend on the server uptime). We've seen
production machines that seemed to be "hung" (for at least an hour), so we
gave up and sent them a "hard reset" via IPMI.

Hardware:
Dell PowerEdge R730xd with dual Intel Xeon E5-2620v4 CPUs (32 "cpus") and 256GB
of RAM. No swap.

Software:
FreeBSD 11.3-RELEASE-p5. ZFS on boot & data. ZFS ARC limited to 128GB. Approx
24000 ZFS filesystems (empty on this test server). Snapshots taken every hour.


An example of how long it can take (I'm only printing timing info for calls
that take >=1s (the top 4) or >=2s (the rest)):

kmem_unback: pmap_remove(kernel_pmap, 18446741877714755584, 18446741877714767872) took 1 seconds
kmem_free: kmem_unback(kmem_object, 18446741877714755584, 12288) took 1 seconds
page_free: kmem_free(kmem_arena, 18446741877714755584, 12288) took 1 seconds
keg_free_slab: keg->uk_freef(mem) {page_free} took 1 seconds

keg_drain: while-keg_free_slab-loop took 14 seconds [20021 loops, 14 slow calls]
zone_drain_wait: zone_foreach_keg(zone, &keg_drain) took 14 seconds
zone_dtor: zone_drain_wait(zone, M_WAITOK) took 14 seconds
zone_free_item(zone=UMA Zones): zone->uz_dtor() took 14 seconds
uma_zdestroy(zio_buf_12288) took 14 seconds
kmem_cache_destroy: uma_zdestroy(0xfffff803467c8ac0) [zio_buf_12288] took 14 seconds
kmem_cache_destroy(zio_buf_cache[20]) took 14 seconds

Called from kern_shutdown() -> EVENTHANDLER_INVOKE(shutdown_post_sync) ->
zfsshutdown()


I.e., ~14 out of 20021 calls to keg_free_slab() take about 1 second each
instead of executing really quickly (in this case).
But some kmem_caches are much bigger, causing delays of 300-600 seconds (or
more).
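
For illustration only, here is a minimal userland sketch of how a
"[N loops, M slow calls]" style counter could be kept around a loop. It is not
the instrumentation patch used above. The names loop_stats, loop_stats_record
and loop_stats_format are hypothetical, and the timestamps are plain
whole-second values standing in for the kernel's time_second.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical per-loop bookkeeping; not taken from the FreeBSD sources. */
struct loop_stats {
	unsigned long loops;       /* number of iterations seen */
	unsigned long slow_calls;  /* iterations at or above the threshold */
	time_t total_seconds;      /* sum of whole-second deltas */
};

/* Record one iteration given whole-second timestamps taken around it. */
void
loop_stats_record(struct loop_stats *st, time_t before, time_t after,
    time_t slow_threshold)
{
	time_t dt = after - before;

	st->loops++;
	st->total_seconds += dt;
	if (dt >= slow_threshold)
		st->slow_calls++;
}

/* Format a summary line similar in shape to the log output above. */
int
loop_stats_format(const struct loop_stats *st, const char *what, char *buf,
    size_t len)
{
	return (snprintf(buf, len, "%s took %lld seconds [%lu loops, %lu slow calls]",
	    what, (long long)st->total_seconds, st->loops, st->slow_calls));
}
```

A caller would sample the clock before and after each iteration, pass both
samples to loop_stats_record() with a threshold of 1, and print the formatted
summary once the loop finishes.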

(I use "time_second" for time measurements, should probably use something with
better granularity for the top 4 calls :-)
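
As a rough sketch of what finer-grained timing could look like, the userland
example below measures a call in nanoseconds with clock_gettime() and
CLOCK_MONOTONIC. The helper names elapsed_ns and time_call_ns are made up for
this example. Inside the kernel one would presumably use an uptime interface
such as nanouptime() or sbinuptime() instead; that is an assumption on my
part, not something tested here. Note that with a whole-second counter, a call
that merely straddles a second boundary is reported as taking 1 second
regardless of how long it really ran.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds between two samples; end must not precede start. */
int64_t
elapsed_ns(const struct timespec *start, const struct timespec *end)
{
	int64_t ns = (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL +
	    (int64_t)(end->tv_nsec - start->tv_nsec);

	assert(ns >= 0);
	return (ns);
}

/* Time a single call to fn(arg) using the monotonic clock. */
int64_t
time_call_ns(void (*fn)(void *), void *arg)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	fn(arg);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (elapsed_ns(&t0, &t1));
}
```

For example, samples taken at 10.9 s and 11.1 s differ by 200 ms here, while a
whole-second counter read at the same two instants would show a difference of
1 second.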

I added a sysctl, kern.shutdown.verbose, that I can set to a number to make
shutdown more verbose (and added a lot of printf()s to get this info). With it
I can now also see the number of filesystems being unmounted (since that too
can take a little while, though nothing close to the times above).

- Peter

