On Sat, Feb 27, 2010 at 11:21:17AM +0200, Mikolaj Golub wrote: > Hi, > > Recently I run some tests, which create/destroy tun interface in loop, and > after several hours my system panicked with "kmem_map too small". It has > appeared that tun (or tap) device does not free memory after the device > destroy: > > zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1' > Type InUse MemUse HighUse Requests Size(s) > DEVFS1 139 35K - 157 256 > zhuzha:/usr/src/sys/kern% sudo ifconfig tun0 create > zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1' > Type InUse MemUse HighUse Requests Size(s) > DEVFS1 140 35K - 159 256 > zhuzha:/usr/src/sys/kern% sudo ifconfig tun0 destroy > zhuzha:/usr/src/sys/kern% sudo vmstat -m |grep 'Type\|DEVFS1' > Type InUse MemUse HighUse Requests Size(s) > DEVFS1 140 35K - 159 256 > > And when running create/destroy in loop: > > Time Type InUse MemUse HighUse Requests Size(s) > 09:20 DEVFS1 104 26K - 113 256 > 09:25 DEVFS1 8504 2126K - 16912 256 > 09:30 DEVFS1 31602 7901K - 63108 256 > 09:35 DEVFS1 54316 13579K - 108536 256 > 09:40 DEVFS1 77068 19267K - 154040 256 > 09:45 DEVFS1 99764 24941K - 199431 256 > 09:50 DEVFS1 122408 30602K - 244719 256 > 09:55 DEVFS1 144689 36173K - 289281 256 > > It looks like the problem is that tun/tap_clone_create() calls make_dev() and > then dev_ref(dev). make_dev() calls itself dev_refl(), so after device > creating > we have si_refcount == 2. But on device removal (tun/tap_clone_destroy()) > dev_rel() is never called, only destroy_dev(dev), which checks that > si_refcount is still not zero and places the dev in dead_cdevsw.d_devs list. > > And running kgdb we can see the following picture: > > (kgdb) p *dead_cdevsw.d_devs.lh_first > $2 = {__si_reserved = 0x0, si_flags = 0, si_atime = {tv_sec = 1267218482, > tv_nsec = 0}, si_ctime = { > tv_sec = 1267218482, tv_nsec = 0}, si_mtime = {tv_sec = 1267218482, > tv_nsec = 0}, si_uid = 66, > si_gid = 68, si_mode = 384, si_cred = 0x0, si_drv0 = 0, si_refcount = 1, > si_list = { > le_next = 0xcd183100, le_prev = 0xc0d90278}, si_clone = {le_next = 0x0, > le_prev = 0xc5c83b50}, > si_children = {lh_first = 0x0}, si_siblings = {le_next = 0x0, le_prev = > 0x0}, si_parent = 0x0, > si_name = 0xcaadc278 "tun0", si_drv1 = 0x0, si_drv2 = 0x0, si_devsw = 0x0, > si_iosize_max = 0, > si_usecount = 0, si_threadcount = 0, __si_u = {__sid_snapdata = 0x0}, > __si_namebuf = "tun0", '\0' <repeats 59 times>} > (kgdb) p *dead_cdevsw.d_devs.lh_first->si_list.le_next > $3 = {__si_reserved = 0x0, si_flags = 0, si_atime = {tv_sec = 1267218421, > tv_nsec = 0}, si_ctime = { > tv_sec = 1267218421, tv_nsec = 0}, si_mtime = {tv_sec = 1267218421, > tv_nsec = 0}, si_uid = 66, > si_gid = 68, si_mode = 384, si_cred = 0x0, si_drv0 = 0, si_refcount = 1, > si_list = { > le_next = 0xcd183000, le_prev = 0xcaadc238}, si_clone = {le_next = 0x0, > le_prev = 0xc5c83b50}, > si_children = {lh_first = 0x0}, si_siblings = {le_next = 0x0, le_prev = > 0x0}, si_parent = 0x0, > si_name = 0xcd183178 "tun0", si_drv1 = 0x0, si_drv2 = 0x0, si_devsw = 0x0, > si_iosize_max = 0, > si_usecount = 0, si_threadcount = 0, __si_u = {__sid_snapdata = 0x0}, > __si_namebuf = "tun0", '\0' <repeats 59 times>} > ... and so on. > > Is dev_ref() needed in tun_clone_create() after make_dev() call? Can't it be > safely removed as in the patch below? I have run some tests with the patch and > it looks like it works for me.
CHEAPCLONE is unused too. Besides this, there are two races in the clone handling. First is that dev_clone handlers shall use make_dev_credf(MAKEDEV_REF) instead of make_dev/dev_ref. Second is that module unload shall drain clone events. Please test the patch below. diff --git a/sys/net/if_tap.c b/sys/net/if_tap.c index 950e96c..93801f1 100644 --- a/sys/net/if_tap.c +++ b/sys/net/if_tap.c @@ -192,10 +192,6 @@ tap_clone_create(struct if_clone *ifc, int unit, caddr_t params) if (i) { dev = make_dev(&tap_cdevsw, unit | extra, UID_ROOT, GID_WHEEL, 0600, "%s%d", ifc->ifc_name, unit); - if (dev != NULL) { - dev_ref(dev); - dev->si_flags |= SI_CHEAPCLONE; - } } tapcreate(dev); @@ -300,6 +296,7 @@ tapmodevent(module_t mod, int type, void *data) EVENTHANDLER_DEREGISTER(dev_clone, eh_tag); if_clone_detach(&tap_cloner); if_clone_detach(&vmnet_cloner); + drain_dev_clone_events(); mtx_lock(&tapmtx); while ((tp = SLIST_FIRST(&taphead)) != NULL) { @@ -381,12 +378,8 @@ tapclone(void *arg, struct ucred *cred, char *name, int namelen, struct cdev **d name = devname; } - *dev = make_dev(&tap_cdevsw, unit | extra, - UID_ROOT, GID_WHEEL, 0600, "%s", name); - if (*dev != NULL) { - dev_ref(*dev); - (*dev)->si_flags |= SI_CHEAPCLONE; - } + *dev = make_dev_credf(MAKEDEV_REF, &tap_cdevsw, unit | extra, + cred, UID_ROOT, GID_WHEEL, 0600, "%s", name); } if_clone_create(name, namelen, NULL); diff --git a/sys/net/if_tun.c b/sys/net/if_tun.c index 37b5e70..1fa02ac 100644 --- a/sys/net/if_tun.c +++ b/sys/net/if_tun.c @@ -188,10 +188,6 @@ tun_clone_create(struct if_clone *ifc, int unit, caddr_t params) /* No preexisting struct cdev *, create one */ dev = make_dev(&tun_cdevsw, unit, UID_UUCP, GID_DIALER, 0600, "%s%d", ifc->ifc_name, unit); - if (dev != NULL) { - dev_ref(dev); - dev->si_flags |= SI_CHEAPCLONE; - } } tuncreate(ifc->ifc_name, dev); @@ -237,12 +233,8 @@ tunclone(void *arg, struct ucred *cred, char *name, int namelen, name = devname; } /* No preexisting struct cdev *, create one */ - *dev = make_dev(&tun_cdevsw, u, + *dev = make_dev_credf(MAKEDEV_REF, &tun_cdevsw, u, cred, UID_UUCP, GID_DIALER, 0600, "%s", name); - if (*dev != NULL) { - dev_ref(*dev); - (*dev)->si_flags |= SI_CHEAPCLONE; - } } if_clone_create(name, namelen, NULL); @@ -303,6 +295,7 @@ tunmodevent(module_t mod, int type, void *data) case MOD_UNLOAD: if_clone_detach(&tun_cloner); EVENTHANDLER_DEREGISTER(dev_clone, tag); + drain_dev_clone_events(); mtx_lock(&tunmtx); while ((tp = TAILQ_FIRST(&tunhead)) != NULL) {
pgpr3tfsUM6Pw.pgp
Description: PGP signature