John, Will work on a new revision based on feedback.
Two things to note however: Already explored the idea of using kernel_sysctlbyname but rejected due to following: It makes little sense to have a rw sysctl that only takes effect "some times". This violates POLA at the expense of making code appear cleaner. Expectation is that writable sysctls take effect or are read only. They are not to be "write sometimes" unless we are to introduce a new flag. Instead of going to a confusing model we consider some form of rw sysctl that can set itself ro somehow. Otherwise people will be confused as to why nic queues says N while actually M. What the rw->ro api would look like I have no idea. Suggestions? Btw the review was cc'd to jvf and smh and gleb in phabricator. Smh responded. Jvf + gleb timed out after 2+ weeks for a relatively trivial change. Will look at the device_get_int stuff soon. Wasn't aware that function existed. Sent from my iPhone >> On Dec 1, 2014, at 6:32 AM, John Baldwin <j...@freebsd.org> wrote: >> >> On Wednesday, November 26, 2014 08:19:36 PM Alfred Perlstein wrote: >> Author: alfred >> Date: Wed Nov 26 20:19:36 2014 >> New Revision: 275136 >> URL: https://svnweb.freebsd.org/changeset/base/275136 >> >> Log: >> Make igb and ixgbe check tunables at probe time. >> >> This allows one to make a kernel module to tune the >> number of queues before the driver loads. >> >> This is needed so that a module at SI_SUB_CPU can set >> tunables for these drivers to take. Otherwise getenv >> is called too early by the TUNABLE macros. >> >> Reviewed by: smh >> Phabric: https://reviews.freebsd.org/D1149 > > The explicit TUNABLE_INT_FETCH strikes me as the wrong approach and hackish. > That is, each time you want to frob a tunable like this you will have to go > add a bunch of code to explicitly re-read the tunable, etc. Instead, you > could have simply changed your helper module to use kernel_sysctlbyname() to > set hw.igb.num_queues. That would have required only a single character > change to make the SYSCTL read/write instead of read/only for each tunable in > question. To be completely future proof (i.e. to handle loading the module > in > question post-boot), your module could both do setenv() and > kernel_sysctlbyname(). That seems to be a more extensible approach in terms > of allowing more of these to be changed in the future without having to do a > manual bypass of the existing tunable infrastructure for each tunable. I > would much prefer that you revert this part and change the relevant tunables > to CTLFLAG_RWTUN and update your out-of-tree module to use > kernel_sysctlbyname(). > > Also, if you didn't run this by the Intel folks (e.g. jfv@) that might have > been nice as a courtesy. > > Also, please use the existing resource_int_value() that uses 'hint.igb.0.foo' > instead of inventing a different wheel (device_get_int()). This is what all > the other drivers in the tree that provide per-instance tunables use. You > could perhaps have device_get_int() as a wrapper around resource_int_value(), > but we should use the existing convention, not invent a conflicting one. > >> Modified: >> head/sys/dev/e1000/if_igb.c >> head/sys/dev/ixgbe/ixgbe.c >> head/sys/kern/subr_bus.c >> head/sys/sys/bus.h >> >> Modified: head/sys/dev/e1000/if_igb.c >> ============================================================================ >> == --- head/sys/dev/e1000/if_igb.c Wed Nov 26 18:03:25 2014 (r275135) > +++ >> head/sys/dev/e1000/if_igb.c Wed Nov 26 20:19:36 2014 (r275136) @@ > -188,6 >> +188,7 @@ static char *igb_strings[] = { >> /********************************************************************* >> * Function prototypes >> *********************************************************************/ >> +static int igb_per_unit_num_queues(SYSCTL_HANDLER_ARGS); >> static int igb_probe(device_t); >> static int igb_attach(device_t); >> static int igb_detach(device_t); >> @@ -493,6 +494,11 @@ igb_attach(device_t dev) >> OID_AUTO, "nvm", CTLTYPE_INT|CTLFLAG_RW, adapter, 0, >> igb_sysctl_nvm_info, "I", "NVM Information"); >> >> + SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev), >> + SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), >> + OID_AUTO, "num_queues", CTLTYPE_INT | CTLFLAG_RD, >> + adapter, 0, igb_per_unit_num_queues, "I", "Number of Queues"); >> + > > You don't need igb_per_unit_num_queues(). > SYSCTL_ADD_INT(..., &adapter->num_queues) should have been used instead. > >> @@ -2831,6 +2837,7 @@ igb_setup_msix(struct adapter *adapter) >> { >> device_t dev = adapter->dev; >> int bar, want, queues, msgs, maxqueues; >> + int n_queues; >> >> /* tuneable override */ >> if (igb_enable_msix == 0) >> @@ -2858,8 +2865,18 @@ igb_setup_msix(struct adapter *adapter) >> goto msi; >> } >> >> - /* Figure out a reasonable auto config value */ >> - queues = (mp_ncpus > (msgs-1)) ? (msgs-1) : mp_ncpus; >> + n_queues = 0; >> + /* try more specific tunable, then global, then finally default to boot >> time tunable if set. */ + if (device_getenv_int(dev, "num_queues", >> &n_queues) != 0) { >> + device_printf(dev, "using specific tunable num_queues=%d", >> n_queues); >> + } else if (TUNABLE_INT_FETCH("hw.igb.num_queues", &n_queues) != 0) { >> + if (igb_num_queues != n_queues) { >> + device_printf(dev, "using global tunable hw.igb.num_queues=%d", >> n_queues); + igb_num_queues = n_queues; >> + } >> + } else { >> + n_queues = igb_num_queues; >> + } >> >> #ifdef RSS >> /* If we're doing RSS, clamp at the number of RSS buckets */ >> @@ -2867,10 +2884,12 @@ igb_setup_msix(struct adapter *adapter) >> queues = rss_getnumbuckets(); >> #endif >> >> - >> - /* Manual override */ >> - if (igb_num_queues != 0) >> - queues = igb_num_queues; >> + if (n_queues != 0) { >> + queues = n_queues; >> + } else { >> + /* Figure out a reasonable auto config value */ >> + queues = (mp_ncpus > (msgs-1)) ? (msgs-1) : mp_ncpus; >> + } > > This moves the mp_ncpus auto config below the RSS bits, so now you > are ignoring the RSS value. You probably should have left the > mp_ncpus line where it was. > > One old bug here is that igb checked the tunable value twice. It > should only check it once at the end of all the auto-tuning. > >> /* Sanity check based on HW */ >> switch (adapter->hw.mac.type) { >> @@ -2893,12 +2912,17 @@ igb_setup_msix(struct adapter *adapter) >> maxqueues = 1; >> break; >> } >> - if (queues > maxqueues) >> + if (queues > maxqueues) { >> + device_printf(adapter->dev, "requested %d queues, but max for this >> adapter is %d\n", + queues, maxqueues); >> queues = maxqueues; >> - >> - /* Manual override */ >> - if (igb_num_queues != 0) >> - queues = igb_num_queues; >> + } else if (queues == 0) { >> + queues = 1; >> + } else if (queues < 0) { >> + device_printf(adapter->dev, "requested %d queues, but min for this >> adapter is %d\n", + queues, 1); >> + queues = 1; >> + } > > So here is where I would check the tunable to allow it to override. Assuming > you revert the TUNABLE_INT_FETCH business then this only needs: > > /* Manual override */ > if (device_get_int(&n_queues) != 0) > n_queues = igb_num_queues; > if (n_queues != 0) { > if (n_queues > maxqueues || n_queues < 0) > /* whine */ > else > queues = n_queues; > } > > -- > John Baldwin > _______________________________________________ svn-src-head@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"