Re: [PATCH v15 04/13] task_isolation: add initial support
On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote:
> On 8/29/2016 12:33 PM, Peter Zijlstra wrote:
> > On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote:
> >> +	/*
> >> +	 * Request rescheduling unless we are in full dynticks mode.
> >> +	 * We would eventually get pre-empted without this, and if
> >> +	 * there's another task waiting, it would run; but by
> >> +	 * explicitly requesting the reschedule, we may reduce the
> >> +	 * latency.  We could directly call schedule() here as well,
> >> +	 * but since our caller is the standard place where schedule()
> >> +	 * is called, we defer to the caller.
> >> +	 *
> >> +	 * A more substantive approach here would be to use a struct
> >> +	 * completion here explicitly, and complete it when we shut
> >> +	 * down dynticks, but since we presumably have nothing better
> >> +	 * to do on this core anyway, just spinning seems plausible.
> >> +	 */
> >> +	if (!tick_nohz_tick_stopped())
> >> +		set_tsk_need_resched(current);
> > This is broken.. and it would be really good if you don't actually need
> > to do this.
>
> Can you elaborate?  We clearly do want to wait until we are in full
> dynticks mode before we return to userspace.
>
> We could do it just in the prctl() syscall only, but then we lose the
> ability to implement the NOSIG mode, which can be a convenience.

So this isn't spelled out anywhere.  Why does this need to be in the
return-to-user path?

> Even without that consideration, we really can't be sure we stay in
> dynticks mode if we disable the dynamic tick, but then enable interrupts,
> and end up taking an interrupt on the way back to userspace, and
> it turns the tick back on.  That's why we do it here, where we know
> interrupts will stay disabled until we get to userspace.

But but but.. task_isolation_enter() is explicitly run with IRQs
_enabled_!!  It even WARNs if they're disabled.

> So if we are doing it here, what else can/should we do?  There really
> shouldn't be any other tasks waiting to run at this point, so there's
> not a heck of a lot else to do on this core.  We could just spin and
> check need_resched and signal status manually instead, but that
> seems kind of duplicative of code already done in our caller here.

What !?  I really don't get this, what are you waiting for?  Why is
rescheduling making things better?
--
To unsubscribe from this list: send the line "unsubscribe linux-doc" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v15 04/13] task_isolation: add initial support
On Mon, Aug 29, 2016 at 12:53:30PM -0400, Chris Metcalf wrote:
> Would it be cleaner to just replace the set_tsk_need_resched() call
> with something like:
>
>     set_current_state(TASK_INTERRUPTIBLE);
>     schedule();
>     __set_current_state(TASK_RUNNING);
>
> or what would you recommend?

That'll just get you to sleep _forever_...

> Or, as I said, just doing a busy loop here while testing to see
> if need_resched or signal had been set?

Why do you care about need_resched() and/or signals?  How is that
related to the tick being stopped or not?
Re: [PATCH v2 0/3] Implement /proc/<pid>/totmaps
On Mon 29-08-16 16:37:04, Michal Hocko wrote:
> [Sorry for a late reply, I was busy with other stuff]
>
> On Mon 22-08-16 15:44:53, Sonny Rao wrote:
> > On Mon, Aug 22, 2016 at 12:54 AM, Michal Hocko wrote:
> [...]
> > But what about the private_clean and private_dirty?  Surely
> > those are more generally useful for calculating a lower bound on
> > process memory usage without additional knowledge?
>
> I guess private_clean can be used as a reasonable estimate.

I was thinking about this more and I think I am wrong here.  Even truly
MAP_PRIVATE|MAP_ANON memory will be in private_dirty.  So private_clean
becomes not all that interesting, and similarly misleading as its
_dirty variant (an mmapped file after [m]sync should become _clean),
and that doesn't mean the memory will get freed after the process
which maps it terminates.  Take shmem as an example again.

> private_dirty less so because it may refer to e.g. tmpfs which is not
> mapped by other process and so no memory would be freed after unmap
> without removing the file.

--
Michal Hocko
SUSE Labs
Re: [PATCH v2 0/3] Implement /proc/<pid>/totmaps
On Wed 24-08-16 12:14:06, Marcin Jabrzyk wrote:
[...]
> Sorry to hijack the thread, but I've found it recently
> and I guess it's the best place to present our point.
> We are working on our custom OS based on Linux and we have also
> suffered much from the /proc/<pid>/smaps file.  As in Chrome, we tried
> to improve our internal application memory management policies (Low
> Memory Killer) using data provided by smaps, but we failed due to the
> very long time needed for reading and properly parsing the file.

I was already questioning Pss and also Private_* for any memory killer
purpose earlier in the thread, because cumulative numbers for all
mappings can be really meaningless.  Especially when you do not know
which resource is shared and by whom.  Maybe you can describe how you
are using those cumulative numbers for your decisions and prove me
wrong, but I simply haven't heard any sound arguments so far.
Everything was just "we know what we are doing in our environment so
we know those resources and therefore those numbers make sense to us".
But with all due respect, this is not a reason to add a user-visible
API into the kernel.

--
Michal Hocko
SUSE Labs
Re: [PATCH V4] leds: trigger: Introduce a USB port trigger
On Fri, Aug 26, 2016 at 05:38:05PM +0200, Rafał Miłecki wrote:
> On 25 August 2016 at 14:49, Greg KH wrote:
> > On Thu, Aug 25, 2016 at 10:03:52AM +0200, Rafał Miłecki wrote:
> >> +static void usbport_trig_activate(struct led_classdev *led_cdev)
> >> +{
> >> +	struct usbport_trig_data *usbport_data;
> >> +	int err;
> >> +
> >> +	usbport_data = kzalloc(sizeof(*usbport_data), GFP_KERNEL);
> >> +	if (!usbport_data)
> >> +		return;
> >> +	usbport_data->led_cdev = led_cdev;
> >> +
> >> +	/* Storing ports */
> >> +	INIT_LIST_HEAD(&usbport_data->ports);
> >> +	usbport_data->ports_dir = kobject_create_and_add("ports",
> >> +							 &led_cdev->dev->kobj);
> >
> > If you _ever_ find yourself in a driver having to use kobject calls,
> > it's a HUGE hint that you are doing something wrong.
> >
> > Hint, you are doing this wrong :)
> >
> > Use an attribute group if you need a subdirectory in sysfs; using a
> > "raw" kobject like this hides it from all userspace tools and so no
> > userspace program can see it (hint, try using libudev to access these
> > files attached to the device...)
>
> Oops.  Thanks for pointing groups out to me.  I was looking at sysfs.h
> earlier but I didn't realize a group can be a subdirectory.  I can see
> these sysfs_create_group(s) and friends now, thanks.

No, use an attribute group, and name it, and the subdir will be
created automagically for you.

> >> +	if (!usbport_data->ports_dir)
> >> +		goto err_free;
> >> +
> >> +	/* API for ports management */
> >> +	err = device_create_file(led_cdev->dev, &dev_attr_new_port);
> >> +	if (err)
> >> +		goto err_put_ports;
> >> +	err = device_create_file(led_cdev->dev, &dev_attr_remove_port);
> >> +	if (err)
> >> +		goto err_remove_new_port;
> >
> > Doesn't this race with userspace and fail?  Shouldn't the led core be
> > creating your leds for you?
>
> These questions aren't clear to me.  What kind of race?  Doing
>     echo usbport > trigger
> results in the trigger core calling usbport_trig_activate.  We create
> new attributes and then we return.

Why aren't the attributes present when the device is found?  What is
"trigger" for?

> I'm also not creating any leds there.  The LED already has to be
> available if you want to assign some trigger to it.

That sounds really odd...

> >> +
> >> +	/* Notifications */
> >> +	usbport_data->nb.notifier_call = usbport_trig_notify,
> >> +	led_cdev->trigger_data = usbport_data;
> >> +	usb_register_notify(&usbport_data->nb);
> >
> > Don't abuse the USB notifier chain with stuff like this please, is that
> > really necessary?  Why can't your hardware just tell you what USB ports
> > are in use "out of band"?
>
> I totally don't understand this part.  This LED trigger is supposed to
> react to USB devices appearing at specified ports.  How else can I know
> there is a new USB device if not by notifications?
> I'm wondering if we are on the same page.  Could it be my idea of this
> LED trigger is not clear at all?  Could you take a look at the commit
> message again, please?  Mostly this part:
> > This commit adds a new trigger responsible for turning on LED when USB
> > device gets connected to the specified USB port.  This can be useful for
> > various home routers that have USB port(s) and a proper LED telling user
> > a device is connected.
> Can I add something more to make it clear what this trigger is
> responsible for?

Yes please, as I totally missed that.

And the USB notifier was created for some "special" parts of the USB
core to use; this feels like an abuse of that in some way.  But it's
hard to define, I know...

> >> +
> >> +	led_cdev->activated = true;
> >> +	return;
> >> +
> >> +err_remove_new_port:
> >> +	device_remove_file(led_cdev->dev, &dev_attr_new_port);
> >> +err_put_ports:
> >> +	kobject_put(usbport_data->ports_dir);
> >> +err_free:
> >> +	kfree(usbport_data);
> >> +}
> >
> > And again, why is this USB specific?  Why can't you use this same
> > userspace api and codebase for PCI ports?  For a sdcard port?  For a
> > thunderbolt port?
>
> I'm leaving this one unanswered as discussion on this continued in the
> V3.5 thread below my reply:
> On 25 August 2016 at 07:14, Rafał Miłecki wrote:
> > Good question.  I would like to extend this USB port trigger in the
> > future by reacting to USB activity.  This involves playing with URBs
> > and I believe that at that point it'd be getting too much USB specific
> > to /rule them all/.

No, I mean why not have this specific activity LED work for all types
of devices, not just USB.  Not that this specific LED works for other
types of USB activity.

thanks,

greg k-h
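For reference, Greg's suggestion of a named attribute group would look roughly like the sketch below. This is an untested kernel-side sketch, not the code that eventually landed: the `dev_attr_new_port` / `dev_attr_remove_port` identifiers come from the quoted patch, everything else is the standard sysfs API. A named group gives the "ports/" subdirectory automatically, and because it goes through the normal device-attribute machinery it stays visible to libudev and friends:

```c
static struct attribute *ports_attrs[] = {
	&dev_attr_new_port.attr,
	&dev_attr_remove_port.attr,
	NULL,
};

/* Setting .name makes sysfs create a "ports/" subdirectory for the
 * group, replacing the hand-rolled kobject_create_and_add("ports", ...). */
static const struct attribute_group ports_group = {
	.name  = "ports",
	.attrs = ports_attrs,
};

/* In usbport_trig_activate(), one call replaces the raw kobject plus
 * the two device_create_file() calls: */
err = sysfs_create_group(&led_cdev->dev->kobj, &ports_group);
if (err)
	goto err_free;
```

The userspace race Greg alludes to is a separate issue: attributes created after the device is announced can be missed by udev, which is why the driver model prefers attribute groups registered before the uevent is emitted.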
Re: [RFC PATCH v2 04/20] x86: Secure Memory Encryption (SME) support
On 08/25/2016 08:04 AM, Thomas Gleixner wrote:
> On Mon, 22 Aug 2016, Tom Lendacky wrote:
>
>> Provide support for Secure Memory Encryption (SME). This initial support
>> defines the memory encryption mask as a variable for quick access and an
>> accessor for retrieving the number of physical addressing bits lost if
>> SME is enabled.
>
> What is the reason that this needs to live in assembly code?

In later patches this code is expanded and deals with a lot of page
table manipulation, cpuid/rdmsr instructions, etc., and so I thought it
was best to do it this way.

Thanks,
Tom

> Thanks,
>
> 	tglx
Re: [RFC PATCH v2 04/20] x86: Secure Memory Encryption (SME) support
On Aug 30, 2016 6:34 AM, "Tom Lendacky" wrote:
> On 08/25/2016 08:04 AM, Thomas Gleixner wrote:
> > On Mon, 22 Aug 2016, Tom Lendacky wrote:
> >
> >> Provide support for Secure Memory Encryption (SME). This initial support
> >> defines the memory encryption mask as a variable for quick access and an
> >> accessor for retrieving the number of physical addressing bits lost if
> >> SME is enabled.
> >
> > What is the reason that this needs to live in assembly code?
>
> In later patches this code is expanded and deals with a lot of page
> table manipulation, cpuid/rdmsr instructions, etc. and so I thought it
> was best to do it this way.

None of that sounds like it needs to be in asm, though.  I, at least,
have a strong preference for minimizing the amount of asm in the
low-level arch code.

--Andy
Re: [PATCH v15 04/13] task_isolation: add initial support
On 8/30/2016 3:58 AM, Peter Zijlstra wrote:
> On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote:
>> On 8/29/2016 12:33 PM, Peter Zijlstra wrote:
>>> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote:
>>>> +	/*
>>>> +	 * Request rescheduling unless we are in full dynticks mode.
>>>> [...]
>>>> +	if (!tick_nohz_tick_stopped())
>>>> +		set_tsk_need_resched(current);
>>> This is broken.. and it would be really good if you don't actually need
>>> to do this.
>> Can you elaborate?  We clearly do want to wait until we are in full
>> dynticks mode before we return to userspace.
>>
>> We could do it just in the prctl() syscall only, but then we lose the
>> ability to implement the NOSIG mode, which can be a convenience.
> So this isn't spelled out anywhere.  Why does this need to be in the
> return to user path?

I'm not sure where this should be spelled out, to be honest.  I guess
I can add some commentary to the commit message explaining this part.

The basic idea is just that we don't want to be at risk from the
dyntick getting enabled.  Similarly, we don't want to be at risk of a
later global IPI due to lru_add_drain stuff, for example.  And, we may
want to add additional stuff, like catching kernel TLB flushes and
deferring them when a remote core is in userspace.  To do all of this
kind of stuff, we need to run in the return-to-user path so we are
late enough to guarantee no further kernel things will happen to
perturb our carefully-arranged isolation state that includes dyntick
off, per-cpu lru cache empty, etc etc.

>> Even without that consideration, we really can't be sure we stay in
>> dynticks mode if we disable the dynamic tick, but then enable interrupts,
>> and end up taking an interrupt on the way back to userspace, and
>> it turns the tick back on.  That's why we do it here, where we know
>> interrupts will stay disabled until we get to userspace.
> But but but.. task_isolation_enter() is explicitly run with IRQs
> _enabled_!!  It even WARNs if they're disabled.

Yes, true!  But if you pop up to the caller, the key thing is the
task_isolation_ready() routine, where we are invoked with interrupts
disabled, and we confirm that all our criteria are met (including
tick_nohz_tick_stopped()), and then leave interrupts disabled as we
return from there onwards to userspace.

The task_isolation_enter() code just does its best-faith attempt to
make sure all these criteria are met, just like all the other TIF_xxx
flag tests do in exit_to_usermode_loop() on x86, like scheduling,
delivering signals, etc.  As you know, we might run that code, go
around the loop, and discover that the TIF flag has been re-set, and
we have to run the code again before all of that stuff has
"quiesced".  The isolation code uses that same model; the only
difference is that we clear the TIF flag manually in the loop by
checking task_isolation_ready().

>> So if we are doing it here, what else can/should we do?  There really
>> shouldn't be any other tasks waiting to run at this point, so there's
>> not a heck of a lot else to do on this core.  We could just spin and
>> check need_resched and signal status manually instead, but that
>> seems kind of duplicative of code already done in our caller here.
> What !?  I really don't get this, what are you waiting for?  Why is
> rescheduling making things better.

We need to wait for the last dyntick to fire before we can return to
userspace.  There are plenty of options as to what we can do in the
meanwhile.

1. Try to schedule().  Good luck with that in practice, since a
userspace process that has enabled task isolation is going to be alone
on its core unless something pretty broken is happening on the system.
But, at least folks understand the idiom of scheduling out while you
wait.

2. Another variant of that: set up a wait completion and have the
dynticks code complete it when the tick turns off.  But this adds
complexity to option 1, and really doesn't buy us much in practice
that I can see.

3. Just admit that we are likely alone on the core, and just burn
cycles in a busy loop waiting for that last tick to fire.  Obviously
if we do this we also need to test for signals and resched so the core
remains responsive.  We can either do this in a loop just by spinning
explicitly, or I could literally just remove the line in the current
patchset that sets TIF_NEED_RESCHED.
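Option 3 is the classic "spin until quiesced, but keep rechecking for deferred work" pattern. Its shape can be simulated in userspace as below; the function names are invented stand-ins for tick_nohz_tick_stopped() and the TIF_SIGPENDING/TIF_NEED_RESCHED tests, and the real loop would of course live in the kernel's exit-to-user path, not here:

```c
#include <stdbool.h>

/* Stand-ins: pretend the last dyntick fires after three polls and
 * that no signal/resched work arrives in the meantime. */
static int polls_left = 3;
static bool tick_stopped(void)  { return --polls_left <= 0; }
static bool work_pending(void)  { return false; }

/* Returns the number of idle spins, or -1 if we must go back around
 * the exit-to-user loop to handle signals/resched first. */
static int wait_for_last_tick(void)
{
	int spins = 0;

	while (!tick_stopped()) {
		if (work_pending())
			return -1;
		spins++;	/* cpu_relax() would go here */
	}
	return spins;
}
```

The point of the sketch is the structure: the spin is bounded by the quiesce condition, and every iteration re-tests the same pending-work conditions the normal return-to-user loop tests, so the core stays responsive while it waits.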
Re: [PATCH v15 04/13] task_isolation: add initial support
On Tue, Aug 30, 2016 at 8:32 AM, Chris Metcalf wrote:
> On 8/30/2016 3:58 AM, Peter Zijlstra wrote:
>> On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote:
>>> On 8/29/2016 12:33 PM, Peter Zijlstra wrote:
>>>> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote:
>>>>> +	/*
>>>>> +	 * Request rescheduling unless we are in full dynticks mode.
>>>>> [...]
>>>>> +	if (!tick_nohz_tick_stopped())
>>>>> +		set_tsk_need_resched(current);
>>>> This is broken.. and it would be really good if you don't actually need
>>>> to do this.
>>> Can you elaborate?  We clearly do want to wait until we are in full
>>> dynticks mode before we return to userspace.
>>>
>>> We could do it just in the prctl() syscall only, but then we lose the
>>> ability to implement the NOSIG mode, which can be a convenience.
>> So this isn't spelled out anywhere.  Why does this need to be in the
>> return to user path?
>
> I'm not sure where this should be spelled out, to be honest.  I guess
> I can add some commentary to the commit message explaining this part.
>
> The basic idea is just that we don't want to be at risk from the
> dyntick getting enabled.  Similarly, we don't want to be at risk of a
> later global IPI due to lru_add_drain stuff, for example.  And, we may
> want to add additional stuff, like catching kernel TLB flushes and
> deferring them when a remote core is in userspace.  To do all of this
> kind of stuff, we need to run in the return to user path so we are
> late enough to guarantee no further kernel things will happen to
> perturb our carefully-arranged isolation state that includes dyntick
> off, per-cpu lru cache empty, etc etc.

None of the above should need to *loop*, though, AFAIK.
RE: [PATCH 3/3] drivers: Add visorbus to the drivers/virt directory
> -----Original Message-----
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Sunday, August 21, 2016 2:05 PM
> To: Kershner, David A
>
> On Fri, Jun 10, 2016 at 11:23:56PM -0400, David Kershner wrote:
> > visorbus is currently located at drivers/staging/visorbus,
> > this patch moves it to drivers/virt.
> >
> > Signed-off-by: David Kershner
> > Reviewed-by: Tim Sell
>
> I picked one random file here, this last one, and found a number of
> "odd" things in it.
>
> So, given that I can't really comment on the patch itself, I'm going to
> include the file below, quote it, and then provide some comments, ok?

There's obviously a lot of work we have to do here (and that work is
underway), but I'd like to focus this email on explaining a class of
comments you made about our error handling.  You have correctly pointed
out some areas where we are deficient in our error handling, but I
think there are others where our error handling is indeed sufficient,
though this is perhaps NOT obvious because we are sending error codes
to our partitioning firmware environment (via a message-passing
interface across a shared-memory interface known as the 'controlvm
channel') instead of specifying them in function return values.  This
partitioning firmware environment is NOT Linux, and is external to the
Linux OS environment where the visorbus driver is running.

It helps to understand the overall flow of visorchipset.c, which
basically fills the role of the server responsible for handling
requests initiated by the partitioning firmware environment:

* controlvm_periodic_work() / handle_command() reads a request from
  the partitioning firmware environment, which will be something like
  "create a virtual bus" or "create a virtual device".

* A case statement in handle_command() calls one of various functions
  to handle the request, which sends the success/failure result code
  of the operation back to the partitioning firmware environment using
  one of these paths:
  * bus_epilog() / bus_responder() / controlvm_respond()
  * device_epilog() / device_responder() / controlvm_respond()
  This success/failure result code is indicated via one of those
  CONTROLVM_RESP_* error mnemonics (which you clearly indicated your
  hatred for, and which we will change).

* The partitioning firmware environment will react accordingly to the
  error code returned.  E.g., if the request was to create the virtual
  bus containing the boot device, and the error code was
  CONTROLVM_RESP_ERROR_KMALLOC_FAILED, the partitioning firmware
  environment would fail the boot of the Linux guest and inform the
  user that the Linux guest boot failed due to an out-of-memory
  condition.

(This also should clarify why we use awkward mnemonics like
CONTROLVM_RESP_ERROR_KMALLOC_FAILED instead of -ENOMEM -- it's because
the error codes are not defined by a Linux environment, because it
isn't a Linux environment that is actually making the controlvm
requests.  But that's another issue.)
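The flow described above - a handler computes a CONTROLVM_RESP_* code and a responder ships it back over the channel instead of returning it up the call stack - can be sketched in miniature as follows. This is a toy userspace model: the message ids, the numeric response values, and the respond() stub recording into a variable are all invented for illustration; only the dispatch-then-respond shape matches visorchipset.c:

```c
/* Toy model of the handle_command() -> handler -> responder flow. */
enum { CONTROLVM_RESP_SUCCESS = 0,
       CONTROLVM_RESP_ERROR_ALREADY_DONE = 1,     /* illustrative values */
       CONTROLVM_RESP_ERROR_KMALLOC_FAILED = 2 };

enum { MSG_BUS_CREATE, MSG_BUS_DESTROY };

static int last_response = -1;
static int bus_exists;

/* Stands in for controlvm_respond(): the result code goes back over
 * the controlvm channel to the firmware, not up the call stack. */
static void respond(int resp) { last_response = resp; }

static void handle_bus_create(void)
{
	if (bus_exists) {
		respond(CONTROLVM_RESP_ERROR_ALREADY_DONE);
		return;
	}
	bus_exists = 1;
	respond(CONTROLVM_RESP_SUCCESS);
}

static void handle_command(int msg)
{
	switch (msg) {
	case MSG_BUS_CREATE:
		handle_bus_create();
		break;
	default:
		break;
	}
}
```

Note that the handlers are all void: from the Linux side the operation "succeeds" either way, and the firmware decides what to do with the response code it receives - which is why grepping for checked return values understates the real error handling here.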
Let me use visorchipset.c bus_create() as an example:

static void
bus_create(struct controlvm_message *inmsg)
{
	struct controlvm_message_packet *cmd = &inmsg->cmd;
	u32 bus_no = cmd->create_bus.bus_no;
	int rc = CONTROLVM_RESP_SUCCESS;
	struct visor_device *bus_info;
	struct visorchannel *visorchannel;

	bus_info = visorbus_get_device_by_id(bus_no, BUS_ROOT_DEVICE, NULL);
	if (bus_info && (bus_info->state.created == 1)) {
		POSTCODE_LINUX_3(BUS_CREATE_FAILURE_PC, bus_no,
				 POSTCODE_SEVERITY_ERR);
		rc = -CONTROLVM_RESP_ERROR_ALREADY_DONE;
		goto out_bus_epilog;
	}
	bus_info = kzalloc(sizeof(*bus_info), GFP_KERNEL);
	if (!bus_info) {
		POSTCODE_LINUX_3(BUS_CREATE_FAILURE_PC, bus_no,
				 POSTCODE_SEVERITY_ERR);
		rc = -CONTROLVM_RESP_ERROR_KMALLOC_FAILED;
		goto out_bus_epilog;
	}

	INIT_LIST_HEAD(&bus_info->list_all);
	bus_info->chipset_bus_no = bus_no;
	bus_info->chipset_dev_no = BUS_ROOT_DEVICE;

	POSTCODE_LINUX_3(BUS_CREATE_ENTRY_PC, bus_no, POSTCODE_SEVERITY_INFO);

	visorchannel = visorchannel_create(cmd->create_bus.channel_addr,
					   cmd->create_bus.channel_bytes,
					   GFP_KERNEL,
					   cmd->create_bus.bus_data_type_uuid);
	if (!visorchannel) {
		POSTCODE_LINUX_3(BUS_CREATE_FAILURE_PC, bus_no,
				 POSTCODE_SEVERITY_ERR);
		rc = -CONTROLVM_RESP_ERROR_KMALLOC_FAILED;
		kfree(bus_info);
		bus_info = NULL;
		goto out_bus_epilog;
	}
	bus_info->visorchannel = visorchannel;
	if (uuid_le_cmp(cmd->create_bus.bus_inst_uuid, spar_siovm_uuid) == 0) {
Re: [PATCH v14 04/14] task_isolation: add initial support
On Tue, Aug 30, 2016 at 11:41:36AM -0400, Chris Metcalf wrote:
> On 8/29/2016 8:55 PM, Frederic Weisbecker wrote:
> > On Mon, Aug 15, 2016 at 10:59:55AM -0400, Chris Metcalf wrote:
> >> On 8/11/2016 2:50 PM, Christoph Lameter wrote:
> >>> On Thu, 11 Aug 2016, Frederic Weisbecker wrote:
> >>>
> >>>> Do we need to quiesce vmstat every time before entering userspace?
> >>>> I thought that vmstat only needs to be offlined once and for all?
> >>> Once is sufficient after disabling the tick.
> >> It's true that task_isolation_enter() is called every time before
> >> returning to user space while task isolation is enabled.
> >>
> >> But once we enter the kernel again after returning from the initial
> >> prctl() -- assuming we are in NOSIG mode so doing so is legal in the
> >> first place -- almost anything can happen, certainly including
> >> restarting the tick.  Thus, we have to make sure that normal quiescing
> >> happens again before we return to userspace.
> > Yes but we need to sort out what needs to be called only once on prctl().
> >
> > Once vmstat is quiesced, it's not going to need quiescing again even if we
> > restart the tick.
>
> That's true, but I really do like the idea of having a clean structure
> where we verify all our prerequisites in task_isolation_ready(), and
> have code to try to get things fixed up in task_isolation_enter().
> If we start moving some things here and some things there, it gets
> harder to manage.  I think by testing "!vmstat_idle()" in
> task_isolation_enter() we are avoiding any substantial overhead.

I think that making the code clearer on what needs to be done once for
all on prctl() and what needs to be done on every actual syscall return
is more important for readability.

> I think it would be clearer to rename task_isolation_enter()
> to task_isolation_prepare(); it might be less confusing.

For the prctl part, why not.

> Remember too that in general, we really don't need to think about
> return-to-userspace as a hot path for task isolation, unlike how we
> think about it all the rest of the time.  So it makes sense to
> prioritize keeping things clean from a software development
> perspective first, and high performance only secondarily.
>
> >> The thing to remember is that this is only relevant if the user has
> >> explicitly requested the NOSIG behavior from task isolation, which we
> >> don't really expect to be the default - we are implicitly encouraging
> >> use of the default semantics of "you can't enter the kernel again
> >> until you turn off isolation".
> > That's right.  Although NOSIG is the only thing we can afford as long as
> > we drag around the 1Hz.
>
> True enough.  Hopefully we'll finish sorting that out soon enough.
>
> >>>> +	if (!tick_nohz_tick_stopped())
> >>>> +		set_tsk_need_resched(current);
> >>> Again, that won't help
> >> It won't be better than spinning in a loop if there aren't any other
> >> schedulable processes, but it won't be worse either.  If there is
> >> another schedulable process, we at least will schedule it sooner than
> >> if we just sat in a busy loop and waited for the scheduler to kick
> >> us.  But there's nothing else we can do anyway if we want to maintain
> >> the guarantee that the dyn tick is stopped before return to userspace.
> > I don't think it helps either way.  If reschedule is pending, the current
> > task already has TIF_RESCHED set.
>
> See the other thread with Peter Z for the longer discussion of this.
> At this point I'm leaning towards replacing the set_tsk_need_resched() with
>
>     set_current_state(TASK_INTERRUPTIBLE);
>     schedule();
>     __set_current_state(TASK_RUNNING);

I don't see how that helps.  What will wake the thread up except a signal?
Re: [PATCH v14 04/14] task_isolation: add initial support
On 8/29/2016 8:55 PM, Frederic Weisbecker wrote:
> On Mon, Aug 15, 2016 at 10:59:55AM -0400, Chris Metcalf wrote:
>> On 8/11/2016 2:50 PM, Christoph Lameter wrote:
>>> On Thu, 11 Aug 2016, Frederic Weisbecker wrote:
>>>> Do we need to quiesce vmstat every time before entering userspace?
>>>> I thought that vmstat only needs to be offlined once and for all?
>>> Once is sufficient after disabling the tick.
>> It's true that task_isolation_enter() is called every time before
>> returning to user space while task isolation is enabled.
>>
>> But once we enter the kernel again after returning from the initial
>> prctl() -- assuming we are in NOSIG mode so doing so is legal in the
>> first place -- almost anything can happen, certainly including
>> restarting the tick.  Thus, we have to make sure that normal quiescing
>> happens again before we return to userspace.
> Yes but we need to sort out what needs to be called only once on prctl().
>
> Once vmstat is quiesced, it's not going to need quiescing again even if we
> restart the tick.

That's true, but I really do like the idea of having a clean structure
where we verify all our prerequisites in task_isolation_ready(), and
have code to try to get things fixed up in task_isolation_enter().
If we start moving some things here and some things there, it gets
harder to manage.  I think by testing "!vmstat_idle()" in
task_isolation_enter() we are avoiding any substantial overhead.

I think it would be clearer to rename task_isolation_enter() to
task_isolation_prepare(); it might be less confusing.

Remember too that in general, we really don't need to think about
return-to-userspace as a hot path for task isolation, unlike how we
think about it all the rest of the time.  So it makes sense to
prioritize keeping things clean from a software development
perspective first, and high performance only secondarily.

>> The thing to remember is that this is only relevant if the user has
>> explicitly requested the NOSIG behavior from task isolation, which we
>> don't really expect to be the default - we are implicitly encouraging
>> use of the default semantics of "you can't enter the kernel again
>> until you turn off isolation".
> That's right.  Although NOSIG is the only thing we can afford as long as
> we drag around the 1Hz.

True enough.  Hopefully we'll finish sorting that out soon enough.

>>>> +	if (!tick_nohz_tick_stopped())
>>>> +		set_tsk_need_resched(current);
>>> Again, that won't help
>> It won't be better than spinning in a loop if there aren't any other
>> schedulable processes, but it won't be worse either.  If there is
>> another schedulable process, we at least will schedule it sooner than
>> if we just sat in a busy loop and waited for the scheduler to kick
>> us.  But there's nothing else we can do anyway if we want to maintain
>> the guarantee that the dyn tick is stopped before return to userspace.
> I don't think it helps either way.  If reschedule is pending, the current
> task already has TIF_RESCHED set.

See the other thread with Peter Z for the longer discussion of this.
At this point I'm leaning towards replacing the set_tsk_need_resched() with

    set_current_state(TASK_INTERRUPTIBLE);
    schedule();
    __set_current_state(TASK_RUNNING);

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
Re: [PATCH v15 04/13] task_isolation: add initial support
On 8/30/2016 12:30 PM, Andy Lutomirski wrote:
> On Tue, Aug 30, 2016 at 8:32 AM, Chris Metcalf wrote:
>> On 8/30/2016 3:58 AM, Peter Zijlstra wrote:
>>> On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote:
>>>> On 8/29/2016 12:33 PM, Peter Zijlstra wrote:
>>>>> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote:
>>>>>> +	/*
>>>>>> +	 * Request rescheduling unless we are in full dynticks mode.
>>>>>> [...]
>>>>>> +	if (!tick_nohz_tick_stopped())
>>>>>> +		set_tsk_need_resched(current);
>>>>> This is broken.. and it would be really good if you don't actually need
>>>>> to do this.
>>>> Can you elaborate?  We clearly do want to wait until we are in full
>>>> dynticks mode before we return to userspace.
>>>>
>>>> We could do it just in the prctl() syscall only, but then we lose the
>>>> ability to implement the NOSIG mode, which can be a convenience.
>>> So this isn't spelled out anywhere.  Why does this need to be in the
>>> return to user path?
>> I'm not sure where this should be spelled out, to be honest.  I guess
>> I can add some commentary to the commit message explaining this part.
>>
>> The basic idea is just that we don't want to be at risk from the
>> dyntick getting enabled.  Similarly, we don't want to be at risk of a
>> later global IPI due to lru_add_drain stuff, for example.  And, we may
>> want to add additional stuff, like catching kernel TLB flushes and
>> deferring them when a remote core is in userspace.  To do all of this
>> kind of stuff, we need to run in the return to user path so we are
>> late enough to guarantee no further kernel things will happen to
>> perturb our carefully-arranged isolation state that includes dyntick
>> off, per-cpu lru cache empty, etc etc.
> None of the above should need to *loop*, though, AFAIK.

Ordering is a problem, though.  We really want to run task isolation
last, so we can guarantee that all the isolation prerequisites are met
(dynticks stopped, per-cpu lru cache empty, etc).  But achieving that
state can require enabling interrupts - most obviously if we have to
schedule, e.g. for vmstat clearing or whatnot (see the cond_resched in
refresh_cpu_vm_stats), or just while waiting for that last dyntick
interrupt to occur.  I'm also not sure that even something as simple
as draining the per-cpu lru cache can be done holding interrupts
disabled throughout - certainly there's a !SMP code path there that
just re-enables interrupts unconditionally, which gives me pause.

At any rate, at that point you need to retest for signals, resched,
etc., all as usual, and then you need to recheck the task isolation
prerequisites once more.

I may be missing something here, but it's really not obvious to me
that there's a way to do this without having task isolation integrated
into the usual return-to-userspace loop.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
Re: [PATCH v15 04/13] task_isolation: add initial support
On Aug 30, 2016 10:02 AM, "Chris Metcalf" wrote: > > On 8/30/2016 12:30 PM, Andy Lutomirski wrote: >> >> On Tue, Aug 30, 2016 at 8:32 AM, Chris Metcalf wrote: >>> >>> On 8/30/2016 3:58 AM, Peter Zijlstra wrote: On Mon, Aug 29, 2016 at 12:40:32PM -0400, Chris Metcalf wrote: > > On 8/29/2016 12:33 PM, Peter Zijlstra wrote: >> >> On Tue, Aug 16, 2016 at 05:19:27PM -0400, Chris Metcalf wrote: >>> >>> + /* >>> +* Request rescheduling unless we are in full dynticks mode. >>> +* We would eventually get pre-empted without this, and if >>> +* there's another task waiting, it would run; but by >>> +* explicitly requesting the reschedule, we may reduce the >>> +* latency. We could directly call schedule() here as well, >>> +* but since our caller is the standard place where schedule() >>> +* is called, we defer to the caller. >>> +* >>> +* A more substantive approach here would be to use a struct >>> +* completion here explicitly, and complete it when we shut >>> +* down dynticks, but since we presumably have nothing better >>> +* to do on this core anyway, just spinning seems plausible. >>> +*/ >>> + if (!tick_nohz_tick_stopped()) >>> + set_tsk_need_resched(current); >> >> This is broken.. and it would be really good if you don't actually need >> to do this. > > Can you elaborate? We clearly do want to wait until we are in full > dynticks mode before we return to userspace. > > We could do it just in the prctl() syscall only, but then we lose the > ability to implement the NOSIG mode, which can be a convenience. So this isn't spelled out anywhere. Why does this need to be in the return to user path? >>> >>> >>> I'm not sure where this should be spelled out, to be honest. I guess >>> I can add some commentary to the commit message explaining this part. >>> >>> The basic idea is just that we don't want to be at risk from the >>> dyntick getting enabled. Similarly, we don't want to be at risk of a >>> later global IPI due to lru_add_drain stuff, for example. 
And, we may >>> want to add additional stuff, like catching kernel TLB flushes and >>> deferring them when a remote core is in userspace. To do all of this >>> kind of stuff, we need to run in the return to user path so we are >>> late enough to guarantee no further kernel things will happen to >>> perturb our carefully-arranged isolation state that includes dyntick >>> off, per-cpu lru cache empty, etc etc. >> >> None of the above should need to *loop*, though, AFAIK. > > > Ordering is a problem, though. > > We really want to run task isolation last, so we can guarantee that > all the isolation prerequisites are met (dynticks stopped, per-cpu lru > cache empty, etc). But achieving that state can require enabling > interrupts - most obviously if we have to schedule, e.g. for vmstat > clearing or whatnot (see the cond_resched in refresh_cpu_vm_stats), or > just while waiting for that last dyntick interrupt to occur. I'm also > not sure that even something as simple as draining the per-cpu lru > cache can be done holding interrupts disabled throughout - certainly > there's a !SMP code path there that just re-enables interrupts > unconditionally, which gives me pause. > > At any rate at that point you need to retest for signals, resched, > etc, all as usual, and then you need to recheck the task isolation > prerequisites once more. > > I may be missing something here, but it's really not obvious to me > that there's a way to do this without having task isolation integrated > into the usual return-to-userspace loop. > What if we did it the other way around: set a percpu flag saying "going quiescent; disallow new deferred work", then finish all existing work and return to userspace. Then, on the next entry, clear that flag. With the flag set, vmstat would just flush anything that it accumulates immediately, nothing would be added to the LRU list, etc. 
Also, this cond_resched stuff doesn't worry me too much at a
fundamental level -- if we're really going quiescent, shouldn't we be
able to arrange that there are no other schedulable tasks on the CPU
in question?

--Andy
Re: [PATCH v14 04/14] task_isolation: add initial support
On 8/30/2016 1:36 PM, Chris Metcalf wrote:
>>> See the other thread with Peter Z for the longer discussion of this.
>>>
>>> At this point I'm leaning towards replacing the set_tsk_need_resched()
>>> with:
>>>
>>>	set_current_state(TASK_INTERRUPTIBLE);
>>>	schedule();
>>>	__set_current_state(TASK_RUNNING);
>>
>> I don't see how that helps. What will wake the thread up except a
>> signal?
>
> The answer is that the scheduler will keep bringing us back to this
> point (either after running another runnable task if there is one, or
> else just returning to this point immediately without doing a context
> switch), and we will then go around the "prepare exit to userspace"
> loop and perhaps discover that enough time has gone by that the last
> dyntick interrupt has triggered and the kernel has quiesced the
> dynticks. At that point we stop calling schedule() over and over and
> can return normally to userspace.

Oops, you're right that if I set TASK_INTERRUPTIBLE, then if I call
schedule(), I never get control back. So I don't want to do that.

I suppose I could do a schedule_timeout() here instead and try to
figure out how long to wait for the next dyntick. But since we don't
expect anything else running on the core anyway, it seems like we are
probably working too hard at this point. I don't think it's worth it
just to go into the idle task and (possibly) save some power for a few
microseconds.

The more I think about this, the more I think I am micro-optimizing by
trying to poke the scheduler prior to some external thing setting
need_resched, so I think the thing to do here is in fact, nothing. I
won't worry about rescheduling but will just continue going around the
prepare-exit-to-userspace loop until the last dyn tick fires.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
Re: [PATCH v15 04/13] task_isolation: add initial support
On Tue, Aug 30, 2016 at 12:37 PM, Chris Metcalf wrote: > On 8/30/2016 2:43 PM, Andy Lutomirski wrote: >> >> On Aug 30, 2016 10:02 AM, "Chris Metcalf" wrote: >>> >>> On 8/30/2016 12:30 PM, Andy Lutomirski wrote: On Tue, Aug 30, 2016 at 8:32 AM, Chris Metcalf wrote: > > The basic idea is just that we don't want to be at risk from the > dyntick getting enabled. Similarly, we don't want to be at risk of a > later global IPI due to lru_add_drain stuff, for example. And, we may > want to add additional stuff, like catching kernel TLB flushes and > deferring them when a remote core is in userspace. To do all of this > kind of stuff, we need to run in the return to user path so we are > late enough to guarantee no further kernel things will happen to > perturb our carefully-arranged isolation state that includes dyntick > off, per-cpu lru cache empty, etc etc. None of the above should need to *loop*, though, AFAIK. >>> >>> Ordering is a problem, though. >>> >>> We really want to run task isolation last, so we can guarantee that >>> all the isolation prerequisites are met (dynticks stopped, per-cpu lru >>> cache empty, etc). But achieving that state can require enabling >>> interrupts - most obviously if we have to schedule, e.g. for vmstat >>> clearing or whatnot (see the cond_resched in refresh_cpu_vm_stats), or >>> just while waiting for that last dyntick interrupt to occur. I'm also >>> not sure that even something as simple as draining the per-cpu lru >>> cache can be done holding interrupts disabled throughout - certainly >>> there's a !SMP code path there that just re-enables interrupts >>> unconditionally, which gives me pause. >>> >>> At any rate at that point you need to retest for signals, resched, >>> etc, all as usual, and then you need to recheck the task isolation >>> prerequisites once more. 
>>> I may be missing something here, but it's really not obvious to me
>>> that there's a way to do this without having task isolation integrated
>>> into the usual return-to-userspace loop.
>>
>> What if we did it the other way around: set a percpu flag saying
>> "going quiescent; disallow new deferred work", then finish all
>> existing work and return to userspace. Then, on the next entry, clear
>> that flag. With the flag set, vmstat would just flush anything that
>> it accumulates immediately, nothing would be added to the LRU list,
>> etc.
>
> This is an interesting idea!
>
> However, there are a number of implementation ideas that make me
> worry that it might be a trickier approach overall.
>
> First, "on the next entry" hides a world of hurt in four simple words.
> Some platforms (arm64 and tile, that I'm familiar with) have a common
> chunk of code that always runs on every entry to the kernel. It would
> not be too hard to poke at the assembly and make those platforms
> always run some task-isolation specific code on entry. But x86 scares
> me - there seem to be a whole lot of ways to get into the kernel, and
> I'm not convinced there is a lot of shared macrology or whatever that
> would make it straightforward to intercept all of them.

Just use the context tracking entry hook. It's 100% reliable. The
relevant x86 function is enter_from_user_mode(), but I would just hook
into user_exit() in the common code. (This code had better be
reliable, because context tracking depends on it, and, if context
tracking doesn't work on a given arch, then isolation isn't going to
work regardless.)

> Then, there are the two actual subsystems in question. It looks like
> we could intercept LRU reasonably cleanly by hooking pagevec_add()
> to return zero when we are in this "going quiescent" mode, and that
> would keep the per-cpu vectors empty. The vmstat stuff is a little
> trickier since all the existing code is built around updating the
> per-cpu stuff and then only later copying it off to the global state.
> I suppose we could add a test-and-flush at the end of every public API
> and not worry about the implementation cost.

Seems reasonable to me. If anyone cares about the performance hit,
they can fix it.

> But it does seem like we are adding noticeable maintenance cost on
> the mainline kernel to support task isolation by doing this. My guess
> is that it is easier to support the kind of "are you clean?" / "get
> clean" APIs for subsystems, rather than weaving a whole set of "stay
> clean" mechanism into each subsystem.

My intuition is that it's the other way around. For the mainline
kernel, having a nice clean well-integrated implementation is nicer
than having a bolted-on implementation that interacts in potentially
complicated ways. Once quiescence support is in mainline, the size of
the diff or the degree to which it's scattered around is irrelevant
because it's not a diff any more.

> So to pop up a level, what is your actual concern about the existing
> "do it in a loop" model? The macrology current
Re: [PATCH v15 04/13] task_isolation: add initial support
On 8/30/2016 2:43 PM, Andy Lutomirski wrote:
> On Aug 30, 2016 10:02 AM, "Chris Metcalf" wrote:
>> On 8/30/2016 12:30 PM, Andy Lutomirski wrote:
>>> On Tue, Aug 30, 2016 at 8:32 AM, Chris Metcalf wrote:
>>>> The basic idea is just that we don't want to be at risk from the
>>>> dyntick getting enabled. Similarly, we don't want to be at risk of a
>>>> later global IPI due to lru_add_drain stuff, for example. And, we may
>>>> want to add additional stuff, like catching kernel TLB flushes and
>>>> deferring them when a remote core is in userspace. To do all of this
>>>> kind of stuff, we need to run in the return to user path so we are
>>>> late enough to guarantee no further kernel things will happen to
>>>> perturb our carefully-arranged isolation state that includes dyntick
>>>> off, per-cpu lru cache empty, etc etc.
>>>
>>> None of the above should need to *loop*, though, AFAIK.
>>
>> Ordering is a problem, though.
>>
>> We really want to run task isolation last, so we can guarantee that
>> all the isolation prerequisites are met (dynticks stopped, per-cpu lru
>> cache empty, etc). But achieving that state can require enabling
>> interrupts - most obviously if we have to schedule, e.g. for vmstat
>> clearing or whatnot (see the cond_resched in refresh_cpu_vm_stats), or
>> just while waiting for that last dyntick interrupt to occur. I'm also
>> not sure that even something as simple as draining the per-cpu lru
>> cache can be done holding interrupts disabled throughout - certainly
>> there's a !SMP code path there that just re-enables interrupts
>> unconditionally, which gives me pause.
>>
>> At any rate at that point you need to retest for signals, resched,
>> etc, all as usual, and then you need to recheck the task isolation
>> prerequisites once more.
>>
>> I may be missing something here, but it's really not obvious to me
>> that there's a way to do this without having task isolation integrated
>> into the usual return-to-userspace loop.
>
> What if we did it the other way around: set a percpu flag saying
> "going quiescent; disallow new deferred work", then finish all
> existing work and return to userspace. Then, on the next entry, clear
> that flag. With the flag set, vmstat would just flush anything that
> it accumulates immediately, nothing would be added to the LRU list,
> etc.

This is an interesting idea!

However, there are a number of implementation ideas that make me
worry that it might be a trickier approach overall.

First, "on the next entry" hides a world of hurt in four simple words.
Some platforms (arm64 and tile, that I'm familiar with) have a common
chunk of code that always runs on every entry to the kernel. It would
not be too hard to poke at the assembly and make those platforms
always run some task-isolation specific code on entry. But x86 scares
me - there seem to be a whole lot of ways to get into the kernel, and
I'm not convinced there is a lot of shared macrology or whatever that
would make it straightforward to intercept all of them.

Then, there are the two actual subsystems in question. It looks like
we could intercept LRU reasonably cleanly by hooking pagevec_add()
to return zero when we are in this "going quiescent" mode, and that
would keep the per-cpu vectors empty. The vmstat stuff is a little
trickier since all the existing code is built around updating the
per-cpu stuff and then only later copying it off to the global state.
I suppose we could add a test-and-flush at the end of every public API
and not worry about the implementation cost.

But it does seem like we are adding noticeable maintenance cost on
the mainline kernel to support task isolation by doing this. My guess
is that it is easier to support the kind of "are you clean?" / "get
clean" APIs for subsystems, rather than weaving a whole set of "stay
clean" mechanism into each subsystem.

So to pop up a level, what is your actual concern about the existing
"do it in a loop" model? The macrology currently in use means there is
zero cost if you don't configure TASK_ISOLATION, and the software
maintenance cost seems low since the idioms used for task isolation in
the loop are generally familiar to people reading that code.

> Also, this cond_resched stuff doesn't worry me too much at a
> fundamental level -- if we're really going quiescent, shouldn't we be
> able to arrange that there are no other schedulable tasks on the CPU
> in question?

We aren't currently planning to enforce things in the scheduler, so if
the application affinitizes another task on top of an existing task
isolation task, by default the task isolation task just dies. (Unless
it's using NOSIG mode, in which case it just ends up stuck in the
kernel trying to wait out the dyntick until you either kill it, or
re-affinitize the offending task.) But I'm reluctant to guarantee
every possible way that you might (perhaps briefly) have some
schedulable task, and the current approach seems pretty robust if that
sort of thing happens.

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
Re: [PATCH V4] leds: trigger: Introduce an USB port trigger
On 30 August 2016 at 14:05, Greg KH wrote:
> On Fri, Aug 26, 2016 at 05:38:05PM +0200, Rafał Miłecki wrote:
>> On 25 August 2016 at 14:49, Greg KH wrote:
>> > On Thu, Aug 25, 2016 at 10:03:52AM +0200, Rafał Miłecki wrote:
>> >> +static void usbport_trig_activate(struct led_classdev *led_cdev)
>> >> +{
>> >> +	struct usbport_trig_data *usbport_data;
>> >> +	int err;
>> >> +
>> >> +	usbport_data = kzalloc(sizeof(*usbport_data), GFP_KERNEL);
>> >> +	if (!usbport_data)
>> >> +		return;
>> >> +	usbport_data->led_cdev = led_cdev;
>> >> +
>> >> +	/* Storing ports */
>> >> +	INIT_LIST_HEAD(&usbport_data->ports);
>> >> +	usbport_data->ports_dir = kobject_create_and_add("ports",
>> >> +							 &led_cdev->dev->kobj);
>> >
>> > If you _ever_ find yourself in a driver having to use kobject calls,
>> > it's a HUGE hint that you are doing something wrong.
>> >
>> > Hint, you are doing this wrong :)
>> >
>> > Use an attribute group if you need a subdirectory in sysfs, using a
>> > "raw" kobject like this hides it from all userspace tools and so no
>> > userspace program can see it (hint, try using libudev to access these
>> > files attached to the device...)
>>
>> Oops. Thanks for pointing groups to me. I was looking at sysfs.h
>> earlier but I didn't realize group can be a subdirectory. I can see
>> these sysfs_create_group(s) and friends now, thanks.
>
> No, use an attribute group, and name it, and the subdir will be created
> automagically for you.

Oh, so I got it wrong again. Thanks for explaining.

>> >> +	if (!usbport_data->ports_dir)
>> >> +		goto err_free;
>> >> +
>> >> +	/* API for ports management */
>> >> +	err = device_create_file(led_cdev->dev, &dev_attr_new_port);
>> >> +	if (err)
>> >> +		goto err_put_ports;
>> >> +	err = device_create_file(led_cdev->dev, &dev_attr_remove_port);
>> >> +	if (err)
>> >> +		goto err_remove_new_port;
>> >
>> > Doesn't this race with userspace and fail? Shouldn't the led core be
>> > creating your leds for you?
>> These questions aren't clear to me. What kind of race? Doing
>>
>> echo usbport > trigger
>>
>> results in trigger core calling usbport_trig_activate. We create new
>> attributes and then we return.
>
> Why aren't the attributes present when the device is found? What is
> "trigger" for?
>
>> I'm also not creating any leds there. There already has to be a LED
>> available if you want to assign some trigger to it.
>
> That sounds really odd...

Please take a look at Documentation to get some idea of LED triggers:
Documentation/leds/leds-class.txt

Basically a LED (/sys/class/leds/foo/) can be controlled with the
"brightness" sysfs file like this:

echo 0 > brightness
echo 5 > brightness
echo 255 > brightness

(most LEDs react only to 0 and non-0 values).

As you quite often need more complex LED management, there are
triggers that were introduced in 2006 by c3bc9956ec52f ("[PATCH] LED:
add LED trigger tupport"). Some triggers are trivial and could be
implemented in userspace as well (e.g. "timer"). Some had to be
implemented in kernelspace (CPU activity, MTD activity, etc.). Having
a few triggers compiled, you can assign them to LEDs as it pleases
you. Your hardware may have a generic LED (not labeled) and you can
dynamically assign various triggers to it, depending e.g. on user
actions. E.g. if the user (using GUI or whatever) wants to see flash
activity, your userspace script should do:

echo mtd > /sys/class/leds/foo/trigger

I hope it explains things a bit and you can see now why trigger code
is independent of creating LEDs. It's about layers and making things
generic, I believe.

There is a place for the LED driver that is responsible for creating
"struct led_classdev" with a proper callback setting brightness. It
provides the LED name, but is unaware of the way it should be
lighted/turned off.
There is the LED subsystem.
There are triggers that are unaware of LED hardware details and only
set LED brightness using the generic API.
I'm obviously not the author of all of that and I can't explain some
design decisions, but I hope I gave you a clue how it works.

>> >> +
>> >> +	/* Notifications */
>> >> +	usbport_data->nb.notifier_call = usbport_trig_notify,
>> >> +	led_cdev->trigger_data = usbport_data;
>> >> +	usb_register_notify(&usbport_data->nb);
>> >
>> > Don't abuse the USB notifier chain with stuff like this please, is
>> > that really necessary? Why can't your hardware just tell you what USB
>> > ports are in use "out of band"?
>>
>> I totally don't understand this part. This LED trigger is supposed to
>> react to USB devices appearing at specified ports. How else can I know
>> there is a new USB device if not by notifications?
>> I'm wondering if we are on the same page. Could it be my idea of this
>> LED trigger is not clear at all? Could you take a look at commit
>> message again, please? Mostly this part:
>> > This commit adds
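Greg's suggestion above - a single named attribute group instead of a raw kobject plus individual device_create_file() calls - would look roughly like this. It is a sketch only: the store callbacks are stubbed, and the names follow the patch under review rather than any final driver.

```c
/* Hypothetical sketch: a named attribute group for the trigger. */
static ssize_t new_port_store(struct device *dev,
			      struct device_attribute *attr,
			      const char *buf, size_t size)
{
	/* parse the port name and add it to usbport_data->ports ... */
	return size;
}

static ssize_t remove_port_store(struct device *dev,
				 struct device_attribute *attr,
				 const char *buf, size_t size)
{
	/* look up the port and remove it ... */
	return size;
}

static DEVICE_ATTR_WO(new_port);
static DEVICE_ATTR_WO(remove_port);

static struct attribute *usbport_attrs[] = {
	&dev_attr_new_port.attr,
	&dev_attr_remove_port.attr,
	NULL,
};

static const struct attribute_group usbport_group = {
	.name  = "ports",	/* named group => "ports/" subdirectory */
	.attrs = usbport_attrs,
};

/*
 * In usbport_trig_activate(), replacing the kobject_create_and_add()
 * and device_create_file() calls; the group is attached to the LED
 * device itself, so udev and friends can see it:
 *
 *	err = sysfs_create_group(&led_cdev->dev->kobj, &usbport_group);
 */
```

With a named group, sysfs creates and removes the subdirectory and both files in one call, which also avoids the partial-failure cleanup ladder in the original patch.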
Re: [PATCH V4] leds: trigger: Introduce an USB port trigger
On Tue, 30 Aug 2016, Rafał Miłecki wrote: > Please take a look at Documentation to get some idea of LED triggers: > Documentation/leds/leds-class.txt > > Basically a LED (/sys/class/leds/foo/) can be controller with > "brightness" sysfs file like this: > echo 0 > brightness > echo 5 > brightness > echo 255 > brightness > (most LEDs react only 0 and non-0 values). > > As you quite often need more complex LED management, there are > triggers that were introduced in 2006 by c3bc9956ec52f ("[PATCH] LED: > add LED trigger tupport"). Some triggers are trivial and could be > implemented in userspace as well (e.g. "timer"). Some had to be > implemented in kernelspace (CPU activity, MTD activity, etc.). Having > few triggers compiled, you can assign them to LEDs at it pleases you. > Your hardware may have generic LED (not labeled) and you can > dynamically assign various triggers to it, depending e.g. on user > actions. E.g. if user (using GUI or whatever) wants to see flash > activity, your userspace script should do: > echo mtd > /sys/class/leds/foo/trigger So for example, you might want to do: echo usb1-4 >/sys/class/leds/foo/trigger and then have the "foo" LED toggle whenever an URB was submitted or completed for a device attached to the 1-4 port. Right? > I hope it explains things a big and you can see know why trigger code > is independent of creating LEDs. It's about layers and making things > generic I believe. > > There is a place for LED driver that is responsible for creating > "struct led_classdev" with proper callback setting brightness. It > provides LED name, but is unaware of the way it should be > lighted/turned off. > There is LED subsystem. > There are triggers that are unaware of LED hardware details and only > set LED brightness using generic API. > > I'm obviously not author of all of that and I can't explain some > design decisions, but I hope I gave you a clue how it works. 
> >> >> + /* Notifications */ > >> >> + usbport_data->nb.notifier_call = usbport_trig_notify, > >> >> + led_cdev->trigger_data = usbport_data; > >> >> + usb_register_notify(&usbport_data->nb); > >> > > >> > Don't abuse the USB notifier chain with stuff like this please, is that > >> > really necessary? Why can't your hardware just tell you what USB ports > >> > are in use "out of band"? > >> > >> I totally don't understand this part. This LED trigger is supposed to > >> react to USB devices appearing at specified ports. How else can I know > >> there is a new USB device if not by notifications? > >> I'm wondering if we are on the same page. Could it be my idea of this > >> LED trigger is not clear at all? Could you take a look at commit > >> message again, please? Mostly this part: > >> > This commit adds a new trigger responsible for turning on LED when USB > >> > device gets connected to the specified USB port. This can can useful for > >> > various home routers that have USB port(s) and a proper LED telling user > >> > a device is connected. > >> > >> Can I add something more to make it clear what this trigger is responsible > >> for? > > > > Yes please, as I totally missed that. > > > > And the USB notifier was created for some "special" parts of the USB > > core to use, this feels like an abuse of that in some way. But it's > > hard to define, I know... > > I didn't mean to abuse this USB notifier, can you think of any other > way to achieve the same result? We could think about modifying USB > core to call some symbol from the trigger driver (see usage of > ledtrig_mtd_activity symbol) but is it any better than using > notification block? I don't think this is such a bad use of the USB notifier. All you need to know is when a device is attached to or unplugged from an LED's port. Neither of these is a very frequent event. However, there is still the question of how to know when an URB is submitted or completed for the device in question. 
Obviously the USB core would have to call an LED routine. But how
would you determine which LED(s) to toggle? Go through the entire
list, looking for ones that are bound to the USB device in question?
This seems inefficient. Use a hash table?

Alan Stern
Re: [PATCH V4] leds: trigger: Introduce an USB port trigger
On 30 August 2016 at 22:54, Alan Stern wrote: > On Tue, 30 Aug 2016, Rafał Miłecki wrote: > >> Please take a look at Documentation to get some idea of LED triggers: >> Documentation/leds/leds-class.txt >> >> Basically a LED (/sys/class/leds/foo/) can be controller with >> "brightness" sysfs file like this: >> echo 0 > brightness >> echo 5 > brightness >> echo 255 > brightness >> (most LEDs react only 0 and non-0 values). >> >> As you quite often need more complex LED management, there are >> triggers that were introduced in 2006 by c3bc9956ec52f ("[PATCH] LED: >> add LED trigger tupport"). Some triggers are trivial and could be >> implemented in userspace as well (e.g. "timer"). Some had to be >> implemented in kernelspace (CPU activity, MTD activity, etc.). Having >> few triggers compiled, you can assign them to LEDs at it pleases you. >> Your hardware may have generic LED (not labeled) and you can >> dynamically assign various triggers to it, depending e.g. on user >> actions. E.g. if user (using GUI or whatever) wants to see flash >> activity, your userspace script should do: >> echo mtd > /sys/class/leds/foo/trigger > > So for example, you might want to do: > > echo usb1-4 >/sys/class/leds/foo/trigger > > and then have the "foo" LED toggle whenever an URB was submitted or > completed for a device attached to the 1-4 port. Right? Not really as it won't cover some pretty common use cases. Many home routers have few USB ports (2-5) and only 1 USB LED. It has to be possible to assign few USB ports to a single LED (trigger). That way LED should be turned on (and kept on) if there is at least 1 USB device connected. You obviously can't do: echo "usb1-1 usb1-2 usb2-1" > /sys/class/leds/foo/trigger This was already brought up by Rob (who mentioned CPU trigger) and I replied him pretty much the same way in: https://lkml.org/lkml/2016/7/29/38 (reply starts with "Anyway, the serious limitation I see"). 
>> I hope it explains things a big and you can see know why trigger code >> is independent of creating LEDs. It's about layers and making things >> generic I believe. >> >> There is a place for LED driver that is responsible for creating >> "struct led_classdev" with proper callback setting brightness. It >> provides LED name, but is unaware of the way it should be >> lighted/turned off. >> There is LED subsystem. >> There are triggers that are unaware of LED hardware details and only >> set LED brightness using generic API. >> >> I'm obviously not author of all of that and I can't explain some >> design decisions, but I hope I gave you a clue how it works. > >> >> >> + /* Notifications */ >> >> >> + usbport_data->nb.notifier_call = usbport_trig_notify, >> >> >> + led_cdev->trigger_data = usbport_data; >> >> >> + usb_register_notify(&usbport_data->nb); >> >> > >> >> > Don't abuse the USB notifier chain with stuff like this please, is that >> >> > really necessary? Why can't your hardware just tell you what USB ports >> >> > are in use "out of band"? >> >> >> >> I totally don't understand this part. This LED trigger is supposed to >> >> react to USB devices appearing at specified ports. How else can I know >> >> there is a new USB device if not by notifications? >> >> I'm wondering if we are on the same page. Could it be my idea of this >> >> LED trigger is not clear at all? Could you take a look at commit >> >> message again, please? Mostly this part: >> >> > This commit adds a new trigger responsible for turning on LED when USB >> >> > device gets connected to the specified USB port. This can can useful for >> >> > various home routers that have USB port(s) and a proper LED telling user >> >> > a device is connected. >> >> >> >> Can I add something more to make it clear what this trigger is >> >> responsible for? >> > >> > Yes please, as I totally missed that. 
>> > >> > And the USB notifier was created for some "special" parts of the USB >> > core to use, this feels like an abuse of that in some way. But it's >> > hard to define, I know... >> >> I didn't mean to abuse this USB notifier, can you think of any other >> way to achieve the same result? We could think about modifying USB >> core to call some symbol from the trigger driver (see usage of >> ledtrig_mtd_activity symbol) but is it any better than using >> notification block? > > I don't think this is such a bad use of the USB notifier. All you need > to know is when a device is attached to or unplugged from an LED's > port. Neither of these is a very frequent event. > > However, there is still the question of how to know when an URB is > submitted or completed for the device in question. Obviously the USB > core would have to call an LED routine. But how would you determine > which LED(s) to toggle? Go through the entire list, looking for ones > that are bound to the USB device in question? This seems inefficient. > Use a hash table? I was hoping to bring it to discussion later, as it seems even the basic version of t
Re: [PATCH v14 04/14] task_isolation: add initial support
On 8/30/2016 1:10 PM, Frederic Weisbecker wrote:
> On Tue, Aug 30, 2016 at 11:41:36AM -0400, Chris Metcalf wrote:
>> On 8/29/2016 8:55 PM, Frederic Weisbecker wrote:
>>> On Mon, Aug 15, 2016 at 10:59:55AM -0400, Chris Metcalf wrote:
>>>> On 8/11/2016 2:50 PM, Christoph Lameter wrote:
>>>>> On Thu, 11 Aug 2016, Frederic Weisbecker wrote:
>>>>>> Do we need to quiesce vmstat everytime before entering userspace?
>>>>>> I thought that vmstat only need to be offlined once and for all?
>>>>>
>>>>> Once is sufficient after disabling the tick.
>>>>
>>>> It's true that task_isolation_enter() is called every time before
>>>> returning to user space while task isolation is enabled. But once we
>>>> enter the kernel again after returning from the initial prctl() --
>>>> assuming we are in NOSIG mode so doing so is legal in the first place
>>>> -- almost anything can happen, certainly including restarting the
>>>> tick. Thus, we have to make sure that normal quiescing happens again
>>>> before we return to userspace.
>>>
>>> Yes but we need to sort out what needs to be called only once on
>>> prctl(). Once vmstat is quiesced, it's not going to need quiescing
>>> again even if we restart the tick.
>>
>> That's true, but I really do like the idea of having a clean structure
>> where we verify all our prerequisites in task_isolation_ready(), and
>> have code to try to get things fixed up in task_isolation_enter(). If
>> we start moving some things here and some things there, it gets harder
>> to manage. I think by testing "!vmstat_idle()" in
>> task_isolation_enter() we are avoiding any substantial overhead.
>
> I think that making the code clearer on what needs to be done once for
> all on prctl() and what needs to be done on all actual syscall return
> is more important for readability.

We don't need to just do it on prctl(), though. We also need to do it
whenever we have been in the kernel for another reason, which can happen
with NOSIG. So we need to do this on the common return to userspace path
no matter what, right? Or am I missing something?
It seems like if, for example, we do mmaps or page faults, then on
return to userspace, some of those counters will have been incremented
and we'll need to run the quiet_vmstat_sync() code.

>>>> +	if (!tick_nohz_tick_stopped())
>>>> +		set_tsk_need_resched(current);
>>>
>>> Again, that won't help
>>
>> It won't be better than spinning in a loop if there aren't any other
>> schedulable processes, but it won't be worse either. If there is
>> another schedulable process, we at least will schedule it sooner than
>> if we just sat in a busy loop and waited for the scheduler to kick us.
>> But there's nothing else we can do anyway if we want to maintain the
>> guarantee that the dyn tick is stopped before return to userspace.
>
> I don't think it helps either way. If reschedule is pending, the
> current task already has TIF_RESCHED set.
>
>> See the other thread with Peter Z for the longer discussion of this.
>> At this point I'm leaning towards replacing the set_tsk_need_resched()
>> with:
>>
>>	set_current_state(TASK_INTERRUPTIBLE);
>>	schedule();
>>	__set_current_state(TASK_RUNNING);
>
> I don't see how that helps. What will wake the thread up except a
> signal?

The answer is that the scheduler will keep bringing us back to this
point (either after running another runnable task if there is one, or
else just returning to this point immediately without doing a context
switch), and we will then go around the "prepare exit to userspace" loop
and perhaps discover that enough time has gone by that the last dyntick
interrupt has triggered and the kernel has quiesced the dynticks. At
that point we stop calling schedule() over and over and can return
normally to userspace.

> It's very counter-intuitive to burn cpu time intentionally in the
> kernel.

I really don't see another way to resolve this, though. I don't think it
would be safe, for example, to just promote the next dyntick to running
immediately (rather than waiting a few microseconds until it is
scheduled to go off).
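The "prepare exit to userspace" loop Chris describes can be modeled in a few lines. The sketch below is a toy simulation, not kernel code: task_isolation_ready() and task_isolation_enter() are names from the patch series, but CoreState and the tick countdown are invented stand-ins for the dyntick and vmstat state.

```python
# Toy model: keep running the fixup path and re-checking until every
# readiness predicate passes, then "return to userspace".

class CoreState:
    """Invented stand-in for per-cpu dyntick/vmstat state."""
    def __init__(self, ticks_until_stopped):
        self.ticks_until_stopped = ticks_until_stopped
        self.vmstat_dirty = True

    def tick_nohz_tick_stopped(self):
        return self.ticks_until_stopped == 0

    def vmstat_idle(self):
        return not self.vmstat_dirty

def task_isolation_enter(core):
    # Fix up whatever we can right now.
    if not core.vmstat_idle():
        core.vmstat_dirty = False      # models quiet_vmstat_sync()
    if not core.tick_nohz_tick_stopped():
        core.ticks_until_stopped -= 1  # models waiting for the last tick

def task_isolation_ready(core):
    # All prerequisites verified in one place, as Chris advocates.
    return core.tick_nohz_tick_stopped() and core.vmstat_idle()

def prepare_exit_to_usermode(core):
    """Loop until ready; returns how many passes the loop took."""
    iterations = 0
    while True:
        iterations += 1
        if task_isolation_ready(core):
            return iterations          # safe to return to userspace
        task_isolation_enter(core)     # schedule()/fixups happen here

core = CoreState(ticks_until_stopped=3)
print(prepare_exit_to_usermode(core))  # 4: three fixup passes, then ready
```

This also shows why the structure stays simple when everything is re-verified on every pass: an interrupt that restarts the tick between passes would just bump ticks_until_stopped again, and the loop would absorb it.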
-- Chris Metcalf, Mellanox Technologies http://www.mellanox.com -- To unsubscribe from this list: send the line "unsubscribe linux-doc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] docs-rst: improve typedef parser
Improve the parser to handle typedefs like:

	typedef bool v4l2_check_dv_timings_fnc(const struct v4l2_dv_timings *t,
					       void *handle);

Signed-off-by: Mauro Carvalho Chehab
---
 scripts/kernel-doc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index bac0af4fc659..d94870270d8e 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -2191,7 +2191,9 @@ sub dump_typedef($$) {
     $x =~ s@/\*.*?\*/@@gos;	# strip comments.
 
     # Parse function prototypes
-    if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/) {
+    if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ ||
+	$x =~ /typedef\s+(\w+)\s*(\w\S+)\s*\s*\((.*)\);/) {
+	# Function typedefs
 	$return_type = $1;
 	$declaration_name = $2;
-- 
2.7.4
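For readers who want to poke at the two prototype forms, here are the patch's regexes translated from Perl to Python (the translation is mine; the capture groups correspond to the Perl's $1/$2/$3). The first pattern is the pre-existing function-pointer form; the second is the plain function-typedef form the patch adds:

```python
import re

# Group 1 = return type, group 2 = typedef name, group 3 = argument list.
FN_PTR_TYPEDEF = re.compile(r'typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);')
FN_TYPEDEF     = re.compile(r'typedef\s+(\w+)\s*(\w\S+)\s*\s*\((.*)\);')

def dump_typedef(decl):
    """Mimic the patched check: try the old pattern, then the new one."""
    m = FN_PTR_TYPEDEF.search(decl) or FN_TYPEDEF.search(decl)
    if not m:
        return None
    return {'functiontype': m.group(1),
            'function': m.group(2),
            'args': m.group(3)}

# Function-pointer typedef -- already handled before the patch:
old = dump_typedef('typedef int (*callback_t)(void *priv);')
# Plain function typedef -- only matched by the added pattern:
new = dump_typedef('typedef bool v4l2_check_dv_timings_fnc'
                   '(const struct v4l2_dv_timings *t, void *handle);')
print(old['function'])      # callback_t
print(new['function'])      # v4l2_check_dv_timings_fnc
print(new['functiontype'])  # bool
```

Note the `\s*\s*` in the second pattern is kept verbatim from the patch; it is redundant (equivalent to a single `\s*`) but harmless.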
[PATCH 0/3] Fix kernel-doc parser for typedef functions
The kernel-doc parser has two issues when handling typedef functions:

1) its parsing is incomplete;
2) it causes warnings when a typedef is used as a function argument.

This series partially addresses (1), adding one extra syntax to the
parser. I'm pretty sure that the parser is still incomplete and that
we'll get some other places where it fails.

Jon,

My plan is to apply the last patch on my tree, together with a series of
patches that I'm writing to fix the warnings in nitpick mode. The other
two patches fit better in your tree, IMHO.

Mauro Carvalho Chehab (3):
  docs-rst: improve typedef parser
  docs-rst: kernel-doc: fix typedef output in RST format
  [media] v4l2-dv-timings.h: let kernel-doc parse the typedef argument

 include/media/v4l2-dv-timings.h | 4 ++--
 scripts/kernel-doc              | 36 ++--
 2 files changed, 28 insertions(+), 12 deletions(-)
-- 
2.7.4
[PATCH 2/3] docs-rst: kernel-doc: fix typedef output in RST format
When using a typedef function like this one:

	typedef bool v4l2_check_dv_timings_fnc (const struct v4l2_dv_timings * t, void * handle);

the Sphinx C domain expects it to create a c:type: reference, as that's
the way it creates the type references when parsing a c:function::
declaration. So, a declaration like:

	.. c:function:: bool v4l2_valid_dv_timings (const struct v4l2_dv_timings * t, const struct v4l2_dv_timings_cap * cap, v4l2_check_dv_timings_fnc fnc, void * fnc_handle)

will create a cross reference for :c:type:`v4l2_check_dv_timings_fnc`.

So, when outputting such typedefs in RST format, we need to handle this
special case, as otherwise it will produce those warnings:

	./include/media/v4l2-dv-timings.h:43: WARNING: c:type reference target not found: v4l2_check_dv_timings_fnc
	./include/media/v4l2-dv-timings.h:60: WARNING: c:type reference target not found: v4l2_check_dv_timings_fnc
	./include/media/v4l2-dv-timings.h:81: WARNING: c:type reference target not found: v4l2_check_dv_timings_fnc

So, change the kernel-doc script to produce RST output for the above
typedef as:

	.. c:type:: v4l2_check_dv_timings_fnc

	   **Typedef**: timings check callback

	**Syntax**

	  ``bool v4l2_check_dv_timings_fnc (const struct v4l2_dv_timings * t, void * handle);``

Signed-off-by: Mauro Carvalho Chehab
---
 scripts/kernel-doc | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/scripts/kernel-doc b/scripts/kernel-doc
index d94870270d8e..091e49167b44 100755
--- a/scripts/kernel-doc
+++ b/scripts/kernel-doc
@@ -1831,13 +1831,22 @@ sub output_function_rst(%) {
     my %args = %{$_[0]};
     my ($parameter, $section);
     my $oldprefix = $lineprefix;
-    my $start;
+    my $start = "";
 
-    print ".. c:function:: ";
+    if ($args{'typedef'}) {
+	print ".. c:type:: ". $args{'function'} . "\n\n";
+	print_lineno($declaration_start_line);
+	print "   **Typedef**: ";
+	$lineprefix = "";
+	output_highlight_rst($args{'purpose'});
+	$start = "\n\n**Syntax**\n\n  ``";
+    } else {
+	print ".. c:function:: ";
+    }
     if ($args{'functiontype'} ne "") {
-	$start = $args{'functiontype'} . " " . $args{'function'} . " (";
+	$start .= $args{'functiontype'} . " " . $args{'function'} . " (";
     } else {
-	$start = $args{'function'} . " (";
+	$start .= $args{'function'} . " (";
     }
     print $start;
@@ -1857,11 +1866,15 @@ sub output_function_rst(%) {
 	    $count++;
 	}
     }
-    print ")\n\n";
-    print_lineno($declaration_start_line);
-    $lineprefix = " ";
-    output_highlight_rst($args{'purpose'});
-    print "\n";
+    if ($args{'typedef'}) {
+	print ");``\n\n";
+    } else {
+	print ")\n\n";
+	print_lineno($declaration_start_line);
+	$lineprefix = " ";
+	output_highlight_rst($args{'purpose'});
+	print "\n";
+    }
 
     print "**Parameters**\n\n";
     $lineprefix = " ";
@@ -2204,6 +2217,7 @@ sub dump_typedef($$) {
     output_declaration($declaration_name,
 			'function',
 			{'function' => $declaration_name,
+			 'typedef' => 1,
 			 'module' => $modulename,
 			 'functiontype' => $return_type,
 			 'parameterlist' => \@parameterlist,
-- 
2.7.4
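The branching the patch adds to output_function_rst() can be mimicked in a short standalone sketch. This is a Python stand-in for the Perl, for illustration only; it reproduces the two output shapes (``.. c:type::`` for typedefs, ``.. c:function::`` otherwise) using the same argument names the script passes around:

```python
def output_function_rst(args):
    """Emit the RST for one function or function typedef (toy version)."""
    lines = []
    # Prototype text shared by both branches, minus the closing paren.
    proto = "%s %s (%s" % (args['functiontype'], args['function'],
                           ", ".join(args['parameterlist']))
    if args.get('typedef'):
        # Cross-referencable target first, then purpose, then the
        # prototype as inline literal text.
        lines.append(".. c:type:: %s" % args['function'])
        lines.append("")
        lines.append("   **Typedef**: %s" % args['purpose'])
        lines.append("")
        lines.append("**Syntax**")
        lines.append("")
        lines.append("  ``%s);``" % proto)
    else:
        # Ordinary functions keep the c:function:: form.
        lines.append(".. c:function:: %s)" % proto)
        lines.append("")
        lines.append("   %s" % args['purpose'])
    return "\n".join(lines)

typedef_args = {
    'typedef': True,
    'function': 'v4l2_check_dv_timings_fnc',
    'functiontype': 'bool',
    'parameterlist': ['const struct v4l2_dv_timings * t', 'void * handle'],
    'purpose': 'timings check callback',
}
print(output_function_rst(typedef_args))
```

Running this prints the same RST shown in the commit message above, which is exactly what lets `:c:type:` references to the typedef resolve instead of producing "reference target not found" warnings.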