Reading perf counters at ftrace trace boundaries
I'm wondering if there's a way to read perf counters from within the kernel. I'd like to read/record perf counters on ftrace function tracing entries/exits to provide a rundown of the values of various counters at function call boundaries.

[ Steven: apologies for sending you a duplicate here of what I more or less already sent privately. ]

--
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: Reading perf counters at ftrace trace boundaries
On 13-08-11 10:23 PM, Andi Kleen wrote:
> KVM does it, see arch/x86/kvm/pmu.c. Essentially it would be doing RDPMC.

Thx for the pointer, appreciated.

> But the overhead will be likely very high, some sampling approach
> is likely better.

Indeed. It doesn't actually have to be at every single ftrace entry/exit. It could start with some kind of every-nth sampling and then drill down as the culprit is incrementally singled out.

--
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour
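A rough user-space sketch of the every-nth idea, not kernel code. It assumes non-nested calls and a counter that can be read through a function pointer; in the kernel that read would be something along the lines of the RDPMC path Andi points to. All names here (boundary_sampler, sampler_on_entry, fake_counter and so on) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter source. A plain function pointer keeps the sketch
 * runnable in user space; it stands in for a real counter read. */
typedef uint64_t (*counter_read_fn)(void);

struct boundary_sampler {
    counter_read_fn read_counter; /* how to obtain the counter value */
    unsigned int period;          /* record one call out of every `period` */
    unsigned int calls;           /* entries seen since the last recorded one */
    int active;                   /* nonzero while a sampled call is open */
    uint64_t entry_value;         /* counter value at the sampled entry */
    uint64_t total_delta;         /* sum of (exit - entry) over all samples */
    uint64_t samples;             /* number of sampled calls */
};

void sampler_init(struct boundary_sampler *s, counter_read_fn fn,
                  unsigned int period)
{
    s->read_counter = fn;
    s->period = period ? period : 1; /* a period of 0 means "every call" */
    s->calls = 0;
    s->active = 0;
    s->entry_value = 0;
    s->total_delta = 0;
    s->samples = 0;
}

/* Call at function entry. Only every nth entry pays for a counter read. */
void sampler_on_entry(struct boundary_sampler *s)
{
    if (++s->calls < s->period)
        return;
    s->calls = 0;
    s->entry_value = s->read_counter();
    s->active = 1;
}

/* Call at function exit. Closes the sample opened by the matching entry. */
void sampler_on_exit(struct boundary_sampler *s)
{
    if (!s->active)
        return;
    s->total_delta += s->read_counter() - s->entry_value;
    s->samples++;
    s->active = 0;
}

/* Stand-in counter for trying the sketch: advances by 10 on every read. */
uint64_t fake_counter_value;

uint64_t fake_counter(void)
{
    fake_counter_value += 10;
    return fake_counter_value;
}
```

Drilling down would then amount to lowering the period for the functions that look suspicious, while leaving it high everywhere else.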
Re: Reading perf counters at ftrace trace boundaries
On 13-08-11 10:47 PM, Andi Kleen wrote:
> That's what normal sampling already does.
>
> If you're worried about systematic shadow effects just randomize a bit.

That's actually the point. I'd like to be able to study/compare both approaches. I could be completely off, but I'd like to see whether a divide-and-conquer approach (i.e. based on ftrace) would take the guesswork out of smart randomization. Just a hunch.

--
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour
Re: Reading perf counters at ftrace trace boundaries
On 13-08-11 11:24 PM, zhangwei(Jovi) wrote:
> If you want to base on ftrace, below two approach maybe take into use:
>
> - register_ftrace_function/unregister_ftrace_function
>
> - perf_event_create_kernel_counter (function event id is 1)
>
> the first one is simplest, IMO.

Thx for the pointers.

> You need to write your own kernel module to use these approach.

As a proof-of-concept, sure. For something more permanent it would make more sense to adapt the various perf/ftrace tools to make this available on the command line alongside other options. But we're far from that for the moment.

--
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour
Re: No 100 HZ timer !
Mark Salisbury wrote:
>
> It would probably be a good compile config option to allow fine or coarse
> process time accounting, that leaves the choice to the person setting up the
> system to make the choice based on their needs.
>

I suggested this a while ago during a discussion about performance measurement. This would be fairly easy to implement using the patch provided with the Linux Trace Toolkit, since all entry points and exit points are known (and this accounting is already available in post-mortem analysis). The measurement code within the kernel should be fairly easy to write and would be provided as part of the compile option.

All in all, given the measurements I made, I'd place the overhead at around 1% for the computations. (The overhead is very likely to be negligible when eventual fixes are taken into account.)

===
Karim Yaghmour
[EMAIL PROTECTED]
Embedded and Real-Time Linux Expert
===
Re: real-time file monitoring at the kernel level
You may want to take a look at the Linux Trace Toolkit which may be used to do what you ask for.

http://www.opersys.com/LTT

Karim

Ben Breuninger wrote:
>
> Hello,
>
> I was wondering if anyone has a patch, or is working on something for what
> im looking for, or if they are interested in an idea i have (forgive me if
> this is someone elses idea, ill give credit to them), for file monitoring
> at the kernel level.
> I have put up a brief explanation of what im looking for at
> http://flog.uncontrolled.org/, but in a nutshell, it is this:
>
> a kernel patch (or module) that would allow me to have, say, /proc/flog,
> which shows real-time file monitoring information, which could be tail
> -f'd like so:
>
> root@server~# tail -f /proc/flog
> modify: root "/var/log/auth.log" 2410150229
> access: root "/etc/passwd" 2410150324
> modify: root "/etc/passwd" 2410150441
> remove: root "/var/log/auth.log" 2410150502
> create: root "/usr/bin/.. /" 2410150534
> create: root "/usr/bin/.. /backdoor" 2410150627
> modify: bob "/home/bob/mailbox" 2410150854
> modify: root "/var/www/htdocs/index.html" 2410150927
>
> the above would describe a theoretical breakin from a hacker, which i
> believe would be extremely useful in intrusion detection. My idea of this
> is further outlined at http://flog.uncontrolled.org/, including
> theoretical usage, practice, description, etc.
> The reason i ask the linux-kernel community is my coding ability does not
> allow me to hack at the kernel, and so i would need help with this, or any
> other information that would point me in the right direction that im
> looking for.
>
> If someone is interested in this, or has any information whatsoever,
> please let me know!
>
> thanks,
> [EMAIL PROTECTED]
>
> PS: im not looking for LIDS

--
===
Karim Yaghmour
[EMAIL PROTECTED]
Embedded and Real-Time Linux Expert
===
Re: Linux Security Module Interface
Crispin Cowan wrote:
>
> Modules that can be loaded, or not, are the obvious solution, but the
> current LKM does not export sufficient hooks to support many security
> mechanisms.

Have you taken a look at the hooks that come with the Linux Trace Toolkit patch (http://www.opersys.com/LTT)?

Cheers,

Karim

===
Karim Yaghmour
[EMAIL PROTECTED]
Embedded and Real-Time Linux Expert
===
[ANNOUNCE] Adaptive Domain Environment for Operating Systems
I've put the following (white) papers up for general discussion:
-Adaptive Domain Environment for Operating Systems (Adeos)
-Building a Real-Time Operating System on top of the Adeos

The first paper discusses the design and implementation of a nano-kernel-like facility that may be used to take control away from an unmodified running linux on ix86 for further uses including (but not limited to):
-patch-less kernel debuggers/probers
-running multiple general purpose OSes on the same hardware,
-OS development
-etc.

As the first item suggests, this may be of interest to some on this list as kernel debuggers have been a rather pointy subject...

The second document discusses a special case usage of Adeos that enables a real-time-bound kernel to co-exist with Linux on top of Adeos.

The documents can be found here:
http://www.opersys.com/adeos/index.html

I've requested a project entry for Adeos on sourceforge and will update the project's home page as soon as everything is set up.

In the meantime, anyone interested in participating in the project, or who has pertinent information regarding the implementation, or its feasibility or lack thereof, as described in the Adeos document is welcome to contact me.

KEEP IN MIND that the documents are only a suggested method of doing things designed to stimulate discussion. There isn't one line of functional code out there (yet).

Best regards,

Karim

===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: monitoring I/O
I caught this one a little bit late, but you might want to take a peek at the Linux Trace Toolkit:
http://www.opersys.com/LTT

You'll be able to monitor I/O at will.

Best regards,

Karim

> Michael McLeod wrote:
>
> Hello
>
> I am hoping someone can give me a little information or point me in the
> right direction. I would like to write an application that monitors I/O on
> a linux machine, but I need some help in determining where to get the
> information I'm looking for. What I would like to do is 'hook' into the
> kernel and record information such as volume name, type of request (read or
> write), the amount of data being read or written, how long each
> transaction takes
>
> Any help would be greatly appreciated, or if there is something like this
> already available that would be even better. Thanx
>
> Mike

--
===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: [ANNOUNCE] Adaptive Domain Environment for Operating Systems
I've set up a sourceforge project for Adeos:
http://www.sourceforge.net/projects/adeos

There's also a development mailing list which can be found here:
http://lists.sourceforge.net/lists/listinfo/adeos-devel

There's also some code here:
ftp://ftp.opersys.com/pub/Adeos/Adeos.tgz

Be aware that this code will certainly crash your machine. It is an attempt to drive Linux into ring-one, but it is not functional. You've been warned.

Feel free to join in the discussion.

Best regards,

Karim Yaghmour

===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: Dynamically altering code segments
"Collins, Tom" wrote:
[snip]
> I have one more question: My trace code is currently
> implemented as a kernel loadable module. Would I need
> to change that so that it is built as part of the kernel,
> or can I keep it as a loadable module? If I can keep it
> as a module, I would ensure that the module would be the
> only place that would enable/disable the trace, (don't
> want the kernel jumping to a nonexistant address :O ..)
[snip]

No need to do that, except if you modify the binary dynamically. If that's the case, then you'll probably have to make it part of the kernel.

But ... if you modify your code to use the pre-existing hooks that come with LTT, you may not need to modify anything more than what is provided by the LTT patch. That is, you may want to know that LTT provides a hooking mechanism similar to, but less flexible than, the one GKHI provides. The advantage, though, is that there are pre-defined hooks inserted with the LTT patch which can be used right away without further instrumentation.

As this type of hooking is increasingly needed, I'm currently discussing with Richard the possibility of using the LTT pre-defined hooks with GKHI in order to provide an extensible hooking mechanism for the kernel that comes equipped with an already quite useful set of hooks, which, of course, can be dynamically enabled/disabled.

Using this type of hooking, you only need to worry about registering/unregistering your callbacks, since the kernel doesn't jump into your code directly but into the hooks management code first.

Best regards,

Karim

===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: [ANNOUNCE] oprofile profiler
Hello John,

This is really interesting. Great stuff.

As Alan had once suggested, it would be very interesting to have this information correlated with the content of the traces collected using the Linux Trace Toolkit (www.opersys.com/LTT). For instance, you could see how many cache faults the read() or write() operation of your application generated and other unique info. It would also be possible to enhance the post-mortem analysis done by LTT to take this data into account.

You could also use LTT's dynamic event creation mechanism to log the profiling data as part of the trace.

There are definitely opportunities for interfacing/integrating here.

Let me know what you think.

Best regards

Karim

John Levon wrote:
>
> oprofile is a low-overhead statistical profiler capable of
> instruction-grain profiling of the kernel (including interrupt handlers),
> modules, and user-space libraries and binaries.
>
> It uses the Intel P6 performance counters as a source of interrupts to
> trigger the accounting handler in a manner similar to that of Digital's
> DCPI. All running processes, and the kernel, are profiled by default. The
> profiles can be extracted at any time with a simple utility. The system
> consists of a kernel module and a simple background daemon.
>
> Typical overhead is around 3 or 4 percent. Worst case overhead on a
> Pentium II 350 UP system is around 10-15%
>
> You can read a little more about oprofile, and download a very alpha
> version at :
>
> http://oprofile.sourceforge.net/
>
> oprofile is released under the GNU GPL.
>
> thanks
> john

--
===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Announce: DProbes/LTT interoperability and custom event logging
                 975,040,616,826,995   494   19   12
Syscall exit     975,040,616,826,996   494    6
Syscall entry    975,040,616,827,028   494   14   SYSCALL : close; EIP : 0x0804AE41

You can find more info on this custom event logging capability on LTT's web site at:
http://www.opersys.com/LTT

You can find DProbes at:
http://oss.software.ibm.com/developer/opensource/linux/projects/dprobes/

Best regards

Karim

===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: Microsecond accuracy
You might want to try the Linux Trace Toolkit. It'll give you microsecond accuracy on program execution time measurement.

Check it out:
http://www.opersys.com/LTT

Karim

Kotsovinos Vangelis wrote:
>
> Is there any way to measure (with microsecond accuracy) the time of a
> program execution (without using Machine Specific Registers) ?
> I've already tried getrusage(), times() and clock() but they all have
> 10 millisecond accuracy, even though they claim to have microsecond
> acuracy.
> The only thing that seems to work is to use one of the tools that measure
> performanc through accessing the machine specific registers. They give you
> the ability to measure the clock cycles used, but their accuracy is also
> very low from what I have seen up to now.
>
> Thank you very much in advance
>
> --) Vangelis

--
===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
[UPDATE] LTT now supports real-time tracing
For some time now, the Linux Trace Toolkit has enabled its users to trace the Linux kernel. This capability included being able to view and analyze the collected traces.

With the latest release, LTT supports tracing the RTAI (http://www.rtai.org) real-time linux extension. This means that you can view graphically how the real-time core and the real-time tasks interact with Linux. This includes analysis of the real-time performance of tasks and their behavior.

I personally believe that this is an important step in the adoption of Linux as a legitimate real-time/embedded platform since it provides system designers with an easy-to-view representation of the dynamic behavior of their system. This had previously been lacking for any real-time Linux extension.

Apart from the great PR this does for real-time in Linux, I think that RT designers all around will appreciate having this around. If nothing else, the source is out there.

That said, I've also generalized the way LTT deals with traces. Rather than having a single way to interpret traces, it now recognizes that there are different trace types, each with its own ways of being viewed and analyzed. This opens the door for OSes other than Linux to be traced and analyzed. There is interest in the Hurd camp and the question about BSD has been asked. If someone out there is interested, drop me an e-mail.

I'd like to thank Lineo, and more specifically Lineo ISG, for having sponsored this work. Their help in developing this project even further is very much appreciated.

Also, the paper I presented at the last Usenix on LTT, how it works and how it impacts the traced system, is now available online.

It's all on the project's web site:
http://www.opersys.com/LTT

Cheers

Karim

===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: I/O statistics per process?
Try the Linux Trace Toolkit. This should provide you with most I/O information you need.

www.opersys.com/LTT

Hope it helps.

Samuli Kaski wrote:
>
> I know about sar which can deliver what I want for disks and/or
> partitions. What about if I want to know how much I/O is caused by
> userspace programs?
>
> Looking at the proc-interface in 2.2.xx the necessary bits aren't
> available. The BSD process accounting doesn't provide them either, the
> I/O fields are always 0 the way I read it. Looking at the task_struct, I
> can't see anything related there.
>
> Is I/O caused by userspace processes accounted somewhere? And if it
> isn't is this intentional or are folks just waiting for someone to
> submit a patch? Thanks.
>
> Samuli

--
===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: Tracing files that opens.
It seems that no one on that thread thought about using the Linux Trace Toolkit, which would allow you to do exactly what is asked for. Plus, there's a basic hooking mechanism that enables you to hook onto any file-system events and then do what you want with that.

In the case of trapping open() or stat() you'd only need to:
1) Patch the kernel with the LTT patch
2) Write a kernel module that uses the hooking interface to hook onto system call entries and filter those out as needed. Moreover, you could also hook onto file-system events, which would give you greater detail about the file-system related system calls occurring.

Eventually, I'd like to see item #1 disappear and the tracing patches admitted as part of the kernel tree. Other OSes have had such a capability for a very long time. This, by itself, doesn't justify including it, but it certainly does go to show usefulness. Moreover, Alan has suggested that this might be a good way to implement C2 security in the kernel since all system entries are monitored.

That said, here's an example module that could be a basis for trapping open() and stat(). It could also be used to monitor other events:

    #define MODULE
    #include <linux/module.h>
    #include <linux/trace.h>    /* assumed name of the header added by the LTT patch */

    /* Called by the tracer on every system call entry. */
    int my_callback(uint8_t pmEventID, void* pmStruct)
    {
        trace_syscall_entry* syscall_event = (trace_syscall_entry*) pmStruct;

        printk("System call %d occurred at address 0x%08X \n",
               syscall_event->syscall_id, syscall_event->address);
        return 0;
    }

    int init_module(void)
    {
        printk("callback initialized \n");
        trace_register_callback(&my_callback, TRACE_EV_SYSCALL_ENTRY);
        return 0;
    }

    void cleanup_module(void)
    {
        trace_unregister_callback(&my_callback, TRACE_EV_SYSCALL_ENTRY);
    }

The only "problem" here is that you can't specify "open" or "stat" as strings; you have to use their respective system call IDs as seen in arch/i386/entry.S for the i386.

Note that the patches available now include support for the PowerPC. If anyone is interested in adding support for other architectures, feel free to dig in.

You can find LTT and all relevant patches at:
http://www.opersys.com/LTT

Best regards

Karim

Michael Vines wrote:
>
> On Sat, 11 Nov 2000, Magnus Naeslund(b) wrote:
>
> > Is there a nice way to trap on file open() and stat() ?
> > That way i could have nice file statistics.
>
> There was a thread about this a couple days ago.
>
> http://x52.deja.com/threadmsg_ct.xp?AN=690272012.1&mhitnum=0&CONTEXT=973965178.1986985995
>
> Michael

--
===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
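The filtering step mentioned in item 2 could look roughly like the following. This is a user-space sketch only, so it can be compiled and tried without a patched kernel: the event structure is a hypothetical stand-in for the payload the callback receives, the counting helper is invented for illustration, and the two system call numbers are the i386 values as I know them, which should be checked against the table in your own tree.

```c
#include <stdint.h>

/* Hypothetical mirror of the system call entry payload seen by a callback. */
struct syscall_entry_event {
    uint8_t  syscall_id;  /* system call number */
    uint32_t address;     /* address the call was made from */
};

/* i386 system call numbers (assumed here; verify against your kernel). */
enum {
    SYSCALL_OPEN = 5,
    SYSCALL_STAT = 106
};

/* Running totals for the calls we care about. */
struct file_stats {
    unsigned long opens;
    unsigned long stats;
    unsigned long ignored;
};

/* Filter one system call entry event and update the totals. */
void count_syscall(struct file_stats *fs, const struct syscall_entry_event *ev)
{
    switch (ev->syscall_id) {
    case SYSCALL_OPEN:
        fs->opens++;
        break;
    case SYSCALL_STAT:
        fs->stats++;
        break;
    default:
        fs->ignored++;   /* every other system call is filtered out */
        break;
    }
}
```

In a real module the body of count_syscall() would sit inside the registered callback, and the totals would need to be exported somehow, for instance through /proc.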
Re: Issue compiling 2.4test10
Michael Schmitz wrote:
>
> Would this patch help?
>
> --- drivers/input/keybdev.c.org Thu Nov 2 10:13:39 2000
> +++ drivers/input/keybdev.c Thu Nov 2 10:19:43 2000
> @@ -36,7 +36,7 @@
> #include
> #include
>
> -#if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(__alpha__) || defined(__mips__)
> +#if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(__alpha__) || defined(__mips__) || defined(CONFIG_MAC_HID)
>

I've tried this on my PowerBook and it doesn't work. The keymap is broken and pressing anything on the keyboard will output something completely different. This is fixed if the "defined(CONFIG_MAC_HID)" gets moved to the "#elif" part of the "#if" mentioned above.

That said, 2 and 3 button emulation is broken for (at least) the PowerBook on test-10. I've tried the echo "1" > /proc/sys/dev/mac_hid/mouse_button_emulation and there's no effect. Anyone know what this is about?

Thanks.

===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Mac-buttons emulation broken in 2.4.0-test10
The mac_hid_mouse_emulate_buttons() in drivers/macintosh/mac_hid.c, which takes care of emulating multiple buttons on a mac, doesn't seem to be used anywhere. In fact, doing a "grep -r mac_hid... *" in the kernel's base directory yields only one result and it's the one in mac_hid.c. Shouldn't this be called from the keyboard and mouse handlers?

===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: Mac-buttons emulation broken in 2.4.0-test10
Well, it seems I found a solution to my own problem :)

Here are patches that fix the problem. Doing this, I discovered there are 2 modes to button emulation (3 if you include no emulation):

Mode 0: No emulation whatsoever.

Mode 1: echo "1" > /proc/sys/dev/mac_.../mouse_...
In this mode, pressing fct-ctrl or fct-alt acts as if you had pressed the corresponding mouse button.

Mode 2: echo "2" > /proc/sys/dev/mac_.../mouse_...
In this mode, you have to hold down fct-ctrl or fct-alt __and__ click the mouse to get the corresponding mouse button.

Cheers

Karim

---
--- linux/drivers/input/keybdev.c Thu Jul 27 21:36:54 2000
+++ linux-2.4.0-test10/drivers/input/keybdev.c Mon Nov 13 08:19:48 2000
@@ -90,7 +90,7 @@
 	return 0;
 }
 
-#elif defined(CONFIG_ADB_KEYBOARD)
+#elif defined(CONFIG_ADB_KEYBOARD) || defined(CONFIG_MAC_HID)
 
 static unsigned char mac_keycodes[128] =
 	{ 0, 53, 18, 19, 20, 21, 23, 22, 26, 28, 25, 29, 27, 24, 51, 48,
@@ -129,9 +129,19 @@
 	}
 }
 
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+extern int mac_hid_mouse_emulate_buttons(int caller, unsigned int keycode, int down);
+#endif
+
 void keybdev_event(struct input_handle *handle, unsigned int type, unsigned int code, int down)
 {
 	if (type != EV_KEY) return;
+
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+	/* There should be an if() here to determine whether emulate_raw() is to be called or not.
+	   If the key is caught, emulate_raw() should not be called. K.Y. */
+	mac_hid_mouse_emulate_buttons(1, code, down);
+#endif
 
 	if (emulate_raw(code, down))
 		printk(KERN_WARNING "keyboard.c: can't emulate rawmode for keycode %d\n", code);
--- linux/drivers/input/mousedev.c Tue Aug 22 12:06:31 2000
+++ linux-2.4.0-test10/drivers/input/mousedev.c Mon Nov 13 08:25:41 2000
@@ -79,6 +79,10 @@
 static struct mousedev *mousedev_table[MOUSEDEV_MINORS];
 static struct mousedev mousedev_mix;
 
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+extern int mac_hid_mouse_emulate_buttons(int caller, unsigned int keycode, int down);
+#endif
+
 static void mousedev_event(struct input_handle *handle, unsigned int type, unsigned int code, int value)
 {
 	struct mousedev *mousedevs[3] = { handle->private, &mousedev_mix, NULL };
@@ -132,6 +136,9 @@
 		case BTN_MIDDLE: index = 2; break;
 		default: return;
 	}
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+	index = mac_hid_mouse_emulate_buttons(2, index, 0);
+#endif
 	switch (value) {
 		case 0: clear_bit(index, &list->buttons); break;
 		case 1: set_bit(index, &list->buttons); break;
-------

Karim Yaghmour wrote:
>
> The mac_hid_mouse_emulate_buttons() in drivers/macintosh/mac_hid.c
> which takes care of emulating multiple buttons on a mac doesn't
> seem to be used anywhere. In fact, by doing a "grep -r mac_hid... *"
> in the kernel's base directory yields only one result and it's
> the one in mac_hid.c. Shouldn't this be called upon from the
> keyboard and mouse handlers?
>

===
Karim Yaghmour
[EMAIL PROTECTED]
Operating System Consultant
(Linux kernel, real-time and distributed systems)
===
Re: The case for a standard kernel debugger
[EMAIL PROTECTED] wrote: > One big argument against RAS of any sort is that it bloats the kernel and > not every one wants it (until they have a problem). A further argument with > Linux is that you may have to do quite a bit of hard work to get the subset > of RAS you need to co-exist, if it exists at all. Something we're working > on which may help resolve this, and will be made available with the next > drop of Dynamic Probes is Generalised Kernel Hooks Interface (GKHI). The > idea here is to make all our RAS function the option of being dynamically > loadable kernel modules. In most cases we don't need to modify kernel > function, just get control at the right time. So we place hooks in kernel > source, which remain dormant until activated by the GKHI when a RAS module > asks it to. Maybe this will provide a way out of the difficulty. Sorry for catching this a bit late, but I would like to point out that there already is a generalized kernel hooks interface, that does exactly what is described above, as part of the Linux Trace Toolkit. The hooks inserted in the kernel source don't modify the kernel's behavior, though they can trigger callback functions. To hook onto an event, the following function is used: int trace_register_callback(tracer_call pmTraceFunction, uint8_t pmEventID) Once this is called, the occurrence of the given event will generate a call to the given callback function. Hence the inserted hooks are dormant until used. On top of this callback interface, I am currently in the process of completing a state machine engine that would enable it's user to specify event driven state machines. What does this mean? Well, as Alan had suggested, this could be used to test a driver's actual behavior with the state-machine that models it's theoretical behavior. Furthermore, and I think this is a field open with a lot of very interesting opportunities, state machines could be developed that model intrusions and attacks. 
Hence, the state machine engine could be used as the basis of a very powerful intrusion detection system. The basic example of this is stack overflows. A lot of very clever schemes have been developed in order to detect these types of hacks. Yet, with a state machine that models the types of attacks being conducted, it wouldn't matter which stack overflowed or who did what since the state machine would catch any unauthorized event sequence and, possibly, kill the culprit process, suspend it or warn the sysadmin. That said, I do think that dynamically inserted probes are useful. As Richard has pointed out, there are situations where this makes a big difference. In a sense, Dprobes could use the architecture already put forward by LTT to log custom events in a system trace and could use the trace hooking mechanism already available to implement whatever RAS function comes on top. For a full discussion on the performance and architecture issues regarding LTT, I invite the interested reader to take a look at the paper I presented last June at the annual Usenix technical conference: http://www.opersys.com/LTT/ltt-usenix.ps.gz And LTT can be found at: http://www.opersys.com/LTT/ Cheers === Karim Yaghmour [EMAIL PROTECTED] Operating System Consultant (Linux kernel, real-time and distributed systems) ===
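As a rough illustration of the callback interface described in the message above, here is a user-space sketch in C. Only the signature of trace_register_callback comes from the message. The parameter list of tracer_call, the trace_event() hook, the count_calls() sample callback and the table size are assumptions made for the example, and none of this is LTT's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed callback type: the message only gives the name tracer_call. */
typedef int (*tracer_call)(uint8_t event_id, void *event_data);

/* One node per registered callback; several may hang off one event. */
struct callback_node {
    tracer_call fn;
    struct callback_node *next;
};

#define MAX_EVENTS 256 /* event IDs are uint8_t, so 256 possible events */

static struct callback_node *callbacks[MAX_EVENTS];

/* Same signature as in the message; the body is a guess at the behaviour. */
int trace_register_callback(tracer_call pmTraceFunction, uint8_t pmEventID)
{
    struct callback_node *node;

    if (pmTraceFunction == NULL)
        return -1;
    node = malloc(sizeof(*node));
    if (node == NULL)
        return -1;
    node->fn = pmTraceFunction;
    node->next = callbacks[pmEventID];
    callbacks[pmEventID] = node;
    return 0;
}

/*
 * Hypothetical hook: what a trace point placed in the source would call.
 * With nothing registered the loop body never runs, so the hook stays
 * dormant. Returns the number of callbacks invoked.
 */
int trace_event(uint8_t event_id, void *event_data)
{
    int called = 0;
    struct callback_node *node;

    for (node = callbacks[event_id]; node != NULL; node = node->next) {
        assert(node->fn != NULL);
        node->fn(event_id, event_data);
        called++;
    }
    return called;
}

/* Sample callback (hypothetical): counts hits in the int behind event_data. */
int count_calls(uint8_t event_id, void *event_data)
{
    (void)event_id;
    (*(int *)event_data)++;
    return 0;
}
```

Registering count_calls twice for one event and then firing that event invokes it twice, which matches the "more than one function called back for each event" behaviour mentioned later in the thread. A kernel version would also need locking around registration, as the follow-up message notes.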
Re: The case for a standard kernel debugger
Hello Richard, Part of your analysis is correct. The hooks were designed to take care of static tracepoints only. That said, dynamic allocation of event IDs was next on my list and the hooking mechanism would have been modified accordingly. As for "multiple exits registered per hook", if you mean that you can have more than one function called back for each event, then this is already possible. The other items you mention such as atomicity and prioritization seem interesting indeed, although I am not sure what you mean by MP compliant as the only thing missing for the current generalized hooking mechanism to be MP compliant is the insertion of correct locks during callback registration. Please understand that the purpose wasn't to discredit your work, but rather to stop duplication of work as efforts could be deployed elsewhere. I think that your work and the work already done on LTT can be brought together in a way that would profit all. This is what I was hinting at towards the end of the posting. It was an invitation more than anything else. Apart from the hooking mechanism, there were other items which I mentioned that merit discussion, such as the ability to enable dynamic probes to log events in normal LTT traces or the event-driven state machine engine. Hence, if you are interested in joining forces to further enhance probing and tracing capabilities in Linux, I think this would be a good opportunity. Best regards Karim [EMAIL PROTECTED] wrote: > > Yes, we looked at that and it didn't seem to provide the generality we > needed - multipe exits registered per hook, ability to arm a set of hooks > atomically, ability to prioritise dispatching order of a hook exit, MP > complient. I may be wrong but the Linux Trace Toolkit hooks like like they > were specifically designed to cater for inserting static tracepoints into > the kernel. > > Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). 
> > http://oss.software.ibm.com/developerworks/opensource/linux > Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 > IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK -- ======= Karim Yaghmour [EMAIL PROTECTED] Operating System Consultant (Linux kernel, real-time and distributed systems) ===
Re: The case for a standard kernel debugger
Thought I'd let you know that I will reply to your suggestions (which are quite interesting by the way) ... but I need to catch up on some sleep as it's close to 7AM here in Montreal and my brains are failing ... ;) === Karim Yaghmour [EMAIL PROTECTED] Operating System Consultant (Linux kernel, real-time and distributed systems) ===
Re: The case for a standard kernel debugger
es, there must be a pre-defined set of key events. An example of "other purposes" is using the pre-defined trace points to implement C2 security in the kernel (this was suggested by Alan). Hence, yes I can provide an interface from the kernel to log a trace event with a variable-length buffer, but I don't think that taking away the statically defined trace points is the right thing to do. (I might have gotten this completely wrong, though ... My presumption about your suggestion of using Dprobes to "drive" LTT is that you mean that all events should come from Dprobes and Dprobes alone. I could be wrong). So here's what I suggest: There are already two event types within the events recognized by LTT which had been planned for this type of usage. They are: "New event" and "Custom event". The first is used to declare a new event type and the second is used to log all such events. To declare a new event, the caller would call upon an event ID creation function providing it with an event size. The function would use the "New event" type to declare a new event in the log and would return a unique event ID. Thereafter, the normal tracing function, already available through the LTT kernel patch, could be used to log the new events. This could be used by Dprobes to enable dynamically inserted probe points to be logged within a normal trace and, thereafter, be part of trace analysis. Does this fit your needs? > > Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). > > http://oss.software.ibm.com/developerworks/opensource/linux > Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 > IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK > > Karim Yaghmour <[EMAIL PROTECTED]> on 06/10/2000 09:16:12 > > Please respond to Karim Yaghmour <[EMAIL PROTECTED]> > > To: Richard J Moore/UK/IBM@IBMGB > cc: [EMAIL PROTECTED] > Subject: Re: The case for a standard kernel debugger > > Hello Richard, > > Part of your analysis is correct. 
The hooks were designed to take care of > static tracepoints only. That said, dynamic allocation of event IDs was > next on my list and the hooking mechanism would have been modified > consequently. > > As for "multiple exits registered per hook", if you mean that you can have > more > than one function called back for each event, then this is already > possible. > The other items you mention such as atomicity and prioritization seem > interesting > indeed, although I am not sure what you mean by MP compliant as the only > thing that stops the current generalized hooking mechanism to be MP > compliant > is the insertion of correct locks during callback registration. > > Please understand that the purpose wasn't to discredit your work, but > rather > to stop duplication of work as efforts could be deployed elsewhere. I think > that your work and the work already done on LTT can be brought together in > a way that would profit all. This is what I was hinting to towards the end > of the posting. It was an invitation more than anything else. > > Apart from the hooking mechanism, there were other items which I mentioned > that merit discussion, such as the ability to enable dynamic probes to log > events in normal LTT traces or the event-driven state machine engine. > Hence, > if you are interested in joining forces to further enhance probing and > tracing > capabilities in Linux, I think this would be a good opportunity. > > Best regards > > Karim > > [EMAIL PROTECTED] wrote: > > > > Yes, we looked at that and it didn't seem to provide the generality we > > needed - multipe exits registered per hook, ability to arm a set of hooks > > atomically, ability to prioritise dispatching order of a hook exit, MP > > complient. I may be wrong but the Linux Trace Toolkit hooks like like > they > > were specifically designed to cater for inserting static tracepoints into > > the kernel. > > > > Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). 
> > > > http://oss.software.ibm.com/developerworks/opensource/linux > > Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 > > IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK > > -- > === > Karim Yaghmour >[EMAIL PROTECTED] > Operating System Consultant > (Linux kernel, real-time and distributed systems) > === -- === Karim Yaghmour [EMAIL PROTECTED] Operating System Consultant (Linux kernel, real-time and distributed systems) ===
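The "New event" / "Custom event" scheme proposed in the message above could look roughly like the following user-space sketch in C. The function names, the record layout, the numeric values of the two event types and the fixed-size tables are all assumptions made for illustration. The message only specifies that creation takes an event size, logs a "New event" and returns a unique ID. Payload copying is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for the two event types named in the message. */
enum { EV_NEW_EVENT = 1, EV_CUSTOM_EVENT = 2 };

#define MAX_CUSTOM 32   /* assumed limit on dynamically created events */
#define LOG_SLOTS  128  /* assumed size of the in-memory trace */

struct trace_record {
    uint8_t  type;       /* EV_NEW_EVENT or EV_CUSTOM_EVENT */
    int      custom_id;  /* ID handed out at creation time */
    uint32_t size;       /* payload size declared for that ID */
};

static struct trace_record trace_log[LOG_SLOTS];
static size_t   log_len;
static uint32_t custom_size[MAX_CUSTOM];
static int      next_custom_id;

/* Append one record to the trace; returns -1 when the trace is full. */
static int log_record(uint8_t type, int id, uint32_t size)
{
    assert(log_len <= LOG_SLOTS);
    if (log_len == LOG_SLOTS)
        return -1;
    trace_log[log_len].type = type;
    trace_log[log_len].custom_id = id;
    trace_log[log_len].size = size;
    log_len++;
    return 0;
}

/* Declare a new event type: logs a "New event" and returns its unique ID. */
int trace_create_event(uint32_t size)
{
    int id = next_custom_id;

    if (id == MAX_CUSTOM)
        return -1;
    if (log_record(EV_NEW_EVENT, id, size) != 0)
        return -1;
    custom_size[id] = size;
    next_custom_id++;
    return id;
}

/* Log one occurrence of a previously declared event as a "Custom event". */
int trace_custom_event(int id)
{
    if (id < 0 || id >= next_custom_id)
        return -1;
    return log_record(EV_CUSTOM_EVENT, id, custom_size[id]);
}
```

Because the declaration itself is written into the trace, an analysis tool reading the log later can learn about every custom event type before it meets the first occurrence of that event.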
Re: DProbes with LTT
Richard,

Definitely a good idea. Enabling the programmer to specify the format of the custom data to be printed would be great. This is why LTT has two events to enable custom tracing, the "New event" and the "Custom event". Therefore, extending the definition of "New event" leaves a lot of possibilities open. Here's what I had in mind for LTT (feel free to comment on this as it is only a design for now):

In the creation of a new event, the caller of the "create event ID" function would provide the following information:

1) An event-type string that will mainly be used to identify this amongst the other events. ex: an IRQ entry has a string describing it which is "IRQ entry", it also has a string describing the event in detail, this is the purpose of #2 below.

2) A printf-style string used to print out the formatted event string. ex: "XYZ Driver received unknown event %d on I/O port %03X with error %C"

3) A 0-terminated table containing a structure-type which has 2 entries:
   -A data-length type (fixed or variable)
   -A data-length (if fixed)
   Each entry would describe each of the data types that will be used with the printf-like string. ex using the above string: the "%d" would be the first entry with a fixed data-length of 4 bytes, the "%03X" would be the second entry with a fixed data-length of 4 bytes, the "%C" would be the third entry with a fixed data-length of 2 bytes. In the case of a "%s", the data-length type would be "variable". The last entry in the table would be filled with zeros so as to show the table's end.

As previously mentioned, the "create event ID" function would return a unique event ID for the newly created event. With this scheme, recording a custom event would amount to providing the existing trace function with the custom event ID and a pointer to a buffer containing the packed data to be used with the pre-provided string.
Using the example above, the caller would pass a buffer containing the following data packed in a single buffer: 4 bytes data for "%d", 4 bytes data for "%03X", 2 bytes for "%C", for a total of a 10-byte buffer. The tracing function will automatically determine the length of the buffer since it was determined upon event ID creation. In the case that the buffer contained a string, the first word before the string would contain the string size so that the function would determine the exact length of the whole buffer. That said, it must be stressed that using strings in trace statements is expensive given the processing cost of finding out buffer lengths and so on. Therefore, strings should be regarded as a last resort. Once the trace is complete, the trace visualization tool would retrieve the custom events list and read the trace according to those descriptions. It would then output the description strings and the details string to signal the event's occurrence in the trace. To print out the details string, printf or one of its variants would be provided with the printf-like string, provided upon event-type creation, and the data belonging to the event traced. With the example above, this would be something like: printf("XYZ Driver received unknown event %d on I/O port %03X with error %C", "the 4 bytes given for %d", "the 4 bytes given for %03X" , etc.); This is figurative as the real parameters would most likely be pointers and since the printf call would have a variable amount of parameters (as always). The advantage of using this rather than major-minor code is that the data formatting capabilities provided are exactly the ones most programmers are already familiar with. Though I might have missed some limitations of this scheme that the major-minor code scheme overcomes. What do you think? Karim [EMAIL PROTECTED] wrote: > > Karim, > > I've been back through an initial evaluation we did for LTT, back in May. 
> One of the feature we highlighted we'd like to see was an ability to > specify custom formatting templates. Our original OS/2 trace facility > allowed the user to generate formatting templates which would specify > printf-like controls. The templates were defined per major-minor code > specification, which was used to identify uniquly a formatting type and was > recorded with the trace record in the header. > > We'd like to see that functionality in LTT. Would port the code from OS/2 > if LTT had a suitable formatting exit for custom events. Any thoughts on > this? > > Richard > > Richard Moore - RAS Project Lead - Linux Technology Centre (PISC). > > http://oss.software.ibm.com/developerworks/opensource/linux > Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 > IB
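The zero-terminated descriptor table from Karim's proposal above, and the way the trace function could derive a buffer's length from it, can be sketched in C as follows. The type and function names are hypothetical, and the sketch assumes that the "word" preceding a string is a 4-byte length in host byte order. The proposal does not pin down either point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FIELD_END is zero so that an all-zero entry terminates the table. */
enum field_kind { FIELD_END = 0, FIELD_FIXED, FIELD_VARIABLE };

struct field_desc {
    enum field_kind kind;
    uint32_t length;  /* meaningful only when kind == FIELD_FIXED */
};

/* Table for "... %d ... %03X ... %C": 4 + 4 + 2 bytes of packed data. */
static const struct field_desc xyz_fields[] = {
    { FIELD_FIXED, 4 },
    { FIELD_FIXED, 4 },
    { FIELD_FIXED, 2 },
    { FIELD_END, 0 },
};

/* Table for a hypothetical "%d ... %s" event: one int, then a string. */
static const struct field_desc str_fields[] = {
    { FIELD_FIXED, 4 },
    { FIELD_VARIABLE, 0 },
    { FIELD_END, 0 },
};

/*
 * Walk the table and the packed buffer together and return the total
 * number of bytes the event occupies. The buffer is only read when a
 * variable-length field is met.
 */
size_t event_buffer_length(const struct field_desc *table,
                           const unsigned char *buf)
{
    size_t off = 0;

    assert(table != NULL);
    for (; table->kind != FIELD_END; table++) {
        if (table->kind == FIELD_FIXED) {
            off += table->length;
        } else {
            uint32_t slen;

            /* Assumed layout: 4-byte length word, then the string bytes. */
            memcpy(&slen, buf + off, sizeof(slen));
            off += sizeof(slen) + slen;
        }
    }
    return off;
}
```

For an all-fixed table the length depends only on the table, so it could be computed once at event creation. The variable case has to read the buffer on every call, which is the extra cost the message warns about when it calls strings a last resort.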
[ANNOUNCE] Linux Trace Toolkit version 0.9.4
This new release of the Linux Trace Toolkit includes complete support for Linux and RTAI on both ix86 and PPC. With this out, work on other architectures is on its way. Anyone wanting to dig in is welcome to do so. Also, 0.9.4 includes all the additions that were made in the 0.9.4preX series. This includes interfacing with DProbes using dynamic event creation and usage of rvmalloc and friends to avoid having to copy large portions of memory from kernel space to user space. In order to encourage exchanges and discussions, I've set up mailing lists for LTT. Please take a look at the "mailing lists" section of the project's web-site for more detail. You can find LTT at: http://www.opersys.com/LTT Cheers, Karim Yaghmour === Karim Yaghmour [EMAIL PROTECTED] Embedded and Real-Time Linux Expert ===
[ANNOUNCE] Linux Trace Toolkit 0.9.5pre1
LTT 0.9.5pre1 is out. As the name says, this is a development version and should be treated as such. Only one kernel is supported with 0.9.5pre1, linux 2.4.0-test10.

What it includes:
-Cross-platform reading capability submitted by Andy Lowe
-Visualizer enhancements submitted by Rocky Craig
-Patch fixes by Peng Dai and Bob Montgomery
-Many fixes for bugs found by building the user tools with the "-Wall" flag

The trace format has changed again to support cross-platform reading capabilities. 0.9.5pre1 has no support for RTAI. pre2 will include the cross-platform capabilities for RTAI.

Here's what should be in pre2:
-Support for 2.2.18/2.4.2
-Support for the latest RTAI, including cross-platform capabilities
-Benchmark fixes from Rocky Craig
-SH support by Greg Banks

Check the project's web-site for details on 0.9.5pre1: http://www.opersys.com/LTT

Cheers, Karim === Karim Yaghmour [EMAIL PROTECTED] Embedded and Real-Time Linux Expert ===
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Ingo Molnar wrote: > So why do your "ping flood" results show such difference? It really is > just another type of interrupt workload and has nothing special in it. ... > are you suggesting this is not really a benchmark but a way to test how > well a particular system withholds against extreme external load? Look, you're basically splitting hairs. No matter how involved an explanation you can provide, it remains that both vanilla and I-pipe were subject to the same load. If PREEMPT_RT consistently shows the same degradation under the same setup, and that is indeed the case, then the problem is with PREEMPT_RT, not the tests. > so you can see ping packet flow fluctuations in your tests? Then you > cannot use those results as any sort of benchmark metric. I didn't say this. I said that if there is fluctuation, then maybe this is something we want to see the effect of. In real world applications, interrupts may not come in at a steady pace, as you try to achieve in your own tests. > and from this point on you should see zero lmbench overhead from flood > pinging. Can vanilla or I-PIPE do that? Let's not get into what I-pipe can or cannot do, that's not what these numbers are about. It's pretty darn amazing that we're even having this conversation. The PREEMPT_RT stuff is being worked on by more than a dozen developers spread across some of the most well-known Linux companies out there (RedHat, MontaVista, IBM, TimeSys, etc.). Yet, despite this massive involvement, here we have a patch developed by a single guy, Philippe, who's doing this work outside his regular work hours, and his patch, which does provide guaranteed deterministic behavior, is: a) Much smaller than PREEMPT_RT b) Less intrusive than PREEMPT_RT c) Performs very well, as well as, if not sometimes even better than, PREEMPT_RT Splitting hairs won't erase this reality. 
And again, before I get the PREEMPT_RT mob on my back again, this is just for the sake of argument, both approaches remain valid, and are not mutually exclusive. Like I said before, others are free to publish their own numbers showing differently from what we've found. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Andrew Morton wrote: > Still, first let us get a handle on who wants relayfs now and in the future > and for what. Then we can better decide. We used relayfs for our series of tests on PREEMPT_RT and I-Pipe. Specifically, we used relayfs buffers to store the timestamps for our interrupt latency measurements. This allowed us to easily have access to very large buffering areas without having to worry about any form of detailed resource allocation, or runtime overhead of logging. IOW, it allowed us to concentrate on our main priority: log a very large amount of timestamps. On the LTT side, relayfs is bound to be at the center of whatever architecture we settle on for the ongoing rewrite. For having used it for past releases of LTT, we know that it can handle very heavy data throughput with little overhead using a relatively simple API. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Greg KH wrote: > What ever happened to exporting the relayfs file ops, and just using > debugfs as your controlling fs instead? As all of the possible users > fall under the "debug" type of kernel feature, it makes more sense to > confine users to that fs, right? Actually, like we discussed the last time this surfaced, there are far more users for relayfs than just debugging. What we settled on was having relayfs export its file ops so that indeed debugfs users could use it to log things in conjunction with debugfs. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Greg KH wrote: > Based on the proposed users of this fs, I don't see any. What ones are > you saying are not "debug" type operations? And yes, I consider LTT a > "debug" type operation :) > > The best part of this, is it gives distros and users a consistant place > to mount the fs, and to know where this kind of thing shows up in the fs > namespace. Except that relayfs contains files that all behave in a very specific way: as relayfs buffers, while debugfs may contain a variety of different types of files. I kind'a see what you're trying to say, and I fully understand that some debugfs users may indeed use the relayfs fileops to add an entry in debugfs which serves as a buffer, and that's the very reason we exported them to boot. But there's something to be said about having a single filesystem (and therefore tree somewhere in /) which contains entries dedicated to a single purpose: dump huge amounts of data out of the kernel and into userspace whether or not the system is being debugged. From a user's point of view, it sounds awfully weird if they're using "debugfs" on a production system ... > Last I looked, this was not possible. Has this changed in the latest > version? Here's from 2.6.13-rc2-mm1 fs/relayfs/inode.c > +EXPORT_SYMBOL_GPL(relayfs_open); > +EXPORT_SYMBOL_GPL(relayfs_poll); > +EXPORT_SYMBOL_GPL(relayfs_mmap); > +EXPORT_SYMBOL_GPL(relayfs_release); > +EXPORT_SYMBOL_GPL(relayfs_file_operations); > +EXPORT_SYMBOL_GPL(relayfs_create_dir); > +EXPORT_SYMBOL_GPL(relayfs_remove_dir); It's been there ever since you asked for it earlier this year :) Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Greg KH wrote: > The path/filename dictates how it is used, so putting relayfs type files > in debugfs is just fine. debugfs allows any types of files to be there. ... > New trees in / are not LSB compliant, hence the reason for writing > securityfs to get rid of /selinux and other LSM filesystems that were > starting to sprout up. ... > But that's exactly what debugfs is for, to allow data to be dumped out > of the kernel for different usages. ... > Ok, have a better name for it? It's simple and easy to understand. It also carries with it the stigma of "kernel debugging", which I just don't see production system maintainers liking very much. So tell you what, how about if we merged what's in debugfs into relayfs instead? We'll still end up with one filesystem, but we'll have a more innocuous name. After all, if debugfs is indeed for dumping data from the kernel to user-space for different usages, then relaying is what it's actually doing, right? Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Tomasz Kłoczko wrote: > *NOT using realyfs* if it is not neccessary for possibly big amout > of feactures future KProbes IMO in this case is *fundamental*. > > To time where this base not requiring relayfs feactures will not be > integrated in kernel code better IMO will be stop merging relayfs. This part of the thread is really veering off-topic. This counters thing is your own personal crusade and has nothing to do with the fundamental need for a generic buffering mechanism such as relayfs. I would suggest you start a separate thread to discuss the implementation of a generic counters mechanism, if that's indeed what you're interested in. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Roman Zippel wrote: > The point is to design a simple and flexible relayfs layer, which means > not every possible function has to be done in the relayfs layer, as long > it's flexible enough to build additional functionality on top of it (for > which it can again provide some library functions). I guess I just don't get the point here. Why cut something away if many users will need it? If it's so popular that you're ready to provide a library function to do it, then why not just leave it in to boot? One of the goals of relayfs is to avoid code duplication with regards to buffering in general. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Weird USB errors on HD
I have a usb-attached HD that I use from time to time. When it's connected to my desktop through a hub it works flawlessly. When connected to my Dell D600 Laptop, however, it sometimes randomly exhibits a loud click (as if the heads went berserk) and the device goes unrecognized (i.e. the USB layer drops the device and then redetects it again; meanwhile there is FS corruption). The same behavior happens with 2.4.x and 2.6.x. In /var/log/messages I see something like: hub 3-0:1.0: over-current change on port 1 hub 1-0:1.0: over-current change on port 3 ... usb 1-3: USB disconnect, address 2 usb 1-3: new high speed USB device using ehci_hcd and address 3 ... usb-storage: device found at 3 usb-storage: waiting for device to settle before scanning This doesn't seem too good. Here's the complete passage from /var/log/messages: SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 384296 SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 384296 SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 384296 SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 384296 EXT3-fs error (device sda): ext3_free_branches: Read failure, inode=1046532, block=48037 Aborting journal on device sda. SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 4176 printk: 813 messages suppressed. 
Buffer I/O error on device sda, logical block 522 lost page write due to I/O error on sda SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 0 Buffer I/O error on device sda, logical block 0 lost page write due to I/O error on sda EXT3-fs error (device sda) in ext3_reserve_inode_write: Journal has aborted SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 0 Buffer I/O error on device sda, logical block 0 lost page write due to I/O error on sda EXT3-fs error (device sda) in ext3_reserve_inode_write: Journal has aborted SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 0 Buffer I/O error on device sda, logical block 0 lost page write due to I/O error on sda EXT3-fs error (device sda) in ext3_orphan_del: Journal has aborted SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 0 Buffer I/O error on device sda, logical block 0 lost page write due to I/O error on sda EXT3-fs error (device sda) in ext3_truncate: Journal has aborted SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 0 Buffer I/O error on device sda, logical block 0 lost page write due to I/O error on sda ext3_abort called. 
EXT3-fs error (device sda): ext3_journal_start_sb: Detected aborted journal Remounting filesystem read-only SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 3254080 hub 3-0:1.0: over-current change on port 1 hub 1-0:1.0: over-current change on port 3 SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 3254088 SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 3254096 SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 3254104 SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 3254088 SCSI error : <0 0 0 0> return code = 0x7 end_request: I/O error, dev sda, sector 3254088 usb 1-3: USB disconnect, address 2 scsi0 (0:0): rejecting I/O to device being removed Buffer I/O error on device sda, logical block 458754 lost page write due to I/O error on sda scsi0 (0:0): rejecting I/O to device being removed Buffer I/O error on device sda, logical block 517070 lost page write due to I/O error on sda scsi0 (0:0): rejecting I/O to device being removed Buffer I/O error on device sda, logical block 1 lost page write due to I/O error on sda scsi0 (0:0): rejecting I/O to device being removed Buffer I/O error on device sda, logical block 393218 lost page write due to I/O error on sda scsi0 (0:0): rejecting I/O to device being removed scsi0 (0:0): rejecting I/O to device being removed scsi0 (0:0): rejecting I/O to device being removed scsi0 (0:0): rejecting I/O to device being removed scsi0 (0:0): rejecting I/O to device being removed scsi0 (0:0): rejecting I/O to dead device EXT3-fs error (device sda): ext3_find_entry: reading directory #228929 offset 0 scsi0 (0:0): rejecting I/O to dead device EXT3-fs error (device sda): ext3_find_entry: reading directory #1046529 offset 0 usb 1-3: new high speed USB device using ehci_hcd and address 3 scsi1 : SCSI emulation for USB Mass Storage devices usb-storage: device found at 3 usb-storage: 
waiting for device to settle before scanning scsi0 (0:0): rejecting I/O to dead device EXT3-fs error (device sda): ext3_find_entry: reading directory #196225 offset 0 scsi0 (0:0): rejecting I/O to dead device EXT3-fs error (device sda): ext3_find_entry: reading
Re: Weird USB errors on HD
Greg KH wrote: > Ugh, you have a bad device or power supply, or aren't giving it enough > power to drive the thing. Nothing we can do in Linux for that, sorry. > Buy a wall-powered usb hub, that usually helps. I have one. I naively thought I could just plug the drive directly into the laptop without using the wall-powered hub. I'll try that instead. Thanks. That being said, shouldn't there be a way for the kernel to refuse to use this hd if it's not getting enough power? I don't know enough about USB to say, but isn't there something more elegant that could be done in software? Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: Merging relayfs?
Tom Zanussi wrote: > - removed the deliver() callback > - removed the relay_commit() function This breaks LTT. Any reason why this needed to be removed? In the end, the code will just end up being duplicated in ltt and all other users. IOW, this is not some potential future use, but something that's currently being used. Karim
Re: [PATCH] Re: relayfs documentation sucks?
Christoph Hellwig wrote: > That beein said I wish LTT folks would make a little more progress so > we could actually include it. We're working on it. On the topic of revamping LTT, 3 different people came up with 3 different implementations. Following your feedback on the patch I sent a few weeks back, I headed out asking myself "what is the bare-minimum tracing functionality that will actually fly while still being flexible enough to add to it?" I spent some time at the OLS comparing notes with others interested in this area, and I think we've got something that should fit the bill. We should be able to post something sooner rather than later. Now if only I could remember what I talked about after I left the Black Thorn at 2h45am and the guy in the elevator at Les Suites pressed on a button and said "'M' for more beer" ... Thanks, Karim
Re: Weird USB errors on HD
Alistair John Strachan wrote: > You can get special USB cables that link two USB ports' 5Vs together in > parallel, which seems to help supply the necessary current; after the HD has > spun up you can remove the second "dummy" USB connector (my laptop only has > two USB ports and I require the second port). Yeah, there was one of these in the box with the drive, but the first time I saw it I remember thinking: what the hell is this thing? Then when I figured it out, I found myself wondering whether the USB interface was ever planned for such a use and whether it wouldn't have been better to just ship a real adapter with the thing ... Anyhow, I will not be using the drive anymore without a powered hub. Thanks to all those who helped, Karim
Re: Merging relayfs?
Tom Zanussi wrote: > In userspace, the sub-buffer reading loop looks at the commit value in > the sub-buffer, and if it matches (sub-buffer size - padding), the > buffer has been completely written and can be saved, otherwise it's > not yet complete and is checked again the next time around. This way, > there's no need for a deliver() callback, the relay_commit() is > replaced with the increment of the reserved commit value, the arrays > aren't needed and you get the same result in the end in a much simpler > way, IMHO. Actually this has a much greater potential of losing buffers because we have to poll the buffer for completion. Seen another way, the kernel-side has got to wait until the user-side has "figured out" that it needs to commit content to disk. As it was originally, it was relatively straightforward to determine why data was lost: ok, we've signaled it from kernel space, but the daemon never flushed it out. Without commit/deliver, things are much less clear, and I still miss what gain we are making by removing them. I would very much like to see the commit/deliver functionality back. Such mechanisms are required for any sane producer-consumer model. Karim
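The two schemes being argued over can be contrasted in a toy user-space model. This is only an illustrative sketch in Python, not relayfs code: the names `SubBuffer`, `Channel`, `commit`, `poll` and the `deliver` hook are hypothetical stand-ins, and the completion test is the one quoted above (commit count equals sub-buffer size minus padding).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SubBuffer:
    """Hypothetical model of one sub-buffer and its commit counter."""
    size: int
    padding: int = 0    # unused bytes at the end of the sub-buffer
    committed: int = 0  # bytes fully written so far

    def is_complete(self) -> bool:
        # Completion test as described in the quoted text:
        # commit value == sub-buffer size - padding.
        return self.committed == self.size - self.padding


class Channel:
    """Toy channel supporting both notification styles."""

    def __init__(self, n_subbufs: int, subbuf_size: int,
                 deliver: Optional[Callable[[int], None]] = None):
        self.subbufs: List[SubBuffer] = [
            SubBuffer(subbuf_size) for _ in range(n_subbufs)
        ]
        self.deliver = deliver  # None means "consumer must poll"

    def commit(self, index: int, nbytes: int) -> None:
        """Producer side: record that nbytes were written to a sub-buffer."""
        sub = self.subbufs[index]
        sub.committed += nbytes
        # Callback style: the consumer is told the moment a sub-buffer fills.
        if self.deliver is not None and sub.is_complete():
            self.deliver(index)

    def poll(self) -> List[int]:
        """Consumer side, polling style: scan for completed sub-buffers."""
        return [i for i, sub in enumerate(self.subbufs) if sub.is_complete()]
```

With a `deliver` hook the producer leaves a record of every hand-off, which is the property the message above wants to keep; in the polling variant a full sub-buffer goes unnoticed until the next `poll()` call.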
Re: [PATCH/RFC] Significantly reworked LTT core
Christoph Hellwig wrote: > We're not gonna add hooks to the kernel so you can copile the same > horrible code you had before against it out of tree. Do a sane demux > and submit it. If I just wanted hooks, I would have submitted a patch that did just that, without any logging function. The code for the mux that goes on top of that code is actually on its way to being completely rewritten. I can see that you may have read my posting as indicating that we were recompiling the same previous code out of tree, but that is certainly not the intent. FWIW, we'll look at submitting a minimal mux with the patch. Karim
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Missing attachment herein included.

Karim

L M B E N C H  2 . 0   S U M M A R Y

Processor, Processes - times in microseconds - smaller is better
                              null     null                       open   signal   signal     fork   execve  /bin/sh
kernel                        call      I/O     stat    fstat    close  install   handle  process  process  process
HIGHMEM-RT-V0.7.50-35         0.18   0.2947     3.02     0.42     3.62     0.59     1.98      156      448     1481
NOHIGHMEM-RT-V0.7.50-35       0.18  0.28635     2.91     0.42     3.70     0.58     2.02      111      383     1372
HIGHMEM-RT-V0.7.51-02         0.18  0.27045     2.47     0.39     3.02     0.56     1.75      103      372     1352
NOHIGHMEM-RT-V0.7.51-02       0.18   0.2673     2.36     0.39     2.77     0.56     1.72       90      351     1328

File select - times in microseconds - smaller is better
                            select   select   select   select   select   select   select   select
kernel                       10 fd   100 fd   250 fd   500 fd   10 tcp  100 tcp  250 tcp  500 tcp
HIGHMEM-RT-V0.7.50-35         1.29     5.70    13.21    25.76     1.49   7.8809  18.6905       na
NOHIGHMEM-RT-V0.7.50-35       1.26     5.69    13.25    25.84     1.47       na       na       na
HIGHMEM-RT-V0.7.51-02         1.01     3.88     8.82    17.08     1.24       na  14.1979  27.8158
NOHIGHMEM-RT-V0.7.51-02       1.02     3.90     8.84    17.12     1.30   6.0573       na       na

Context switching with 0K - times in microseconds - smaller is better
                           2proc/0k  4proc/0k  8proc/0k 16proc/0k 32proc/0k 64proc/0k 96proc/0k
kernel                    ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
HIGHMEM-RT-V0.7.50-35          4.87      5.55      5.01      4.47      4.00      4.45      5.13
NOHIGHMEM-RT-V0.7.50-35        3.25      3.92      3.53      3.10      2.96      3.46      4.09
HIGHMEM-RT-V0.7.51-02          2.70      3.48      3.51      3.50      3.36      3.93      4.82
NOHIGHMEM-RT-V0.7.51-02        1.86      2.23      2.41      2.41      2.41      3.02      3.92

Context switching with 4K - times in microseconds - smaller is better
                           2proc/4k  4proc/4k  8proc/4k 16proc/4k 32proc/4k 64proc/4k 96proc/4k
kernel                    ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
HIGHMEM-RT-V0.7.50-35          5.48      4.75      4.47      4.76      4.68      5.90      7.24
NOHIGHMEM-RT-V0.7.50-35        3.88      4.54      4.02      3.91      4.04      4.93      5.85
HIGHMEM-RT-V0.7.51-02          3.25      3.59      3.85      3.89      4.18      5.41      6.75
NOHIGHMEM-RT-V0.7.51-02        2.70      3.01      2.99      3.04      3.31      4.56      6.16

Context switching with 8K - times in microseconds - smaller is better
                           2proc/8k  4proc/8k  8proc/8k 16proc/8k 32proc/8k 64proc/8k 96proc/8k
kernel                    ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch ctx swtch
HIGHMEM-RT-V0.7.50-35          6.09      5.31      5.22      5.09      5.68      7.82      8.87
NOHIGHMEM-RT-V0.7.50-35        4.51      5.08      4.54      4.36      4.44      6.49      7.75
HIGHMEM-RT-V0.7.51-02          3.85      4.01      4.20      4.31      5.27      7.38      8.51
NOHIGHMEM-RT-V0.7.51-02        3.05      3.49      3.53      3.60      3.99      6.37      7.56

Context switching with 16K - times in microseconds - smaller is better
-- 2p
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Paul Rolland wrote: >>mmap | 794us | 654us (+18%) | 822us (+4%) > > You mean -18%, not +18% I think. Doh ... too many numbers flying around ... yes, -18% :) Karim
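For reference, the sign correction can be checked with a few lines of arithmetic. This is a minimal sketch; treating the first column (794us) as the baseline is an assumption, since the quoted row does not carry its column headers, and the helper name `relative_change` is made up for the example.

```python
def relative_change(baseline: float, measured: float) -> float:
    """Signed change of `measured` relative to `baseline`, in percent."""
    return (measured - baseline) / baseline * 100.0


# mmap figures quoted above, in microseconds; 794 is assumed to be the baseline.
faster = relative_change(794, 654)  # a shorter time gives a negative change
slower = relative_change(794, 822)  # a longer time gives a positive change
```

A time that drops from 794us to 654us is a decrease of about 17.6%, which rounds to the -18% agreed on above; 822us is an increase of about 3.5%, matching the quoted +4%.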
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Ingo Molnar wrote: > yeah, they definitely have helped, and thanks for this round of testing > too! I'll explain the recent changes to PREEMPT_RT that resulted in > these speedups in another mail. Great, I'm very much looking forward to it. > Looking at your numbers i realized that the area where PREEMPT_RT is > still somewhat behind (the flood ping +~10% overhead), you might be > using an invalid test methodology: I've got to smile reading this :) If one thing became clear out of these threads, it is that no matter how careful we are with our testing, there is always something that can be criticized about it. Take the highmem thing, for example: I never really bought the argument that highmem was the root of all evil ;) , and the last comparison we did between 50-35 and 51-02 with and without highmem clearly showed that while highmem is indeed a factor, there are inherent problems elsewhere that the disabling of highmem doesn't erase. Also, both vanilla and I-pipe were run with highmem, and if they don't suffer from it, then the problem is/was with PREEMPT_RT. With ping floods, as with other things, there is room for improvement, but keep in mind that these are standard tests used as-is by others to make measurements, that each run is made 5 times, and that the values in those tables represent the average of 5 runs. So while they may not be as exact as could be, I don't see why they couldn't be interpreted as giving us a "good idea" of what's happening. For one thing, the heavy fluctuation in ping packets may actually induce a state in the monitored kernel which is more akin to the one we want to measure than if we had a steady flow of packets. I would usually like very much to entertain this further, but we've really busted all the time slots I had allocated to this work. So at this time, we really think others should start publishing results. After all, our results are no more authoritative than those published by others.
Karim
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Karim Yaghmour wrote: > I would usually like very much to entertain this further, but we've > really busted all the time slots I had allocated to this work. So at > this time, we really think others should start publishing results. > After all, our results are no more authoritative than those > published by others. BTW, we've also released the latest very of the LRTBF we used to publish these latest results, so others can give it a try too :) Karim
Re: PREEMPT_RT and I-PIPE: the numbers, part 4
Can't type right anymore ... Karim Yaghmour wrote: > BTW, we've also released the latest very of the LRTBF we used to version Karim
Re: list patches in kernel
Brad Tilley wrote: > Is there an easy way to make a running kernel display how it has been > patched from vanilla? Probably not, but I thought I'd ask. This issue does come up every so often. If you look in the archives you should find some info about this, including a patch if my memory is correct. Karim
Average instruction length in x86-built kernel?
I'm wondering if anyone's ever done an analysis on the average length of instructions in an x86-built kernel. Googling around, I can find references claiming that the average instruction length on x86 is anywhere from 2.7 to 3.5 bytes, but I can't find anything studying Linux specifically. Just curious, Karim
Re: Average instruction length in x86-built kernel?
Hello Ingo,

Ingo Oeser wrote:
> Just study the output od objdump -d and average the differences
> of the first hex number in a line printed, which are followed by a ":"

Here's a script that does what I was looking for:

#!/bin/bash
# $1 is the object file to disassemble, $2 is a prefix for the output files.
# Note: the patterns below only keep addresses that start with 'c'.

# Disassemble
objdump -d "$1" -j .text > "$2-dissassembled-kernel"
# Remove non-instruction lines:
sed '/^[^c].*/d' "$2-dissassembled-kernel" > "$2-stage-1"
# Remove empty lines:
sed '/^\t*$/d' "$2-stage-1" > "$2-stage-2"
# Remove function names:
sed '/^c[0-9a-f]* <.*>:$/d' "$2-stage-2" > "$2-stage-3"
# Remove addresses:
sed 's/^c[0-9a-f]*:\t//' "$2-stage-3" > "$2-stage-4"
# Remove instruction text:
sed 's/\t.*//' "$2-stage-4" > "$2-stage-5"
# Remove trailing whitespace:
sed 's/\s*$//' "$2-stage-5" > "$2-stage-6"
# Separate instructions depending on size:
egrep "([0-9a-f]{2} *){5}" "$2-stage-6" > "$2-more-or-eq-5"
egrep "^([0-9a-f]{2} *){0,4}$" "$2-stage-6" > "$2-less-or-eq-4"
# Find out how much of each we've got:
wc -l "$2-stage-6"
wc -l "$2-more-or-eq-5"
wc -l "$2-less-or-eq-4"

The last part can easily be changed to iterate through and separate those that are 1 byte, 2 bytes, etc. and automatically come up with stats, but this was fine for what I was looking for. I ran it on a 2.4.x and a 2.6.x kernel and about 3/4 of instructions are 4 bytes or less.

Karim
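The per-length breakdown mentioned at the end of that message could look roughly like the sketch below. It is an illustration in Python rather than part of the original script, and it rests on two assumptions: that the input is `objdump -d` text with tab-separated address, byte and mnemonic fields, and that a line carrying bytes but no mnemonic continues the previous instruction. The function names and the `SAMPLE` lines are made up for the example and are not real kernel output.

```python
import re
from collections import Counter

HEX_ADDR = re.compile(r"[0-9a-f]+")
HEX_BYTE = re.compile(r"[0-9a-f]{2}")


def instruction_lengths(lines):
    """Return the byte length of each instruction in objdump-style lines."""
    lengths = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # section headers, function labels, blank lines
        addr = fields[0].strip()
        if not addr.endswith(":") or not HEX_ADDR.fullmatch(addr[:-1]):
            continue
        tokens = fields[1].split()
        if not tokens or not all(HEX_BYTE.fullmatch(t) for t in tokens):
            continue
        if len(fields) >= 3:
            lengths.append(len(tokens))   # a new instruction
        elif lengths:
            lengths[-1] += len(tokens)    # continuation of a long instruction
    return lengths


def summarize(lengths):
    """Return (histogram, mean length, fraction of instructions <= 4 bytes)."""
    if not lengths:
        raise ValueError("no instructions found")
    histogram = Counter(lengths)
    mean = sum(lengths) / len(lengths)
    short_fraction = sum(1 for n in lengths if n <= 4) / len(lengths)
    return histogram, mean, short_fraction


# Made-up lines in the assumed format, for illustration only.
SAMPLE = [
    "c0100000 <startup_32>:",
    "c0100000:\tfc                   \tcld    ",
    "c0100001:\tb8 18 00 00 00       \tmov    $0x18,%eax",
    "c0100006:\t8e d8                \tmov    %eax,%ds",
    "c0100008:\tc7 05 00 00 10 c0 01 \tmovl   $0x1,0xc0100000",
    "c010000f:\t00 00 00 ",
]
```

Feeding the lines of a real disassembly file to `instruction_lengths()` and then to `summarize()` would give the histogram, the average length and the share of instructions of 4 bytes or less in one pass, without the intermediate stage files.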
Re: Relayfs question
Jan Engelhardt wrote: > Ok, urandom was a bad example. I have my tty logger (ttyrpld.sf.net) which > moves a lot of data (depends) to userspace. It uses a ring buffer of "fixed" > size (set at module load time). Apart from that relayfs could use a dynamic > sized ring buffer, I would not see any need to move it to relayfs, would you?

First, please note that the info on Opersys' site is out-of-date. While it was relevant while we were still maintaining relayfs separately, it has somewhat lost its relevance since we started posting the most up-to-date code directly to LKML. For one thing, the dynamic resizing was dropped very early in relayfs' inclusion review.

What relayfs does, and does very well, is move very large amounts of data out of the kernel and make them available to user-space with very little overhead. In the actual case of your tty logger, I've browsed through the code briefly, and I think that with relayfs you should be able to:
- Get rid of half the code:
  - No need to manage your own user/kernel-buffer boundary (most of the code in uio_*()).
  - No need to do any buffer management at all.
- Get better performance out of your logging functions.
- Get per-cpu buffers for free.

Basically, all the transport code you are doing in the kernel side of your logger would be taken care of by relayfs. And given that there are a lot of people doing similar ad-hoc buffering code, it just makes sense to have one well-tested yet generic mechanism. Have a look at Documentation/filesystems/relayfs.txt for the API details.

On a separate yet related topic: Looking closer at rpldev.c, I believe that you'll be able to get rid of it entirely (or very close to) once I actually get the time to refactor the tracing code in LTT to make it generic. What I intend to do is to obsolete the need for functions like your kio_*, and make it all automatically generated at build time (you'll still need to add the instrumentation, but won't need to hand-code the callbacks). This is still on the top of my to-do list and I should be able to get to this shortly.

Karim
Re: Relayfs question
Karim Yaghmour wrote: > What relayfs does, and does very well, is move very large amounts of > data out of the kernel and make them available to user-space with very > little overhead. In the actual case of your tty logger, I've browsed > through the code briefly, and I think that with relayfs you should be > able to: Just to avoid any confusion, note that I'm referring mainly to rpldev.c, which is the kernel-side driver for the logger, I haven't looked at any of the user tools. Karim
Re: Relayfs question
Jan Engelhardt wrote: > Well, what about things like urandom? It also moves "a lot" of data and does > nothing else.

Forgive my slowness today, but I don't get the angle here:
- Relayfs is not a replacement for char devices, we've never claimed it to be.
- Urandom generates a lot of data, and uses copy_to_user() to get it to user-space, but it isn't a generalized buffering mechanism for transferring large amounts of data to user-space.

If what you're inquiring about is a comparison between relayfs' mechanisms and the underlying mechanisms that urandom is using, then I don't think there can be a comparison: the goals are different. For example, urandom relies on a global spin lock and uses copy_to_user() for its transfers. This is just fine for this type of application. If you wanted to transfer a huge amount of data from the kernel to user-space (the kind of data generated by tracing facilities, for example), however, these mechanisms would be simply inadequate. If we're generating the amount of data LTT can gather, for example, (say 2MB/s as was described in the earlier thread regarding relayfs), then you need per-cpu buffering and you need to not write anything back to user-space, but dump it to disk ASAP, etc. This is where relayfs comes in handy.

On the other hand, using relayfs to replace what urandom currently uses is just the wrong thing to do. If nothing else, /dev/urandom would behave entirely differently (API, dynamics, etc.). There would also be no clear added benefit for using relayfs. What character drivers do (mainly copy_to_user()) and what relayfs is used for are entirely different. To use a slightly exaggerated example to illustrate the difference: replacing the standard mechanisms drivers use to transfer data to user-space with relayfs would be like renting a supersonic jet to get your package to a foreign country instead of just using Fedex. It works ... but it's clearly the wrong approach. Please read relayfs.txt.

Karim
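The per-cpu buffering point can be pictured with a toy user-space model. This is a sketch under stated assumptions, not relayfs code: `PerCpuLog` and its methods are hypothetical names, each "CPU" is just a Python list, and the timestamp merge on the reader side is an assumption about how a consumer might recombine the streams (the thread does not describe that step).

```python
import heapq
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, payload)


class PerCpuLog:
    """Toy model: one private buffer per CPU, so writers never share a lock."""

    def __init__(self, n_cpus: int):
        self.buffers: List[List[Event]] = [[] for _ in range(n_cpus)]

    def log(self, cpu: int, timestamp: int, payload: str) -> None:
        # Each CPU appends only to its own buffer; nothing global is touched.
        self.buffers[cpu].append((timestamp, payload))

    def merged(self) -> List[Event]:
        # Reader side: assumes every per-CPU buffer is already in
        # timestamp order, so a k-way merge restores the global order.
        return list(heapq.merge(*self.buffers))
```

The contrast with the urandom case is that a single buffer behind one global lock serializes every writer, whereas here contention only appears, if at all, when the reader combines the streams.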
Re: [PATCH] relayfs redux, part 2
Andi Kleen wrote: > It's doing a complicated function call which does who knows what in > the logging fast path (I stopped reading after some point) > It definitely is not putc ! I was anticipating some people would have this requirement, and this is why I introduced the ad-hoc mode. Roman asked me to get rid of it because nobody had yet asked for it, so this is why it was dropped. As it is, the implementation you are suggesting is insufficient for LTT, which is relayfs' first formal client. I think that it would be better to provide two underlying mechanisms for relayfs at this point, we had already stripped it as thin as it would go for things like LTT. >>separate grabbing a slot in the buffer from the memcpy because some >>applications such as ltt want to be able to directly write into the >>slot without having to copy it into another buffer first. How about > > > If the inline function to log was fast enough it wouldn't need > any such hacks. Actually that's not true. There are two problems with this statement: a- It requires prepackaged data units. b- It's only useful for fixed-size data units. Any efficient client that has complex data units will want to write directly into the buffer instead of creating an intermediate package which is then memcopied. With the modified code, we are now forced to create an intermediate package, which is wrong. Also, if the client wants to log variable-size events, he would have to re-implement lots of the writing code. Note that I really think relay_write() should be dropped altogether. Clients should call on relay_reserve() and do whatever is necessary after that. > Note that gcc is quite good at optimizing memcpy, so essentially > when you e.g. do log(singleint) it should be roughly equivalent > to a int store into the buffer + the check if there is enough > buffer space. 
I understand the point you are trying to make, but I really think that this is best implemented as two separate buffering schemes instead of breaking the existing one (which had already been trimmed down quite thin following Roman's input.) > You could avoid the local_irq_save() if you use separate interrupt > buffers that are only accessed in non nesting interrupt context > (like softirqs) That would require a sorting step at output though. Not > sure if it's worth it. The problem is that hardirqs can nest anyways, > so it wouldn't work for them. However a lot of important code runs > in softirq (like the network stack) where this is true. For the kind of data sizes we are looking at for LTT (100GBs) splitting buffers is not viable anyway. Karim
Re: [PATCH] relayfs redux, part 2
Tom Zanussi wrote: > OK, makes sense to me - I'll get rid of relay_reserve and replace it > with the simple putc write and variant. Please don't do that. Instead, bring back the ad-hoc mode code, that's what it was for anyway. > You could just create and log into a separate relayfs channel, if you > wanted to. Not sure we need to add anything special to support that. Postprocessing doesn't solve world famine ;) As far as LTT goes, splitting events like this makes it impossible to read large traces. Other clients are free to do as they wish. Karim
Re: [PATCH] relayfs redux, part 2
Tom Zanussi wrote: > I don't think they need to be mutually exclusive - we could keep > relay_reserve(), but the relay_write() that's currently built on top > of relay_reserve() would use the putc code instead. It's complicating > the API a bit, but if it makes everyone happy... Actually I think that this would be a much better use of relay_write(), which is unlikely to be used by any client that requires relay_reserve() to start with. Also, I don't think it complicates the API at all. Compared to the original API, what we've got now is very simple. So it basically boils down to: - use relay_write() if you want putc-like functionality. - use relay_reserve() if you want to reserve space and write separately. This is even better than having a separate ad-hoc mode. Karim
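The split between the two entry points can be illustrated with a toy user-space model. This is a sketch in Python, not the relayfs API: `ToyChannel`, its `write()` and `reserve()` methods, the helper `log_event_in_place()` and the 4-byte header layout are all hypothetical, chosen only to show a putc-style copy next to a reserve-then-fill path for a variable-size event.

```python
import struct


class ToyChannel:
    """Hypothetical fixed-size byte buffer with a single write offset."""

    def __init__(self, size: int):
        self.buf = bytearray(size)
        self.offset = 0

    def write(self, data: bytes) -> bool:
        """putc-style path: copy an already-packaged record into the buffer."""
        end = self.offset + len(data)
        if end > len(self.buf):
            return False
        self.buf[self.offset:end] = data
        self.offset = end
        return True

    def reserve(self, length: int):
        """Claim `length` bytes and return a writable view, or None if full."""
        if self.offset + length > len(self.buf):
            return None
        start = self.offset
        self.offset += length
        return memoryview(self.buf)[start:start + length]


def log_event_in_place(chan: ToyChannel, event_id: int, payload: bytes) -> bool:
    """Build a variable-size event directly in the reserved slot."""
    header = struct.Struct("<HH")  # made-up layout: event id, payload length
    slot = chan.reserve(header.size + len(payload))
    if slot is None:
        return False
    header.pack_into(slot, 0, event_id, len(payload))
    slot[header.size:] = payload  # no intermediate packaged copy of the event
    return True
```

A client with fixed, prepackaged records would only ever need `write()`, while a tracer with composite events would call `reserve()` and fill the slot itself, which mirrors the two-bullet summary in the message above.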
Re: [PATCH] relayfs redux, part 2
Greg KH wrote: > On Fri, Jan 28, 2005 at 01:38:22PM -0600, Tom Zanussi wrote: > >>+extern void * alloc_rchan_buf(unsigned long size, >>+ struct page ***page_array, >>+ int *page_count); >>+extern void free_rchan_buf(void *buf, >>+struct page **page_array, >>+int page_count); > > > As these will be "polluting" the global namespace of the kernel, could > you add "relayfs_" to the front of them? BTW, these functions are in buffers.h which is an internal header to fs/relayfs/*.c files. buffers.h is not included in anything outside. Correct me if I'm wrong, but there is no namespace pollution in that case, right? All that does contribute to namespace pollution is in include/linux/relayfs_fs.h. Thanks, Karim
Re: [PATCH] relayfs redux, part 2
Greg KH wrote: > When relayfs is built into the kernel, those symbols are then global to > the whole static kernel. > > Please be nice and rename them. My pleasure :) Karim
Re: [PATCH] relayfs crash
Kingsley Cheung wrote: > To solve the problem I applied a patch similar to the one you posted > back in July and it fixed the problem. Could we consider putting this > patch into relayfs? Its similar to the one posted in July 2004, except > it also moves clear_readers() before INIT_WORK in relay_release (is > that acceptable?). Tom, correct me if I'm wrong but these fixes were integrated in the first relayfs redux I sent to LKML a few weeks back, right? Karim
Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
Hello Thomas, I don't mind having a general discussion about instrumentation, but it has to be understood that the topic is so general and means so many different things to different people that we are unlikely to reach any useful consensus. Believe me, it's not for the lack of trying. More below. Thomas Gleixner wrote: > :D > One of those backends is LTT+relayfs. > I really respect the work you have done there, but please accept that I > just see the limitations and try to figure out a way to make it more > generic and flexible before it is cemented into the kernel and makes it > hard to use for other interesting instrumentation aspects and maybe > enforces redundant implementation of infrastructure related > functionality. > > E.g. tracking down timing related issues can make use from such > functionality if the infrastructure is provided seperately. > I guess a lot of developers would be happy to use it when it is already > around in the kernel and it can help testers for giving better > information to developers. I would invite you to review the history behind LTT and the history behind the efforts to get LTT integrated in the kernel (which are two separate topics.) If you look back, you will see that I worked very hard trying to get people to think about a common framework and that I and others made numerous suggestions in this regard. Here are a few examples: - DProbes (kprobes ancestor): Shortly after dprobes came out in 2000, I was one of the first to suggest that there could be interfacing between both to allow dynamically added trace points. We worked with, and eventually joined forces with, the IBM team working on this and very early on, LTT and DProbes were interfacing: http://marc.theaimsgroup.com/?l=linux-kernel&m=97079714009328&w=2 - OProfile: When time came to integrate oprofile in the kernel, I tried to push for oprofile to use ltt as its logging engine (to John's utter horror.) 
relayfs didn't exist at the time, and obviously oprofile made it in without relying on ltt. Here's a posting from July 2002 where I suggested oprofile rely on ltt. In that same posting I listed a number of drivers/subsystems that already contained tracing statements. Obviously I was pointing out that there was an opportunity to create a common, uniform infrastructure based on ltt: http://marc.theaimsgroup.com/?l=linux-kernel&m=102624656615567&w=2 - Syscalltrack: In replying to a posting of someone looking for tracing info, there was a brief discussion as to how syscalltrack could use ltt instead of: a) redirecting the syscall table, b) have its own buffering mechanism. Again, relayfs didn't exist at the time: http://marc.theaimsgroup.com/?l=linux-kernel&m=102822343523369&w=2 - Event logging: When there was discussion about event logging, there was suggestion to use ltt's engine. Again, relayfs wasn't there: http://marc.theaimsgroup.com/?l=linux-kernel&m=101836133400796&w=2 And there are many other cases. As you can see, it's not as if I didn't try to have this discussion before. Unfortunately, interest in this was rather limited. In addition, and this is a very important issue, quite a few kernel developers mistook LTT for a kernel debugging tool, which it was never meant to be. When, in fact, if you ask those who have looked at using it for that purpose (try Marcelo or Andrea) you will see that they didn't find it to be appropriate for them. And rightly so, it was never meant for that purpose. Even lately, when I suggested Ingo try using relayfs instead of his custom tracing code for his preemption work, he looked at it and said that it wasn't suited, but would consider reusing parts of it if it were in the kernel. So, in general, one thing I learned over the years is to not touch the topic of kernel debugging even with a 10 foot poll when discussing LTT. What you are hinting at here (mention of developers vs. 
testers, for example), and your stated preference for the type of ring-buffer you described earlier clearly goes in the direction I've learned to avoid: buffering support for the general purpose of kernel debugging. Let me say outright that I see the relevance of what you are looking for, but let me also say that what we tried to achieve with relayfs is to provide a general mechanism for kernel subsystems that need to convey large amounts of data to user-space. We did not attempt to solve the problem of providing a buffering framework for core kernel debugging. As I mentioned to Ingo in the mail I referred to earlier regarding the type of buffering you are looking for: > The above tracer may indeed be very appropriate for kernel development, > but it doesn't provide enough functionality for the requirements of > mainstream users. If there is interest for using either relayfs and/or ltt for that purpose, then this is an entirely different mandate and a few things would need to be added for that to happen. For starters, we could add another mode to relayfs. Currently, it supports a locking and
Re: 2.6.11-rc1-mm1
Hello Thomas,

In the interest of avoiding spreading the thread too thin, I'm replying to both emails at the same time.

Thomas Gleixner wrote:
>>relayfs is a generalized buffering mechanism. Tracing is one application
>>it serves. Check out the web site: "high-speed data-relay filesystem."
>>Fancy name huh ...
>
> I do not doubt that.
>
> But hardwiring an instrumentation framework on it is also hardwiring
> implicit restrictions on the usability of the instrumentation for
> certain purposes.

To a certain extent this is true. Please refer to my reply to your RFC for a discussion of this.

>>Well for one thing, a portion of code running in user-context won't
>>disable interrupts while it's attempting to get buffer space, and
>>therefore won't impact on interrupt delivery.
>
> The do {} while loops are in the fast ltt_log_event path

You mean that it would impact interrupt delivery? This code's behavior has actually been carefully studied, and what has been seen is that the code almost never loops, and when it does, it very rarely does it more than twice. In the case of an interrupt, you'd have to receive an interrupt while reserving space for logging the current interrupt's occurrence for the loop to be done twice. I've CC'ed Bob Wisniewski on this as he's the one that implemented this code and studied its behavior in depth.

> Yeah, did you answer one of my arguments except claiming that I'm too
> stupid to understand how it works ?

If I misspoke, then I apologize. For one thing, I've never thought of you as stupid. I'm just trying to get specifics here.

> I just dont like the idea, that instrumentation is bound on relayfs and
> adds a feature to the kernel which fits for a restricted set of problems
> rather than providing a generic optimized instrumentation framework,
> where one can use relayfs as a backend, if it fits his needs. Making
> this less glued together leaves the possibility to use other backends.

Yes, I understand and I hope my other mail properly addresses this issue.

> There is a loop in ltt_log_event, which enforces the processing of each
> event twice. Splitting traces is postprocessing and can be done
> elsewhere.

Sorry, this is not postprocessing. Let me explain:

Basically, the ltt framework allows only one tracing session to be active at any time. IOW, if you were planning on starting a 2 week trace and after doing so wanted to trace a short 10s on an application then you are screwed, LTT won't allow you to do that. Currently this is a limitation which we haven't heard any complaints about, so we're not going to generalize it until there is proof that people really need this.

However, there are cases where you want to have tracing running at _all_ times in what is referred to as flight-recorder mode and only dump the content of the buffers when something special happens. Yet, those who are interested in having this 24x7 mode also know enough about tracing that they do need to actually trace other things for short periods without disrupting their flight-recording.

That's why there's a loop. An event will be processed twice only if you're tracing AND flight-recording at the same time. There is no way to do an equivalent of what I just described with any form of postprocessing.

Here's the proper snippet from include/linux/ltt-events.h:

    /* We currently support 2 traces, normal trace and flight recorder */
    #define NR_TRACES 2
    #define TRACE_HANDLE 0
    #define FLIGHT_HANDLE 1

> In _ltt_log_event lives quite a bunch of if(...) processing decisions
> which have to be evaluated for _each_ event.

Correct, and I'm honest enough with myself to admit that this is the bit of code that I think needs the most reviewing. So, in order to help you help me, here are the various code snippets and things I can think of which would help make the code faster/simpler:

Here's the preamble where we make some basic sanity checks:

    if (!trace)
        return -ENOMEDIUM;
    if (trace->paused)
        return -EBUSY;
    tracer_handle = trace->trace_handle;
    if (!trace->flight_recorder && (trace->daemon_task_struct == NULL))
        return -ENODEV;
    channel_handle = trace_channel_handle(tracer_handle, cpu_id);
    if ((trace->tracer_started == 1) || (event_id == LTT_EV_START) || (event_id == LTT_EV_BUFFER_START))
        goto trace_event;
    return -EBUSY;
    trace_event:
    if (!ltt_test_bit(event_id, &trace->traced_events))
        return 0;

Basically, unless we've succeeded in all those if's, we're not going to write anything. I think we could get rid of the first 4 by simply maintaining a state-machine for the tracer. Then we could either have a single if or even use function pointers (though I think this costs more) to call or not call _ltt_log_event. As for checking whether the event has a certain ID (EV_START or EV_BUFFER_STAR
Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
Hello Roman,

Roman Zippel wrote:
> On Sat, 15 Jan 2005, Karim Yaghmour wrote:
>>In addition, and this is a very important issue, quite a few
>>kernel developers mistook LTT for a kernel debugging tool, which
>>it was never meant to be. When, in fact, if you ask those who have
>>looked at using it for that purpose (try Marcelo or Andrea) you will
>>see that they didn't find it to be appropriate for them. And
>>rightly so, it was never meant for that purpose. Even lately, when
>>I suggested Ingo try using relayfs instead of his custom tracing
>>code for his preemption work, he looked at it and said that it
>>wasn't suited, but would consider reusing parts of it if it were
>>in the kernel.
>
> Well, that's really a core problem. We don't want to duplicate
> infrastructure, which practically does the same. So if relayfs isn't
> usable in this kind of situation, it really raises the question whether
> relayfs is usable at all. We need to make relayfs generally usable,
> otherwise it will join the fate of devfs.

Hmm, coming from you I will take this as a pretty strong endorsement for what I was suggesting earlier: provide a basic buffering mode in relayfs to be used in kernel debugging. However, it must be understood that this is separate from the existing modes and ltt, for example, could not use such a basic infrastructure. If this is ok with you, and no one wants to complain too loudly about this, I will go ahead and add this to our to-do list for relayfs.

Karim
Re: 2.6.11-rc1-mm1
Hello Roman,

Roman Zippel wrote:
> It's interesting to read more about ltt's requirements, but I still think
> it's possible to leave this work to the relayfs layer.

Ok, I'm willing to play ball, but can you be a little bit more specific?

> Why not just move the ltt buffer management into relayfs and provide a
> small library, which extracts the event stream again? Otherwise you have
> to duplicate this work for every serious relayfs user anyway.

Ok, I've been meditating over what you say above for some time in order to understand how best to follow what you are suggesting. So here's what I've been able to come up with. Let me know if you have other suggestions:

Drop the buffer-start/end callbacks altogether. Instead, allow the user to specify in the channel properties whether they want to have sub-buffer delimiters. If so, relayfs would automatically prepend and append the structures currently written by ltt:

    /* Start of trace buffer information */
    typedef struct _ltt_buffer_start {
        struct timeval time;  /* Time stamp of this buffer */
        u32 tsc;              /* TSC of this buffer, if applicable */
        u32 id;               /* Unique buffer ID */
    } LTT_PACKED_STRUCT ltt_buffer_start;

    /* End of trace buffer information */
    typedef struct _ltt_buffer_end {
        struct timeval time;  /* Time stamp of this buffer */
        u32 tsc;              /* TSC of this buffer, if applicable */
    } LTT_PACKED_STRUCT ltt_buffer_end;

This would also allow dropping the start_reserve, end_reserve, and channel_start_reserve. The latter can be added by ltt as its first event.

Is this what you are looking for, and is there something else we should be doing?

> Completely abstracting the buffer management would then make the whole
> interface simpler and it would be a lot easier to change without breaking
> everything. E.g. it would be possible to use per cpu buffers and remove
> the need for different locking mechanisms, for a good tracing mechanism
> it's not just important that it's lockless, but also that the cpus don't
> share cache lines in the fast path. In this regard relayfs/ltt has really
> still too much overhead and the complex relayfs API isn't really making it
> easy to fix this.

The per-cpu buffering issue is really specific to the client. It just so happens that LTT creates one channel for each CPU. Not everyone who needs to ship lots of data to user-space needs/wants one channel per cpu. You could, for example, use a relayfs channel as a big chunk of memory visible to both a user-space app and its kernel buddy in order to exchange data without ever needing more than one such channel for your entire subsystem.

As for lockless vs. locking there is a need for both. Not having to get locks has obvious advantages, but if you require strict timing you will want to use the locking scheme because its logging time is linear (see Thomas' complaints about lockless elsewhere in this thread, and Ingo's complaints about relayfs somewhere back in October.)

But in trying to make things simpler, here's a reworked API:

    rchan* relay_open(channel_path, mode, bufsize, nbufs);
    int    relay_close(*rchan);
    int    relay_reset(*rchan);
    int    relay_write(*rchan, *data_ptr, count, **wrote-pos);
    int    relay_info(*rchan, *channel_info);
    void   relay_set_property(*rchan, property, value);
    void   relay_get_property(*rchan, property, *value);

For direct writing (currently already used by ltt, for example):

    char*  relay_reserve(*rchan, len, *ts, *td, *err, *interrupting);
    void   relay_commit(*rchan, *from, len, reserve_code, interrupting);

These are the related macros:

    #define relay_write_direct(DEST, SRC, SIZE) \
    #define relay_lock_channel(RCHAN, FLAGS) \
    #define relay_unlock_channel(RCHAN, FLAGS) \

As I hinted elsewhere, we would now have three modes for relayfs channels:
- locking => relies on local_irq_save.
- lockless => relies on try_reserve/fail->retry (based on cmpxchg).
- kdebug => this is for kernel debugging.

The last one could be based on Ingo's tracing code, or any implementation suggestions by Thomas. It wouldn't do all the checks and provide all the capabilities of the other two mechanisms, but would really be a hot-path logger with only minimalistic provisions for content loss and other such things.

(note to Tom: time_delta_offset that used to be in relay_write should be a property set using relay_set_property).

What I'm dropping for now is all the functions that allow a subsystem to read from a channel from within the kernel. So, for example, if you want to obtain large amounts of data from user-space via a relayfs channel you won't be able to. Here are the functions that would go:

    rchan_reader *add_rchan_reader(channel_id, auto_consume)
    int remove_rchan_reader(rchan_reader *reader)
    rchan_reader *add_map_reader(channel_id)
    int remove_map_reader(rchan_reader *reader)
    int relay_read(reader, buf, count, wait, *actual_read_offset)
    void relay_buffers_consumed
Re: Event tools, do they exist
Hello George,

As others have suggested, you can do what you are asking for using LTT (http://www.opersys.com/LTT). Specifically, you may want to use the event allocation capabilities. This will enable you to add your own events and view these as part of the trace.

By the way, there are mailing lists for LTT if you're interested in making a contribution.

Cheers,

Karim

george anzinger wrote:
>
> This is an attempt to look in the wheel locker.
>
> I need a simple event sub system for use in the kernel. I envision at
> least two types of events: the history event and the timing event.
>
> The timing event would keep track of start/stop times by class. If, for
> example, I wanted to know how much time the kernel spends doing the
> recalc in schedule() I would put an event start in front of it and an
> end at the other end. The sub system would note the first event time
> and the cumulative time between all starts and stops on the same event.
> When reported by /proc/ it would give the total event time, the elapsed
> time and the % of processor time for each of the possibly several
> classes.
>
> The history event would record each event's time, location, data1,
> data2. It would keep N of these (the last N) and report M (M= /proc/. This list should also be kept in a format that a simple
> debugger can easily examine.
>
> Somebody must have written these routines and have them in their
> library. Sure would help if I could have a peek.
>
> George

--
===
Karim Yaghmour
[EMAIL PROTECTED]
Embedded and Real-Time Linux Expert
===
Re: read() on relayfs channel returns premature 0
Jan Engelhardt wrote:
> Hm? Relayfs does not support a `cat /dev/relay/AChannelName` anymore?

This was a requirement for it to be included.

Karim
Re: 2.6.11-rc1-mm1
Hello Christoph,

Christoph Hellwig wrote:
> Why would you want anything but read access?

Fine, we can make it read-only and drop the "mode" field.

> I think random access is overkill. Keeping the code simple is more
> important and user-space can post-process it.

It's overkill if you're thinking in terms of KBs or MBs of data. It isn't if you're looking at GBs and 100s of GBs. Please read my other posting as to who is using this and how.

But regardless of access, you have to have some way of telling relayfs the size of the channel you want. bufsize and nbufs just tell relayfs the size of the buffers you want and how many buffers there are in the ring, both of which are really basic to any sort of buffering scheme.

> Auto-resizing sounds like a really bad idea.

Ok, it will go.

> And why can't you do this from that code? It just needs an initcall-like
> thing that runs after mounting of relayfs.

Ok, we'll leave it to the caller to do a relay_write() with his init-bufs at startup.

Karim
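To make the role of bufsize and nbufs concrete, here is a minimal user-space sketch of a ring of fixed-size sub-buffers. It is only an illustration: struct sketch_ring and sketch_ring_write() are hypothetical names, this is not relayfs code, and the sketch assumes a single writer that silently overwrites the oldest sub-buffer when the ring wraps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ring of nbufs sub-buffers, each bufsize bytes long. */
struct sketch_ring {
    char *mem;        /* backing store of bufsize * nbufs bytes */
    size_t bufsize;   /* size of one sub-buffer */
    size_t nbufs;     /* number of sub-buffers in the ring */
    size_t cur;       /* number of sub-buffer switches so far */
    size_t offset;    /* write offset inside the current sub-buffer */
};

/*
 * Copy len bytes into the ring without splitting a record across
 * sub-buffers. Returns where the record was stored, or NULL if it can
 * never fit in one sub-buffer.
 */
char *sketch_ring_write(struct sketch_ring *r, const void *data, size_t len)
{
    char *dest;

    assert(r->nbufs > 0);
    if (len > r->bufsize)
        return NULL;
    if (len > r->bufsize - r->offset) {
        /* Current sub-buffer is full: move to the next one in the ring. */
        r->cur++;
        r->offset = 0;
    }
    dest = r->mem + (r->cur % r->nbufs) * r->bufsize + r->offset;
    memcpy(dest, data, len);
    r->offset += len;
    return dest;
}
```

With bufsize = 4 and nbufs = 2, a 3-byte record followed by a 2-byte record lands in two different sub-buffers, and a third 3-byte record wraps back to the first one.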
Re: 2.6.11-rc1-mm1
Christoph Hellwig wrote:
> the lockless mode is really just loops around cmpxchg. It's spinlocks
> reinvented poorly.

I beg to differ. You have to use different spinlocks depending on where you are:
- serving user-space
- bh-derivatives
- irq

lockless is the same primitive regardless of your current state, it's not the same as spinlocks.

Karim
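The point about using one primitive in every context can be illustrated with a small user-space sketch of a cmpxchg-style reservation loop. This is not the relayfs implementation: struct sketch_chan and sketch_reserve() are hypothetical, and C11 atomic_compare_exchange_weak() stands in for the kernel's cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical single-buffer channel; not the relayfs rchan structure. */
struct sketch_chan {
    _Atomic size_t offset;   /* next free byte in buf */
    size_t size;             /* total bytes in buf */
    char *buf;
};

/*
 * Reserve len bytes and return the start of the reserved slot, or NULL
 * if the buffer is full. No lock is taken, so the same code path can be
 * used whatever context the caller runs in.
 */
char *sketch_reserve(struct sketch_chan *c, size_t len)
{
    size_t old, new;

    assert(c != NULL);
    old = atomic_load(&c->offset);
    do {
        if (len > c->size - old)
            return NULL;
        new = old + len;
        /* On failure, old is refreshed with the current offset; retry. */
    } while (!atomic_compare_exchange_weak(&c->offset, &old, new));

    return c->buf + old;
}
```

A writer that gets interrupted between the load and the compare-exchange simply loops once more, which is the retry behavior discussed elsewhere in this thread.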
Re: 2.6.11-rc1-mm1
Hello Roman,

Roman Zippel wrote:
> It seems we first need to specify, what relayfs actually is supposed to
> be. Is it a relaying mechanism for large amount of data from kernel to
> user space or is it a general communication channel between kernel and
> user space? You have to choose one, if you mix contradicting requirements,
> you'll never get a simple abstraction layer and relayfs will always be a
> pain to work with.

I think we want to concentrate on the former, though I suspect the latter will happen eventually. But let's keep our focus on providing a mechanism for relaying large amounts of data from the kernel to user-space.

> You can make it even simpler by dropping this completely. Every buffer is
> simply a list of events and you can let ltt write periodically a timer
> event. In userspace you can randomly seek at buffer boundaries and search
> for the timer events. It will require a bit more work for userspace, but
> even large amount of tracing data stays manageable.

We already do write a heartbeat event periodically to have readable traces in the case where the lower 32 bits of the TSC wrap around.

As I mentioned elsewhere, please don't think of this in terms of KBs or MBs of data. What we're talking about here is GBs, if not 100s of GBs of data. Having to start reading each sub-buffer until you hit a heartbeat really is a killer for such large traces. If there was a significant impact on relayfs for having this I would have understood the argument, but relayfs needs to do buffer-management anyway, so I don't see that much complexity being added by allowing the channel user to ask relayfs for delimiters.

> Userspace can then easily restore the original order of events.

As above, restoring the original order of events is fine if you are looking at MBs or KBs of data. It's just totally unrealistic for the amounts of data we want to handle. But like I said earlier, the added relayfs mode (kdebug) would allow for exactly what you are suggesting:

    event_id = atomic_inc_return(&event_cnt);

So here's the new API based on input from Christoph and Tom:

    rchan* relay_open(channel_path, bufsize, nbufs);
    int    relay_close(*rchan);
    int    relay_reset(*rchan);
    int    relay_write(*rchan, *data_ptr, count, **wrote-pos);
    int    relay_info(*rchan, *channel_info);
    void   relay_set_property(*rchan, property, value);
    void   relay_get_property(*rchan, property, *value);

For direct writing (currently already used by ltt, for example):

    char*  relay_reserve(*rchan, len, *ts, *td, *err, *interrupting);
    void   relay_commit(*rchan, *from, len, reserve_code, interrupting);
    void   relay_buffers_consumed(*rchan, u32);

These are the related macros:

    #define relay_write_direct(DEST, SRC, SIZE) \
    #define relay_lock_channel(RCHAN, FLAGS) \
    #define relay_unlock_channel(RCHAN, FLAGS) \

What we are dropping for later review: read/write semantics from user-space. It has to be understood that we believe that this is a major drawback. For one thing, you won't be able to do something like:

    $ cat /relayfs/xchg/my-file > ~/test-data

Instead, you will have to write a custom app that does open(), mmap(), write(). We could still provide a small app/library that did this automagically, but you've got to admit that nothing beats the real thing. Also note that there are people who currently use this already, so there will be some unhappy campers.

Karim
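The event_id = atomic_inc_return(&event_cnt) idea mentioned for the kdebug mode can be sketched in user space as follows. The names are hypothetical, C11 atomic_fetch_add() stands in for the kernel's atomic_inc_return(), and the sketch assumes the 32-bit counter does not wrap within the trace being sorted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical event record: a global sequence number plus a payload. */
struct sketch_event {
    uint32_t id;     /* sequence number taken at log time */
    uint32_t data;   /* placeholder payload */
};

static _Atomic uint32_t sketch_event_cnt;

/* Rough user-space stand-in for atomic_inc_return(&event_cnt). */
uint32_t sketch_next_id(void)
{
    return atomic_fetch_add(&sketch_event_cnt, 1) + 1;
}

/* qsort() comparator ordering events by ascending sequence number. */
static int sketch_cmp_id(const void *a, const void *b)
{
    const struct sketch_event *x = a;
    const struct sketch_event *y = b;

    return (x->id > y->id) - (x->id < y->id);
}

/* Restore the global order of events gathered from several buffers. */
void sketch_reorder(struct sketch_event *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    qsort(ev, n, sizeof(*ev), sketch_cmp_id);
}
```

Sorting is simple for a small capture like this; the cost of doing the same over a very large trace is exactly the objection raised in this message.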
Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
Thomas Gleixner wrote:
> This implies to separate
>
> - infrastructure
> - event registration
> - transport mechanism

Like I said in my first response: we can't be everything for everybody, the requirements are just too broad. ISO tried it with OSI. Have a look at net/* for the result.

Currently, LTT provides the first two in one piece, and relayfs provides the third. Like I acknowledged earlier, there is room for generalizing the transport mechanism, and I'm thinking of amending the relayfs API proposal further and renaming the modes to make them more straightforward:
- Managed (locking or lockless.)
- Ad-Hoc (which works like Ingo, yourself, and others have requested.)

If you really want to define layers, then there are actually four layers:
1- hooking mechanism
2- event definition / registration
3- event management infrastructure
4- transport mechanism

LTT currently does 1, 2 & 3. Clearly, as in the mail I referred to earlier, there is code in the kernel that already does 1, 2, 3, and 4 in a very hardwired/ad-hoc fashion and there isn't anyone asking for it to be removed. We're offering 4 separately and are putting LTT on top of it. If you want to get 1 & 2 separately, have a look at kernel hooks and genevent:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/
http://www.listserv.shafik.org/pipermail/ltt-dev/2003-January/000408.html

We'd gladly take a serious look at using the former if it was included, and there is work in progress on making the latter the standard way of declaring LTT events instead of using a static ltt-events.h.

Five years ago, there was a discussion about integrating GKHI into the kernel (the kernel hooks ancestor). Have a look for yourself as to the response to this suggestion (basically people weren't ready to accept a generalized hooking mechanism without a defined set of hooks, and then others didn't like the idea at all because creating general hooks in the kernel which anybody can register to creates legal and maintenance problems ... basically it's a can of worms):
http://marc.theaimsgroup.com/?l=linux-kernel&m=97371908916365&w=2

There's only so much we can push into the kernel at the same time. Not to mention that before you can be generic, you've got to have some specific implementation to start working from. I believe that what we've ironed out through the discussion of the past two days is a good basis.

There is some irony in all this. For years, we were told that we couldn't make it into the kernel because we were perceived as providing a kernel debugging tool, and now that we're starting to get our things seriously reviewed we're being told that maybe it ain't really that useful because those who want to do kernel debugging can't use it as-is ... go figure.

Karim
Re: 2.6.11-rc1-mm1
Thomas Gleixner wrote:
> Which is every 1.42 seconds on a 3GHz machine. I guess we don't have
> GB's of data when the 1.42 seconds elapse without an event.

My argument was about being able to browse the amount of data I was referring to. The heartbeat thing was an aside to Roman as to the fact that we already do what he's suggesting.

> I still don't see the point. The implicit ability of LTT to allow
> tracing of up to 8192 bytes user data, strings and XML makes this
> necessary. I do not see any necessity to integrate this special usage
> modes instead of an generic usable instrumentation implementation.

I've already clarified your mischaracterization of custom events, you are being disingenuous here. If you want a generalized hooking mechanism, feel free to ask Andrew to take kernel hooks:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/

> If relayfs is giving those users the ability to do so then they can do
> it, but I object the fact that LTT/relayfs is occupying the place of a
> more generic implementation in the way it is implemented now.

Again, damned if we do, damned if we don't. LTT isn't meant for kernel debugging per se, though you can use it to that end to a certain extent. However, if you are kernel debugging, you will find the ad-hoc mode I'm talking about adding to relayfs quite useful.

> For normal event tracing you have about 32-64 byte of data per event. So
> disabling interrupts in order to copy this amount of information into a
> buffer is cheaper on most architectures than doing the whole magic in
> LTT and relayfs. This also keeps your buffers consistent and does not
> need any magic for postprocessing.

Oh, now you want to lighten the weight on postprocessing? Come on, Thomas, please stop wasting my time.

Note, however, that we are thinking of dropping the lockless scheme for now. We will pick up this discussion separately further down the road.

> Sorting out disabled events in the hot path and moving the if
> (pid/gid/grp) whatever stuff into userspace postprocessing is not an
> alien request.

It is. Have you even read what I suggested to change in my other mail:

    if ((any_filtering) && !(ltt_filter(event_id, event_struct, data)))
        return -EINVAL;

You're not honestly telling me that checking for any_filtering is going to ruin your day.

> You are talking of Gigabytes of data. In what time ?
>
> Let's do some math.
>
> For simplicity all events use 64 Byte event space.
>
> ~ 64kB/sec for 1000 events/s (event frequency 1kHz) ( 1 ms)
> 1024kB/sec for 16 events/ms (event frequency 16kHz) (62 us)
> 2048kB/sec for 32 events/ms (event frequency 32kHz) (31 us)
> 4096kB/sec for 64 events/ms (event frequency 64kHz) (15 us)
> 8192kB/sec for 128 events/ms (event frequency 128kHz) ( 8 us)
>
> where a 100Mbit network can theoretically transport 10240kB/sec and
> practically does 4000-8000 kB/sec.
>
> An event frequency of 8us even on a 3 GHz machine is complete illusion,
> because we spend already a couple of usecs in servicing the legacy 8254
> timer.
>
> So the realistic assumption on a 3Ghz machine is definitely below 64kHz,
> which means we have to handle max. 4Mb of data per second.

Actually, on a PII-350MHz, I was already generating 0.5MB/s of data just by running an X session. If we assume that a machine 10 times faster generates 10 times as many events, we've already got 5MB/s, and I'm sure that there are heavier cases than X. Here's the paper if you want to read it:
http://www.opersys.com/ftp/pub/LTT/Documentation/ltt-usenix.ps.gz

> I'm not impressed. Disabling interrupts for a couple of nano seconds to
> store the trace data in the buffer does not hurt at all. Running through
> a big bunch of out of cache line instructions does.

Like I said above, fighting for/against lockless is not our immediate goal, and we will likely remove it.

> If you try to trace more than this amount you are toast anyway.
>
> Please beware me of "reality has bitten" arguments. The whole if(..)
> scenario in _ltt_event_log() is doing postprocessing, which can be done
> in userspace. I don't care about the required time as long as it does
> not introduce additional burden into the kernel.

Not even Ingo hinted at getting rid of filtering. Remember the earlier e-mail I referred to? Here's what he was suggesting:
> void trace(event, data1, data2, data3)
> {
> int cpu = smp_processor_id();
> int idx, pending, *curr = curr_idx + cpu;
> struct trace_event *t;
> unsigned long flags;
>
> if (!event_wanted(current, event, data1, data2, data3))
> return;
>
> local_irq_save(flags);
>
> idx = ++curr_idx[cpu] & (NR_TRACE_ENTRIES - 1);
> pending = ++curr_pending[cpu];
>
> t = trace_ring[cpu] + idx;
>
> t->event = event;
> rdtscll(t->timestamp);
> t->data1 = data1;
> t->data2 = data2;
> t->data3 = data3;
>
> if (curr_pending == TRACE_LOW_WATERMARK &
Re: 2.6.11-rc1-mm1
Thomas Gleixner wrote:
> Sorting out disabled events is the filtering you have to do in kernel
> and you should do it in the hot path or remove the unnecessary
> tracepoints at compiletime.

Do you actually read my replies or do you just grep for something you can object to? If you care to read my replies you will see that this has already been answered.

> You are not answering my argument. 8MB/sec is an event frequency of
> 128kHz when we assume 64byte/event. It's one event every 8us. So every
> unnecessary computation, every leaving the hotpath for nothing is just
> giving you performance loss.

I have, you just choose not to read. Here's what I said earlier:
> Note, however, that we are thinking of dropping the lockless scheme
> for now. We will pick up this discussion separately further down the
> road.

IOW, we will be using cli/sti. So there is no "leaving the hotpath".

> I said:
>
>>>Sorting out disabled events in the hot path
>
> s/Sorting/Filtering/
>
> I never said this should not be done.

You're either on crack or I don't know how to read English. Here's what you said:
> Sorting out disabled events in the hot path and moving the if
> (pid/gid/grp) whatever stuff into userspace postprocessing is not an
> alien request.

Clearly you are suggesting moving the filtering into user-space.

> Separating layers as I suggested before is not making it a generic
> debugging tool. It makes parts of those layers available for other usage
> and gives us the chance to reuse the parts for cleaning up already
> available code which has the same hardwired structure.

This has already been answered.

Karim
Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
Thomas Gleixner wrote:
> That's the point. Adding another hardwired implementation does not give
> us a possibility to solve the hardwired problem of the already available
> stuff.

Well then, like I said before, you know what you need to do:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/

Karim
Re: 2.6.11-rc1-mm1
Hello Roman,

Roman Zippel wrote:
> Periodically can also mean a buffer start call back from relayfs
> (although that would mean the first entry is not guaranteed) or a
> (per cpu) eventcnt from the subsystem. The amount of needed search would
> be limited. The main point is from the relayfs POV the buffer structure
> has always the same (simple) structure.

But two e-mails ago, you told us to drop the start_reserve and
end_reserve and move the details of the buffer management into relayfs
and out of ltt? Either we have a callback, like you suggest, and then
we need to reserve some space to make sure that the callback is
guaranteed to have the first entry, or we drop the callback and provide
an option to the user for relayfs to write this first entry for him.
Providing a callback without reservation is no different than relying
purely on the heartbeat, which, like I said before and for the reasons
illustrated below, is unrealistic.

> You have to be more specific, what's so special about this amount of data.
> You likely want to (incrementally) build an index file, so you don't have
> to repeat the searches, but even with your current format you would
> benefit from such an index file.
[snip]
>>As above, restoring the original order of events is fine if you are
>>looking at mbs or kbs of data. It's just totally unrealistic for
>>the amounts of data we want to handle.
>
> Why is it "totally unrealistic"?

Ok, let's expand a little here on the amount of data. Say you're getting
2MB/s of data (which is not unrealistic on a loaded system.) That means
that if I'm tracing for 2 days, I've got 345GB of data (~7.2GB/hour).
In practice, users aren't necessarily interested in plowing through the
entire 345GB, they just want to view a given portion of it. Now, if I
follow what you are suggesting, I have to go through the entire 345GB to:
a) create indexes,
b) reorder events, and likely
c) have to rewrite another 345GB of data.
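For readers who want to check the volume figures, here is a minimal sketch of the arithmetic in C. It assumes decimal units (1MB = 10^6 bytes) and a steady rate; the helper names are made up for illustration, and the 10-byte average event size used in the last helper is an assumption, not a measurement.

```c
#include <assert.h>
#include <stdint.h>

/* Decimal units, as in the figures above: 1 MB = 10^6 bytes. */
#define BYTES_PER_MB 1000000ULL
#define BYTES_PER_GB 1000000000ULL

/* Total bytes logged at a steady rate (MB/s) over a number of seconds. */
static uint64_t trace_bytes(uint64_t rate_mb_per_s, uint64_t seconds)
{
	return rate_mb_per_s * BYTES_PER_MB * seconds;
}

/* Whole gigabytes contained in a byte count (truncating). */
static uint64_t bytes_to_gb(uint64_t bytes)
{
	return bytes / BYTES_PER_GB;
}

/* Events per second for a byte rate and an assumed average event size. */
static uint64_t events_per_second(uint64_t rate_mb_per_s,
				  uint64_t avg_event_bytes)
{
	assert(avg_event_bytes > 0);
	return rate_mb_per_s * BYTES_PER_MB / avg_event_bytes;
}
```

With these helpers, trace_bytes(2, 2 * 24 * 3600) comes to 345.6 * 10^9 bytes, i.e. the 345GB quoted for two days, and one hour at the same rate is 7.2 * 10^9 bytes, in the same ballpark as the hourly figure given above.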
And I haven't yet discussed the kind of problems you would encounter in
trying to reorder such a beast that contains, by definition,
variable-sized events. For one thing, if event N+1 doesn't follow N, then
you would be forced to browse forward until you actually found it before
you could write a properly ordered trace. And it just takes a few
processes that are interrupted and forced to sleep here and there to make
this unusable. That's without the RAM or fs space required to store those
index tables ... At 3 to 12 bytes per event, that's a lot of space for
indexes ...

If I keep things as they are with ordered events and delimiters on buffer
boundaries, I can skip to any place within this 345GB and start
processing from there.

And that's for two days. If you're a sysadmin encountering a transient
problem on a server, you may actually want more than that.

>>But like I said earlier, the added relayfs mode (kdebug) would allow
>>for exactly what you are suggesting:
>> event_id = atomic_inc_return(&event_cnt);
>
> Actually that would be already too much for low level kernel debugging.
> Why do you want to put this into relayfs?

I don't. I was just saying that with the adhoc mode, a relayfs client
could use the code snippet you were suggesting.

> What are the _specific_ reasons you need these various modes, why can't
> you build any special requirements on top of a very light weight relay
> mechanism?

Because of the opposite requirements. Here are the two modes I'm
suggesting in relayfs and how they operate:

Managed:
- Presumes active user-space daemon interested in catching _all_ events.
- Allows N buffers in buffer ring
- Provides limit-checking (callback on end of sub-buffer)
- Provides buffer delimiters (writes timestamp at beg and end)
- Suited for all types of event sizes (both fixed and variable) at very
  high frequency.
- Daemon is woken up when buffer is ready for writing, executes a write()
  on an mmaped area and notifies relevant kernel subsystem, which in turn
  notifies relayfs that buffer can now be reused.
- Relies on proper abstraction of cli/sti.

Ad-Hoc:
- Presumes transient userspace tool interested in event snapshots.
- Single circular buffer.
- No limits checking (or very basic: as in stop if overwrite).
- No buffer delimiters.
- Best suited for fixed-size events at extreme high frequency.
- User-space tool simply does a write() on an mmaped area and exits or
  goes back to sleep.
- Relies on proper abstraction of cli/sti.

Basically, the ad-hoc mode abides by the principles of KISS, whereas the
managed mode is more elaborate, for clients like LTT.

Rhetorical: Couldn't the ad-hoc mode case be a special case of the
managed mode? In theory yes, in practice no. The various conditionals and
code paths for switching buffers, invoking callbacks, writing delimiters
and the likes, which make this mode useful to clients like LTT, will
always be a problem for those seeking the shortest path to buffer
committal. In the case of Ingo, for example, I'm sure he'd
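To make the ad-hoc mode listed above a little more concrete, here is a rough user-space sketch in C of a single circular buffer of fixed-size events with no limit checking and no delimiters. It is not relayfs code: the adhoc_* names, the event layout and the buffer size are all assumptions made for illustration, and the interrupt disabling (cli/sti) a kernel version would need around the write is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ADHOC_NR_EVENTS 8u	/* assumed size; must be a power of two */

/* Hypothetical fixed-size event record. */
struct adhoc_event {
	uint32_t id;
	uint32_t data;
};

struct adhoc_ring {
	struct adhoc_event slots[ADHOC_NR_EVENTS];
	uint64_t head;		/* total number of events ever written */
};

static void adhoc_init(struct adhoc_ring *r)
{
	memset(r, 0, sizeof(*r));
}

/* Write path: no limit checks, no delimiters; the oldest entry is
 * silently overwritten once the ring is full. */
static void adhoc_log(struct adhoc_ring *r, uint32_t id, uint32_t data)
{
	struct adhoc_event *e = &r->slots[r->head & (ADHOC_NR_EVENTS - 1)];

	e->id = id;
	e->data = data;
	r->head++;
}

/* Snapshot: copy the surviving events, oldest first, into out[] (which
 * must hold ADHOC_NR_EVENTS entries). Returns the number copied. */
static uint32_t adhoc_snapshot(const struct adhoc_ring *r,
			       struct adhoc_event *out)
{
	uint32_t count = r->head < ADHOC_NR_EVENTS ?
			 (uint32_t)r->head : ADHOC_NR_EVENTS;
	uint64_t first = r->head - count;

	assert(out != NULL);
	for (uint32_t i = 0; i < count; i++)
		out[i] = r->slots[(first + i) & (ADHOC_NR_EVENTS - 1)];
	return count;
}
```

The write path is a masked index, two stores and an increment. The managed mode has to add sub-buffer switching, callbacks and delimiters on top of that.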
Re: 2.6.11-rc1-mm1
Hello Christoph,

Christoph Hellwig wrote:
> The thing I'm unhappy with is what the code does currently. I haven't
> looked at the code enough nor thought about the problem enough to tell
> you what's the right thing to do. Knowing that will involve review of
> the architecture and serious benchmarking on a few platforms.

Like I was saying elsewhere, we are likely going to drop the lockless
code for now (i.e. the code that does the cmpxchg). Instead we will
depend on normal cli/sti abstractions.

Karim
Re: 2.6.11-rc1-mm1
Thomas Gleixner wrote:
> I know, what I have said. I said reduce the filtering to the absolute
> minimum and do the rest in userspace.

You keep adopting the interpretation which best suits you, taking quotes
out of context, and keep repeating things that have already been
answered. There are limits to one's patience. What you did is change
your position twice. It's there for anyone to see.

> The now builtin filters are defined to fit somebody's needs or idea of
> what the user should / wants to see. They will not fit everybody's
> needs / ideas. So we start modifying, adding and #ifdefing kernel
> filters, which is a scary vision.

Ah, finally. Here's an actual suggestion. _IF_ you want, I'll just
export a ltt_set_filter(*callback) and rewrite the if in
_ltt_log_event() to:
	if ((ltt_filter != NULL) && !(

> Enabling and disabling events is a valid basic filter request, which
> should live in the kernel. Anything else should go into userspace, IMO.

What you are suggesting is that a system administrator that wants to
monitor his sendmail server over a period of three weeks should just
postprocess 1.8TB (1MB/s) of data because Thomas Gleixner didn't like
the idea of kernel event filtering based on anything but events.

Karim
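As an illustration of the filter hook proposed in the message above, here is a small user-space sketch in C. Only the names ltt_set_filter and ltt_filter come from the message; the callback signature, the log_event() stand-in and the example filter are assumptions, since the original condition is cut off in the archive.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed callback type: return non-zero to keep the event. */
typedef int (*ltt_filter_fn)(int event_id, const void *event_data);

static ltt_filter_fn ltt_filter;	/* NULL means "log everything" */
static unsigned long logged_events;	/* stand-in for the trace buffer */

/* Install (or, with NULL, remove) the filter callback. */
static void ltt_set_filter(ltt_filter_fn callback)
{
	ltt_filter = callback;
}

/* Hypothetical logging entry point: returns 1 if the event was logged,
 * 0 if the installed filter rejected it. */
static int log_event(int event_id, const void *event_data)
{
	assert(event_id >= 0);	/* ids assumed non-negative in this sketch */

	if (ltt_filter != NULL && !ltt_filter(event_id, event_data))
		return 0;

	logged_events++;	/* a real tracer would write the event here */
	return 1;
}

/* Example filter that a module could install: keep even ids only. */
static int keep_even_events(int event_id, const void *event_data)
{
	(void)event_data;
	return (event_id % 2) == 0;
}
```

With no filter installed every event is logged. After ltt_set_filter(keep_even_events), log_event(1, NULL) is rejected and log_event(2, NULL) goes through, and the policy itself lives outside the logging path.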
Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
Thomas Gleixner wrote:
> If we add another hardwired implementation then we do not have said
> benefits.

Please stop handwaving. Folks like Andrew, Christoph, Zwane, Roman, and
others actually made specific requests for changes in the code. What
makes you think you're so special that you think you are entitled to
stay on the side and handwave about concepts? If there is a limitation
with the code, please present actual snippets that need to be changed
and suggest alternatives. That's what everyone else does on this list.

If you want to clean-up the existing tracing code in the kernel, then
here are some ltt calls you may be interested in:
	int ltt_create_event(char *event_type,
			     char *event_desc,
			     int format_type,
			     char *format_data);
	int ltt_log_raw_event(int event_id,
			      int event_size,
			      void *event_data);

And here's an actual example:
	...
	delta_id = ltt_create_event("Delta",
				    NULL,
				    CUSTOM_EVENT_FORMAT_TYPE_HEX,
				    NULL);
	...
	ltt_log_raw_event(delta_id, sizeof(a_delta_event), &a_delta_event);
	...
	ltt_destroy_event(delta_id);

You can then use LibLTT to read the trace and extract your custom events
and format your binary data as it suits you.

Save the bandwidth and start cleaning.

Karim
Re: 2.6.11-rc1-mm1
Hello Roman,

Roman Zippel wrote:
> An additional comment about the order of events. What you're doing in
> lockless_reserve is bogus anyway. There is no single correct time to
> write into the event. By artificially synchronizing event order and event
> time you only cheat yourself. You either take it into account during
> postprocessing that events can be interrupted or the time stamp doesn't
> seem to be that important, but there is nothing you can do during the
> recording of the event except of completely disabling interrupts.

Correct and like I said before, we are dropping the lockless scheme.
Ergo, disabling interrupts we will.

Karim
Re: 2.6.11-rc1-mm1
Thomas Gleixner wrote:
> Provide a hook, export it and load your filters as a module, but keep
> the filters out of the mainline kernel code.

Great idea! I will do exactly that.

Thanks,

Karim
Re: 2.6.11-rc1-mm1
Hello Roman,

Roman Zippel wrote:
> Why is so important that it's at the start of the buffer? What's wrong
> with a special event _near_ the start of a buffer?
[snip]
> What gives you the idea, that you can't do this with what I proposed?
> You can still seek freely within the data at buffer boundaries and you
> only have to search a little into the buffer to find the delimiter. Events
> are not completely at random, so that the little reordering can be done at
> runtime. Sorry, but I don't get what kind of unsolvable problems you see
> here.

Actually I just checked the code and this is a non-issue. The callback
can only be called when the condition is met, which itself happens only
on buffer switch, which itself only happens when we try to reserve
something bigger than what is left in the buffer. IOW, there is no need
for reserving anything. Here's what the code does:
	if (!finalizing) {
		bytes_written = rchan->callbacks->buffer_start ...
		cur_write_pos(rchan) += bytes_written;
	}

With that said, I hope we've agreed that we'll have a callback for
letting relayfs clients know that they need to write the beginning of
the buffer event. There won't be any associated reserve. Conversely, I
hope it is not too much to ask to have an end-of-buffer callback.

> Wrong question. What compromises can be made on both sides to create a
> common simple framework? Your unwillingness to compromise a little on the
> ltt requirements really amazes me.

Roman, of all people I've been more than happy to change my stuff
following your recommendations. Do I have to list how far down relayfs
has been stripped down? I mean, we got rid of the lockless scheme (which
was one of ltt's explicit requirements), we got rid of the read/write
capabilities for user-space, etc.
And we are now only left with the bare-bones API:
	rchan* relay_open(channel_path, bufsize, nbufs, flags, *callbacks);
	int    relay_close(*rchan);
	int    relay_reset(*rchan);
	int    relay_write(*rchan, *data_ptr, count, **wrote-pos);

	char*  relay_reserve(*rchan, len, *ts, *td, *err, *interrupting);
	void   relay_commit(*rchan, *from, len, reserve_code, interrupting);
	void   relay_buffers_consumed(*rchan, u32);

	#define relay_write_direct(DEST, SRC, SIZE) \
	#define relay_lock_channel(RCHAN, FLAGS) \
	#define relay_unlock_channel(RCHAN, FLAGS) \

This is a far cry from what we had before, have a look at the
relayfs.txt file in 2.6.11-rc1-mm1's Documentation/filesystems if you
want to compare. Please at least acknowledge as much.

I'm more than willing to compromise, but at least give me something
substantive to feed on. I've explained why I believe there needs to be
two modes for relayfs. If you don't think they are appropriate, then
please explain why. Either my experience blinds me or it rightly compels
me to continue defending it.

You ask what compromises can be found from both sides to obtain a single
implementation. I have looked at this, and given how stripped down it
has become, anything less from relayfs will make it useless for LTT.
IOW, I would have to reimplement a buffering scheme within LTT outside
of relayfs.

Can't you see that not all buffering schemes are adapted to all
applications and that it's preferable to have a single API transparently
providing separate mechanisms instead of a single mechanism that doesn't
satisfy any of its users?

If I can't convince you of the concept, can I at least convince you to
withhold your final judgement until you actually see the code for the
managed vs. ad-hoc schemes?
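The buffer-switch behaviour described earlier in this message (a buffer_start callback that runs only on a switch, with its return value simply advancing the write position) can be sketched in user-space C roughly as follows. This is not the relayfs implementation: struct chan, chan_reserve() and the sizes are made up for illustration, and consumer notification and locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 32u		/* assumed sub-buffer size */
#define N_SUBBUFS   2u		/* assumed number of sub-buffers */

struct chan;

/* Client callback: writes a header at 'start' and returns its length,
 * which must be smaller than SUBBUF_SIZE. */
typedef size_t (*buffer_start_fn)(struct chan *c, char *start);

struct chan {
	char data[N_SUBBUFS][SUBBUF_SIZE];
	unsigned cur;		/* index of the current sub-buffer */
	size_t write_pos;	/* offset within the current sub-buffer */
	unsigned switches;	/* number of sub-buffer switches so far */
	buffer_start_fn buffer_start;
};

static void chan_init(struct chan *c, buffer_start_fn cb)
{
	memset(c, 0, sizeof(*c));
	c->buffer_start = cb;
	if (cb)
		c->write_pos = cb(c, c->data[0]);
	assert(c->write_pos <= SUBBUF_SIZE);
}

/* Reserve len bytes. A switch happens only when the request does not
 * fit in what is left, and the callback runs only on a switch, so the
 * header needs no reservation of its own. Returns NULL if len cannot
 * fit even in a fresh sub-buffer. */
static char *chan_reserve(struct chan *c, size_t len)
{
	char *slot;

	if (len > SUBBUF_SIZE - c->write_pos) {
		c->cur = (c->cur + 1) % N_SUBBUFS;
		c->write_pos = 0;
		c->switches++;
		if (c->buffer_start)
			c->write_pos += c->buffer_start(c, c->data[c->cur]);
		assert(c->write_pos <= SUBBUF_SIZE);
		if (len > SUBBUF_SIZE - c->write_pos)
			return NULL;
	}
	slot = c->data[c->cur] + c->write_pos;
	c->write_pos += len;
	return slot;
}

/* Example client callback: a 4-byte marker as the buffer header. */
static size_t write_header(struct chan *c, char *start)
{
	(void)c;
	memcpy(start, "HDR:", 4);
	return 4;
}
```

An end-of-buffer callback would hook in at the same place, just before the switch.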
Karim
Re: 2.6.11-rc1-mm1
Aaron Cohen wrote:
> I've got a quick question and I just want to be clear that it
> doesn't have a political agenda behind it. :)
> Here goes, why can't LTT and/or relayfs, work similar to the way
> syslog does and just fill a buffer (aka ring-buffer or whatever is
> appropriate), while a userspace daemon of some kind periodically reads
> that buffer and massages it. I'm probably being naive but if the
> difficulty is with huge several hundred-gig files, the daemon if it
> monitors the buffer often enough could stuff it into a database or
> whatever high-performance format you need.

Because of the bandwidth it is not possible to do any sort of live
processing of any kind. The only thing the daemon can possibly do is
write large blocks of tracing info to disk as rapidly as possible.

> It also seems to me that Linus' nascent "splice and tee" work would
> be really useful for something like this to avoid a lot of unnecessary
> copying by the userspace daemon.

There is no copying by the userspace daemon. All it does is open(), then
mmap(), and then it sleeps until it is woken up by the ltt kernel
subsystem. When that happens, it only does a write() on the mmaped area,
tells the ltt subsystem that it committed X number of sub-buffers and
goes back to sleep. This is all zero-copy.

Karim
Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
Thomas,

Thomas Gleixner wrote:
> Yes, I did already start cleaning
>
> cat ../broken-out/ltt* | patch -p1 -R

:D

If it gives you a warm and fuzzy feeling to have the last cheap-shot,
then I'm all for it, it is of no consequence anyway. And _please_ don't
forget to answer this very email with something of the same substance.
For my part I consider that I've invested a substantial amount of time
in responding to both your conceptual and practical feedback, as the
archives clearly show.

That being said, I have to thank you for making sure that all the
obvious questions have been asked. I now have more than a dozen archive
links of my answers to those. They'll sure come in handy when writing an
FAQ.

Thanks again,

Karim
Re: 2.6.11-rc1-mm1
Tom Zanussi wrote:
> I have to disagree. Awhile back, if you remember, I posted a patch to
> the LTT daemon that would monitor the trace stream in real time, and
> process it using an embedded Perl interpreter, no less:
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109405724500237&w=2
>
> It didn't seem to have any problems keeping up with the trace stream
> even though it was monitoring all LTT event types (and a couple of
> others - custom events injected using kprobes) and not doing any
> filtering in the kernel, through kernel compiles, normal X traffic,
> etc. I don't know what volume of event traffic would cause this model
> to break down, but I think it shows that at least some level of
> non-trivial live processing is possible...

Good point. My bad. Thanks for bringing this up. Obviously this didn't
get as much attention as it should've had the last time it was posted,
especially as it allows very easy scripting of filtering in userspace.

That email you refer to is pretty loaded and I'm sure those who are
interested will dig through it. But in the interest of helping everyone
get a rapid understanding of what it does and how it does it, can you
break it down into a short description, possibly with a diagram? I'm
sure many will find this very interesting.

Thanks,

Karim
Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
Werner Almesberger wrote:
> From all I've heard and seen of LTT (and I have to admit that most
> of it comes from reading this thread, not from reading the code),

Might I add that this is part of the problem ... No personal offence
intended, but there's been _A LOT_ of things said about LTT that were
based on third-hand account and no direct contact with the toolset/code.
And part of the problem is that _many_ people on this list, and
elsewhere, have done some form of tracing or another as part of their
development, so they all have their idea of how this is best done. Yet,
while such experience can help provide additional ideas to LTT's
development, it also often requires re-explaining to every new suggestor
why we added features he couldn't imagine would be useful to any of
his/her own tracing needs ... Sometimes I wish my interests lay in some
arcane feature that few had ever played with ;)

IOW, while I don't discount anybody else's experience with tracing,
please give us at least the benefit of the doubt by actually:
a) Looking at the code
b) Looking at the mailing list archives
c) Asking us questions directly related to the code

> I have the impression that it may try to be a bit too specialized,
> and thus might miss opportunities for synergy.

Bear with me on this one ...

> You must be getting tired of people trying to redesign things from
> scratch, but maybe you'll humor me anyway ;-)

Hey, from you Werner I'll take anything. It's always a pleasure talking
with you :)

> Karim Yaghmour wrote:
>
>>If you really want to define layers, then there are actually four
>>layers:
>>1- hooking mechanism
>>2- event definition / registration
>>3- event management infrastructure
>>4- transport mechanism
>
> For 1, kprobes would seem largely sufficient. In cases where you
> don't have a usable attachment point (e.g. in the middle of a
> function and you need access to variables with unknown location),
> you can add lightweight instrumentation that arranges the code
> flow suitably. [1, 2]

Let me say outright, as I said to Andi early on in the sister thread,
that I have no problems with having the trace points being fed by
kprobes. In fact, in 2000, way back before kprobes even existed, LTT was
already interfacing with DProbes for dynamic insertion of trace points.

... There I said it ... now watch me have to repeat this yet again later
on ... :/

However, kprobes is not magic:

a) Like I said to Andi:
> As far as kprobes go, then you still need to have some form or another
> of marking the code for key events, unless you keep maintaining a set
> of kprobes-able points separately, which really makes it unusable for
> the rest of us, as the users of LTT have discovered over time (having
> to create a new patch for every new kernel that comes out.)

b) Like I said to Andrew back in July:
> I've double-checked what I already knew about kprobes and have looked again
> at the site and the patch, and unless there's some feature of kprobes I don't
> know about that allows using something else than the debug interrupt to add
> hooks, ...
> Generating new interrupts is simply unacceptable for LTT's functionality.
> Not to mention that it breaks LTT because tracing something will generate
> events of its own, which will generate tracing events of their own ...
> recursion.

Ok, you can argue about the recursion thing with an "if()", but you'll
have to admit that like in the case I described to Roman:
> ... Say you're getting
> 2MB/s of data (which is not unrealistic on a loaded system.) That means
> that if I'm tracing for 2 days, I've got 345GB of data (~7.2GB/hour).

IOW, something like 200,000 events/s (average of 10 bytes/event). Do I
really need to explain that 200,000 traps/interrupts per second is not
something you want ... ?

But don't despair, like I said to Andi:
> So lately I've been thinking that there may be a middle-ground here
> where everyone could be happy. Define three states for the hooks:
> disabled, static, marker. The third one just adds some info into
> System.map for allowing the automation of the insertion of kprobes
> hooks (though you would still need the debugging info to find the
> values of the variables that you want to log.) Hence, you get to
> choose which type of poison you prefer. For my part, I think the
> noop/early-check should be sufficient to get better performance from
> the existing hook-set.

I have received very little feedback on this suggestion, though I really
think it's worth entertaining, especially with your mention of uml-sim
markers further below.

As for the location of ltt trace points, then they are very rarely at
function boundaries. Here's a classic: prepare_arch_
Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)
Werner Almesberger wrote:
> - if the probe target is an instruction long enough, replace it with
>   a jump or call (that's what I think the kprobes folks are working
>   on. I remember for sure that they were thinking about it.)

I heard about this years ago, but I don't know that anything came of it.
I suspect that this is not as simple as it looks and that the only
reliable way to do it is with a trap.

> Probably because everybody saw that it was good :-)

Great, thanks. That's what we'll aim for then. We've already got the
"disable" and "static" implemented, so now we need to figure out how do
we best implement this tagging. IBM's kernel hooks allowed the NOP
solution, so I'm guessing it shouldn't be that much of a stretch to
extend it for marking up the code for kprobes and friends. I don't know
whether this code is still maintained or not, but I'd like to hear input
as to whether this is a good basis, or whether you're thinking of
something like your uml-sim hooks?

> So you need seeking, even in the presence of fine-grained control
> over what gets traced in the first place ? (As opposed to extracting
> the interesting data from the full trace, given that the latter
> shouldn't contain too much noise.)

The problem is that you don't necessarily know beforehand what's the
problem. So here's an actual example:

I had a client who had this box on which a task was always getting
picked up by the OOM killer. Try as they might, the development team
couldn't figure out which part of the code was causing this. So we put
LTT in there and in less than 5 minutes we found the problem. It turned
out that a user-space access to a memory-mapped FPGA caused an
unexpected FP interrupt to occur, and the application found itself in a
recursive signal handler. In this case there was an application symptom,
but it was a hardware problem.
This is just a simple example, but there are plenty of other examples
where a sysadmin will be experiencing some weird hard to reproduce bugs
on some of his systems and he'll spend a considerable amount of time
trying to guess what's happening. This is especially complicated when
there's no indication as to what's the root of the problem. So at that
point being able to log everything and being able to rapidly browse
through it is critical. Once you've done such a first trace you _may_
_possibly_ be able to refine your search requirements and relog with
that in mind, but that's after the fact.

> Or that they have been consumed. My question is just whether this
> kind of aggregation is something you need.

Absolutely. If you're thinking about short 100kb or MBs traces, then a
simpler scheme would be possible. But when we're talking about GB and
100GBs spanning days, there's got to be a managed way of doing it.

>>I have nothing against kprobes. People keep referring to it as if
>>it magically made all the related problems go away, and it doesn't.
>
> Yes, I know just too well :-) In umlsim, I have pretty much the
> same problems, and the solutions aren't always nice. So far, I've
> been lucky enough that I could almost always find a suitable
> function entry to abuse.

Glad you acknowledge as much.

> However, since a kprobes-based mechanism is - in the worst case,
> i.e. when needing markup - as good as direct calls to LTT, and gives
> you a lot more flexibility if things aren't quite as hostile, I
> think it makes sense to focus on such a solution.

You certainly have a lot more experience than I do with that, so I'd
like to solicit your help. As above: what's the best way to provide this
in addition to the static and disable points?

> Yup, but you could move even more intelligence outside the kernel.
> All you really need in the kernel is a place to put the probe,
> plus some debugging information to tell you where you find the
> data (the latter possibly combined with gently coercing the
> compiler to put it at some accessible place).

Right, but then you end up with a mechanism with generalized hooks.
Actually there was a time when LTT was a driver and you could either
build it as a module or keep it built-in. However, when we published
patches to get LTT accepted in 2.5 we were told on LKML to move LTT into
kernel/ and avoid all this driver stuff. Having it, or parts of it, in
the kernel makes it much simpler and much more likely that the existing
ad-hoc tracing code spread across the sources be removed in exchange for
a single agreed upon way of doing things.

It must be said that like I had done with relayfs, the LTT patch will go
through a major redux and I will post the patches for review like before
on LKML.

Karim
Re: [RFC] tracepipe -- event streams, debugfs, and pipe_buffers
Zach Brown wrote:
> Thoughts? I, for one, am tired of writing throw-away per-cpu tracing
> patches ;)

Have you taken a look at relayfs and ltt?

Karim
Re: [PATCH] relayfs redux for 2.6.10: lean and mean
Greg KH wrote:
> Hm, how about this idea for cutting about 500 more lines from the code:
>
> Why not drop the "fs" part of relayfs and just make the code a set of
> struct file_operations. That way you could have "relayfs-like" files in
> any ram based file system that is being used. Then, a user could use
> these fops and assorted interface to create debugfs or even procfs files
> using this type of interface.
>
> As relayfs really is almost the same (conceptually wise) as debugfs as
> far as concept of what kinds of files will be in there (nothing anyone
> would ever rely on for normal operations, but for debugging only) this
> keeps users and developers from having to spread their debugging and
> instrumenting files across two different file systems.

However this assumes that the users of relayfs are not going to want it
during normal system operation. This is an assumption that fails with at
least LTT as it is targeted at sysadmins, application developers and
power users who need to be able to trace their systems at any time.

I don't mind piggy-backing off another fs, if it makes sense, but unlike
debugfs, relayfs is meant for general use, and all files in there are of
the same type: relay channels for dumping huge amounts of data to
user-space. It seems to me the target audience and basic idea (relay
channels only in the fs) are different, but let me know if there's a
compelling argument for doing this in another way without making it too
confusing for users of those special "files" (IOW, when this starts
being used in distros, it'll be more straightforward for users to
understand if all files in a mounted fs behave a certain way than if
they have certain "odd" files in certain directories, even if it's
/proc.)
Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: [RFC] tracepipe -- event streams, debugfs, and pipe_buffers
Zach Brown wrote:
> Only briefly. They've always seemed more involved than the sort of
> thing I was after. I'll try and sit down and investigate in more detail.

There's definitely an opportunity for interfacing here. If nothing else, this clearly shows the interest in the kind of things both relayfs and ltt attempt to achieve. So here are a few comments regarding the implementation and how this relates to the stuff I'm working on.

> While it's running the kernel subsystem can send binary blobs, less than
> the length of a page, down this channel. The blobs are copied into
> per-cpu lists of pages. Cutesy little headers with get_cycles() and the
> cpu id are prepended to each blob. The traces are only recorded if user
> space has open references to the file.

In the case of LTT, we just open one relay channel per cpu. This avoids having to write the CPUID to the trace, that's 2 bytes less per event, and also avoids any need for synchronization. As for get_cycles(), some architectures don't have anything useful to give. Here's the ARM version (include/asm-arm/timex.h):

static inline cycles_t get_cycles (void)
{
	return 0;
}

In the case of LTT, we just use the, albeit expensive, do_gettimeofday when hardware counters aren't there (currently all non-x86 tracing does this, but this should be fixed.) Also, in the case of the x86 at least, we just write the lower 32-bits of the TSC, so that's 4 bytes less per event. Instead, we use the buffer_start and buffer_end callbacks provided by relayfs to write a header and footer containing full do_gettimeofday value and TSC value.

> As the pages fill they're kicked off to a work_struct worker who puts
> them in the bufs[] array in the debugfs pipe file. Userspace can then
> do whatever it wants with the data via the pipe. One can imagine it
> wanting to splice() these pages to disk in huge batches, or perhaps some
> zero-copy network card, etc.
> I've only tested this so far as verifying
> that 'cat' is able to push data into a regular file.

It seems to me that while this is a nice use of pipes, it isn't as fast as ram-locked pages. Basically relayfs does the bttv driver magic (or what used to be done in there, I haven't checked what they do lately.) We allocate pages, lock them into ram and remap them for use as a single memory area. No caching necessary. It goes from the buffer to whatever media you want (disk, network, etc.) IOW, user-space does an open(), mmap(), write(). Also, the channels exist whether user-space has done an open or not. That's good for flight-recording.

Looking at the code:

- tracepipe_event() does a get_cpu()/put_cpu() for protecting the writing to the buffer. What about tracing within an interrupt? local_irq_save()?

- I hadn't thought of doing something like this to write the header:
+	hdr = tcpu->next_region;
+	hdr->cycles = get_cycles();
+	hdr->cpu = cpu;
  I will replace some of the memcpy() code in LTT with something like this.

- From what I assume is a "wishlist":
+ * - actually communicate missed to userspace
  Already done in LTT.
+ * - how to specify wrapping or dropping
  relayfs provides RELAY_MODE_CONTINUOUS and RELAY_MODE_NO_OVERWRITE.
+ * - non-temporal stores into bufs
  The latest relayfs code doesn't care about timestamps. It's its clients' job to do that (ex. ltt).
+ * - let caller reserve space and get a pointer into buf
  This is the relevant relayfs function:
  char* relay_reserve(struct rchan *rchan, u32 len, int *err, int *interrupting)

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
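The message above mentions storing only the lower 32 bits of the TSC with each event and the full value once per buffer. A minimal user-space sketch of how a reader could rebuild full timestamps from that, not LTT's actual code: struct buf_header, tsc_low32(), tsc_rebuild() and tsc_decode() are hypothetical names, and the sketch assumes fewer than 2^32 cycles elapse between two consecutive events in a buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-buffer header: the full counter value sampled at buffer start. */
struct buf_header {
    uint64_t start_tsc;
};

/* What each event would store: only the low 32 bits of the counter. */
static uint32_t tsc_low32(uint64_t tsc)
{
    return (uint32_t)tsc;
}

/*
 * Rebuild a full timestamp from the previous full timestamp and the low
 * 32 bits recorded with the next event. The unsigned subtraction is done
 * modulo 2^32, so a wrap of the low word between two events is handled.
 */
static uint64_t tsc_rebuild(uint64_t prev_full, uint32_t low)
{
    uint32_t delta = low - (uint32_t)prev_full;
    return prev_full + delta;
}

/* Decode n events in recording order, starting from the buffer header. */
static void tsc_decode(const struct buf_header *hdr, const uint32_t *low,
                       size_t n, uint64_t *out)
{
    assert(hdr != NULL && low != NULL && out != NULL);
    uint64_t prev = hdr->start_tsc;
    for (size_t i = 0; i < n; i++) {
        prev = tsc_rebuild(prev, low[i]);
        out[i] = prev;
    }
}
```

The trade-off is the one described in the message: 4 bytes saved per event, at the price of needing the buffer header to interpret any event in that buffer.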
Re: 2.6.11-rc1-mm1
OK, I finally come around to answering this ...

Roman Zippel wrote:
> Sorry, you misunderstood me. At the moment I'm only secondarily
> interested in the API details, primarily I want to work out the details of
> what exactly relayfs/ltt are supposed to do. One main question here I
> can't answer yet, why you insist on multiple relayfs modes.

Earlier, I should have avoided confusing the use of a certain type of relayfs channel with a given purpose (i.e. LTT should not necessarily depend on the managed mode.) I believe that there is a need for more than one mode in relayfs independently of LTT. There are users who want to be able to manage the data in a buffer (by manage I mean: receive notification of important buffer events, be able to insert important data at boundaries, etc.), and there are users who just want to dump as much information as possible in as fast a way as possible without having to deal with non-essential codepaths.

> This is what I basically have in mind for the relay_write function:
>
>	cpu = get_cpu();
>	buffer = relay_get_buffer(chan, cpu);
>	while(1) {
>		offset = local_add_return(buffer->offset, length);
>		if (likely(offset + length <= buffer->size))
>			break;
>		buffer = relay_switch_buffer(chan, buffer, offset);
>	}
>	memcpy(buffer->data + offset, data, length);
>	put_cpu();

Looking at this code:

1) get_cpu() and put_cpu() won't do. You need to outright disable interrupts because you may be called from an interrupt handler.

2) You assume that relayfs creates one buffer per cpu for each channel. We think this is wrong. Relayfs should not need to care about the number of CPUs, it's the clients' responsibility to create as many channels as they see fit, whether it be one channel per CPU or 10 channels per CPU or 1 channel per interrupt, etc.
3) I'm unclear about the need for local_add_return(), why not just:
	if (likely(buffer->offset + length <= buffer->size))
In any case, here's what we do in relay_write():
	write_pos = relay_reserve(rchan, count, &reserve_code, &interrupting);
If there's any buffer switching required, that will be done in relay_reserve. This has the added advantage that clients that want to write directly to the buffer without using relay_write() can do so by calling relay_reserve() and not care about required buffer switching.

4) After securing the area, you simply go ahead and do a memcpy() and leave. We think that this is insufficient. Here's what we do:
	if (likely(write_pos != NULL)) {
		relay_write_direct(write_pos, data_ptr, count);
		relay_commit(rchan, write_pos, count, reserve_code, interrupting);
		*wrote_pos = write_pos;
	}
The relay_write_direct() is basically a memcpy(). We also do a relay_commit(). This actually effects the delivery of the event. If, for example, there had been a buffer switch at the previous relay_reserve(), then this call to relay_commit() will generate a call to the client's deliver() callback function. In the case of LTT, for example, this is how it knows that it's got to notify the user-space daemon that there are buffers to consume (i.e. write to disk.)

> ltt_log_event should only be a few lines more (for writing header and
> event data).

Actually no, you don't want ltt_log_event using relay_write(), for one thing because it can generate variable size events. Instead, ltt_log_event does (basically):
	data_size = sizeof(event_id) + sizeof(time_delta) + sizeof(data_size);
	relay_lock_channel();
	relay_reserve();
	relay_write_direct(&event_id, sizeof(event_id));
	relay_write_direct(&time_delta, sizeof(time_delta));
	if (var_data) {
		relay_write_direct(var_data, var_data_len);
		data_size += var_data_len;
	}
	relay_write_direct(&data_size, sizeof(data_size));
	relay_commit();
	relay_unlock_channel();

> What I'd like to know now are the reasons why you need more than this.
I hope the above explanation clarifies things. > It's not the amount of data and any timing requirements have to be done by > the caller. During processing you either take the events in the order they > were recorded (often that's good enough) or you sort them which is not > that difficult. Ordering is a non-issue to be honest. Unless you've got some hardware scope in there, it's almost impossible to pinpoint exactly when an event occurred. There is no single line of code where an event occurs, so it's all an educated guess anyway. You want things to resemble what really happened in as much as possible though. > I know you don't want to touch the topic of kernel debugging, but its > requirements greatly overlap with what you want to do with ltt, e.g. one > needs very often information about scheduling events as many kernel > processes rely more and more on kernel threads. The only real
Re: 2.6.11-rc1-mm1
Hello Roman,

Roman Zippel wrote:
> Well, let's concentrate for a moment on the last thing and check later
> if and how they fit into relayfs. Since ltt will be first main user, let's
> optimize it for this.
> Also since relayfs is intended for large, fast data transfers, per cpu
> buffers are pretty much always required, so it would make sense to leave
> this to relayfs (less to get wrong for the client).

But how does relayfs organize the namespace then? What if I have multiple channels per CPU, each for a different type of data, will all channels for the same CPU be under the same directory or will each type of data have its own directory with one entry per CPU? I don't have an answer to that, and I don't know that we should. Why not just leave it to the client to organize his data as he wishes? If we must assume that everyone will have at least one channel per CPU, then why not provide helper functions built on top of very basic functions instead of fixing the namespace in stone?

> I have to modify it a little (only the if (!buffer) part is new):
>
>	cpu = get_cpu();
>	buffer = relay_get_buffer(chan, cpu);
>	while(1) {
>		offset = local_add_return(buffer->offset, length);
>		if (likely(offset + length <= buffer->size))
>			break;
>		buffer = relay_switch_buffer(chan, buffer, offset);
>		if (!buffer) {
>			put_cpu();
>			return;
>		}
>	}
>	memcpy(buffer->data + offset, data, length);
>	put_cpu();
>
> This has a very short fast path and I need very good reasons to change/add
> anything here. OTOH the slow path with relay_switch_buffer() is less
> critical and still leaves a lot of flexibility.

This is not good for any client that doesn't know beforehand the exact size of their data units, as in the case of LTT. If LTT has to use this code that means we are going to lose performance because we will need to fill an intermediate data structure which will only be used for relay_write(). Instead of zero-copy, we would have an extra unnecessary copy.
There has got to be a way for clients to directly reserve and write as they wish. Even Zach Brown recognized this in his tracepipe proposal, here's from his patch:
+ * - let caller reserve space and get a pointer into buf

>>1) get_cpu() and put_cpu() won't do. You need to outright disable
>>interrupts because you may be called from an interrupt handler.
>
> Look closer, it's already interrupt safe, the synchronization for the
> buffer switch is left to relay_switch_buffer().

Sorry, I'm still missing something. What exactly does local_add_return() do? I assume this code has got to be interrupt safe? Something like:

#define local_add_return(OFFSET, LEN) \
do { \
	... \
	local_irq_save(); \
	OFFSET += LEN; \
	local_irq_restore(); \
	... \
} while(0);

I'm assuming local_irq_XXX because we were told by quite a few people in the related thread to avoid atomic ops because they are more expensive on most CPUs than cli/sti.

Also how does relay_get_buffer() operate? What if I'm writing an event from within a system call and I'm about to switch buffers and get an interrupt at the if(likely(...))? Isn't relay_get_buffer() going to return the same pointer as the one obtained for the syscall, and aren't both cases now going to effect relay_switch_buffer(), one of which will be superfluous?

> This adds a conditional and is not really needed. Above shows how to make
> it interrupt safe and if the clients wants to reuse the same buffer, leave
> the locking to the client.

Fine, but how is the client going to be able to reuse the same buffer if relayfs always assumes per-CPU buffers as you said above? This would be solved if at its core relayfs' functions worked on single channels and additional code provided helpers for making the SMP case very simple.

> That's quite a lot of code with at least 14 conditions (or 13 conditions
> too much) and this is just relayfs.

I believe Tom has refactored the code with your comments in mind, and has something ready for review.
I just want to clear up the above before we make this final. Among other things, he just dropped all modes, and there's only a basic relay_write() that closely resembles what you have above.

> That's not always true, where performance matters we provide different
> functions (e.g. spinlocks), so having an alternative version of
> relay_write is a possibility (although I'd like to see the user first).

Sure, see above in the case of LTT.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
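On the question of what the reservation step in Roman's pseudocode has to do: the following is a user-space sketch only, using C11 atomic_fetch_add() as a stand-in for the kernel primitive. It follows the pseudocode's usage, where the call yields the start offset of the reserved region; as far as I know the kernel's real local_add_return() takes its arguments in the other order and returns the value after the add. struct trace_buf, buf_init(), buf_reserve() and buf_write() are hypothetical names, and buffer switching is reduced to returning NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 64u

/* Hypothetical buffer: a byte array plus an atomically advanced write offset. */
struct trace_buf {
    atomic_size_t offset;
    unsigned char data[BUF_SIZE];
};

static void buf_init(struct trace_buf *b)
{
    atomic_init(&b->offset, 0);
    memset(b->data, 0, sizeof b->data);
}

/*
 * Claim len bytes. atomic_fetch_add() returns the offset as it was before
 * the add, so two writers that race here (for instance a syscall path and
 * an interrupt on top of it) always get disjoint regions. Returns NULL when
 * the region does not fit, which is where a real implementation would
 * switch buffers. The offset keeps growing on failed attempts; a real
 * implementation would reset it as part of the switch.
 */
static void *buf_reserve(struct trace_buf *b, size_t len)
{
    assert(len > 0 && len <= BUF_SIZE);
    size_t start = atomic_fetch_add(&b->offset, len);
    if (start + len > BUF_SIZE)
        return NULL;
    return b->data + start;
}

/* Fixed-size write built on top of the reservation, as in the pseudocode. */
static int buf_write(struct trace_buf *b, const void *src, size_t len)
{
    void *dst = buf_reserve(b, len);
    if (dst == NULL)
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

Because the reservation is a single atomic read-modify-write, the fast path needs no interrupt disabling; what remains unanswered by this sketch is the superfluous-switch case raised in the message, since both racing writers can see the "does not fit" result.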
Re: 2.6.11-rc1-mm1
Karim Yaghmour wrote:
> This is not good for any client that doesn't know beforehand the exact
> size of their data units, as in the case of LTT. If LTT has to use this
> code that means we are going to lose performance because we will need to
> fill an intermediate data structure which will only be used for relay_write().
> Instead of zero-copy, we would have an extra unnecessary copy. There has
> got to be a way for clients to directly reserve and write as they wish.
> Even Zach Brown recognized this in his tracepipe proposal, here's from
> his patch:
> + * - let caller reserve space and get a pointer into buf

Actually, come to think of it, this code is not good for any client that needs to fill complex data structures, whether they be fixed-size or not, because it requires having a prepackaged structure already available. Any client that wants to have zero-copying will want to write data directly into the buffer instead of filling an intermediate buffer first. And this requires being able to atomically reserve.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
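To illustrate the point about writing directly into reserved space, here is a small user-space sketch, not relayfs or LTT code. The event layout (id, time delta, optional payload, trailing total size) follows the ltt_log_event outline given earlier in the thread; struct chan, chan_reserve(), put_field() and log_event() are hypothetical names. The sketch is single-threaded and assumes the caller provides exclusion around the reservation (in the kernel, for example, by disabling interrupts).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 256u

/* Hypothetical channel: one flat buffer and a fill level. */
struct chan {
    size_t used;
    unsigned char data[BUF_SIZE];
};

/* Reserve len bytes and return a pointer into the buffer, or NULL if full. */
static unsigned char *chan_reserve(struct chan *c, size_t len)
{
    if (len > BUF_SIZE - c->used)
        return NULL;
    unsigned char *p = c->data + c->used;
    c->used += len;
    return p;
}

/* Copy one field to the cursor and return the advanced cursor. */
static unsigned char *put_field(unsigned char *cur, const void *src, size_t len)
{
    memcpy(cur, src, len);
    return cur + len;
}

/*
 * Log a variable-size event. Each field goes straight into the reserved
 * region; no intermediate event structure is built and then copied.
 */
static int log_event(struct chan *c, uint8_t id, uint32_t delta,
                     const void *var, size_t var_len)
{
    assert(var != NULL || var_len == 0);
    if (var_len > BUF_SIZE)
        return -1;              /* cannot fit; also keeps total within 16 bits */

    size_t total = sizeof id + sizeof delta + var_len + sizeof(uint16_t);
    unsigned char *cur = chan_reserve(c, total);
    if (cur == NULL)
        return -1;

    uint16_t total16 = (uint16_t)total;
    cur = put_field(cur, &id, sizeof id);
    cur = put_field(cur, &delta, sizeof delta);
    if (var_len > 0)
        cur = put_field(cur, var, var_len);
    put_field(cur, &total16, sizeof total16);
    return 0;
}
```

With a write-only interface, the same event would first have to be assembled in a temporary buffer of the right size, which is the extra copy the message objects to.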
Re: [PATCH] relayfs redux for 2.6.10: lean and mean
Greg KH wrote:
> Are they willing to trade off the performance of LTT to get this? I
> thought this was being touted as a "when you need to test" type of
> thing, not a "run it all the time" type of feature.

The problem is that you never know beforehand when you're going to get that weird glitch on your server, or how much time you're going to need to reproduce it. People who manage thousands of servers will want to be able to fire this off at will without having to reboot/recompile their kernel. What has to be done is make the cost of the tracing infrastructure as minimal as possible when it is indeed built into the kernel (of course if it's disabled it should cost the same thing as if it wasn't there to boot: nothing.) This, though, is a separate topic which is being addressed in other threads. Have a look at Werner's recent postings on the "[RFC] instrumentation" thread if you're interested.

> And a driver will never want to have both a relay channel, and a simple
> debug output at the same time? You are now requiring them to look for
> that data in two different points in the fs.
[snip]
> So, since you are proposing that relayfs be mounted all the time, where
> do you want to mount it at? I had to provide a "standard" location for
> debugfs for people to be happy with it, and the same issue comes up
> here.
>
> Also, why not export your relayfs ops so that someone using debugfs can
> create a relay channel in it, or in any other type of fs they might
> create?

Ok, there are a couple of things in there:

- First I don't object to having the relayfs ops being exported so that they could be used in conjunction with other filesystems, in addition to having relayfs live as an independent fs. So as in the case above, we should be able to accommodate the device driver writer who wants to have all his files in the same fs. However, for the first case relayfs was built for, I think there is merit for having it live as a separate fs. Is this a good compromise for you?
- As for where relayfs should be mounted, this is a very good question. We've taken to the habit of having a /relayfs. If this is too problematic, I don't see any problem with /mnt/relayfs either. In either case, I have to admit frankly that I'm not familiar with the exact formal rules for introducing something like this. Of course I'm aware of the FHS and LSB, but let me know what you think is the best way to proceed here.

Thanks,

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: 2.6.11-rc1-mm1
Karim Yaghmour wrote:
> This is not good for any client that doesn't know beforehand the exact
> size of their data units, as in the case of LTT. If LTT has to use this
> code that means we are going to lose performance because we will need to
> fill an intermediate data structure which will only be used for relay_write().
> Instead of zero-copy, we would have an extra unnecessary copy. There has
> got to be a way for clients to directly reserve and write as they wish.
> Even Zach Brown recognized this in his tracepipe proposal, here's from
> his patch:
> + * - let caller reserve space and get a pointer into buf

Also, if the reserve is exported, then a client that so chooses can do something like:

	local_irq_save();
	relay_reserve();
	write();
	write();
	write();
	...
	local_irq_restore();

And therefore enforce in-order events if he so chooses.

Karim
--
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: 2.6.11-rc1-mm1
Roman Zippel wrote: > Ok, great. > BTW I don't really expect the first version to be fully optimized (unless > you want to :) ), but once the basics are right, that can still be added > later. Agreed. Tom will post updated patches sometime this week. I'll follow up with the LTT stuff separately as agreed. Karim -- Author, Speaker, Developer, Consultant Pushing Embedded and Real-Time Linux Systems Beyond the Limits http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
Re: [PATCH 05/05] Linux Kernel Markers, non optimized architectures
Hello Mathieu,

Mathieu Desnoyers wrote:
> Yes, that was indeed the first way I implemented it, as a "disable" option. One of the main things we have to figure out before I modify this is if we want to have the generic version of markers available in a "forced" manner at the marker site with the GEN_MARK macro instead of the MARK macro (this is the actual implementation). It has proven to be useful to instrument lockdep.c irq enable/disable tracing functions. The reason why is because they are called just before the trap handler returns and I need it to do XMC on x86 and x86_64. It would therefore cause a recursive trap.
>
> I think it makes sense to have this kind of support for hard-to-instrument sites within the marker infrastructure, but the cost is to have two marker flavors: MARK and GEN_MARK (but really GEN_MARK is only intended for a few sites).

I must admit that I'm unsure about the use of different marker macros. How about bitwise flags that could be coded as part of the marker at the marker site? Something like "MARKER_TYPE_FORCED". This would still allow some form of toplevel control at the macro definition. Otherwise there's some digging to be done on a per-marker basis ...

Karim
Re: [PATCH 05/05] Linux Kernel Markers, non optimized architectures
Mathieu Desnoyers wrote:
> The problem with your proposal, I guess, is that people will have to add a supplementary parameter to the macro.
>
> It is not uncommon to have two slightly different versions of macros/functions in the kernel (preempt_enable()/preempt_enable_no_resched(), or macros starting with underscores). Normally, the underscore states that the macro does not do the proper locking itself (this is not our case). Therefore, I would suggest using a name that suggests what the macro is protected against. For instance, a marker pointing to the generic version is only needed to protect against the debug trap handler and should only be used on x86 and x86_64.

I can see your point, to a degree. The difference here is that the variants you mention are actually macros that do something, they aren't stubs for code. IOW, you actually know what's happening underneath a foo() vs. _foo() by its name only. Maybe this applies the same to markers, I don't know. But maybe we want to make it easy for those looking at markers to see that there's a master kill switch somewhere that all markers go through and through which they can all be disabled very simply (say by using a "#if 0"). While different names *may* be doing that, a same name *does* that. But I don't feel too strongly either way, it's really up to those who maintain the code to say.

Karim
Re: Re: [PATCH 05/05] Linux Kernel Markers, non optimized architectures
Mathieu Desnoyers wrote:
> The main goal of this config option is for embedded systems that don't support live code modification. Maybe we can put that under the "embedded systems" menu?

Not sure whether you've had other feedback on this elsewhere in the rest of the thread, but yes, this would make sense if the "embedded" angle is the only reason we need this (and not, say, performance, etc.) Also, having done that, maybe it would make some sense to have it be a "disable" rather than "enable": CONFIG_MARKERS_DISABLE_OPTIMIZATION?

Karim
Re: [PATCH v4 1/2] Provide in-kernel headers for making it easy to extend the kernel
Hi Geert,

On 3/11/19 4:03 AM, Geert Uytterhoeven wrote:
[ snip ]
> OK. Now about the actual solution: what is your opinion on embedding e.g. a squashfs image in the kernel instead, which would be a more generic solution, not adding more ABI to /proc?

I'm not familiar enough with the intricacies of squashfs to have an educated opinion, but I hear that it's got its quirks (need for user-space tools, etc.) and possibly security issues. Also, I wonder whether it's a generalized solution that still kicks the ABI can down the road -- ultimately the kernel would still have a path/format/foo for making kheaders available in that squashfs image and that convention would become ABI. The only "benefit" being that said ABI wouldn't appear under /proc, and, tbh, I'm not sure that that's actually a benefit or is even idiomatic since config.gz is already under /proc. To an extent, the precedent set by kconfig favors kheaders also being available in the same location using a similar mechanism ... i.e. bonus points for consistency.

But that's my hand-wavy gut-reaction response to your question. I'm sure others on this thread have far more informed opinions about the specifics than I could have. My priority was to clarify the basis for the need being addressed.

Cheers,

--
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour
Logging/buffering mechanism comparison? (ring buffer, relay, etc.)
Just wondering if anyone had some pointers on a comparison between the various logging/buffering mechanisms out there (ring buffer, relay, lttng buffering, etc.)? Googling was inconclusive. Anything that has benchmarks/pros/cons would be great. Thanks, -- Karim Yaghmour CEO - Opersys inc. / www.opersys.com http://twitter.com/karimyaghmour -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 00/15] tracing: 'hist' triggers
On 15-03-02 02:45 PM, Steven Rostedt wrote: > Interesting. The Android devices I have still have it enabled (rooted, > but still running the stock system). I don't know that there's any policy to disable tracing on Android. The Android framework in fact has generally been instrumented by Google itself to output trace info into trace_marker. And the systrace/atrace tools made available to app developers need to get access to this tracing info. So, if Android had tracing disabled, systrace/atrace wouldn't work. https://developer.android.com/tools/debugging/systrace.html -- Karim Yaghmour CEO - Opersys inc. / www.opersys.com http://twitter.com/karimyaghmour
Re: [PATCH v2 00/15] tracing: 'hist' triggers
On 15-03-02 03:33 PM, Alexei Starovoitov wrote: > that's interesting. thanks for the link. > > I don't see tracing being explicitly enabled in defconfig: > https://source.android.com/devices/tech/kernel.html > or here: > https://android.googlesource.com/kernel/common/+/android-3.10/android/configs/android-recommended.cfg I don't know that either of these is "authoritative". I know of both of these, but I've never looked at them as being the reference for what manufacturers ship. Instead, most manufacturers get their default kernels from SoC vendors. So it's much likelier that an Androidized kernel tree from Qualcomm or Intel is closer to what gets really shipped than the two links above. -- Karim Yaghmour CEO - Opersys inc. / www.opersys.com http://twitter.com/karimyaghmour
Re: [PATCH v2 00/15] tracing: 'hist' triggers
On 15-03-02 03:48 PM, Alexei Starovoitov wrote: > good. thanks for explaining. > all makes sense now. > > btw, that fancy systrace seems to be parsing text from trace_pipe > https://android.googlesource.com/platform/external/chromium-trace/+/jb-dev/src/tracing/linux_perf_importer.js > with a bunch of regex... > including sched_switch: next_prio... Yes, it does. This is why it's not meant for analyzing large traces. -- Karim Yaghmour CEO - Opersys inc. / www.opersys.com http://twitter.com/karimyaghmour