Reading perf counters at ftrace trace boundaries

2013-08-11 Thread Karim Yaghmour

I'm wondering whether there's a way to read perf counters from within the
kernel. I'd like to read/record perf counters at ftrace function tracing
entries/exits to provide a rundown of the values of various counters at
function call boundaries.

[ Steven: apologies for sending you a duplicate here of what I somewhat
already sent privately. ]

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Reading perf counters at ftrace trace boundaries

2013-08-11 Thread Karim Yaghmour

On 13-08-11 10:23 PM, Andi Kleen wrote:
> KVM does it, see arch/x86/kvm/pmu.c. Essentially it would be doing RDPMC.

Thx for the pointer, appreciated.

> But the overhead will be likely very high, some sampling approach
> is likely better.

Indeed. It doesn't actually have to be at every single ftrace
entry/exit. One possibility would be to start by sampling every nth call and
then drill down as the culprit is incrementally singled out.
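
A minimal user-space sketch of the "every nth, then drill down" idea, for
illustration only. Nothing here is an ftrace or perf API: the struct and the
nth_sampler_* names are hypothetical, and halving the period when drilling
down is an arbitrary choice. A real in-kernel version would also need per-CPU
state and would do the counter read where this sketch merely returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sampling state (a real tracer would keep one per CPU). */
struct nth_sampler {
    uint32_t period;  /* read counters on every period-th call; >= 1 */
    uint32_t seen;    /* call boundaries observed since the last sample */
};

void nth_sampler_init(struct nth_sampler *s, uint32_t period)
{
    assert(period >= 1);
    s->period = period;
    s->seen = 0;
}

/* Returns true when this call boundary should pay for a counter read. */
bool nth_sampler_hit(struct nth_sampler *s)
{
    if (++s->seen < s->period)
        return false;
    s->seen = 0;
    return true;
}

/* "Drilling down": sample more often once a suspect has been singled out.
 * Halving the period is an assumption made for this sketch. */
void nth_sampler_drill_down(struct nth_sampler *s)
{
    if (s->period > 1)
        s->period /= 2;
    s->seen = 0;
}
```

With a period of 4, twelve call boundaries produce three counter reads; after
one drill-down step the same twelve boundaries produce six.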

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour



Re: Reading perf counters at ftrace trace boundaries

2013-08-11 Thread Karim Yaghmour

On 13-08-11 10:47 PM, Andi Kleen wrote:
> That's what normal sampling already does.
> 
> If you're worried about systematic shadow effects just randomize a bit.

That's actually the point. I'd like to be able to study/compare both
approaches. I could be completely off, but I'd like to see if a divide
and conquer approach (i.e. based on ftrace) wouldn't take the guesswork
out of smart randomization. Just a hunch.
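
For comparison, the randomized side of that study could be as small as the
sketch below. It is only an illustration of jittering the gap between samples
so that it does not lock onto a periodic pattern in the workload; the function
names are hypothetical, xorshift32 is swapped in as a convenient tiny PRNG,
and the modulo step makes the distribution only roughly uniform.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift32: a tiny PRNG. The state must be non-zero. */
uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Number of call boundaries to skip before the next sample, drawn from
 * [base - jitter, base + jitter]. Requires jitter < base. */
uint32_t next_sample_gap(uint32_t *rng, uint32_t base, uint32_t jitter)
{
    assert(*rng != 0 && jitter < base);
    uint32_t span = 2 * jitter + 1;
    return base - jitter + xorshift32(rng) % span;
}
```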

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour



Re: Reading perf counters at ftrace trace boundaries

2013-08-12 Thread Karim Yaghmour

On 13-08-11 11:24 PM, zhangwei(Jovi) wrote:
> If you want to base this on ftrace, the two approaches below may be of use:
> 
> - register_ftrace_function/unregister_ftrace_function
> 
> - perf_event_create_kernel_counter (function event id is 1)
> 
> the first one is simplest, IMO.

Thx for the pointers.

> You need to write your own kernel module to use these approaches.

As a proof-of-concept, sure. For something more permanent it would make
more sense to adapt the various perf/ftrace tools to make this available
on the command line with other options. But we're far away from that for
the moment.

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour



Re: No 100 HZ timer !

2001-04-10 Thread Karim Yaghmour

Mark Salisbury wrote:
> 
> It would probably be a good compile config option to allow fine or coarse
> process time accounting, that leaves the choice to the person setting up the
> system to make the choice based on their needs.
> 

I suggested this a while ago during a discussion about performance
measurement. This would be fairly easy to implement using the patch
provided with the Linux Trace Toolkit since all entry points and
exit points are known (and it already is available in post-mortem
analysis). Adding the measurement code to the kernel should
be fairly straightforward, and it would be provided as part of the
compile option. All in all, given the measurements I made, I'd place
the overhead at around 1% for the computations. (The overhead is very
likely to be negligible when eventual fixes are taken into account.)

===
     Karim Yaghmour
   [EMAIL PROTECTED]
  Embedded and Real-Time Linux Expert
===



Re: real-time file monitoring at the kernel level

2001-04-12 Thread Karim Yaghmour


You may want to take a look at the Linux Trace Toolkit which may
be used to do what you ask for.

http://www.opersys.com/LTT

Karim

Ben Breuninger wrote:
> 
> Hello,
> 
> I was wondering if anyone has a patch, or is working on something, for what
> I'm looking for, or if anyone is interested in an idea I have (forgive me if
> this is someone else's idea, I'll give credit to them), for file monitoring
> at the kernel level.
> I have put up a brief explanation of what I'm looking for at
> http://flog.uncontrolled.org/, but in a nutshell, it is this:
> 
> a kernel patch (or module) that would allow me to have, say, /proc/flog,
> which shows real-time file monitoring information, which could be tail
> -f'd like so:
> 
> root@server~# tail -f /proc/flog
> modify: root "/var/log/auth.log" 2410150229
> access: root "/etc/passwd" 2410150324
> modify: root "/etc/passwd" 2410150441
> remove: root "/var/log/auth.log" 2410150502
> create: root "/usr/bin/.. /" 2410150534
> create: root "/usr/bin/.. /backdoor" 2410150627
> modify: bob "/home/bob/mailbox" 2410150854
> modify: root "/var/www/htdocs/index.html" 2410150927
> 
> The above would describe a theoretical break-in by a hacker, which I
> believe would be extremely useful in intrusion detection. My idea of this
> is further outlined at http://flog.uncontrolled.org/, including
> theoretical usage, practice, description, etc.
> The reason I ask the linux-kernel community is that my coding ability does not
> allow me to hack at the kernel, and so I would need help with this, or any
> other information that would point me in the right direction for what I'm
> looking for.
> 
> If someone is interested in this, or has any information whatsoever,
> please let me know!
> 
> thanks,
> [EMAIL PROTECTED]
> 
> PS: I'm not looking for LIDS
> 

-- 
===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Embedded and Real-Time Linux Expert
===



Re: Linux Security Module Interface

2001-04-14 Thread Karim Yaghmour

Crispin Cowan wrote:
> 
> Modules that can be loaded, or not, are the obvious solution, but the
> current LKM does not export sufficient hooks to support many security
> mechanisms.

Have you taken a look at the hooks provided by the Linux Trace Toolkit
patch (http://www.opersys.com/LTT)?

Cheers,

Karim

===
     Karim Yaghmour
   [EMAIL PROTECTED]
  Embedded and Real-Time Linux Expert
===



[ANNOUNCE] Adaptive Domain Environment for Operating Systems

2001-02-15 Thread Karim Yaghmour


I've put the following (white) papers up for general discussion:
-Adaptive Domain Environment for Operating Systems (Adeos)
-Building a Real-Time Operating System on top of the Adeos

The first paper discusses the design and implementation of a nano-kernel-
like facility that may be used to take control away from an unmodified
running linux on ix86 for further uses including (but not limited to):
-patch-less kernel debuggers/probers
-running multiple general purpose OSes on the same hardware,
-OS development
-etc.

As the first item suggests, this may be of interest to some on
this list as kernel debuggers have been a rather pointy subject...

The second document discusses a special case usage of Adeos that
enables a real-time-bound kernel to co-exist with Linux on top of
Adeos.
 
The documents can be found here:
http://www.opersys.com/adeos/index.html

I've requested a project entry for Adeos on sourceforge and will
update the project's home page as soon as everything is set up.

In the meantime, anyone interested in participating in the project,
or who has pertinent information regarding the implementation or
its feasibility (or lack thereof) as described in the Adeos document, is
welcome to contact me.

KEEP IN MIND that the documents are only a suggested method of
doing things designed to stimulate discussion. There isn't one
line of functional code out there (yet).

Best regards,

Karim

===
     Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: monitoring I/O

2001-02-19 Thread Karim Yaghmour


I caught this one a little bit late, but you might want to take
a peek at the Linux Trace Toolkit:

http://www.opersys.com/LTT

You'll be able to monitor I/O at will.

Best regards,

Karim

Michael McLeod wrote:
> 
> Hello
> 
> I am hoping someone can give me a little information or point me in the
> right direction.  I would like to write an application that monitors I/O
> on a linux machine, but I need some help in determining where to get the
> information I'm looking for.  What I would like to do is 'hook' into the
> kernel and record information such as volume name, type of request (read
> or write), the amount of data being read or written, how long each
> transaction takes
> 
> Any help would be greatly appreciated, or if there is something like this
> already available that would be even better.  Thanx
> 
> Mike

-- 
===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: [ANNOUNCE] Adaptive Domain Environment for Operating Systems

2001-02-20 Thread Karim Yaghmour


I've set up a sourceforge project for Adeos:
http://www.sourceforge.net/projects/adeos

There's also a development mailing list which can be found here:
http://lists.sourceforge.net/lists/listinfo/adeos-devel

There's also some code here:
ftp://ftp.opersys.com/pub/Adeos/Adeos.tgz

Be aware that this code will certainly crash your machine. It
is an attempt to drive Linux into ring one, but it is not
functional. You've been warned.

Feel free to join in the discussion.

Best regards,

Karim Yaghmour

Karim Yaghmour wrote:
> 
> I've put up the following (white) papers out for general discussion:
> -Adaptive Domain Environment for Operating Systems (Adeos)
> -Building a Real-Time Operating System on top of the Adeos
> 
> The first paper discusses the design and implementation of a nano-kernel-
> like facility that may be used to take control away from an unmodified
> running linux on ix86 for further uses including (but not limited to):
> -patch-less kernel debuggers/probers
> -running multiple general purpose OSes on the same hardware,
> -OS development
> -etc.
> 
> As the first item suggests, this may be of interest to some on
> this list as kernel debuggers have been a rather pointy subject...
> 
> The second document discusses a special case usage of Adeos that
> enables a real-time-bound kernel to co-exist with Linux on top of
> Adeos.
> 
> The documents can be found here:
> http://www.opersys.com/adeos/index.html
> 
> I've requested a project entry for Adeos on sourceforge and will
> update the project's home page as soon as everything is set up.
> 
> In the mean time, anyone interested to participate in the project
> or that has pertinent information regarding the implementation, or
> its feasibility or lack of, as described in the Adeos document is
> welcomed to contact me.
> 
> KEEP IN MIND that the documents are only a suggested method of
> doing things designed to stimulate discussion. There isn't one
> line of functional code out there (yet).
> 
> Best regards,
> 
> Karim
> 

===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: Dynamically altering code segments

2001-02-27 Thread Karim Yaghmour


"Collins, Tom" wrote:
[snip]
> I have one more question:  My trace code is currently
> implemented as a kernel loadable module.  Would I need
> to change that so that it is built as part of the kernel,
> or can I keep it as a loadable module?  If I can keep it
> as a module, I would ensure that the module would be the
> only place that would enable/disable the trace, (don't
> want the kernel jumping to a nonexistant address :O  ..)
[snip]

No need to do that, unless you modify the binary dynamically.
If that's the case, then you'll probably have to make it part
of the kernel. But ... if you modify your code to use the
pre-existing hooks that come with LTT, you may not need to
modify anything beyond what is provided by the LTT
patch. That is, you may want to know that LTT provides a
hooking mechanism similar to, but less flexible than, the one
GKHI provides. The advantage, though, is that there are pre-defined
hooks inserted with the LTT patch which can be used right
away without further instrumentation.

As this type of hooking is increasingly needed, I'm
currently discussing with Richard the possibility of using
the LTT pre-defined hooks with GKHI in order to provide an
extensible hooking mechanism for the kernel that comes equipped
with an already quite useful set of hooks, which, of course,
can be dynamically enabled/disabled.

Using this type of hooking, you only need to worry about
registering/unregistering your callbacks since the kernel
doesn't jump into your code, but into the hooks management code
first.
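
To make the indirection concrete, here is a small user-space sketch of a hook
table. It is not the LTT or GKHI code: the hook_* names, the fixed-size table
and the one-callback-per-event limit are all assumptions made for the example.
The point it illustrates is that an instrumented site only ever calls the
management routine, which consults the table, so a dormant hook costs one
NULL check and an unregistered client leaves no dangling jump behind.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_EVENTS 8  /* hypothetical number of hook points */

typedef int (*hook_fn)(uint8_t event_id, void *data);

/* One slot per event; NULL means the hook is dormant. */
static hook_fn hook_table[MAX_EVENTS];

/* Returns 0 on success, -1 if the event is out of range or already taken. */
int hook_register(hook_fn fn, uint8_t event_id)
{
    if (fn == NULL || event_id >= MAX_EVENTS || hook_table[event_id] != NULL)
        return -1;
    hook_table[event_id] = fn;
    return 0;
}

/* Returns 0 on success, -1 if fn is not the callback registered for event_id. */
int hook_unregister(hook_fn fn, uint8_t event_id)
{
    if (fn == NULL || event_id >= MAX_EVENTS || hook_table[event_id] != fn)
        return -1;
    hook_table[event_id] = NULL;
    return 0;
}

/* What an instrumented site calls. The site never holds a pointer into
 * the client's code; it only goes through this management routine. */
void hook_fire(uint8_t event_id, void *data)
{
    hook_fn fn = (event_id < MAX_EVENTS) ? hook_table[event_id] : NULL;
    if (fn != NULL)
        fn(event_id, data);
}

/* Example client callback: counts how many times it was invoked. */
int count_calls(uint8_t event_id, void *data)
{
    (void)event_id;
    ++*(int *)data;
    return 0;
}
```

A kernel implementation would additionally have to serialize registration
against hooks firing on other CPUs, which this sketch ignores.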

Best regards,

Karim

===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: [ANNOUNCE] oprofile profiler

2001-01-11 Thread Karim Yaghmour


Hello John,

This is really interesting. Great stuff.

As Alan had once suggested, it would be very interesting to have this
information correlated with the content of the traces collected using
the Linux Trace Toolkit (www.opersys.com/LTT). For instance, you could see
how many cache faults the read() or write() operation of your application
generated and other unique info. It would also be possible to enhance
the post-mortem analysis done by LTT to take this data into account.
You could also use LTT's dynamic event creation mechanism to log the
profiling data as part of the trace.

There are definitely opportunities for interfacing/integrating here.

Let me know what you think.

Best regards

Karim

John Levon wrote:
> 
> oprofile is a low-overhead statistical profiler capable of
> instruction-grain profiling of the kernel (including interrupt handlers),
> modules, and user-space libraries and binaries.
> 
> It uses the Intel P6 performance counters as a source of interrupts to
> trigger the accounting handler in a manner similar to that of Digital's
> DCPI. All running processes, and the kernel, are profiled by default. The
> profiles can be extracted at any time with a simple utility. The system
> consists of a kernel module and a simple background daemon.
> 
> Typical overhead is around 3 or 4 percent. Worst case overhead on a
> Pentium II 350 UP system is around 10-15%
> 
> You can read a little more about oprofile, and download a very alpha
> version at :
> 
> http://oprofile.sourceforge.net/
> 
> oprofile is released under the GNU GPL.
> 
> thanks
> john
> 

-- 
===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Announce: DProbes/LTT interoperability and custom event logging

2000-11-25 Thread Karim Yaghmour
                975,040,616,826,995 494 19  12
Syscall exit    975,040,616,826,996 494 6
Syscall entry   975,040,616,827,028 494 14  SYSCALL : close; EIP : 0x0804AE41


You can find more info on this custom event logging capability on
LTT's web site at: http://www.opersys.com/LTT

You can find DProbes at:
http://oss.software.ibm.com/developer/opensource/linux/projects/dprobes/

Best regards

Karim

===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: Microsecond accuracy

2000-12-07 Thread Karim Yaghmour


You might want to try the Linux Trace Toolkit. It'll give you microsecond
accuracy on program execution time measurement.

Check it out:
http://www.opersys.com/LTT

Karim

Kotsovinos Vangelis wrote:
> 
> Is there any way to measure (with microsecond accuracy) the time of a
> program execution (without using Machine Specific Registers) ?
> I've already tried getrusage(), times() and clock() but they all have
> 10 millisecond accuracy, even though they claim to have microsecond
> accuracy.
> The only thing that seems to work is to use one of the tools that measure
> performance through accessing the machine specific registers. They give you
> the ability to measure the clock cycles used, but their accuracy is also
> very low from what I have seen up to now.
> 
> Thank you very much in advance
> 
> --) Vangelis
> 

-- 
===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



[UPDATE] LTT now supports real-time tracing

2000-08-30 Thread Karim Yaghmour


For some time now, the Linux Trace Toolkit has enabled its users to
trace the Linux kernel. This capability included being able to view
and analyze the collected traces.

With the latest release, LTT supports tracing the RTAI (http://www.rtai.org)
real-time linux extension. This means that you can view graphically how
the real-time core and the real-time tasks interact with Linux. This
includes analysis made on the real-time performance of tasks and their
behavior.

I personally believe that this is an important step in the adoption
of Linux as a legitimate real-time/embedded platform since it provides
system designers with an easy to view representation of the dynamic
behavior of their system. This had previously been lacking for any real-
time Linux extension.

Apart from the great PR this does for real-time in Linux, I think that
RT designers all around will appreciate having this around. If nothing
else, the source is out there.

That said, I've also generalized the way LTT deals with traces. Rather
than having a single way to interpret traces, it now recognizes that
there are different trace types, each with different ways of being
viewed and analyzed. This opens the door for OSes other than Linux to
be traced and analyzed. There is interest in the Hurd camp and the
question about BSD has been asked. If someone out there is interested,
drop me an e-mail.

I'd like to thank Lineo, and more specifically Lineo ISG, for having
sponsored this work. Their help in developing this project even further
is very much appreciated.

Also, the paper I presented at the last Usenix on LTT, covering how it
works and how it impacts the traced system, is now available online.

It's all on the project's web site: http://www.opersys.com/LTT

Cheers

Karim

===
     Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: I/O statistics per process?

2000-09-12 Thread Karim Yaghmour


Try the Linux Trace Toolkit. This should provide you with most I/O 
information you need.

www.opersys.com/LTT

Hope it helps.

Samuli Kaski wrote:
> 
> I know about sar which can deliver what I want for disks and/or
> partitions. What about if I want to know how much I/O is caused by
> userspace programs?
> 
> Looking at the proc-interface in 2.2.xx the necessary bits aren't
> available. The BSD process accounting doesn't provide them either, the
> I/O fields are always 0 the way I read it. Looking at the task_struct, I
> can't see anything related there.
> 
> Is I/O caused by userspace processes accounted somewhere? And if it
> isn't is this intentional or are folks just waiting for someone to
> submit a patch? Thanks.
> 
> Samuli
> 

-- 
===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: Tracing files that opens.

2000-11-11 Thread Karim Yaghmour


It seems that no one on that thread thought about using the Linux Trace
Toolkit, which would allow you to do exactly what is asked for. Plus,
there's a basic hooking mechanism that enables you to hook onto any
file-system events and then do what you want with that.

In the case of trapping open() or stat() you'd only need to:
1) Patch the kernel with the LTT patch
2) Write a kernel module that uses the hooking interface to hook
onto system call entries and filter those out as needed. Moreover,
you could also hook onto file-system events which would give you
greater detail about the file-system related system calls occurring.

Eventually, I'd like to see item #1 disappear and the tracing patches
admitted into the kernel tree. Other OSes have had such a capability
for a very long time. This, by itself, doesn't justify including it,
but it certainly does go to show its usefulness. Moreover, Alan has suggested
that this might be a good way to implement C2 security in the kernel
since all system entries are monitored.

That said, here's an example module that could be a basis for trapping
open() and stat(). It could also be used to monitor other events:

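#define MODULE

/* NOTE: the header names were lost in the archived copy; the two below
   are assumed (module support, and the trace header from the LTT patch). */
#include <linux/module.h>
#include <linux/trace.h>

/* Called by the LTT hook on every system call entry. */
int my_callback(uint8_t pmEventID,
                void*   pmStruct)
{
  trace_syscall_entry* syscall_event = (trace_syscall_entry*) pmStruct;

  printk("System call %d occurred at address 0x%08X \n",
         syscall_event->syscall_id,
         syscall_event->address);

  return 0;
}

int init_module(void)
{
  printk("callback initialized \n");

  /* Hook onto system call entries. */
  trace_register_callback(&my_callback,
                          TRACE_EV_SYSCALL_ENTRY);

  return 0;
}

void cleanup_module(void)
{
  /* Remove the hook before the module's code goes away. */
  trace_unregister_callback(&my_callback,
                            TRACE_EV_SYSCALL_ENTRY);
}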

The only "problem" here is that you can't specify "open" or "stat" as
strings; you have to use their respective system call IDs, as listed in
arch/i386/kernel/entry.S for the i386. Note that the patches available now
include support for the PowerPC.
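
As a rough illustration of filtering by ID, the callback could pass the
syscall_id through a small predicate such as the one sketched below. The
numeric values are the i386 numbers as I recall them (5 for open, 18 for the
old stat, 106 for the newer stat) and should be treated as assumptions to be
checked against the system call table of the kernel you are patching; the
SYSCALL_ID_* names and is_open_or_stat() are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed i386 system call numbers; verify against your kernel tree. */
#define SYSCALL_ID_OPEN     5
#define SYSCALL_ID_OLDSTAT  18
#define SYSCALL_ID_STAT     106

/* Returns true if the given system call ID is one we want to report. */
bool is_open_or_stat(int syscall_id)
{
    static const int wanted[] = {
        SYSCALL_ID_OPEN,
        SYSCALL_ID_OLDSTAT,
        SYSCALL_ID_STAT,
    };

    for (size_t i = 0; i < sizeof(wanted) / sizeof(wanted[0]); i++) {
        if (wanted[i] == syscall_id)
            return true;
    }
    return false;
}
```

In the module above, my_callback() would then only print when
is_open_or_stat(syscall_event->syscall_id) returns true.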

If anyone is interested in adding support for other architectures, feel
free to dig in.

You can find LTT and all relevant patches at: http://www.opersys.com/LTT

Best regards

Karim

Michael Vines wrote:
> 
> On Sat, 11 Nov 2000, Magnus Naeslund(b) wrote:
> 
> > Is there a nice way to trap on file open() and stat() ?
> > That way i could have nice file statistics.
> 
> There was a thread about this a couple days ago.
> 
> 
> http://x52.deja.com/threadmsg_ct.xp?AN=690272012.1&mhitnum=0&CONTEXT=973965178.1986985995
> 
> Michael
> 

-- 
===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: Issue compiling 2.4test10

2000-11-13 Thread Karim Yaghmour

Michael Schmitz wrote:
> 
> Would this patch help?
> 
> --- drivers/input/keybdev.c.org Thu Nov  2 10:13:39 2000
> +++ drivers/input/keybdev.c Thu Nov  2 10:19:43 2000
> @@ -36,7 +36,7 @@
>  #include 
>  #include 
> 
> -#if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(__alpha__) || defined(__mips__)
> +#if defined(CONFIG_X86) || defined(CONFIG_IA64) || defined(__alpha__) || defined(__mips__) || defined(CONFIG_MAC_HID)
> 

I've tried this on my PowerBook and it doesn't work. The keymap is broken and
pressing anything on the keyboard will output something completely different.
This is fixed if the "defined(CONFIG_MAC_HID)" gets moved to the "#elif" part of
the "#if" mentioned above.

That said, 2 and 3 button emulation is broken for (at least) the PowerBook on test-10.
I've tried the
echo "1" > /proc/sys/dev/mac_hid/mouse_button_emulation
and there's no effect. Anyone know what this is about?

Thanks.

===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Mac-buttons emulation broken in 2.4.0-test10

2000-11-13 Thread Karim Yaghmour


The mac_hid_mouse_emulate_buttons() in drivers/macintosh/mac_hid.c
which takes care of emulating multiple buttons on a mac doesn't
seem to be used anywhere. In fact, doing a "grep -r mac_hid... *"
in the kernel's base directory yields only one result and it's
the one in mac_hid.c. Shouldn't this be called from the
keyboard and mouse handlers?

=======
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: Mac-buttons emulation broken in 2.4.0-test10

2000-11-13 Thread Karim Yaghmour


Well, it seems I found a solution to my own problem :)

Here are patches that fix the problem.

Doing this, I discovered there are 2 modes of button emulation (3 if you
include no emulation):
Mode 0:
    No emulation whatsoever.
Mode 1:
    echo "1" > /proc/sys/dev/mac_.../mouse_...
    In this mode, pressing fct-ctrl or fct-alt acts as if you had
    pressed the corresponding mouse button.
Mode 2:
    echo "2" > /proc/sys/dev/mac_.../mouse_...
    In this mode, you have to hold down fct-ctrl or fct-alt __and__ click
    the mouse to get the corresponding mouse button.
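
The difference between the modes can be summed up in a few lines of
user-space C. This is only a model of the behavior described above, not the
code in mac_hid.c: the enum names and resolve_button() are hypothetical, and
the sketch lumps fct-ctrl and fct-alt together as "the modifier" without
saying which emulated button each one maps to.

```c
#include <assert.h>
#include <stdbool.h>

/* The three settings, named here for readability only. */
enum emu_mode { EMU_OFF = 0, EMU_KEY_ONLY = 1, EMU_KEY_AND_CLICK = 2 };

/* Possible outcomes for one input state. */
enum emu_result { RESULT_NONE, RESULT_PRIMARY, RESULT_EMULATED };

/* Which button event, if any, results from the current input state. */
enum emu_result resolve_button(enum emu_mode mode,
                               bool modifier_down,
                               bool mouse_clicked)
{
    switch (mode) {
    case EMU_KEY_ONLY:
        /* Mode 1: the key combination alone stands in for the button. */
        if (modifier_down)
            return RESULT_EMULATED;
        break;
    case EMU_KEY_AND_CLICK:
        /* Mode 2: the modifier must be held while the mouse is clicked. */
        if (modifier_down && mouse_clicked)
            return RESULT_EMULATED;
        break;
    case EMU_OFF:
    default:
        break;
    }
    /* No emulation applies: a click is just the ordinary button. */
    return mouse_clicked ? RESULT_PRIMARY : RESULT_NONE;
}
```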

Cheers

Karim

---
--- linux/drivers/input/keybdev.c   Thu Jul 27 21:36:54 2000
+++ linux-2.4.0-test10/drivers/input/keybdev.c  Mon Nov 13 08:19:48 2000
@@ -90,7 +90,7 @@
return 0;
 }
 
-#elif defined(CONFIG_ADB_KEYBOARD)
+#elif defined(CONFIG_ADB_KEYBOARD) || defined(CONFIG_MAC_HID)
 
 static unsigned char mac_keycodes[128] =
{ 0, 53, 18, 19, 20, 21, 23, 22, 26, 28, 25, 29, 27, 24, 51, 48,
@@ -129,9 +129,19 @@
}
 }
 
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+extern int mac_hid_mouse_emulate_buttons(int caller, unsigned int keycode, int down);
+#endif
+
 void keybdev_event(struct input_handle *handle, unsigned int type, unsigned int code, int down)
 {
if (type != EV_KEY) return;
+
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+   /* There should be an if() here to determine whether emulate_raw() is to be called or not.
+      If the key is caught, emulate_raw() should not be called. K.Y. */
+   mac_hid_mouse_emulate_buttons(1, code, down);
+#endif
 
if (emulate_raw(code, down))
printk(KERN_WARNING "keyboard.c: can't emulate rawmode for keycode %d\n", code);
--- linux/drivers/input/mousedev.c  Tue Aug 22 12:06:31 2000
+++ linux-2.4.0-test10/drivers/input/mousedev.c Mon Nov 13 08:25:41 2000
@@ -79,6 +79,10 @@
 static struct mousedev *mousedev_table[MOUSEDEV_MINORS];
 static struct mousedev mousedev_mix;
 
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+extern int mac_hid_mouse_emulate_buttons(int caller, unsigned int keycode, int down);
+#endif
+
 static void mousedev_event(struct input_handle *handle, unsigned int type, unsigned int code, int value)
 {
struct mousedev *mousedevs[3] = { handle->private, &mousedev_mix, NULL };
@@ -132,6 +136,9 @@
case BTN_MIDDLE: index = 2; break; 
 
default: return;
}
+#ifdef CONFIG_MAC_EMUMOUSEBTN
+   index = mac_hid_mouse_emulate_buttons(2, index, 0);
+#endif
switch (value) {
case 0: clear_bit(index, &list->buttons); break;
case 1: set_bit(index, &list->buttons); break;

-------

Karim Yaghmour wrote:
> 
> The mac_hid_mouse_emulate_buttons() in drivers/macintosh/mac_hid.c
> which takes care of emulating multiple buttons on a mac doesn't
> seem to be used anywhere. In fact, by doing a "grep -r mac_hid... *"
> in the kernel's base directory yields only one result and it's
> the one in mac_hid.c. Shouldn't this be called upon from the
> keyboard and mouse handlers?
> 

===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: The case for a standard kernel debugger

2000-10-05 Thread Karim Yaghmour

[EMAIL PROTECTED] wrote:
> One big argument against RAS of any sort is that it bloats the kernel and
> not every one wants it (until they have a problem). A further argument with
> Linux is that you may have to do quite a bit of hard work to get the subset
> of RAS you need to co-exist, if it exists at all. Something we're working
> on which may help resolve this, and will be made available with the next
> drop of Dynamic Probes is Generalised Kernel Hooks Interface (GKHI). The
> idea here is to make all our RAS function the option  of being dynamically
> loadable kernel modules. In most cases we don't need to modify kernel
> function, just get control at the right time. So we place hooks in kernel
> source, which remain dormant until activated by the GKHI when a RAS module
> asks it to. Maybe this will provide a way out of the difficulty.

Sorry for catching this a bit late, but I would like to point out that
there already is a generalized kernel hooks interface, that does exactly
what is described above, as part of the Linux Trace Toolkit. The hooks
inserted in the kernel source don't modify the kernel's behavior, though
they can trigger callback functions. To hook onto an event, the following
function is used:
int trace_register_callback(tracer_call pmTraceFunction,
uint8_t pmEventID)

Once this is called, the occurrence of the given event will generate a call
to the given callback function. Hence the inserted hooks are dormant until
used.
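
As a rough illustration of the "dormant until used" behavior described above, here is a minimal user-space mock. Only the prototype of trace_register_callback() comes from this message; the tracer_call typedef, the table sizes, trace_event() and count_event() are assumptions made up for the sketch and are not LTT's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type; LTT's real tracer_call typedef may differ. */
typedef int (*tracer_call)(uint8_t event_id, void *event_data);

#define MAX_EVENT_ID  64  /* assumed number of event IDs, illustration only */
#define MAX_CALLBACKS 4   /* assumed per-event limit, illustration only */

/* One row of callback slots per event; all NULL (dormant) at start. */
static tracer_call hook_table[MAX_EVENT_ID][MAX_CALLBACKS];

/* Same shape as the prototype quoted above; the body is a mock. */
int trace_register_callback(tracer_call pmTraceFunction, uint8_t pmEventID)
{
    assert(pmTraceFunction != NULL);
    if (pmEventID >= MAX_EVENT_ID)
        return -1;
    for (int i = 0; i < MAX_CALLBACKS; i++) {
        if (hook_table[pmEventID][i] == NULL) {
            hook_table[pmEventID][i] = pmTraceFunction;
            return 0;
        }
    }
    return -1; /* no free slot for this event */
}

/* Hypothetical hook body: does nothing unless a callback was registered.
 * Returns the number of callbacks invoked. */
int trace_event(uint8_t event_id, void *event_data)
{
    int called = 0;
    if (event_id >= MAX_EVENT_ID)
        return 0;
    for (int i = 0; i < MAX_CALLBACKS; i++) {
        if (hook_table[event_id][i] != NULL) {
            hook_table[event_id][i](event_id, event_data);
            called++;
        }
    }
    return called;
}

/* Example callback: counts occurrences in the int passed as event data. */
int count_event(uint8_t event_id, void *event_data)
{
    (void)event_id;
    (*(int *)event_data)++;
    return 0;
}
```

Calling trace_event() before any registration invokes nothing; after trace_register_callback(count_event, 3), each trace_event(3, &n) bumps n once.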

On top of this callback interface, I am currently in the process of completing
a state machine engine that would enable its user to specify event-driven
state machines. What does this mean? Well, as Alan had suggested, this
could be used to test a driver's actual behavior against the state machine
that models its theoretical behavior. Furthermore, and I think this is
a field open with a lot of very interesting opportunities, state machines
could be developed that model intrusions and attacks. Hence, the state
machine engine could be used as the basis of a very powerful intrusion
detection system. The basic example of this is stack overflows. A lot of
very clever schemes have been developed in order to detect these types
of hacks. Yet, with a state machine that models the types of attacks being
conducted, it wouldn't matter which stack overflowed or who did what since
the state machine would catch any unauthorized event sequence and, possibly,
kill the culprit process, suspend it or warn the sysadmin.
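
To make the idea concrete, here is a small sketch of an event-driven state machine that flags the first event falling outside a modeled sequence. The events (open, read, close), the states and both function names are invented for illustration; the engine mentioned above is not shown in this message and may look quite different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical events and states modeling "open, read*, close". */
enum event { EV_OPEN, EV_READ, EV_CLOSE, EV_COUNT };
enum state { ST_CLOSED, ST_OPENED, ST_VIOLATION, ST_COUNT };

/* transition[state][event] is the next state; anything the model
 * does not allow leads to ST_VIOLATION, which is absorbing. */
static const enum state transition[ST_COUNT][EV_COUNT] = {
    /*                 EV_OPEN       EV_READ       EV_CLOSE     */
    [ST_CLOSED]    = { ST_OPENED,    ST_VIOLATION, ST_VIOLATION },
    [ST_OPENED]    = { ST_VIOLATION, ST_OPENED,    ST_CLOSED    },
    [ST_VIOLATION] = { ST_VIOLATION, ST_VIOLATION, ST_VIOLATION },
};

/* Advance the machine by one event. */
enum state sm_step(enum state s, enum event e)
{
    assert(s < ST_COUNT && e < EV_COUNT);
    return transition[s][e];
}

/* Feed a whole event sequence; return the index of the first event
 * that breaks the model, or -1 if the sequence is authorized. */
long sm_first_violation(const enum event *seq, size_t n)
{
    enum state s = ST_CLOSED;
    for (size_t i = 0; i < n; i++) {
        s = sm_step(s, seq[i]);
        if (s == ST_VIOLATION)
            return (long)i;
    }
    return -1;
}
```

In a tracer, sm_step() would be driven from an event callback, and reaching ST_VIOLATION is where a policy (kill, suspend, warn) would be applied.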

That said, I do think that dynamically inserted probes are useful. As
Richard has pointed out, there are situations where this makes a big
difference. In a sense, Dprobes could use the architecture already put forward
by LTT to log custom events in a system trace and could use the trace hooking
mechanism already available to implement whatever RAS function comes on top.

For a full discussion on the performance and architecture issues regarding LTT,
I invite the interested reader to take a look at the paper I presented last
June at the annual Usenix technical conference:
http://www.opersys.com/LTT/ltt-usenix.ps.gz

And LTT can be found at:
http://www.opersys.com/LTT/

Cheers

===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: The case for a standard kernel debugger

2000-10-06 Thread Karim Yaghmour


Hello Richard,

Part of your analysis is correct. The hooks were designed to take care of
static tracepoints only. That said, dynamic allocation of event IDs was
next on my list and the hooking mechanism would have been modified consequently.

As for "multiple exits registered per hook", if you mean that you can have more
than one function called back for each event, then this is already possible.
The other items you mention such as atomicity and prioritization seem interesting
indeed, although I am not sure what you mean by MP compliant as the only
thing that stops the current generalized hooking mechanism from being MP
compliant is the insertion of correct locks during callback registration.

Please understand that the purpose wasn't to discredit your work, but rather
to stop duplication of work as efforts could be deployed elsewhere. I think
that your work and the work already done on LTT can be brought together in
a way that would profit all. This is what I was hinting at towards the end
of the posting. It was an invitation more than anything else.

Apart from the hooking mechanism, there were other items which I mentioned
that merit discussion, such as the ability to enable dynamic probes to log
events in normal LTT traces or the event-driven state machine engine. Hence,
if you are interested in joining forces to further enhance probing and tracing
capabilities in Linux, I think this would be a good opportunity.

Best regards

Karim

[EMAIL PROTECTED] wrote:
> 
> Yes, we looked at that and it didn't seem to provide the generality we
> needed - multiple exits registered per hook, ability to arm a set of hooks
> atomically, ability to prioritise dispatching order of a hook exit, MP
> compliant. I may be wrong but the Linux Trace Toolkit hooks look like they
> were specifically designed to cater for inserting static tracepoints into
> the kernel.
> 
> Richard Moore -  RAS Project Lead - Linux Technology Centre (PISC).
> 
> http://oss.software.ibm.com/developerworks/opensource/linux
> Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
> IBM UK Ltd,  MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

-- 
=======
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: The case for a standard kernel debugger

2000-10-06 Thread Karim Yaghmour


Thought I'd let you know that I will reply to your suggestions (which
are quite interesting by the way) ... but I need to catch up some sleep
as it's close to 7AM here in Montreal and my brains are failing ... ;)

===
     Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: The case for a standard kernel debugger

2000-10-09 Thread Karim Yaghmour
es, there must be a pre-defined set of key events.
An example of "other purposes" is using the pre-defined trace points to
implement C2 security in the kernel (this was suggested by Alan).

Hence, yes I can provide an interface from the kernel to log a trace event
with a variable length buffer, but I don't think that taking away the statically
defined trace points is the right thing to do. (I might have gotten this
completely wrong, though ... My presumption about your suggestion of using
Dprobes to "drive" LTT, is that you mean that all events should come from
Dprobes and Dprobes alone. I could be wrong).

So here's what I suggest:
There are already two event types within the events recognized by LTT which
had been planned for this type of usage. They are: "New event" and "Custom
event". The first is used to declare a new event type and the second is used
to log all such events. To declare a new event, the caller would call upon
an event ID creation function providing it with an event size. The function
would use the "New event" type to declare a new event in the log and would
return a unique event ID. Thereafter, the normal tracing function, already
available through the LTT kernel patch, could be used to log the new events.
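
A minimal user-space sketch of that flow, under stated assumptions: trace_create_event(), trace_custom_event(), the ID range and the log layout (one ID byte followed by the payload) are all hypothetical names and choices for illustration, not the LTT kernel patch's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIRST_CUSTOM_ID   32    /* assumed: IDs below this are predefined */
#define MAX_CUSTOM_EVENTS 16    /* assumed limit, illustration only */
#define TRACE_LOG_BYTES   1024  /* assumed log size, illustration only */

static size_t  custom_size[MAX_CUSTOM_EVENTS]; /* payload size per custom ID */
static int     custom_count;
static uint8_t trace_log[TRACE_LOG_BYTES];
static size_t  trace_log_used;

/* Declare a new event type of a fixed size; returns a unique event ID,
 * or -1 on failure. A real tracer would also write a "New event" record. */
int trace_create_event(size_t event_size)
{
    if (event_size == 0 || custom_count == MAX_CUSTOM_EVENTS)
        return -1;
    custom_size[custom_count] = event_size;
    return FIRST_CUSTOM_ID + custom_count++;
}

/* Log one occurrence of a previously declared event. The payload length
 * is looked up from the declaration, so the caller passes only a pointer.
 * Returns the number of payload bytes logged, or -1 on error. */
long trace_custom_event(int event_id, const void *data)
{
    int slot = event_id - FIRST_CUSTOM_ID;
    size_t size;

    assert(data != NULL);
    if (slot < 0 || slot >= custom_count)
        return -1;
    size = custom_size[slot];
    if (trace_log_used + 1 + size > TRACE_LOG_BYTES)
        return -1;
    trace_log[trace_log_used++] = (uint8_t)event_id;
    memcpy(trace_log + trace_log_used, data, size);
    trace_log_used += size;
    return (long)size;
}
```

A dynamically inserted probe would call trace_create_event() once when it is armed and trace_custom_event() each time it fires.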

This could be used by Dprobes to enable dynamically inserted probe points to
be logged within a normal trace and, thereafter, be part of trace analysis.

Does this fit your needs?

> 
> Richard Moore -  RAS Project Lead - Linux Technology Centre (PISC).
> 
> http://oss.software.ibm.com/developerworks/opensource/linux
> Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
> IBM UK Ltd,  MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK
> 

-- 
===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Operating System Consultant
 (Linux kernel, real-time and distributed systems)
===



Re: DProbes with LTT

2000-10-10 Thread Karim Yaghmour


Richard,

Definitely a good idea. Enabling the programmer to specify the format of
the custom data to be printed would be great. This is precisely why LTT
has two events to enable custom tracing, the "New event" and the
"Custom event". Extending the definition of "New event" therefore
leaves a lot of possibilities open.

Here's what I had in mind for LTT (feel free to comment on this as it
is only a design for now):
In the creation of a new event, the caller of the "create event ID"
function would provide the following information:
1) An event-type string that will mainly be used to identify this
amongst the other events. ex: an IRQ entry has a string describing
it which is "IRQ entry", it also has a string describing the event in
detail, this is the purpose of #2 below.
2) A printf-style string used to print out the formatted event string.
ex: "XYZ Driver received unknown event %d on I/O port %03X with error %C"
3) A 0-terminated table containing a structure-type which has 2 entries: 
-A data-length type (fixed or variable)
-A data-length (if fixed)
Each entry would describe each of the data types that will be used with
the printf-like string
ex using the above string: the "%d" would be the first entry with a fixed
data-length of 4 bytes, the "%03X" would be the second entry with a fixed
data-length of 4 bytes, the "%C" would be the third entry with a fixed
data-length of 2 bytes. In the case of a "%s", the data-length type would
be "variable". The last entry in the table would be filled with zeros
as to show the table's end.

As previously mentioned, the "create event ID" function would return a
unique event ID for the newly created event.

With this scheme, recording a custom event would amount to providing
the existing trace function with the custom event ID and a pointer to
a buffer containing the packed data to be used with the pre-provided
string. Using the example above, the caller would pass a buffer
containing the following data packed in a single buffer:
4 bytes data for "%d", 4 bytes data for "%03X", 2 bytes for "%C", for
a total of a 10 byte-buffer. The tracing function will automatically
determine the length of the buffer since it was determined upon event ID
creation. In the case that the buffer contained a string, the first word
before the string would contain the string size so that the function would
determine the exact length of the whole buffer. That said, it must be
stressed that using strings in trace statements is expensive given the
processing cost of finding out buffer lengths and so on. Therefore,
strings should be regarded as a last resort.
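
The table and buffer layout described above could be sketched as follows. This is only an illustration of the design: the type names, the 16-bit width of the size word that precedes a string, and packed_length() itself are assumptions, not existing LTT code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Data-length type of one field; FIELD_END (all zeros) closes the table. */
enum field_kind { FIELD_END = 0, FIELD_FIXED, FIELD_VARIABLE };

struct field_desc {
    enum field_kind kind;
    size_t length;          /* meaningful only for FIELD_FIXED */
};

/* Table for "... event %d on I/O port %03X with error %C":
 * 4 bytes, 4 bytes, 2 bytes, then the zero-filled terminator. */
const struct field_desc xyz_fields[] = {
    { FIELD_FIXED, 4 },
    { FIELD_FIXED, 4 },
    { FIELD_FIXED, 2 },
    { FIELD_END,   0 },
};

/* Walk the table to find the total length of a packed event buffer.
 * A variable field is assumed to be stored as a 16-bit size word
 * followed by that many bytes; this is the costly case. */
size_t packed_length(const struct field_desc *table, const uint8_t *buf)
{
    size_t off = 0;

    assert(table != NULL && buf != NULL);
    for (; table->kind != FIELD_END; table++) {
        if (table->kind == FIELD_FIXED) {
            off += table->length;
        } else {
            uint16_t n;
            memcpy(&n, buf + off, sizeof n);  /* read the size word */
            off += sizeof n + n;
        }
    }
    return off;
}
```

For a table with only fixed fields the length never depends on the buffer contents, so it could be computed once at event creation, which is why strings are the expensive case.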

Once the trace is complete, the trace visualization tool would retrieve
the custom events list and read the trace according to those descriptions.
It would then output the description strings and the details string to
signal the event's occurrence in the trace. To print out the details string,
printf or one of its variants would be provided with the printf-like
string, provided upon event-type creation, and the data belonging to the
event traced. With the example above, this would be something like:
printf("XYZ Driver received unknown event %d on I/O port %03X with error %C",
       "the 4 bytes given for %d",
       "the 4 bytes given for %03X",
       etc.);

This is figurative as the real parameters would most likely be pointers and
since the printf call would have a variable amount of parameters (as always).

The advantage of using this rather than major-minor code is that the data
formatting capabilities provided are exactly the ones most programmers are
already familiar with. Though I might have missed some limitations of this
scheme that the major-minor code scheme overcomes.

What do you think?

Karim

[EMAIL PROTECTED] wrote:
> 
> Karim,
> 
> I've been back through an initial evaluation we did for LTT, back in May.
> One of the features we highlighted we'd like to see was an ability to
> specify custom formatting templates.  Our original OS/2 trace facility
> allowed the user to generate formatting templates which would specify
> printf-like controls. The templates were defined per major-minor code
> specification, which was used to identify uniquely a formatting type and was
> recorded with the trace record in the header.
> 
> We'd like to see that functionality in LTT. We would port the code from OS/2
> if LTT had a suitable formatting exit for custom events. Any thoughts on
> this?
> 
> Richard
> 
> Richard Moore -  RAS Project Lead - Linux Technology Centre (PISC).
> 
> http://oss.software.ibm.com/developerworks/opensource/linux
> Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183
> IB

[ANNOUNCE] Linux Trace Toolkit version 0.9.4

2001-03-19 Thread Karim Yaghmour


This new release of the Linux Trace Toolkit includes complete
support for Linux and RTAI on both ix86 and PPC. With this out,
work on other architectures is on its way. Anyone wanting
to dig in is welcome to do so.

Also, 0.9.4 includes all the additions that were made in the
0.9.4preX series. This includes interfacing with DProbes
using dynamic event creation and usage of rvmalloc and friends
to avoid having to copy large portions of memory from kernel
space to user space.

In order to encourage exchanges and discussions, I've set
up mailing lists for LTT. Please take a look at the "mailing
lists" section of the project's web-site for more detail.

You can find LTT at:
http://www.opersys.com/LTT

Cheers,

Karim Yaghmour

===
     Karim Yaghmour
   [EMAIL PROTECTED]
  Embedded and Real-Time Linux Expert
===



[ANNOUNCE] Linux Trace Toolkit 0.9.5pre1

2001-03-29 Thread Karim Yaghmour


LTT 0.9.5pre1 is out.

As the name says, this is a development version and should be
treated as such. Only one kernel is supported with 0.9.5pre1,
linux 2.4.0-test10.

What it includes:
-Cross-platform reading capability submitted by Andy Lowe
-Visualizer enhancements submitted by Rocky Craig
-Patch fixes by Peng Dai and Bob Montgomery
-Many bug fixes seen using the "-Wall" flag to build the user tools

The trace format has changed again to support cross-platform
reading capabilities.

0.9.5pre1 has no support for RTAI. pre2 will include the cross-
platform capabilities for RTAI.

Here's what should be in pre2:
-Support for 2.2.18/2.4.2
-Support for the latest RTAI, including cross-platform capabilities
-Benchmark fixes from Rocky Craig
-SH support by Greg Banks

Check the project's web-site for details on 0.9.5pre1:
http://www.opersys.com/LTT

Cheers,

Karim

===
     Karim Yaghmour
   [EMAIL PROTECTED]
  Embedded and Real-Time Linux Expert
===



Re: PREEMPT_RT and I-PIPE: the numbers, part 4

2005-07-11 Thread Karim Yaghmour

Ingo Molnar wrote:
> So why do your "ping flood" results show such difference? It really is 
> just another type of interrupt workload and has nothing special in it.
...
> are you suggesting this is not really a benchmark but a way to test how 
> well a particular system withholds against extreme external load?

Look, you're basically splitting hairs. No matter how involved an explanation
you can provide, it remains that both vanilla and I-pipe were subject to the
same load. If PREEMPT_RT consistently shows the same degradation under the
same setup, and that is indeed the case, then the problem is with PREEMPT_RT,
not the tests.

> so you can see ping packet flow fluctuations in your tests? Then you 
> cannot use those results as any sort of benchmark metric.

I didn't say this. I said that if there is fluctuation, then maybe this is
something we want to see the effect of. In real world applications,
interrupts may not come in at a steady pace, as you try to achieve in your
own tests.

> and from this point on you should see zero lmbench overhead from flood 
> pinging. Can vanilla or I-PIPE do that?

Let's not get into what I-pipe can or cannot do, that's not what these
numbers are about. It's pretty darn amazing that we're even having this
conversation. The PREEMPT_RT stuff is being worked on by more than a
dozen developers spread across some of the most well-known Linux companies
out there (RedHat, MontaVista, IBM, TimeSys, etc.). Yet, despite this
massive involvement, here we have a patch developed by a single guy,
Philippe, who's doing this work outside his regular work hours, and his
patch, which does provide guaranteed deterministic behavior, is:
a) Much smaller than PREEMPT_RT
b) Less intrusive than PREEMPT_RT
c) Performs very well, as-good-as if not sometimes even better than PREEMPT_RT

Splitting hairs won't erase this reality. And again, before I get the
PREEMPT_RT mob on my back again, this is just for the sake of argument,
both approaches remain valid, and are not mutually exclusive.

Like I said before, others are free to publish their own numbers showing
differently from what we've found.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546



Re: Merging relayfs?

2005-07-11 Thread Karim Yaghmour

Andrew Morton wrote:
> Still, first let us get a handle on who wants relayfs now and in the future
> and for what.  Then we can better decide.

We used relayfs for our series of tests on PREEMPT_RT and I-Pipe.
Specifically, we used relayfs buffers to store the timestamps for our
interrupt latency measurements. This allowed us to easily have access
to very large buffering areas without having to worry about any form
of detailed resource allocation, or runtime overhead of logging. IOW,
it allowed us to concentrate on our main priority: log a very large
amount of timestamps.

On the LTT side, relayfs is bound to be at the center of whatever
architecture we settle on for the ongoing rewrite. For having used it
for past releases of LTT, we know that it can handle very heavy data
throughput with little overhead using a relatively simple API.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Merging relayfs?

2005-07-11 Thread Karim Yaghmour

Greg KH wrote:
> What ever happened to exporting the relayfs file ops, and just using
> debugfs as your controlling fs instead?  As all of the possible users
> fall under the "debug" type of kernel feature, it makes more sense to
> confine users to that fs, right?

Actually, like we discussed the last time this surfaced, there are far
more users for relayfs than just debugging. What we settled on was
having relayfs export its file ops so that indeed debugfs users could
use it to log things in conjunction with debugfs.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Merging relayfs?

2005-07-11 Thread Karim Yaghmour

Greg KH wrote:
> Based on the proposed users of this fs, I don't see any.  What ones are
> you saying are not "debug" type operations?  And yes, I consider LTT a
> "debug" type operation :)
> 
> The best part of this, is it gives distros and users a consistent place
> to mount the fs, and to know where this kind of thing shows up in the fs
> namespace.

Except that relayfs contains files that all behave in a very specific
way: as relayfs buffers, while debugfs may contain a variety of different
types of files.

I kind'a see what you're trying to say, and I fully understand that some
debugfs users may indeed use the relayfs fileops to add an entry in
debugfs which serves as a buffer, and that's the very reason we exported
them to boot. But there's something to be said about having a single
filesystem (and therefore tree somewhere in /) which contains entries
dedicated to a single purpose: dump huge amounts of data out of the
kernel and into userspace whether or not the system is being debugged.

From a user point of view, it sounds awfully weird if they're using
"debugfs" on a production system ...

> Last I looked, this was not possible.  Has this changed in the latest
> version?

Here's from 2.6.13-rc2-mm1 fs/relayfs/inode.c
> +EXPORT_SYMBOL_GPL(relayfs_open);
> +EXPORT_SYMBOL_GPL(relayfs_poll);
> +EXPORT_SYMBOL_GPL(relayfs_mmap);
> +EXPORT_SYMBOL_GPL(relayfs_release);
> +EXPORT_SYMBOL_GPL(relayfs_file_operations);
> +EXPORT_SYMBOL_GPL(relayfs_create_dir);
> +EXPORT_SYMBOL_GPL(relayfs_remove_dir);

It's been there ever since you've asked for it earlier this year :)

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Merging relayfs?

2005-07-11 Thread Karim Yaghmour

Greg KH wrote:
> The path/filename dictates how it is used, so putting relayfs type files
> in debugfs is just fine.  debugfs allows any types of files to be there.
...
> New trees in / are not LSB compliant, hence the reason for writing
> securityfs to get rid of /selinux and other LSM filesystems that were
> starting to sprout up.
...
> But that's exactly what debugfs is for, to allow data to be dumped out
> of the kernel for different usages.
...
> Ok, have a better name for it?  It's simple and easy to understand.

It also carries with it the stigma of "kernel debugging", which I just
don't see production system maintainers liking very much.

So tell you what, how about if we merged what's in debugfs into relayfs
instead? We'll still end up with one filesystem, but we'll have a more
innocuous name. After all, if debugfs is indeed for dumping data from the
kernel to user-space for different usages, then relaying is what it's
actually doing, right?

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Merging relayfs?

2005-07-13 Thread Karim Yaghmour

Tomasz Kłoczko wrote:
> *NOT using relayfs* if it is not necessary for possibly big amount 
> of features future KProbes IMO in this case is *fundamental*.
> 
> To time where this base not requiring relayfs features will not be
> integrated in kernel code better IMO will be stop merging relayfs.

This part of the thread is really veering off-topic. This counters thing is
your own personal crusade and has nothing to do with the fundamental need
for a generic buffering mechanism such as relayfs.

I would suggest you start a separate thread to discuss the implementation of
a generic counters mechanism, if that's indeed what you're interested in.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Merging relayfs?

2005-07-18 Thread Karim Yaghmour

Roman Zippel wrote:
> The point is to design a simple and flexible relayfs layer, which means 
> not every possible function has to be done in the relayfs layer, as long 
> it's flexible enough to build additional functionality on top of it (for 
> which it can again provide some library functions).

I guess I just don't get the point here. Why cut something away if many
users will need it? If it's that popular that you're ready to provide a
library function to do it, then why not just leave it to boot? One of the
goals of relayfs is to avoid code duplication with regards to buffering
in general.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Weird USB errors on HD

2005-07-19 Thread Karim Yaghmour

I have a usb-attached HD that I use from time to time. When it's connected
to my desktop through a hub it works flawlessly. When connected to my Dell
D600 Laptop, however, it sometimes randomly exhibits a loud click (as if the
heads went berserk) and the device goes unrecognized (i.e. the USB layer drops
the device and then redetects it again; meanwhile there is FS corruption.)

The same behavior happens with 2.4.x and 2.6.x

In /var/log/messages I see something like:
hub 3-0:1.0: over-current change on port 1
hub 1-0:1.0: over-current change on port 3
...
usb 1-3: USB disconnect, address 2
usb 1-3: new high speed USB device using ehci_hcd and address 3
...
usb-storage: device found at 3
usb-storage: waiting for device to settle before scanning

This doesn't seem too good.

Here's the complete passage from /var/log/messages:
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 384296
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 384296
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 384296
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 384296
EXT3-fs error (device sda): ext3_free_branches: Read failure, inode=1046532, block=48037
Aborting journal on device sda.
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 4176
printk: 813 messages suppressed.
Buffer I/O error on device sda, logical block 522
lost page write due to I/O error on sda
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
lost page write due to I/O error on sda
EXT3-fs error (device sda) in ext3_reserve_inode_write: Journal has aborted
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
lost page write due to I/O error on sda
EXT3-fs error (device sda) in ext3_reserve_inode_write: Journal has aborted
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
lost page write due to I/O error on sda
EXT3-fs error (device sda) in ext3_orphan_del: Journal has aborted
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
lost page write due to I/O error on sda
EXT3-fs error (device sda) in ext3_truncate: Journal has aborted
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 0
Buffer I/O error on device sda, logical block 0
lost page write due to I/O error on sda
ext3_abort called.
EXT3-fs error (device sda): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 3254080
hub 3-0:1.0: over-current change on port 1
hub 1-0:1.0: over-current change on port 3
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 3254088
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 3254096
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 3254104
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 3254088
SCSI error : <0 0 0 0> return code = 0x7
end_request: I/O error, dev sda, sector 3254088
usb 1-3: USB disconnect, address 2
scsi0 (0:0): rejecting I/O to device being removed
Buffer I/O error on device sda, logical block 458754
lost page write due to I/O error on sda
scsi0 (0:0): rejecting I/O to device being removed
Buffer I/O error on device sda, logical block 517070
lost page write due to I/O error on sda
scsi0 (0:0): rejecting I/O to device being removed
Buffer I/O error on device sda, logical block 1
lost page write due to I/O error on sda
scsi0 (0:0): rejecting I/O to device being removed
Buffer I/O error on device sda, logical block 393218
lost page write due to I/O error on sda
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to device being removed
scsi0 (0:0): rejecting I/O to dead device
EXT3-fs error (device sda): ext3_find_entry: reading directory #228929 offset 0
scsi0 (0:0): rejecting I/O to dead device
EXT3-fs error (device sda): ext3_find_entry: reading directory #1046529 offset 0
usb 1-3: new high speed USB device using ehci_hcd and address 3
scsi1 : SCSI emulation for USB Mass Storage devices
usb-storage: device found at 3
usb-storage: waiting for device to settle before scanning
scsi0 (0:0): rejecting I/O to dead device
EXT3-fs error (device sda): ext3_find_entry: reading directory #196225 offset 0
scsi0 (0:0): rejecting I/O to dead device
EXT3-fs error (device sda): ext3_find_entry: reading 

Re: Weird USB errors on HD

2005-07-19 Thread Karim Yaghmour

Greg KH wrote:
> Ugh, you have a bad device or power supply, or aren't giving it enough
> power to drive the thing.  Nothing we can do in Linux for that, sorry.
> Buy a wall-powered usb hub, that usually helps.

I have one. I naively thought I could just plug the drive directly to the
laptop without using the wall-powered hub. I'll try that instead. Thanks.

That being said, shouldn't there be a way for the kernel to refuse to
use this hd if it's not getting enough power? I don't know enough about
USB to say, but isn't there something more elegant that could be done in
software?

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Merging relayfs?

2005-07-22 Thread Karim Yaghmour

Tom Zanussi wrote:
> - removed the deliver() callback
> - removed the relay_commit() function

This breaks LTT. Any reason why these needed to be removed? In the end,
the code will just end up being duplicated in ltt and all other users.
IOW, this is not some potential future use, but something that's
currently being used.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] Re: relayfs documentation sucks?

2005-07-25 Thread Karim Yaghmour

Christoph Hellwig wrote:
> That being said I wish LTT folks would make a little more progress so
> we could actually include it.

We're working on it. On the topic of revamping LTT, 3 different people
came up with 3 different implementations.

Following your feedback on the patch I sent a few weeks back, I headed
out asking myself "what is the bare-minimum tracing functionality that
will actually fly while still being flexible enough to add to it?" I
spent some time at the OLS comparing notes with others interested in this
area, and I think we've got something that should fit the bill. We should
be able to post something sooner rather than later.

Now if only I could remember what I talked about after I left the Black
Thorn at 2h45am and the guy in the elevator at Les Suites pressed on a
button and said "'M' for more beer" ...

Thanks,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Weird USB errors on HD

2005-07-25 Thread Karim Yaghmour

Alistair John Strachan wrote:
> You can get special USB cables that link two USB ports' 5Vs together in 
> parallel, which seems to help supply the necessary current; after the HD has 
> spun up you can remove the second "dummy" USB connector (my laptop only has 
> two USB ports and I require the second port).

Yeah, there was one of these in the box with the drive, but the first time
I saw it I remember thinking: what the hell is this thing? Then when I
figured it out, I found myself wondering whether the USB interface was
ever planned for such a use and whether it wouldn't have been better to
just ship a real adapter with the thing ...

Anyhow, I will not be using the drive anymore without a powered hub.

Thanks to all those who helped,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Merging relayfs?

2005-07-25 Thread Karim Yaghmour

Tom Zanussi wrote:
> In userspace, the sub-buffer reading loop looks at the commit value in
> the sub-buffer, and if it matches (sub-buffer size - padding), the
> buffer has been completely written and can be saved, otherwise it's
> not yet complete and is checked again the next time around.  This way,
> there's no need for a deliver() callback, the relay_commit() is
> replaced with the increment of the reserved commit value, the arrays
> aren't needed and you get the same result in the end in a much simpler
> way, IMHO.

Actually this has a much greater potential of losing buffers because
we have to poll the buffer for completion. Seen another way, the kernel
side has got to wait until the user side has "figured out" that it needs
to commit content to disk. As it was originally, it was relatively
straightforward to determine why data was lost: ok, we've signaled it
from kernel space, but the daemon never flushed it out. Without
commit/deliver, things are much less clear, and I still fail to see what
we gain by removing them.

I would very much like to see the commit/deliver functionality back.
Such mechanisms are required for any sane producer-consumer model.
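
The two consumer styles being debated can be modeled in user space. What
follows is only a toy Python sketch, not relayfs code: SubBuffer, Channel,
log(), poll() and the deliver callback are hypothetical names, and reserve
and commit are collapsed into a single step. It shows the completion check
from the quoted text (commit count == sub-buffer size - padding) next to a
push-style notification issued when a sub-buffer is closed.

```python
class SubBuffer:
    def __init__(self, size):
        self.size = size
        self.padding = 0     # bytes left unused when the producer moved on
        self.committed = 0   # bytes fully written so far

    def is_complete(self):
        # Check described in the quoted text:
        # commit count == sub-buffer size - padding.
        return self.committed > 0 and self.committed == self.size - self.padding

    def reset(self):
        self.padding = 0
        self.committed = 0


class Channel:
    def __init__(self, n_subbufs, subbuf_size, deliver=None):
        self.subbufs = [SubBuffer(subbuf_size) for _ in range(n_subbufs)]
        self.current = 0
        self.deliver = deliver   # optional callback(index): the "push" scheme
        self.lost = 0            # sub-buffers overwritten before being consumed

    def log(self, length):
        """Reserve and commit `length` bytes (one step in this model)."""
        sb = self.subbufs[self.current]
        if not 0 < length <= sb.size:
            raise ValueError("event does not fit in a sub-buffer")
        if sb.committed + length > sb.size:
            # Close the current sub-buffer and switch to the next one.
            sb.padding = sb.size - sb.committed
            if self.deliver:
                self.deliver(self.current)   # producer signals the consumer
            self.current = (self.current + 1) % len(self.subbufs)
            sb = self.subbufs[self.current]
            if sb.is_complete():
                self.lost += 1               # nobody picked it up in time
            sb.reset()
        sb.committed += length

    def poll(self):
        """The "pull" scheme: find complete sub-buffers and release them."""
        ready = [i for i, sb in enumerate(self.subbufs) if sb.is_complete()]
        for i in ready:
            self.subbufs[i].reset()
        return ready
```

In this model a consumer that relies on poll() only learns about a finished
sub-buffer the next time it looks, whereas the deliver callback fires at the
moment the producer closes it. The sketch does not settle which scheme is
better; a slow consumer loses data under either one.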

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH/RFC] Significantly reworked LTT core

2005-07-08 Thread Karim Yaghmour

Christoph Hellwig wrote:
> We're not gonna add hooks to the kernel so you can compile the same
> horrible code you had before against it out of tree.  Do a sane demux
> and submit it.

If I just wanted hooks, I would have submitted a patch that did just
that, without any logging function. The code for the mux that goes
on top of that code is actually on its way to be completely rewritten.
I can see that you may have read my posting as indicating that we were
recompiling the same previous code out of tree, but that is certainly
not the intent.

FWIW, we'll look at submitting a minimal mux with the patch.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: PREEMPT_RT and I-PIPE: the numbers, part 4

2005-07-08 Thread Karim Yaghmour

Missing attachment herein included.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546

L M B E N C H  2 . 0   S U M M A R Y
------------------------------------

Processor, Processes - times in microseconds - smaller is better
----------------------------------------------------------------
                            null     null                       open   signal   signal     fork   execve  /bin/sh
kernel                      call      I/O     stat    fstat    close  install   handle  process  process  process
-----------------------  -------  -------  -------  -------  -------  -------  -------  -------  -------  -------
HIGHMEM-RT-V0.7.50-35       0.18   0.2947     3.02     0.42     3.62     0.59     1.98      156      448     1481
NOHIGHMEM-RT-V0.7.50-35     0.18  0.28635     2.91     0.42     3.70     0.58     2.02      111      383     1372
HIGHMEM-RT-V0.7.51-02       0.18  0.27045     2.47     0.39     3.02     0.56     1.75      103      372     1352
NOHIGHMEM-RT-V0.7.51-02     0.18   0.2673     2.36     0.39     2.77     0.56     1.72       90      351     1328

File select - times in microseconds - smaller is better
-------------------------------------------------------
                          select   select   select   select   select   select   select   select
kernel                     10 fd   100 fd   250 fd   500 fd   10 tcp  100 tcp  250 tcp  500 tcp
-----------------------  -------  -------  -------  -------  -------  -------  -------  -------
HIGHMEM-RT-V0.7.50-35       1.29     5.70    13.21    25.76     1.49   7.8809  18.6905       na
NOHIGHMEM-RT-V0.7.50-35     1.26     5.69    13.25    25.84     1.47       na       na       na
HIGHMEM-RT-V0.7.51-02       1.01     3.88     8.82    17.08     1.24       na  14.1979  27.8158
NOHIGHMEM-RT-V0.7.51-02     1.02     3.90     8.84    17.12     1.30   6.0573       na       na

Context switching with 0K - times in microseconds - smaller is better
---------------------------------------------------------------------
                          2proc/0k   4proc/0k   8proc/0k  16proc/0k  32proc/0k  64proc/0k  96proc/0k
kernel                   ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------  ---------  ---------  ---------  ---------  ---------  ---------  ---------
HIGHMEM-RT-V0.7.50-35         4.87       5.55       5.01       4.47       4.00       4.45       5.13
NOHIGHMEM-RT-V0.7.50-35       3.25       3.92       3.53       3.10       2.96       3.46       4.09
HIGHMEM-RT-V0.7.51-02         2.70       3.48       3.51       3.50       3.36       3.93       4.82
NOHIGHMEM-RT-V0.7.51-02       1.86       2.23       2.41       2.41       2.41       3.02       3.92

Context switching with 4K - times in microseconds - smaller is better
---------------------------------------------------------------------
                          2proc/4k   4proc/4k   8proc/4k  16proc/4k  32proc/4k  64proc/4k  96proc/4k
kernel                   ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------  ---------  ---------  ---------  ---------  ---------  ---------  ---------
HIGHMEM-RT-V0.7.50-35         5.48       4.75       4.47       4.76       4.68       5.90       7.24
NOHIGHMEM-RT-V0.7.50-35       3.88       4.54       4.02       3.91       4.04       4.93       5.85
HIGHMEM-RT-V0.7.51-02         3.25       3.59       3.85       3.89       4.18       5.41       6.75
NOHIGHMEM-RT-V0.7.51-02       2.70       3.01       2.99       3.04       3.31       4.56       6.16

Context switching with 8K - times in microseconds - smaller is better
---------------------------------------------------------------------
                          2proc/8k   4proc/8k   8proc/8k  16proc/8k  32proc/8k  64proc/8k  96proc/8k
kernel                   ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch  ctx swtch
-----------------------  ---------  ---------  ---------  ---------  ---------  ---------  ---------
HIGHMEM-RT-V0.7.50-35         6.09       5.31       5.22       5.09       5.68       7.82       8.87
NOHIGHMEM-RT-V0.7.50-35       4.51       5.08       4.54       4.36       4.44       6.49       7.75
HIGHMEM-RT-V0.7.51-02         3.85       4.01       4.20       4.31       5.27       7.38       8.51
NOHIGHMEM-RT-V0.7.51-02       3.05       3.49       3.53       3.60       3.99       6.37       7.56

Context switching with 16K - times in microseconds - smaller is better
--
   2p

Re: PREEMPT_RT and I-PIPE: the numbers, part 4

2005-07-09 Thread Karim Yaghmour

Paul Rolland wrote:
>>mmap   | 794us   |  654us (+18%)  |  822us (+4%)
>   
> You mean -18%, not +18% I think.

Doh ... too many numbers flying around ... yes, -18% :)
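
The sign is easy to check, assuming the first column of the quoted row
(794us) is the baseline the other two are compared against; the column
meanings are not given in this message, so that is an assumption. A small
Python sketch with a hypothetical helper, percent_change():

```python
def percent_change(baseline, value):
    """Relative change of `value` against `baseline`, rounded to a whole percent."""
    return round((value - baseline) / baseline * 100)

print(percent_change(794, 654))  # 654us against 794us, prints -18
print(percent_change(794, 822))  # 822us against 794us, prints 4
```

654us is below 794us, so the change is (654 - 794) / 794, about -17.6%,
which rounds to -18%.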

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: PREEMPT_RT and I-PIPE: the numbers, part 4

2005-07-09 Thread Karim Yaghmour

Ingo Molnar wrote:
> yeah, they definitely have helped, and thanks for this round of testing 
> too! I'll explain the recent changes to PREEMPT_RT that resulted in 
> these speedups in another mail.

Great, I'm very much looking forward to it.

> Looking at your numbers i realized that the area where PREEMPT_RT is 
> still somewhat behind (the flood ping +~10% overhead), you might be 
> using an invalid test methodology:

I've got to smile reading this :) If one thing became clear out of
these threads, it is that no matter how careful we are with our testing,
there is always something that can be criticized about it.

Take the highmem thing, for example. I never really bought the
argument that highmem was the root of all evil ;) , and the last
comparison we did between 50-35 and 51-02 with and without highmem
clearly showed that while highmem is indeed a factor, there are
inherent problems elsewhere that the disabling of highmem doesn't
erase. Also, both vanilla and I-pipe were run with highmem, and if
they don't suffer from it, then the problem is/was with PREEMPT_RT.

With ping floods, as with other things, there is room for
improvement, but keep in mind that these are standard tests used
as-is by others to make measurements, that each run is made 5
times, and that the values in those tables represent the average
of 5 runs. So while they may not be as exact as could be, I don't
see why they couldn't be interpreted as giving us a "good idea" of
what's happening.

For one thing, the heavy fluctuation in ping packets may actually
induce a state in the monitored kernel which is more akin to the
one we want to measure than if we had a steady flow of packets.

I would usually like very much to entertain this further, but we've
really busted all the time slots I had allocated to this work. So at
this time, we really think others should start publishing results.
After all, our results are no more authoritative than those
published by others.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: PREEMPT_RT and I-PIPE: the numbers, part 4

2005-07-09 Thread Karim Yaghmour

Karim Yaghmour wrote:
> I would usually like very much to entertain this further, but we've
> really busted all the time slots I had allocated to this work. So at
> this time, we really think others should start publishing results.
> After all, our results are no more authoritative than those
> published by others.

BTW, we've also released the latest very of the LRTBF we used to
publish these latest results, so others can give it a try too :)

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: PREEMPT_RT and I-PIPE: the numbers, part 4

2005-07-09 Thread Karim Yaghmour

Can't type right anymore ...

Karim Yaghmour wrote:
> BTW, we've also released the latest very of the LRTBF we used to
   version

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: list patches in kernel

2005-07-26 Thread Karim Yaghmour

Brad Tilley wrote:
> Is there an easy way to make a running kernel display how it has been
> patched from vanilla? Probably not, but I thought I'd ask.

This issue does come up every so often. If you look in the archives you
should find some info about this, including a patch if my memory is
correct.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Average instruction length in x86-built kernel?

2005-07-29 Thread Karim Yaghmour

I'm wondering if anyone's ever done an analysis on the average length
of instructions in an x86-built kernel.

Googling around, I can find references claiming that the average
instruction length on x86 is anywhere from 2.7 to 3.5 bytes, but I
can't find anything studying Linux specifically.

Just curious,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Average instruction length in x86-built kernel?

2005-07-30 Thread Karim Yaghmour

Hello Ingo,

Ingo Oeser wrote:
> Just study the output of objdump -d and average the differences
> of the first hex number in a line printed, which are followed by a ":"

Here's a script that does what I was looking for:
#!/bin/bash
# Usage: $0 <kernel-image> <output-prefix>
# Note: the patterns below expect text addresses that start with 'c'.

# Disassemble the .text section:
objdump -d "$1" -j .text > "$2-disassembled-kernel"

# Remove non-instruction lines (anything not starting with an address):
sed '/^[^c].*/d' "$2-disassembled-kernel" > "$2-stage-1"

# Remove empty lines:
sed '/^\t*$/d' "$2-stage-1" > "$2-stage-2"

# Remove function names:
sed '/^c[0-9a-f]* <.*>:$/d' "$2-stage-2" > "$2-stage-3"

# Remove addresses:
sed 's/^c[0-9a-f]*:\t//' "$2-stage-3" > "$2-stage-4"

# Remove instruction text:
sed 's/\t.*//' "$2-stage-4" > "$2-stage-5"

# Remove trailing whitespace:
sed 's/\s*$//' "$2-stage-5" > "$2-stage-6"

# Separate instructions depending on size:
grep -E '^([0-9a-f]{2} *){5}' "$2-stage-6" > "$2-more-or-eq-5"
grep -E '^([0-9a-f]{2} *){0,4}$' "$2-stage-6" > "$2-less-or-eq-4"

# Find out how much of each we've got:
wc -l "$2-stage-6"
wc -l "$2-more-or-eq-5"
wc -l "$2-less-or-eq-4"

The last part can easily be changed to iterate through and separate
those that are 1 byte, 2 bytes, etc. and automatically come up with
stats, but this was fine for what I was looking for.

I ran it on a 2.4.x and a 2.6.x kernel and about 3/4 of instructions
are 4 bytes or less.
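
The per-length breakdown and the average mentioned above could be computed
along the following lines. This is a minimal Python sketch rather than the
script used for the numbers in this message; parse_line(),
length_histogram() and average_length() are hypothetical names. It assumes
the usual tab-separated layout of objdump -d output (address, hex bytes,
instruction text) and treats a line that has bytes but no instruction text
as the continuation of the previous instruction.

```python
import re
from collections import Counter

ADDRESS = re.compile(r"\s*[0-9a-f]+:")   # e.g. "c0100000:"
HEX_BYTE = re.compile(r"[0-9a-f]{2}")


def parse_line(line):
    """Return (byte_count, has_mnemonic) for an instruction line, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or not ADDRESS.fullmatch(fields[0]):
        return None
    tokens = fields[1].split()
    if not tokens or not all(HEX_BYTE.fullmatch(t) for t in tokens):
        return None
    return len(tokens), len(fields) >= 3


def length_histogram(lines):
    """Map instruction length in bytes to the number of such instructions."""
    hist = Counter()
    pending = 0  # bytes of the instruction currently being assembled
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # headers, function labels, blank lines
        nbytes, has_mnemonic = parsed
        if has_mnemonic:
            if pending:
                hist[pending] += 1
            pending = nbytes
        else:
            # Long instructions wrap; these bytes belong to the previous one.
            pending += nbytes
    if pending:
        hist[pending] += 1
    return hist


def average_length(hist):
    """Average instruction length in bytes over the whole histogram."""
    total = sum(hist.values())
    return sum(n * c for n, c in hist.items()) / total if total else 0.0
```

Passing the lines of a saved disassembly to length_histogram() yields the
count for each length, and average_length() reduces that to a single
figure comparable to the 2.7 to 3.5 byte range quoted earlier in the
thread.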

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Relayfs question

2005-03-19 Thread Karim Yaghmour

Jan Engelhardt wrote:
> Ok, urandom was a bad example. I have my tty logger (ttyrpld.sf.net) which 
> moves a lot of data (depends) to userspace. It uses a ring buffer of "fixed" 
> size (set at module load time). Apart from that relayfs could use a dynamic 
> sized ring buffer, I would not see any need to move it to relayfs, would you?

First, please note that the info on Opersys' site is out-of-date. While
it was relevant while we were still maintaining relayfs separately, it
has somewhat lost its relevance since we started posting the most up-to-
date code directly to LKML. For one thing, the dynamic resizing was
dropped very early in relayfs' inclusion review.

What relayfs does, and does very well, is move very large amounts of
data out of the kernel and make them available to user-space with very
little overhead. In the actual case of your tty logger, I've browsed
through the code briefly, and I think that with relayfs you should be
able to:
- Get rid of half the code:
  - No need to manage your own user/kernel-buffer boundary (Most of the
code in uio_*()).
  - No need to do any buffer management at all.
- Get better performance out of your logging functions.
- Get per-cpu buffers for free.

Basically, all the transport code you are doing in the kernel side of
your logger would be taken care of by relayfs. And given that there are
a lot of people doing similar ad-hoc buffering code, it just makes
sense to have one well-tested yet generic mechanism. Have a look at
Documentation/filesystems/relayfs.txt for the API details.

On a separate yet related topic:
Looking closer at rpldev.c, I believe that you'll be able to get rid of
it entirely (or very close to) once I actually get the time to refactor
the tracing code in LTT to make it generic. What I intend to do is to
obsolete the need for functions like your kio_*, and make it all
automatically generated at build time (you'll still need to add the
instrumentation, but won't need to hand-code the callbacks). This is
still on the top of my to-do list and I should be able to get to this
shortly.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Relayfs question

2005-03-19 Thread Karim Yaghmour

Karim Yaghmour wrote:
> What relayfs does, and does very well, is move very large amounts of
> data out of the kernel and make them available to user-space with very
> little overhead. In the actual case of your tty logger, I've browsed
> through the code briefly, and I think that with relayfs you should be
> able to:

Just to avoid any confusion, note that I'm referring mainly to rpldev.c,
which is the kernel-side driver for the logger, I haven't looked at any
of the user tools.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: Relayfs question

2005-03-19 Thread Karim Yaghmour

Jan Engelhardt wrote:
> Well, what about things like urandom? It also moves "a lot" of data and does
> nothing else.

Forgive my slowness today, but I don't get the angle here:
- Relayfs is not a replacement for char devices, we've never claimed it
  to be.
- Urandom generates a lot of data, and uses copy_to_user() to get it to
  user-space, but it isn't a generalized buffering mechanism for
  transferring large amounts of data to user-space.

If what you're inquiring about is a comparison between relayfs'
mechanisms and the underlying mechanisms that urandom is using, then
I don't think there can be a comparison: the goals are different.

For example, urandom relies on a global spin lock and uses copy_to_user()
for its transfers. This is just fine for this type of application. If
you wanted to transfer a huge amount of data from the kernel to user-
space (the kind of data generated by tracing facilities, for example),
however, these mechanisms would be simply inadequate. If we're generating
the amount of data LTT can gather, for example, (say 2MB/s as was
described in the earlier thread regarding relayfs), then you need per-cpu
buffering and you need to not write anything back to user-space, but
dump it to disk ASAP, etc. This is where relayfs comes in handy.

On the other hand, using relayfs to replace what urandom currently uses
is just the wrong thing to do. If nothing else, /dev/urandom would
behave entirely differently (API, dynamics, etc.). There would also be
no clear added benefit for using relayfs.

What character drivers do (mainly copy_to_user()) and what relayfs is
used for are entirely different. To use a slightly exaggerated example
to illustrate the difference: replacing the standard mechanisms drivers
use to transfer data to user-space with relayfs would be like renting
a supersonic jet to get your package to a foreign country instead of
just using Fedex. It works ... but it's clearly the wrong approach.

Please read relayfs.txt.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] relayfs redux, part 2

2005-01-31 Thread Karim Yaghmour

Andi Kleen wrote:
> It's doing a complicated function call which does who knows what in
> the logging fast path (I stopped reading after some point)  
> It definitely is not putc !

I was anticipating some people would have this requirement, and this
is why I introduced the ad-hoc mode. Roman asked me to get rid of it
because nobody had yet asked for it, so this is why it was dropped.
As it is, the implementation you are suggesting is insufficient for
LTT, which is relayfs' first formal client. I think that it would be
better to provide two underlying mechanisms for relayfs at this point;
we had already stripped it as thin as it would go for things like LTT.

>>separate grabbing a slot in the buffer from the memcpy because some
>>applications such as ltt want to be able to directly write into the
>>slot without having to copy it into another buffer first.  How about
> 
> 
> If the inline function to log was fast enough it wouldn't need 
> any such hacks.

Actually that's not true. There are two problems with this statement:
a- It requires prepackaged data units.
b- It's only useful for fixed-size data units.

Any efficient client that has complex data units will want to write
directly into the buffer instead of creating an intermediate package
which is then memcopied. With the modified code, we are now forced
to create an intermediate package, which is wrong. Also, if the client
wants to log variable-size events, he would have to re-implement lots
of the writing code.

Note that I really think relay_write() should be dropped altogether.
Clients should call relay_reserve() and do whatever is necessary
after that.
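
The difference between writing into a reserved slot and copying a
prepackaged record can be modeled in user space. The following is a minimal
Python sketch, not relayfs code: Channel, reserve(), write(), the HEADER
layout and the two log_event_* helpers are all hypothetical names made up
for illustration.

```python
import struct

# Hypothetical event header: 16-bit event id, 16-bit payload length.
HEADER = struct.Struct("<HH")


class Channel:
    """Toy single-buffer channel; a model only, not the relayfs API."""

    def __init__(self, size):
        self.buf = bytearray(size)
        self.offset = 0

    def reserve(self, length):
        """Claim `length` bytes; return a writable view of the slot, or None if full."""
        if self.offset + length > len(self.buf):
            return None
        slot = memoryview(self.buf)[self.offset:self.offset + length]
        self.offset += length
        return slot

    def write(self, data):
        """Copy an already-packaged record into the buffer."""
        slot = self.reserve(len(data))
        if slot is None:
            return False
        slot[:] = data
        return True


def log_event_reserve(chan, event_id, payload):
    """Variable-size event written straight into the reserved slot."""
    slot = chan.reserve(HEADER.size + len(payload))
    if slot is None:
        return False
    HEADER.pack_into(slot, 0, event_id, len(payload))  # header in place
    slot[HEADER.size:] = payload                       # payload in place
    return True


def log_event_write(chan, event_id, payload):
    """Same event, but built as an intermediate package and then copied."""
    package = HEADER.pack(event_id, len(payload)) + payload
    return chan.write(package)
```

Both helpers leave identical bytes in the buffer. The second one has to
assemble a temporary package first, which is the extra copy objected to
above, while the first only needs the total length up front and can fill
in a variable-size event field by field.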

> Note that gcc is quite good at optimizing memcpy, so essentially
> when you e.g. do log(singleint) it should be roughly equivalent
> to a int store into the buffer + the check if there is enough
> buffer space.

I understand the point you are trying to make, but I really think that
this is best implemented as two separate buffering schemes instead of
breaking the existing one (which had already been trimmed down quite
thin following Roman's input.)

> You could avoid the local_irq_save() if you use separate interrupt
> buffers that are only accessed in non nesting interrupt context 
> (like softirqs) That would require a sorting step at output though. Not
> sure if it's worth it. The problem is that hardirqs can nest anyways,
> so it wouldn't work for them. However a lot of important code runs
> in softirq (like the network stack) where this is true.

For the kind of data sizes we are looking at for LTT (100GBs) splitting
buffers is not viable anyway.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] relayfs redux, part 2

2005-01-31 Thread Karim Yaghmour

Tom Zanussi wrote:
> OK, makes sense to me - I'll get rid of relay_reserve and replace it
> with the simple putc write and variant.

Please don't do that. Instead, bring back the ad-hoc mode code; that's
what it was for anyway.

> You could just create and log into a separate relayfs channel, if you
> wanted to.  Not sure we need to add anything special to support that.

Postprocessing doesn't solve world famine ;) As far as LTT goes,
splitting events like this makes it impossible to read large traces.
Other clients are free to do as they wish.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] relayfs redux, part 2

2005-01-31 Thread Karim Yaghmour

Tom Zanussi wrote:
> I don't think they need to be mutually exclusive - we could keep
> relay_reserve(), but the relay_write() that's currently built on top
> of relay_reserve() would use the putc code instead.  It's complicating
> the API a bit, but if it makes everyone happy...

Actually I think that this would be a much better use of relay_write(),
which is unlikely to be used by any client that requires relay_reserve()
to start with. Also, I don't think it complicates the API at all.
Compared to the original API, what we've got now is very simple. So
it basically boils down to:
- use relay_write() if you want putc-like functionality.
- use relay_reserve() if you want to reserve space and write separately.

This is even better than having a separate ad-hoc mode.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] relayfs redux, part 2

2005-01-31 Thread Karim Yaghmour

Greg KH wrote:
> On Fri, Jan 28, 2005 at 01:38:22PM -0600, Tom Zanussi wrote:
> 
>>+extern void * alloc_rchan_buf(unsigned long size,
>>+   struct page ***page_array,
>>+   int *page_count);
>>+extern void free_rchan_buf(void *buf,
>>+struct page **page_array,
>>+int page_count);
> 
> 
> As these will be "polluting" the global namespace of the kernel, could
> you add "relayfs_" to the front of them?

BTW, these functions are in buffers.h which is an internal header to
fs/relayfs/*.c files. buffers.h is not included in anything outside.
Correct me if I'm wrong, but there is no namespace pollution in that
case, right? All that does contribute to namespace pollution is in
include/linux/relayfs_fs.h.

Thanks,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] relayfs redux, part 2

2005-01-31 Thread Karim Yaghmour

Greg KH wrote:
> When relayfs is built into the kernel, those symbols are then global to
> the whole static kernel.
> 
> Please be nice and rename them.

My pleasure :)

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] relayfs crash

2005-02-06 Thread Karim Yaghmour

Kingsley Cheung wrote:
> To solve the problem I applied a patch similar to the one you posted
> back in July and it fixed the problem.  Could we consider putting this
> patch into relayfs? Its similar to the one posted in July 2004, except
> it also moves clear_readers() before INIT_WORK in relay_release (is
> that acceptable?).

Tom, correct me if I'm wrong but these fixes were integrated in the
first relayfs redux I sent to LKML a few weeks back, right?

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

2005-01-15 Thread Karim Yaghmour

Hello Thomas,

I don't mind having a general discussion about instrumentation, but
it has to be understood that the topic is so general and means so
many different things to different people that we are unlikely to
reach any useful consensus. Believe me, it's not for the lack of
trying. More below.

Thomas Gleixner wrote:
> 

:D

> One of those backends is LTT+relayfs. 
> I really respect the work you have done there, but please accept that I
> just see the limitations and try to figure out a way to make it more
> generic and flexible before it is cemented into the kernel and makes it
> hard to use for other interesting instrumentation aspects and maybe
> enforces redundant implementation of infrastructure related
> functionality.
> 
> E.g. tracking down timing related issues can make use from such
> functionality if the infrastructure is provided seperately.
> I guess a lot of developers would be happy to use it when it is already
> around in the kernel and it can help testers for giving better
> information to developers.

I would invite you to review the history behind LTT and the history
behind the efforts to get LTT integrated in the kernel (which are
two separate topics.) If you look back, you will see that I worked
very hard trying to get people to think about a common framework
and that I and others made numerous suggestions in this regard. Here
are a few examples:

- DProbes (kprobes ancestor):
Shortly after dprobes came out in 2000, I was one of the first to
suggest that there could be interfacing between both to allow
dynamically added trace points. We worked with, and eventually
joined forces with, the IBM team working on this and very early
on, LTT and DProbes were interfacing:
http://marc.theaimsgroup.com/?l=linux-kernel&m=97079714009328&w=2
- OProfile:
When the time came to integrate oprofile in the kernel, I tried to push
for oprofile to use ltt as its logging engine (to John's utter
horror.) relayfs didn't exist at the time, and obviously oprofile
made it in without relying on ltt.
Here's a posting from July 2002 where I suggested oprofile rely on
ltt. In that same posting I listed a number of drivers/subsystems
that already contained tracing statements. Obviously I was pointing
out that there was an opportunity to create a common, uniform
infrastructure based on ltt:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102624656615567&w=2
- Syscalltrack:
In replying to a posting of someone looking for tracing info, there
was a brief discussion as to how syscalltrack could use ltt instead
of: a) redirecting the syscall table, b) having its own buffering
mechanism. Again, relayfs didn't exist at the time:
http://marc.theaimsgroup.com/?l=linux-kernel&m=102822343523369&w=2
- Event logging:
When there was discussion about event logging, there was a suggestion
to use ltt's engine. Again, relayfs wasn't there:
http://marc.theaimsgroup.com/?l=linux-kernel&m=101836133400796&w=2

And there are many other cases. As you can see, it's not as if
I didn't try to have this discussion before. Unfortunately, interest
in this was rather limited.

In addition, and this is a very important issue, quite a few
kernel developers mistook LTT for a kernel debugging tool, which
it was never meant to be. When, in fact, if you ask those who have
looked at using it for that purpose (try Marcelo or Andrea) you will
see that they didn't find it to be appropriate for them. And
rightly so, it was never meant for that purpose. Even lately, when
I suggested Ingo try using relayfs instead of his custom tracing
code for his preemption work, he looked at it and said that it
wasn't suited, but would consider reusing parts of it if it were
in the kernel.

So, in general, one thing I learned over the years is to not touch
the topic of kernel debugging even with a 10 foot pole when
discussing LTT.

What you are hinting at here (mention of developers vs. testers,
for example), and your stated preference for the type of ring-buffer
you described earlier, clearly go in the direction I've learned to
avoid: buffering support for the general purpose of kernel debugging.

Let me say outright that I see the relevance of what you are looking
for, but let me also say that what we tried to achieve with relayfs
is to provide a general mechanism for kernel subsystems that need to
convey large amounts of data to user-space. We did not attempt to
solve the problem of providing a buffering framework for core kernel
debugging. As I mentioned to Ingo in the mail I referred to earlier
regarding the type of buffering you are looking for:
> The above tracer may indeed be very appropriate for kernel development,
> but it doesn't provide enough functionality for the requirements of
> mainstream users.

If there is interest in using relayfs and/or ltt for that
purpose, then this is an entirely different mandate and a few things
would need to be added for that to happen. For starters, we could
add another mode to relayfs. Currently, it supports a locking and

Re: 2.6.11-rc1-mm1

2005-01-15 Thread Karim Yaghmour

Hello Thomas,

In the interest of not spreading the thread too thin, I'm replying to
both emails at the same time.

Thomas Gleixner wrote:
>>relayfs is a generalized buffering mechanism. Tracing is one application
>>it serves. Check out the web site: "high-speed data-relay filesystem."
>>Fancy name huh ...
>
>
> I do not doubt that.
>
> But hardwiring an instrumentation framework on it is also hardwiring
> implicit restrictions on the usability of the instrumentation for
> certain purposes.

To a certain extent this is true. Please refer to my reply to your RFC
for a discussion of this.

>>Well for one thing, a portion of code running in user-context won't
>>disable interrupts while it's attempting to get buffer space, and
>>therefore won't impact on interrupt delivery.
>
>
> The do {} while loops are in the fast ltt_log_event path

You mean that it would impact interrupt delivery? This code's behavior
has actually been carefully studied, and what has been seen is that
the code almost never loops, and when it does, it very rarely does
it more than twice. In the case of an interrupt, you'd have to receive
an interrupt while reserving space for logging the current interrupt's
occurrence for the loop to be done twice. I've CC'ed Bob Wisniewski
on this as he's the one that implemented this code and studied its
behavior in depth.

> Yeah, did you answer one of my arguments except claiming that I'm to
> stupid to understand how it works ? 

If I misspoke, then I apologize. For one thing, I've never thought
of you as stupid. I'm just trying to get specifics here.

> I just dont like the idea, that instrumentation is bound on relayfs and
> adds a feature to the kernel which fits for a restricted set of problems
> rather than providing a generic optimized instrumentation framework,
> where one can use relayfs as a backend, if it fits his needs. Making
> this less glued together leaves the possibility to use other backends. 

Yes, I understand and I hope my other mail properly addresses this issue.

> There is a loop in ltt_log_event, which enforces the processing of each
> event twice. Spliting traces is postprocessing and can be done
> elsewhere.

Sorry, this is not postprocessing. Let me explain:

Basically, the ltt framework allows only one tracing session to be active
at any time. IOW, if you were planning on starting a 2 week trace and
after doing so wanted to trace an application for a short 10s then you are
screwed, LTT won't allow you to do that. Currently this is a limitation
which we haven't heard any complaints about, so we're not going to
generalize it until there is proof that people really need this.

However, there are cases where you want to have tracing running at _all_
times in what is referred to as flight-recorder mode and only dump the
content of the buffers when something special happens. Yet, those who
are interested in having this 24x7 mode also know enough about tracing
that they do need to actually trace other things for short periods
without disrupting their flight-recording. That's why there's a loop.
An event will be processed twice only if you're tracing AND
flight-recording at the same time.

There is no way to do an equivalent of what I just described with any
form of postprocessing.

Here's the proper snippet from include/linux/ltt-events.h:
/* We currently support 2 traces, normal trace and flight recorder */
#define NR_TRACES       2
#define TRACE_HANDLE    0
#define FLIGHT_HANDLE   1
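
To make the "processed twice" point concrete, here is a minimal userspace sketch of a per-event loop over the two trace slots. Only NR_TRACES, TRACE_HANDLE and FLIGHT_HANDLE come from the header quoted above; struct toy_trace, its fields and toy_log_event() are assumptions for illustration and not the actual ltt code.

```c
#define NR_TRACES     2
#define TRACE_HANDLE  0
#define FLIGHT_HANDLE 1

/* Hypothetical per-trace state: whether the slot is in use, and a counter
 * standing in for the real buffer write. */
struct toy_trace {
	int active;
	unsigned long events_logged;
};

/* Offer one event to every slot; returns how many slots recorded it.
 * With only the normal trace (or only the flight recorder) active the
 * event is handled once; it is handled twice only when both are active. */
static int toy_log_event(struct toy_trace traces[NR_TRACES])
{
	int i, handled = 0;

	for (i = 0; i < NR_TRACES; i++) {
		if (!traces[i].active)
			continue;
		traces[i].events_logged++;
		handled++;
	}
	return handled;
}
```

Because each slot keeps its own state, a short normal trace can be started and stopped without touching what the flight recorder has accumulated.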

> In _ltt_log_event lives quite a bunch of if(...) processing decisions
> which have to be evaluated for _each_ event.

Correct, and I'm honest enough with myself to admit that this is the bit
of code that I think needs the most reviewing. So, in order to help
you help me, here are the various code snippets and things I can think
of which would help make the code faster/simpler:

Here's the preamble where we make some basic sanity checks:

	if (!trace)
		return -ENOMEDIUM;

	if (trace->paused)
		return -EBUSY;

	tracer_handle = trace->trace_handle;

	if (!trace->flight_recorder && (trace->daemon_task_struct == NULL))
		return -ENODEV;

	channel_handle = trace_channel_handle(tracer_handle, cpu_id);

	if ((trace->tracer_started == 1) || (event_id == LTT_EV_START) ||
	    (event_id == LTT_EV_BUFFER_START))
		goto trace_event;

	return -EBUSY;

trace_event:
	if (!ltt_test_bit(event_id, &trace->traced_events))
		return 0;

Basically, unless we've succeeded in all those if's, we're not going to
write anything. I think we could get rid of the first 4 ones by simply
maintaining a state-machine for the tracer. Then we could either have
a single if or even use function pointers (though I think this costs
more) to call or not call _ltt_log_event. As for checking whether the
event has a certain ID (EV_START or EV_BUFFER_STAR
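
A minimal sketch of the state-machine idea, assuming the tracer's lifecycle can be folded into one enum that is updated on start/pause/stop instead of being re-derived for every event. The enum, the struct and toy_should_log() are hypothetical; only the error codes and the per-event bitmask test mirror the preamble above, and the special-casing of the start events is left out.

```c
#include <errno.h>
#include <stdint.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM 123   /* Linux value; fallback for other platforms */
#endif

/* Hypothetical lifecycle states replacing the separate NULL / paused /
 * daemon / started tests. */
enum toy_state { TOY_NO_TRACE, TOY_PAUSED, TOY_NO_DAEMON, TOY_RUNNING };

struct toy_tracer {
	enum toy_state state;
	uint64_t traced_events;   /* one bit per event ID (IDs 0..63) */
};

/* Returns 1 if the event should be written, 0 if it is filtered out,
 * or a negative errno mirroring the original preamble. */
static int toy_should_log(const struct toy_tracer *t, unsigned int event_id)
{
	if (t->state != TOY_RUNNING) {          /* single hot-path test */
		switch (t->state) {
		case TOY_NO_TRACE:  return -ENOMEDIUM;
		case TOY_NO_DAEMON: return -ENODEV;
		default:            return -EBUSY;
		}
	}
	if (event_id >= 64)
		return 0;
	return (int)((t->traced_events >> event_id) & 1u);
}
```

In the running case this costs one comparison plus the bitmask test; the error paths only pay for the switch when tracing is not active anyway.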

Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

2005-01-15 Thread Karim Yaghmour

Hello Roman,

Roman Zippel wrote:
> On Sat, 15 Jan 2005, Karim Yaghmour wrote:
>>In addition, and this is a very important issue, quite a few
>>kernel developers mistook LTT for a kernel debugging tool, which
>>it was never meant to be. When, in fact, if you ask those who have
>>looked at using it for that purpose (try Marcelo or Andrea) you will
>>see that they didn't find it to be appropriate for them. And
>>rightly so, it was never meant for that purpose. Even lately, when
>>I suggested Ingo try using relayfs instead of his custom tracing
>>code for his preemption work, he looked at it and said that it
>>wasn't suited, but would consider reusing parts of it if it were
>>in the kernel.
> 
> Well, that's really a core problem. We don't want to duplicate 
> infrastructure, which practically does the same. So if relayfs isn't 
> usable in this kind of situation, it really raises the question whether 
> relayfs is usable at all. We need to make relayfs generally usable, 
> otherwise it will join the fate of devfs.

Hmm, coming from you I will take this as a pretty strong endorsement
for what I was suggesting earlier: provide a basic buffering mode
in relayfs to be used in kernel debugging. However, it must be
understood that this is separate from the existing modes and ltt,
for example, could not use such a basic infrastructure. If this is
ok with you, and no one wants to complain too loudly about this, I
will go ahead and add this to our to-do list for relayfs.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-15 Thread Karim Yaghmour

Hello Roman,

Roman Zippel wrote:
> It's interesting to read more about ltt's requirements, but I still think 
> it's possible to leave this work to the relayfs layer.

Ok, I'm willing to play ball, but can you be a little bit more specific?

> Why not just move the ltt buffer management into relayfs and provide a 
> small library, which extracts the event stream again? Otherwise you have 
> to duplicate this work for every serious relayfs user anyway.

Ok, I've been meditating over what you say above for some time in order
to understand how best to follow what you are suggesting. So here's
what I've been able to come up with. Let me know if you have other
suggestions:

Drop the buffer-start/end callbacks altogether. Instead, allow user
to specify in the channel properties whether they want to have
sub-buffer delimiters. If so, relayfs would automatically prepend
and append the structures currently written by ltt:
/* Start of trace buffer information */
typedef struct _ltt_buffer_start {
	struct timeval time;	/* Time stamp of this buffer */
	u32 tsc;		/* TSC of this buffer, if applicable */
	u32 id;			/* Unique buffer ID */
} LTT_PACKED_STRUCT ltt_buffer_start;

/* End of trace buffer information */
typedef struct _ltt_buffer_end {
	struct timeval time;	/* Time stamp of this buffer */
	u32 tsc;		/* TSC of this buffer, if applicable */
} LTT_PACKED_STRUCT ltt_buffer_end;

This would also allow dropping the start_reserve, end_reserve, and
channel_start_reserve. The latter can be added by ltt as its first
event.
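
A minimal userspace sketch of what "relayfs writes the delimiters" could look like, assuming a fixed sub-buffer size and that the end marker sits in the last bytes of the sub-buffer. The toy_ structs are simplified stand-ins for the ones above (the struct timeval is reduced to a plain counter), and every name and size here is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the delimiter structs quoted above
 * (the struct timeval field is reduced to a plain counter here). */
struct toy_buf_start { uint32_t stamp; uint32_t id; };
struct toy_buf_end   { uint32_t stamp; };

#define TOY_SUBBUF_SIZE 64

/* Lay out one sub-buffer: [start][payload][padding][end].
 * Returns the number of payload bytes stored, or 0 if they do not fit. */
static size_t toy_fill_subbuf(unsigned char *subbuf, uint32_t id, uint32_t stamp,
			      const void *payload, size_t len)
{
	struct toy_buf_start s = { stamp, id };
	struct toy_buf_end   e = { stamp };
	size_t room = TOY_SUBBUF_SIZE - sizeof(s) - sizeof(e);

	if (len > room)
		return 0;
	memset(subbuf, 0, TOY_SUBBUF_SIZE);
	memcpy(subbuf, &s, sizeof(s));
	memcpy(subbuf + sizeof(s), payload, len);
	memcpy(subbuf + TOY_SUBBUF_SIZE - sizeof(e), &e, sizeof(e));
	return len;
}
```

With the markers at fixed positions, a reader can jump to any sub-buffer boundary and pick up its ID and time stamp without parsing the events in between.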

Is this what you are looking for, and is there something else we should
be doing?

> Completely abstracting the buffer management would the make whole 
> interface simpler and it would be a lot easier to change without breaking 
> everything. E.g. it would be possible to use per cpu buffers and remove 
> the need for different locking mechanisms, for a good tracing mechanism 
> it's not just important that it's lockless, but also that the cpus don't 
> share cache lines in the fast path. In this regard relayfs/ltt has really 
> still too much overhead and the complex relayfs API isn't really making it 
> easy to fix this.

The per-cpu buffering issue is really specific to the client. It just
so happens that LTT creates one channel for each CPU. Not everyone
who needs to ship lots of data to user-space needs/wants one channel
per cpu. You could, for example, use a relayfs channel as a big
chunk of memory visible to both a user-space app and its kernel buddy
in order to exchange data, without either of them ever needing more
than one such channel for your entire subsystem.

As for lockless vs. locking there is a need for both. Not having
to get locks has obvious advantages, but if you require strict
timing you will want to use the locking scheme because its logging
time is linear (see Thomas' complaints about lockless elsewhere
in this thread, and Ingo's complaints about relayfs somewhere back
in October.)

But in trying to make things simpler, here's a reworked API:

rchan* relay_open(channel_path, mode, bufsize, nbufs);
int    relay_close(*rchan);
int    relay_reset(*rchan);
int    relay_write(*rchan, *data_ptr, count, **wrote-pos);

int    relay_info(*rchan, *channel_info);
void   relay_set_property(*rchan, property, value);
void   relay_get_property(*rchan, property, *value);

For direct writing (currently already used by ltt, for example):

char*  relay_reserve(*rchan, len, *ts, *td, *err, *interrupting)
void   relay_commit(*rchan, *from, len, reserve_code, interrupting);

These are the related macros:

#define relay_write_direct(DEST, SRC, SIZE) \
#define relay_lock_channel(RCHAN, FLAGS) \
#define relay_unlock_channel(RCHAN, FLAGS) \

As I hinted elsewhere, we would now have three modes for relayfs
channels:
- locking => relies on local_irq_save.
- lockless => relies on try_reserve/fail->retry (based on cmpxchg).
- kdebug => this is for kernel debugging.

The last one could be based on Ingo's tracing code, or any
implementation suggestions by Thomas. It wouldn't do all
the checks and provide all the capabilities of the other two
mechanisms, but would really be a hot-path logger with only
minimalistic provisions for content loss and other such things.
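
For readers unfamiliar with the lockless mode listed above, here is a userspace sketch of a try-reserve/retry loop using C11 atomics in place of the kernel's cmpxchg. The struct, the function name and the flat non-wrapping buffer are assumptions for illustration; the locking mode has no direct userspace equivalent (it depends on local_irq_save) and is not shown.

```c
#include <stdatomic.h>
#include <stddef.h>

#define TOY_BUF_SIZE 4096

struct toy_lockless {
	_Atomic size_t offset;         /* next free byte */
	char data[TOY_BUF_SIZE];
};

/* Try to claim len bytes: read the offset, compute the new one, and
 * publish it with compare-and-swap; if another writer got in between,
 * the CAS fails, old is refreshed, and we retry. Returns NULL when full. */
static char *toy_reserve_lockless(struct toy_lockless *b, size_t len)
{
	size_t old = atomic_load(&b->offset);

	do {
		if (len > TOY_BUF_SIZE - old)
			return NULL;
	} while (!atomic_compare_exchange_weak(&b->offset, &old, old + len));

	return b->data + old;
}
```

The loop only repeats when another writer claimed space between the load and the compare-and-swap, which is why retries are expected to be rare.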

(note to Tom: time_delta_offset that used to be in relay_write
should be a property set using relay_set_property).

What I'm dropping for now is all the functions that allow a
subsystem to read from a channel from within the kernel. So,
for example, if you want to obtain large amounts of data from
user-space via a relayfs channel you won't be able to. Here
are the functions that would go:

rchan_reader *add_rchan_reader(channel_id, auto_consume)
int    remove_rchan_reader(rchan_reader *reader)
rchan_reader *add_map_reader(channel_id)
int    remove_map_reader(rchan_reader *reader)
int    relay_read(reader, buf, count, wait, *actual_read_offset)
void   relay_buffers_consumed

Re: Event tools, do they exist

2001-04-26 Thread Karim Yaghmour


Hello George,

As others have suggested, you can do what you are asking for using LTT
(http://www.opersys.com/LTT).

Specifically, you may want to use the event allocation capabilities.
This will enable you to add your own events and view these as part
of the trace.

By the way, there are mailing lists for LTT if you're interested in
making a contribution.

Cheers,

Karim

george anzinger wrote:
> 
> This is an attempt to look in the wheel locker.
> 
> I need a simple event sub system for use in the kernel.  I envision at
> least two types of events: the history event and the timing event.
> 
> The timing event would keep track of start/stop times by class.  If, for
> example, I wanted to know how much time the kernel spends doing the
> recalc in schedule() I would put and event start in front of it and an
> end at the other end.  The sub system would note the first event time
> and the cumulative time between all starts and stops on the same event.
> When reported by /proc/ it would give the total event time, the elapsed
> time and the % of processor time for each of the possibly several
> classes.
> 
> The history event would record each events time, location, data1,
> data2.  It would keep N of these (the last N) and report M (M= /proc/.  This list should also be kept in a format that a simple
> debugger can easily examine.
> 
> Somebody must have written these routines and have them in their
> library.  Sure would help if I could have a peek.
> 
> George

-- 
===
 Karim Yaghmour
   [EMAIL PROTECTED]
  Embedded and Real-Time Linux Expert
===



Re: read() on relayfs channel returns premature 0

2005-03-25 Thread Karim Yaghmour

Jan Engelhardt wrote:
> Hm? Relayfs does not support a `cat /dev/relay/AChannelName` anymore?

This was a requirement for it to be included.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-16 Thread Karim Yaghmour

Hello Christoph,

Christoph Hellwig wrote:
> Why would you want anything but read access?

Fine, we can make it read-only; we'll drop the "mode" field.

> I think random access is overkill.  Keeping the code simple is more
> important and user-space can post-process it.

It's overkill if you're thinking in terms of KBs or MBs of data.
It isn't if you're looking at GBs and 100s of GBs. Please read my
other posting as to who is using this and how.

But regardless of access, you have to have some way of telling
relayfs the size of the channel you want. bufsize and nbufs
just tell relayfs the size of the buffers you want and how many
buffers there are in the ring, both of which are really basic
to any sort of buffering scheme.
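
As a small worked example of what bufsize and nbufs determine, the sketch below computes the total ring size and maps a running write position onto a sub-buffer index and an offset inside it. The struct and helper names are hypothetical; only the two parameters come from the API under discussion.

```c
#include <stddef.h>

/* Hypothetical channel geometry: nbufs sub-buffers of bufsize bytes each. */
struct toy_geom {
	size_t bufsize;
	size_t nbufs;
};

/* Total bytes the ring occupies. */
static size_t toy_channel_size(const struct toy_geom *g)
{
	return g->bufsize * g->nbufs;
}

/* Map an ever-increasing write position to (sub-buffer index, offset). */
static void toy_locate(const struct toy_geom *g, size_t pos,
		       size_t *buf_idx, size_t *buf_off)
{
	*buf_idx = (pos / g->bufsize) % g->nbufs;
	*buf_off = pos % g->bufsize;
}
```

With bufsize = 4096 and nbufs = 4, the ring is 16384 bytes and position 4096 * 5 + 10 falls at offset 10 of sub-buffer 1.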

> Auto-resizing sounds like a really bad idea.

Ok, it will go.

> And why can't you do this from that code?  It just needs an initcall-like
> thing that runs after mounting of relayfs.

Ok, we'll leave it to the caller to do a relay_write() with his
init-bufs at startup.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-16 Thread Karim Yaghmour

Christoph Hellwig wrote:
> the lockless mode is really just loops around cmpxchg.  It's spinlocks
> reinvented poorly.

I beg to differ. You have to use different spinlocks depending on
where you are:
- serving user-space
- bh-derivatives
- irq

Lockless is the same primitive regardless of your current state;
it's not the same as spinlocks.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-16 Thread Karim Yaghmour

Hello Roman,

Roman Zippel wrote:
> It seems we first need to specify, what relayfs actually is supposed to 
> be. Is it a relaying mechanism for large amount of data from kernel to 
> user space or is it a general communication channel between kernel and 
> user space? You have to choose one, if you mix contradicting requirements, 
> you'll never get a simple abstraction layer and relayfs will always be a 
> pain to work with.

I think we want to concentrate on the former, though I suspect the latter
will happen eventually. But let's keep our focus on providing a mechanism
for relaying large amounts of data from the kernel to user-space.

> You can make it even simpler by dropping this completely. Every buffer is 
> simply a list of events and you can let ltt write periodically a timer 
> event. In userspace you can randomly seek at buffer boundaries and search 
> for the timer events. It will require a bit more work for userspace, but 
> even large amount of tracing data stays managable.

We already do write a heartbeat event periodically to have readable
traces in the case where the lower 32 bits of the TSC wrap around.
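
To show why the heartbeat keeps traces readable, here is a reader-side sketch that rebuilds a 64-bit time line from 32-bit stamps. It assumes consecutive events (heartbeats included) are always less than 2^32 ticks apart; the struct and function names are hypothetical and this is not the actual ltt reader code.

```c
#include <stdint.h>

/* Reader-side state: the last extended timestamp seen. */
struct toy_clock {
	uint64_t last;
};

/* Extend a 32-bit stamp to 64 bits. Valid only if consecutive events are
 * less than 2^32 ticks apart, which is what a periodic heartbeat ensures. */
static uint64_t toy_extend_tsc(struct toy_clock *c, uint32_t tsc32)
{
	uint64_t high = c->last & ~UINT64_C(0xffffffff);
	uint64_t full = high | tsc32;

	if (full < c->last)              /* low word wrapped since last event */
		full += UINT64_C(1) << 32;
	c->last = full;
	return full;
}
```

Without a heartbeat, two events more than one wrap apart would be indistinguishable from two events a few ticks apart.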

As I mentioned elsewhere, please don't think of this in terms of
KBs or MBs of data. What we're talking about here is GBs if not
100s of GBs of data. Having to read each sub-buffer from its start
until you hit a heartbeat really is a killer for such large traces. If
there was a significant impact on relayfs for having this I would have
understood the argument, but relayfs needs to do buffer management
anyway, so I don't see that much complexity being added by allowing
the channel user to ask relayfs for delimiters.

> Userspace can then easily restore the original order of events.

As above, restoring the original order of events is fine if you are
looking at MBs or KBs of data. It's just totally unrealistic for
the amounts of data we want to handle.

But like I said earlier, the added relayfs mode (kdebug) would allow
for exactly what you are suggesting:
event_id = atomic_inc_return(&event_cnt);
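
A userspace sketch of that approach, assuming each event is tagged from one global counter when logged and that a reader later merges events from several buffers by sorting on the tag. C11 atomic_fetch_add stands in for the kernel's atomic_inc_return; the struct and helper names are hypothetical, and counter wrap-around is ignored.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct toy_event {
	unsigned int seq;    /* global order, taken when the event is logged */
	int cpu;
};

static atomic_uint toy_event_cnt;

/* Userspace analogue of atomic_inc_return(): returns the incremented value. */
static unsigned int toy_next_seq(void)
{
	return atomic_fetch_add(&toy_event_cnt, 1) + 1;
}

static int toy_cmp_seq(const void *a, const void *b)
{
	const struct toy_event *x = a, *y = b;

	return (x->seq > y->seq) - (x->seq < y->seq);
}

/* Merge step done after the fact: sort events pulled from several buffers. */
static void toy_restore_order(struct toy_event *ev, size_t n)
{
	qsort(ev, n, sizeof(*ev), toy_cmp_seq);
}
```

The sort is the part that stops scaling: it is cheap for a short debugging capture and impractical for the trace sizes discussed above.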

So here's the new API based on input from Christoph and Tom:

rchan* relay_open(channel_path, bufsize, nbufs);
int    relay_close(*rchan);
int    relay_reset(*rchan);
int    relay_write(*rchan, *data_ptr, count, **wrote-pos);

int    relay_info(*rchan, *channel_info);
void   relay_set_property(*rchan, property, value);
void   relay_get_property(*rchan, property, *value);

For direct writing (currently already used by ltt, for example):

char*  relay_reserve(*rchan, len, *ts, *td, *err, *interrupting)
void   relay_commit(*rchan, *from, len, reserve_code, interrupting);
void   relay_buffers_consumed(*rchan, u32)

These are the related macros:

#define relay_write_direct(DEST, SRC, SIZE) \
#define relay_lock_channel(RCHAN, FLAGS) \
#define relay_unlock_channel(RCHAN, FLAGS) \

What we are dropping for later review: read/write semantics from
user-space. It has to be understood that we believe that this is
a major drawback. For one thing, you won't be able to do something
like:
$ cat /relayfs/xchg/my-file > ~/test-data

Instead, you will have to write a custom app that does open(),
mmap(), write(). We could still provide a small app/library that
did this automagically, but you've got to admit that nothing
beats the real thing.
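
For a sense of what such a custom app involves, here is a bare-bones sketch that maps a file and writes the mapping out to another file using the open(), mmap(), write() sequence mentioned above. It works on any regular file; how a relayfs channel file would report its size or which offsets would be valid is not covered, and toy_dump_mapped() is a hypothetical name.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy a whole file to dst_path by mapping it and write()-ing the mapping.
 * A short write is treated as an error to keep the sketch small.
 * Returns 0 on success, -1 on any error. */
static int toy_dump_mapped(const char *src_path, const char *dst_path)
{
	struct stat st;
	void *map;
	int in, out, rc = -1;

	in = open(src_path, O_RDONLY);
	if (in < 0)
		return -1;
	if (fstat(in, &st) < 0 || st.st_size == 0)
		goto close_in;
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, in, 0);
	if (map == MAP_FAILED)
		goto close_in;
	out = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out >= 0) {
		if (write(out, map, (size_t)st.st_size) == (ssize_t)st.st_size)
			rc = 0;
		close(out);
	}
	munmap(map, (size_t)st.st_size);
close_in:
	close(in);
	return rc;
}
```

Even this stripped-down version is a good deal more than a one-line cat, which is the usability cost being pointed out.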

Also note that there are people who currently use this already,
so there will be some unhappy campers.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

2005-01-16 Thread Karim Yaghmour

Thomas Gleixner wrote:
> This implies to seperate 
> 
> - infrastructure 
> - event registration
> - transport mechanism

Like I said in my first response: we can't be everything for everybody;
the requirements are just too broad. ISO tried it with OSI. Have a
look at net/* for the result.

Currently, LTT provides the first two in one piece, and relayfs
provides the third. Like I acknowledged earlier, there is room for
generalizing the transport mechanism, and I'm thinking of amending
the relayfs API proposal further and renaming the modes to make them
more straightforward:
- Managed (locking or lockless.)
- Ad-Hoc (which works like Ingo, yourself, and others have requested.)

If you really want to define layers, then there are actually four
layers:
1- hooking mechanism
2- event definition / registration
3- event management infrastructure
4- transport mechanism

LTT currently does 1, 2 & 3. Clearly, as in the mail I referred to
earlier, there is code in the kernel that already does 1, 2, 3,
and 4 in a very hardwired/ad-hoc fashion and there isn't anyone asking
for it to be removed. We're offering 4 separately and are putting
LTT on top of it. If you want to get 1 & 2 separately, have a look
at kernel hooks and genevent:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/
http://www.listserv.shafik.org/pipermail/ltt-dev/2003-January/000408.html

We'd gladly take a serious look at using the former if it was
included, and there is work in progress on making the latter
the standard way of declaring LTT events instead of using a
static ltt-events.h.

Five years ago, there was a discussion about integrating GKHI into
the kernel (the kernel hooks ancestor). Have a look for yourself
as to the response to this suggestion (basically people weren't
ready to accept a generalized hooking mechanism without a defined
set of hooks, and then others didn't like the idea at all because
creating general hooks in the kernel which anybody can register
to creates legal and maintenance problems ... basically it's a
can of worms):
http://marc.theaimsgroup.com/?l=linux-kernel&m=97371908916365&w=2

There's only so much we can push into the kernel at the same time.
Not to mention that before you can be generic, you've got to have
some specific implementation to start working from. I believe
that what we've ironed out through the discussion of the past
two days is a good basis.

There is some irony in all this. For years, we were told that we
couldn't make it into the kernel because we were perceived as
providing a kernel debugging tool, and now that we're starting
to get our things seriously reviewed we're being told that maybe
it ain't really that useful because those who want to do kernel
debugging can't use it as-is ... go figure.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-16 Thread Karim Yaghmour

Thomas Gleixner wrote:
> Which is every 1.42 seconds on a 3GHz machine. I guess we don't have
> GB's of data when the 1.42 seconds elapse without an event.

My argument was about being able to browse the amount of data I was
referring to. The heartbeat thing was an aside to Roman to point out
that we already do what he's suggesting.

> I still don't see the point. The implicit ability of LTT to allow
> tracing of up to 8192 bytes user data, strings and XML makes this
> neccecary. I do not see any neccecarity to integrate this special usage
> modes instead of an generic usable instrumentation implementation.

I've already clarified your mischaracterization of custom events;
you are being disingenuous here. If you want a generalized hooking
mechanism, feel free to ask Andrew to take kernel hooks:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/

> If relayfs is giving those users the ability to do so then they can do
> it, but I object to the fact that LTT/relayfs is occupying the place of a
> more generic implementation in the way it is implemented now.

Again, damned if we do, damned if we don't. LTT isn't meant for kernel
debugging per se, though you can use it to that end to a certain extent.
However, if you are kernel debugging, you will find the ad-hoc mode I'm
talking about adding to relayfs quite useful.

> For normal event tracing you have about 32-64 bytes of data per event. So
> disabling interrupts in order to copy this amount of information into a
> buffer is cheaper on most architectures than doing the whole magic in
> LTT and relayfs. This also keeps your buffers consistent and does not
> need any magic for postprocessing.

Oh, now you want to lighten the weight on postprocessing? Come on, Thomas,
please stop wasting my time.

Note, however, that we are thinking of dropping the lockless scheme
for now. We will pick up this discussion separately further down the
road.

> Sorting out disabled events in the hot path and moving the if
> (pid/gid/grp) whatever stuff into userspace postprocessing is not an
> alien request.

It is. Have you even read what I suggested to change in my other mail:
	if ((any_filtering) && !(ltt_filter(event_id, event_struct, data)))
		return -EINVAL;

You're not honestly telling me that checking for any_filtering is
going to ruin your day.

> You are talking of Gigabytes of data. In what time ?
> 
> Let's do some math.
> 
> For simplicity all events use 64 Byte event space.
> 
> ~ 64kB/sec for 1000 events/s (event frequency   1kHz) ( 1 ms)
> 1024kB/sec for  16 events/ms (event frequency  16kHz) (62 us)
> 2048kB/sec for  32 events/ms (event frequency  32kHz) (31 us)
> 4096kB/sec for  64 events/ms (event frequency  64kHz) (15 us)
> 8192kB/sec for 128 events/ms (event frequency 128kHz) ( 8 us)
> 
> where a 100Mbit network can theoretically transport 10240kB/sec and
> practically does 4000-8000 kB/sec. 
> 
> An event frequency of 8us even on a 3 GHz machine is complete illusion,
> because we spend already a couple of usecs in servicing the legacy 8254
> timer.
> 
> So the realistic assumption on a 3Ghz machine is definitely below 64kHz,
> which means we have to handle max. 4Mb of data per second. 

Actually, on a PII-350MHz, I was already generating 0.5MB/s of data
just by running an X session. If we assume that a machine 10 times
faster generates 10 times as many events, we've already got 5MB/s,
and I'm sure that there are heavier cases than X.

Here's the paper if you want to read it:
http://www.opersys.com/ftp/pub/LTT/Documentation/ltt-usenix.ps.gz

> I'm not impressed. Disabling interrupts for a couple of nano seconds to
> store the trace data in the buffer does not hurt at all. Running through
> a big bunch of out of cache line instructions does.

Like I said above, fighting for/against lockless is not our immediate
goal, and we will likely remove it.

> If you try to trace more than this amount you are toast anyway.
> 
> Please beware me of "reality has bitten" arguments. The whole if(..)
> scenario in _ltt_event_log() is doing postprocessing, which can be done
> in userspace. I don't care about the required time as long as it does
> not introduce additional burden into the kernel.

Not even Ingo hinted at getting rid of filtering. Remember the earlier
e-mail I referred to? Here's what he was suggesting:
> void trace(event, data1, data2, data3)
> {
>         int cpu = smp_processor_id();
>         int idx, pending, *curr = curr_idx + cpu;
>         struct trace_event *t;
>         unsigned long flags;
> 
>         if (!event_wanted(current, event, data1, data2, data3))
>                 return;
> 
>         local_irq_save(flags);
> 
>         idx = ++curr_idx[cpu] & (NR_TRACE_ENTRIES - 1);
>         pending = ++curr_pending[cpu];
> 
>         t = trace_ring[cpu] + idx;
> 
>         t->event = event;
>         rdtscll(t->timestamp);
>         t->data1 = data1;
>         t->data2 = data2;
>         t->data3 = data3;
> 
>         if (curr_pending == TRACE_LOW_WATERMARK &

Re: 2.6.11-rc1-mm1

2005-01-17 Thread Karim Yaghmour

Thomas Gleixner wrote:
> Sorting out disabled events is the filtering you have to do in kernel
> and you should do it in the hot path or remove the unnecessary
> tracepoints at compile time.

Do you actually read my replies or do you just grep for something
you can object to? If you care to read my replies you will see that
this has already been answered.

> You are not answering my argument. 8MB/sec is an event frequency of
> 128kHz when we assume 64byte/event. It's one event every 8us. So every
> unnecessary computation, every leaving the hotpath for nothing is just
> giving you performance loss.

I have, you just choose not to read. Here's what I said earlier:
> Note, however, that we are thinking of dropping the lockless scheme
> for now. We will pick up this discussion separately further down the
> road.

IOW, we will be using cli/sti. So there is no "leaving the hotpath".

> I said:
> 
>>>Sorting out disabled events in the hot path 
> 
> 
> s/Sorting/Filtering/
> 
> I never said this should not be done.

You're either on crack or I don't know how to read English. Here's what
you said:
> Sorting out disabled events in the hot path and moving the if
> (pid/gid/grp) whatever stuff into userspace postprocessing is not an
> alien request.

Clearly you are suggesting moving the filtering into user-space.

> Separating layers as I suggested before is not making it a generic
> debugging tool. It makes parts of those layers available for other usage
> and gives us the chance to reuse the parts for cleaning up already
> available code which has the same hardwired structure.

This has already been answered.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

2005-01-17 Thread Karim Yaghmour

Thomas Gleixner wrote:
> That's the point. Adding another hardwired implementation does not give
> us a possibility to solve the hardwired problem of the already available
> stuff.

Well then, like I said before, you know what you need to do:
http://www-124.ibm.com/developerworks/oss/linux/projects/kernelhooks/

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-17 Thread Karim Yaghmour

Hello Roman,

Roman Zippel wrote:
> Periodically can also mean a buffer start call back from relayfs 
> (although that would mean the first entry is not guaranteed) or a 
> (per cpu) eventcnt from the subsystem. The amount of needed search would 
> be limited. The main point is from the relayfs POV the buffer structure 
> has always the same (simple) structure.

But two e-mails ago, you told us to drop the start_reserve and end_reserve
and move the details of the buffer management into relayfs and out of
ltt? Either we have a callback, like you suggest, and then we need to
reserve some space to make sure that the callback is guaranteed to have
the first entry, or we drop the callback and provide an option to the
user for relayfs to write this first entry for him. Providing a callback
without reservation is no different than relying purely on the heartbeat,
which, like I said before and for the reasons illustrated below, is
unrealistic.

> You have to be more specific, what's so special about this amount of data. 
> You likely want to (incrementally) build an index file, so you don't have 
> to repeat the searches, but even with your current format you would 
> benefit from such an index file.
[snip]
>>As above, restoring the original order of events is fine if you are
>>looking at mbs or kbs of data. It's just totally unrealistic for
>>the amounts of data we want to handle.
> 
> 
> Why is it "totally unrealistic"?

Ok, let's expand a little here on the amount of data. Say you're getting
2MB/s of data (which is not unrealistic on a loaded system.) That means
that if I'm tracing for 2 days, I've got 345GB of data (~7.2GB/hour).
In practice, users aren't necessarily interested in plowing through the
entire 345GB, they just want to view a given portion of it. Now, if I
follow what you are suggesting, I have to go through the entire 345GB to:
a) create indexes, b) reorder events, and likely c) have to rewrite
another 345GB of data. And I haven't yet discussed the kind of problems
you would encounter in trying to reorder such a beast that contains,
by definition, variable-sized events. For one thing, if event N+1 doesn't
follow N, then you would be forced to browse forward until you actually
found it before you could write a properly ordered trace. And it just
takes a few processes that are interrupted and forced to sleep here and
there to make this unusable. That's without the RAM or fs space required
to store those index tables ... At 3 to 12 bytes per event, that's a lot
of space for indexes ...
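
To make the arithmetic above concrete, here is a small C sketch. The
2MB/s rate and the two-day window come from the paragraph above; the
10 bytes/event average is an assumption (it is the figure used
elsewhere in this thread), and the function names are made up for the
example.

```c
#include <assert.h>

/* Total trace size in megabytes (10^6 bytes) for a steady data rate. */
static unsigned long long trace_volume_mb(unsigned long long mb_per_sec,
                                          unsigned long long seconds)
{
    return mb_per_sec * seconds;
}

/* Number of events in a trace, given an assumed average event size. */
static unsigned long long trace_event_count(unsigned long long total_bytes,
                                            unsigned long long avg_event_bytes)
{
    assert(avg_event_bytes > 0);
    return total_bytes / avg_event_bytes;
}
```

With these inputs, two days come to 345,600 MB (about 345GB), one hour
to 7,200 MB, and the trace holds roughly 34.6 billion events. Under the
same assumptions, 3 to 12 bytes of index per event would be on the
order of 100GB to 415GB of index data.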

If I keep things as they are with ordered events and delimiters on buffer
boundaries, I can skip to any place within this 345GB and start processing
from there.
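
As a rough illustration of why delimiters on buffer boundaries make
this kind of seeking cheap, here is a userspace sketch. It assumes
fixed-size sub-buffers that each begin with a 64-bit start timestamp;
that layout and the function names are assumptions for the example,
not the actual LTT trace format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: every sub-buffer begins with a 64-bit start timestamp. */
static uint64_t subbuf_start_ts(const unsigned char *trace, size_t subbuf_size,
                                size_t index)
{
    uint64_t ts;

    memcpy(&ts, trace + index * subbuf_size, sizeof(ts));
    return ts;
}

/*
 * Return the index of the last sub-buffer whose start timestamp is <= target,
 * or 0 if the target precedes the whole trace. Only the delimiters are read;
 * no event between them is parsed.
 */
static size_t find_subbuf(const unsigned char *trace, size_t subbuf_size,
                          size_t nr_subbufs, uint64_t target)
{
    size_t lo = 0, hi = nr_subbufs;   /* search window is [lo, hi) */

    assert(nr_subbufs > 0 && subbuf_size >= sizeof(uint64_t));
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;

        if (subbuf_start_ts(trace, subbuf_size, mid) <= target)
            lo = mid;
        else
            hi = mid;
    }
    return lo;
}
```

Because events are already ordered, a binary search over the
delimiters touches only a logarithmic number of sub-buffers, and event
parsing can start at the one it returns.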

And that's for two days. If you're a sysadmin encountering a transient
problem on a server, you may actually want more than that.

>>But like I said earlier, the added relayfs mode (kdebug) would allow
>>for exactly what you are suggesting:
>>  event_id = atomic_inc_return(&event_cnt);
> 
> 
> Actually that would be already too much for low level kernel debugging.
> Why do you want to put this into relayfs?

I don't. I was just saying that with the adhoc mode, a relayfs client
could use the code snippet you were suggesting.

> What are the _specific_ reasons you need these various modes, why can't 
> you build any special requirements on top of a very light weight relay 
> mechanism?

Because of the opposite requirements.

Here are the two modes I'm suggesting in relayfs and how they operate:

Managed:
- Presumes active user-space daemon interested in catching _all_ events.
- Allows N buffers in buffer ring
- Provides limit-checking (callback on end of sub-buffer)
- Provides buffer delimiters (writes timestamp at beg and end)
- Suited for all types of event sizes (both fixed and variable) at
  very high frequency.
- Daemon is woken up when buffer is ready for writing, executes a
  write() on an mmaped area and notifies relevant kernel subsystem,
  which in turn notifies relayfs that buffer can now be reused.
- Relies on proper abstraction of cli/sti.

Ad-Hoc:
- Presumes transient userspace tool interested in event snapshots.
- Single circular buffer.
- No limits checking (or very basic: as in stop if overwrite).
- No buffer delimiters.
- Best suited for fixed-size events at extreme high frequency.
- User-space tool simply does a write() on an mmaped area and
  exits or goes back to sleep.
- Relies on proper abstraction of cli/sti.
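
For illustration only, the ad-hoc mode described above could look
something like the following userspace sketch: a single circular
buffer of fixed-size events, no delimiters, and no limit checking
beyond wrapping. All names and sizes here are hypothetical, and the
interrupt disabling a kernel version would need is left out.

```c
#include <assert.h>
#include <stdint.h>

#define ADHOC_NR_ENTRIES 8u   /* assumed size; must be a power of two */

struct adhoc_event {          /* hypothetical fixed-size event record */
    uint32_t id;
    uint32_t data;
};

struct adhoc_ring {
    struct adhoc_event slot[ADHOC_NR_ENTRIES];
    uint32_t next;            /* total number of events written so far */
};

/* Shortest path to the buffer: one mask, one store, no limit checks. */
static void adhoc_write(struct adhoc_ring *r, uint32_t id, uint32_t data)
{
    struct adhoc_event *e = &r->slot[r->next & (ADHOC_NR_ENTRIES - 1)];

    e->id = id;
    e->data = data;
    r->next++;
}

/* Snapshot helper: fetch the i-th oldest event still in the ring. */
static struct adhoc_event adhoc_read(const struct adhoc_ring *r, uint32_t i)
{
    uint32_t held = r->next < ADHOC_NR_ENTRIES ? r->next : ADHOC_NR_ENTRIES;
    uint32_t oldest = r->next - held;

    assert(i < held);
    return r->slot[(oldest + i) & (ADHOC_NR_ENTRIES - 1)];
}
```

The write path has no conditionals at all, which is the point of
keeping this mode separate from the managed one.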

Basically, the ad-hoc mode abides by the KISS principle, whereas
the managed mode is a more elaborate one for clients like LTT.

Rhetorical: Couldn't the ad-hoc mode be a special case of the
managed mode? In theory yes, in practice no. The various conditionals
and code paths for switching buffers, invoking callbacks, writing
delimiters and the like, which make this mode useful to clients like
LTT, will always be a problem for those seeking the shortest path to
buffer committal. In the case of Ingo, for example, I'm sure he'd

Re: 2.6.11-rc1-mm1

2005-01-17 Thread Karim Yaghmour

Hello Christoph,

Christoph Hellwig wrote:
> The thing I'm unhappy with is what the code does currently.  I haven't
> looked at the code enough nor thought about the problem enough to tell
> you what's the right thing to do.  Knowing that will involve review of
> the architecture and serious benchmarking on a few platforms.

Like I was saying elsewhere, we are likely going to drop the lockless
code for now (i.e. the code that does the cmpxchg). Instead we will
depend on normal cli/sti abstractions.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-17 Thread Karim Yaghmour

Thomas Gleixner wrote:
> I know, what I have said. I said reduce the filtering to the absolute
> minimum and do the rest in userspace.

You keep adopting the interpretation which best suits you, taking
quotes out of context, and keep repeating things that have already
been answered. There are limits to one's patience.

What you did is change your position twice. It's there for anyone to see.

> The now builtin filters are defined to fit somebody's needs or idea of
> what the user should / wants to see. They will not fit everybody's
> needs / ideas. So we start modifying, adding and #ifdefing kernel
> filters, which is a scary vision.

Ah, finally. Here's an actual suggestion. _IF_ you want, I'll just
export a ltt_set_filter(*callback) and rewrite the if in
_ltt_log_event() to:
if ((ltt_filter != NULL) && !( [...]

> Enabling and disabling events is a valid basic filter request, which
> should live in the kernel. Anything else should go into userspace, IMO.

What you are suggesting is that a system administrator who wants to
monitor his sendmail server over a period of three weeks should
just postprocess 1.8TB (1MB/s) of data because Thomas Gleixner didn't
like the idea of kernel event filtering based on anything but events.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

2005-01-17 Thread Karim Yaghmour

Thomas Gleixner wrote:
> If we add another hardwired implementation then we do not have said
> benefits.

Please stop handwaving. Folks like Andrew, Christoph, Zwane, Roman,
and others actually made specific requests for changes in the code.
What makes you think you're so special that you are entitled to
stay on the side and handwave about concepts?

If there is a limitation with the code, please present actual
snippets that need to be changed and suggest alternatives. That's
what everyone else does on this list.

If you want to clean-up the existing tracing code in the kernel,
then here are some ltt calls you may be interested in:
int ltt_create_event(char *event_type,
                     char *event_desc,
                     int format_type,
                     char *format_data);
int ltt_log_raw_event(int event_id, int event_size, void *event_data);

And here's an actual example:
...
  delta_id = ltt_create_event("Delta",
                              NULL,
                              CUSTOM_EVENT_FORMAT_TYPE_HEX,
                              NULL);
...
  ltt_log_raw_event(delta_id, sizeof(a_delta_event), &a_delta_event);
...
  ltt_destroy_event(delta_id);

You can then use LibLTT to read the trace and extract your custom
events and format your binary data as it suits you.

Save the bandwidth and start cleaning.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-17 Thread Karim Yaghmour

Hello Roman,

Roman Zippel wrote:
> An additional comment about the order of events. What you're doing in 
> lockless_reserve is bogus anyway. There is no single correct time to 
> write into the event. By artificially synchronizing event order and event 
> time you only cheat yourself. You either take it into account during 
> postprocessing that events can be interrupted or the time stamp doesn't 
> seem to be that important, but there is nothing you can do during the 
> recording of the event except of completely disabling interrupts.

Correct and like I said before, we are dropping the lockless scheme.
Ergo, disabling interrupts we will.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-17 Thread Karim Yaghmour

Thomas Gleixner wrote:
> Provide a hook, export it and load your filters as a module, but keep
> the filters out of the mainline kernel code. 

Great idea! I will do exactly that.
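
A minimal userspace sketch of what such a hook might look like follows.
Only the name ltt_set_filter() comes from this thread; the callback
signature, the demo_ helpers and the even-ids policy are assumptions
made up for the example, and the sketch ignores the locking a real
kernel would need when a filter module is unloaded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumed callback signature; returns nonzero to keep the event. */
typedef int (*ltt_filter_fn)(int event_id, const void *event_struct);

static ltt_filter_fn ltt_filter;      /* NULL means "no filter loaded" */
static unsigned long demo_logged;     /* hypothetical counter for the demo */

/* The exported hook: a module would call this at load and unload time. */
static void ltt_set_filter(ltt_filter_fn callback)
{
    ltt_filter = callback;
}

/* Core logging path: knows nothing about any particular filtering policy. */
static int demo_log_event(int event_id, const void *event_struct)
{
    if (ltt_filter != NULL && !ltt_filter(event_id, event_struct))
        return -EINVAL;

    demo_logged++;   /* stands in for writing the event to the trace */
    return 0;
}

/* Example policy living outside the core: keep only even event ids. */
static int demo_even_only(int event_id, const void *event_struct)
{
    (void)event_struct;
    return (event_id % 2) == 0;
}
```

With no filter registered the core path pays for a single NULL test,
and the policy itself stays out of the mainline code.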

Thanks,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-17 Thread Karim Yaghmour

Hello Roman,

Roman Zippel wrote:
> Why is so important that it's at the start of the buffer? What's wrong 
> with a special event _near_ the start of a buffer?
[snip]
> What gives you the idea, that you can't do this with what I proposed?
> You can still seek freely within the data at buffer boundaries and you 
> only have to search a little into the buffer to find the delimiter. Events 
> are not completely at random, so that the little reordering can be done at 
> runtime. Sorry, but I don't get what kind of unsolvable problems you see 
> here.

Actually I just checked the code and this is a non-issue. The callback
can only be called when the condition is met, which itself happens only
on buffer switch, which itself only happens when we try to reserve
something bigger than what is left in the buffer. IOW, there is no need
for reserving anything. Here's what the code does:
	if (!finalizing) {
		bytes_written = rchan->callbacks->buffer_start ...
		cur_write_pos(rchan) += bytes_written;
	}

With that said, I hope we've agreed that we'll have a callback for
letting relayfs clients know that they need to write the beginning of
the buffer event. There won't be any associated reserve. Conversely,
I hope it is not too much to ask to have an end-of-buffer callback.
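
To show why no reservation is needed, here is a simplified userspace
sketch of the switch logic. It is not the relayfs code: the structure,
the sizes and the demo_ names are all assumptions, and locking or
interrupt disabling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 32u   /* assumed sizes, chosen small for illustration */
#define NR_SUBBUFS  2u

struct demo_chan {
    unsigned char buf[NR_SUBBUFS][SUBBUF_SIZE];
    size_t cur;        /* index of the sub-buffer being written */
    size_t offset;     /* write position inside that sub-buffer */
    /* Client hook: writes its start-of-buffer event, returns its size. */
    size_t (*buffer_start)(unsigned char *start);
};

/* Switch to the next sub-buffer and let the client write its first entry. */
static void demo_switch(struct demo_chan *c)
{
    c->cur = (c->cur + 1) % NR_SUBBUFS;
    c->offset = 0;
    if (c->buffer_start)
        c->offset += c->buffer_start(c->buf[c->cur]);
}

/* Reserve len bytes; a switch happens only when len does not fit. */
static unsigned char *demo_reserve(struct demo_chan *c, size_t len)
{
    unsigned char *p;

    if (len > SUBBUF_SIZE - c->offset)
        demo_switch(c);
    if (len > SUBBUF_SIZE - c->offset)
        return NULL;   /* event larger than a sub-buffer minus its header */

    p = c->buf[c->cur] + c->offset;
    c->offset += len;
    return p;
}

/* Hypothetical client callback: a 4-byte marker as the start event. */
static size_t demo_start_event(unsigned char *start)
{
    memcpy(start, "STRT", 4);
    return 4;
}
```

The callback runs right after the switch, while the new sub-buffer is
still empty, so its event is the first entry without any space having
been set aside for it.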

> Wrong question. What compromises can be made on both sides to create a 
> common simple framework? Your unwillingness to compromise a little on the 
> ltt requirements really amazes me.

Roman, of all people I've been more than happy to change my stuff following
your recommendations. Do I have to list how far down relayfs has been
stripped down? I mean, we got rid of the lockless scheme (which was
one of ltt's explicit requirements), we got rid of the read/write capabilities
for user-space, etc. And we are now only left with the bare-bones API:
rchan* relay_open(channel_path, bufsize, nbufs, flags, *callbacks);
int    relay_close(*rchan);
int    relay_reset(*rchan);
int    relay_write(*rchan, *data_ptr, count, **wrote-pos);

char*  relay_reserve(*rchan, len, *ts, *td, *err, *interrupting);
void   relay_commit(*rchan, *from, len, reserve_code, interrupting);
void   relay_buffers_consumed(*rchan, u32);

#define relay_write_direct(DEST, SRC, SIZE) \
#define relay_lock_channel(RCHAN, FLAGS) \
#define relay_unlock_channel(RCHAN, FLAGS) \

This is a far cry from what we had before, have a look at the
relayfs.txt file in 2.6.11-rc1-mm1's Documentation/filesystems if
you want to compare. Please at least acknowledge as much.

I'm more than willing to compromise, but at least give me something
substantive to feed on. I've explained why I believe there needs to be
two modes for relayfs. If you don't think they are appropriate, then
please explain why. Either my experience blinds me or it rightly
compels me to continue defending it.

You ask what compromises can be found from both sides to obtain a
single implementation. I have looked at this, and given how
stripped down it has become, anything less from relayfs will make
it useless for LTT. IOW, I would have to reimplement a buffering
scheme within LTT outside of relayfs.

Can't you see that not all buffering schemes are adapted to all
applications and that it's preferable to have a single API
transparently providing separate mechanisms instead of a single
mechanism that doesn't satisfy any of its users?

If I can't convince you of the concept, can I at least convince
you to withhold your final judgement until you actually see the
code for the managed vs. ad-hoc schemes?

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-17 Thread Karim Yaghmour

Aaron Cohen wrote:
>   I've got a quick question and I just want to be clear that it
> doesn't have a political agenda behind it.

:)

> Here goes, why can't LTT and/or relayfs, work similar to the way
> syslog does and just fill a buffer (aka ring-buffer or whatever is
> appropriate), while a userspace daemon of some kind periodically reads
> that buffer and massages it.  I'm probably being naive but if the
> difficulty is with huge several hundred-gig files, the daemon if it
> monitors the buffer often enough could stuff it into a database or
> whatever high-performance format you need.

Because of the bandwidth, live processing of any kind is not possible.
The only thing the daemon can possibly do is write large blocks of
tracing info to disk as rapidly as possible.

>  It also seems to me that Linus' nascent "splice and tee" work would
> be really useful for something like this to avoid a lot of unnecessary
> copying by the userspace daemon.

There is no copying by the userspace daemon. All it does is open(),
then mmap(), and then it sleeps until it is woken up by the ltt
kernel subsystem. When that happens, it only does a write() on the
mmaped area, tells the ltt subsystem that it committed X number of
sub-buffers and goes back to sleep. This is all zero-copy.
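
As a rough sketch of that write() step, assuming POSIX and a plain
file standing in for the channel: the paths, function names and
sub-buffer layout below are placeholders rather than the real daemon
code, and the wake-up and the notification back to the kernel are left
out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write sub-buffer `index` of a mapped channel straight to out_fd.
 * The daemon never copies the data into a buffer of its own: write()
 * is handed a pointer into the mapping.
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t flush_subbuf(const unsigned char *map, size_t subbuf_size,
                            size_t index, int out_fd)
{
    const unsigned char *p = map + index * subbuf_size;
    size_t done = 0;

    while (done < subbuf_size) {
        ssize_t n = write(out_fd, p + done, subbuf_size - done);

        if (n < 0)
            return -1;
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Map a channel file read-only; chan_path is a placeholder path. */
static const unsigned char *map_channel(const char *chan_path, size_t len)
{
    int fd = open(chan_path, O_RDONLY);
    void *map;

    if (fd < 0)
        return NULL;
    map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after close() */
    return map == MAP_FAILED ? NULL : map;
}
```

The loop around write() only handles short writes; there is no
intermediate buffer anywhere in the daemon's own code.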

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

2005-01-18 Thread Karim Yaghmour

Thomas,

Thomas Gleixner wrote:
> Yes, I did already start cleaning
> 
> cat ../broken-out/ltt* | patch -p1 -R

:D

If it gives you a warm and fuzzy feeling to have the last
cheap-shot, then I'm all for it, it is of no consequence anyway.
And _please_ don't forget to answer this very email with
something of the same substance.

For my part I consider that I've invested a substantial amount
of time in responding to both your conceptual and practical
feedback, as the archives clearly show.

That being said, I have to thank you for making sure that all
the obvious questions have been asked. I now have more than a
dozen archive links of my answers to those. They'll sure come in
handy when writing an FAQ.

Thanks again,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546



Re: 2.6.11-rc1-mm1

2005-01-18 Thread Karim Yaghmour

Tom Zanussi wrote:
> I have to disagree.  Awhile back, if you remember, I posted a patch to
> the LTT daemon that would monitor the trace stream in real time, and
> process it using an embedded Perl interpreter, no less:
> 
> http://marc.theaimsgroup.com/?l=linux-kernel&m=109405724500237&w=2
> 
> It didn't seem to have any problems keeping up with the trace stream
> even though it was monitoring all LTT event types (and a couple of
> others - custom events injected using kprobes) and not doing any
> filtering in the kernel, through kernel compiles, normal X traffic,
> etc.  I don't know what volume of event traffic would cause this model
> to break down, but I think it shows that at least some level of
> non-trivial live processing is possible...

Good Point.

My bad. Thanks for bringing this up. Obviously this didn't get as
much attention as it should've had the last time it was posted,
especially as it allows very easy scripting of filtering in userspace.
That email you refer to is pretty loaded and I'm sure those who
are interested will dig through it. But in the interest of helping
everyone get a rapid understanding of what it does and how it does it,
can you break it down into a short description, possibly with a
diagram? I'm sure many will find this very interesting.

Thanks,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

2005-01-19 Thread Karim Yaghmour

Werner Almesberger wrote:
> From all I've heard and seen of LTT (and I have to admit that most
> of it comes from reading this thread, not from reading the code),

Might I add that this is part of the problem ... No personal
offence intended, but there's been _A LOT_ of things said about
LTT that were based on third-hand account and no direct contact
with the toolset/code. And part of the problem is that _many_
people on this list, and elsewhere, have done some form of
tracing or another as part of their development, so they all
have their idea of how this is best done. Yet, while such
experience can help provide additional ideas to LTT's development,
it also often requires re-explaining to every new suggestor why we
added features he couldn't imagine would be useful to any of
his/her own tracing needs ... Sometimes I wish my interests lay
in some arcane feature that few had ever played with ;)

IOW, while I don't discount anybody else's experience with tracing,
please give us at least the benefit of the doubt by actually:
a) Looking at the code
b) Looking at the mailing list archives
c) Asking us questions directly related to the code

> I have the impression that it may try to be a bit too specialized,
> and thus might miss opportunities for synergy. 

Bear with me on this one ...

> You must be getting tired of people trying to redesign things from
> scratch, but maybe you'll humor me anyway ;-)

Hey, from you Werner I'll take anything. It's always a pleasure
talking with you :)

> Karim Yaghmour wrote:
> 
>>If you really want to define layers, then there are actually four
>>layers:
>>1- hooking mechanism
>>2- event definition / registration
>>3- event management infrastructure
>>4- transport mechanism
> 
> 
> For 1, kprobes would seem largely sufficient. In cases where you
> don't have a usable attachment point (e.g. in the middle of a
> function and you need access to variables with unknown location),
> you can add lightweight instrumentation that arranges the code
> flow suitably. [1, 2]

Let me say outright, as I said to Andi early on in the sister thread,
that I have no problems with having the trace points being fed by
kprobes. In fact, in 2000, way back before kprobes even existed, LTT
was already interfacing with DProbes for dynamic insertion of trace
points.

... There I said it ... now watch me have to repeat this yet again
later on ... :/

However, kprobes is not magic:
a) Like I said to Andi:
> As far as kprobes go, then you still need to have some form or another
> of marking the code for key events, unless you keep maintaining a set
> of kprobes-able points separately, which really makes it unusable for
> the rest of us, as the users of LTT have discovered over time (having
> to create a new patch for every new kernel that comes out.)

b) Like I said to Andrew back in July:
> I've double-checked what I already knew about kprobes and have looked again
> at the site and the patch, and unless there's some feature of kprobes I don't
> know about that allows using something else than the debug interrupt to add
> hooks,
...
> Generating new interrupts is simply unacceptable for LTT's functionality.
> Not to mention that it breaks LTT because tracing something will generate
> events of its own, which will generate tracing events of their own ...
> recursion.

Ok, you can argue about the recursion thing with an "if()", but you'll
have to admit that like in the case I described to Roman:
> ... Say you're getting
> 2MB/s of data (which is not unrealistic on a loaded system.) That means
> that if I'm tracing for 2 days, I've got 345GB of data (~7.2GB/hour).
IOW, something like 200,000events/s (average of 10bytes/event). Do I
really need to explain that 200,000 traps/interrupts per second is
not something you want ... ?

But don't despair, like I said to Andi:
> So lately I've been thinking that there may be a middle-ground here
> where everyone could be happy. Define three states for the hooks:
> disabled, static, marker. The third one just adds some info into
> System.map for allowing the automation of the insertion of kprobes
> hooks (though you would still need the debugging info to find the
> values of the variables that you want to log.) Hence, you get to
> choose which type of poison you prefer. For my part, I think the
> noop/early-check should be sufficient to get better performance from
> the existing hook-set.
I have received very little feedback on this suggestion, though I
really think it's worth entertaining, especially with your mention
of uml-sim markers further below.
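
One way to picture the three states is a compile-time switch on the
hook macro, as in the sketch below. Everything in it is hypothetical
(the macro, the mode names, the demo_ functions); in particular, the
marker state here only records a name at run time, whereas the proposal
above is about annotating System.map.

```c
#include <assert.h>
#include <stddef.h>

#define HOOK_DISABLED 0   /* compiles to nothing                 */
#define HOOK_STATIC   1   /* direct call to the tracing function */
#define HOOK_MARKER   2   /* only records where a probe could go */

#ifndef HOOK_MODE
#define HOOK_MODE HOOK_STATIC   /* assumed default for this sketch */
#endif

static int demo_trace_calls;          /* hypothetical stand-in for the tracer */
static const char *demo_last_marker;  /* stand-in for a System.map annotation */

static void demo_trace(int event_id)
{
    (void)event_id;
    demo_trace_calls++;
}

#if HOOK_MODE == HOOK_DISABLED
#define TRACE_HOOK(name, id) do { } while (0)
#elif HOOK_MODE == HOOK_STATIC
#define TRACE_HOOK(name, id) demo_trace(id)
#else
#define TRACE_HOOK(name, id) do { demo_last_marker = (name); } while (0)
#endif

/* An instrumented function: the hook sits mid-function, not at its boundary. */
static int demo_schedule(int prev, int next)
{
    int switched = prev != next;

    TRACE_HOOK("sched_change", 1);
    return switched;
}
```

The instrumented code is written once; which "poison" it carries is
decided by the build configuration.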

As for the location of ltt trace points, they are very rarely
at function boundaries. Here's a classic:
prepare_arch_

Re: [RFC] Instrumentation (was Re: 2.6.11-rc1-mm1)

2005-01-20 Thread Karim Yaghmour

Werner Almesberger wrote:
>  - if the probe target is an instruction long enough, replace it with
>a jump or call (that's what I think the kprobes folks are working
>on. I remember for sure that they were thinking about it.)

I heard about this years ago, but I don't know that anything came of
it. I suspect that this is not as simple as it looks and that the
only reliable way to do it is with a trap.

> Probably because everybody saw that it was good :-)

Great, thanks. That's what we'll aim for then. We've already got
the "disable" and "static" implemented, so now we need to figure
out how do we best implement this tagging. IBM's kernel hooks
allowed the NOP solution, so I'm guessing it shouldn't be that
much of a stretch to extend it for marking up the code for kprobes
and friends. I don't know whether this code is still maintained or
not, but I'd like to hear input as to whether this is a good basis,
or whether you're thinking of something like your uml-sim hooks?

> So you need seeking, even in the presence of fine-grained control
> over what gets traced in the first place ? (As opposed to extracting
> the interesting data from the full trace, given that the latter
> shouldn't contain too much noise.)

The problem is that you don't necessarily know beforehand what's
the problem. So here's an actual example:

I had a client who had this box on which a task was always getting
picked up by the OOM killer. Try as they might, the development
team couldn't figure out which part of the code was causing this.
So we put LTT in there and in less than 5 minutes we found the
problem. It turned out that a user-space access to a memory-mapped
FPGA caused an unexpected FP interrupt to occur, and the application
found itself in a recursive signal handler. In this case there was
an application symptom, but it was a hardware problem.

This is just a simple example, but there are plenty of other
examples where a sysadmin will be experiencing some weird,
hard-to-reproduce bugs on some of his systems and he'll spend
a considerable amount of time trying to guess what's happening.
This is especially complicated when there's no indication as to
what's the root of the problem. So at that point being able to
log everything and being able to rapidly browse through it is
critical.

Once you've done such a first trace you _may_ _possibly_ be
able to refine your search requirements and relog with that in
mind, but that's after the fact.

> Or that they have been consumed. My question is just whether this
> kind of aggregation is something you need.

Absolutely. If you're thinking about short 100kb or MBs traces,
then a simpler scheme would be possible. But when we're talking
about GB and 100GBs spanning days, there's got to be a managed
way of doing it.

>>I have nothing against kprobes. People keep referring to it as if
>>it magically made all the related problems go away, and it doesn't.
> 
> 
> Yes, I know just too well :-) In umlsim, I have pretty much the
> same problems, and the solutions aren't always nice. So far, I've
> been lucky enough that I could almost always find a suitable
> function entry to abuse.

Glad you acknowledge as much.

> However, since a kprobes-based mechanism is - in the worst case,
> i.e. when needing markup - as good as direct calls to LTT, and gives
> you a lot more flexibility if things aren't quite as hostile, I
> think it makes sense to focus on such a solution.

You certainly have a lot more experience than I do with that, so
I'd like to solicit your help. As above: what's the best way to
provide this in addition to the static and disable points?

> Yup, but you could move even more intelligence outside the kernel.
> All you really need in the kernel is a place to put the probe,
> plus some debugging information to tell you where you find the
> data (the latter possibly combined with gently coercing the
> compiler to put it at some accessible place).

Right, but then you end up with a mechanism with generalized hooks.
Actually there was a time when LTT was a driver and you could
either build it as a module or keep it built-in. However, when
we published patches to get LTT accepted in 2.5 we were told on
LKML to move LTT into kernel/ and avoid all this driver stuff.
Having it, or parts of it, in the kernel makes it much simpler
and much more likely that the existing ad-hoc tracing code
spread across the sources will be removed in exchange for a
single agreed upon way of doing things.

It must be said that like I had done with relayfs, the LTT patch
will go through a major redux and I will post the patches for
review like before on LKML.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546

Re: [RFC] tracepipe -- event streams, debugfs, and pipe_buffers

2005-01-20 Thread Karim Yaghmour

Zach Brown wrote:
> Thoughts?  I, for one, am tired of writing throw-away per-cpu tracing
> patches ;)

Have you taken a look at relayfs and ltt?

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] relayfs redux for 2.6.10: lean and mean

2005-01-20 Thread Karim Yaghmour

Greg KH wrote:
> Hm, how about this idea for cutting about 500 more lines from the code:
> 
> Why not drop the "fs" part of relayfs and just make the code a set of
> struct file_operations.  That way you could have "relayfs-like" files in
> any ram based file system that is being used.  Then, a user could use
> these fops and assorted interface to create debugfs or even procfs files
> using this type of interface.
> 
> As relayfs really is almost the same (conceptually wise) as debugfs as
> far as concept of what kinds of files will be in there (nothing anyone
> would ever rely on for normal operations, but for debugging only) this
> keeps users and developers from having to spread their debugging and
> instrumenting files from across two different file systems.

However this assumes that the users of relayfs are not going to want
it during normal system operation. This is an assumption that fails
with at least LTT as it is targeted at sysadmins, application developers
and power users who need to be able to trace their systems at any time.

I don't mind piggy-backing off another fs, if it makes sense, but
unlike debugfs, relayfs is meant for general use, and all files in there
are of the same type: relay channels for dumping huge amounts of data
to user-space. It seems to me the target audience and basic idea (relay
channels only in the fs) are different, but let me know if there's a
compelling argument for doing this in another way without making it too
confusing for users of those special "files" (IOW, when this starts
being used in distros, it'll be more straightforward for users to
understand if all files in a mounted fs behave a certain way than if
they have certain "odd" files in certain directories, even if it's
/proc.)

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [RFC] tracepipe -- event streams, debugfs, and pipe_buffers

2005-01-20 Thread Karim Yaghmour

Zach Brown wrote:
> Only briefly.  They've always seemed more involved than the sort of
> thing I was after.  I'll try and sit down and investigate in more detail.

There's definitely an opportunity for interfacing here. If nothing else,
this clearly shows the interest for the kind of things both relayfs and
ltt attempt to achieve.

So here are a few comments regarding the implementation and how this
relates to the stuff I'm working on.

> While it's running the kernel subsystem can send binary blobs, less than
> the length of a page, down this channel.  The blobs are copied into
> per-cpu lists of pages.  Cutesy little headers with get_cycles() and the
> cpu id are prepended to each blob.  The traces are only recorded if user
> space has open references to the file.

In the case of LTT, we just open one relay channel per cpu. This avoids
having to write the CPUID to the trace, that's 2 bytes less per event,
and also avoids any need for synchronization.

As for get_cycles(), some architectures don't have anything useful to
give. Here's for ARM (include/asm-arm/timex.h):
static inline cycles_t get_cycles (void)
{
	return 0;
}

In the case of LTT, we just use the, albeit expensive, do_gettimeofday
when hardware counters aren't there (currently all non-x86 tracing does
this, but this should be fixed.) Also, in the case of the x86 at least,
we just write the lower 32-bits of the TSC, so that's 4 bytes less per
event. Instead, we use the buffer_start and buffer_end callbacks provided
by relayfs to write a header and footer containing full do_gettimeofday
value and TSC value.
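
For what it's worth, a reader of such a trace can rebuild full
timestamps from the truncated ones. The following is a minimal sketch,
assuming consecutive events in a buffer are always less than 2^32
cycles apart and that the buffer header supplies the first full value;
tsc_expand is a made-up name, not an LTT function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rebuild a full 64-bit timestamp from the 32-bit value stored with an
 * event. `prev` is the last full timestamp seen (initially the one from
 * the buffer header). Assumes fewer than 2^32 cycles elapsed since `prev`.
 */
uint64_t tsc_expand(uint64_t prev, uint32_t low)
{
    /* Start with the upper half of `prev` and the event's lower half. */
    uint64_t full = (prev & ~(uint64_t)0xffffffffu) | low;

    if (full < prev)                 /* lower 32 bits wrapped since `prev` */
        full += (uint64_t)1 << 32;
    return full;
}
```

Feeding each result back in as `prev` for the next event walks the whole
buffer; the footer's full value can then be used as a sanity check.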

> As the pages fill they're kicked off to a work_struct worker who puts
> them in the bufs[] array in the debugfs pipe file.  Userspace can then
> do whatever it wants with the data via the pipe.  One can imagine it
> wanting to splice() these pages to disk in huge batches, or perhaps some
> zero-copy network card, etc.  I've only tested this so far as verifying
> that 'cat' is able to push data into a regular file.

It seems to me that while this is a nice use of pipes, it isn't as fast
as ram-locked pages. Basically relayfs does the bttv driver magic (or
what used to be done in there, I haven't checked what they do lately.)
Basically, we allocate pages, lock them into ram and remap them for use
as a single memory area. No caching necessary. It goes from the buffer
to whatever media you want (disk, network, etc.) IOW, user-space does
an open(), mmap(), write(). Also, the channels exist whether user-space
has done an open or not. That's good for flight-recording.

Looking at the code:

- tracepipe_event() does a get_cpu()/put_cpu() for protecting the
writing to the buffer. What about tracing within an interrupt?
local_irq_save()?

- I hadn't thought of doing something like this to write the header:
+   hdr = tcpu->next_region;
+   hdr->cycles = get_cycles();
+   hdr->cpu = cpu;
I will replace some of the memcpy() code in LTT with something like this.

- From what I assume is a "wishlist":
+ * - actually communicate missed to userspace

Already done in LTT.

+ * - how to specify wrapping or dropping

relayfs provides RELAY_MODE_CONTINUOUS and RELAY_MODE_NO_OVERWRITE.

+ * - non-temporal stores into bufs

The latest relayfs code doesn't care about timestamps. It's its
clients' job to do that (e.g. ltt).

+ * - let caller reserve space and get a pointer into buf

This is the relevant relayfs function:
char* relay_reserve(struct rchan *rchan, u32 len, int *err, int *interrupting)
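
To illustrate the "wrapping or dropping" item and the "missed" count,
here is a toy ring of fixed-size slots with two modes. It only mirrors
the idea behind RELAY_MODE_CONTINUOUS and RELAY_MODE_NO_OVERWRITE; the
types and functions (struct ring, ring_put, ring_get) are hypothetical
and this is not how relayfs lays out its buffers.

```c
#include <assert.h>
#include <stddef.h>

enum ring_mode { RING_CONTINUOUS, RING_NO_OVERWRITE };

#define RING_SLOTS 4

struct ring {
    enum ring_mode mode;
    int    slot[RING_SLOTS];
    size_t head;   /* total events accepted so far */
    size_t tail;   /* total events consumed or overwritten so far */
    size_t lost;   /* events dropped or overwritten (reportable to userspace) */
};

/* Returns 1 if the event was stored, 0 if it was dropped. */
int ring_put(struct ring *r, int event)
{
    if (r->head - r->tail == RING_SLOTS) {        /* ring is full */
        if (r->mode == RING_NO_OVERWRITE) {
            r->lost++;                            /* drop the new event */
            return 0;
        }
        r->tail++;                                /* overwrite the oldest */
        r->lost++;
    }
    r->slot[r->head % RING_SLOTS] = event;
    r->head++;
    return 1;
}

/* Returns 1 and fills *event if something was pending, else 0. */
int ring_get(struct ring *r, int *event)
{
    if (r->head == r->tail)
        return 0;
    *event = r->slot[r->tail % RING_SLOTS];
    r->tail++;
    return 1;
}
```

In continuous mode the reader always sees the most recent events (good
for flight recording); in no-overwrite mode it sees the oldest ones and
the writer is the one who loses data. Either way `lost` tells user-space
how much is missing.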

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-20 Thread Karim Yaghmour

OK, I finally come around to answering this ...

Roman Zippel wrote:
> Sorry, you misunderstood me. At the moment I'm only secondarily
> interested in the API details, primarily I want to work out the details of 
> what exactly relayfs/ltt are supposed to do. One main question here I 
> can't answer yet, why you insist on multiple relayfs modes.

I should have avoided earlier confusing the use of a certain type of
relayfs channel for a given purpose (i.e. LTT should not necessarily
depend on the managed mode.) I believe that there is a need for
more than one mode in relayfs independently of LTT. There are users
who want to be able to manage the data in a buffer (by manage I mean:
receive notification of important buffer events, be able to insert
important data at boundaries, etc.), and there are users who just
want to dump as much information as possible in as fast a way as
possible without having to deal with non-essential codepaths.

> This is what I basically have in mind for the relay_write function:
> 
>   cpu = get_cpu();
>   buffer = relay_get_buffer(chan, cpu);
>   while(1) {
>   offset = local_add_return(buffer->offset, length);
>   if (likely(offset + length <= buffer->size))
>   break;
>   buffer = relay_switch_buffer(chan, buffer, offset);
>   }
>   memcpy(buffer->data + offset, data, length);
>   put_cpu();

looking at this code:

1) get_cpu() and put_cpu() won't do. You need to outright disable
interrupts because you may be called from an interrupt handler.

2) You assume that relayfs creates one buffer per cpu for each
channel. We think this is wrong. Relayfs should not need to care
about the number of CPUs, it's the clients' responsibility to
create as many channels as they see fit, whether it be one channel
per CPU or 10 channels per CPU or 1 channel per interrupt, etc.

3) I'm unclear about the need for local_add_return(), why not
just:
if (likely(buffer->offset + length <= buffer->size))
In any case, here's what we do in relay_write():
write_pos = relay_reserve(rchan, count, &reserve_code, &interrupting);
If there's any buffer switching required, that will be done in
relay_reserve. This has the added advantage that clients that
want to write directly to the buffer without using relay_write()
can do so by calling relay_reserve() and not care about required
buffer switching.

4) After securing the area, you simply go ahead and do a memcpy()
and leave. We think that this is insufficient. Here's what we
do:
	if (likely(write_pos != NULL)) {
		relay_write_direct(write_pos, data_ptr, count);
		relay_commit(rchan, write_pos, count, reserve_code,
			     interrupting);
		*wrote_pos = write_pos;
	}
The relay_write_direct() is basically a memcpy(). We also do
a relay_commit(). This actually effects the delivery of the
event. If, for example, there had been a buffer switch at the
previous relay_reserve(), then this call to relay_commit() will
generate a call to the client's deliver() callback function.
In the case of LTT, for example, this is how it knows that it's
got to notify the user-space daemon that there are buffers to
consume (i.e. write to disk.)
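
The reserve/write/commit sequence in points 3 and 4 can be mimicked in
userspace. What follows is a simplified sketch, not the relayfs code:
the names (struct chan, chan_reserve, chan_commit, chan_write) and the
two fixed 16-byte sub-buffers are assumptions, and there is no locking
or interrupt protection at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE  16
#define SUBBUF_COUNT 2

struct chan {
    char   data[SUBBUF_COUNT][SUBBUF_SIZE];
    size_t cur;                   /* sub-buffer currently being filled */
    size_t offset;                /* bytes used in that sub-buffer */
    size_t filled[SUBBUF_COUNT];  /* valid bytes in each finished sub-buffer */
    void (*deliver)(struct chan *c, size_t subbuf, size_t len);
};

#define RESERVE_SWITCHED 0x1      /* reserve had to switch sub-buffers */

/*
 * Reserve `len` bytes, switching to the next sub-buffer when the current
 * one cannot hold the event. Returns NULL if the event can never fit.
 */
char *chan_reserve(struct chan *c, size_t len, int *code)
{
    char *pos;

    *code = 0;
    if (len > SUBBUF_SIZE)
        return NULL;
    if (c->offset + len > SUBBUF_SIZE) {
        c->filled[c->cur] = c->offset;
        c->cur = (c->cur + 1) % SUBBUF_COUNT;
        c->offset = 0;
        *code |= RESERVE_SWITCHED;
    }
    pos = &c->data[c->cur][c->offset];
    c->offset += len;
    return pos;
}

/* Commit a write; if reserve switched, hand the finished sub-buffer over. */
void chan_commit(struct chan *c, int code)
{
    if ((code & RESERVE_SWITCHED) && c->deliver) {
        size_t done = (c->cur + SUBBUF_COUNT - 1) % SUBBUF_COUNT;

        c->deliver(c, done, c->filled[done]);
    }
}

/* Reserve, copy, commit: the shape of a relay_write()-style helper. */
int chan_write(struct chan *c, const void *src, size_t len)
{
    int code;
    char *pos = chan_reserve(c, len, &code);

    if (pos == NULL)
        return -1;
    memcpy(pos, src, len);
    chan_commit(c, code);
    return 0;
}

size_t delivered_bytes;           /* updated by the example callback */

/* Example client callback: a tracer would wake its daemon here. */
void count_delivery(struct chan *c, size_t subbuf, size_t len)
{
    (void)c;
    (void)subbuf;
    delivered_bytes += len;
}
```

A client that wants to fill the space itself calls chan_reserve() and
chan_commit() directly and skips the memcpy() in chan_write(); the
buffer switch and the delivery still happen in the same two places.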

> ltt_log_event should only be a few lines more (for writing header and 
> event data).

Actually no, you don't want ltt_log_event using relay_write(),
for one thing because it can generate variable size events.
Instead, ltt_log_event does (basically):
data_size = sizeof(event_id) + sizeof(time_delta) + sizeof(data_size);

relay_lock_channel();
relay_reserve();

relay_write_direct(&event_id, sizeof(event_id));
relay_write_direct(&time_delta, sizeof(time_delta));
if (var_data) {
	relay_write_direct(var_data, var_data_len);
	data_size += var_data_len;
}
relay_write_direct(&data_size, sizeof(data_size));

relay_commit();
relay_unlock_channel();
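
As a rough illustration of that layout (event ID, time delta, optional
variable data, trailing size), here is a userspace sketch that
serializes one event into an already-reserved area. The field widths
(8-bit ID, 32-bit delta, 16-bit size) and the names put_bytes and
log_event are assumptions made for the example, not LTT's actual
format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Append `len` bytes at *pos and advance it (a relay_write_direct()-like step). */
static void put_bytes(char **pos, const void *src, size_t len)
{
    memcpy(*pos, src, len);
    *pos += len;
}

/*
 * Serialize one event into `buf` (capacity `cap`). Returns the number of
 * bytes written, or 0 if the event does not fit.
 */
size_t log_event(char *buf, size_t cap, uint8_t event_id, uint32_t time_delta,
                 const void *var_data, uint16_t var_len)
{
    uint16_t data_size;
    size_t total = sizeof(event_id) + sizeof(time_delta) + sizeof(data_size);
    char *pos = buf;

    if (var_data != NULL)
        total += var_len;
    if (total > cap || total > UINT16_MAX)
        return 0;
    data_size = (uint16_t)total;

    put_bytes(&pos, &event_id, sizeof(event_id));
    put_bytes(&pos, &time_delta, sizeof(time_delta));
    if (var_data != NULL)
        put_bytes(&pos, var_data, var_len);
    put_bytes(&pos, &data_size, sizeof(data_size));   /* size goes last */

    assert((size_t)(pos - buf) == total);
    return total;
}
```

Because the size is only known once the variable part has been looked
at, a fixed-size relay_write() would force the caller to assemble the
event somewhere else first, which is the extra copy being argued
against.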

> What I'd like to know now are the reasons why you need more than this.

I hope the above explanation clarifies things.

> It's not the amount of data and any timing requirements have to be done by 
> the caller. During processing you either take the events in the order they 
> were recorded (often that's good enough) or you sort them which is not 
> that difficult.

Ordering is a non-issue to be honest. Unless you've got some hardware
scope in there, it's almost impossible to pinpoint exactly when an
event occurred. There is no single line of code where an event occurs,
so it's all an educated guess anyway. You want things to resemble what
really happened in as much as possible though.

> I know you don't want to touch the topic of kernel debugging, but its 
> requirements greatly overlap with what you want to do with ltt, e.g. one 
> needs very often information about scheduling events as many kernel 
> processes rely more and more on kernel threads. The only real

Re: 2.6.11-rc1-mm1

2005-01-22 Thread Karim Yaghmour

Hello Roman,

Roman Zippel wrote:
> Well, let's concentrate for a moment on the last thing and check later 
> if and how they fit into relayfs. Since ltt will be first main user, let's 
> optimize it for this.
> Also since relayfs is intended for large, fast data transfers, per cpu 
> buffers are pretty much always required, so it would make sense to leave 
> this to relayfs (less to get wrong for the client).

But how does relayfs organize the namespace then? What if I have
multiple channels per CPU, each for a different type of data, will
all channels for the same CPU be under the same directory or will
each type of data have its own directory with one entry per CPU?
I don't have an answer to that, and I don't know that we should. Why
not just leave it to the client to organize his data as he wishes.
If we must assume that everyone will have at least one channel per
CPU, then why not provide helper functions built on top of very
basic functions instead of fixing the namespace in stone?

> I have to modify it a little (only the if (!buffer) part is new):
> 
>   cpu = get_cpu();
>   buffer = relay_get_buffer(chan, cpu);
>   while(1) {
>   offset = local_add_return(buffer->offset, length);
>   if (likely(offset + length <= buffer->size))
>   break;
>   buffer = relay_switch_buffer(chan, buffer, offset);
>   if (!buffer) {
>   put_cpu();
>   return;
>   }
>   }
>   memcpy(buffer->data + offset, data, length);
>   put_cpu();
> 
> This has a very short fast path and I need very good reasons to change/add 
> anything here. OTOH the slow path with relay_switch_buffer() is less 
> critical and still leaves a lot of flexibility.

This is not good for any client that doesn't know beforehand the exact
size of their data units, as in the case of LTT. If LTT has to use this
code that means we are going to lose performance because we will need to
fill an intermediate data structure which will only be used for relay_write().
Instead of zero-copy, we would have an extra unnecessary copy. There has
got to be a way for clients to directly reserve and write as they wish.
Even Zach Brown recognized this in his tracepipe proposal, here's from
his patch:
+ * - let caller reserve space and get a pointer into buf

>>1) get_cpu() and put_cpu() won't do. You need to outright disable
>>interrupts because you may be called from an interrupt handler.
> 
> 
> Look closer, it's already interrupt safe, the synchronization for the 
> buffer switch is left to relay_switch_buffer().

Sorry, I'm still missing something. What exactly does local_add_return()
do? I assume this code has got to be interrupt safe? Something like:
#define local_add_return(OFFSET, LEN) \
do { \
	... \
	local_irq_save(); \
	OFFSET += LEN; \
	local_irq_restore(); \
	... \
} while (0)

I'm assuming local_irq_XXX because we were told by quite a few people
in the related thread to avoid atomic ops because they are more expensive
on most CPUs than cli/sti.

Also how does relay_get_buffer() operate? What if I'm writing an event
from within a system call and I'm about to switch buffers and get
an interrupt at the if(likely(...))? Isn't relay_get_buffer() going to
return the same pointer as the one obtained for the syscall, and aren't
both cases now going to effect relay_switch_buffer(), one of which will
be superfluous?
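
To pin down what is being asked, here is a userspace stand-in for the
reservation step using C11 atomic_fetch_add(), which returns the old
value. This is not the kernel's local_t primitive and says nothing
about how relay_get_buffer() or relay_switch_buffer() behave; struct
buf and buf_claim are made-up names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct buf {
    atomic_size_t offset;   /* next free byte */
    size_t size;            /* capacity in bytes */
};

/*
 * Claim `len` bytes. The fetch-and-add hands every caller a distinct
 * start offset, even if one caller interrupts another between the add
 * and the bounds check. Returns (size_t)-1 when the claim runs past the
 * end of the buffer, i.e. when a buffer switch would be needed.
 */
size_t buf_claim(struct buf *b, size_t len)
{
    size_t start = atomic_fetch_add(&b->offset, len);

    if (start + len <= b->size)
        return start;
    return (size_t)-1;
}
```

Note that once the buffer is nearly full, every subsequent caller
overshoots and lands in the slow path, so two nested callers can both
ask for a switch. Whatever implements the switch has to recognize the
second request as redundant.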

> This adds a conditional and is not really needed. Above shows how to make 
> it interrupt safe and if the clients wants to reuse the same buffer, leave 
> the locking to the client.

Fine, but how is the client going to be able to reuse the same buffer if
relayfs always assumes per-CPU buffer as you said above? This would be
solved if at its core relayfs' functions worked on single channels and
additional code provided helpers for making the SMP case very simple.

> That's quite a lot of code with at least 14 conditions (or 13 conditions 
> too much) and this is just relayfs.

I believe Tom has refactored the code with your comments in mind, and has
something ready for review. I just want to clear up the above before we
make this final. Among other things, he just dropped all modes, and there's
only a basic relay_write() that closely resembles what you have above.

> That's not always true, where performance matters we provide different
> functions (e.g. spinlocks), so having an alternative version of 
> relay_write is a possibility (although I'd like to see the user first).

Sure, see above in the case of LTT.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546

Re: 2.6.11-rc1-mm1

2005-01-22 Thread Karim Yaghmour

Karim Yaghmour wrote:
> This is not good for any client that doesn't know beforehand the exact
> size of their data units, as in the case of LTT. If LTT has to use this
> code that means we are going to lose performance because we will need to
> fill an intermediate data structure which will only be used for relay_write().
> Instead of zero-copy, we would have an extra unnecessary copy. There has
> got to be a way for clients to directly reserve and write as they wish.
> Even Zach Brown recognized this in his tracepipe proposal, here's from
> his patch:
> + *   - let caller reserve space and get a pointer into buf

Actually, come to think of it, this code is not good for any client that
needs to fill complex data structures, whether they be fixed-size or not,
because it requires having a prepackaged structure already available.
Any client that wants to have zero-copying will want to write data
directly into the buffer instead of filling an intermediate buffer first.
And this requires being able to atomically reserve.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: [PATCH] relayfs redux for 2.6.10: lean and mean

2005-01-22 Thread Karim Yaghmour

Greg KH wrote:
> Are they willing to trade off the performance of LTT to get this?  I
> thought this was being touted as a "when you need to test" type of
> thing, not a "run it all the time" type of feature.

The problem is that you never know beforehand when you're going to
get that weird glitch on your server, or how much time you're going
to need to reproduce it. People who manage thousands of servers
will want to be able to fire this off at will without having to
reboot/recompile their kernel. What has to be done is make the cost
of the tracing infrastructure as minimal as possible when it is
indeed built into the kernel (of course if it's disabled it should
cost the same thing as if it wasn't there to boot: nothing.) This,
though, is a separate topic which is being addressed in other threads.
Have a look at Werner's recent postings on the
"[RFC] instrumentation" thread if you're interested.

> And a driver will never want to have both a relay channel, and a simple
> debug output at the same time?  You are now requiring them to look for
> that data in two different points in the fs.
[snip]
> So, since you are proposing that relayfs be mounted all the time, where
> do you want to mount it at?  I had to provide a "standard" location for
> debugfs for people to be happy with it, and the same issue comes up
> here.
> 
> Also, why not export your relayfs ops so that someone using debugfs can
> create a relay channel in it, or in any other type of fs they might
> create?

Ok, there are a couple of things in there:

- First I don't object to having the relayfs ops being exported so that
they could be used in conjunction with other filesystems, in addition
to having relayfs live as an independent fs. So as in the case above, we
should be able to accommodate the device driver writer who wants to have
all his files in the same fs. However, for the first case relayfs was
built for, I think there is merit for having it live as a separate fs.
Is this a good compromise for you?

- As for where relayfs should be mounted, this is a very good
question. We've taken to the habit of having a /relayfs. If this is
too problematic, I don't see any problem with /mnt/relayfs also. In
either case, I have to admit frankly that I'm not familiar with the
exact formal rules for introducing something like this. Of course
I'm aware of the FHS and LSB, but let me know what you think is the
best way to proceed here.

Thanks,

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-23 Thread Karim Yaghmour

Karim Yaghmour wrote:
> This is not good for any client that doesn't know beforehand the exact
> size of their data units, as in the case of LTT. If LTT has to use this
> code that means we are going to lose performance because we will need to
> fill an intermediate data structure which will only be used for relay_write().
> Instead of zero-copy, we would have an extra unnecessary copy. There has
> got to be a way for clients to directly reserve and write as they wish.
> Even Zach Brown recognized this in his tracepipe proposal, here's from
> his patch:
> + *   - let caller reserve space and get a pointer into buf

Also, if the reserve is exported, then a client that chooses so, can
do something like:

local_irq_save();
relay_reserve();
write(); write(); write(); ...
local_irq_restore();

And therefore enforce in-order events if he so chooses.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546


Re: 2.6.11-rc1-mm1

2005-01-25 Thread Karim Yaghmour

Roman Zippel wrote:
> Ok, great.
> BTW I don't really expect the first version to be fully optimized (unless 
> you want to :) ), but once the basics are right, that can still be added 
> later.

Agreed. Tom will post updated patches sometime this week. I'll follow up
with the LTT stuff separately as agreed.

Karim
-- 
Author, Speaker, Developer, Consultant
Pushing Embedded and Real-Time Linux Systems Beyond the Limits
http://www.opersys.com || [EMAIL PROTECTED] || 1-866-677-4546
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/05] Linux Kernel Markers, non optimized architectures

2007-02-21 Thread Karim Yaghmour

Hello Mathieu,

Mathieu Desnoyers wrote:
> Yes, that was indeed the first way I implemented it, as a "disable"
> option. One of the main thing we have to figure out before I modify
> this is if we want to have the generic version of markers available
> in a "forced" manner at the marker site with the GEN_MARK macro
> instead of the MARK macro (this is the actual implementation). It has
> proven to be useful to instrument lockdep.c irq enable/disable tracing
> functions. The reason why is because they are called just before the
> trap handler returns and I need it to do XMC on x86 and x86_64. It
> would therefore cause a recursive trap.
>
> I think it makes sense to have this kind of support for
> hard-to-instrument sites within the marker infrastructure, but the
> cost is to have two marker flavors: MARK and GEN_MARK (but really
> GEN_MARK is only intended for a few sites).

I must admit that I'm unsure about the use of different marker macros.
How about bitwise flags that could be coded as part of the marker
at the marker site? Something like "MARKER_TYPE_FORCED". This would
still allow some form of toplevel control at the macro definition.
Otherwise there's some digging to be done on a per-marker
basis ...
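
Roughly what I have in mind, as a userspace sketch only: one MARK macro
for every site, a flags argument carried at the site, and a single
definition that can compile everything out. MARKER_TYPE_FORCED is the
flag suggested above; every other name here (MARKER_TYPE_DEFAULT,
MARKERS_ENABLED, marker_generic, marker_optimized) is hypothetical and
the two "paths" are just counters.

```c
#include <assert.h>

#define MARKER_TYPE_DEFAULT 0x0
#define MARKER_TYPE_FORCED  0x1   /* always take the generic path */

int optimized_hits;   /* sites that could use the optimized path */
int generic_hits;     /* sites forced through the generic path */

void marker_generic(const char *name)   { (void)name; generic_hits++; }
void marker_optimized(const char *name) { (void)name; optimized_hits++; }

/* Toplevel control: build with -DMARKERS_ENABLED=0 to drop all markers. */
#ifndef MARKERS_ENABLED
#define MARKERS_ENABLED 1
#endif

#if MARKERS_ENABLED
#define MARK(flags, name) \
    do { \
        if ((flags) & MARKER_TYPE_FORCED) \
            marker_generic(name); \
        else \
            marker_optimized(name); \
    } while (0)
#else
#define MARK(flags, name) do { } while (0)
#endif

/* A hard-to-instrument site asks for the generic flavor explicitly. */
void irq_trace_site(void) { MARK(MARKER_TYPE_FORCED, "irq_enable"); }

/* An ordinary site takes whatever the default is. */
void syscall_site(void)   { MARK(MARKER_TYPE_DEFAULT, "syscall_entry"); }
```

Since the flag is a compile-time constant at each site, the compiler can
fold the test away, and grepping for MARKER_TYPE_FORCED finds the
special sites without a second macro name.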

Karim


Re: [PATCH 05/05] Linux Kernel Markers, non optimized architectures

2007-02-21 Thread Karim Yaghmour

Mathieu Desnoyers wrote:
> The problem with your proposal, I guess, is that people will have to
> add a supplementary parameter to the macro.
>
> It is not uncommon to have two slightly versions of macros/functions
> in the kernel (preempt_enable()/preempt_enable_no_resched(), or macros
> starting with underscores). Normally, the underscore states that the
> macro does not do the proper locking itself (this is not our case).
> Therefore, I would suggest using a name that suggests against what the
> macro is protected. For instance, a marker pointing to the generic
> version is only needed to protect against the debug trap handler and
> should only be used on x86 and x86_64.

I can see your point, to a degree. The difference here is that the
variants you mention are actually macros that do something, they
aren't stubs for code. IOW, you actually know what's happening
underneath a foo() vs. _foo() by its name only. Maybe this applies
the same to markers, I don't know. But maybe we want to make it
easy for those looking at markers that there's a master kill
switch somewhere that all markers go through and through which
they can all be disabled very simply (say by using a "#if 0").
While different names *may* be doing that, a same name *does* that.
But I don't feel too strongly either way, it's really up to those
maintaining the code to say.

Karim



Re: Re: [PATCH 05/05] Linux Kernel Markers, non optimized architectures

2007-02-16 Thread Karim Yaghmour

Mathieu Desnoyers wrote:
> The main goal of this config option is for embedded systems which don't
> support live code modification. Maybe we can put that under the
> "embedded systems" menu?

Not sure whether you had had other feedback on this elsewhere in
the rest of the thread, but yes, this would make sense if the
"embedded" angle is the only reason we need this (and not, say,
performance, etc.). Also, having done that, maybe it would make
some sense to have it be a "disable" rather than "enable":
CONFIG_MARKERS_DISABLE_OPTIMIZATION?

Karim


Re: [PATCH v4 1/2] Provide in-kernel headers for making it easy to extend the kernel

2019-03-12 Thread Karim Yaghmour



Hi Geert,

On 3/11/19 4:03 AM, Geert Uytterhoeven wrote:
[ snip ]

> OK.
>
> Now about the actual solution: what is your opinion on embedding e.g.
> a squashfs image in the kernel instead, which would be a more generic
> solution, not adding more ABI to /proc?


I'm not familiar enough with the intricacies of squashfs to have an 
educated opinion, but I hear that it's got its quirks (need for 
user-space tools, etc.) and possibly security issues. Also, I wonder 
whether it's a generalized solution that still kicks the ABI can down 
the road -- ultimately the kernel would still have a path/format/foo for 
making kheaders available in that squashfs image and that convention 
would become ABI. The only "benefit" being that said ABI wouldn't appear 
under /proc, and, tbh, I'm not sure that that's actually a benefit or is 
even idiomatic since config.gz is already under /proc. To an extent, 
the precedent set by config.gz favors making kheaders available in the 
same location using a similar mechanism ... i.e. bonus points for 
consistency.


But that's my hand-wavy gut-reaction response to your question. I'm sure 
others on this thread have far more informed opinions about the 
specifics than I could have. My priority was to clarify the basis for 
the need being addressed.


Cheers,

--
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


Logging/buffering mechanism comparison? (ring buffer, relay, etc.)

2014-02-05 Thread Karim Yaghmour

Just wondering if anyone had some pointers on a comparison between the
various logging/buffering mechanisms out there (ring buffer, relay,
lttng buffering, etc.)? Googling was inconclusive.

Anything that has benchmarks/pros/cons would be great.

Thanks,

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


Re: [PATCH v2 00/15] tracing: 'hist' triggers

2015-03-02 Thread Karim Yaghmour

On 15-03-02 02:45 PM, Steven Rostedt wrote:
> Interesting. The Android devices I have still have it enabled (rooted,
> but still running the stock system).

I don't know that there's any policy to disable tracing on Android. The
Android framework in fact has generally been instrumented by Google
itself to output trace info into trace_marker. And the systrace/atrace
tools made available to app developers need to get access to this
tracing info. So, if Android had tracing disabled, systrace/atrace
wouldn't work.
https://developer.android.com/tools/debugging/systrace.html
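
For illustration, writing an annotation into trace_marker from user
space could look roughly like the sketch below. The helper name
trace_marker_write() is hypothetical, and the path assumes debugfs is
mounted at /sys/kernel/debug; the message format used by the Android
framework itself is not shown here.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Assumed location with debugfs mounted at /sys/kernel/debug. */
#define TRACE_MARKER_PATH "/sys/kernel/debug/tracing/trace_marker"

/*
 * Write one annotation to the file at `path` (normally TRACE_MARKER_PATH).
 * Returns the number of bytes written, or -1 on error.
 */
static ssize_t trace_marker_write(const char *path, const char *msg)
{
	int fd = open(path, O_WRONLY);
	ssize_t n;

	if (fd < 0)
		return -1;	/* tracing not available or no permission */
	n = write(fd, msg, strlen(msg));
	close(fd);
	return n;
}
```

A caller would typically invoke trace_marker_write(TRACE_MARKER_PATH,
"my event") and treat a -1 return as "tracing unavailable" rather than
as a fatal error.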

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


Re: [PATCH v2 00/15] tracing: 'hist' triggers

2015-03-02 Thread Karim Yaghmour

On 15-03-02 03:33 PM, Alexei Starovoitov wrote:
> that's interesting. thanks for the link.
> 
> I don't see tracing being explicitly enabled in defconfig:
> https://source.android.com/devices/tech/kernel.html
> or here:
> https://android.googlesource.com/kernel/common/+/android-3.10/android/configs/android-recommended.cfg

I don't know that either of these is "authoritative". I know of both of
these, but I've never looked at them as being the reference for what
manufacturers ship. Instead, most manufacturers get their default
kernels from SoC vendors. So it's much likelier that an Androidized
kernel tree from Qualcomm or Intel is closer to what gets really shipped
than the two links above.

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour


Re: [PATCH v2 00/15] tracing: 'hist' triggers

2015-03-02 Thread Karim Yaghmour

On 15-03-02 03:48 PM, Alexei Starovoitov wrote:
> good. thanks for explaining.
> all makes sense now.
> 
> btw, that fancy systrace seems to be parsing text from trace_pipe
> https://android.googlesource.com/platform/external/chromium-trace/+/jb-dev/src/tracing/linux_perf_importer.js
> with a bunch of regex...
> including sched_switch: next_prio...

Yes, it does. This is why it's not meant for analyzing large traces.
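
To give a feel for the per-line cost of text-based parsing, here is a
rough sketch that pulls one integer field out of a sched_switch line.
It uses plain string search rather than the regexes in the importer
linked above, and trace_field_int() is a hypothetical helper, not part
of systrace.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Find "key=<integer>" in a text trace line and store the integer in *out.
 * Returns 0 on success, -1 if the key is absent or has no numeric value.
 */
static int trace_field_int(const char *line, const char *key, long *out)
{
	size_t klen = strlen(key);
	const char *p = line;

	while ((p = strstr(p, key)) != NULL) {
		/* Require a word boundary before the key and '=' after it. */
		if ((p == line || p[-1] == ' ') && p[klen] == '=') {
			char *end;
			long v = strtol(p + klen + 1, &end, 10);

			if (end != p + klen + 1) {
				*out = v;
				return 0;
			}
		}
		p += klen;
	}
	return -1;
}
```

Every field of every event needs a scan like this, which is workable
for short captures but adds up quickly on large traces.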

-- 
Karim Yaghmour
CEO - Opersys inc. / www.opersys.com
http://twitter.com/karimyaghmour