On Sun, Mar 31, 2002 at 12:28:15PM +0530, Sapan J. Bhatia spoke out thus:
> // > Yes, they do, and they'd be more or less the same. It's only that one usually
> // > makes the assumption that wakeup operations in kernel space, based on
> // > wait_queues/sems/interrupts are more efficient than delegating the
> // > responsibility for this sort of thing to a user space process by setting
> // > up a timer, being woken up in userspace, making a system call and going
> // > back to sleep.
> // 
> // This is exactly the assumption I want to challenge. Operations in kernel
> // space are more efficient, why? It's exactly the same kind of code running
> // in kernel space and user space - why should kernel space be more
> // efficient? Also, don't forget, that in any mechanism that we discuss, we
> // can't just look at what happens in kernel space - we have to see the
> // interaction between kernel space and user space */
> 
> That's a *misquote*. I never claimed that operations in kernel space are more
> efficient than ones in userspace. Just that *wakeup operations* are faster, for
> the reasons I specified later. The "how much faster is kernel space, if at all"
> question is better addressed separately. We're discussing two alternatives here -
> 
> 1) Waking up the user space application with a timer, thus delegating the 
> responsibility of the wakeup operation to the process (because the process
> wakes up, checks and goes back to sleep).
> 2) Getting kernel space to wake the process up on a wait_queue or whatever
> by its own mechanisms, most of which are insulated from user space processes.

We seem to have two different definitions of "wakeup." I am referring to wakeup as the 
mechanism that passes control back to the process after some condition. If I 
understand correctly, you are referring to wakeup as the work involved in the process 
regaining control and doing something thereafter.

As per my definition, how can the kernel delegate the responsibility of the wakeup to 
the process? The process cannot decide when it should be woken up; it only decides 
when it should go to sleep, and under what conditions it should "request" the kernel 
to wake it up.

Even when you use sleep(), the kernel only wakes up the process, as per the mechanism 
that I described - sleep() -> sys_nanosleep() -> schedule_timeout(TASK_INTERRUPTIBLE) 
-> add_timer -> clock interrupt -> bottom half processing -> scheduler sees if the 
timeout has elapsed and wakes up the process, i.e. passes control back to the process.

The wakeup in this case is just as efficient as in any other case. The 
difference is in what happens after the wakeup. In what you propose, the process would 
be guaranteed to have some data to operate on - in my case, it might have to go back 
to sleep. But, in any case, the wakeup operation from the kernel is equally efficient; 
both involve the scheduler deciding to run the process.
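
For reference, here is roughly the path behind that chain in the 2.4 sources - a 
simplified sketch of process_timeout()/schedule_timeout() from kernel/timer.c, 
paraphrased rather than quoted verbatim:

/* Simplified sketch of the 2.4 path behind sleep(), paraphrased from
 * kernel/timer.c rather than quoted verbatim.  The caller has already
 * set current->state = TASK_INTERRUPTIBLE; the task arms a timer whose
 * handler will wake it, then yields the CPU. */
static void process_timeout(unsigned long data)
{
        wake_up_process((struct task_struct *) data);
}

signed long schedule_timeout(signed long timeout)
{
        struct timer_list timer;
        unsigned long expire = timeout + jiffies;

        init_timer(&timer);
        timer.expires  = expire;
        timer.data     = (unsigned long) current;
        timer.function = process_timeout;       /* runs from timer BH */

        add_timer(&timer);
        schedule();                             /* sleep until woken */
        del_timer_sync(&timer);

        timeout = expire - jiffies;
        return timeout < 0 ? 0 : timeout;       /* time left, if any */
}

Note that the timer's handler is just wake_up_process() - a device wakeup like 
wake_up_interruptible() ends in exactly the same place.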
 
> A look at how the line ptys, serial drivers etc. wake up the line discipline
> wait_queues in pty.c, tty_io.c and n_tty.c should give a better idea about what
> I'm saying. 

A look at how sleep() is implemented in Linux should give a better idea about what 
I'm saying.

> Case 1 - process sleeps, is rudely woken up after n seconds even though its
> cause has not yet been met, goes back to sleep. The process is clueless
> about how the wakeup operation is taking place. The kernel is not.

It is not rudely woken up. It requested the kernel to be woken up after a particular 
time. It is not clueless about how the wakeup operation is taking place - it called 
sleep() in the first place!!

> Case 2 - process sleeps, is woken up ONLY when its cause is met - through
> a wakeup operation which is *very device specific* - e.g. the pty driver's
> output buffer falling below a certain value, because of which n_tty->write_q
> is woken and POLL_OUTs sent out to processes, blocking writes woken up.
> More on this later...

Aiee!!! That's exactly what happens with sleep() - it's only woken up when 
the cause is met - in this case the cause being some time elapsing.

Like I said earlier, you are mixing the mechanism of the wakeup procedure with what 
happens in the process *after* the wakeup - does it need to go back to sleep, or does 
it have all the data that it needs.

In any case, why do you think the kernel is "rude" to processes that *ask* to be woken 
up after a certain time? On the contrary, I might say that it is "rude" for the kernel 
to interrupt a process the moment there is some data to be delivered to it. As the 
example below shows, sometimes the process might *not* want the data the moment it 
is churned out.
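
For concreteness, the device-driven flavour looks roughly like this in 2.4 style - a 
hypothetical driver fragment, *not* code lifted from pty.c or n_tty.c:

/* Hypothetical 2.4-style driver fragment - a sketch for illustration
 * only.  The reader blocks on a wait queue; whoever produces data
 * wakes it.  wake_up_interruptible() ends in the same scheduler path
 * as the timer wakeup above. */
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/errno.h>

static DECLARE_WAIT_QUEUE_HEAD(read_wait);
static int data_available;

/* Reader side: called from a blocking read(). */
static int wait_for_data(void)
{
        while (!data_available) {
                interruptible_sleep_on(&read_wait);     /* give up the CPU */
                if (signal_pending(current))
                        return -ERESTARTSYS;            /* woken by a signal */
        }
        return 0;
}

/* Writer side: called when the device actually produces data -
 * the "very device specific" part. */
static void data_arrived(void)
{
        data_available = 1;
        wake_up_interruptible(&read_wait);              /* explicit wakeup */
}

Either way, the sleeping task gets back on the run queue via the same wake_up 
machinery; the only difference is who decides to invoke it.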


> // > For instance, if you take the host file system that User Mode Linux uses,
> // > it makes it possible for kernel space code - drivers, etc. to register
> // > for callbacks which are supported asynchronously by the file-system itself.
> // > In effect, the whole loop is avoided, since the wakeup operation once again
> // > is in the kernel.
> // > Here, user space applications know nothing about this capability of the UBD
> // > file system, so a decision on whether to employ method (1) or method (2)
> // > is just not on. You use method (2) and leave it to the kernel. There's no
> // > reason we shouldn't assume the same for the tty drivers to be able to
> // > deal with other file systems in a similar, more efficient way than
> // > processes in user-space can.
> // 
> // We'll come to efficiency when we talk about kernel space/user space
> // interactions, and the basic issue here - polling vs. interrupt driven
> // systems.
> 
> Yes. My point exactly. We're talking about wakeup operations, not general
> KS vs US. When we talk KS vs US, we talk preemption, scheduler latency, 
> scheduler delays blah blah blah... 

Then talk about just wakeups, not the conditions that happen in the process 
before/after the wakeup.

> // > Also, when you're bludgeoning from userspace, you're forcing the operating
> // > system to keep your executable pages (and god knows what with read-ahead)
> // > in memory instead of swapping them out like any well-behaved system with
> // > 32Mb RAM should :) Especially if it's going to wake up and start being
> // > useful again 6 months later when someone uses the CD writer again (or
> // > something). The async tty-reader would score on this, since there's no
> // > activity in the program after a poll system call, or indefinite sleep, and
> // > there's no activity in the kernel either since the wakeup operation is
> // > invoked in the event of a write to the pty.
> // 
> // That's my point! Since there is no activity in the process after the poll
> // system call, the OS might swap the program out of RAM and onto the disk -
> // this is what I meant when, later, I [rather ambiguously] said "memory
> // gets swapped to disk."
> 
> Ok, my point was that when a process, such as something that dumps a LOG, is 
> waiting for an event so that it can become useful again (when there's no new
> log to read, the process is useless), 

And I, on the contrary, suggested that you look at a system where there is a lot of 
log activity or the process wants to control the rate at which it receives data.

> 1) IMHO, its pages SHOULD BE swapped out.
> That's what the mm system is designed for - to swap out useless processes and
> swap in current ones. And calling the process time and time again PREVENTS
> this from happening, since the pages it holds never age and stay in main
> memory and keep getting replenished each time the *TIMER* wakes up the process.
> In the case of the *final* poll callback, it is woken up whenever it's needed
> again (days, months... whatever later). And then its pages are swapped in
> again into RAM.
> If the logs are coming in at a fast rate, then the pages stay high up anyway
> in LRU / LFU.

Of course that is what the mm is designed to do. Except, if you use polling, you get 
more control over execution.

> 2) there's no reason the processor should be *forced* to spin on a process
> that's useless every n seconds. 
> --because the overall no. of spins for a wakeup operation in k-space will *always*
> be <= the overall no. of spins for a wakeup operation in u-space. Again,
> *not because* k-space is more efficient than u-space in general. But because 
> kernel space KNOWS more about the internal structure of what it is that's
> generating data, and is better equipped to deal with it (be it a tty, fs, network
> driver...). So you might have to repeatedly ask k-space if data is available from
> u-space, but in k-space, an interrupt (or a process writing to a buffer...)
> might EXPLICITLY wake up processes on a queue.

Like I said, the discussion is about polling vs. interrupt driven implementations. Why 
repeatedly ask the k-space whether data is available? This is based on the assumption 
that data might not be available when you poll, or that you are interested in data the 
moment it is available.

But what I have been saying is *consider heavily loaded logging systems*. Also, don't 
forget that this lets you implement *crude* buffering. And don't forget, my system 
works with existing apps.

Alternatively, consider a system where you have to fetch data from the logs, send it 
over the network, wait at least 2 seconds and repeat the process. Now, in this case, 
you are not interested in data the moment it becomes available, but only at 2 second 
intervals. If we were to use ptys, we would read in data the moment it comes in 
through a thread, keep storing it in a buffer and then run a different thread to send 
it out when 2 seconds elapse, not to mention all the synchronisation problems. Use 
sleep() and you let the kernel do the buffering for you - just run one thread to read 
it in every 2 seconds and send it out.
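
A minimal sketch of what I mean - logfd and send_over_network() are hypothetical 
stand-ins, and logfd is assumed to have been opened with O_NONBLOCK:

/* Minimal userspace sketch of the "read every 2 seconds" design.
 * logfd and send_over_network() are hypothetical stand-ins; logfd is
 * assumed open with O_NONBLOCK, so read() returns -1 with errno ==
 * EAGAIN once the kernel buffer is drained.  Between passes the kernel
 * buffers whatever the logger writes - crude buffering for free, with
 * one thread and no locking. */
#include <sys/types.h>
#include <unistd.h>

extern void send_over_network(const char *buf, size_t len); /* hypothetical */

static void pump_logs(int logfd)
{
        char buf[8192];
        ssize_t n;

        for (;;) {
                /* drain everything that accumulated since the last pass */
                while ((n = read(logfd, buf, sizeof(buf))) > 0)
                        send_over_network(buf, n);

                sleep(2);       /* let the next batch pile up in the kernel */
        }
}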

As you can see, these are design issues, and my point is that whether polling or 
interrupt driven operation is more efficient depends on the application at hand.

> Here, the only time there's any activity FOR the wakeup is when the interrupt
> is called / xterm writes to a pty... because the kernel KNOWS how the wakeup
> is being performed, and the process doesn't.

Ok, the thing I have been mentioning about statistics is this: on loaded systems, you 
can take an average of the logs produced per second, and then use that figure plus 
sleep() to implement a log/processing buffer.
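
As a back-of-the-envelope sketch (every figure here is assumed, purely for 
illustration):

/* Back-of-the-envelope sizing - all numbers are assumptions.  At, say,
 * ~200 entries/sec of ~80 bytes each (16 KB/s), a 32 KB batch works
 * out to sleeping about 2 seconds between reads. */
static unsigned int batch_interval(double entries_per_sec,
                                   double bytes_per_entry,
                                   double batch_bytes)
{
        double secs = batch_bytes / (entries_per_sec * bytes_per_entry);
        return secs < 1.0 ? 1 : (unsigned int) secs;    /* feed to sleep() */
}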

What you say is that the "kernel KNOWS how the wakeup is being performed, and the 
process doesn't" - why is that a good thing??? If you know what you are doing in the 
process, you would want control over when you get woken up! 

> The unthrottling of a pty + line discipline is an excellent example of this
> (pty.c -> pty_unthrottle).
> 
> // > in the case of the pty device, we're sleeping (interruptibly) in kernel
> // > space on a wait queue (tty_struct tty->read_wait). As long as a pty
> // > device doesn't have any available data (i.e., the process writing to it
> // > hasn't given any for a while), read_wait stays asleep. When the process
> // > writes to a pty device, the pty driver hands over control to the line
> // > discipline, which wakes up read_wait. The inference being that ALL
> // > processes that wait for tty-input/output are put on the hook in the
> // > line-discipline in a way that puts everything to rest...
> // In the case of using a timer, we also go to sleep (interruptibly) on a
> // wait queue (kernel/sched.c, kernel/timer.c) in kernel space.
> 
> Well, all I can say is, 1) you can't always assume that whatever it is that
> you're waiting for sleeps with a timeout. If it does, it's probably for a 
> good reason - and so be it. And besides, the timeout value for such a thing
> (as a tty) is likely to be very high, since it's waiting for something
> (such as the console driver) to wake it up and it has no use for the timeout
> except to dole out exceptions.

Unless the system is loaded and you know that you won't dole out exceptions when you 
get woken up.
 
> // Now, depending on how loaded the logging is, this can be a good thing.
> // For example, if the log rate is very high, we can effectively bunch a set
> // of log entries for a whole second or two before deciding to read them.
> 
> This works in case (2) as well.

By bunching log entries, I mean you can buffer them!

> // The process is involved in the wakeup, since it was what requested to go
> // to sleep - implicitly. You have to expect that when you are doing a
> // blocking read, you might be put to sleep. In the case of the timer, the
> // process goes to sleep explicitly - knowingly, rather than unknowingly -
> // more power to the programmer. And, BTW, why is it "better handled in
> // kernel"?
> 
> The process is not involved in the wakeup (in case 2) because there's something
> in the kernel (again - something that unthrottles a pty device / something that 
> tells the User Mode Linux kernel that there's new data waiting in the inode
> cache / the serial driver waking up the tty read_wait queue when there's data)
> that wakes the process up. Ok, again as mentioned earlier, the timer in the
> waitqueue *MAY* time out a couple of times before being woken up, though in 
> all the examples I've mentioned, it's always *very* unlikely.
> 
> // > // What can we say about the chances for the above two incidents to
> // > // happen statistically?
> // >
> // > disk accesses should happen every time the inode buffers are flushed and will
> // > depend on writes and not fstats.
> // > the second (pty process memory being swapped out) is irrelevant since,
> // > again, the user space processes are not involved in the waking up
> // > operations.
> // 
> // The real question here is not kernel space/user space - that efficiency
> // depends upon how/what syscalls are used within the process. The question
> // is a basic difference between polling and interrupt driven systems. What
> // you are proposing is not more efficient because everything would happen
> // in kernel space - it might be more efficient because there would be no
> // "false triggers", i.e. you get woken up from a timer sleep, only to see
> // that there is no data available. But, again, as I mentioned earlier, this
> // would not be an issue on loaded systems, and would anyway give you a
> // *very* crude mechanism for implementing buffering.
> 
> Again, efficiency of syscalls, context switching is a part of a larger discussion
> of KS/US efficiency (which I'm quite interested in myself). But that would be a
> new thread... 
> 
> // So, while I agree with your suggestion that a pseudo file system might be
> // more efficient, I don't agree with the reason - it ain't because everything
> // happens in kernel space - everything happens in kernel space even when you
> // use polling, albeit the program gets a little more control.
> 
> aaaaargh! ok. I just remembered you've written this in the same email. As
> mentioned earlier, this is a misquote (1G).

What we have been discussing here are differences in the implementation of a specific 
task - should you poll or should you be interrupted. What should be used is very 
specific to the application at hand. You can't have carte blanche and say that 
"polling is always inefficient" - if used properly, and under the right circumstances, 
polling can be more efficient than interrupt driven operations - and all of this has 
nothing to do with how the kernel wakes up the process - it has to do with what 
happens before/after the wakeup.

I am not suggesting that polling is better than interrupt driven operations. I am 
merely saying that under some conditions, it might be.
 
> // Now, any more discussions on this and I will send you very ugly naked
> // pictures of myself :)
> 
> updated procmail. thanks

Liar! People, I'll have you know that I sent him those pictures anyway, and he, 
unfortunately, liked them and is asking for more. This time around though, I am going 
to send him naked pictures of ..er.. Saddam Hussain.

Regards,
-Varun
-- 
---------------------------------------
Mindframe Software & Services Pvt. Ltd.

http://www.mindsw.com
---------------------------------------
