Sat, Mar 30, 2002 at 05:01:01AM +0530, Sapan J. Bhatia spoke out thus:
> // Welcome to the list! And here begins the nitpicking...
> *groan*  :)

moan, aah, aah, aahhhhh... was it good for you too?

> // tail does what it does as it cannot assume anything about the source of the
> generated file - it can be a logging process, some app, or even a special device file.
>
> Yes, the suggestion for hooking up with a pty was assuming that one has
> control over the writing process, e.g. syslogd, my_prog.c, UML.
>
> // Now, an interesting question is whether this is more efficient, i.e. reading/writing
> to/from a named pipe/pty rather than fstat'ing the file - don't both of them incur the
> same penalty in terms of system calls?
>
> Yes, they do, and they'd be more or less the same. It's only that one usually
> makes the assumption that wakeup operations in kernel space, based on
> wait_queues/sems/interrupts, are more efficient than delegating the
> responsibility for this sort of thing to a user space process by setting
> up a timer, being woken up in userspace, making a system call and going
> back to sleep.

This is exactly the assumption I want to challenge. Why should operations in kernel
space be more efficient? It's exactly the same kind of code running in kernel space as
in user space. Also, don't forget that in any mechanism we discuss, we can't just look
at what happens in kernel space - we have to look at the interaction between kernel
space and user space.

> For instance, if you take the host file system that User Mode Linux uses,
> it makes it possible for kernel space code - drivers, etc. - to register
> for callbacks which are supported asynchronously by the file system itself.
> In effect, the whole loop is avoided, since the wakeup operation once again
> is in the kernel.
> Here, user space applications know nothing about this capability of the UBD
> file system, so a decision on whether to employ method (1) or method (2)
> is just not on. You use method (2) and leave it to the kernel. There's no
> reason we shouldn't assume the same for the tty drivers to be able to
> deal with other file systems in a similar, more efficient way than
> processes in user space can.

We'll come to efficiency when we talk about kernel space/user space interactions and
the basic issue here - polling vs. interrupt-driven systems.

> Also, when you're bludgeoning from userspace, you're forcing the operating
> system to keep your executable pages (and god knows what with read-ahead)
> in memory instead of swapping them out like any well-behaved system with
> 32Mb RAM should :) Especially if it's going to wake up and start being
> useful again 6 months later when someone uses the CD writer again (or
> something). The async tty-reader would score on this, since there's no
> activity in the program after a poll system call, or indefinite sleep, and
> there's no activity in the kernel either since the wakeup operation is
> invoked in the event of a write to the pty.

That's my point! Since there is no activity in the process after the poll system
call, the OS might swap the program out of RAM and onto the disk - this is what
I meant when, later, I [rather ambiguously] said "memory gets swapped to disk."

> // You say that fstat would have to access the disk? This is something that I am not
> sure about - when does Linux serialize changed filesystem info to disk? Are there
> chances that when I do two consecutive fstat's "quickly" I might get data from
> memory?
>
> fstats never read from disk. AFAIK, the generic buffer cache/inode cache are
> responsible for this, i.e., values come out of the cache and not the disk, and
> the cache is modified when the inode is written to.

So, the process I want to run this for remains in memory *and* fstat data is read from 
memory. Umm.. sounds good to me so far.
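
I.e., something along these lines - a sketch of the fstat-polling reader, where the
log path and the one-second interval are made up for illustration, and truncation or
rotation of the file isn't handled:

#define _XOPEN_SOURCE 500
/* Sketch: wake up on a timer, check st_size via fstat (served from the
 * inode cache, not the disk), and read anything appended since last time. */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

int main(void)
{
    const char *path = "/var/log/messages";   /* hypothetical log file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }
    off_t off = st.st_size;                   /* start at the current end */

    char buf[4096];
    for (;;) {
        sleep(1);                              /* the timer wakeup */
        if (fstat(fd, &st) < 0) break;         /* size comes from the cache */
        while (off < st.st_size) {             /* new data since last check */
            ssize_t n = pread(fd, buf, sizeof(buf), off);
            if (n <= 0) break;
            fwrite(buf, 1, (size_t)n, stdout);
            off += n;
        }
    }
    return 0;
}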

> // What about the pty device? What if that memory gets swapped to disk?
>
> ok. I don't think I understood your question right. lemme know if I've got
> you all wrong--

Like I mentioned above, the process that uses the pty slave device might get swapped 
to disk.

> in the case of the pty device, we're sleeping (interruptibly) in kernel
> space on a wait queue (tty_struct tty->read_wait). As long as a pty
> device doesn't have any available data (i.e., the process writing to it
> hasn't given any for a while), read_wait stays asleep. When the process
> writes to a pty device, the pty driver hands over control to the line
> discipline, which wakes up read_wait. The inference being that ALL
> processes that wait for tty-input/output are put on the hook in the
> line-discipline in a way that puts everything to rest...
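
For concreteness, my reading of the setup you're describing is roughly this - a
sketch only, with error handling and the writer's side omitted, and names and paths
illustrative:

#define _XOPEN_SOURCE 600
/* Sketch: block in read() on the pty master; the line discipline wakes
 * tty->read_wait only when the writer actually writes to the slave. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0) { perror("posix_openpt"); return 1; }
    if (grantpt(master) < 0 || unlockpt(master) < 0) {
        perror("pt setup"); return 1;
    }

    /* Point the writing process (syslogd, my_prog, ...) at this slave. */
    printf("writer should open: %s\n", ptsname(master));

    char buf[4096];
    for (;;) {
        /* Sleeps interruptibly on the tty's read_wait queue until the
         * line discipline wakes it - no timer, no userspace polling. */
        ssize_t n = read(master, buf, sizeof(buf));
        if (n <= 0) break;
        fwrite(buf, 1, (size_t)n, stdout);
    }
    return 0;
}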

In the case of using a timer, we also go to sleep (interruptibly) on a wait queue
(kernel/sched.c, kernel/timer.c) in kernel space. If you really must know, glibc
implements sleep() through nanosleep(), which in turn is implemented by sys_nanosleep,
which puts the process in the TASK_INTERRUPTIBLE state and then calls
schedule_timeout(), which adds the actual timer and calls schedule(). We only get
woken up when the timer goes off.

Now, depending on how loaded the logging is, this can be a good thing. E.g., if the
log rate is very high, we can effectively batch a whole second or two's worth of log
entries before deciding to read them.

You do a non-blocking read after you get woken up, so that if there is no data, you
return immediately. Now, you might say that that's another syscall, so another context
switch, and thus less efficiency. Except, what exactly is the penalty of a context
switch? No one's been able to give me an accurate idea of this one. And, of course,
let's not forget that making a syscall gives the scheduler a chance to run, so that
might be good. Also, let's not forget - this implementation would not require any
changes at all - things would work with existing files and so on.
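
Concretely, the loop I have in mind is something like this - again a sketch; the fifo
path is made up, and for a plain file you'd track the offset via fstat as above, since
O_NONBLOCK doesn't do anything for regular files:

#define _POSIX_C_SOURCE 199309L
/* Sketch: timer-driven reader - sleep, then drain whatever accumulated. */
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/tmp/logpipe", O_RDONLY | O_NONBLOCK); /* hypothetical fifo */
    if (fd < 0) { perror("open"); return 1; }

    struct timespec ts = { 1, 0 };     /* batch a whole second of entries */
    char buf[8192];
    for (;;) {
        nanosleep(&ts, NULL);          /* sys_nanosleep: TASK_INTERRUPTIBLE,
                                          then schedule_timeout(), as above */
        for (;;) {
            ssize_t n = read(fd, buf, sizeof(buf));
            if (n > 0) {
                fwrite(buf, 1, (size_t)n, stdout);  /* drain the batch */
            } else if (n < 0 && errno != EAGAIN) {
                perror("read"); return 1;           /* real error */
            } else {
                break;  /* EAGAIN, or no writer yet: a "false trigger",
                           go straight back to sleep */
            }
        }
    }
}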


> So the processes aren't involved AT ALL in the wake-up
> operations, which is how it should be, since even here wakeups can be
> invoked by things like a call to fflush on the pty device, which again
> the reading process knows nothing about and which are better handled in
> kernel space.

The process is involved in the wakeup, since it is what requested to go to sleep -
implicitly. You have to expect, when you are doing a blocking read, that you might
be put to sleep. In the case of the timer, the process goes to sleep explicitly -
knowingly, rather than unknowingly - more power to the programmer. And, BTW, why is it
"better handled in kernel space"?

> // What can we say about the chances for the above two incidents to happen
> statistically?
>
> disk accesses should happen every time the inode buffers are flushed, and will
> depend on writes and not fstats.
> the second (pty process memory being swapped out) is irrelevant since,
> again, the user space processes are not involved in the waking-up
> operations.

The real question here is not kernel space vs. user space - that efficiency depends
upon how and what syscalls are used within the process. The question is the basic
difference between polling and interrupt-driven systems. What you are proposing is not
more efficient because everything would happen in kernel space - it might be more
efficient because there would be no "false triggers", i.e. getting woken up from a
timer sleep only to see that there is no data available. But, again, as I mentioned
earlier, this would not be an issue on loaded systems, and would anyway give you a
*very* crude mechanism for implementing buffering.

And those are the stats that I'm looking for - what is going to happen statistically.

So, while I agree with your suggestion that a pseudo file system might be more
efficient, I don't agree with the reason - it ain't because everything happens in
kernel space - everything happens in kernel space even when you use polling, albeit
the program gets a little more control.

Now, any more discussions on this and I will send you very ugly naked pictures of 
myself :)

Regards,
-Varun
---------------------------------------
Mindframe Software & Services Pvt. Ltd.

http://www.mindsw.com
---------------------------------------
