Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.43-00

2005-04-04 Thread Esben Nielsen
On Mon, 4 Apr 2005, Steven Rostedt wrote:

> On Mon, 2005-04-04 at 22:47 +0200, Ingo Molnar wrote:
> 
> > > Currently my fix is in yield to lower the priority of the task calling 
> > > yield and raise it after the schedule.  This is NOT a proper fix. It's 
> > > just a hack so I can get by it and test other parts.
> > 
> > yeah, yield() is a quite RT-incompatible concept, which could livelock 
> > an upstream kernel just as much - if the task in question is SCHED_FIFO.  
> > Almost all yield() uses should be eliminated from the upstream kernel, 
> > step by step.
> 
> Now the question is, who will fix it? Preferably the maintainers, but I
> don't know how much of a priority this is to them. I don't have the time
> now to look at this and understand enough about the code to be able to
> make a proper fix, and I'm sure you have other things to do too.

How about adding a

	if (rt_task(current)) {
		WARN_ON(1);
		mutex_setprio(current, MAX_PRIO-1);
	}

to yield(), to find all calls to it from RT tasks? That will force the user (aka
the real-time developer) to either stop calling the subsystems still using
yield() from his RT tasks, or fix those subsystems.

Esben

> 
> -- Steve
> 


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.43-00

2005-04-04 Thread Esben Nielsen
On Mon, 4 Apr 2005, Zwane Mwaikambo wrote:

> On Mon, 4 Apr 2005, Steven Rostedt wrote:
> 
> > On Mon, 2005-04-04 at 22:47 +0200, Ingo Molnar wrote:
> > 
> > > > Currently my fix is in yield to lower the priority of the task calling 
> > > > yield and raise it after the schedule.  This is NOT a proper fix. It's 
> > > > just a hack so I can get by it and test other parts.
> > > 
> > > yeah, yield() is a quite RT-incompatible concept, which could livelock 
> > > an upstream kernel just as much - if the task in question is SCHED_FIFO.  
> > > Almost all yield() uses should be eliminated from the upstream kernel, 
> > > step by step.
> > 
> > Now the question is, who will fix it? Preferably the maintainers, but I
> > don't know how much of a priority this is to them. I don't have the time
> > now to look at this and understand enough about the code to be able to
> > make a proper fix, and I'm sure you have other things to do too.
> 
> I'm sure a lot of the yield() users could be converted to 
> schedule_timeout(), some of the users i saw were for low memory conditions 
> where we want other tasks to make progress and complete so that we a bit 
> more free memory.
> 

Easy, but damn ugly. Completions are the right answer. The memory system
needs a queue system where tasks can sleep (with a timeout) until the
right amount of memory is available instead of half busy-looping.
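In kernel terms, such a queue could look something like the following sketch. The helper names (free_pages_available(), memory_wait, wait_for_memory()) are made up for illustration; wait_event_timeout() is the existing primitive that matches "sleep with a timeout until a condition holds":

```c
/* Hypothetical sketch of a memory wait queue replacing a yield() loop.
 * free_pages_available() and the queue/function names do not exist;
 * they only illustrate the pattern. */
static DECLARE_WAIT_QUEUE_HEAD(memory_wait);

/* Instead of:  while (!free_pages_available(n)) yield();  */
static long wait_for_memory(int n, long timeout)
{
	/* Sleeps until the condition is true or the timeout expires;
	 * returns the remaining jiffies (0 on timeout). */
	return wait_event_timeout(memory_wait,
				  free_pages_available(n), timeout);
}

/* The page-freeing path would then do:  wake_up(&memory_wait);  */
```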

Esben




Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.43-00

2005-04-05 Thread Esben Nielsen
On Tue, 5 Apr 2005, Ingo Molnar wrote:

> 
> * Esben Nielsen <[EMAIL PROTECTED]> wrote:
> 
> > > Now the question is, who will fix it? Preferably the maintainers, but I
> > > don't know how much of a priority this is to them. I don't have the time
> > > now to look at this and understand enough about the code to be able to
> > > make a proper fix, and I'm sure you have other things to do too.
> > 
> > How about adding a
> >  if(rt_task(current)) {
> > WARN_ON(1);
> > mutex_setprio(current, MAX_PRIO-1)
> >  }
> > ?
> > 
> > to find all calls to yields from rt-tasks. That will force the user 
> > (aka the real-time developer) to either stop calling the subsystems 
> > still using yield from his RT-tasks, or fix those subsystems.
> 
> i've added this to the -43-08 patch, so that we can see the scope of the 
> problem. But any yield() use could become a problem due to priority 
> inheritance. (which might eventually be expanded to userspace locking 
> too)
> 
Any call to a non-deterministic subsystem breaks the real-time properties.
yield() is certainly not the only problem. Code waiting for RCU completion
or whatever is bad too. Calling code like that from RT tasks, or calling
it while holding locks shared with RT tasks, is just bad. Anyone doing
RT development _has_ to know that. Putting warnings and traces into
the kernel is a nice feature. 

Static code analysis would also help quite a bit. What about having a new
attribute "nonrt" for functions and locks? yield() and synchronize_kernel() are 
certain candidates. Any function performing a nonrt operation is marked 
nonrt. Any lock that is held while doing a nonrt operation is marked
nonrt. Taking a nonrt lock is a nonrt operation. (Might end up marking the
whole kernel nonrt.)
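A sketch of how such an annotation might look, modelled on the sparse-style markers already in the tree. The "__nonrt" macro and the checker semantics are hypothetical; nothing like them exists:

```c
/* Hypothetical __nonrt marker in the style of sparse annotations. */
#ifdef __CHECKER__
# define __nonrt __attribute__((nonrt))
#else
# define __nonrt
#endif

/* Certain candidates, marked at their declarations: */
extern void __nonrt yield(void);
extern void __nonrt synchronize_kernel(void);

/* A checker would then propagate the marking transitively: any function
 * calling a nonrt function is itself nonrt, any lock held across a nonrt
 * call is a nonrt lock, and taking a nonrt lock is a nonrt operation. */
```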

Esben

>   Ingo





Re: [PATCH] Priority Lists for the RT mutex

2005-04-12 Thread Esben Nielsen
I looked at the PI-code to see what priority the task (old_owner below)
would end up with when it released a lock. From rt.c:

	prio = mutex_getprio(old_owner);
	if (new_owner && !plist_empty(&new_owner->pi_waiters)) {
		w = plist_entry(&new_owner->pi_waiters,
				struct rt_mutex_waiter, pi_list);
		prio = w->task->prio;
	}
	if (prio != old_owner->prio)
		pi_setprio(lock, old_owner, prio);

What has new_owner to do with it? Shouldn't it be old_owner in these
lines? I.e. the prio we want to set old_owner to should be the prio of the
head of the old_owner->pi_waiters, not the new_owner!
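If that reading is right, the fix would be along these lines. This is an untested sketch against the rt.c excerpt above, keeping its helpers, and only swapping new_owner for old_owner:

```c
	prio = mutex_getprio(old_owner);
	if (!plist_empty(&old_owner->pi_waiters)) {
		w = plist_entry(&old_owner->pi_waiters,
				struct rt_mutex_waiter, pi_list);
		/* highest-priority waiter still blocked on a
		 * lock that old_owner holds */
		prio = w->task->prio;
	}
	if (prio != old_owner->prio)
		pi_setprio(lock, old_owner, prio);
```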

Esben


On Mon, 11 Apr 2005, Ingo Molnar wrote:

> 
> * Perez-Gonzalez, Inaky <[EMAIL PROTECTED]> wrote:
> 
> > Let me re-phrase then: it is a must have only on PI, to make sure you 
> > don't have a loop when doing it. Maybe is a consequence of the 
> > algorithm I chose. -However- it should be possible to disable it in 
> > cases where you are reasonably sure it won't happen (such as kernel 
> > code). In any case, AFAIR, I still did not implement it.
> 
> are there cases where userspace wants to disable deadlock-detection for 
> its own locks?
> 
> the deadlock detector in PREEMPT_RT is pretty much specialized for 
> debugging (it does all sorts of weird locking tricks to get the first 
> deadlock out, and to really report it on the console), but it ought to 
> be possible to make it usable for userspace-controlled locks as well.
> 
>   Ingo
> 



RE: FUSYN and RT

2005-04-12 Thread Esben Nielsen
I think we (at least) got a bit confused here. What (I think) the thread
started out with was a clear layering of the mutexes. I.e. the code obeys
the grammar

 VALID_LOCK_CODE     = LOCK_FUSYN VALID_LOCK_CODE UNLOCK_FUSYN
                     | VALID_LOCK_CODE VALID_LOCK_CODE
                     | VALID_RTLOCK_CODE
 VALID_RTLOCK_CODE   = LOCK_RTLOCK VALID_RTLOCK_CODE UNLOCK_RTLOCK
                     | VALID_RTLOCK_CODE VALID_RTLOCK_CODE
                     | VALID_SPINLOCK_CODE
                     | (code with no locks at all)
 VALID_SPINLOCK_CODE = ... :-)

In that context the case is simple: Fusyns and RT locks can easily
co-exist. One only needs an extra level akin to static_prio to fall back to
when the last Fusyn is unlocked. The APIs should be _different_, but
fusyn_setprio() should both update static_prio and call mutex_setprio().
There will never be deadlocks involving both types of locks because, as
Daniel said, the lock nesting is sorted out. Furthermore, unbalanced
(incorrect) code like
   LOCK_FUSYN VALID_RTLOCK_CODE (no unlock)
will never hit the RT level. So assuming the RT-lock based code is
debugged, the error must be in Fusyn based code.

Esben

On Tue, 12 Apr 2005, Perez-Gonzalez, Inaky wrote:

> >From: Esben Nielsen [mailto:[EMAIL PROTECTED]
> >On 12 Apr 2005, Daniel Walker wrote:
> >
> >>
> >>
> >> At least, both mutexes will need to use the same API to raise and
> lower
> >> priorities.
> >
> >You basicly need 3 priorities:
> >1) Actual: task->prio
> >2) Base prio with no RT locks taken: task->static_prio
> >3) Base prio with no Fusyn locks taken: task->??
> >
> >So no, you will not need the same API, at all :-) Fusyn manipulates
> >task->static_prio and only task->prio when no RT lock is taken. When
> the
> >first RT-lock is taken/released it manipulates task->prio only. A
> release
> >of a Fusyn will manipulate task->static_prio as well as task->prio.
> 
> Yes you do. You took care of the simple case. Things get funnier
> when you own more than one PI lock, or you need to promote a
> task that is blocked on other PI locks whose owners are blocked
> on PI locks (transitivity), or when you mix PI and PP (priority
> protection/ priority ceiling).
> 
> In that case not having a sim{pl,g}e API for doing it is nuts.
> 
> >> The next question is deadlocks. Because one mutex is only in the
> kernel,
> >> and the other is only in user space, it seems that deadlocks will
> only
> >> occur when a process holds locks that are all the same type.
> >
> >Yes.
> >All these things assumes a clear lock nesting: Fusyns are on the outer
> >level, RT locks on the inner level. What happens if there is a bug in
> RT
> >locking code will be unclear. On the other hand errors in Fusyn locking
> >(user space) should not give problems in the kernel.
> 
> Wrong. Fusyns are kernel locks that are exposed to user space (much as
> a file descriptor is a kernel object exposed to user space through
> a system call). Of course if the user does something mean with them
> they will cause an error, but should not have undesired consequences
> in the kernel. But BUGS in the code will be as unclear as in RT mutexes.
> 
> >it is is bad maintainance to have to maintain two seperate systems. The
> >best way ought to be to try to only have one PI system. The kernel is
> big
> >and confusing enough as it is!
> 
> Ayeh for the big...it is not that confusing :)
> 
> -- Inaky
> 




Re: FUSYN and RT

2005-04-12 Thread Esben Nielsen
On 12 Apr 2005, Daniel Walker wrote:

> 
> I just wanted to discuss the problem a little more. From all the
> conversations that I've had it seem that everyone is worried about
> having PI in Fusyn, and PI in the RT mutex. 
> 
> It seems like these two locks are going to interact on a very limited
> basis. Fusyn will be the user space mutex, and the RT mutex is only in
> the kernel. You can't lock an RT mutex and hold it, then lock a Fusyn
> mutex (anyone disagree?). That is assuming Fusyn stays in user space.
> 
> The RT mutex will never lower a tasks priority lower than the priority
> given to it by locking a Fusyn lock.

I have not seen the Fusyn code. Where is the before-any-lock priority
stored? Ingo's code sets the prio back to what is given by static_prio.
So, if Fusyn sets static_prio it will work as you say. It will then be
up to Fusyn to restore static_prio to what it was before the first Fusyn
lock.

> 
> At least, both mutexes will need to use the same API to raise and lower
> priorities.

You basically need 3 priorities:
1) Actual: task->prio
2) Base prio with no RT locks taken: task->static_prio
3) Base prio with no Fusyn locks taken: task->??

So no, you will not need the same API at all :-) Fusyn manipulates
task->static_prio, and task->prio only when no RT lock is taken. When the
first RT lock is taken/released it manipulates task->prio only. A release
of a Fusyn will manipulate task->static_prio as well as task->prio.

> 
> The next question is deadlocks. Because one mutex is only in the kernel,
> and the other is only in user space, it seems that deadlocks will only
> occur when a process holds locks that are all the same type.

Yes.
All of this assumes a clear lock nesting: Fusyns are on the outer
level, RT locks on the inner level. What happens if there is a bug in RT
locking code will be unclear. On the other hand, errors in Fusyn locking
(user space) should not give problems in the kernel.

> 
> 
> Daniel
> 

I think it might be a fast track to getting things done to have a double
PI-locking system, one for the kernel and one for user space. But I think
it is bad maintenance to have to maintain two separate systems. The
best way ought to be to try to have only one PI system. The kernel is big
and confusing enough as it is!

Esben



Re: RT and XFS

2005-07-18 Thread Esben Nielsen
On Fri, 15 Jul 2005, Daniel Walker wrote:

> On Fri, 2005-07-15 at 12:23 +0200, Ingo Molnar wrote:
> > * Daniel Walker <[EMAIL PROTECTED]> wrote:
> > 
> > > PI is always good, cause it allows the tracking of what is high 
> > > priority , and what is not .
> > 
> > that's just plain wrong. PI might be good if one cares about priorities 
> > and worst-case latencies, but most of the time the kernel is plain good 
> > enough and we dont care. PI can also be pretty expensive. So in no way, 
> > shape or form can PI be "always good".
> 
> I don't agree with that. But of course I'm always speaking from a real
> time perspective . PI is expensive , but it won't always be. However, no
> one is forcing PI on anyone, even if I think it's good ..
> 

Is PI needed? If you use a mutex to protect a critical section you are
destroying the strict meaning of priorities if the mutex doesn't have PI:
Priority inversion can effectively make the high-priority task low
priority in that situation and postpone its execution indefinitely.
For RT applications that is clearly unacceptable.

One can argue that for non-RT tasks priorities aren't supposed to be as
rigid as for RT tasks anyway, so there it doesn't matter so much.
But as I read the comments in sched.c, a nice -20 task has to preempt any
nice 0 task no matter how much of a CPU hog it is. If it happens to share a
critical section with a nice +19 task, priority inversion will
occasionally destroy that property. If we disregard the costs of PI, PI is
thus a good thing.

But how expensive is PI? Of course there is an overhead in doing
the calculations. Ingo's implementation can be optimized quite a bit once
things are settled, but it will always be many times more expensive than a
raw spinlock. But is it much more expensive than a plain binary
semaphore?

If there is no congestion on a mutex the PI code will not be called at all.
On UP, the only occasion where congestion can occur is when a low-priority
task is preempted by a higher-priority task while it holds the
mutex. So let us look at the expensive part, where the high-priority task
tries to grab the mutex:

With PI: The owner has to be boosted, an immediate task switch takes
place, the owner runs to the unlock operation and is set back down in
priority, whereafter there is a task switch again to the high-priority
task.

Without PI: The blocked task waits and there is a task switch to some
thread which might not be the owner but often is. When the owner eventually
unlocks the mutex it will be followed by a task switch - because congestion
can only occur when the task trying to get the mutex preempts, and thus has
higher priority than, the owner.

The number of task switches is thus the same with and without PI!

And then there is the cache issue: When other tasks get scheduled in the
priority-inversion case, the data being protected can be flushed from the
cache while they are running. With PI the CPU continues to work with the
same data - and most often in the same code module. I.e. there is a higher
chance that the instruction and data caches contain the right data.

Thus in the end it all depends on how cheaply the PI calculations can be
made.

Esben

> Daniel
> 
> 





Re: RT and XFS

2005-07-18 Thread Esben Nielsen
On Thu, 14 Jul 2005, Christoph Hellwig wrote:

> On Thu, Jul 14, 2005 at 08:56:58AM -0700, Daniel Walker wrote:
> > This reminds me of Documentation/stable_api_nonsense.txt . That no one
> > should really be dependent on a particular kernel API doing a particular
> > thing. The kernel is play dough for the kernel hacker (as it should be),
> > including kernel semaphores.
> > 
> > So we can change whatever we want, and make no excuses, as long as we
> > fix the rest of the kernel to work with our change. That seems pretty
> > sensible , because Linux should be an evolution. 
> 
> Daniel, get a fucking clue.  Read some CS 101 literature on what a semaphore
> is defined to be.  If you want PI singing dancing blinking christmas tree
> locking primites call them a mutex, but not a semaphore.
>

As a matter of fact I just finished what corresponds to your "CS 101" (I
study CS in my spare time while having a full-time job coding RT stuff):
In the one lecture I attended they talked about semaphores. They taught
students to use binary semaphores for locking. Based on real-life
experience (and the Pathfinder story), I complained and told
them they ought to teach the students to use a mutex instead. They had no
clue: "It is the same thing," they said. Yes, a mutex can be implemented
just as a binary semaphore, but the semantics are different. In RT the
difference is very important, and even without RT it is a good idea to
maintain the difference for readability and deadlock detection. If you
later on want to optimize the primitive for what it is used for, it is also
good to have maintained that information. It is a bit like discarding
the type information from your programs: You want to keep the type information
even though the compiler ends up producing the same code.

The kernel developers have clearly followed the same lectures and used
plain binary semaphores, sometimes calling it a mutex, sometimes a semaphore.
I believe that the semaphore ought to be removed. Either use a mutex or
a completion. By far most code uses a semaphore either for signalling
- i.e. as a completion - or for critical sections - i.e. as a mutex. If code
mixes the usage it is most likely very hard to read.

Unfortunately, one of the goals of the preempt-rt branch is to avoid
altering too much code. Therefore the type semaphore can't be removed
there, and the name still lingers ... :-(

Esben





Re: [RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO configurable

2005-07-27 Thread Esben Nielsen
On Wed, 27 Jul 2005, Ingo Molnar wrote:

> 
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
> 
> > Perfectly understood.  I've had two customers ask me to increase the 
> > priorities for them, but those where custom kernels, and a config 
> > option wasn't necessary. But since I've had customers asking, I 
> > thought that this might be something that others want.  But I deal 
> > with a niche market, and what my customers want might not be what 
> > everyone wants. (hence the RFC in the subject).
> > 
> > So if there are others out there that would prefer to change their 
> > priority ranges, speak now otherwise this patch will go by the waste 
> > side.
> 
> i'm not excluding that this will become necessary in the future. We 
> should also add the safety check to sched.h - all i'm suggesting is to 
> not make it a .config option just now, because that tends to be fiddled 
> with.
> 
Isn't there a way to mark it "warning! warning! dangerous!"?

Anyway: I think 100 RT priorities is way overkill - and it slows things
down by making the scheduler check more empty slots in the runqueue.
The default ought to be 10. In practice it will be very hard to have
a task at the lowest RT priority behaving real-time with 99 higher-priority
tasks around. I find it hard to believe that somebody has an RT
app needing more than 10 priorities that can't make do with RR or FIFO
scheduling within a smaller number of priorities.

Esben

>   Ingo
> 



Re: [RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO configurable

2005-07-27 Thread Esben Nielsen
On Wed, 27 Jul 2005, K.R. Foley wrote:

> Esben Nielsen wrote:
> > On Wed, 27 Jul 2005, Ingo Molnar wrote:
> > 
> > 
> >>* Steven Rostedt <[EMAIL PROTECTED]> wrote:
> >>
> >>
> >>>Perfectly understood.  I've had two customers ask me to increase the 
> >>>priorities for them, but those where custom kernels, and a config 
> >>>option wasn't necessary. But since I've had customers asking, I 
> >>>thought that this might be something that others want.  But I deal 
> >>>with a niche market, and what my customers want might not be what 
> >>>everyone wants. (hence the RFC in the subject).
> >>>
> >>>So if there are others out there that would prefer to change their 
> >>>priority ranges, speak now otherwise this patch will go by the waste 
> >>>side.
> >>
> >>i'm not excluding that this will become necessary in the future. We 
> >>should also add the safety check to sched.h - all i'm suggesting is to 
> >>not make it a .config option just now, because that tends to be fiddled 
> >>with.
> >>
> > 
> > Isn't there a way to mark it "warning! warning! dangerous!" ?
> > 
> > Anyway: I think 100 RT priorities is way overkill - and slowing things
> > down by making the scheduler checking more empty slots in the runqueue.
> > Default ought to be 10. In practise it will be very hard to have
> > a task at the lower RT priority behaving real-time with 99 higher
> > priority tasks around. I find it hard to believe that somebody has an RT
> > app needing more than 10 priorities and can't do with RR or FIFO
> > scheduling within a fewer number of prorities.
> > 
> > Esben
> > 
> 
> Actually, is it really that slow to search a bitmap for a slot that 
> needs processing? 
No, it is ultra fast - but done very often.

> I work on real-time test stands which are less of an 
> embedded system and more of a real Unix system that require determinism. 
> It is very nice in some cases to have more than 10 RT priorities to work 
> with.

What for? Why can't you use FIFO at the same priority for some of your
tasks? I pretty much guess you have very few tasks which have high
requirements. The rest of your "RT" tasks could easily share the lowest RT
priority. FIFO would also be more efficient, as you will have fewer context
switches.

This idea of multiple priorities probably comes from an ordering of tasks:
You have a lot of tasks. You have a feeling about which one ought to be
more important than the other. Thus you end up with an ordered list of
tasks. BUT when you boil it down to what RT is all about, namely
meeting your deadlines, it doesn't matter beyond the first 5-10 priorities,
because those 5-10 priorities have introduced a lot of jitter to the rest
of the tasks anyway. You can just as well put them at the same
priority.

Esben

> 
> -- 
> kr
> 



Re: [RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO configurable

2005-07-27 Thread Esben Nielsen
On Wed, 27 Jul 2005, Steven Rostedt wrote:

> On Wed, 2005-07-27 at 19:01 +0200, Esben Nielsen wrote:
> > 
> > What for? Why can't you use FIFO at the same priorities for some of your
> > tasks? I pretty much quess you have a very few tasks which have some high
> > requirements. The rest of you "RT" task could easily share the lowest RT
> > priority. FIFO would also be more effective as you will have context
> > switches.
> > 
> > This about multiple priorities probably comes from an ordering of tasks:
> > You have a lot of task. You have a feeling about which one ought to be
> > more important than the other. Thus you end of with an ordered list of
> > tasks. BUT when you boil it down to what RT is all about, namely
> > meeting your deadlines, it doesn't matter after the 5-10 priorities
> > because the 5-10 priorities have introduced a lot of jitter to the rest
> > of the tasks anyway. You can just as well just put them at the same
> > priority.
> 
> Nope, I wouldn't agree with you here.  If you have tasks that will run
> periodically, at different frequencies, you need to order them. And each
> task would probably need a different priority. FIFO is very dangerous
> since it doesn't release a task until that task voluntarily sleeps.
> 
> A colleague of mine, well actually the VP of my company of the time,
> Doug Locke, gave me a perfect example.  If you have a program that runs
> a nuclear power plant that needs to wake up and run 4 seconds every 10
> seconds, and on that same computer you have a program running a washing
> machine that needs to wake up every 3 seconds and run for one second
> (I'm using seconds just to make the example simple). Which process gets
> the higher priority?  The answer is the washing machine.
> 
> Rational:  If the power plant was higher priority, the washing machine
> would fail almost every time, since the power plant program would run
> for 4 seconds, and since the cycle of the washing machine is 3 seconds,
> it would fail everytime the nuclear power plant program ran.  Now if you
> have the washing machine run in it's cycle, the nuclear power plant can
> easily make the 4 seconds ever 10 seconds, even when it is interrupted
> by the washing machine.
> 

This is rate-monotonic scheduling
(http://en.wikipedia.org/wiki/Rate-monotonic_scheduling).
((Notice: The article gets it wrong on the priority inheritance/ceiling
stuff...))
(Notice: The fewer tasks, the higher the theoretical max for the CPU
utilization :-)
In theory it works fine, but there are some drawbacks when you go to
reality:
1) You have to know the running time of all your tasks.
2) The running time is assumed constant.
3) You don't take into account the time it takes to perform a task switch.
4) Mutual exclusion isn't considered.
5) What if things aren't periodic?

To make this work according to the theory you have to make a really,
really detailed analysis of your system. That is not really possible for
more than a few tasks. In practice you will have to stay very much below the
theoretical limit for CPU utilization to be safe. 

And my claim is:
All the different periodic jobs you want to perform can most often easily
be put into groups like 1 ms, 5 ms, 20 ms, 100 ms groups. In each group you
can easily run with FIFO preemption - and even within one OS thread.

Think about it: If you have two tasks with close to the same period, and
one runs too long for the other to meet its deadlines within a FIFO setup,
you are bound to be very close to the limits no matter which one you give
the highest priority.

> Doug also mentioned that you really want to have every task with a
> different priority, so it makes sense to have a lot of priorities.  I
> can't remember why he said this, but I'm sure you and I can find out by
> searching through his papers.
> 

On the contrary. Especially if you have mutual exclusion. If you run two
tasks at the same priority with FIFO you don't hit congestion on your
mutexes (between those two tasks at least). If you give one of them just
one higher priority it will preempt the other. You thus risk congestion -
which is an expensive thing. Thus by giving your tasks different
priorities you risk that your system can't meet its deadlines due to the
extra CPU used!!

> -- Steve
> 
> 
> 



Re: [RFC][PATCH] Make MAX_RT_PRIO and MAX_USER_RT_PRIO configurable

2005-07-28 Thread Esben Nielsen

On Wed, 27 Jul 2005, K.R. Foley wrote:

> Esben Nielsen wrote:
> [...]
> 
> All of the RT priorities that we have are not absolutely necessary. As I 
> think Steven pointed out in another email, it is nice though to be able 
> to priortize tasks using large jumps in priorities and then being able 
> to fill in tasks that are dependent on other tasks in between. 

For portability you shouldn't hardcode your priorities anyway. You need
some sort of abstraction layer. It should be very simple to code one such
that you at least avoid having gaps in your priorities. Making it figure
out which tasks can share priorities is probably harder. 
Under all circumstances: You are stressing the system at run time because
you didn't do a proper job at compile and boot time. 

> Even if 
> you think of nothing but the IRQ handlers, the 5-10 priorities quickly 
> get crowded without any user tasks.

Why do the irq handlers need _different_ priorities? Do they really have
to preempt each other? It is likely that the longest handler runs
longer than the accepted latency for the most critical handler. Thus the
most critical handler has to preempt that one. But I bet you don't have a
system with 10 interrupt sources where handler 1 needs to preempt handler
2, handler 2 needs to preempt handler 3, etc. At most you have a system
where handlers 1-3 need to preempt handlers 4-10 (and the application),
handlers 4 and 5 need to preempt handlers 6-10, while handlers 6-10 don't
need to preempt any of the other handlers. In that case you only need 3
priorities, not 5-10. (The highest priority can very well be hard-irq
context :-)
Then add 2 RT priorities for your application. You need one thread to
handle the data from the high-priority interrupt and one for the middle
interrupt. You thus have 5 RT priorities:
 1: handlers 1-3 (can be in hard irq)
 2: application thread 1
 3: handlers 4 and 5
 4: application thread 2
 5: the rest of the irq handlers

Even as your application grows big it doesn't help throwing in more
priorities. If a task runs for very long and has a low latency limit, you
will have a hard time no matter what you do. If all the low-latency stuff
runs sufficiently fast, it can just as well run with FIFO policy wrt. each
other.
And even if you split the stuff up into separate OS threads, giving them the
same RT priority and FIFO policy will make things run faster due to fewer
task switches. Preemption is expensive. Even though you want it, you
should design your system so it doesn't happen too often.

Esben

> 
> 
> -- 
> kr
> 




Re: [PATCH 00/14] ppc32: Remove board ports that are no longer maintained

2005-07-29 Thread Esben Nielsen


On Wed, 27 Jul 2005, Matt Porter wrote:

> On Wed, Jul 27, 2005 at 09:27:41AM -0700, Eugene Surovegin wrote:
> > On Wed, Jul 27, 2005 at 12:13:23PM -0400, Michael Richardson wrote:
> > > Kumar, I thought that we had some volunteers to take care of some of
> > > those. I know that I still care about ep405, and I'm willing to maintain
> > > the code.
> > 
> > Well, it has been almost two months since Kumar asked about maintenance 
> > for this board. Nothing happened since then.
> > 
> > Why is it not fixed yet? Please, send a patch which fixes it. This is 
> > the _best_ way to keep this board in the tree, not some empty 
> > maintenance _promises_.
> 
> When we recover our history from the linuxppc-2.4/2.5 trees we can
> show exactly how long it's been since anybody touched ep405.
> 
> Quick googling shows that it's been almost 2 years since the last
> mention of ep405 (exluding removal discussions) on linuxppc-embedded.
> Last ep405-related commits are more than 2 years ago.
> 
I don't follow that reasoning. Even broken drivers (board support files,
whatever) are better than none.

Take ArcNet support for instance. Clearly it hadn't been used in any 2.6
kernel up until around 2.6.10. It was highly broken (a call to an
uninitialized function pointer). But I needed it. I fixed it and sent the
patch so it works from 2.6.11 and up.  If the driver had been dropped in
the 2.6 series because nobody actively maintained it, I wouldn't have got
around to fixing it at all and would probably have been forced to use another
OS for my purpose.

But because the driver was still in there and somebody had made sure it
was updated along with the changes to the API in the 2.6 kernel, it was easy
for me to fix it although I didn't know that much about the kernel internals
at the time.

Esben







[GIT/Cogito question] Access to specific versions of the kernel

2005-07-31 Thread Esben Nielsen
I finally succeeded in getting cg-clone to work on
linux/kernel/git/torvalds/linux-2.6.git.
I can see that 2.6.13-rc4 is in there and can use cg-diff to see the
difference between the current tree and 2.6.13-rc4.

But how do I extract the 2.6.13-rc4 source from the tree?

Or even more complicated: I would like to make a branch based on
2.6.13-rc4 and work from there. At some point I would like to jump to
2.6.13-rc5 (or 2.6.13). I do not want to have the in-between changes
trickle in. I.e. I need something like "cvs rtag -b -r 2.6.13-rc4 mytree"
and "cvs update -j 2.6.13-rc4 -j 2.6.13-rc5".

In drawing:

 Linus                  My tree
  |
  + 2.6.13-rc3
  |
  + 2.6.13-rc4 ---------+
  |                     |
  | (current)           | (work)
  |                     |
  + 2.6.13-rc5 ---------+  merge point
  |                     |
  + 2.6.13 -------------+  merge point
How do I do that with cogito or git?


Esben



Re: CFQ + 2.6.13-rc4-RT-V0.7.52-02 = BUG: scheduling with irqs disabled

2005-08-24 Thread Esben Nielsen
On Wed, 24 Aug 2005, Jens Axboe wrote:

> On Wed, Aug 24 2005, Lee Revell wrote:
> > Just found this in dmesg.
> > 
> > BUG: scheduling with irqs disabled: libc6.postinst/0x2000/13229
> > caller is ___down_mutex+0xe9/0x1a0
> >  [] schedule+0x59/0xf0 (8)
> >  [] ___down_mutex+0xe9/0x1a0 (28)
> >  [] cfq_exit_single_io_context+0x22/0xa0 (84)
> >  [] cfq_exit_io_context+0x3a/0x50 (16)
> >  [] exit_io_context+0x64/0x70 (16)
> >  [] do_exit+0x5a/0x3e0 (20)
> >  [] do_group_exit+0x2a/0xb0 (24)
> >  [] syscall_call+0x7/0xb (20)
> 
> Hmm, Ingo I seem to remember you saying that the following construct:
> 
> local_irq_save(flags);
> spin_lock(lock);
> 
> which is equivelant to spin_lock_irqsave() in mainline being illegal in
> -RT, is that correct? 

I can easily answer this for Ingo.

Yes, spin_lock(lock) is blocking, since the lock is a mutex, not a spinlock, under
preempt-rt. But isn't it easy to fix? Replace the two lines by
spin_lock_irqsave(lock, flags). That would work for both preempt-rt
and !preempt-rt.

You might ask whether the macro name spin_lock() isn't confusing. It very
much is, but one of Ingo's aims is to not change existing code too much.
The purist would probably change all instances of spin_lock() to lock() or
down() to stop referring to a specific lock type when it can be changed
with config options. That would, however, require a large patch,
which would make the preempt-rt branch harder to merge with the mainline.
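
In patch form, the substitution suggested above would look something like
this (purely illustrative, mirroring the pattern Jens quotes rather than
the actual cfq source):

```
-	local_irq_save(flags);
-	spin_lock(lock);
+	spin_lock_irqsave(lock, flags);
 	...
-	spin_unlock(lock);
-	local_irq_restore(flags);
+	spin_unlock_irqrestore(lock, flags);
```

On mainline the combined form is equivalent; under preempt-rt it avoids
blocking on a mutex while interrupts are hard-disabled.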

Esben


> This is what cfq uses right now for an exiting
> task, as the above trace indicates.
> 
> -- 
> Jens Axboe



Re: 2.6.13-rc6-rt1

2005-08-29 Thread Esben Nielsen
On Fri, 26 Aug 2005, Matt Mackall wrote:

> On Tue, Aug 16, 2005 at 02:32:01PM +0200, Michal Schmidt wrote:
> > Ingo Molnar wrote:
> > >i've released the 2.6.13-rc6-rt1 tree, which can be downloaded from the 
> > >usual place:
> > >
> > >  http://redhat.com/~mingo/realtime-preempt/
> > >
> > >as the name already suggests, i've switched to a new, simplified naming 
> > >scheme, which follows the usual naming convention of trees tracking the 
> > >mainline kernel. The numbering will be restarted for every new upstream 
> > >kernel the -RT tree is merged to.
> > 
> > Great! With this naming scheme it is easy to teach Matt Mackall's 
> > ketchup script about the -RT tree.
> > The modified ketchup script can be downloaded from:
> > http://www.uamt.feec.vutbr.cz/rizeni/pom/ketchup-0.9+rt
> > 
> > Matt, would you release a new ketchup version with this support for 
> > Ingo's tree?
> 
> Thanks. I've put this in my version, which is now exported as a
> Mercurial repo at:
> 
>  http://selenic.com/repo/ketchup
> 
> This version also has -git support, finally.
> 
I added the line in the patch below to be able to get Ingo's older
patches.

Esben

diff -r 1342be306020 ketchup
--- a/ketchup   Sat Aug 27 01:12:42 2005
+++ b/ketchup   Tue Aug 30 00:30:23 2005
@@ -367,6 +367,7 @@
 
 # the jgarzik memorial hack
 url2 = re.sub("/snapshots/", "/snapshots/old/", url)
+url2 = re.sub("/realtime-preempt/", "/realtime-preempt/older/", url2)
 if url2 != url:
 if download(url2, file): return file



Re: [PATCH] Arcnet, linux 2.6.13

2005-09-06 Thread Esben Nielsen
On Mon, 5 Sep 2005, Pieter Dejaeghere wrote:

> In the current arcnet driver, the hard_start_xmit method allocates a
> buffer for an outgoing transmission. However, this method doesn't check
> whether there was already an allocated buffer from an earlier outgoing
> transmission. This patch checks whether lp->next_tx already had an
> allocated buffer, and if so, it returns NETDEV_TX_BUSY. This prevents
> buffers from dissapearing under heavy traffic.
> 
> This patch seems to work fine on my arcnet network, and I also sent it to
> the person (Esben Nielsen [EMAIL PROTECTED]) who made some arcnet patches
> in 2.6.8 and 2.6.11, and they work fine on his setup too.
> 

Yes, I tested it. It works and apparently solves a problem I have had for
a long time with lost buffers and extremely long ping times when pinging
with large packets.

Please apply this patch.

Andrew and David: I CC'ed you guys because you took care of it the last
time :-)


Esben


> url to the patch:
> http://pieter.dejaeghere.net:9080/arcnet/patch-buffer
> 
> inlined (hopefully without broken linewraps):
> --- linux-2.6.12-gentoo-r1/drivers/net/arcnet/arcnet.c2005-06-25
> 20:42:46.0 +0200
> +++ linux-2.6.13-gentoo/drivers/net/arcnet/arcnet.c   2005-09-03
> 19:46:54.227846664 +0200
> @@ -597,7 +597,7 @@ static int arcnet_send_packet(struct sk_
>   struct ArcProto *proto;
>   int txbuf;
>   unsigned long flags;
> - int freeskb = 0;
> + int freeskb, retval;
> 
>   BUGMSG(D_DURING,
>  "transmit requested (status=%Xh, txbufs=%d/%d, len=%d, protocol
> %x)\n",
> @@ -615,7 +615,7 @@ static int arcnet_send_packet(struct sk_
>   if (skb->len - ARC_HDR_SIZE > XMTU && !proto->continue_tx) {
>   BUGMSG(D_NORMAL, "fixme: packet too large: compensating 
> badly!\n");
>   dev_kfree_skb(skb);
> - return 0;   /* don't try again */
> + return NETDEV_TX_OK;/* don't try again */
>   }
> 
>   /* We're busy transmitting a packet... */
> @@ -623,8 +623,11 @@ static int arcnet_send_packet(struct sk_
> 
>   spin_lock_irqsave(&lp->lock, flags);
>   AINTMASK(0);
> -
> - txbuf = get_arcbuf(dev);
> + if(lp->next_tx == -1)
> + txbuf = get_arcbuf(dev);
> + else {
> + txbuf = -1;
> + }
>   if (txbuf != -1) {
>   if (proto->prepare_tx(dev, pkt, skb->len, txbuf) &&
>   !proto->ack_tx) {
> @@ -638,6 +641,8 @@ static int arcnet_send_packet(struct sk_
>   lp->outgoing.skb = skb;
>   lp->outgoing.pkt = pkt;
> 
> + freeskb = 0;
> +
>   if (proto->continue_tx &&
>   proto->continue_tx(dev, txbuf)) {
> BUGMSG(D_NORMAL,
> @@ -645,10 +650,12 @@ static int arcnet_send_packet(struct sk_
>"(proto='%c')\n", proto->suffix);
>   }
>   }
> -
> + retval = NETDEV_TX_OK;
> + dev->trans_start = jiffies;
>   lp->next_tx = txbuf;
>   } else {
> - freeskb = 1;
> + retval = NETDEV_TX_BUSY;
> + freeskb = 0;
>   }
> 
>   BUGMSG(D_DEBUG, "%s: %d: %s, status:
> %x\n",__FILE__,__LINE__,__FUNCTION__,ASTATUS());
> @@ -664,7 +671,7 @@ static int arcnet_send_packet(struct sk_
>   if (freeskb) {
>   dev_kfree_skb(skb);
>   }
> - return 0;   /* no need to try again */
> + return retval;  /* no need to try again */
>  }
> 
> 
> @@ -690,7 +697,6 @@ static int go_tx(struct net_device *dev)
>   /* start sending */
>   ACOMMAND(TXcmd | (lp->cur_tx << 3));
> 
> - dev->trans_start = jiffies;
>   lp->stats.tx_packets++;
>   lp->lasttrans_dest = lp->lastload_dest;
>   lp->lastload_dest = 0;
> @@ -917,6 +923,9 @@ irqreturn_t arcnet_interrupt(int irq, vo
> 
>   BUGMSG(D_RECON, "Network reconfiguration detected 
> (status=%Xh)\n",
>  status);
> + /* MYRECON bit is at bit 7 of diagstatus */
> + if(diagstatus & 0x80)
> + BUGMSG(D_RECON,"Put out that recon myself\n");
> 
>   /* is the RECON info empty or old? */
>   if (!lp->first_recon || !lp->last_recon ||
> 





Re: kbuild & C++

2005-09-06 Thread Esben Nielsen
On Tue, 6 Sep 2005, Jesper Juhl wrote:

> On 9/6/05, Budde, Marco <[EMAIL PROTECTED]> wrote:
> > Hi,
> > 
> > for one of our customers I have to port a Windows driver to
> > Linux. Large parts of the driver's backend code consists of
> > C++.
> > 
> > How can I compile this code with kbuild? The C++ support
> > (I have tested with 2.6.11) of kbuild seems to be incomplete /
> > not working.
> > 
> 
> That would be because the kernel is written in *C* (and some asm), *not* C++.
> There /is/ no C++ support.

Which is too bad. You can do things much more elegantly, effectively and
safely in C++ than in C. Yes, you can do inheritance in C, but it leaves
it up to the user to make sure the type-casts are done right every time. You
can do some dynamic typing with macros, but not nearly as effectively as
with templates, and those macros always come out very, very ugly. (Some say
templates are ugly, but they only become ugly when they are used
way beyond what you can do with macros.)

I think it can only be a plus to Linux to add C++ support for at least
out-of-mainline drivers. Adding drivers written in C++ into the mainline
is another thing.

Esben



Re: kbuild & C++

2005-09-07 Thread Esben Nielsen
On Tue, 6 Sep 2005 [EMAIL PROTECTED] wrote:

> On Wed, 07 Sep 2005 00:20:11 +0200, Esben Nielsen said:
> 
> > Which is too bad. You can do stuff much more elegant, effectively and
> > safer in C++ than in C. Yes, you can do inheritance in C, but it leaves
> > it up to the user to make sure the type-casts are done OK every time. You
> > can with macros do some dynamic typing, but not nearly as effectively as
> > with templates, and those macros always comes very, very ugly. (Some say
> > templates are ugly, but they first become ugly when they are used
> > way beyond what you can do with macros.)
> > 
> > I think it can only be a plus to Linux to add C++ support for at least
> > out-of-mainline drivers. Adding drivers written in C++ into the mainline
> > is another thing.
> 
> http://www.tux.org/lkml/#s15-3 Why don't we rewrite the Linux kernel in C++?
> 

I can't see it should be _that_ hard to make the kernel C++ friendly. At work 
I use a RTOS written in plain C but where you can easily use C++ in kernel
space (there is no user-space :-). We use gcc by the way.

It has been done for Linux as well
(http://netlab.ru.is/pronto/pronto_code.shtml). Why can't this kind of
stuff be merged into the kernel? Why is there no effort to do so?
It is one of those projects I would have liked to spend time on if I had
any, but not if it would be rejected in the mainline no matter how
unintrusive it is.

What I argue for is that people find out _what_ can be accepted in the
mainline with regard to C++. If the maintainers could somehow signal
that a CONFIG_CPP_SUPPORT would be an acceptable option in the mainline
tree, I am sure someone (not me, for lack of time) would make a patch and
submit it. I am sure distributions like Red Hat would skip kernels with
CONFIG_CPP_SUPPORT=y once it was there.

Esben

PS. Do the above people break the GPL by forcing people to accept a
license agreement before downloading a patch to the kernel? Shouldn't they
provide a direct URL?



Re: kbuild & C++

2005-09-07 Thread Esben Nielsen
On Wed, 7 Sep 2005 [EMAIL PROTECTED] wrote:

> On Wed, 07 Sep 2005 11:13:24 +0200, "Budde, Marco" said:
> 
> > E.g. in my case the Windows source code has got more than 10 MB.
> > Nobody will convert such an amount of code from C++ to C.
> > This would take years.
> 
> Do you have any *serious* intent to drop 10 *megabytes* worth of driver
> into the kernel??? (Hint - *everything* in drivers/net/wireless *totals*
> to only 2.7M).
> 

For a special-purpose embedded application, doing it all in kernel space
would be the first, effective hack.

> A Linux device driver isn't the same thing as a Windows device driver - much 
> of
> a Windows driver is considered "userspace" on Linux, and you're free to do 
> that
> in C++ if you want.
> 

Yes, moving stuff to user-space would be the way to go - unless it kills
performance! 

Esben



Re: kbuild & C++

2005-09-07 Thread Esben Nielsen


On Wed, 7 Sep 2005 [EMAIL PROTECTED] wrote:

> On Wed, 07 Sep 2005 11:21:42 +0200, Esben Nielsen said:
> 
> > I use a RTOS written in plain C but where you can easily use C++ in kernel
> > space (there is no user-space :-). We use gcc by the way.
> 
> This isn't RTOS, in case you haven't noticed. ;)
Well, with Ingo's preempt-RT patch it is becoming an RT-OS, but that is
not the issue here.

> 
> > It has been done for Linux as well 
> > (http://netlab.ru.is/pronto/pronto_code.shtml). Why can't this kind of
> > stuff be merged into the kernel? Why is there no efford to do so??
> 
> Quoting http://netlab.ru.is/exception/LinuxCXX.shtml:
> 
> "The code is installed by applying a patch to the Linux kernel and enables the
> full use of C++ using the GNU g++ compiler. Programmers that have used C++ in
  

> Linux kernel modules have primarily been using classes and virtual functions,
> but not global constructors. dynamic type checking and exceptions. Using even
> this small part of C++ requires each programmer to write some supporting
> routines. Using the rest of C++ includes porting the C++ ABI that accompanies
> GNU g++ to the Linux kernel, and to enable global constructors and 
> destructors."
> 
> So let's see - no constructors, no type checking, no exceptions, and using
> virtual functions requires the programmer to write the glue code that
> programmers want to use C++ to *avoid* writing.  Sounds like "We stripped out
> all the reasons programmers want to use C++ just so we can say we use C++ in
> the kernel".
> 
> So, other than wank value, what *actual* advantages are there to using this
> limited subset of C++ in the kernel?
> 

If you read the whole page you will notice that they talk about
the _past_. As I understand the page, they claim to have _fixed_
those problems.

Esben



Re: 2.6.24-rc7-rt2

2008-01-21 Thread Esben Nielsen



On Wed, 16 Jan 2008, Steven Rostedt wrote:



On Wed, 16 Jan 2008, Steven Rostedt wrote:


We modified mcount now, and it is derived from an objdump of glibc. So
this is most definitely a "derived" work from glibc. But glibc is licensed
as LGPL, which IIRC allows for non GPL to link to it.

I personally could care less if we use EXPORT_SYMBOL or EXPORT_SYMBOL_GPL.
But I really want to do The Right Thing(tm). I'm not a lawyer and don't
claim that I know anything about the law, but I'm leaning towards the non
_GPL version because the code was from LGPL and not from strict GPL.


Sorry folks, I'm going to stick with the _GPL version. It doesn't mean
that you can't still load your nVidia module into -rt. I just means you
can't turn on function trace and then load it. Well, you might if you
don't compile the nVidia wrapper against it with function trace on.

The reason simply is to cover my butt.  By limiting it to GPL, I'm fine.
Even if the original author didn't care. But by opening it up to external
prorietary modules, I may be considered infringing on the license.

So, unless I hear from a lawyer that is willing to back me up on a non
_GPL export publically, the mcount function will stay as an
EXPORT_SYMBOL_GPL.

Note: There is a definite reason for this change. The previous version
of mcount was written by Ingo Molnar, and he added the export. I've
changed mcount to be closer to the glibc code (which I derived it from),
so the change in EXPORT type is legitimate.

-- Steve



Please tell me: what in the license forbids me from making a global replacement
EXPORT_SYMBOL_GPL -> EXPORT_SYMBOL and distributing the result?

For me, on the other hand, it is against the spirit of free software to
actively put up a block against people doing whatever they want with the code
when they are only doing it to themselves. That includes loading non-GPL
software into the kernel. The only thing they are not allowed to do is to
distribute it and in that way "hurt" other people.

Esben



-
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html




Re: 2.6.24-rc7-rt2

2008-01-27 Thread Esben Nielsen


On Mon, 21 Jan 2008, Steven Rostedt wrote:



On Mon, 21 Jan 2008, Esben Nielsen wrote:


Please, tell what in the license forbids me to make a global replacement
EXPORT_SYMBOL_GPL -> EXPORT_SYMBOL and distribute the result?


If you want to distribute that code, the authors of that said code
may be able to challenge you in saying that you are enabling a means to
circumvent a way around the license, and hold you liable. Remember, all it
takes is one country with the laws that will grant this complaint.



For me, on the other hand, it is against the spirit of free software to
actively make a block for people to do what ever they want with the code
when they are only doing it to themselves. That includes loading non-GPL
software into the kernel. The only thing they  are not allowed to do is to
distribute it and in that way "hurt" other people.


Honestly, I don't care which export it is. The thing is that I derived
that code from someone else. I did not look up the original author of the
code to find out which export they would like it to be. I may be able to
argue that since it was under a LGPL and not a GPL license, I may very
well be able to export it that way.

I'm taking the safe way out. By exporting it as EXPORT_SYMBOL_GPL, I am
safe either way. By exporting it as EXPORT_SYMBOL without first hearing
from the original author (and getting that in writing), or hearing it from
a lawyer, I may be putting myself at risk.

Feel free to creating a version of this code and
s/EXPORT_SYMBOL_GPL/EXPORT_SYMBOL/ and distribute it. I wont come after
you for that, but at least I know those that would, will go after you and
not me.

Call me a chicken, I don't care, but I'm just not going to put myself nor
my company I work for, at risk over this issue.



First off, sorry for sounding so harsh and sorry for taking this
discussion out on you. It is quite off-topic in this context. It was just
a rant about the misconception that adding/removing _GPL to
EXPORT_SYMBOL can make non-GPL modules more or less legal. It is a
_political_ issue, not a legal one.


Esben


-- Steve




Re: Real-Time Preemption and RCU

2005-03-18 Thread Esben Nielsen
On Fri, 18 Mar 2005, Ingo Molnar wrote:

> 
> * Ingo Molnar <[EMAIL PROTECTED]> wrote:
> 
> > [...] How about something like:
> > 
> > void
> > rcu_read_lock(void)
> > {
> > preempt_disable();
> > if (current->rcu_read_lock_nesting++ == 0) {
> > current->rcu_read_lock_ptr =
> > &__get_cpu_var(rcu_data).lock;
> > preempt_enable();
> > read_lock(current->rcu_read_lock_ptr);
> > } else
> > preempt_enable();
> > }
> > 
> > this would still make it 'statistically scalable' - but is it correct?
> 
> thinking some more about it, i believe it's correct, because it picks
> one particular CPU's lock and correctly releases that lock.
> 
> (read_unlock() is atomic even on PREEMPT_RT, so rcu_read_unlock() is
> fine.)
> 

Why should there only be one RCU reader per CPU at any given
instant? Even on a real-time UP system it would be very helpful to have
RCU areas be enterable by several tasks at once. It would perform
better, both wrt. latencies and throughput:
with the above implementation a high-priority task entering an RCU area
will have to boost the current RCU reader, make a task switch until that
one finishes, and then make yet another task switch to get back to the
high-priority task. With an RCU implementation which can take n RCU readers per CPU
there is no such problem.

Also having all tasks serializing on one lock (per CPU) really destroys
the real-time properties: The latency of anything which uses RCU will be
the worst latency of anything done under the RCU lock.

When I looked briefly at it in the fall the following solution jumped to
mind: have an RCU reader count, rcu_read_count, for each CPU. When you
enter an RCU read region increment it, and decrement it when you go out of
it. When it is 0, RCU cleanups are allowed - a perfect quiescent state. At
that point call rcu_qsctr_inc(). Or call it in schedule() as
now, just with an if (rcu_read_count == 0) around it.

I don't think I understand the current code. But if it works now with
preempt_disable()/preempt_enable() around all the read regions it ought to
work with
 preempt_disable();
 rcu_read_count++/--;
 preempt_enable();
around the same regions, and the above check for rcu_read_count == 0 in or
around rcu_qsctr_inc() as well.

It might take a long time before the rcu-batches are actually called,
though, but that is a different story, which can be improved upon. An
improvement would be to boost non-RT tasks entering an rcu-read region
to the lowest RT priority. That way there can't be a lot of low-priority
tasks hanging around making rcu_read_count non-zero for a long
period of time, since these tasks can only be preempted by RT tasks while
in the RCU region.

>   Ingo

Esben





Re: Real-Time Preemption and RCU

2005-03-18 Thread Esben Nielsen

On Fri, 18 Mar 2005, Ingo Molnar wrote:

> 
> * Bill Huey <[EMAIL PROTECTED]> wrote:
> 
> > I'd like to note another problem. Mingo's current implementation of
> > rt_mutex (super mutex for all blocking synchronization) is still
> > missing reader counts and something like that would have to be
> > implemented if you want to do priority inheritance over blocks.
> 
> i really have no intention to allow multiple readers for rt-mutexes. We
> got away with that so far, and i'd like to keep it so. Imagine 100
> threads all blocked in the same critical section (holding the read-lock)
> when a highprio writer thread comes around: instant 100x latency to let
> all of them roll forward. The only sane solution is to not allow
> excessive concurrency. (That limits SMP scalability, but there's no
> other choice i can see.)
>

Unless a design change is made: one could argue for a semantics where
write-locking _isn't_ deterministic and thus does not have to boost all the
readers. Readers boost the writers but not the other way around. Readers
will be deterministic, but not writers.
Such a semantics would probably work for a lot of RT applications
that happen not to take any write-locks - these will in fact perform better.
But it will give the rest a lot of problems.
 
>   Ingo

Esben



Re: Real-Time Preemption and RCU

2005-03-20 Thread Esben Nielsen
On Fri, 18 Mar 2005, Ingo Molnar wrote:

> 
> * Esben Nielsen <[EMAIL PROTECTED]> wrote:
> 
> > Why can should there only be one RCU-reader per CPU at each given
> > instance? Even on a real-time UP system it would be very helpfull to
> > have RCU areas to be enterable by several tasks as once. It would
> > perform better, both wrt. latencies and throughput: With the above
> > implementation an high priority task entering an RCU area will have to
> > boost the current RCU reader, make a task switch until that one
> > finishes and makes yet another task switch. to get back to the high
> > priority task. With an RCU implementation which can take n RCU readers
> > per CPU there is no such problem.
> 
> correct, for RCU we could allow multiple readers per lock, because the
> 'blocking' side of RCU (callback processing) is never (supposed to be)
> in any latency path.
> 
> except if someone wants to make RCU callback processing deterministic at
> some point. (e.g. for memory management reasons.)

I think it can be deterministic (on the long timescale of memory management)
anyway: boost any non-RT task entering an RCU region to the lowest RT priority.
This way only the RT tasks plus one non-RT task can be within those
regions. The RT tasks are supposed to have some kind of upper bound on
their CPU usage. The non-RT task will also finish "soon" as it is
boosted. If the RCU batches are also at the lowest RT priority they can be
run immediately after the non-RT task is done.

> 
> clearly the simplest solution is to go with the single-reader locks for
> now - a separate experiment could be done with a new type of rwlock that
> can only be used by the RCU code. (I'm not quite sure whether we could
> guarantee a minimum rate of RCU callback processing under such a scheme
> though. It's an eventual memory DoS otherwise.)
> 

Why is a lock needed at all? If it is doable without locking for a
non-preemptable SMP kernel it must be doable for a preemptable kernel as
well. I am convinced some kind of per-CPU rcu_read_count as I specified in
my previous mail can work one way or the other. call_rcu() might need to
do more complicated stuff and thus use CPU, but call_rcu() is supposed to
be a relatively rare event not worth optimizing for. Such an
implementation will work for any preemptable kernel, not only PREEMPT_RT.
As far as performance is concerned, it is important not to acquire any
locks in the rcu-read regions.

I tried this approach. My UP laptop did boot with it, but I haven't tested
it further. I have included the very small patch as an attachment.

>   Ingo

I have not yet looked at -V0.7.41-00...

Esben

diff -Naur --exclude-from diff_exclude 
linux-2.6.11-final-V0.7.40-00/include/linux/rcupdate.h 
linux-2.6.11-final-V0.7.40-00-RCU/include/linux/rcupdate.h
--- linux-2.6.11-final-V0.7.40-00/include/linux/rcupdate.h  2005-03-11 
23:40:13.0 +0100
+++ linux-2.6.11-final-V0.7.40-00-RCU/include/linux/rcupdate.h  2005-03-19 
12:47:09.0 +0100
@@ -85,6 +85,7 @@
  * curlist - current batch for which quiescent cycle started if any
  */
 struct rcu_data {
+   longactive_readers;
/* 1) quiescent state handling : */
longquiescbatch; /* Batch # for grace period */
int passed_quiesc;   /* User-mode/idle loop etc. */
@@ -115,12 +116,14 @@
 static inline void rcu_qsctr_inc(int cpu)
 {
struct rcu_data *rdp = &per_cpu(rcu_data, cpu);
-   rdp->passed_quiesc = 1;
+   if(rdp->active_readers==0)
+   rdp->passed_quiesc = 1;
 }
 static inline void rcu_bh_qsctr_inc(int cpu)
 {
struct rcu_data *rdp = &per_cpu(rcu_bh_data, cpu);
-   rdp->passed_quiesc = 1;
+   if(rdp->active_readers==0)
+   rdp->passed_quiesc = 1;
 }
 
 static inline int __rcu_pending(struct rcu_ctrlblk *rcp,
@@ -183,29 +186,27 @@
  *
  * It is illegal to block while in an RCU read-side critical section.
  */
-#define rcu_read_lock()preempt_disable()
+static inline void rcu_read_lock(void)
+{  
+   preempt_disable(); 
+   __get_cpu_var(rcu_data).active_readers++;
+   preempt_enable();
+}
 
 /**
  * rcu_read_unlock - marks the end of an RCU read-side critical section.
  *
  * See rcu_read_lock() for more information.
  */
-#define rcu_read_unlock()  preempt_enable()
+static inline void rcu_read_unlock(void)
+{  
+   preempt_disable(); 
+   __get_cpu_var(rcu_data).active_readers--;
+   preempt_enable();
+}
 
 #define IGNORE_LOCK(op, lock)  do { (void)(lock); op(); } while (0)
 
-#ifdef CONFIG_PREEMPT_RT
-# define rcu_read_lock_spin(lock)  spin_lock(lock)
-# define rcu_read_unlock_spin(lock)spin_unlock(lock)
-# define rcu_read_lock_read(lock)  read_lock(lock)
-# define rcu_read_unlock_read(lock)

Re: Real-Time Preemption and RCU

2005-03-20 Thread Esben Nielsen
On Sun, 20 Mar 2005, Paul E. McKenney wrote:

> On Sun, Mar 20, 2005 at 02:29:17PM +0100, Esben Nielsen wrote:
> > On Fri, 18 Mar 2005, Ingo Molnar wrote:
> > 
> > > [...]
> > 
> > I think it can be deterministic (on the long timescale of memory 
> > management) 
> > anyway: Boost any non-RT task entering an RCU region to the lowest RT 
> > priority.
> > This way only all the RT tasks + one non-RT task can be within those
> > regions. The RT-tasks are supposed to have some kind of upper bound to
> > their CPU-usage. The non-RT task will also finish "soon" as it is
> > boosted. If the RCU batches are also at the lowest RT-priority they can be
> > run immediately after the non-RT task is done.
> 
> Hmmm...  Sort of a preemptive-first-strike priority boost.  Cute!  ;-)
> 
Well, I was actually thinking of an API like
 preempt_by_nonrt_disable()
 preempt_by_nonrt_enable()
working like the old preempt_disable()/preempt_enable() but still
allowing RT tasks (as well as priority inheriting non-RT tasks) to be
scheduled. That would kind of help "split" the kernel into two halfs: the
RT part and the non-RT part. The non-RT part would in many ways work as it
has always done.
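The behaviour of that proposed API pair can be sketched as a small user-space model. All names here are hypothetical (the functions do not exist in any kernel); only the Linux convention that priorities below MAX_RT_PRIO are real-time is borrowed. While the flag is held, only RT tasks may preempt - "normal" tasks are held off:

```c
#include <assert.h>

#define MAX_RT_PRIO 100             /* prio values below this are RT */

static int nonrt_preempt_disabled;  /* models a per-CPU nesting count */

/* The proposed API: like preempt_disable()/enable(), but only against
 * non-RT tasks; RT tasks (and PI-boosted tasks) still get scheduled. */
static void preempt_by_nonrt_disable(void) { nonrt_preempt_disabled++; }
static void preempt_by_nonrt_enable(void)  { nonrt_preempt_disabled--; }

/* May a waking task with priority 'prio' preempt the current task? */
static int may_preempt(int prio)
{
    int is_rt = prio < MAX_RT_PRIO;
    return is_rt || !nonrt_preempt_disabled;
}
```

In the model, disabling the mechanism changes nothing for an RT waker but blocks a normal one until the matching enable - the "split" of the kernel into an RT half and a non-RT half described above.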

> > > clearly the simplest solution is to go with the single-reader locks for
> > > now - a separate experiment could be done with a new type of rwlock that
> > > can only be used by the RCU code. (I'm not quite sure whether we could
> > > guarantee a minimum rate of RCU callback processing under such a scheme
> > > though. It's an eventual memory DoS otherwise.)
> > > 
> > 
> > Why is a lock needed at all? If it is doable without locking on a
> > non-preemptible SMP kernel, it must be doable on a preemptible kernel as
> > well. I am convinced some kind of per-CPU rcu_read_count, as I specified in
> > my previous mail, can work one way or the other. call_rcu() might need to
> > do more complicated stuff and thus use CPU, but call_rcu() is supposed to
> > be a relatively rare event not worth optimizing for.  Such an
> > implementation will work for any preemptible kernel, not only PREEMPT_RT. 
> > Where performance is concerned, it is important not to acquire any locks in
> > the RCU read-side regions. 
> 
> You definitely don't need a lock -- you can just suppress preemption
> on the read side instead.  But that potentially makes for long scheduling
> latencies.

Well, in my patch I do not leave preemption off - only while doing the
simple ++/--. In effect, I let rcu_qsctr_inc() know that some RCU reader
might be active, i.e. preempted, on the current CPU, such that this isn't
a quiescent point after all.
(To others: Paul nicely unfolded my attachment below - I left it in
the mail so you can read it.)
The problem with this approach is of course that user-space programs might
preempt an RCU reader for a very long time, such that RCU batches are never
really run. The boosting of non-RT tasks mentioned above would help a
lot.
A plus(?): you can actually sleep while holding rcu_read_lock!

> 
> The counter approach might work, and is also what the implementation #5
> does -- check out rcu_read_lock() in Ingo's most recent patch.
> 

Do you refer to your original mail with the implementation in 5 steps?
In #5 in that one (-V0.7.41-00, right?) you use a lock, and as you say that
forces synchronization between the CPUs - bad for scaling. It does make the
RCU batches somewhat deterministic, as the RCU task can boost the readers
to the RCU task's priority.
The problem with this approach is that everybody calling into RCU code
has a worst-case behaviour of the system-wide worst-case RCU reader
section - which can be pretty large (in principle infinite).
So if somebody calls a function containing an RCU read
region, the worst-case behaviour would be the same as the worst-case latency
in the simple world where preempt_disable()/preempt_enable() was used.

>   Thanx, Paul
> 
> > I tried this approach. My UP laptop boots with it, but I haven't tested
> > it further. I have included the very small patch as an attachment.
> > 
> > >   Ingo
> > 
> > I have not yet looked at -V0.7.41-00...
> > 
> > Esben
> > 
> 
> > diff -Naur --exclude-from diff_exclude 
> > linux-2.6.11-final-V0.7.40-00/include/linux/rcupdate.h 
> > linux-2.6.11-final-V0.7.40-00-RCU/include/linux/rcupdate.h
> > --- linux-2.6.11-final-V0.7.40-00/include/linux/rcupdate.h  2005-03-11 
> > 23:40:13.0 +0100
> > +++ linux-2.6.11-final-V0.7.40-00-RCU/include/linux/rcupdate.h  
> > 2005-03-19 12:47:09.0 +0100
> > 

Re: Real-Time Preemption and RCU

2005-03-22 Thread Esben Nielsen

On Mon, 21 Mar 2005, Paul E. McKenney wrote:

> On Mon, Mar 21, 2005 at 12:23:22AM +0100, Esben Nielsen wrote:
> > > [...] 
> > Well, I was actually thinking of an API like
> >  preempt_by_nonrt_disable()
> >  preempt_by_nonrt_enable()
> > working like the old preempt_disable()/preempt_enable() but still
> > allowing RT tasks (as well as priority inheriting non-RT tasks) to be
> > scheduled. That would kind of help "split" the kernel into two halfs: the
> > RT part and the non-RT part. The non-RT part would in many ways work as it
> > has always done.
> 
> Does sound in some ways similar to the migration approach -- there, the
> RT/non-RT split is made across CPUs.  But if RT is allowed to preempt,
> then you still have to deal with preemption for locking correctness, right?
> 
Yes. It is not a locking mechanism, it should just prevent scheduling of
"normal" userspace tasks.

> > [..] 
> > Well, in my patch I do not leave preemption off - only while doing the
> > simple ++/--. In effect, I let rcu_qsctr_inc() know that some RCU reader
> > might be active, i.e. preempted, on the current CPU, such that this isn't
> > a quiescent point after all.
> > (To others: Paul nicely unfolded my attachment below - I left it in
> > the mail so you can read it.)
> > The problem with this approach is of course that user-space programs might
> > preempt an RCU reader for a very long time, such that RCU batches are never
> > really run. The boosting of non-RT tasks mentioned above would help a
> > lot.
> > A plus(?): you can actually sleep while holding rcu_read_lock!
> 
> This is in some ways similar to the K42 approach to RCU (which they call
> "generations").  Dipankar put together a similar patch for Linux, but
> the problem was that grace periods could be deferred for an extremely
> long time.  Which I suspect is what you were calling out as causing
> RCU batches never to run.
> 

That is where preempt_by_nonrt_disable/enable() is supposed to help:
then it can't take longer than in the normal kernel in the situation where
there are no RT tasks running. RT tasks will prolong the grace periods if
they go into RCU regions, but those are supposed to be relatively small -
and deterministic!

> > > The counter approach might work, and is also what the implementation #5
> > > does -- check out rcu_read_lock() in Ingo's most recent patch.
> > > 
> > 
> > Do you refer to your original mail with the implementation in 5 steps?
> > In #5 in that one (-V0.7.41-00, right?) you use a lock, and as you say that
> > forces synchronization between the CPUs - bad for scaling. It does make the
> > RCU batches somewhat deterministic, as the RCU task can boost the readers
> > to the RCU task's priority.
> > The problem with this approach is that everybody calling into RCU code
> > has a worst-case behaviour of the system-wide worst-case RCU reader 
> > section - which can be pretty large (in principle infinite).
> > So if somebody calls a function containing an RCU read
> > region, the worst-case behaviour would be the same as the worst-case latency
> > in the simple world where preempt_disable()/preempt_enable() was used.
> 
> I missed something here -- readers would see the worst-case -writer-
> latency rather than the worst-case -reader- latency, right?  Or are you
> concerned about the case where some reader blocks the write-side
> acquisitions in _synchronize_kernel(), but not before the writer has
> grabbed a lock, blocking any readers on the corresponding CPU?
> 

I am concerned that readers block each other, too. You do need an rw-mutex
allowing an unlimited number of readers to do this. With the current
rw-mutex the readers block each other, i.e. the worst-case latency is the
worst-case reader latency - globally!
On the other hand, with an rw-lock being unlimited - and thus not keeping
track of its readers - the readers can't be boosted by the writer. Then you
are back to square 1: the grace period can take a very long time.

> Yes, but this is true of every other lock in the system as well, not?

Other locks are not globally used but only used for a specific subsystem.
On a real-time system you are supposed to know which subsystems you can
call into and still have a low enough latency, as each subsystem has its
own bound. But with a global RCU locking mechanism, all RCU-using code is
to be regarded as _one_ such subsystem.

> In a separate conversation a few weeks ago, Steve Hemminger suggested
> placing checks in long RCU read-side critical sections -- this could
> be used to keep the worst-case reader latency within the desired bounds.
>

Re: Real-Time Preemption and RCU

2005-03-22 Thread Esben Nielsen
On Tue, 22 Mar 2005, Ingo Molnar wrote:

> 
> * Esben Nielsen <[EMAIL PROTECTED]> wrote:
> 
> > On the other hand, with an rw-lock being unlimited - and thus not
> > keeping track of its readers - the readers can't be boosted by the writer.
> > Then you are back to square 1: the grace period can take a very long
> > time.
> 
> btw., is the 'very long grace period' a real issue? We could avoid all
> the RCU read-side locking latencies by making it truly unlocked and just
> living with the long grace periods. Perhaps it's enough to add an
> emergency mechanism to the OOM handler, which frees up all the 'blocked
> by preemption' RCU callbacks via some scheduler magic. (e.g. such an
> emergency mechanism could be _conditional_ locking on the read side -
> i.e. new RCU read-side users would be blocked until the OOM situation
> goes away, or something like that.)

You won't catch RCU read sides already entered - see below.

> 
> your patch is implementing just that, correct? Would you mind redoing it
> against a recent -RT base? (-40-04 or so)
>

In fact I am working on a clean 2.6.12-rc1 right now. I figured this is
orthogonal to the rest of the RT patch and can probably go upstream immediately.
It seemed to work. I'll try to make it into a clean patch soonish and also try
it on -40-04. 
I am trying to make a boosting mechanism in the scheduler such that RCU
readers are boosted to MAX_RT_PRIO when preempted. I have to take that out
first.

Any specific tests I should run? I am considering making an RCU test
device.

> also, what would be the worst-case workload causing long grace periods?

A nice-19 task, A, enters an RCU region and is preempted. A lot of other
tasks start running. Then task A might be starved for _minutes_, such that
there are no RCU grace periods in all that time.

> 
>   Ingo

Esben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Real-Time Preemption and RCU

2005-03-22 Thread Esben Nielsen
On Tue, 22 Mar 2005, Bill Huey wrote:

> On Fri, Mar 18, 2005 at 05:55:44PM +0100, Esben Nielsen wrote:
> > On Fri, 18 Mar 2005, Ingo Molnar wrote:
> > > i really have no intention to allow multiple readers for rt-mutexes. We
> > > got away with that so far, and i'd like to keep it so. Imagine 100
> > > threads all blocked in the same critical section (holding the read-lock)
> > > when a highprio writer thread comes around: instant 100x latency to let
> > > all of them roll forward. The only sane solution is to not allow
> > > excessive concurrency. (That limits SMP scalability, but there's no
> > > other choice i can see.)
> > 
> > Unless a design change is made: One could argue for a semantics where
> > write-locking _isn't_ deterministic and thus do not have to boost all the
> 
> RCU isn't write deterministic like typical RT apps are we can... (below :-))

It is: it takes place right away. But it is not deterministic when
_all_ readers actually see it. The cleanup is non-deterministic, too.
So unless you actually _wait_ for the cleanup to happen instead of
deferring it, you can safely do RCU writes in an RT task.

> 
> > readers. Readers boost the writers but not the other way around. Readers
> > will be deterministic, but not writers.
> > Such a semantics would probably work for a lot of RT applications
> > happening not to take any write-locks - these will in fact perform better. 
> > But it will give the rest a lot of problems.
> 
> Just came up with an idea after I thought about how much of a bitch it
> would be to get a fast RCU multiple reader semantic (our current shared-
> exclusive lock inserts owners into a sorted priority list per-thread which
> makes it very expensive for a simple RCU case since they are typically very
> small batches of items being altered). Basically the RCU algorithm has *no*
> notion of writer priority and to propagate a PI operation down all reader
> is meaningless, so why not revert back to the original rwlock-semaphore to
> get the multiple reader semantics ?

Remember to boost the writer so that RT tasks can enter read regions. I must
also warn against the dangers: a lot of code where a write-lock is taken
needs to be marked as non-deterministic, i.e. must not be called from
RT tasks (maybe put a WARN_ON(rt_task(current)) in the write-lock
operation?)
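The check suggested in the parenthesis can be modelled in a few lines. The sketch below is a user-space model, not actual kernel code: `struct task`, `write_lock_checked()` and the warning counter are hypothetical stand-ins, with only the `rt_task()`/`MAX_RT_PRIO` convention borrowed from Linux (lower `prio` means higher priority, RT priorities lie below `MAX_RT_PRIO`).

```c
#include <assert.h>
#include <stdio.h>

#define MAX_RT_PRIO 100                 /* RT/non-RT boundary, as in Linux */

struct task { int prio; };

/* rt_task(): is this task scheduled with a real-time priority? */
static int rt_task(const struct task *t) { return t->prio < MAX_RT_PRIO; }

static int rt_write_lock_warnings;      /* counts triggered warnings */

/* Model of a write-lock entry point that flags RT callers, since the
 * write side is non-deterministic and should not be used from RT tasks. */
static void write_lock_checked(const struct task *current)
{
    if (rt_task(current)) {
        /* stands in for WARN_ON(1) */
        fprintf(stderr, "RT task (prio %d) took a write-lock\n", current->prio);
        rt_write_lock_warnings++;
    }
    /* ... acquire the actual rwlock here ... */
}
```

Running this against a normal task is silent; an RT caller trips the warning, which is exactly the signal the mail proposes for auditing subsystems.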

> 
> A notion of priority across a quiescience operation is crazy anyways, so
> it would be safe just to use the old rwlock-semaphore "in place" without
> any changes or priority handling additions. The RCU algorithm is only concerned
> with what is basically a coarse data guard and it isn't time or priority
> critical.

I don't find it crazy. I think it is elegant - but also dangerous, as it
might take a long time.

> 
> What do you folks think ? That would make Paul's stuff respect multiple
> readers which reduces contention and gets around the problem of possibly
> overloading the current rt lock implementation that we've been bitching
> about. The current RCU development track seem wrong in the first place and
> this seem like it could be a better and more complete solution to the problem.
> 
> If this works, well, you heard it here first. :)
> 
> bill
> 
Esben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Real-Time Preemption and RCU

2005-03-22 Thread Esben Nielsen
On Tue, 22 Mar 2005, Ingo Molnar wrote:

> 
> * Esben Nielsen <[EMAIL PROTECTED]> wrote:
> 
> > +static inline void rcu_read_lock(void)
> > +{  
> > +   preempt_disable(); 
> > +   __get_cpu_var(rcu_data).active_readers++;
> > +   preempt_enable();
> > +}
> 
> this is buggy. Nothing guarantees that we'll do the rcu_read_unlock() on
> the same CPU, and hence ->active_readers can get out of sync.
> 

Ok, this has to be handled in the migration code somehow. I have already
added a
  current->rcu_read_depth++
so it ought to be painless. A simple solution would be not to migrate
threads with rcu_read_depth != 0.

>   Ingo
> 

Esben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Real-Time Preemption and RCU

2005-03-22 Thread Esben Nielsen
On Tue, 22 Mar 2005, Ingo Molnar wrote:

> 
> * Esben Nielsen <[EMAIL PROTECTED]> wrote:
> 
> > > > +static inline void rcu_read_lock(void)
> > > > +{  
> > > > +   preempt_disable(); 
> > > > +   __get_cpu_var(rcu_data).active_readers++;
> > > > +   preempt_enable();
> > > > +}
> > > 
> > > this is buggy. Nothing guarantees that we'll do the rcu_read_unlock() on
> > > the same CPU, and hence ->active_readers can get out of sync.
> > > 
> > 
> > Ok, this has to be handled in the migration code somehow. I have already 
> > added a 
> >   current->rcu_read_depth++
> > so it ought to be painless. A simple solution would be not to
> > migrate threads with rcu_read_depth != 0.
> 
> could you elaborate?
> 
Put an rcu_read_depth on each task. In rcu_read_lock() do a
 current->rcu_read_depth++;
and vice versa in rcu_read_unlock(). In can_migrate_task() add
   if (p->rcu_read_depth)
 return 0;
That might do the trick.
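Put together, the proposal is small enough to model in user space. This is a sketch under the assumptions of the mail - the `struct task`, the lock/unlock helpers and `can_migrate_task()` are illustrative stand-ins, not the real kernel definitions:

```c
#include <assert.h>

/* Per-task nesting counter for RCU read-side critical sections. */
struct task { int rcu_read_depth; };

/* rcu_read_lock()/unlock() as proposed: just bump/drop the per-task
 * counter; nesting is handled naturally by the depth count. */
static void rcu_read_lock(struct task *current)   { current->rcu_read_depth++; }
static void rcu_read_unlock(struct task *current) { current->rcu_read_depth--; }

/* The check suggested for can_migrate_task(): a task inside an RCU
 * read-side section stays pinned to its CPU, so the per-CPU reader
 * count it contributed to stays consistent. */
static int can_migrate_task(const struct task *p)
{
    if (p->rcu_read_depth)
        return 0;
    return 1;
}
```

The key property is that a task remains non-migratable for the whole (possibly nested) read-side section and becomes migratable again only when the outermost rcu_read_unlock() runs.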
 

> In any case, see the new PREEMPT_RCU code in the -40-07 patch (and
> upwards). I've also attached a separate patch, it should apply cleanly
> to 2.6.12-rc1.
> 

I barely have time to download the patches - let alone apply them!

Anyway, I find one thing I don't like: using atomic_inc()/dec() in
rcu_read_lock()/unlock() to touch rcu_data which might be on another
CPU. Then rcu_data is not really per-CPU data anymore, and it also hurts
the performance of RCU readers.
I think it will be cheaper to use the above rcu_read_depth and then either
not migrate such tasks at all, or make the migration code take care of moving
the rcu_read_depth count to the new CPU - one would have to take care to
increment it in the rcu_data of the new CPU on the new CPU (it isn't
atomic) and then decrement it in the rcu_data of the old CPU on the old
CPU - in that order.
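The ordering argument can be made concrete with plain integers standing in for the per-CPU counters. This is a user-space model under the mail's assumptions (hypothetical names, two CPUs): incrementing on the destination CPU before decrementing on the source guarantees there is no instant where both counters read zero while a reader is still active, so neither CPU can falsely report a quiescent state.

```c
#include <assert.h>

static int active_readers[2];     /* models rcu_data.active_readers per CPU */

/* A CPU looks quiescent to rcu_qsctr_inc() when its count is zero. */
static int cpu_quiescent(int cpu) { return active_readers[cpu] == 0; }

/* Move one active reader's contribution from CPU 'from' to CPU 'to',
 * in the order argued above. */
static void migrate_reader(int from, int to)
{
    active_readers[to]++;         /* step 1: done on the new CPU */
    /* Between the steps, at least one CPU still shows the reader. */
    assert(!cpu_quiescent(from) || !cpu_quiescent(to));
    active_readers[from]--;       /* step 2: done on the old CPU */
}
```

Doing the two steps in the opposite order would open a window where both counters are zero - exactly the spurious quiescent state the ordering is meant to rule out.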


>   Ingo
> 


Esben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Real-Time Preemption and RCU

2005-03-23 Thread Esben Nielsen
On Tue, 22 Mar 2005, Paul E. McKenney wrote:

> On Tue, Mar 22, 2005 at 09:55:26AM +0100, Esben Nielsen wrote:
> > On Mon, 21 Mar 2005, Paul E. McKenney wrote:
> [ . . . ]
> > > On Mon, Mar 21, 2005 at 12:23:22AM +0100, Esben Nielsen wrote:
> > > This is in some ways similar to the K42 approach to RCU (which they call
> > > "generations").  Dipankar put together a similar patch for Linux, but
> > > the problem was that grace periods could be deferred for an extremely
> > > long time.  Which I suspect is what you were calling out as causing
> > > RCU batches never to run.
> > 
> > That is where preempt_by_nonrt_disable/enable() is supposed to help:
> > then it can't take longer than in the normal kernel in the situation where
> > there are no RT tasks running. RT tasks will prolong the grace periods if
> > they go into RCU regions, but those are supposed to be relatively small -
> > and deterministic!
> 
> The part that I am missing is how this helps in the case where a non-RT
> task gets preempted in the middle of an RCU read-side critical section
> indefinitely.  Or are you boosting the priority of any task that
> enters an RCU read-side critical section?

Yes, in effect: I set the priority to MAX_RT_PRIO. But actually I am
playing around (when I get time for it, that is :-( ) with a cheaper
solution: 
I assume these regions, where you don't want to be
preempted by non-RT tasks, are relatively short. Therefore the risk of
getting preempted is small. Changing the priority is expensive, since you
need to lock the runqueue. I only want to do the change when
there is a preemption. Therefore I added code in schedule() to take care
of it: if a task is in an RCU read section, is non-RT and is preempted, its
priority is set to MAX_RT_PRIO for the time being. It will keep that
priority until the priority is recalculated, but that shouldn't hurt
anyone. 
I am not happy about adding code to schedule(), but setting the
priority in there is very cheap because it already holds the lock
on the runqueue. Furthermore, I assume it only happens very rarely. In the
execution of schedule() my code only adds a single test on whether the
previous task was in an RCU section or not. That is not very much code.
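The lazy-boost test described above can be sketched as follows. This is a user-space model, not the actual scheduler change: the task structure and `maybe_boost_preempted_reader()` are hypothetical names, with Linux's convention that lower `prio` means higher priority and values below `MAX_RT_PRIO` are real-time.

```c
#include <assert.h>

#define MAX_RT_PRIO 100

struct task {
    int prio;             /* lower value = higher priority */
    int rcu_read_depth;   /* >0 while inside an RCU read-side section */
};

static int rt_task(const struct task *t) { return t->prio < MAX_RT_PRIO; }

/* Called from schedule() for the task being switched out, with the
 * runqueue lock already held - which is why the boost is cheap here.
 * Only a preempted non-RT task inside an RCU read section is touched. */
static void maybe_boost_preempted_reader(struct task *prev)
{
    if (!rt_task(prev) && prev->rcu_read_depth > 0)
        prev->prio = MAX_RT_PRIO;  /* kept until prio is recalculated */
}
```

The common path pays for exactly one test; RT tasks and tasks outside read-side sections are left untouched.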

I have not yet tested it (no time :-( )


> [...]
> > > Yes, but this is true of every other lock in the system as well, not?
> > 
> > Other locks are not globaly used but only used for a specific subsystem.
> > On a real-time system you are supposed to know which subsystems you can
> > call into and still have a low enough latency as each subsystem has it's
> > own bound. But with a global RCU locking mechanism all RCU using code is
> > to be regarded as _one_ such subsystem.
> 
> Yep.  As would the things protected by the dcache lock, task list lock,
> and so on, right?

Yep

> 
>   Thanx, Paul
> 
Esben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Real-Time Preemption and UML?

2005-02-07 Thread Esben Nielsen
Hi, I am trying to compile and run UML with PREEMPT_REALTIME. I
managed to get it to compile, but it won't start - it simply stops somewhere
in start_kernel() :-(

Has anyone else looked at it?

It doesn't sound like it makes much sense to have PREEMPT_REALTIME for UML,
but I thought it was a good development platform for playing around
before going to the real hardware, where the latency measurements
of course have to take place. The turnaround time should be much shorter
than rebooting a full PC every time, and the possibility of getting debug
output early on should also be much better.

Esben

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Real-Time Preemption and UML?

2005-02-07 Thread Esben Nielsen
Well, I'll keep trying a little bit more. In the meanwhile you can get some
of the stuff I needed to change to at least get it to compile:

One of the problems was the direct use of architecture-specific semaphores
(which doesn't work under PREEMPT_REALTIME) in places where a quick
(maybe too quick) look at the code told me that completions ought to be
used. Therefore I changed two semaphores to completions, which compiled
fine. I have tried the change on 2.6.11-rc2, and it seemed to work, but I
have not tested it heavily.

The patch is in an attachment - I hope the mailing list will allow that. It is
simply too troublesome otherwise when I am using Pine as mail client.

Esben


On Mon, 7 Feb 2005, Jeff Dike wrote:

> [EMAIL PROTECTED] said:
> > Hi, I am trying to compile and run UM-Linux with PREEMPT_REALTIME. I
> > managed to get it to compile but it wont start - it simply stops
> > somewhere in start_kernel() :-( 
> 
> I've never played with preemption on UML.  No doubt it needs some work...
> 
>   Jeff
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [EMAIL PROTECTED]
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
--- linux-2.6.11-rc2-um/arch/um/drivers/port_kern.c.orig2005-01-23 
15:53:29.0 +0100
+++ linux-2.6.11-rc2-um/arch/um/drivers/port_kern.c 2005-02-06 
19:54:52.0 +0100
@@ -23,7 +23,7 @@
 struct port_list {
struct list_head list;
int has_connection;
-   struct semaphore sem;
+   struct completion done;
int port;
int fd;
spinlock_t lock;
@@ -66,7 +66,7 @@
conn->fd = fd;
list_add(&conn->list, &conn->port->connections);
 
-   up(&conn->port->sem);
+   complete(&conn->port->done);
return(IRQ_HANDLED);
 }
 
@@ -183,13 +183,14 @@
*port = ((struct port_list) 
{ .list = LIST_HEAD_INIT(port->list),
  .has_connection   = 0,
- .sem  = __SEMAPHORE_INITIALIZER(port->sem, 
- 0),
  .lock = SPIN_LOCK_UNLOCKED,
  .port = port_num,
  .fd   = fd,
  .pending  = LIST_HEAD_INIT(port->pending),
  .connections  = LIST_HEAD_INIT(port->connections) });
+
+   init_completion(&port->done), 
+
list_add(&port->list, &ports);
 
  found:
@@ -221,7 +222,7 @@
int fd;
 
while(1){
-   if(down_interruptible(&port->sem))
+   if(wait_for_completion_interruptible(&port->done))
return(-ERESTARTSYS);
 
spin_lock(&port->lock);
--- linux-2.6.11-rc2-um/arch/um/drivers/xterm_kern.c.orig   2005-01-23 
15:53:29.0 +0100
+++ linux-2.6.11-rc2-um/arch/um/drivers/xterm_kern.c2005-02-06 
19:54:58.0 +0100
@@ -16,7 +16,7 @@
 #include "xterm.h"
 
 struct xterm_wait {
-   struct semaphore sem;
+   struct completion ready;
int fd;
int pid;
int new_fd;
@@ -32,7 +32,7 @@
return(IRQ_NONE);
 
xterm->new_fd = fd;
-   up(&xterm->sem);
+   complete(&xterm->ready);
return(IRQ_HANDLED);
 }
 
@@ -49,10 +49,10 @@
 
/* This is a locked semaphore... */
*data = ((struct xterm_wait) 
-   { .sem  = __SEMAPHORE_INITIALIZER(data->sem, 0),
- .fd   = socket,
+   { .fd   = socket,
  .pid  = -1,
  .new_fd   = -1 });
+   init_completion(&data->ready);
 
err = um_request_irq(XTERM_IRQ, socket, IRQ_READ, xterm_interrupt, 
 SA_INTERRUPT | SA_SHIRQ | SA_SAMPLE_RANDOM, 
@@ -68,7 +68,7 @@
 *
 * XXX Note, if the xterm doesn't work for some reason (eg. DISPLAY
 * isn't set) this will hang... */
-   down(&data->sem);
+   wait_for_completion(&data->ready);
 
free_irq_by_irq_and_dev(XTERM_IRQ, data);
free_irq(XTERM_IRQ, data);


Re: Real-Time Preemption and UML?

2005-02-08 Thread Esben Nielsen
On Tue, 8 Feb 2005, Jeff Dike wrote:

> [EMAIL PROTECTED] said:
> > Jeff, any objections against adding this change to UML at some point?
> 
> No, not at all.  I just need to understand what CONFIG_PREEMPT requires of
> UML.

Ingo can probably tell you in much more detail. My problem when I tried to
compile with CONFIG_PREEMPT_RT (not CONFIG_PREEMPT!) was that
__SEMAPHORE_INITIALIZER didn't exist, since the architecture-specific
semaphore.h is not included in that configuration. The reason, again, is
that locking (not completions) is changed a lot under CONFIG_PREEMPT_RT to
introduce mutexes instead of raw spinlocks, and priority inheritance to
make these lockings behave deterministically.

> 
> >From a quick read of Documentation/preempt-locking.txt, this looks like it's
> implementing Rule #3 (unlock by the same task that locked), which looks fine.
>

Now I don't really know whom I am responding to. But both up()s now changed
to complete()s are in something looking very much like an interrupt
handler. But again, as I said, I didn't analyze the code in detail; I just
made it compile and checked that it worked in bare 2.6.11-rc2 UML - which
I am not too sure how to set up and use to begin with!
 
>   Jeff
> 

Esben


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Real-Time Preemption and UML?

2005-02-08 Thread Esben Nielsen

On Tue, 8 Feb 2005, Ingo Molnar wrote:

> 
> * Esben Nielsen <[EMAIL PROTECTED]> wrote:
> 
> > Now I don't really know who I am responding to. But both up()s now
> > changed to complete()s are in something looking very much like an
> > interrupt handler. But again, as I said, I didn't analyze the code in
> > detail, I just made it compile and checked that it worked in bare
> > 2.6.11-rc2 UML - which I am not too sure how to set up and use to
> > begin with!
> 
> btw., UML is really easy to begin with: after you've compiled you get a
> 'linux' binary in the toplevel directory - just execute it via './linux'
> and you'll see a Linux kernel booting - that's all you need!
> 
> Add a filesystem image via a root= parameter to that command and the UML
> kernel will start booting that filesystem image. (if you are adventurous
> you can even boot a real partition, but for the first user this is
> strongly discouraged.) There are a number of UML-ready filesystem images
> downloadable from the net.
> 
Thanks, I managed to get that far after googling a bit. I have had some 
problems with the filesystem, though. Fixed now (I forgot to compile ext3
in *blush*). But you might still be interested in this trace (2.6.11-rc2
with or without my changes):

line_ioctl: tty0: ioctl KDSIGACCEPT called
Debug: sleeping function called from invalid context at
include/asm/arch/semaphore.h:107
in_atomic():0, irqs_disabled():1
Call Trace: 
a08639e0:  [] __might_sleep+0x9b/0xb8
a0863a10:  [] uml_console_write+0x20/0x54
a0863a30:  [] __call_console_drivers+0x50/0x58
a0863a60:  [] call_console_drivers+0x7d/0x124
a0863a90:  [] release_console_sem+0xa3/0x25c
a0863aa0:  [] release_console_sem+0xbc/0x25c
a0863ac0:  [] vprintk+0x193/0x2d0
a0863ae0:  [] printk+0x12/0x14
a0863b00:  [] line_ioctl+0x8e/0x94
a0863b24:  [] line_ioctl+0x0/0x94
a0863b30:  [] tty_ioctl+0xfd/0x680
a0863b80:  [] do_ioctl+0x3f/0x64
a0863bb0:  [] sys_ioctl+0x13d/0x350
a0863bd0:  [] sys_open+0x5b/0x74
a0863be0:  [] sys_open+0x4c/0x74
a0863c00:  [] execute_syscall_tt+0xa1/0xe0
a0863c1c:  [] sigemptyset+0x17/0x30
a0863c70:  [] record_syscall_start+0x4e/0x58
a0863c90:  [] syscall_handler_tt+0x3f/0x74
a0863cc0:  [] sig_handler_common_tt+0x90/0x108
a0863cd0:  [] sig_handler_common_tt+0xf1/0x108
a0863d00:  [] sig_handler+0x1f/0x38
a0863d20:  [] __restore+0x0/0x8

It looks like a semaphore which should be replaced by a spinlock
(which will become a mutex in preempt-realtime :-)


Esben

>   Ingo

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Real-time rw-locks (Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-15)

2005-01-30 Thread Esben Nielsen
On Fri, 28 Jan 2005, Ingo Molnar wrote:

> 
> * Esben Nielsen <[EMAIL PROTECTED]> wrote:
> 
> > I noticed that you changed rw-locks to behave quite diferently under
> > real-time preemption: They basicly works like normal locks now. I.e.
> > there can only be one reader task within each region. This can can
> > however lock the region recursively. [...]
> 
> correct.
> 
> > [...] I wanted to start looking at fixing that because it ought to
> > hurt scalability quite a bit - and even on UP create a few unneeded
> > task-switchs. [...]
> 
> no, it's not a big scalability problem. rwlocks are really a mistake -
> if you want scalability and spinlocks/semaphores are not enough then one
> should either use per-CPU locks or lockless structures. rwlocks/rwsems
> will very unlikely help much.
>
I agree that RCU ought to do the trick in a lot of cases. Unfortunately,
people haven't used RCU in a lot of code, but an rwlock. I also like the
idea of rwlocks: many readers or just one writer. I don't see the need to
take that away from people.  Here is an example which even on UP will
give problems without it:
You have a shared data structure, rarely updated, with many readers. A low-
priority task is reading it. It is preempted by a high-priority task which
finds out it can't read it -> priority inheritance, task switch. The low-
priority task finishes the job -> priority set back, task switch. If it
was done with an rwlock, two task switches would have been saved.

 
> > However, the more I think about it the bigger the problem:
> 
> yes, that complexity to get it perform in a deterministic manner is why
> i introduced this (major!) simplification of locking. It turns out that
> most of the time the actual use of rwlocks matches this simplified
> 'owner-recursive exclusive lock' semantics, so we are lucky.
> 
> look at what kind of worst-case scenarios there may already be with
> multiple spinlocks (blocker.c). With rwlocks that just gets insane.
> 
Yes it does. But one could make a compromise: The up_write() should _not_
be deterministic. In that case it would be very simple to implement.
up_read() could still be deterministic, as it would only involve boosting
one writer in the rare case such a writer exists. That kind of locking
would be very useful in many real-time systems. Of course, RCU can do the
job as well, but it puts a lot of constraints on the code.

However, as Linux is a general OS there is no way to know whether a
specific lock needs to be deterministic wrt. writing or not, as the actual
application is not known at the time the lock type is specified.

>   Ingo
> 

Esben



Re: Real-Time Preemption and GFP_ATOMIC

2005-02-02 Thread Esben Nielsen
On 2 Feb 2005, Kevin Hilman wrote:

> While testing an older driver on an -RT kernel (currently using
> -V0.7.37-03), I noticed something strange.
> 
> The driver was triggering a "sleeping function called from invalid
> context" BUG().  It was coming from a case where the driver was doing
> a __get_free_page(GFP_ATOMIC) while interrupts were disabled (example
> trace below).  I know this is probably real bug and it shouldn't be
> allocating memory with interrupts disabled, but shouldn't this be
> possible?  Isn't the role of GFP_ATOMIC to say that "this caller
> cannot sleep"?
> 
The problem is that almost all locks are replaced by mutexes which
can make you sleep. That includes the locks around the various allocation
structures.

This is one of those places where I think Ingo has gone too far, but I
see that the code in mm/ is not fitted for doing anything else
than what Ingo has done right now. It would require some rewriting to fix
it.

The basic allocations should be of the free-list form

	res = first_free;
	if (res)
		first_free = res->next;
	return res;

I see no problem in protecting this kind of operation with a raw spinlock.
Using a mutex to protect such a list would be a waste: You would have to
lock and unlock the mutex's spinlock twice! If it was made that way, i.e.
the very basic free-list operation was taken out in front of the more
complicated stuff in mm/slab.c, GFP_ATOMIC could be made to work as usual.

The hard job is that the refill operation has to be done under a mutex
under PREEMPT_RT, i.e. suddenly there are two locks to take care of.

Esben




Re: [patch] Real-Time Preemption, deactivate() scheduling issue

2005-03-03 Thread Esben Nielsen
As I read the code the driver task (A) should _not_ be removed from the
runqueue. It has to be woken up to call schedule_timeout() so that it gets
back on the runqueue after 10 ms. If it is taken out of the runqueue at
line 76 it will stay off the runqueue forever in the TASK_UNINTERRUPTIBLE
state!

As I read the use of PREEMPT_ACTIVE, it is there to test whether this
rescheduling is voluntary or forced (a preemption). If it is forced, the
task shall of course not go off the runqueue but stay there to run again
when it gets the highest priority. That is why PREEMPT_ACTIVE is set in
preempt_schedule() and preempt_schedule_irq(). On the other hand, if the
task itself has called schedule() or schedule_timeout() it has to go off
the runqueue and wait for some event to wake it up.

Yes, there will be tasks in states other than TASK_RUNNING on the runqueue.
The "bug" as I see it is in the scheduler interface: There is no way to
set the task state and call schedule() or schedule_timeout() atomically.
Therefore you can be preempted while the state is not TASK_RUNNING.

Esben


On Thu, 3 Mar 2005, Eugeny S. Mints wrote:

> please consider the following scenario for full RT kernel.
> 
> Task A is running when an irq occurs, which in turn wakes up the 
> irq-related thread (B) of a higher priority than A.
> 
> my current understanding is that the actual context switch between A and B 
> will occur at preempt_schedule_irq() on the "return from irq" path.
> 
> in this case the following "if" statement in __schedule() always returns 
> false, since preempt_schedule_irq() always sets PREEMPT_ACTIVE 
> before the __schedule() call.
> 
>  if ((prev->state & ~TASK_RUNNING_MUTEX) &&
>  !(preempt_count() & PREEMPT_ACTIVE)) {
> 
> as a result, deactivate() is never called for the preempted task A in this 
> scenario. But if task A is preempted while not in TASK_RUNNING state, 
> such behaviour seems incorrect, since we get a task in a non-TASK_RUNNING 
> state linked into a run queue.
> 
> An example:
> 
> drivers/net/irda/sir_dev.c: 76 (2.6.10 kernel)
> 
>  spin_lock_irqsave(&dev->tx_lock, flags); /* serialize the other 
> tx operations */
>  while (dev->tx_buff.len > 0) {/* wait until tx idle */
>  spin_unlock_irqrestore(&dev->tx_lock, flags);
> 76: set_current_state(TASK_UNINTERRUPTIBLE);
>  schedule_timeout(msecs_to_jiffies(10));
>  spin_lock_irqsave(&dev->tx_lock, flags);
>  }
> 
> At line 76, irqs are enabled and preemption is enabled.
> Let's assume task A executes this code and gets preempted right after 
> line 76. The task state is TASK_UNINTERRUPTIBLE but it will not be 
> deactivated. Of course this is a bug in the set_current_state() 
> usage in this particular driver, but the scheduler code should be 
> robust against such bugs, I believe. There are a lot of such bugs in the 
> kernel, I believe.
> 
> Not sure what the actual reason for the !(preempt_count() & PREEMPT_ACTIVE) 
> condition is, but if it's just a sort of optimization (not removing a 
> task from the run queue if it was preempted in TASK_RUNNING state) then 
> probably it should be removed in order to preserve correctness. Patch attached.
> 
>   Eugeny
> 
> 



Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07

2005-03-24 Thread Esben Nielsen
On Thu, 24 Mar 2005, Ingo Molnar wrote:

> 
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
> 
> > Here we have more unnecessary schedules.  So the condition to grab a 
> > lock should be:
> > 
> > 1. not owned.
> > 2. partially owned, and the owner is not RT.
> > 3. partially owned but the owner is RT and so is the grabber, and the
> > grabber's priority is >= the owner's priority.
> 
> there's another approach that could solve this problem: let the 
> scheduler sort it all out. Esben Nielsen had this suggestion a couple of 
> months ago - i didnt follow it because i thought that technique would 
> create too many runnable tasks, but maybe that was a mistake. If we do 
> the owning of the lock once the wakee hits the CPU we avoid the 'partial 
> owner' problem, and we have the scheduler sort out priorities and 
> policies.
> 
> but i think i like the 'partial owner' (or rather 'owner pending') 
> technique a bit better, because it controls concurrency explicitly, and 
> it would thus e.g. allow another trick: when a new owner 'steals' a lock 
> from another in-flight task, then we could 'unwakeup' that in-flight 
> thread which could thus avoid two more context-switches on e.g. SMP 
> systems: hitting the CPU and immediately blocking on the lock. (But this 
> is a second-phase optimization which needs some core scheduler magic as 
> well, i guess i'll be the one to code it up.)
> 

I checked the implementation of a mutex I sent in last fall. The unlock
operation does give ownership explicitly to the highest priority waiter,
as Ingo's implementation does.

Originally I planned for just having unlock wake up the highest
priority waiter and set lock->owner = NULL. The lock operation would be
something like

	while (lock->owner != NULL)
		schedule();
	/* grab the lock */

Then the first task, i.e. the one with the highest priority on UP, will get
it first. On SMP a low priority task on another CPU might get in and take it.

I like the idea of having the scheduler take care of it - it is a very
well optimized queue system after all. That will work on UP but not on SMP.
Having the unlock operation set the mutex in a "partially owned" state
will work better. The only problem I see, relative to Ingo's
implementation, is that the awoken task then has to go in and
change the state of the mutex, i.e. it has to lock the wait_lock again.
Will the extra scheduling that causes the problem happen often enough in
practice to justify the extra overhead?


>   Ingo

Esben



Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07

2005-03-30 Thread Esben Nielsen
On Wed, 30 Mar 2005, Steven Rostedt wrote:

> [...] 
> 
> Heck, I'll make it bloat city till I get it working, and then tone it
> down a little :-)  And maybe later we can have a better solution for the
> BKL.
> 
What about removing it altogether?
Seriously, how much work would it be to simply remove it, and then go in
and add specific locks in all those places where the code can't compile?

Esben

> -- Steve
> 
> 



Re: [patch] Real-Time Preemption, -RT-2.6.12-rc1-V0.7.41-07

2005-03-31 Thread Esben Nielsen
On Thu, 31 Mar 2005, Ingo Molnar wrote:

> 
> * Steven Rostedt <[EMAIL PROTECTED]> wrote:
> 
> > Well, here it finally is. There's still things I don't like about it. 
> > But it seems to work, and that's the important part.
> > 
> > I had to reluctantly add two items to the task_struct.  I was hoping 
> > to avoid that. But because of race conditions it seemed to be the only 
> > way.
> 
> well it's not a big problem, and we avoided having to add flags to the 
> rt_lock structure, which is the important issue.
> 
I was going to say the opposite. I know that there are many more rt_locks
around, and the fields thus will take more memory when put there, but
I believe it is more logical to have the fields there.

Esben



Re: [PATCH] RT: Add priority-queuing and priority-inheritance to workqueue infrastructure

2007-08-01 Thread Esben Nielsen



On Wed, 1 Aug 2007, Daniel Walker wrote:


On Wed, 2007-08-01 at 07:59 -0400, Gregory Haskins wrote:

On Tue, 2007-07-31 at 20:52 -0700, Daniel Walker wrote:



Here's a simpler version .. uses the plist data structure instead of the
100 queues, which makes for a cleaner patch ..


Hi Daniel,

I like your idea on the plist simplification a lot.  I will definitely
roll that into my series.

I am not too psyched about using the rt_mutex_setprio() API directly,
however.  It seems broken to be calling that directly from non rt_mutex
code, IMHO.  Perhaps the PI subsystem should be broken out from the
rt_mutex code so it can be used generally?  There are other areas where
PI could potentially be used besides rt_mutex (this patch as a prime
example), so this might make sense.


rt_mutex_setprio() is just a function. It was also designed specifically
for PI, so it seems fairly sane to use it in other PI-type
situations ..

Daniel



There seems to be a general need for boosting threads temporarily in a few 
places. HR-timers also have it, last time I checked. And preemptive RCU as 
well, for boosting RCU readers. They all seem to deal with the same issues 
of correctly setting the priority and of PI boosting from 
mutexes.


When boosting of RCU readers was discussed I came to the conclusion that 
the boosting property should be taken out of the rt_mutex module 
and instead be made into a sub-property of the scheduler:

task->pi_waiters should be replaced with task->prio_boosters, a 
pi_list of struct prio_booster entries representing anything which temporarily 
wants to boost a task.
An rt_mutex_waiter should of course contain a prio_booster which is added 
to owner->prio_boosters. A work queue element should contain a prio_booster.
When boosting an RCU reader, a prio_booster is added to the reader's 
prio_boosters.


Such a system will correctly deal with boosters going away in arbitrary 
order, something which is not straightforward when each user of boosting is 
trying to do it on their own.
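A minimal sketch of what that could look like (all structures and names are
invented for illustration; the kernel would use a sorted plist plus locking,
both omitted here). The key property is that the effective priority is
always recomputed from whatever boosters remain, so boosters can be removed
in any order. Lower number means higher priority, as in the kernel:

```c
#include <assert.h>
#include <stddef.h>

struct prio_booster {
	int prio;			/* priority this booster grants */
	struct prio_booster *next;
};

struct task {
	int normal_prio;		/* priority without any boosting */
	struct prio_booster *boosters;	/* the task->prio_boosters list */
};

/* Effective priority: the best of the task's own priority and all
 * currently attached boosters. */
static int effective_prio(const struct task *t)
{
	int prio = t->normal_prio;
	const struct prio_booster *b;

	for (b = t->boosters; b; b = b->next)
		if (b->prio < prio)
			prio = b->prio;
	return prio;
}

static void boost(struct task *t, struct prio_booster *b)
{
	b->next = t->boosters;
	t->boosters = b;
}

static void unboost(struct task *t, struct prio_booster *b)
{
	struct prio_booster **pp;

	for (pp = &t->boosters; *pp; pp = &(*pp)->next)
		if (*pp == b) {
			*pp = b->next;
			return;
		}
}
```

An rt_mutex waiter, a workqueue item, or an RCU reader boost would each own
one prio_booster and attach/detach it independently of the others.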


Esben



[PATCH] Broken ArcNet com20020 pcmcia driver in 2.6.20

2007-02-07 Thread Esben Nielsen

Hi,
 I can not get my com20020 pcmcia driver to work as a module under 2.6.20.
There is the build problem:

MODPOST 30 modules
WARNING: "com20020_found" [drivers/net/pcmcia/com20020_cs.ko] undefined!
WARNING: "com20020_check" [drivers/net/pcmcia/com20020_cs.ko] undefined!

The solution:
Always export com20020_found and com20020_check.

Esben

 drivers/net/arcnet/com20020.c |3 ---
 1 file changed, 3 deletions(-)

Index: linux-2.6.20/drivers/net/arcnet/com20020.c
===
--- linux-2.6.20.orig/drivers/net/arcnet/com20020.c
+++ linux-2.6.20/drivers/net/arcnet/com20020.c
@@ -337,11 +337,8 @@ static void com20020_set_mc_list(struct
}
 }

-#if defined(CONFIG_ARCNET_COM20020_PCI_MODULE) || \
-defined(CONFIG_ARCNET_COM20020_ISA_MODULE)
 EXPORT_SYMBOL(com20020_check);
 EXPORT_SYMBOL(com20020_found);
-#endif

 MODULE_LICENSE("GPL");





Re: [PATCH] Broken ArcNet com20020 pcmcia driver in 2.6.20

2007-02-07 Thread Esben Nielsen



On Wed, 7 Feb 2007, Randy Dunlap wrote:


Esben Nielsen wrote:

 Hi,
 I can not get my com20020 pcmcia driver to work as a module under 2.6.20.
 There is the build problem:


Please send me your .config file.  I can't seem to reproduce this.


The relevant parts:
...
CONFIG_ARCNET_COM20020=m
# CONFIG_ARCNET_COM20020_ISA is not set
# CONFIG_ARCNET_COM20020_PCI is not set
...
CONFIG_ARCNET_COM20020_CS=m

In this hierarchy it is not nice for the com20020 module to check whether 
there are users for it. It breaks the direction of dependency. What if I 
first compile com20020, install it, and then decide I want one of the other 
modules? Then I would have to recompile com20020 and reload it.
Therefore: Remove the check, and always export the symbols.

As for actual testing: I have access to ArcNet hardware for 3 more 
weeks. Then I will start at a new job, where there is no ArcNet.


Esben




 MODPOST 30 modules
WARNING:  "com20020_found" [drivers/net/pcmcia/com20020_cs.ko] undefined!
WARNING:  "com20020_check" [drivers/net/pcmcia/com20020_cs.ko] undefined!

 The solution:
 Always export com20020_found and com20020_check.

 Esben

  drivers/net/arcnet/com20020.c |3 ---
  1 file changed, 3 deletions(-)

 Index: linux-2.6.20/drivers/net/arcnet/com20020.c
 ===
 --- linux-2.6.20.orig/drivers/net/arcnet/com20020.c
 +++ linux-2.6.20/drivers/net/arcnet/com20020.c
 @@ -337,11 +337,8 @@ static void com20020_set_mc_list(struct
  }
  }

 -#if defined(CONFIG_ARCNET_COM20020_PCI_MODULE) || \
 -defined(CONFIG_ARCNET_COM20020_ISA_MODULE)
  EXPORT_SYMBOL(com20020_check);
  EXPORT_SYMBOL(com20020_found);
 -#endif

  MODULE_LICENSE("GPL");


--
~ Randy




Re: 2.6.20-rt5 Oops on boot

2007-02-14 Thread Esben Nielsen



On Sat, 10 Feb 2007, Andrew Burgess wrote:


I have terrible news: 2.6.20-rt5 does not boot at all on a couple
machines I was brave enough to try -- a [EMAIL PROTECTED] SMP/HT desktop, and a
Core2 Duo [EMAIL PROTECTED] laptop.


Ditto for me on an ASUS AMD64 x2, just hangs, I have no
serial console. 2.6.20-rc5-rt7 booted ok (the last one I
tried). Both using SMP.



2.6.20-rt5 doesn't boot on my hyperthreaded Pentium 4 2.80GHz either. I 
don't have a serial console but I can see that it stops at exit.c:877.


Esben




Re: 2.6.19-rc6-rt0, -rt YUM repository

2006-11-16 Thread Esben Nielsen

On Thu, 16 Nov 2006, Daniel Walker wrote:


On Thu, 2006-11-16 at 16:35 +0100, Ingo Molnar wrote:


-rt0 is a rebase of -rt to 2.6.19-rc6, with lots of updates and fixes
included. It includes the latest -hrt-dynticks tree and more.



Does the zero carry any meaning, or did you just decide to start using zero
instead of one?

Daniel


0 bugs?

Esben


Re: [PATCH 0/2] convert mmap_sem to a scalable rw_mutex

2007-05-12 Thread Esben Nielsen



On Fri, 11 May 2007, Peter Zijlstra wrote:



I was toying with a scalable rw_mutex and found that it gives ~10% reduction in
system time on ebizzy runs (without the MADV_FREE patch).



You break priority inheritance on user space futexes! :-(
The problem is that the futex waiter has to take the mmap_sem. And as 
your rw_mutex isn't PI enabled you get priority inversions :-(


Esben


Re: [PATCH 0/2] convert mmap_sem to a scalable rw_mutex

2007-05-12 Thread Esben Nielsen



On Sat, 12 May 2007, Peter Zijlstra wrote:


On Sat, 2007-05-12 at 11:27 +0200, Esben Nielsen wrote:


On Fri, 11 May 2007, Peter Zijlstra wrote:



I was toying with a scalable rw_mutex and found that it gives ~10% reduction in
system time on ebizzy runs (without the MADV_FREE patch).



You break priority inheritance on user space futexes! :-(
The problem is that the futex waiter has to take the mmap_sem. And as
your rw_mutex isn't PI enabled you get priority inversions :-(


Do note that rwsems have no PI either.
PI is not a concern for mainline - yet, I do have ideas here though.


If PI wasn't a concern for mainline, why were PI futexes merged into the 
mainline?


I notice that the rwsems used now aren't priority-inversion safe (thus 
destroying the purpose of having PI futexes). We thus already have a bug in 
the mainline.


I suggest making a rw_mutex which does read-side PI: A reader boosts the 
writer, but a writer can't boost the readers, since there can be a large 
number of those.


I don't have time to make such a rw_mutex, but I have a simple idea for 
one where the rt_mutex can be reused.


 struct pi_rw_mutex {
	int count;	/*  0 -> unoccupied,
			 * >0 -> the number of current readers
			 *       (second-highest bit: there is a waiting writer),
			 * -1 -> a writer has it. */
	struct rt_mutex mutex;
 };

Use atomic exchange on count.

When locking:

A writer checks if count <= 0. If so it sets the value to -1 and takes
the mutex. When it gets the mutex it rechecks the count and proceeds.
If count > 0, the writer sets the second-highest bit, adds itself to
the wait-list in the mutex and sleeps. (The mutex will now be in a state 
where owner == NULL but there are waiters. It must be checked whether the 
rt_mutex code can handle this.)


A reader checks if count >= 0. If so it does count++ and proceeds.
If count < 0 it takes the rt_mutex. When it gets the mutex it sets the 
count to 1, unlocks the mutex and proceeds.


When unlocking:

The writer sets count to 0 or 0x800 (second-highest bit) depending 
on how many waiters the mutex has, and unlocks the mutex.


The reader checks if count is 0x8001. If so it sets count to 0 and 
wakes up the first waiter on the mutex (if there are any). Otherwise it 
just does count--.
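A hypothetical userspace sketch of the count protocol above: a pthread
mutex stands in for struct rt_mutex, and only the uncontended fast paths
are modelled (the waiting-writer bit and the hand-off through the mutex's
wait list are elided, so the trylock functions just fail where the real
thing would sleep):

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

struct pi_rw_mutex {
	atomic_int count;	/* 0 free, >0 readers, -1 writer */
	pthread_mutex_t mutex;	/* stands in for struct rt_mutex */
};

static int read_trylock(struct pi_rw_mutex *l)
{
	int c = atomic_load(&l->count);

	/* reader fast path: no writer present, become one more reader */
	while (c >= 0)
		if (atomic_compare_exchange_weak(&l->count, &c, c + 1))
			return 1;
	return 0;	/* writer active: would block on the mutex */
}

static void read_unlock(struct pi_rw_mutex *l)
{
	atomic_fetch_sub(&l->count, 1);
}

static int write_trylock(struct pi_rw_mutex *l)
{
	int expected = 0;

	/* writer fast path: lock completely free, claim it exclusively */
	if (atomic_compare_exchange_strong(&l->count, &expected, -1)) {
		pthread_mutex_lock(&l->mutex);
		return 1;
	}
	return 0;	/* readers present: would set the waiting bit and sleep */
}

static void write_unlock(struct pi_rw_mutex *l)
{
	atomic_store(&l->count, 0);
	pthread_mutex_unlock(&l->mutex);
}
```

Readers never touch the mutex on the fast path; only writers (and readers
that find a writer active) would interact with the rt_mutex, which is what
keeps the read side cheap while the write side keeps PI.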


Esben


Re: [PATCH 0/2] convert mmap_sem to a scalable rw_mutex

2007-05-12 Thread Esben Nielsen



On Sat, 12 May 2007, Ingo Molnar wrote:



* Esben Nielsen <[EMAIL PROTECTED]> wrote:


I notice that the rwsems used now aren't priority-inversion safe (thus
destroying the purpose of having PI futexes). We thus already have a
bug in the mainline.


you see everything in black and white, ignoring all the grey scales!
Upstream PI futexes are perfectly fine as long as the mm semaphore is
not write-locked (by anyone) while the critical path is running. Given
that real-time tasks often use mlockall and other practices to simplify
their workload so this is not all that hard to achieve.



Yeah, after sending that mail I realized I had accepted this fact way back...
But I disagree that it is easy to avoid write-locking the mm 
semaphore: A simple malloc() might lead to an mmap() call, creating trouble. 
Am I right?




I suggest making a rw_mutex which does read-side PI: A reader boosts
the writer, but a writer can't boost the readers, since there can be a
large number of those.


this happens automatically when you use Peter's stuff on -rt.


Because the rw_mutex is translated into an rt_mutex - as you say below.


But
mainline should not be bothered with this.



I disagree. You lay a large burden on the users of PI futexes to avoid 
write-locking the mm semaphore. PI-boosting those writers would be a good 
idea even in the mainline.



I don't have time to make such a rw_mutex but I have a simple idea for
one, where the rt_mutex can be reused.


Peter's stuff does this already if you remap all the mutex ops to
rt_mutex ops. Which is also what happens on -rt automatically. Ok?



That is how -rt works and should work. No disagreement there.


[ for mainline it is totally pointless and unnecessary to slow down all
 MM ops via an rt_mutex, because there are so many other, much larger
 effects that make execution time unbound. (interrupts for example) ]



1) How much slower would the pi_rw_mutex I suggested really be? As far as 
I can see there is only an overhead when there is congestion. I cannot see 
that that overhead is much larger than for a non-PI-boosting implementation.


2) I know that execution time isn't bounded in the mainline - that is why 
-rt is needed. But is it _that_ bad? How low can you get your latencies 
with preemption on, on a really busy machine?


Basically, I think PI futexes should provide the same kind of latencies 
as the basic thread latency. That is what the user would expect.
Priority inversions should be removed, but of course you can't do better 
than the OS does otherwise.



Ingo



Esben


Re: [PATCH 0/2] convert mmap_sem to a scalable rw_mutex

2007-05-14 Thread Esben Nielsen



On Sat, 12 May 2007, Eric Dumazet wrote:


Esben Nielsen wrote:



 On Sat, 12 May 2007, Peter Zijlstra wrote:

>  On Sat, 2007-05-12 at 11:27 +0200, Esben Nielsen wrote:
> > 
> >  On Fri, 11 May 2007, Peter Zijlstra wrote:
> > 
> > > 
> > >  I was toying with a scalable rw_mutex and found that it gives ~10%
> > >  reduction in system time on ebizzy runs (without the MADV_FREE patch).
> > > 
> > 
> >  You break priority inheritance on user space futexes! :-(
> >  The problem is that the futex waiter has to take the mmap_sem. And as
> >  your rw_mutex isn't PI enabled you get priority inversions :-(
> 
>  Do note that rwsems have no PI either.

>  PI is not a concern for mainline - yet, I do have ideas here though.
> 
>

If PI wasn't a concern for mainline, why were PI futexes merged into the
mainline?


If you really care about futexes and mmap_sem, just use private futexes, 
since they don't use mmap_sem at all.




futex_wait_pi() takes mmap_sem. So does futex_fd(). I can't see a code 
path into the PI futexes which doesn't use mmap_sem.


There is another way to avoid problems with mmap_sem:
Use shared memory for the data you need to share between high priority 
tasks and normal low priority tasks. The high priority task(s) run(s) in 
an isolated high priority process having its own mmap_sem. This high 
priority process is mlock'ed and doesn't do any operations that write-lock 
mmap_sem.

But it would be nice if you could avoid such a cumbersome workaround...

Esben

Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-15 Thread Esben Nielsen

On Fri, 13 Apr 2007, Ingo Molnar wrote:


[announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

i'm pleased to announce the first release of the "Modular Scheduler Core
and Completely Fair Scheduler [CFS]" patchset:

  http://redhat.com/~mingo/cfs-scheduler/sched-modular+cfs.patch

This project is a complete rewrite of the Linux task scheduler. My goal
is to address various feature requests and to fix deficiencies in the
vanilla scheduler that were suggested/found in the past few years, both
for desktop scheduling and for server scheduling workloads.

[...]


I took a brief look at it. Have you tested priority inheritance?
As far as I can see, rt_mutex_setprio doesn't have much effect on 
SCHED_FAIR/SCHED_BATCH. I am looking for the place where such a task changes 
scheduler class when boosted in rt_mutex_setprio().


Esben



Re: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]

2007-04-16 Thread Esben Nielsen

On Sun, 15 Apr 2007, Ingo Molnar wrote:



* Esben Nielsen <[EMAIL PROTECTED]> wrote:


I took a brief look at it. Have you tested priority inheritance?


yeah, you are right, it's broken at the moment, i'll fix it. But the
good news is that i think PI could become cleaner via scheduling
classes.


As far as I can see rt_mutex_setprio doesn't have much effect on
SCHED_FAIR/SCHED_BATCH. I am looking for the place where such a task
changes scheduler class when boosted in rt_mutex_setprio().


i think via scheduling classes we dont have to do the p->policy and
p->prio based gymnastics anymore, we can just have a clean look at
p->sched_class and stack the original scheduling class into
p->real_sched_class. It would probably also make sense to 'privatize'
p->prio into the scheduling class. That way PI would be a pure property
of sched_rt, and the PI scheduler would be driven purely by
p->rt_priority, not by p->prio. That way all the normal_prio() kind of
complications and interactions with SCHED_OTHER/SCHED_FAIR would be
eliminated as well. What do you think?



Now, I have not read your patch in detail. But I agree it would be nice to 
have it more "OO" and remove cross references between schedulers. But 
first one should consider whether PI between SCHED_FAIR tasks is 
useful. Does PI among dynamic priorities make sense at all? I think it
does: On heavily loaded systems, where a nice 19 task might not get the CPU 
for very long, a nice -20 task can be priority-inverted for a very long 
time.
But I see no need to take the dynamic part of the effective priorities 
into account. The current/old solution of mapping the static nice values 
into a global priority index which can incorporate the two scheduler 
classes is probably good enough - it just has to be "switched on" again 
:-)


But what about other scheduler classes which some people want to add in
the future? What about having a "cleaner design"?

My thought was to generalize the concept of 'priority' to be an
object (a struct prio) interpreted with help from a scheduler class, 
instead of a globally interpreted integer.


int compare_prio(struct prio *a, struct prio *b)
{
	if (a->sched_class->class_prio < b->sched_class->class_prio)
		return -1;

	if (a->sched_class->class_prio > b->sched_class->class_prio)
		return +1;

	return a->sched_class->compare_prio(a, b);
}

Problem 1: Performance.

Problem 2: Operations on a plist with these generalized priorities are not 
bounded, because the number of different priorities is not bounded.


Problem 2 could be solved by using a combined plist (for rt priorities) 
and rbtree (for fair priorities) - making operations logarithmic, just like 
the fair scheduler itself. But that would take more memory for every 
rtmutex.


I conclude that is too complicated and go on to the obvious idea:
Use a global priority index where each scheduler class gets its own 
range (rt: 0-99, fair: 100-139 :-). Let the scheduler class have a 
function returning it instead of reading it directly from task_struct, such 
that new scheduler classes can return their own numbers.
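A toy sketch of that global-priority-index idea, under the ranges mentioned
above (rt: 0-99, fair: 100-139). The structs and callbacks here are
invented for illustration, not the actual CFS interfaces; lower number
means more important, as in the kernel:

```c
#include <assert.h>

struct task;

struct sched_class {
	int base;	/* start of this class's range in the global index */
	int (*prio_in_class)(const struct task *t);
};

struct task {
	const struct sched_class *sched_class;
	int rt_priority;	/* used by the rt class, 0..99 */
	int nice;		/* used by the fair class, -20..19 */
};

/* Each class maps its own notion of priority into its range. */
static int rt_prio(const struct task *t)   { return 99 - t->rt_priority; }
static int fair_prio(const struct task *t) { return t->nice + 20; }

static const struct sched_class rt_class   = { 0,   rt_prio };
static const struct sched_class fair_class = { 100, fair_prio };

/* The comparable global index: a new scheduler class only has to
 * return numbers from its own range, so no cross-class special
 * casing is needed anywhere else. */
static int global_prio(const struct task *t)
{
	return t->sched_class->base + t->sched_class->prio_in_class(t);
}
```

With this layout any rt task automatically outranks any fair task, which is
exactly the property the old p->prio mapping provided.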

Esben



Ingo




Re: CFS and suspend2: hang in atomic copy (was: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS])

2007-04-19 Thread Esben Nielsen



On Wed, 18 Apr 2007, Ingo Molnar wrote:



* Christian Hesse <[EMAIL PROTECTED]> wrote:


Hi Ingo and all,

On Friday 13 April 2007, Ingo Molnar wrote:

as usual, any sort of feedback, bugreports, fixes and suggestions are
more than welcome,


I just gave CFS a try on my system. From a user's point of view it
looks good so far. Thanks for your work.


you are welcome!


However I found a problem: When trying to suspend a system patched
with suspend2 2.2.9.11 it hangs with "doing atomic copy". Pressing the
ESC key results in a message that it tries to abort suspend, but then
still hangs.


i took a quick look at suspend2 and it makes some use of yield().
There's a bug in CFS's yield code, i've attached a patch that should fix
it, does it make any difference to the hang?

Ingo

Index: linux/kernel/sched_fair.c
===
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -264,15 +264,26 @@ static void dequeue_task_fair(struct rq

/*
 * sched_yield() support is very simple via the rbtree, we just
- * dequeue and enqueue the task, which causes the task to
- * roundrobin to the end of the tree:
+ * dequeue the task and move it to the rightmost position, which
+ * causes the task to roundrobin to the end of the tree.
 */
static void requeue_task_fair(struct rq *rq, struct task_struct *p)
{
dequeue_task_fair(rq, p);
p->on_rq = 0;
-   enqueue_task_fair(rq, p);
+   /*
+* Temporarily insert at the last position of the tree:
+*/
+   p->fair_key = LLONG_MAX;
+   __enqueue_task_fair(rq, p);
p->on_rq = 1;
+
+   /*
+* Update the key to the real value, so that when all other
+* tasks from before the rightmost position have executed,
+* this task is picked up again:
+*/
+   p->fair_key = rq->fair_clock - p->wait_runtime + p->nice_offset;


I don't think it is safe to change the key after inserting the element in the 
tree. You end up with an unsorted tree where new entries end up in 
wrong places "randomly".
I think a better approach would be to keep track of the rightmost entry, 
set the key to the rightmost's key + 1 and then simply insert it there.


Esben






Re: [patch] CFS scheduler, -v8

2007-05-05 Thread Esben Nielsen



On Wed, 2 May 2007, Ingo Molnar wrote:



* Balbir Singh <[EMAIL PROTECTED]> wrote:


The problem is with comparing a s64 values with (s64)ULONG_MAX, which
evaluates to -1. Then we check if exec_delta64 and fair_delta64 are
greater than (s64)ULONG_MAX (-1), if so we assign (s64)ULONG_MAX to
the respective values.


ah, indeed ...


The fix is to compare these values against (s64)LONG_MAX and assign
(s64)LONG_MAX to exec_delta64 and fair_delta64 if they are greater
than (s64)LONG_MAX.

Tested on PowerPC, the regression is gone, tasks are load balanced as
they were in v7.


thanks, applied!

Ingo


I have been wondering why you use unsigned for timers anyway. It is also 
like that in hrtimers. Why not use signed and avoid (almost) all worries 
about wrap-around issues? The trick is that when every

  a < b
is replaced by
  a - b < 0
the code will work on all 2's-complement machines even if the (signed!) 
integers a and b wrap around.


In both the hrtimer and CFS patch 32 bit timers could be used internally 
on 32 bit architectures to avoid expensive 64 bit integer calculations.
The only thing is: timeouts can not be bigger than 2^31 in the chosen 
units.


I have successfully coded a (much more primitive) hrtimer system for 
another OS on both ARM and PPC using the above trick in my former job. 
On both I used the machine's internal clock as the internal 
representation of time and I only scaled to a struct timespec in the user 
interface.


Esben


Re: [patch] CFS scheduler, -v8

2007-05-07 Thread Esben Nielsen



On Sat, 5 May 2007, Linus Torvalds wrote:




On Sat, 5 May 2007, Esben Nielsen wrote:


I have been wondering why you use unsigned for timers anyway. It is also like
that in hrtimers. Why not use signed and avoid (almost) all worries about wrap-
around issues? The trick is that when every
  a < b
is replaced by
  a - b < 0
the code will work on all 2's-complement machines even if the (signed!) integers
a and b wrap around.


No. BOTH of the above are buggy.

The C language definition doesn't allow signed integers to wrap (ie it's
undefined behaviour), so "a-b < 0" can be rewritten by the compiler as a
simple signed "a < b".

And the unsigned (or signed) "a < b" is just broken wrt any kind of
wrap-around (whether wrapping around zero or the sign bit).

So the _only_ valid way to handle timers is to
- either not allow wrapping at all (in which case "unsigned" is better,
  since it is bigger)
- or use wrapping explicitly, and use unsigned arithmetic (which is
  well-defined in C) and do something like "(long)(a-b) > 0".

Notice? The signed variant is basically _never_ correct.



What is (long)(a-b)? I have tried to look it up in the C99 standard but 
I can't find it. Maybe it is in the referred LIA-1 standard, which I 
can't find with google.


I think the best would be to use "a-b > ULONG_MAX/2" when you mean "a < b", 
as that should be completely portable.


According to C99 Appendix H2.2 
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf) an 
implementation can choose to do modulo signed integers, as it is 
mandatory for unsigned integers. If an implementation has chosen 
to do that, it must be a bug to do the "a-b < 0" -> "a < b" rewrite.

I have never experienced a compiler/architecture combination _not_ doing 
wrapped signed integers.


Esben



Linus




Re: [patch] CFS scheduler, -v8

2007-05-07 Thread Esben Nielsen



On Sun, 6 May 2007, Linus Torvalds wrote:




On Sun, 6 May 2007, Ingo Molnar wrote:


* Linus Torvalds <[EMAIL PROTECTED]> wrote:


So the _only_ valid way to handle timers is to
 - either not allow wrapping at all (in which case "unsigned" is better,
   since it is bigger)
 - or use wrapping explicitly, and use unsigned arithmetic (which is
   well-defined in C) and do something like "(long)(a-b) > 0".


hm, there is a corner-case in CFS where a fix like this is necessary.

CFS uses 64-bit values for almost everything, and the majority of values
are of 'relative' nature with no danger of overflow. (They are signed
because they are relative values that center around zero and can be
negative or positive.)


Well, I'd like to just worry about that for a while.

You say there is "no danger of overflow", and I mostly agree that once
we're talking about 64-bit values, the overflow issue simply doesn't
exist, and furthermore the difference between 63 and 64 bits is not really
relevant, so there's no major reason to actively avoid signed entries.

So in that sense, it all sounds perfectly sane. And I'm definitely not
sure your "292 years after bootup" worry is really worth even considering.



I would hate to tell mission control for Mankind's first mission to another
star to reboot every 200 years because "there is no need to worry about it."

As a matter of principle an OS should never need a reboot (with exception 
for upgrading). If you say you have to reboot every 200 years, why not 
every 100? Every 50?  Every 45 days (you know what I am referring 
to :-) ?



When we're really so well off that we expect the hardware and software
stack to be stable over a hundred years, I'd start to think about issues
like that, in the meantime, to me worrying about those kinds of issues
just means that you're worrying about the wrong things.

BUT.

There's a fundamental reason relative timestamps are difficult and almost
always have overflow issues: the "long long in the future" case as an
approximation of "infinite timeout" is almost always relevant.

So rather than worry about the system staying up 292 years, I'd worry
about whether people pass in big numbers (like some MAX_S64 approximation)
as an approximation for "infinite", and once you have things like that,
the "64 bits never overflows" argument is totally bogus.

There's a damn good reason for using only *absolute* time. The whole
"signed values of relative time" may _sound_ good, but it really sucks in
subtle and horrible ways!



I think you are wrong here. The only place you need absolute time is 
for the clock (CLOCK_REALTIME). You waste CPU using a 64 bit
representation when you could have used a 32 bit one. With a 32 bit 
implementation you are forced to handle the corner cases with wrap 
around and too-big arguments up front. With a 64 bit one you hide those 
problems.


I think CFS would be best off using a 32 bit timer counting in 
microseconds. That would wrap around in 72 minutes. But as the timers are 
relative you will never be able to specify a timer larger than 36 minutes 
in the future. But 36 minutes is ridiculously long for a scheduler, and a 
simple test limiting time values to that value would not break anything.


Esben


Linus




Re: [patch] CFS scheduler, -v8

2007-05-08 Thread Esben Nielsen



On Mon, 7 May 2007, Johannes Stezenbach wrote:


On Mon, May 07, 2007, Linus Torvalds wrote:

On Mon, 7 May 2007, Esben Nielsen wrote:


What is (long)(a-b) ? I have tried to look it up in the C99 standeard but I
can't find it. Maybe it is in the referred LIA-1 standeard, which I can't find
with google.


C99 defines unsigned overflow semantics, but it doesn't say anything
about signed overflow, thus it's undefined -- and you have a hard
time finding it out.

However, I have no clue *why* it's undefined and not
implementation defined. Does someone know?


I don't worry about non-2's-complement machines (they don't exist, and
likely won't exist in the future either).


I think DSPs can do saturated arithmetics (clamp to min/max
values instead of wrap around). Not that it matters for Linux...


So I worry about compilers rewriting my code.


gcc has -fwrapv and -ftrapv to change signed integer overflow
behaviour.

One baffling example where gcc rewrites code is when
conditionals depend on signed integer overflow:

$ cat xx.c
#include <assert.h>

int foo(int a)
{
assert(a + 100 > a);
return a;
}

int bar(int a)
{
if (a + 100 > a)
a += 100;
return a;
}
$ gcc -Wall -Wextra -fomit-frame-pointer -c xx.c
$ objdump -dr xx.o

xx.o: file format elf32-i386

Disassembly of section .text:

 <foo>:
  0:   8b 44 24 04 mov0x4(%esp),%eax
  4:   c3  ret

0005 <bar>:
  5:   83 44 24 04 64  addl   $0x64,0x4(%esp)
  a:   8b 44 24 04 mov0x4(%esp),%eax
  e:   c3  ret


The assert and the condition were just dropped
by gcc -- without any warning.

gcc-4.2 will add -fstrict-overflow and -Wstrict-overflow.
http://gcc.gnu.org/gcc-4.2/changes.html


Johannes



This is contrary to C99 standard annex H2.2 
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf):


"An implementation that defines signed integer types as also being modulo need
not detect integer overflow, in which case, only integer divide-by-zero need
be detected."

So if it doesn't properly define wrapping, it has to detect integer 
overflow, right?


gcc does neither with that optimization :-(

Esben


Re: [patch] CFS scheduler, -v8

2007-05-08 Thread Esben Nielsen



On Tue, 8 May 2007, Peter Williams wrote:


Esben Nielsen wrote:



 On Sun, 6 May 2007, Linus Torvalds wrote:

> 
> 
>  On Sun, 6 May 2007, Ingo Molnar wrote:
> > 
> >  * Linus Torvalds <[EMAIL PROTECTED]> wrote:
> > 
> > >  So the _only_ valid way to handle timers is to
> > >  - either not allow wrapping at all (in which case "unsigned" is 
> > >  better,

> > > since it is bigger)
> > >   - or use wrapping explicitly, and use unsigned arithmetic (which is
> > > well-defined in C) and do something like "(long)(a-b) > 0".
> > 
> >  hm, there is a corner-case in CFS where a fix like this is necessary.
> > 
> >  CFS uses 64-bit values for almost everything, and the majority of 
> >  values

> >  are of 'relative' nature with no danger of overflow. (They are signed
> >  because they are relative values that center around zero and can be
> >  negative or positive.)
> 
>  Well, I'd like to just worry about that for a while.
> 
>  You say there is "no danger of overflow", and I mostly agree that once

>  we're talking about 64-bit values, the overflow issue simply doesn't
>  exist, and furthermore the difference between 63 and 64 bits is not 
>  really

>  relevant, so there's no major reason to actively avoid signed entries.
> 
>  So in that sense, it all sounds perfectly sane. And I'm definitely not
>  sure your "292 years after bootup" worry is really worth even 
>  considering.

>

 I would hate to tell mission control for Mankind's first mission to
 another
 star to reboot every 200 years because "there is no need to worry about
 it."

 As a matter of principle an OS should never need a reboot (with exception
 for upgrading). If you say you have to reboot every 200 years, why not
 every 100? Every 50?  Every 45 days (you know what I am referring to
 :-) ?


There's always going to be an upper limit on the representation of time.
 At least until we figure out how to implement infinity properly.


Well you need infinite memory for that :-)
But that is my point: Why go into the problem of storing absolute time 
when you can use relative time?







>  When we're really so well off that we expect the hardware and software
>  stack to be stable over a hundred years, I'd start to think about issues
>  like that, in the meantime, to me worrying about those kinds of issues
>  just means that you're worrying about the wrong things.
> 
>  BUT.
> 
>  There's a fundamental reason relative timestamps are difficult and 
>  almost

>  always have overflow issues: the "long long in the future" case as an
>  approximation of "infinite timeout" is almost always relevant.
> 
>  So rather than worry about the system staying up 292 years, I'd worry
>  about whether people pass in big numbers (like some MAX_S64 
>  approximation)

>  as an approximation for "infinite", and once you have things like that,
>  the "64 bits never overflows" argument is totally bogus.
> 
>  There's a damn good reason for using only *absolute* time. The whole
>  "signed values of relative time" may _sound_ good, but it really sucks 
>  in

>  subtle and horrible ways!
>

 I think you are wrong here. The only place you need absolute time is for
 the clock (CLOCK_REALTIME). You waste CPU using a 64 bit
 representation when you could have used a 32 bit. With a 32 bit
 implementation you are forced to handle the corner cases with wrap around
 and too big arguments up front. With a 64 bit you hide those problems.


As does the other method.  A 32 bit signed offset with a 32 bit base is just 
a crude version of 64 bit absolute time.


64 bit is also relative - just over a much longer period.
32 bit signed offset is relative - and you know it. But with 64 bit people 
think it is absolute and put in large values, as Linus said above. With 32 
bit, future developers will know it is relative and code for it. And they 
will get their corner cases tested, because the code will soon run 
into those corners.






 I think CFS would be best off using a 32 bit timer counting in 
 microseconds. That would wrap around in 72 minutes. But as the timers are 
 relative you will never be able to specify a timer larger than 36 minutes 
 in the future. But 36 minutes is ridiculously long for a scheduler, and a 
 simple test limiting time values to that value would not break anything.


Except if you're measuring sleep times.  I think that you'll find lots of 
tasks sleep for more than 72 minutes.


I don't think those large values will be relevant. You can easily cut 
off sleep times at 30 min or even 1 min. But you need to 

Re: [patch] CFS scheduler, -v8

2007-05-08 Thread Esben Nielsen



On Tue, 8 May 2007, Johannes Stezenbach wrote:


On Tue, May 08, 2007, Esben Nielsen wrote:


This is contrary to C99 standard annex H2.2
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf):

"An implementation that defines signed integer types as also being modulo
need
not detect integer overflow, in which case, only integer divide-by-zero need
be detected."

So if it doesn't properly defines wrapping it has to detect integer
overflow, right?


No. Annex H (informative!) only talks about LIA-1 conformance.

C99 isn't LIA-1 conformant. H2.2 describes what an implementation
might do to make signed integers LIA-1 compatible.


"The signed C integer types int, long int, long long int, and the 
corresponding unsigned types are compatible with LIA-1."


I read this as any C99 implementation must be compatible. I would like to 
see LIA-1 to check.




, which is
what gcc does with -fwrapv or -ftrapv.



Yes, either or: Either wrap or trap.


At least that's how I understand it, the C99 standard
seems to have been written with the "it was hard to
write, so it should be hard to read" mindset. :-/

I still don't know _why_ signed integer overflow behaviour
isn't defined in C. It just goes against everyones expectation
and thus causes bugs.


Because it is hard to make wrapping work on non-two's-complement 
architectures. Then it is easier to trap.


Esben




Johannes




Re: Hi, I have one question about rt_mutex.

2007-05-10 Thread Esben Nielsen



On Thu, 10 May 2007, Li Yu wrote:


Hi, Steven.

Nice to meet you again.

I have read the rt-mutex-design.txt that you wrote. That is excellent
description of rt_mutex. But I have a question for rt_mutex.

As you said:



Now since mutexes can be defined by user-land applications, we don't
want a DOS type of application that nests large amounts of mutexes to create a large
PI chain, and have the code holding spin locks while looking at a large
amount of data. So to prevent this, the implementation not only implements
a maximum lock depth, but also only holds at most two different locks at a
time, as it walks the PI chain. More about this below.


After reading the implementation of rt_mutex_adjust_prio_chain(), I found
that we really enforce a maximum lock depth (1024 by default), but I cannot
see the check for duplication of the same locks. Is the doc
inconsistent with the code?

Thanks in advanced.

Good luck.
- Li Yu


At the label "again:" inside rt_mutex_adjust_prio_chain() no spinlocks are 
held. That is, the kernel can reschedule at that point in the loop. So if 
you, as a priority X task, try to take a lock, you will not delay anything of 
higher priority than X by more than the amount of time it takes to go 
around the loop once. The max lock depth is just an extra safety net.


The whole idea in a priority based real-time system is that the latency at 
priority X only depends on what is going on at priority X and higher 
(including interrupt handlers). Whatever goes on at lower priority can only 
interfere with a fixed, predetermined, small amount of jitter.

Esben




Re: [PATCH 20/30] Use menuconfig objects - ARCNET

2007-04-11 Thread Esben Nielsen

On Tue, 10 Apr 2007, Jan Engelhardt wrote:



(Wow, not a single MODULE_AUTHOR line in drivers/net/arcnet/ ...)



ArcNet is old. Almost nobody is using it anymore. I used it at my former 
job, since we used it as a control network. A lot of companies still do, 
quietly, but not in combination with Linux.




Use menuconfigs instead of menus, so the whole menu can be disabled at
once instead of going through all options.

Signed-off-by: Jan Engelhardt <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5/drivers/net/arcnet/Kconfig
===
--- linux-2.6.21-rc5.orig/drivers/net/arcnet/Kconfig
+++ linux-2.6.21-rc5/drivers/net/arcnet/Kconfig
@@ -2,10 +2,8 @@
# Arcnet configuration
#

-menu "ARCnet devices"
+menuconfig ARCNET
depends on NETDEVICES && (ISA || PCI)


Why does it depend on ISA || PCI ? People tend to forget the PCMCIA 
driver. And in principle you could enable the ArcNet framework without 
using any of the drivers in the kernel tree.



Esben



Re: [PATCH 20/30] Use menuconfig objects - ARCNET

2007-04-11 Thread Esben Nielsen



On Wed, 11 Apr 2007, Jan Engelhardt wrote:



On Apr 11 2007 10:30, Esben Nielsen wrote:

On Tue, 10 Apr 2007, Jan Engelhardt wrote:


(Wow, not a single MODULE_AUTHOR line in drivers/net/arcnet/ ...)


ArcNet is old. Almost nobody is using it anymore. I used it at my
former job, since we used it as control network. A lot of companies
still does quitely, but not in combination with Linux.


Let me correct myself: I have only known of one company using the Linux 
ArcNet combination in production. But there must be other companies using 
ArcNet that are playing with Linux in R&D.




So send some removal patches :)


No. Somebody (like me) in those companies uses them sporadically for their 
PCI/PCMCIA cards. The vendor has some basic Windoze drivers. Missing Linux 
drivers should not be yet another obstacle to using Linux.


They might also one day want to run Linux on their embedded platform - 
especially with preempt-realtime. When I got Linux to boot on our embedded 
platform at my former job, I could almost immediately use the onboard 
ArcNet controller. For the proprietary OS otherwise used on those platforms 
it took many man-weeks to write a driver.

Keeping support for old devices in the kernel tree is good for Linux. It 
shouldn't take too long to update them for API changes, and although they 
might not work, if the occasional user is capable enough, he can soon fix 
them. If they are removed, the occasional user will choose another OS.


Esben