sched_lock() should be completely removed

2023-04-19 Thread tt tt
Hi, all:

The function sched_lock() is used to disable preemption.
I have also seen a lot of discussion about sched_lock() in the community,
and based on my own experience I believe that sched_lock() is very
easy to misuse and difficult to understand, especially in SMP, where
sched_lock() has caused many bugs.

I think sched_lock() should be completely removed:
1. In most cases we can replace sched_lock() with enter_critical_section(),
and at the same time add sched_lock()'s ability to enter_critical_section().
2. We can replace sched_lock() with a mutex, semaphore, or spinlock.
3. Direct removal.

Any comments are very welcome!

BR!


Re: [ANNOUNCE] Apache NuttX 12.1.0 released

2023-04-19 Thread Tomek CEDRO
CONGRATZ!! :-)

https://nuttx.apache.org/download


https://nuttx.apache.org/releases/12.1.0


--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info


Re: [Breaking change] Move nxmutex to sched

2023-04-19 Thread Ville Juven
Hi,

Thanks for the responses. The scalar read / write functions will work when
accessing singular fields from the semaphore, and such an access will (or
at least should) always be correctly aligned, so there is no risk of the
scalar value being split by a page boundary -> no extra work is required;
simply obtaining the kernel virtual address for the page + offset will
suffice.
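
My rough mental model of such a scalar accessor is something like the sketch
below. The helper name up_addrenv_user_to_kernel() is invented purely for
illustration; I don't know what the real lookup function will be called.

/* Sketch only: resolve the user-space address of a single aligned field to
 * a kernel-accessible address, then do a normal load.
 */

static int16_t sem_read_semcount(FAR sem_t *usem)
{
  FAR volatile int16_t *kaddr;

  /* Hypothetical helper: returns the kernel virtual address of the page
   * holding semcount, plus the offset of the field within that page.
   */

  kaddr = (FAR volatile int16_t *)
          up_addrenv_user_to_kernel((FAR void *)&usem->semcount);

  return *kaddr; /* Aligned, so it cannot straddle a page boundary */
}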

However, as I said, there are some things in the sem_t structure whose
usage is ubiquitous and I don't know if the scalar access macros will be
enough, i.e. I don't fully understand how they work / are supposed to work.

struct sem_s
{
  volatile int16_t semcount;
  uint8_t flags;
  dq_queue_t waitlist; // Accessed by the system scheduler when
                       // traversing the TSTATE_WAIT_SEM list

  // These are also accessed from all over
#ifdef CONFIG_PRIORITY_INHERITANCE
# if CONFIG_SEM_PREALLOCHOLDERS > 0
  FAR struct semholder_s *hhead;
# else
  struct semholder_s holder[2];
# endif
#endif
};

The waitlist doubly linked queue is used by the scheduler to implement the
TSTATE_WAIT_SEM list. The semaphore object is therefore sporadically
accessed when the scheduler accesses tasks in that list, which means that
ANY user semaphore from any process must be accessible by the kernel to
traverse the list. Ok, we can access dqueue->head via a pointer-aligned
access, so that can be patched.

Next is the semholder list. I'm not even sure how much patching that
will need to work with an MMU. The static holder[] allocations come from
user memory, while the "hhead" allocations come from a kernel pool. Ok,
maybe the test for CONFIG_SEM_PREALLOCHOLDERS can also test for the MMU: if
the MMU is in use, do not use the static holder[] list, roughly as sketched
below.
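
Something along these lines is what I mean (just a sketch; CONFIG_ARCH_ADDRENV
is only my guess at the right condition to test):

/* Sketch: never embed holder state in the user-visible sem_t when a kernel
 * address environment is in use; always take holders from the kernel pool.
 */

#ifdef CONFIG_PRIORITY_INHERITANCE
# if CONFIG_SEM_PREALLOCHOLDERS > 0 || defined(CONFIG_ARCH_ADDRENV)
  FAR struct semholder_s *hhead;   /* Allocated from kernel memory */
# else
  struct semholder_s holder[2];    /* Static holders, FLAT builds only */
# endif
#endif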

Finally, we have a reference to a (I don't know which, yet) semaphore in
struct semholder_s too. Maybe someone can help me understand what this
reference is for?

A more detailed description of my findings is in the github issue I created
for this: https://github.com/apache/nuttx/issues/8917

Also, what Jukka said above about the security aspect is a perfectly valid
point. Now the dq waitlist and the semaphore holder list are in user
memory, which means the user process can mess those lists up, crashing the
kernel.

Br,
Ville Juven

PS. Whatever solution is implemented shall in no way increase code size or
modify functionality in the flat addressing mode(s). This is a requirement
for us too; we have flat targets that have very limited memory.

On Tue, Apr 18, 2023 at 4:23 PM Gregory Nutt  wrote:

> On 4/18/2023 6:58 AM, Nathan Hartman wrote:
> > On Tue, Apr 18, 2023 at 8:52 AM spudaneco  wrote:
> >
> >> Perhaps you could use accessor functions to read/write 8, 16, 32, 64 bit
> >> values.  Each scalar value will be properly aligned and, hence, will lie
> >> within a page.
> >
> >
> > And those accessor functions could be macros in FLAT builds where
> functions
> > are not required, so no memory or code size increase.
> >
> > Cheers
> > Nathan
> >
> The inefficiency is that each access would require a virtual to physical
> address look-up.  That could be alleviated by providing the base
> physical address of the page and the offset of the sem_t in the
> page as a parameter, maybe like:
>
>  struct paginfo_t pageinfo;
>
>  get_pageinfo(sem, &pageinfo);
>
>  value = get_pagevalue16(&pageinfo, &sem->field);
>
> If the field lies in the following page, then get_pagevalue16 would have
> to work harder.  But the usual case, where the sem_t lies wholly in the
> page would be relatively fast.
>
> This, however, would not macro-ize as well.
>
> Note that no special alignment of the sem_t is required if you access it by
> scalar field.
>
>
>


Re: sched_lock() should be completely removed

2023-04-19 Thread Gregory Nutt

> I think sched_lock() should be completely removed:
> 1. In most cases we can replace sched_lock() with enter_critical_section(),
> and at the same time add sched_lock()'s ability to enter_critical_section().
> 2. We can replace sched_lock() with a mutex, semaphore, or spinlock.
> 3. Direct removal.

I would not remove sched_lock().  I think it is very important in a real 
time system.  Most RTOSs support something like this internally.


enter_critical_section() is not a replacement for sched_lock() because 
it disables interrupts and, hence, harms real time performance.  
Interrupts must be disabled as little as possible because disabling them 
destroys deterministic OS behavior.  sched_lock(), on the other hand, has 
little or no real time performance impact if used properly.


sched_lock() prevents the running task from losing the CPU until 
sched_unlock() is called.  Misuses would include locking the scheduler for 
too long or, in the SMP case, misunderstanding the functionality of 
sched_lock() and assuming that it is always equivalent to a critical 
section.  Locking the scheduler for too long is probably not as damaging 
to performance as holding the critical section too long!
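
Used properly, it looks something like the following sketch (the function 
and variables are made up for illustration):

#include <sched.h>

static int g_counter;   /* Made-up state shared only between tasks */
static int g_max;

void sample_point(void) /* Hypothetical function */
{
  sched_lock();         /* This task keeps the CPU; interrupts stay enabled */

  g_counter++;          /* Short, non-atomic update.  Note: in SMP this
                         * alone is NOT a critical section -- exactly the
                         * misuse described above. */
  if (g_counter > g_max)
    {
      g_max = g_counter;
    }

  sched_unlock();       /* Preemption re-enabled; any deferred context
                         * switch occurs here */
}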


Both are very lightweight in the single-CPU modes.  But in SMP, 
enter_critical_section() is a huge performance pig and should be avoided 
whenever possible.  sched_lock() is much more real time friendly in SMP.


Your primary complaint, and the thing that no one argues with, is that 
"sched_lock() is very easy to misuse and difficult to 
understand."  However, that is not a sound justification for removing an 
important OS function.  You don't have a valid, technical justification 
for doing that.


However, we should address your concern (but NOT by removing 
sched_lock()).  How can we make sched_lock() more intuitive and less 
prone to misunderstanding by naive users?


I think that all we can do is change the naming or perhaps modify semantics.

First of all, sched_lock() is a non-standard, internal OS interface and 
should never be exposed to applications.  It is for use by informed 
developers within the OS only.  The naive user should not even be aware 
that it exists.  So at a minimum, (1) it should be renamed 
nxsched_lock(), (2) the user syscall interface should be removed from 
syscall/ and include/sys, and (3) the prototype should be removed from 
include/sched.h and moved to include/nuttx/sched.h.


The naming could be further improved.  The names 
nxsched_disable_preemption() and nxsched_enable_preemption(), while a 
bit long, do make the functionality of the internal interfaces much 
clearer, and those are the names that I would recommend.
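
That is, something like this in include/nuttx/sched.h (prototypes shown 
only to illustrate the proposed naming):

/* Internal, OS-only interfaces -- no longer exposed via include/sched.h
 * or the syscall layer.
 */

int nxsched_disable_preemption(void);  /* Today's sched_lock() */
int nxsched_enable_preemption(void);   /* Today's sched_unlock() */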






Re: [Breaking change] Move nxmutex to sched

2023-04-19 Thread Gregory Nutt

On 4/19/2023 3:33 AM, Ville Juven wrote:

However, as I said, there are some things in the sem_t structure whose
usage is ubiquitous and I don't know if the scalar access macros will be
enough, i.e. I don't fully understand how they work / are supposed to work.
...


I believe that all of the lists and structures that you are concerned 
with are already allocated from kernel space memory, but I have not 
verified that in all cases.  Certainly, they are things that would never 
be accessed from user space.  The user memory sem_t "container" just 
holds references to kernel space allocations.


If I am correct, then you should not have any problems.  But, as I said, 
I did not verify all of that.





Re: [Breaking change] Move nxmutex to sched

2023-04-19 Thread Gregory Nutt




However, as I said, there are some things in the sem_t structure whose
usage is ubiquitous and I don't know if the scalar access macros will be
enough, i.e. I don't fully understand how they work / are supposed to 
work.

...


I believe that all of the lists and structures that you are concerned 
with are already allocated from kernel space memory, but I have not 
verified that in all cases.  Certainly, they are things that would 
never be accessed from user space.  The user memory sem_t "container" 
just holds references to kernel space allocations.


If I am correct, then you should not have any problems.  But, as I 
said, I did not verify all of that.

I would say that, as a rule, nothing in user space should have any 
knowledge or access to the internals of a sem_t.  Looking at 
libs/libc/semaphore, I don't believe that there is any.





Re: [Breaking change] Move nxmutex to sched

2023-04-19 Thread Ville Juven
Hi, thanks again for the responses and bearing with me

> I believe that all of the lists and structures that you are concerned
> with are already allocated from  kernel space memory, but I have not

The lists do exist in user memory, although I agree that they should not.

> verified that it all cases.  Certainly, they are things that would
> never be accessed from user space

Absolutely correct, the user does not need them nor should they be
accessible to the user. However, they can be accessed, either
intentionally or by accident (e.g. via memcpy, memset, etc.), which means
the user really can mess the lists up, causing the kernel to crash.

> I would say that, as a rule, nothing in user space should have any
> knowledge or access to the internals of a sem_t.  Looking at
> libs/libc/semaphore, I don't believe that there is any,.

You are correct, the API does not access or provide access to those kernel
fields, but when a user instantiates a sem_t, those kernel control fields
also go to user space. They are also accessible to the user process via the
named struct fields, e.g. sem->waitlist.

The following stupid user space example will compile and run just fine as
the full composition of sem_t is visible to the user. The worker thread can
access sem->waitlist and completely mess up the doubly linked queue, from
user space.

#include <semaphore.h>
#include <pthread.h>

sem_t sem;

static void *worker(void *arg)
{
  /* Corrupt the kernel's wait queue head from user space */

  sem.waitlist.head = (void *)0xDEADBEEF;
  sem_post(&sem);
  return NULL;
}

int main(int argc, char *argv[])
{
  pthread_t tid;

  sem_init(&sem, 0, 0);
  pthread_create(&tid, NULL, worker, NULL);  /* Spawn the worker thread */
  sem_wait(&sem);
  pthread_join(tid, NULL);
  return 0;
}

The hhead objects are an exception: they are allocated from kernel space.
But struct semholder_s holder[2]; is defined inside sem_t, so if
CONFIG_SEM_PREALLOCHOLDERS == 0, those holders go to user space too
whenever a sem_t is instantiated.
hhead is not safe either: the user can mess up the priority inheritance
list as well, by setting sem.hhead = (void *)0x592959;

If there is something I'm missing I apologize for the spam, but I have
indeed verified that the lists are not accessible by the kernel unless the
correct process is running, which means the correct user mappings are
active and the user memory is visible to the kernel.

Br,
Ville Juven

On Wed, Apr 19, 2023 at 4:53 PM Gregory Nutt  wrote:

>
> >> However, as I said, there are some things in the sem_t structure whose
> >> usage is ubiquitous and I don't know if the scalar access macros will be
> >> enough, i.e. I don't fully understand how they work / are supposed to
> >> work.
> >> ...
> >
> > I believe that all of the lists and structures that you are concerned
> > with are already allocated from  kernel space memory, but I have not
> > verified that it all cases.  Certainly, they are things that would
> > never be accessed from user space.  The user memory sem_t "container"
> > just holds references to kernel space allocations.
> >
> > If I am correct, then you should not have any problems.  But, as I
> > said, I did not verify all of that.
> I would say that, as a rule, nothing in user space should have any
> knowledge or access to the internals of a sem_t.  Looking at
> libs/libc/semaphore, I don't believe that there is any,.
>
>
>


Re: [Breaking change] Move nxmutex to sched

2023-04-19 Thread Ville Juven
Hi, just to add that in the normal case the correct process obviously *is*
running when the semaphore is accessed (sem_wait(), sem_post()), but there
is one exception: sem_waitirq(), which is called either by the watchdog ISR
or by the signal dispatch logic. In this case the correct process is most
likely *not* running, meaning accesses to the sem fields needed by the
kernel either go to the wrong address environment or cause a page fault.

So the control fields in sem_t are tied to the process that instantiates
the semaphore, and if it messes the fields up, only the process itself is
affected (the process can be killed for being naughty but the kernel
survives). But in those two exception paths (watchdog timeout and signal
dispatch), the kernel needs access to the semaphore.

Br,
Ville Juven

On Wed, Apr 19, 2023 at 5:42 PM Ville Juven  wrote:

> Hi, thanks again for the responses and bearing with me
>
> > I believe that all of the lists and structures that you are concerned
> > with are already allocated from  kernel space memory, but I have not
>
> The lists do exist in user memory, although I agree that they should not.
>
> > verified that it all cases.  Certainly, they are things that would
> > never be accessed from user space
>
> Absolutely correct, the user does not need them nor should they be
> accessed/accessible by the user. However, they can be accessed, either
> intentionally, or by accident (e.g. via memcpy, memset, etc). Which means
> the user really can mess the lists up, causing the kernel to crash.
>
> > I would say that, as a rule, nothing in user space should have any
> > knowledge or access to the internals of a sem_t.  Looking at
> > libs/libc/semaphore, I don't believe that there is any,.
>
> You are correct, the API does not access or provide access to those kernel
> fields, but when a user instantiates sem_t, those kernel control fields
> also go to user space. They are also accessible to the user process via the
> named struct fields e.g. sem->waitlist.
>
> The following stupid user space example will compile and run just fine as
> the full composition of sem_t is visible to the user. The worker thread can
> access sem->waitlist and completely mess up the doubly linked queue, from
> user space.
>
> #include 
>
> sem_t sem;
>
> void worker(void *arg)
> {
>   sem.waitlist = (void *)0xDEADBEEF;
>   sem_post(&sem);
> }
>
> int main(int argc, char *argv[])
> {
>   sem_init(&sem, 0, 0);
>   ...
>   spawn the worker thread here!
>   ...
>   sem_wait(&sem);
>   return 0;
> }
>
> The hhead objects are an exception, they are allocated from kernel space.
> But struct semholder_s holder[2]; is defined inside sem_t, thus if
> CONFIG_SEM_PREALLOCHOLDERS == 0, whenever sem_t is instantiated those go to
> user space too.
> hhead is not safe either, the user can mess up the priority inheritance
> list as well, by setting sem.hhead = (void *)0x592959;
>
> If there is something I'm missing I apologize for the spam, but I have
> indeed verified that the lists are not accessible by the kernel unless the
> correct process is running, which means the correct user mappings are
> active and the user memory is visible to the kernel.
>
> Br,
> Ville Juven
>
> On Wed, Apr 19, 2023 at 4:53 PM Gregory Nutt  wrote:
>
>>
>> >> However, as I said, there are some things in the sem_t structure whose
>> >> usage is ubiquitous and I don't know if the scalar access macros will
>> be
>> >> enough, i.e. I don't fully understand how they work / are supposed to
>> >> work.
>> >> ...
>> >
>> > I believe that all of the lists and structures that you are concerned
>> > with are already allocated from  kernel space memory, but I have not
>> > verified that it all cases.  Certainly, they are things that would
>> > never be accessed from user space.  The user memory sem_t "container"
>> > just holds references to kernel space allocations.
>> >
>> > If I am correct, then you should not have any problems.  But, as I
>> > said, I did not verify all of that.
>> I would say that, as a rule, nothing in user space should have any
>> knowledge or access to the internals of a sem_t.  Looking at
>> libs/libc/semaphore, I don't believe that there is any,.
>>
>>
>>


Re: [Breaking change] Move nxmutex to sched

2023-04-19 Thread Gregory Nutt




I believe that all of the lists and structures that you are concerned
with are already allocated from  kernel space memory, but I have not

The lists do exist in user memory, although I agree that they should not.


Then that is a bug.  It is a serious security error if OS internal data 
structures are exposed and accessible in user space.  That must be fixed 
in my opinion.


People who only work in the FLAT address environment violate basic OS 
security like this all over the code.  And reviewers who only use FLAT 
addressing cannot spot these security errors.





Re: [Breaking change] Move nxmutex to sched

2023-04-19 Thread Gregory Nutt




Hi, just to add that in the normal case obviously the correct process *is*
running when the semaphore is accessed (sem_wait(), sem_post()), but there
is one exception: sem_waitirq() which is called either by the watchdog ISR
or by the signal dispatch logic. In this case most likely the correct
process is *not* running, meaning access to the sem fields needed by the
kernel either go to the wrong address environment, or cause a page fault.

So the control fields in sem_t are tied to the process that instantiates
the semaphore, and if it messes the fields up only the process itself is
affected (the process can be killed for being naughty but the kernel
survives). But in those two exception cases, the kernel needs access to the
semaphore.


This is the best argument that I have heard for splitting the semaphore 
data.


Another option would be to keep the kernel addressing within the sem_t.
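
A rough sketch of what that second option might look like (names are 
invented; this is not a worked-out proposal):

/* Sketch only: the user-visible sem_t keeps the count and flags plus an
 * opaque reference to a kernel-side object.  All list heads and holder
 * state live in kernel memory; only the kernel dereferences ksem.
 */

struct ksem_s                       /* Kernel-side state (invented name) */
{
  dq_queue_t waitlist;              /* TSTATE_WAIT_SEM wait queue */
#ifdef CONFIG_PRIORITY_INHERITANCE
  FAR struct semholder_s *hhead;    /* Priority inheritance holder list */
#endif
};

struct sem_s                        /* User-visible part */
{
  volatile int16_t semcount;
  uint8_t flags;
  FAR struct ksem_s *ksem;          /* Kernel addressing kept in the sem_t */
};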