I think Xiang is referring to how Linux does it. It simply creates a new "waitobj" variable on the kernel stack by declaring it inside the sem_wait()-equivalent function. The wait object is created on the kernel stack, ...
Yep, I was not thinking of allocating it as a local variable, so I had a lot of questions. The kernel stack does not seem to have any of the amenities of the user stack: TLS, allocated stack frames, stack checking (?), etc.
Allocating as a local variable is a very good idea.
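To make the idea concrete, here is a rough sketch of a wait object declared as a local variable inside a sem_wait()-style function. It is a user-space simulation built on pthreads, not Linux or NuttX kernel code, and every name in it (`struct ksem`, `struct waitobj` and its fields, `ksem_init`, `ksem_wait`, `ksem_post`) is made up for illustration. The point it shows is that the wait object needs no allocator: it lives in the waiter's stack frame, and that frame cannot go away while the waiter is blocked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wait object: one per blocked thread, stored on that
 * thread's own stack for the duration of the wait. */
struct waitobj {
    struct waitobj *next;   /* next waiter in FIFO order */
    pthread_cond_t  cond;   /* the blocked thread sleeps here */
    int             woken;  /* set by ksem_post() before signalling */
};

/* Hypothetical counting semaphore with an explicit waiter queue. */
struct ksem {
    pthread_mutex_t lock;
    int             count;
    struct waitobj *head;
    struct waitobj *tail;
};

void ksem_init(struct ksem *s, int count)
{
    assert(count >= 0);
    pthread_mutex_init(&s->lock, NULL);
    s->count = count;
    s->head = NULL;
    s->tail = NULL;
}

void ksem_wait(struct ksem *s)
{
    struct waitobj w;       /* local variable: no heap allocation */

    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;         /* fast path: nothing to wait for */
        pthread_mutex_unlock(&s->lock);
        return;
    }

    /* Slow path: link the stack-resident wait object into the queue. */
    w.next = NULL;
    w.woken = 0;
    pthread_cond_init(&w.cond, NULL);
    if (s->tail != NULL)
        s->tail->next = &w;
    else
        s->head = &w;
    s->tail = &w;

    /* The stack frame (and so w) stays valid for as long as we block. */
    while (!w.woken)
        pthread_cond_wait(&w.cond, &s->lock);

    pthread_mutex_unlock(&s->lock);
    pthread_cond_destroy(&w.cond);
}

void ksem_post(struct ksem *s)
{
    pthread_mutex_lock(&s->lock);
    if (s->head != NULL) {
        /* Hand the token straight to the oldest waiter and unlink it
         * before it can return and release its stack frame. */
        struct waitobj *w = s->head;
        s->head = w->next;
        if (s->head == NULL)
            s->tail = NULL;
        w->woken = 1;
        pthread_cond_signal(&w->cond);
    } else {
        s->count++;
    }
    pthread_mutex_unlock(&s->lock);
}
```

The one rule this design depends on is that the poster must unlink the wait object and finish touching it while still holding the lock, because the waiter's frame is gone as soon as `ksem_wait()` returns. In a real kernel the wait object would hold a scheduler reference instead of a condition variable, but the lifetime argument should be the same.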