On Tue, Dec 31, 2024 at 11:29:43PM -0800, Ron Minnich wrote:
> Timesharing core (TC) processes schedule onto AC (app core) several
> ways, one of them being execac.
> 

Is the number of TCs fixed, or is there at least one TC, with the
number able to grow if needed (or, to put it differently, can an AC,
if needed, switch to being a TC and vice versa)?

> execac is exec with benefits. The process is started via the normal
> exec path, then you can think of it as suspending on the kernel core,
> and resuming on an AC.
> The way that's done: AC is running a "nanokernel" that is spinning on
> a function pointer in a struct in something called the ICC (Inter-Core
> Communication).
> 
> So TC fills in args, then sets the function pointer, then enters a
> loop calling sched() and checking the pointer. Nothing about this
> code, and how it works with sched(), is very special.
> 
> AC sees that the pointer is non-zero, sets up args and other state,
> and exits kernel mode to that address. Your proc is now on the AC, and
> will run until it decides not to (system call, fault, whatever).
> 
> When the process on the AC needs to do something other than compute,
> it exits user mode to the nanokernel. There is nothing special about
> this user mode exit; it can happen as a system call, a page fault, or
> for some other reason. There are NO changes to libc, or any other
> source.
> 
> The AC saves state, zeros the function pointer, re-enters the
> nanokernel spinloop. The AC has no memory of what it was running;
> resuming the same process, or some other process, looks the same to
> the AC. There is a check in the nanokernel to make sure the stack is
> not growing without end.
> 
> The TC, as part of normal scheduling, schedules the kernel process
> code on the TC, which, when it runs, does the "is the function
> pointer 0" test and, if so, resumes the process on the TC. Whatever
> needs to be done gets done, and ... resume scheduling on the AC.
> Unless there is a note or some other reason to not go back to the AC.
> 
> What happens if the ACs all go away while the process is on the TC? It
> can keep running on the TC. One of our goals was that NIX processes
> always work, even if an AC is not available for some reason.
> 
> A couple of things here:
> - NIX required a very small, but very careful, set of changes. But the
> size of the code? Very small. Just tricky in places. Gorka and I spent
> a fair bit of time getting that ICC stuff right.
> - Note that the TC offload to AC requires very little change to sched().
> - The nanokernel is less than 60 lines or so, IIRC.
> - I have a standard benchmark that measures kernel noise. It's never
> perfect, save on Blue Gene. On NIX, procs on ACs had zero noise. The
> noise was perfect. That never happens ...
> 
> So, as I'm remembering this, I'm remembering my surprise at how well
> it worked, and how quickly we got it going.
> 
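
For readers following along, the handoff described above can be
sketched in user space. This is only a simulation under stated
assumptions, not the NIX code: two POSIX threads stand in for the TC
and the AC, a C11 atomic stands in for the function pointer in the
ICC, and sched_yield() stands in for the kernel's sched(). The names
Icc, IccFn, acloop, runac, square and demo are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Icc Icc;
typedef void (*IccFn)(Icc *);

/* Hypothetical stand-in for the ICC slot shared by the TC and one AC. */
struct Icc {
	_Atomic(IccFn) fn;	/* non-zero: work is pending for the AC */
	long arg;		/* filled in by the TC before fn is set */
	long result;		/* filled in by the AC before fn is zeroed */
	atomic_bool halt;	/* simulation only: lets the AC thread exit */
};

/* "Nanokernel": spin on the function pointer, run it, zero it, spin again. */
static void *acloop(void *v)
{
	Icc *icc = v;

	for (;;) {
		IccFn fn = atomic_load_explicit(&icc->fn, memory_order_acquire);
		if (fn != (IccFn)0) {
			fn(icc);	/* stands in for leaving kernel mode */
			/* returning stands in for the user mode exit */
			atomic_store_explicit(&icc->fn, (IccFn)0,
			    memory_order_release);
		} else if (atomic_load(&icc->halt)) {
			return NULL;
		}
	}
}

/* TC side: fill in args, set the pointer, then yield until it is zero. */
static long runac(Icc *icc, IccFn fn, long arg)
{
	icc->arg = arg;
	atomic_store_explicit(&icc->fn, fn, memory_order_release);
	while (atomic_load_explicit(&icc->fn, memory_order_acquire) != (IccFn)0)
		sched_yield();	/* stand-in for the kernel's sched() */
	return icc->result;
}

/* The "process" that runs on the AC in this simulation. */
static void square(Icc *icc)
{
	icc->result = icc->arg * icc->arg;
}

/* Start one simulated AC, hand it one job, then shut it down. */
static long demo(long arg)
{
	Icc icc;
	pthread_t ac;
	long r;
	int rc;

	atomic_init(&icc.fn, (IccFn)0);
	atomic_init(&icc.halt, false);
	icc.arg = icc.result = 0;
	rc = pthread_create(&ac, NULL, acloop, &icc);
	assert(rc == 0);
	(void)rc;
	r = runac(&icc, square, arg);
	atomic_store(&icc.halt, true);
	pthread_join(ac, NULL);
	return r;
}
```

Calling demo(7) hands square to the simulated AC and returns 49 once
the AC has zeroed the pointer. The release/acquire pairing on fn is
what makes arg visible to the AC and result visible to the TC; how the
real ICC orders these accesses is not something this sketch claims to
reproduce.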

Did you have the opportunity to measure how "bad" some application
workloads could get when the number of TCs, after the initialization
period, largely exceeded the number of ACs with a non-zero pointer?

Because I imagine that, however smart the OS is, if the applications
are badly programmed or if the data is not optimally organized, no one
can hope to solve in the OS a problem that lies on the application
side...

> But: you can see that all this kind of needs some sort of shared
> memory. I think for non-shared-memory environments, it would have to
> be done differently.
> 

Theoretically, would a machine with different kinds of cores be
possible, perhaps with differing architectures (specialized cores) but
sharing, through a common MMU, at least read/write (data) pages (for
the kernel's shared data: locks and so on), with a system such as NIX
scheduling each task onto the matching kind of AC?

There are cross-compilers, because they deal with byte streams and
need not execute what they produce. There could be a "cross-OS" that
manages tasks without having to execute them directly (as long as the
structures for kernel management can be shared).
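
To make the question concrete, here is a minimal sketch of the
dispatch decision I have in mind, assuming a shared table of ACs
tagged by kind. Everything here is hypothetical (the Arch tags, the
Core struct, pickac); nothing is taken from NIX. The -1 return is
meant to mirror the stated goal that a process can keep running on the
TC when no suitable AC is available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kinds of application cores. */
enum Arch { ArchGeneral, ArchVector };

typedef struct Core Core;
struct Core {
	enum Arch arch;	/* what kind of code this AC can execute */
	int busy;	/* non-zero while a process is running on it */
};

/*
 * Return the index of the first idle AC of the wanted kind,
 * or -1 if there is none, meaning "stay on the TC".
 */
static int pickac(const Core *acs, size_t n, enum Arch want)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!acs[i].busy && acs[i].arch == want)
			return (int)i;
	return -1;
}
```

The scheduler itself only reads the table; it never has to execute
code built for the target kind of core, which is the "cross-OS" idea.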

> Also, the namespaces had no effect, good or bad, on all this working.
> They did not come into the picture, and whether a proc was on AC or
> TC, namespaces were not impacted.
> 

I wish the best to the readers for 2025 and thank Ron Minnich, Paul
Lalonde and all the participants in this thread, who saved the list's
2024 signal/noise ratio :-)

-- 
        Thierry Laronde <tlaronde +AT+ kergis +dot+ com>
                     http://www.kergis.com/
                    http://kertex.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/T7692a612f26c8ec5-Mf5c9ef6c95a0e34cab5b7860