Re: [9fans] NIX experience
> What would you like to know? I also have an initial broken port to 9front
> if you'd like to try to bring it to life.

Thank you for the response, it was quite an interesting read. Unfortunately, I'm not a great coder, so I can't take you up on that offer.

You mentioned that GPUs took over, but not all problems can run on GPUs. There may still be a general interest in this (I know I am interested).

How was it presented to the users? Could they query to see the current utilization of the system? How did you know that a job completed (or failed)?

You mentioned it was a shared memory system, meaning it was in essence a "very large SMP machine" from the view of the OS? Could the NIX system only work with shared memory systems like that, or was it possible to take many smaller independent systems and combine their resources?

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/Tacd1c3c541277b7f-M3d221a244a4e706d0a7f15d7
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
[9fans] NIX experience
Hello,

I was wondering if anyone has any experience using the NIX HPC environment? Traditionally, there's a scheduler that keeps track of the resources in the system: which nodes are busy and with which jobs, how much RAM is in use, and so on.

I'm finding very sparse information on the NIX project, so I turn here to ask if anyone has actually used it and can share some details. The site with the most information on it seems to be https://lsub.org/nix/ but the research papers that I have found there are not too detailed (perhaps I've only found previews?).

Any extra information would be appreciated.

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T7692a612f26c8ec5-Me7617ff3f90585d71e6e7779
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
Re: [9fans] NIX experience
Thank you for the response, it was quite an interesting read. Unfortunately, I'm not a great coder, so I can't take you up on that offer.

You mentioned that GPUs took over, but not all problems can run on GPUs. There may still be a general interest in this (I know I am interested).

How was it presented to the users? Could they query to see the current utilization of the system? How did you know that a job completed (or failed)?

You mentioned it was a shared memory system, meaning it was in essence a "very large SMP machine" from the view of the OS? Could the NIX system only work with shared memory systems like that, or was it possible to take many smaller independent systems and combine their resources?

Anything you can say about the actual usage would be quite interesting. What kind of applications are we talking about? Was there commercial interest?

--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T7692a612f26c8ec5-Mad958dee6fd001441970abf8
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
Re: [9fans] NIX experience
On Monday, December 30th, 2024 at 5:25 PM, Ron Minnich wrote:

> Thanks for the good questions.

This has been a very interesting thread and I'm very glad you've given us your rare insights here, thank you! There are still some points I'd like to have fleshed out, but I'm not sure how to put them succinctly.

> BTW, there are 512- and 1024-core risc-v systems in the works, and NIX
> looks pretty good for that kind of CPU.
>
> > How did you know that a job completed (or failed)?
>
> Just as with a process; you read /proc/pid/wait. It was very transparent.
>
> > You mentioned it was a shared memory system, meaning it was in essence a
> > "very large SMP machine" from the view of the OS?
>
> Yes, with a slight change in view: the AC looked like a CPU, and there
> was shared memory, and it was coherent, but the AC scheduling was a
> different bit of code than normal process scheduling.
>
> > Could the NIX system only work with shared memory systems like that, or was
> > it possible to take many smaller independent systems and combine their
> > resources?
>
> My original idea in spring 2011, after talking to Shalf at LBL, was
> that we might have CPUs with hardware FIFOs communicating. When I got
> to lsub, Charles made the point: "you have a shared memory machine,
> might as well use it" -- and that made a lot of sense. So we used
> shared memory, and avoided a lot of headaches that Charles, jmk, Eric,
> and I had dealt with on Blue Gene.

When I first heard about NIX, I didn't know the architecture, but I thought it was many systems working as one. That made a certain amount of sense to me, given the namespaces of Plan 9: you could potentially map the cores of as many independent systems as you needed into your one task.

As you said above, NIX looks quite good for hundreds of cores or more on a single processor. Do you think the current code could be used in any way to make use of more than a single system?
--
9fans: 9fans
Permalink: https://9fans.topicbox.com/groups/9fans/T7692a612f26c8ec5-Mdb6ffb1cae13ab4498d49f9f
Delivery options: https://9fans.topicbox.com/groups/9fans/subscription
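
For readers wondering what "you read /proc/pid/wait" looks like in practice, here is a minimal sketch. It is not taken from NIX. It assumes the record read from the wait file has the layout that wait(2) describes for await: the process id, three times in milliseconds (user, system, real), and the exit message, separated by blanks, with the message in rc-style single quotes. The names WaitRecord, parse_wait_record and job_succeeded are hypothetical, made up for this illustration.

```python
from typing import NamedTuple


class WaitRecord(NamedTuple):
    """One exit record (hypothetical type, for illustration only)."""
    pid: int
    user_ms: int
    sys_ms: int
    real_ms: int
    msg: str  # an empty exit message conventionally means success


def parse_wait_record(record: str) -> WaitRecord:
    """Parse 'pid utime stime rtime msg' (layout assumed from wait(2))."""
    # Split off the four numeric fields; the rest is the exit message.
    fields = record.strip().split(None, 4)
    if len(fields) < 4:
        raise ValueError("short wait record: %r" % record)
    pid, user_ms, sys_ms, real_ms = (int(f) for f in fields[:4])
    msg = fields[4] if len(fields) == 5 else ""
    # Undo rc-style quoting: strip the outer quotes, '' stands for '.
    if len(msg) >= 2 and msg[0] == "'" and msg[-1] == "'":
        msg = msg[1:-1].replace("''", "'")
    return WaitRecord(pid, user_ms, sys_ms, real_ms, msg)


def job_succeeded(rec: WaitRecord) -> bool:
    """Treat an empty exit message as success (Plan 9 convention)."""
    return rec.msg == ""
```

On a Plan 9 or NIX machine the input string would come from a blocking read of the wait file of the parent process; here the function takes the string directly so it can be tried anywhere.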