On Thu, Dec 6, 2018 at 10:01 PM Eli V <eliven...@gmail.com> wrote:
> On Thu, Dec 6, 2018 at 2:08 AM Loris Bennett <loris.benn...@fu-berlin.de> wrote:
> > > Anyone have some thoughts/ideas about this? Seems like it should be
> > > relatively straightforward to implement, though of course using it
> > > effectively will require some tuning.
> >
> > It is not clear to me that this is a good idea. I think it is important
> > to inform users about the memory usage of their jobs, so that they can
> > estimate their requirements as accurately as possible. If, as a user, I
> > find my job runs successfully even if I underestimate the memory needed,
> > there is no real incentive for me to be more accurate in future. In
> > fact, I may be rewarded for requesting too little RAM, since jobs
> > requesting fewer resources may tend to start earlier.
This has been an interesting thread, as it's a problem I've been wondering
about for some time (but did nothing about). Indeed, I agree with Loris.

At the end of the day, I don't think this is something for slurm to handle.
In any workplace, there are resources outside of a computer that have to be
managed, and (IMHO) that only works when each member of the organization is
courteous to their colleagues. An IT-based solution can help, but I worry
it'll end up over-engineering the problem. For example, one might be able
to book a meeting room for 2 hours. But whether someone books a room for
2 hours and doesn't show up, or only uses it for one hour and doesn't
adjust the booking to release the second hour, is an issue that a booking
system can't handle. (Unless you put sensors in the meeting room that
record the presence of people, which then update the system...) Seems to me
it might be better (but harder) to just ask users to be courteous to fellow
users of a shared resource that is highly sought after...

> I'm sure it's largely workload dependent. I know for the types of
> bioinformatics jobs we run it's very difficult to accurately estimate
> the memory usage ahead of time. So we'll submit a job array of say 96
> jobs with the memory set such that even the highest usage job will
> finish without being killed by slurm for exceeding its memory limit.
> This then ends up being more memory than is needed for most of the
> jobs. So we could noticeably increase our throughput by allowing a
> small amount of memory overcommit.

But I know the dilemma with bioinformatics, as there may be many users who
don't quite understand the concept of memory usage. And who don't really
want to hear an explanation, let alone be courteous to colleagues...
(Nooooo, I'm not speaking from personal experience. :-) )

However, a more general question... I thought there is no fool-proof way
to watch the amount of memory a job is using.
What if, within the script, they ran another program using "nohup", for
example? Wouldn't slurm be unable to include the memory usage of that
program?

Ray
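P.S. One way to check this empirically: my understanding (happy to be
corrected) is that with cgroup-based process tracking, a nohup'd child
stays inside the job's cgroup, since nohup only makes the process ignore
SIGHUP, so its memory should still be visible to the job's accounting. A
quick sketch one could run inside a job step to test that assumption (the
cgroup file layout varies between cgroup v1 and v2, so treat the paths as
illustrative):

```shell
#!/bin/bash
# Sketch: inside a slurm job step, start a background process with nohup
# and compare its cgroup membership with that of the step itself.
# Assumes Linux /proc and (on the cluster) proctrack/cgroup tracking.

nohup sleep 300 >/dev/null 2>&1 &
child=$!

# Every process lists its cgroup membership in /proc/<pid>/cgroup.
# If the nohup'd child shows the same slurm job cgroup as this shell,
# its memory usage is still being charged to the job.
echo "--- this shell ---"
cat "/proc/$$/cgroup"
echo "--- nohup'd child (pid $child) ---"
cat "/proc/$child/cgroup"

kill "$child" 2>/dev/null
```

If both listings show the same `slurm`/job cgroup path, the nohup'd
program's memory is included; if the child somehow escaped to a different
cgroup, it wouldn't be.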