Could this be a function of the R script you're trying to run, or are
you saying you get this error when running the same script that works
at other times?
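
One way to narrow it down might be to record, at the start of each job,
whether the dynamic loader can resolve the R binary's libraries on that
node at that moment. Below is a minimal sketch, not a tested fix: it
assumes `ldd` is available on the compute nodes, and the default path is
simply the one from the error message. Note that R's front-end script
normally sets up LD_LIBRARY_PATH itself, so libR.so may show up as
unresolved here; the entry to watch is libpcre2-8.so.0.

```python
import socket
import subprocess
import sys


def missing_libraries(ldd_output):
    """Return the names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "libfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on path and return the list of unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    if not result.stdout.strip():
        # No usable output, e.g. the path does not exist or is not an ELF file.
        raise RuntimeError(f"ldd gave no output: {result.stderr.strip()}")
    return missing_libraries(result.stdout)


if __name__ == "__main__":
    # Default path is taken from the error message; override with argv[1].
    default = "/bi/apps/R/4.0.4/lib64/R/bin/exec/R"
    binary = sys.argv[1] if len(sys.argv) > 1 else default
    try:
        missing = check_binary(binary)
        status = "MISSING " + ", ".join(missing) if missing else "all resolved"
    except (RuntimeError, FileNotFoundError) as exc:
        status = f"check failed ({exc})"
    # Include the hostname so failures can be matched to nodes in the job logs.
    print(f"{socket.gethostname()}: {binary}: {status}")
```

Running this as the first step of the batch script, before R starts,
would at least show whether the library is unresolvable on the node at
the time the job fails, or only for the R process itself.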
On 3/29/21 7:47 AM, Simon Andrews wrote:
I've got a weird problem on our slurm cluster. If I submit lots of R
jobs to the queue then as soon as I've got more than about 7 of them
running at the same time I start to get failures, saying:
    /bi/apps/R/4.0.4/lib64/R/bin/exec/R: error while loading shared libraries: libpcre2-8.so.0: cannot open shared object file: No such file or directory
...which makes no sense because that library is definitely there, and
other jobs on the same nodes worked both before and after the failed
jobs. I recently ran 500 identical jobs and 152 of them failed in this way.
There are no errors in the log files on the compute nodes where this
failed and it happens across multiple nodes so it's not a single one
being strange. The R binary is on an isilon network share, but the
libpcre2 library is on the local disk for the node.
Anyone come across anything like this before? Any suggestions for fixes?
Thanks
Simon.