I have a script that runs as a cron job every minute (on Ubuntu 10.10 with R 
2.11.1), querying a database for new data. Most of the time it takes a few 
seconds to run, but once in a while it takes more than a minute, and the next 
run starts (on the same data) before the previous one has finished. In extreme 
cases this fills up memory with a large number of instances of the same script 
running on the same data. My 'solution' has been to have the script record its 
process id in a pid file, after first checking whether a pid file already 
exists and whether the process it names is still running. I use the following code:

## Guess the pid of this R process: pgrep -x matches the process name
## exactly, and the newest R process should have the highest pid.
## (as.numeric matters here: intern = TRUE returns a character vector,
## and max() on characters compares lexicographically, so "9999" > "10000".)
pid <- max(as.numeric(system("pgrep -x R", intern = TRUE)))

if (file.exists("/var/run/myscript.pid")) {
    oldpid <- read.table("/var/run/myscript.pid")[[1]]
    ## 'ps -p <pid>' prints a header line plus one line per live process,
    ## so two lines of output mean the recorded process is still alive.
    if (length(system(paste("ps -p", oldpid), intern = TRUE)) == 2) {
        stop("Myscript is already running in another process.")
    } else {
        ## Stale pid file left behind by a crashed run: claim it.
        write(pid, "/var/run/myscript.pid")
    }
} else {
    write(pid, "/var/run/myscript.pid")
}

....my script .....

file.remove("/var/run/myscript.pid")
#The End
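
A related weakness of this pattern, sketched below (tryCatch() and its 
finally handler are base R, but this is not part of my script): if the 
script stops with an error, the file.remove() above is never reached and a 
stale pid file is left behind, which is what the ps -p check then has to 
clean up after. A finally clause removes the file on any exit, error or not:

pidfile <- "/var/run/myscript.pid"
tryCatch({
    write(pid, pidfile)  # pid as obtained above

    ## .... my script ....

}, finally = file.remove(pidfile))  # runs on error as well as on success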

The trouble here is that I also have other R scripts running on the same 
system, so while max(as.numeric(system("pgrep -x R", intern = TRUE))) will 
almost always give me the right pid, it is not guaranteed to. There are two 
situations where it can fail: when the process id counter wraps around (at 
32768 by default on Linux) and starts over, the newest process no longer has 
the highest pid; and if another R process starts at almost the same moment, 
the two scripts can record each other's pids.

Is there a way to query for the process id of the specific R script, rather 
than all R processes?
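
Something along these lines is what I am after, as an untested sketch 
(Sys.getpid() is base R; pgrep's -f flag matches against the full command 
line rather than just the process name; and "myscript.R" is a placeholder 
for however cron actually invokes the job, e.g. "Rscript myscript.R"):

## The running session knows its own pid, so recording it needs no pgrep.
pid <- Sys.getpid()

## To see whether another instance of this particular script is alive,
## match the full command line instead of the bare process name "R".
## The brackets in the pattern keep pgrep from matching the shell that
## system() spawns (whose command line also contains the pattern text).
others <- as.numeric(system("pgrep -f '[m]yscript.R'", intern = TRUE))
others <- setdiff(others, pid)
if (length(others) > 0)
    stop("Myscript is already running in another process.")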

Mikkel
