On 08/23/2012 11:46 AM, Aldi Kraja wrote:
Thanks to Martin, who sent an email off the list with, among other
things, the following:
"Probably the file is being corrupted on disk, perhaps it has not yet
been closed before reading is attempted, or some other obscure file
system issue. Probably the key part in your script is 'sleep', which
probably slows disk access enough for your file system to recover
integrity."
His note made me think that something may be going on with the programs
running in parallel on the same processing server.
There are up to 8 slots for running 8 jobs in parallel on a Linux
server. Many servers are available.
Each job works with unique file names for the R program and the
corresponding out file, all the objects inside each R job are defined
uniquely with their own indices, and I finish the program with q(); n so
as not to save the R workspace at the end of each process.
Let me draw a parallel with SAS jobs. If I run 8 parallel jobs in SAS,
then although SAS uses the /tmp directory of that processing server,
each job has its own pid, so each run is unique: it saves its own temp
data and removes it at the end. So 8 parallel jobs on one server, and
more on different servers, do not corrupt each other's data.
Now what happens with R? When eight jobs run in parallel, is each
processed in its own space on the /tmp hard drive, or do they all write
to ~/.RData?
[Reply:] Yes, they'll all write to ~/.RData (actually, .RData in the
current directory, see ?Startup).
If the latter happens, then although the jobs are uniquely defined, it
is quite possible that something goes wrong in ~/.RData and produces the
reported error:
Error: ReadItem: unknown type 98, perhaps written by later version of R
Execution halted
Probably --no-restore --no-save may help, but isn't that dangerous if
all programs (if I have 1000 of them) all write to ~/.RData?
[Reply:] Yes, using --no-restore --no-save is the right thing to do.
So how does R handle parallel jobs of the same user with regard to the
R invocation and the space used for temporary calculations? Do these
parallel batch R jobs see each other in the same space, or are they for
sure in independent temporary subdirs?
[Reply:] Each independent R process gets its own temporary directory;
see the output of tempdir().
mtmorgan@precise-mtmorgan:$ R --silent --vanilla -e "tempdir()"
> tempdir()
[1] "/tmp/RtmpuZ7IkT"
mtmorgan@precise-mtmorgan:$ R --silent --vanilla -e "tempdir()"
> tempdir()
[1] "/tmp/RtmpXnKIVO"
Hmm, but in the 'parallel' package the child processes inherit from the
parent.
> unique(unlist(mclapply(1:4, function(i) tempdir(), mc.cores=4)))
[1] "/tmp/Rtmpkr5w6j"
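A separate tempdir() does not by itself separate the working directories,
and the working directory is where .RData is read and written. One way to
rule out any sharing is to start each job in its own scratch directory.
The following is only a minimal sketch under assumptions: make_job_dir
and the "rjob" prefix are hypothetical names, /path/to/job.R is a
placeholder, and the R call is shown as a comment rather than executed.

```shell
#!/bin/sh
# Sketch: create one private working directory per job, so that a .RData
# written to "the current directory" can never be shared between jobs.
# make_job_dir and the "rjob" prefix are hypothetical, not from the thread.
make_job_dir() {
    mktemp -d "${TMPDIR:-/tmp}/rjob.XXXXXX"
}

dir1=$(make_job_dir)
dir2=$(make_job_dir)
echo "job 1 would run in: $dir1"
echo "job 2 would run in: $dir2"

# Inside each directory one would then run, for example:
#   (cd "$dir1" && R CMD BATCH --no-save --no-restore /path/to/job.R job.out)

# Clean up the scratch directories once the jobs are done.
rm -rf "$dir1" "$dir2"
```

Because mktemp -d creates a fresh directory on every call, two jobs
started this way cannot end up with the same working directory.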
Thanks,
Aldi
On 8/22/2012 3:47 PM, Aldi Kraja wrote:
Hi,
Here is a solution for this type of error:
Error: ReadItem: unknown type 98, perhaps written by later version of R
Execution halted
I created a script file in the directory where the programs and data
reside and ran it there:
./script.sh
where script.sh had the following lines:
R CMD BATCH ./dc19at1.R ./dc19at1.out
sleep 3
R CMD BATCH ./dc19at2.R ./dc19at2.out
sleep 3
...
etc
The programs ran with no problem.
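R CMD BATCH runs R with --restore --save by default, so each job in a
script like this reads and writes .RData in its working directory. A
variant of the script can pass --no-save --no-restore to switch that off.
The following is a minimal dry-run sketch, assuming the dc19at<N>.R
naming used above; build_cmd is a hypothetical helper, and the commands
are only printed, not executed.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) one R CMD BATCH command per job,
# with workspace saving and restoring switched off.
# build_cmd is a hypothetical helper; job names follow the dc19at<N> pattern.
build_cmd() {
    job="$1"
    printf 'R CMD BATCH --no-save --no-restore ./%s.R ./%s.out\n' "$job" "$job"
}

for j in 1 2; do
    build_cmd "dc19at$j"
done
```

To actually run the jobs, the printed lines could be piped to sh, or the
printf could be replaced by the command itself.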
So what I did was eliminate the full path, as in, say,
R CMD BATCH /a/b/c/dc19at1.R /a/b/c/dc19at1.out
which did not work through bsub or at the command line on a remote
server.
I am not sure what the "type 98" error means in R.
Does anybody know where the R error types are described?
TIA,
Aldi
On 8/21/2012 10:09 AM, Aldi Kraja wrote:
Hi,
I am running a large number of jobs (thousands) in parallel (Linux OS,
64-bit), R version 2.14.1 (2011-12-22), Platform:
x86_64-redhat-linux-gnu (64-bit). Up to yesterday everything ran fine
with jobs submitted in several blocks (block1, block2, etc.). They
are sent to an LSF platform, which handles the parallel submission. Today
I see that only one of the blocks (block 19) has not finished correctly.
It reports in the out file:
Error: ReadItem: unknown type 98, perhaps written by later version of R
Execution halted
Searching on Google, I found a recommendation to run rm ~/.RData.
I applied it, but the run fails again when submitting through SAS
for block 19.
[SAS in macro lang.] %sysexec bsub R CMD BATCH &fullpath./dc19at&j..R
&fullpath.dc19at&j..out ;
[SAS ] %sysexec sleep 3 ;
<looping through jobs in a block>
If I go to the directory where the R program and the data reside and
apply the same command by hand
R CMD BATCH dc19at1.R dc19at1.out
it works with no problem.
But if I use a similar SAS program, one that has been executing the same
command successfully for thousands of jobs in other blocks, the jobs for
block 19 fail:
Error: ReadItem: unknown type 98, perhaps written by later version of R
Execution halted
This happens even for the job I just mentioned, which runs fine when I
execute it by hand.
Do you know what could cause the bsub submission to fail? Any remedy?
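Since R restores .RData from the directory it is started in, one thing
worth checking is whether a saved workspace exists in the directory the
submitted job starts from, which may differ from the directory used when
running by hand. The following is a minimal sketch; has_workspace is a
hypothetical helper, and checking $PWD and $HOME is only an assumption
about where a batch job might start.

```shell
#!/bin/sh
# Sketch: report whether a saved R workspace (.RData) exists in a directory.
# has_workspace is a hypothetical helper name, not part of R or LSF.
has_workspace() {
    if [ -e "$1/.RData" ]; then
        echo "found: $1/.RData"
        return 0
    else
        echo "none in: $1"
        return 1
    fi
}

# Check the two places a batch job is most likely to start from (assumed).
has_workspace "$PWD" || true
has_workspace "$HOME" || true
```

Running the same check inside the submitted job shows which directory,
and therefore which .RData, that job actually sees.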
Thank you in advance,
Aldi
--
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793