Dear Takanori,
Thank you very much for the tips.
I checked the node running the job and there seems to be no memory issue,
but out of 2 MPI x 30 threads apparently only two threads are running at
100% CPU usage and the rest at about 5% each.
Our file system is based on ZFS and we use the --sbs option. Previously, when
we tried to use many MPI processes in polishing, disk I/O seemed to become slow.
Is that expected?
Is it useful for polishing to have the number of frames divisible by the
number of threads, as in motioncor?
Or could the problem be related to the --only_do_unfinished issue discussed
in: https://github.com/3dem/relion/issues/985 ?
Thanks for the info!
Best
Leonid
On 14.11.24 13:58, Takanori Nakane wrote:
Hi,
> We normally don't use many Mpi to limit disk load - perhaps at
> "combining frames" step it is OK to use many Mpi and less threads?
In general, having more MPI processes is more efficient,
provided that your file system can handle many parallel reads.
If you are using CephFS, GPFS, etc, this is the case.
If you are using a simple disk, or a RAID with not many disks,
that is surely limiting.
The combining step is more I/O bound than the earlier
trajectory estimation step. If you cannot increase the number of
MPI processes in the earlier step, you cannot do so in this step either.
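One way to see whether a file system copes with many parallel reads is to time the same set of reads with different numbers of concurrent readers. The following is a minimal sketch in Python, not part of RELION; the function names and the 4 MiB chunk size are assumptions chosen for illustration. Files that were read recently may be served from cache (for example the ZFS ARC), so each run should use a different subset of movies.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024 * 1024  # read in 4 MiB chunks (arbitrary choice)

def read_file(path):
    """Read one file sequentially and return the number of bytes read."""
    total = 0
    with open(path, "rb") as fh:
        while True:
            block = fh.read(CHUNK)
            if not block:
                break
            total += len(block)
    return total

def throughput(paths, n_readers):
    """Read all paths with n_readers concurrent readers; return MiB/s."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_readers) as pool:
        total = sum(pool.map(read_file, paths))
    elapsed = time.perf_counter() - start
    return total / (1024 * 1024) / max(elapsed, 1e-9)

# Hypothetical usage: compare 1, 2, 4 and 8 readers on different movie subsets.
# for n, subset in zip((1, 2, 4, 8), subsets):
#     print(f"{n} readers: {throughput(subset, n):.1f} MiB/s")
```

If the aggregate throughput stops growing (or drops) as the number of readers increases, the storage is probably the limit and adding MPI processes is unlikely to help.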
> Could it be memory leak problem similar to "subtracting particles"
First, that is not a memory leak in RELION. It is caused by a
bad implementation or bad parameters of malloc.
You can check this by looking at the memory consumption.
Unlike subtraction, Polish supports continuation of a job.
If in doubt, you can kill and continue the job.
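Memory consumption can be followed over time with top or ps; the sketch below does the same from Python on Linux by reading the VmRSS line of /proc/<pid>/status. It is only an illustration: the function names, the sampling interval and the number of samples are assumptions, and the PID has to be that of one of the running polishing processes.

```python
import re
import time

def rss_mib(status_text):
    """Extract VmRSS (resident memory) in MiB from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        return None  # no VmRSS line, e.g. for kernel threads
    return int(match.group(1)) / 1024.0

def watch(pid, interval_s=60, samples=10):
    """Print the resident memory of process `pid` every interval_s seconds."""
    for _ in range(samples):
        with open(f"/proc/{pid}/status") as fh:
            value = rss_mib(fh.read())
        stamp = time.strftime("%H:%M:%S")
        if value is None:
            print(stamp, f"pid {pid}: no VmRSS reported")
        else:
            print(stamp, f"pid {pid}: {value:.0f} MiB")
        time.sleep(interval_s)
```

A resident size that keeps growing while the job slows down would point to the memory problem; a flat curve would suggest looking at disk I/O instead.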
Best regards,
Takanori Nakane
On 11/14/24 21:49, Leonid Sazanov wrote:
Hi, we are having some trouble finishing the polishing step. There
are about 11k movies from a K3 in super-resolution mode, pixel 0.53,
motion-corrected in RELION 5 and binned to pixel 1.06, with about 80
particles per image and a resolution of about 2.3 A. We use 2 MPI
processes with 30 threads each and 16 GB memory per thread. The process
seems to get slower and slower as it goes through the last "combining
frames" step: initially it went through about 7k movies in one day, and
now on the third day it is at 8k movies and predicts at least another
5 days. We normally don't use many MPI processes, to limit disk load;
perhaps at the "combining frames" step it is OK to use many MPI
processes and fewer threads? Could it be a memory leak problem similar
to the one in "subtracting particles"? We do have this problem on our
system and could try to fix this with a previous solution for
subtraction. Any other options?
Many thanks for any info!
Best
Leonid
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
mailing list hosted by www.jiscmail.ac.uk, terms & conditions are
available at https://www.jiscmail.ac.uk/policyandsecurity/
--
Prof. Leonid Sazanov FRS
IST Austria
Am Campus 1
A-3400 Klosterneuburg
Austria
Phone: +43 2243 9000 3026
E-mail: saza...@ist.ac.at