Hi Takanori,
We have now changed to 8 MPI x 10 threads and things became a lot
faster. So far there seem to be no particular problems with disk I/O.
Thanks a lot for your help!
Best
Leonid
On 15.11.24 13:20, Takanori Nakane wrote:
Hi,
> I checked the node running the job and there seems to be no memory
> issue, but out of 2 MPI x 30 threads apparently only two threads are
> running at 100% CPU usage and the rest at about 5% each.
This is a typical inefficiency when you have too many threads per MPI
process.
Each thread handles one particle in Polish.
If you have 35 particles on a movie, all threads handle
the first 30 particles in the first round but in the next round,
only 5 threads have particles to work on. 25 threads become idle.
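
The arithmetic above can be sketched in a few lines of Python. This is
only a simplified model, not RELION's actual scheduler: it assumes one
particle per thread per round, equal time per particle, and that a new
round starts only when the previous one has finished. The function name
thread_utilization is hypothetical.

```python
import math

def thread_utilization(particles_per_movie: int, threads: int) -> tuple[int, float]:
    """Return (rounds, busy fraction) for one movie under the simplified model.

    Assumption: each thread handles one particle per round, and every
    particle takes the same time, so rounds proceed in lockstep.
    """
    rounds = math.ceil(particles_per_movie / threads)
    # Thread-slots actually used, divided by thread-slots available.
    busy_fraction = particles_per_movie / (rounds * threads)
    return rounds, busy_fraction

# The 35-particle example from the text, plus the ~80 particles per
# image reported earlier in the thread with 30 and 10 threads.
for particles, threads in [(35, 30), (80, 30), (80, 10)]:
    rounds, busy = thread_utilization(particles, threads)
    print(f"{particles} particles, {threads} threads: "
          f"{rounds} rounds, {busy:.0%} busy")
```

Under these assumptions, the busy fraction is highest when the number
of particles per movie is a multiple of the thread count, and smaller
thread counts waste fewer slots in the last round.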
> Our file system is based on ZFS and we use --sbs option. Previously
> when we tried to use many Mpis in polishing, disk I/O seemed to become
> slow. Is that expected?
The file system itself is not very important, the underlying hardware
is.
> Is it useful for polishing to have number of frames divisible by
> number of threads like in motioncor?
No, because parallelization is over particles, not frames.
> Or could the problem be related to --only_do_unfinished issue
> discussed in: https://github.com/3dem/relion/issues/985 ?
I believe Polish does not have this problem.
Best regards,
Takanori Nakane
On 14.11.24 13:58, Takanori Nakane wrote:
Hi,
> We normally don't use many Mpi to limit disk load - perhaps at
> "combining frames" step it is OK to use many Mpi and less threads?
In general, having more MPI processes is more efficient,
provided that your file system can handle many parallel reads.
If you are using CephFS, GPFS, etc, this is the case.
If you are using a simple disk, or a RAID with not many disks,
that is surely limiting.
The combining step is more I/O bound than the earlier
trajectory estimation step. If you cannot increase the number of
MPI processes in the earlier step, you cannot do so in this step either.
> Could it be memory leak problem similar to "subtracting particles"
First, that is not a memory leak in RELION. It is caused by some
bad implementation or parameters of malloc.
You can check this by looking at the memory consumption.
Unlike subtraction, Polish supports continuation of a job.
If in doubt, you can kill and continue the job.
Best regards,
Takanori Nakane
On 11/14/24 21:49, Leonid Sazanov wrote:
Hi, we are having some trouble finishing the polishing step. There
are about 11k movies from a K3 in super-resolution mode (pixel size
0.53 Å), motion-corrected in RELION 5 and binned to a pixel size of
1.06 Å, with about 80 particles per image and a resolution of about
2.3 Å. We use 2 MPI processes with 30 threads each and 16 GB of
memory per thread. The process seems to get slower and slower as it
goes through the last "combining frames" step: initially it went
through about 7k movies in one day, and now on the third day it is at
8k movies and predicts at least another 5 days.
We normally don't use many Mpi to limit disk load - perhaps at
"combining frames" step it is OK to use many Mpi and less threads?
Could it be memory leak problem similar to "subtracting particles" -
we do have this problem on our system and could try to fix this with
a previous solution for subtraction. Any other options?
Many thanks for any info!
Best
Leonid
########################################################################
To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a
mailing list hosted by www.jiscmail.ac.uk, terms & conditions are
available at https://www.jiscmail.ac.uk/policyandsecurity/