Re: [ccp4bb] how to speed up polishing

Takanori Nakane Fri, 15 Nov 2024 04:22:38 -0800

Hi,

> I checked the node running the job and there seem to be no memory issue, but out of 2 MPI x 30 threads apparently only two threads are running at 100% CPU usage and the rest at about 5% each.


This is a typical inefficiency when you have too many threads per MPI
process.

Each thread handles one particle in Polish.

If you have 35 particles in a movie, all 30 threads handle
the first 30 particles in the first round, but in the next round
only 5 threads have particles to work on. The other 25 threads become idle.
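
The arithmetic above can be sketched as follows. This is a simplified model, not RELION code: it assumes every particle takes the same time and that threads proceed in lockstep rounds, and the function names are hypothetical.

```python
import math

def thread_rounds(n_particles, n_threads):
    """Return (rounds, utilization) for a one-particle-per-thread model.

    Assumption: all particles cost the same, so the threads of one MPI
    process finish each round together before starting the next.
    """
    if n_particles <= 0 or n_threads <= 0:
        raise ValueError("n_particles and n_threads must be positive")
    rounds = math.ceil(n_particles / n_threads)
    # Fraction of the available thread slots that actually do work.
    utilization = n_particles / (rounds * n_threads)
    return rounds, utilization

def idle_threads_last_round(n_particles, n_threads):
    """Number of threads with nothing to do in the final round."""
    remainder = n_particles % n_threads
    return 0 if remainder == 0 else n_threads - remainder

rounds, util = thread_rounds(35, 30)
print(rounds, round(util, 3), idle_threads_last_round(35, 30))
# prints: 2 0.583 25
```

With 35 particles and 30 threads, the model gives two rounds with 25 idle threads in the second, so only about 58% of the thread slots are used.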

> Our file system is based on ZFS and we use --sbs option. Previously when we tried to use many MPIs in polishing, disk I/O seemed to become slow. Is that expected?


The file system itself is not very important; the underlying hardware
is.

> Is it useful for polishing to have number of frames divisible by number of threads like in motioncor?


No, because parallelization is over particles, not frames.

> Or could the problem be related to --only_do_unfinished issue discussed in: https://github.com/3dem/relion/issues/985 ?


I believe Polish does not have this problem.

Best regards,

Takanori Nakane

On 14.11.24 13:58, Takanori Nakane wrote:

Hi,
> We normally don't use many Mpi to limit disk load - perhaps at "combining frames" step it is OK to use many Mpi and less threads?
In general, having more MPI processes is more efficient,
provided that your file system can handle many parallel reads.
If you are using CephFS, GPFS, etc, this is the case.
If you are using a simple disk, or a RAID with not many disks,
that is surely limiting.
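
As a rough illustration of why fewer threads per MPI process can keep cores busier, the sketch below compares ways of splitting a fixed number of cores. It uses the 60 cores and about 80 particles per movie from the original question. `layout_utilization` is a hypothetical helper, not part of RELION. The model assumes equal cost per particle and lockstep rounds, and it ignores disk I/O, which is the real limit discussed above.

```python
import math

def layout_utilization(total_cores, particles_per_movie):
    """List (n_mpi, n_threads, utilization) for every exact split of the cores.

    Assumption: each MPI process works on one movie at a time and each of
    its threads handles one particle per round.
    """
    results = []
    for n_mpi in range(1, total_cores + 1):
        if total_cores % n_mpi:
            continue  # keep only splits that use all cores exactly
        n_threads = total_cores // n_mpi
        rounds = math.ceil(particles_per_movie / n_threads)
        utilization = particles_per_movie / (rounds * n_threads)
        results.append((n_mpi, n_threads, utilization))
    return results

for n_mpi, n_threads, util in layout_utilization(60, 80):
    print(f"{n_mpi:2d} MPI x {n_threads:2d} threads: {util:.0%} of thread slots busy")
```

Under these assumptions, 2 MPI x 30 threads needs three rounds per movie and leaves about 11% of the thread slots idle, while 6 MPI x 10 threads leaves none idle. Whether the storage can serve the extra parallel reads has to be checked separately.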

The combining step is more I/O bound than the earlier
trajectory estimation step. If you cannot increase the number of
MPI processes in the early step, you cannot in this step either.

> Could it be memory leak problem similar to "subtracting particles"

First, that is not a memory leak in RELION. It is caused by a
bad implementation or bad parameters of malloc.

You can check this by looking at the memory consumption.
Unlike subtraction, Polish supports continuation of a job.
If in doubt, you can kill and continue the job.

Best regards,

Takanori Nakane

On 11/14/24 21:49, Leonid Sazanov wrote:
Hi, we are having some trouble finishing the polishing step. There are about 11k movies from K3 super-resolution pixel 0.53, motion-corrected in relion5 binned pixel 1.06, about 80 particles per image, resolution of about 2.3 A. We use 2 Mpi with 30 threads each and 16 GB memory per thread. The process seems to get slower and slower as it goes via the last "combining frames" step - initially it went through about 7k movies in one day and now at the third day it is at 8k movies and predicts at least another 5 days. We normally don't use many Mpi to limit disk load - perhaps at "combining frames" step it is OK to use many Mpi and fewer threads? Could it be memory leak problem similar to "subtracting particles" - we do have this problem on our system and could try to fix this with a previous solution for subtraction. Any other options?
Many thanks for any info!
Best
Leonid

########################################################################

To unsubscribe from the CCP4BB list, click the following link:
https://www.jiscmail.ac.uk/cgi-bin/WA-JISC.exe?SUBED1=CCP4BB&A=1
This message was issued to members of www.jiscmail.ac.uk/CCP4BB, a mailing list hosted by www.jiscmail.ac.uk, terms & conditions are available at https://www.jiscmail.ac.uk/policyandsecurity/


