Grzegorz: something occurred to me. When you start all these processes, how are 
you staggering their wireup? Are they flooding us, or are you time-shifting 
them a little?


On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote:

> Hm, so I am not sure how to approach this. First of all, the test case
> works for me. I used up to 80 clients, with both optimized and
> non-optimized builds. I ran the tests with trunk (not with the 1.4
> series, but the communicator code is identical in both cases). Clearly,
> the patch from Ralph is necessary to make it work.
> 
> Additionally, I went through the communicator creation code for dynamic
> communicators trying to find spots that could create problems. The only
> place where I found the number 64 appearing is the Fortran-to-C mapping
> arrays (e.g. for communicators), where the initial size of the table is
> 64. I looked twice over the pointer-array code to see whether we could
> have a problem there (since it is a key piece of the cid allocation code
> for communicators), but I am fairly confident that it is correct.
>
> Note that we have other (non-dynamic) tests where comm_set is called
> 100,000 times, and the code per se does not seem to have a problem with
> being called that often. So I am not sure what else to look at.
> 
> Edgar
> 
> 
> 
> On 7/13/2010 8:42 PM, Ralph Castain wrote:
>> As far as I can tell, it appears the problem is somewhere in our 
>> communicator setup. The people knowledgeable on that area are going to look 
>> into it later this week.
>> 
>> I'm creating a ticket to track the problem and will copy you on it.
>> 
>> 
>> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote:
>> 
>>> 
>>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote:
>>> 
>>>> Bad news...
>>>> I've tried the latest patch with and without the prior one, but it
>>>> hasn't changed anything. I've also tried using the old code but with
>>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but that didn't
>>>> help either.
>>>> While looking through the sources of openmpi-1.4.2 I couldn't find any
>>>> call to the function ompi_dpm_base_mark_dyncomm.
>>> 
>>> It isn't directly called - it shows up in ompi_comm_set as
>>> ompi_dpm.mark_dyncomm. You were definitely overrunning that array, but I
>>> guess something else is also being hit. Have to look further...
>>> 
>>> 
>>>> 
>>>> 
>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>:
>>>>> Just so you don't have to wait for the 1.4.3 release, here is the patch
>>>>> (it doesn't include the prior patch).
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote:
>>>>> 
>>>>>> 2010/7/12 Ralph Castain <r...@open-mpi.org>:
>>>>>>> Dug around a bit and found the problem!!
>>>>>>> 
>>>>>>> I have no idea who did this or why, but somebody set a limit of 64
>>>>>>> separate jobids in the dynamic init called by ompi_comm_set, which
>>>>>>> builds the intercommunicator. Unfortunately, they hard-wired the array
>>>>>>> size but never check that size before adding to it.
>>>>>>>
>>>>>>> So after 64 calls to connect_accept, you are overwriting other areas of
>>>>>>> the code. As you found, hitting 66 causes it to segfault.
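>>>>>>>
>>>>>>> To illustrate the class of bug (the names below are made up for
>>>>>>> illustration only - this is not the actual OMPI source), the pattern
>>>>>>> is roughly:
>>>>>>>
>>>>>>>   #include <stdint.h>
>>>>>>>
>>>>>>>   #define MAX_JOBIDS 64                /* hard-wired table size */
>>>>>>>
>>>>>>>   static uint32_t jobids[MAX_JOBIDS];
>>>>>>>   static int num_jobids = 0;
>>>>>>>
>>>>>>>   /* buggy: once num_jobids hits 64, this writes past the array */
>>>>>>>   static void add_jobid_buggy(uint32_t jobid) {
>>>>>>>       jobids[num_jobids++] = jobid;
>>>>>>>   }
>>>>>>>
>>>>>>>   /* fixed: check the size before adding to the table */
>>>>>>>   static int add_jobid_checked(uint32_t jobid) {
>>>>>>>       if (num_jobids >= MAX_JOBIDS) {
>>>>>>>           return -1;                   /* or grow the table instead */
>>>>>>>       }
>>>>>>>       jobids[num_jobids++] = jobid;
>>>>>>>       return 0;
>>>>>>>   }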
>>>>>>> 
>>>>>>> I'll fix this on the developer's trunk (I'll also add that original 
>>>>>>> patch to it). Rather than my searching this thread in detail, can you 
>>>>>>> remind me what version you are using so I can patch it too?
>>>>>> 
>>>>>> I'm using 1.4.2.
>>>>>> Thanks a lot, and I'm looking forward to the patch.
>>>>>> 
>>>>>>> 
>>>>>>> Thanks for your patience with this!
>>>>>>> Ralph
>>>>>>> 
>>>>>>> 
>>>>>>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>>>>>>> 
>>>>>>>> 1024 is not the problem: changing it to 2048 hasn't changed anything.
>>>>>>>> Following your advice I've run my process under gdb. Unfortunately I
>>>>>>>> didn't get anything more than:
>>>>>>>> 
>>>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>>>>>>>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>> 
>>>>>>>> (gdb) bt
>>>>>>>> #0  0xf7f39905 in ompi_comm_set () from 
>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>> #1  0xf7e3ba95 in connect_accept () from
>>>>>>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>>>>>>>> #2  0xf7f62013 in PMPI_Comm_connect () from 
>>>>>>>> /home/gmaj/openmpi/lib/libmpi.so.0
>>>>>>>> #3  0x080489ed in main (argc=825832753, argv=0x34393638) at client.c:43
>>>>>>>> 
>>>>>>>> What's more: when I added a breakpoint on ompi_comm_set in the 66th
>>>>>>>> process and stepped through a couple of instructions, one of the other
>>>>>>>> processes crashed (as usual, in ompi_comm_set) earlier than the 66th did.
>>>>>>>> 
>>>>>>>> Finally I decided to recompile openmpi using the -g flag for gcc. In
>>>>>>>> this case the 66-process issue was gone! I ran my applications exactly
>>>>>>>> the same way as previously (even without recompiling them) and
>>>>>>>> successfully ran over 130 processes.
>>>>>>>> When switching back to the openmpi build without -g, it segfaults
>>>>>>>> again.
>>>>>>>> 
>>>>>>>> Any ideas? I'm really confused.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>> I would guess the #files limit of 1024. However, if it behaves the 
>>>>>>>>> same way when spread across multiple machines, I would suspect it is 
>>>>>>>>> somewhere in your program itself. Given that the segfault is in your 
>>>>>>>>> process, can you use gdb to look at the core file and see where and 
>>>>>>>>> why it fails?
>>>>>>>>> 
>>>>>>>>> On Jul 7, 2010, at 10:17 AM, Grzegorz Maj wrote:
>>>>>>>>> 
>>>>>>>>>> 2010/7/7 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>> 
>>>>>>>>>>> On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>> sorry for the late response, but I couldn't find free time to play
>>>>>>>>>>>> with this. Finally I've applied the patch you prepared. I've
>>>>>>>>>>>> launched my processes in the way you've described, and I think it's
>>>>>>>>>>>> working as you expected. None of my processes runs the orted daemon
>>>>>>>>>>>> and they can perform MPI operations. Unfortunately I'm still
>>>>>>>>>>>> hitting the 65-process issue :(
>>>>>>>>>>>> Maybe I'm doing something wrong.
>>>>>>>>>>>> I attach my source code. If anybody could have a look at this, I
>>>>>>>>>>>> would be grateful.
>>>>>>>>>>>> 
>>>>>>>>>>>> When I run that code with clients_count <= 65 everything works 
>>>>>>>>>>>> fine:
>>>>>>>>>>>> all the processes create a common grid, exchange some information 
>>>>>>>>>>>> and
>>>>>>>>>>>> disconnect.
>>>>>>>>>>>> When I set clients_count > 65 the 66th process crashes on
>>>>>>>>>>>> MPI_Comm_connect (segmentation fault).
>>>>>>>>>>> 
>>>>>>>>>>> I didn't have time to check the code, but my guess is that you are 
>>>>>>>>>>> still hitting some kind of file descriptor or other limit. Check to 
>>>>>>>>>>> see what your limits are - usually "ulimit" will tell you.
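>>>>>>>>>>>
>>>>>>>>>>> A process can also query the relevant limit itself; a minimal POSIX
>>>>>>>>>>> sketch (nothing OMPI-specific) that prints the open file descriptor
>>>>>>>>>>> limit would be:
>>>>>>>>>>>
>>>>>>>>>>>   #include <stdio.h>
>>>>>>>>>>>   #include <sys/resource.h>
>>>>>>>>>>>
>>>>>>>>>>>   int main(void) {
>>>>>>>>>>>       struct rlimit rl;
>>>>>>>>>>>       /* RLIMIT_NOFILE = per-process open file descriptor limit */
>>>>>>>>>>>       if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
>>>>>>>>>>>           perror("getrlimit");
>>>>>>>>>>>           return 1;
>>>>>>>>>>>       }
>>>>>>>>>>>       printf("open files: soft=%llu hard=%llu\n",
>>>>>>>>>>>              (unsigned long long)rl.rlim_cur,
>>>>>>>>>>>              (unsigned long long)rl.rlim_max);
>>>>>>>>>>>       return 0;
>>>>>>>>>>>   }
>>>>>>>>>>>
>>>>>>>>>>> This prints the same "nofiles" value that ulimit reports.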
>>>>>>>>>> 
>>>>>>>>>> My limitations are:
>>>>>>>>>> time(seconds)        unlimited
>>>>>>>>>> file(blocks)         unlimited
>>>>>>>>>> data(kb)             unlimited
>>>>>>>>>> stack(kb)            10240
>>>>>>>>>> coredump(blocks)     0
>>>>>>>>>> memory(kb)           unlimited
>>>>>>>>>> locked memory(kb)    64
>>>>>>>>>> process              200704
>>>>>>>>>> nofiles              1024
>>>>>>>>>> vmemory(kb)          unlimited
>>>>>>>>>> locks                unlimited
>>>>>>>>>> 
>>>>>>>>>> Which one do you think could be responsible for that?
>>>>>>>>>> 
>>>>>>>>>> I tried running all 66 processes on one machine as well as spreading
>>>>>>>>>> them across several machines, and it always crashes the same way on
>>>>>>>>>> the 66th process.
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Another thing I would like to know is whether it's normal that any
>>>>>>>>>>>> of my processes calling MPI_Comm_connect or MPI_Comm_accept while
>>>>>>>>>>>> the other side is not ready eats up a full CPU.
>>>>>>>>>>> 
>>>>>>>>>>> Yes - the waiting process is polling in a tight loop waiting for 
>>>>>>>>>>> the connection to be made.
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Any help would be appreciated,
>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 2010/4/24 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>> Actually, OMPI is distributed with a daemon that does pretty much
>>>>>>>>>>>>> what you want. Check out "man ompi-server". I originally wrote that
>>>>>>>>>>>>> code to support cross-application MPI publish/subscribe operations,
>>>>>>>>>>>>> but we can utilize it here too. You'll have to blame me for not
>>>>>>>>>>>>> making it more publicly known.
>>>>>>>>>>>>> The attached patch upgrades ompi-server and modifies the 
>>>>>>>>>>>>> singleton startup
>>>>>>>>>>>>> to provide your desired support. This solution works in the 
>>>>>>>>>>>>> following
>>>>>>>>>>>>> manner:
>>>>>>>>>>>>> 1. launch "ompi-server -report-uri <filename>". This starts a 
>>>>>>>>>>>>> persistent
>>>>>>>>>>>>> daemon called "ompi-server" that acts as a rendezvous point for
>>>>>>>>>>>>> independently started applications.  The problem with starting 
>>>>>>>>>>>>> different
>>>>>>>>>>>>> applications and wanting them to MPI connect/accept lies in the 
>>>>>>>>>>>>> need to have
>>>>>>>>>>>>> the applications find each other. If they can't discover contact 
>>>>>>>>>>>>> info for
>>>>>>>>>>>>> the other app, then they can't wire up their interconnects. The
>>>>>>>>>>>>> "ompi-server" tool provides that rendezvous point. I don't like 
>>>>>>>>>>>>> that
>>>>>>>>>>>>> comm_accept segfaulted - should have just error'd out.
>>>>>>>>>>>>> 2. set "OMPI_MCA_orte_server=file:<filename>" in the environment
>>>>>>>>>>>>> where you will start your processes. This will allow your singleton
>>>>>>>>>>>>> processes to find the ompi-server. I also automatically set the
>>>>>>>>>>>>> envar that connects the MPI publish/subscribe system for you.
>>>>>>>>>>>>> 3. run your processes. As they think they are singletons, they 
>>>>>>>>>>>>> will detect
>>>>>>>>>>>>> the presence of the above envar and automatically connect 
>>>>>>>>>>>>> themselves to the
>>>>>>>>>>>>> "ompi-server" daemon. This provides each process with the ability 
>>>>>>>>>>>>> to perform
>>>>>>>>>>>>> any MPI-2 operation.
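>>>>>>>>>>>>>
>>>>>>>>>>>>> As a concrete illustration (just a sketch - the service name
>>>>>>>>>>>>> "my_service" is made up and error handling is omitted), two
>>>>>>>>>>>>> independently started singletons could then rendezvous via the
>>>>>>>>>>>>> standard MPI-2 name-publishing calls:
>>>>>>>>>>>>>
>>>>>>>>>>>>>   #include <mpi.h>
>>>>>>>>>>>>>   #include <string.h>
>>>>>>>>>>>>>
>>>>>>>>>>>>>   /* run one copy with the argument "accept" and one with
>>>>>>>>>>>>>    * "connect"; both need OMPI_MCA_orte_server set as above */
>>>>>>>>>>>>>   int main(int argc, char **argv) {
>>>>>>>>>>>>>       char port[MPI_MAX_PORT_NAME];
>>>>>>>>>>>>>       MPI_Comm inter;
>>>>>>>>>>>>>       MPI_Init(&argc, &argv);
>>>>>>>>>>>>>       if (argc > 1 && strcmp(argv[1], "accept") == 0) {
>>>>>>>>>>>>>           MPI_Open_port(MPI_INFO_NULL, port);
>>>>>>>>>>>>>           /* register the port under a name others can look up */
>>>>>>>>>>>>>           MPI_Publish_name("my_service", MPI_INFO_NULL, port);
>>>>>>>>>>>>>           MPI_Comm_accept(port, MPI_INFO_NULL, 0,
>>>>>>>>>>>>>                           MPI_COMM_SELF, &inter);
>>>>>>>>>>>>>           MPI_Unpublish_name("my_service", MPI_INFO_NULL, port);
>>>>>>>>>>>>>           MPI_Close_port(port);
>>>>>>>>>>>>>       } else {
>>>>>>>>>>>>>           /* the lookup is what ompi-server brokers between apps */
>>>>>>>>>>>>>           MPI_Lookup_name("my_service", MPI_INFO_NULL, port);
>>>>>>>>>>>>>           MPI_Comm_connect(port, MPI_INFO_NULL, 0,
>>>>>>>>>>>>>                            MPI_COMM_SELF, &inter);
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>       MPI_Comm_disconnect(&inter);
>>>>>>>>>>>>>       MPI_Finalize();
>>>>>>>>>>>>>       return 0;
>>>>>>>>>>>>>   }
>>>>>>>>>>>>>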
>>>>>>>>>>>>> I tested this on my machines and it worked, so hopefully it will 
>>>>>>>>>>>>> meet your
>>>>>>>>>>>>> needs. You only need to run one "ompi-server" period, so long as 
>>>>>>>>>>>>> you locate
>>>>>>>>>>>>> it where all of the processes can find the contact file and can 
>>>>>>>>>>>>> open a TCP
>>>>>>>>>>>>> socket to the daemon. There is a way to knit multiple 
>>>>>>>>>>>>> ompi-servers into a
>>>>>>>>>>>>> broader network (e.g., to connect processes that cannot directly 
>>>>>>>>>>>>> access a
>>>>>>>>>>>>> server due to network segmentation), but it's a tad tricky - let 
>>>>>>>>>>>>> me know if
>>>>>>>>>>>>> you require it and I'll try to help.
>>>>>>>>>>>>> If you have trouble wiring them all into a single communicator, 
>>>>>>>>>>>>> you might
>>>>>>>>>>>>> ask separately about that and see if one of our MPI experts can 
>>>>>>>>>>>>> provide
>>>>>>>>>>>>> advice (I'm just the RTE grunt).
>>>>>>>>>>>>> HTH - let me know how this works for you and I'll incorporate it 
>>>>>>>>>>>>> into future
>>>>>>>>>>>>> OMPI releases.
>>>>>>>>>>>>> Ralph
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Apr 24, 2010, at 1:49 AM, Krzysztof Zarzycki wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>> I'm Krzysztof and I'm working with Grzegorz Maj on this small
>>>>>>>>>>>>> project/experiment of ours.
>>>>>>>>>>>>> We would definitely like to give your patch a try. But could you
>>>>>>>>>>>>> please explain your solution a little more?
>>>>>>>>>>>>> So we would still start one mpirun per MPI grid, and then have the
>>>>>>>>>>>>> processes we start join the MPI comm?
>>>>>>>>>>>>> That is a good solution, of course.
>>>>>>>>>>>>> But it would be especially preferable to have one daemon running
>>>>>>>>>>>>> persistently on our "entry" machine that can handle several MPI
>>>>>>>>>>>>> grid starts.
>>>>>>>>>>>>> Can your patch help us with that as well?
>>>>>>>>>>>>> Thanks for your help!
>>>>>>>>>>>>> Krzysztof
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 24 April 2010 03:51, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> In thinking about this, my proposed solution won't entirely fix 
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> problem - you'll still wind up with all those daemons. I believe 
>>>>>>>>>>>>>> I can
>>>>>>>>>>>>>> resolve that one as well, but it would require a patch.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Would you like me to send you something you could try? Might 
>>>>>>>>>>>>>> take a couple
>>>>>>>>>>>>>> of iterations to get it right...
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Apr 23, 2010, at 12:12 PM, Ralph Castain wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hmmm....I -think- this will work, but I cannot guarantee it:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 1. launch one process (can just be a spinner) using mpirun that 
>>>>>>>>>>>>>>> includes
>>>>>>>>>>>>>>> the following option:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> mpirun -report-uri file
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> where file is some filename that mpirun can create and insert its
>>>>>>>>>>>>>>> contact info into. This can be a relative or absolute path. This
>>>>>>>>>>>>>>> process must remain alive throughout your application - it doesn't
>>>>>>>>>>>>>>> matter what it does. Its purpose is solely to keep mpirun alive.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 2. set OMPI_MCA_dpm_orte_server=FILE:file in your environment, 
>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>> "file" is the filename given above. This will tell your 
>>>>>>>>>>>>>>> processes how to
>>>>>>>>>>>>>>> find mpirun, which is acting as a meeting place to handle the 
>>>>>>>>>>>>>>> connect/accept
>>>>>>>>>>>>>>> operations.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Now run your processes, and have them connect/accept to each 
>>>>>>>>>>>>>>> other.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The reason I cannot guarantee this will work is that these 
>>>>>>>>>>>>>>> processes
>>>>>>>>>>>>>>> will all have the same rank && name since they all start as 
>>>>>>>>>>>>>>> singletons.
>>>>>>>>>>>>>>> Hence, connect/accept is likely to fail.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> But it -might- work, so you might want to give it a try.
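>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For what it's worth, the "spinner" in step 1 can be as trivial as
>>>>>>>>>>>>>>> this (a hypothetical example - any long-lived program launched by
>>>>>>>>>>>>>>> that mpirun would do):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   #include <mpi.h>
>>>>>>>>>>>>>>>   #include <unistd.h>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   int main(int argc, char **argv) {
>>>>>>>>>>>>>>>       MPI_Init(&argc, &argv);
>>>>>>>>>>>>>>>       /* just keep mpirun alive; kill the job when you are done */
>>>>>>>>>>>>>>>       for (;;) sleep(60);
>>>>>>>>>>>>>>>       return 0;   /* never reached */
>>>>>>>>>>>>>>>   }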
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Apr 23, 2010, at 8:10 AM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> To be more precise: by 'server process' I mean some process that
>>>>>>>>>>>>>>>> I could run once on my system and that would help in creating
>>>>>>>>>>>>>>>> those groups.
>>>>>>>>>>>>>>>> My typical scenario is:
>>>>>>>>>>>>>>>> 1. run N separate processes, each without mpirun
>>>>>>>>>>>>>>>> 2. connect them into MPI group
>>>>>>>>>>>>>>>> 3. do some job
>>>>>>>>>>>>>>>> 4. exit all N processes
>>>>>>>>>>>>>>>> 5. goto 1
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 2010/4/23 Grzegorz Maj <ma...@wp.pl>:
>>>>>>>>>>>>>>>>> Thank you, Ralph, for your explanation.
>>>>>>>>>>>>>>>>> And, apart from that descriptors issue, is there any other way
>>>>>>>>>>>>>>>>> to solve my problem, i.e. to run a number of processes
>>>>>>>>>>>>>>>>> separately, without mpirun, and then collect them into an MPI
>>>>>>>>>>>>>>>>> intracomm group?
>>>>>>>>>>>>>>>>> If, for example, I needed to run some 'server process' (even
>>>>>>>>>>>>>>>>> using mpirun) for this task, that would be OK. Any ideas?
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Grzegorz Maj
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>>>> Okay, but here is the problem. If you don't use mpirun, and 
>>>>>>>>>>>>>>>>>> are not
>>>>>>>>>>>>>>>>>> operating in an environment we support for "direct" launch 
>>>>>>>>>>>>>>>>>> (i.e., starting
>>>>>>>>>>>>>>>>>> processes outside of mpirun), then every one of those 
>>>>>>>>>>>>>>>>>> processes thinks it is
>>>>>>>>>>>>>>>>>> a singleton - yes?
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> What you may not realize is that each singleton immediately
>>>>>>>>>>>>>>>>>> fork/exec's an orted daemon that is configured to behave 
>>>>>>>>>>>>>>>>>> just like mpirun.
>>>>>>>>>>>>>>>>>> This is required in order to support MPI-2 operations such as
>>>>>>>>>>>>>>>>>> MPI_Comm_spawn, MPI_Comm_connect/accept, etc.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> So if you launch 64 processes that think they are singletons,
>>>>>>>>>>>>>>>>>> then you have 64 copies of orted running as well. This eats up
>>>>>>>>>>>>>>>>>> a lot of file descriptors, which is probably why you are
>>>>>>>>>>>>>>>>>> hitting this 65-process limit - your system is probably running
>>>>>>>>>>>>>>>>>> out of file descriptors. You might check your system limits and
>>>>>>>>>>>>>>>>>> see if you can get them revised upward.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 4:24 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Yes, I know. The problem is that I need to use a special way
>>>>>>>>>>>>>>>>>>> of running my processes, provided by the environment in which
>>>>>>>>>>>>>>>>>>> I'm working, and unfortunately I can't use mpirun.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> 2010/4/18 Ralph Castain <r...@open-mpi.org>:
>>>>>>>>>>>>>>>>>>>> Guess I don't understand why you can't use mpirun - all it 
>>>>>>>>>>>>>>>>>>>> does is
>>>>>>>>>>>>>>>>>>>> start things, provide a means to forward io, etc. It 
>>>>>>>>>>>>>>>>>>>> mainly sits there
>>>>>>>>>>>>>>>>>>>> quietly without using any cpu unless required to support 
>>>>>>>>>>>>>>>>>>>> the job.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> Sounds like it would solve your problem. Otherwise, I know 
>>>>>>>>>>>>>>>>>>>> of no
>>>>>>>>>>>>>>>>>>>> way to get all these processes into comm_world.
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>> On Apr 17, 2010, at 2:27 PM, Grzegorz Maj wrote:
>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>> I'd like to dynamically create a group of processes
>>>>>>>>>>>>>>>>>>>>> communicating via MPI. Those processes need to be run
>>>>>>>>>>>>>>>>>>>>> without mpirun and create an intracommunicator after
>>>>>>>>>>>>>>>>>>>>> startup. Any ideas on how to do this efficiently?
>>>>>>>>>>>>>>>>>>>>> I came up with a solution in which the processes connect
>>>>>>>>>>>>>>>>>>>>> one by one using MPI_Comm_connect, but unfortunately all
>>>>>>>>>>>>>>>>>>>>> the processes that are already in the group need to call
>>>>>>>>>>>>>>>>>>>>> MPI_Comm_accept. This means that when the n-th process
>>>>>>>>>>>>>>>>>>>>> wants to connect I need to collect all n-1 processes on
>>>>>>>>>>>>>>>>>>>>> the MPI_Comm_accept call. After I run about 40 processes,
>>>>>>>>>>>>>>>>>>>>> every subsequent call takes more and more time, which I'd
>>>>>>>>>>>>>>>>>>>>> like to avoid.
>>>>>>>>>>>>>>>>>>>>> Another problem with this solution is that when I try to
>>>>>>>>>>>>>>>>>>>>> connect the 66th process, the root of the existing group
>>>>>>>>>>>>>>>>>>>>> segfaults on MPI_Comm_accept.
>>>>>>>>>>>>>>>>>>>>> Maybe it's my bug, but it's weird, as everything works
>>>>>>>>>>>>>>>>>>>>> fine for at most 65 processes. Is there any limitation I
>>>>>>>>>>>>>>>>>>>>> don't know about?
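>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For reference, the pattern I use looks roughly like this
>>>>>>>>>>>>>>>>>>>>> (a sketch only - error handling is omitted, "group" is the
>>>>>>>>>>>>>>>>>>>>> current intracomm (MPI_COMM_SELF at first), and the port
>>>>>>>>>>>>>>>>>>>>> string is passed around out of band):
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   /* every process already in the group (collective) */
>>>>>>>>>>>>>>>>>>>>>   MPI_Comm inter, merged;
>>>>>>>>>>>>>>>>>>>>>   MPI_Comm_accept(port, MPI_INFO_NULL, 0, group, &inter);
>>>>>>>>>>>>>>>>>>>>>   MPI_Intercomm_merge(inter, 0, &merged);  /* "low" side */
>>>>>>>>>>>>>>>>>>>>>   MPI_Comm_free(&inter);
>>>>>>>>>>>>>>>>>>>>>   group = merged;      /* intracomm grew by one process */
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   /* the joining (n-th) process */
>>>>>>>>>>>>>>>>>>>>>   MPI_Comm inter, merged;
>>>>>>>>>>>>>>>>>>>>>   MPI_Comm_connect(port, MPI_INFO_NULL, 0,
>>>>>>>>>>>>>>>>>>>>>                    MPI_COMM_SELF, &inter);
>>>>>>>>>>>>>>>>>>>>>   MPI_Intercomm_merge(inter, 1, &merged); /* "high" side */
>>>>>>>>>>>>>>>>>>>>>   MPI_Comm_free(&inter);
>>>>>>>>>>>>>>>>>>>>>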
>>>>>>>>>>>>>>>>>>>>> My last question is about MPI_COMM_WORLD. When I run my 
>>>>>>>>>>>>>>>>>>>>> processes
>>>>>>>>>>>>>>>>>>>>> without mpirun their MPI_COMM_WORLD is the same as 
>>>>>>>>>>>>>>>>>>>>> MPI_COMM_SELF.
>>>>>>>>>>>>>>>>>>>>> Is
>>>>>>>>>>>>>>>>>>>>> there any way to change MPI_COMM_WORLD and set it to the
>>>>>>>>>>>>>>>>>>>>> intracommunicator that I've created?
>>>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Grzegorz Maj