Re: [OMPI users] www.open-mpi.org certificate error?

2016-07-30 Thread dpchoudh .
Hi Jeff and all Disclaimer: I know next to nothing about how the web works. Having said that, would it not be possible to redirect an https request to an http request? I believe Apache mod_rewrite can do it. Or does this certificate check happen even before the rewrite? Regards Durga The woods

Re: [OMPI users] open-mpi: all-recursive error when compiling

2016-08-04 Thread dpchoudh .
In addition to what Gilles said, could you also check if selinux is enabled, and if so, if disabling it makes a difference? Thanks Durga On Thu, Aug 4, 2016 at 8:33 PM, Gilles Gouaillardet wrote: > The error message is related to a permission issue (which is very puzzling > in itself ...) > > c

[OMPI users] No core dump in some cases

2016-05-06 Thread dpchoudh .
Hello all I run MPI jobs (for test purpose only) on two different 'clusters'. Both 'clusters' have two nodes only, connected back-to-back. The two are very similar, but not identical, both software and hardware wise. Both have ulimit -c set to unlimited. However, only one of the two creates core

Re: [OMPI users] mpirun command won't run unless the firewalld daemon is disabled

2016-05-09 Thread dpchoudh .
Hello Llolsten Is there a specific reason you run as root? This practice is discouraged, isn't it? Also, isn't it true that OMPI uses ephemeral (i.e. 'user level, randomly chosen') ports for TCP transport? In that case, how did this ever work with a firewall enabled? I have, in the past, have

Re: [OMPI users] No core dump in some cases

2016-05-09 Thread dpchoudh .
PM, Jeff Squyres (jsquyres) wrote: > >> I'm afraid I don't know what a .btr file is -- that is not something that >> is controlled by Open MPI. >> >> You might want to look into your OS settings to see if it has some kind >> of alternate corefile mechanism

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
chanism. I think we > should revisit it at some point but for now the only effective way i have > found to prevent it is to restore the default signal handlers after > MPI_Init. > > Excuse the quoting style. Good sucks. > > > >

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
ing :-( > opal_backtrace_buffer and opal_backtrace_print are only used with stderr. > so i am puzzled who creates the tracefile name and where ... > also, no stack is printed by default unless opal_abort_print_stack is true > > Cheers, > > Gilles > > > On Wed, May

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
trlimit(RLIMIT_CORE, &rlim); > printf ("after MPI_Init : %d %d\n", rlim.rlim_cur, rlim.rlim_max); > *c = 0; > MPI_Finalize(); > return 0; > } > > > On 5/12/2016 4:22 AM, dpchoudh . wrote: > > Hello Gilles > > Thank you for the advic

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
nd in this machine that might help the developers narrow down the issue; please let me know. Thank you Durga The surgeon general advises you to eat right, exercise regularly and quit ageing. On Wed, May 11, 2016 at 10:34 PM, dpchoudh . wrote: > Hello Gilles > > Thank you for your cont

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
lem - I committed the fix for PSM with a link down > just today. > > > On May 11, 2016, at 7:34 PM, dpchoudh . wrote: > > Hello Gilles > > Thank you for your continued support. With your help, I have a better > understanding of what is happening. Here are the details. >

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
PM, Gilles Gouaillardet wrote: > Note the psm library sets its own signal handler, possibly after the > OpenMPI one. > > that can be disabled by > > export IPATH_NO_BACKTRACE=1 > > Cheers, > > Gilles > > > On 5/12/2016 11:34 AM, dpchoudh . wrote: > > Hello Gil

Re: [OMPI users] No core dump in some cases

2016-05-11 Thread dpchoudh .
rgeon general advises you to eat right, exercise regularly and quit ageing. On Wed, May 11, 2016 at 11:23 PM, dpchoudh . wrote: > Hello Gilles > > Mystery solved! In fact, this one line is exactly what was needed!! It > turns out the OMPI signal handlers are irrelevant. (i.e. don't

Re: [OMPI users] No core dump in some cases

2016-05-12 Thread dpchoudh .
> mpirun --mca mtl ^psm ... > or if you do not need any mtl at all > mpirun --mca pml ob1 ... > should be enough > > Cheers, > > Gilles > > commit 4d026e223ce717345712e669d26f78ed49082df6 > Merge: f8facb1 4071719 > Author: rhc54 > Date: Wed May 11 17:43:17 2016

Re: [OMPI users] No core dump in some cases

2016-05-12 Thread dpchoudh .
> > can you please give the attached patches a try ? > > /* they are exclusive, e.g. you should only apply one at a time */ > > > Cheers, > > > Gilles > On 5/12/2016 4:54 PM, dpchoudh . wrote: > > Hello Gilles > > I am not sure if I understand you c

[OMPI users] One more (possible) bug report

2016-05-13 Thread dpchoudh .
Dear developers I have been observing this issue all along on the master branch, but have been brushing it off as something to do with my installation. Right now, I just downloaded a fresh checkout (via git pull), built and installed it (after deleting /usr/local/lib/openmpi/) and I can reproduce th

Re: [OMPI users] One more (possible) bug report

2016-05-13 Thread dpchoudh .
eat right, exercise regularly and quit ageing. On Fri, May 13, 2016 at 11:24 PM, dpchoudh . wrote: > Dear developers > > I have been observing this issue all along on the master branch, but have > been brushing off as something to do with my installation. > > Right now, I just

Re: [OMPI users] One more (possible) bug report

2016-05-14 Thread dpchoudh .
... > to ensure no hang will happen in oob > > as usual, double check no firewall is running, and your hosts can ping > each other > > Cheers, > > Gilles > > On Saturday, May 14, 2016, dpchoudh . wrote: > >> Dear developers >> >> I have been observi

Re: [OMPI users] One more (possible) bug report

2016-05-14 Thread dpchoudh .
illes Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > iirc, ompi internally uses networks and not interface names. > what did you use in your tests ? > can you try with networks ? > > Cheers, > > Gilles > > On Saturday, May 14, 2016, dpchoudh . wrote: > >> H

[OMPI users] Possible (minor) bug?

2016-05-21 Thread dpchoudh .
Hello all I have started noticing this message since yesterday on builds from the master branch. Any simple mpirun command, such as: mpirun -np 2 -hostfile ~/hostfile -mca btl self,tcp hostname generates a warning/error like this: *Duplicate cmd line entry mca* The hostfile, in my case, is jus

[OMPI users] How to see the output from OPAL_OUTPUT_VERBOSE?

2016-05-22 Thread dpchoudh .
Hello all I have built and installed OMPI with --enable-debug set. What runtime parameter do I need to see the output from OPAL_DEBUG_OUTPUT? Thank you Durga We learn from history that we never learn from history.

Re: [OMPI users] How to see the output from OPAL_OUTPUT_VERBOSE?

2016-05-22 Thread dpchoudh .
_base_verbose x. The number x controls the > verbosity level. Starting with 2.x there are named levels, but not many components > conform to the names yet. In general, components use numbers between 0 > and 100 (inclusive) with 100 being very verbose. > > -Nathan > > > On

[OMPI users] PSM vs PSM2

2016-06-02 Thread dpchoudh .
Hello all What is the difference between PSM and PSM2? Any pointer to more information is appreciated. Also, the PSM2 MTL does not seem to have a owner.txt file (on master, at least). Why is that? Thanks Durga We learn from history that we never learn from history.

[OMPI users] QP creation failure on iWARP adapter

2016-02-05 Thread dpchoudh .
Dear all This is a slightly off-topic post, and hopefully people won't mind helping me out. I have a very simple setup with two PCs, both with identical Chelsio 10GE iWARP adapter connected back-to-back. With this setup, the TCP channel works fine (with MPI or otherwise). But somehow, using RDMA

[OMPI users] Release vs git trunk directory tree

2016-02-14 Thread dpchoudh .
Hello developers The directory structure of the latest release and what exists on the git trunk seem to be very different, at least in the ompi/mca/ branch. In particular, the btl/ subtree does not even exist in the trunk. The reason I went looking for it is I am trying to implement a BTL for a

[OMPI users] Adding a new BTL

2016-02-25 Thread dpchoudh .
Hello all I am not sure if this question belongs in the user list or the developer list, but because it is a simpler question I am trying the user list first. I am trying to add a new BTL for a proprietary transport. As step #0, I copied the BTL template, renamed the 'template' to something else

Re: [OMPI users] Adding a new BTL

2016-02-25 Thread dpchoudh .
pal/mca/btl since v2.x > so it is quite common a bit of porting is required, most of the time, > it consists in replacing OMPI like macros by OPAL like macros > > Cheers, > > Gilles > > On Thu, Feb 25, 2016 at 3:54 PM, dpchoudh . wrote: > > Hello all > > > >

Re: [OMPI users] Adding a new BTL

2016-02-25 Thread dpchoudh .
ter suited for the Devel list, since we're > talking about OMPI internals. > > Sent from my phone. No type good. > > On Feb 25, 2016, at 2:06 PM, dpchoudh . wrote: > > Hello Gilles > > Thank you very much for your advice. Yes, I copied the templates from the > maste

Re: [OMPI users] General Questions

2016-03-01 Thread dpchoudh .
I don't think the Open MPI TCP BTL will pass the SDP socket type when creating sockets -- SDP is much lower performance than native verbs/RDMA. You should use a "native" interface to your RDMA network instead (which one you use depends on which kind of network you have). I have a rather naive fo

[OMPI users] iWARP usage issue

2016-03-08 Thread dpchoudh .
Hello all I am asking for help for the following situation: I have two (mostly identical) nodes. Each of them has (completely identical) 1. qlogic 4x DDR infiniband, AND 2. Chelsio S310E (T3 chip based) 10GE iWARP cards. Both are connected back-to-back, without a switch. The connection is physi

Re: [OMPI users] Communication problem (on one node) when network interface is down

2016-03-11 Thread dpchoudh .
Hello all From a user standpoint, that does not seem right to me. Why should one need any kind of network at all if one is entirely dealing with a single node? Is there any particular reason OpenMPI does not/cannot use the lo (loopback) interface? I'd think it is there for exactly this kind of si

[OMPI users] IB question (slightly off topic)

2016-03-12 Thread dpchoudh .
Hello all I have a question that, I do realize, is somewhat off topic to this list. But I do not know whom to approach for an answer. Hopefully the community here will help me out. I know that Infiniband is a 'standard' interface (standardized by IETF? IEEE? or some similar body), much like Ethern

[OMPI users] Issue about cm PML

2016-03-16 Thread dpchoudh .
Hello all I have a simple test setup, consisting of two Dell workstation nodes with similar hardware profile. Both the nodes have (identical) 1. Qlogic 4x DDR infiniband 2. Chelsio C310 iWARP ethernet. Both of these cards are connected back to back, without a switch. With this setup, I can run O

Re: [OMPI users] Issue about cm PML

2016-03-17 Thread dpchoudh .
s the BTLs). > > > > On Mar 17, 2016, at 8:07 AM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com> wrote: > > > > can you try to add > > --mca mtl psm > > to your mpirun command line ? > > > > you might also have to blacklist the opening

[OMPI users] Why does 'self' needs to be explicitly mentioned?

2016-03-19 Thread dpchoudh .
Hello all I am wondering as to: 1. Why 'self' needs to be explicitly mentioned when using the BTL communication? Since it must always be there for MPI communication to work, should it not be implicit? I am sure there is some architectural rationale behind this; could someone please elaborate? 2.

[OMPI users] Why do I need a C++ linker while linking in MPI C code with CUDA?

2016-03-20 Thread dpchoudh .
Hello all I downloaded some code samples from here: https://github.com/parallel-forall/code-samples/ and tried to build the subdirectory posts/cuda-aware-mpi-example/src on my CentOS 7 machine. I had to make several changes to the Makefile before it would build. The modified Makefile is attac

Re: [OMPI users] Why do I need a C++ linker while linking in MPI C code with CUDA?

2016-03-20 Thread dpchoudh .
> if everything should work, then i recommend you report this to nvidia > > Cheers, > > Gilles > > On Monday, March 21, 2016, Damien Hocking wrote: > >> Durga, >> >> The Cuda libraries use the C++ std libraries. That's the std::ios_base >> errors.. You n

[OMPI users] Existing and emerging interconnects for commodity PCs

2016-03-21 Thread dpchoudh .
Hello all I don't mean this to be a political conversation, but more of a research type. From what I have been observing, some of the interconnects that had very good technological features as well as popularity in the past have basically gone down the history book and some others, with comparab

[OMPI users] Error running mpicc

2016-03-28 Thread dpchoudh .
Hello all The system in question is a CentOS 7 box, that has been running OpenMPI, both the master branch and the 1.10.2 release happily until now. Just now, in order to debug something, I recompiled with the following options: $ ./configure --enable-debug --enable-debug-symbols --disable-dlopen

Re: [OMPI users] Error running mpicc

2016-03-28 Thread dpchoudh .
nly option is to manually disable some components, so > only one flavor of lib nl is used. > that can be achieved by adding a .opal_ignore empty file in the dir of the > components you want to disable. > /* you will need to rerun autogen.pl after that */ > > Cheers, > > Gilles >

Re: [OMPI users] Error running mpicc

2016-03-28 Thread dpchoudh .
be achieved by adding a .opal_ignore empty file in the dir of the > components you want to disable. > /* you will need to rerun autogen.pl after that */ > > Cheers, > > Gilles > > On 3/28/2016 3:16 PM, dpchoudh . wrote: > > Hello all > > The system in questio

Re: [OMPI users] Error running mpicc

2016-03-28 Thread dpchoudh .
s little endian and the other is > big endian ? > if yes, then you need to configure with --enable-heterogeneous > > Cheers, > > Gilles > > > On 3/28/2016 4:26 PM, dpchoudh . wrote: > > Hello Gilles > > Per your suggestion, installing libnl3-devel does fixes t

[OMPI users] libfabric verb provider for iWARP RNIC

2016-04-02 Thread dpchoudh .
Hello all My machine has 3 network cards: 1. Broadcom GbE (vanilla type, with some offload capability) 2. Chelsio S310 10Gb iWARP 3. Qlogic DDR 4X Infiniband. With this setup, I built libfabric like this: ./configure --enable-udp=auto --enable-gni=auto --enable-mxm=auto --enable-usnic=auto --e

[OMPI users] Newbie question

2016-04-03 Thread dpchoudh .
Hello all I don't mean to be competing for the 'silliest question of the year award', but I can't figure this out on my own: My 'cluster' has 2 machines, bigMPI and smallMPI. They are connected via several (types of) networks and the connectivity is OK. In this setup, the following program hangs

Re: [OMPI users] Newbie question

2016-04-03 Thread dpchoudh .
tp_if_include 192.168.0.0/24 -np 2 -hostfile ~/hostfile > --mca btl self,tcp --mca pml ob1 ./mpitest > should do the trick > > Cheers, > > Gilles > > > > > On 4/4/2016 8:32 AM, dpchoudh . wrote: > > Hello all > > I don't mean to be competing fo

Re: [OMPI users] Newbie question

2016-04-03 Thread dpchoudh .
less the host is unreachable > and/or the tcp connection is denied by the firewall. > > Cheers, > > Gilles > > > > On 4/4/2016 9:44 AM, dpchoudh . wrote: > > Hello Gilles > > Thanks for your help. > > My question was more of a sanity check on myself

Re: [OMPI users] libfabric verb provider for iWARP RNIC

2016-04-04 Thread dpchoudh .
o get literally the freshest version of > libfabric, either at github or the 1.3rc2 tarball at > > http://www.openfabrics.org/downloads/ofi/ > > Good luck, > > Howard > > > 2016-04-02 13:41 GMT-06:00 dpchoudh . : > >> Hello all >> >> My machine

Re: [OMPI users] libfabric verb provider for iWARP RNIC

2016-04-05 Thread dpchoudh .
:20 PM, dpchoudh . wrote: > Hi Howard > > Thank you very much for your suggestions. All the installation location in > my case are the default ones, so that is likely not the issue. > > What I find a bit confusing is this: > > As I mentioned, my cluster has both Qlogic Infi

Re: [OMPI users] libfabric verb provider for iWARP RNIC

2016-04-11 Thread dpchoudh .
sion of > libfabric, either at github or the 1.3rc2 tarball at > > http://www.openfabrics.org/downloads/ofi/ > > Good luck, > > Howard > > > 2016-04-02 13:41 GMT-06:00 dpchoudh . : > >> Hello all >> >> My machine has 3 network cards: >> >>

[OMPI users] Debugging help

2016-04-12 Thread dpchoudh .
Hello all I am trying to set a breakpoint during the modex exchange process so I can see the data being passed for different transport type. I assume that this is being done in the context of orted since this is part of process launch. Here is what I did: (All of this pertains to the master branc

[OMPI users] Possible bug in MPI_Barrier() ?

2016-04-12 Thread dpchoudh .
Hi all I have reported this issue before, but then had brushed it off as something that was caused by my modifications to the source tree. It looks like that is not the case. Just now, I did the following: 1. Cloned a fresh copy from master. 2. Configured with the following flags, built and inst

[OMPI users] Build on FreeBSD

2016-04-17 Thread dpchoudh .
Hello all I understand that FreeBSD is not a supported platform, so this may be an irrelevant piece of information, but let me pass it on anyway in the hope that it might be useful to somebody. OpenMPI 1.10.2 (release) successfully compiles on FreeBSD 10.2 (except for a minor issue of setting LD_

[OMPI users] openib failover

2016-04-17 Thread dpchoudh .
Hello all As I understand, the openib BTL supports NIC failover, but I am confused about the scope of this support. Let me elaborate: 1. Is the failover support part of MPI specification? 2. Is it an openMPI-specific addition to MPI implementation? 3. Is it a verb-API specification? Since the o

Re: [OMPI users] Possible bug in MPI_Barrier() ?

2016-04-17 Thread dpchoudh .
> eth1 > ib0 > eth0,eth1 > eth0,ib0 > ... > eth0,eth1,ib0 > > and see where the problem starts occurring. > > btw, are your 3 interfaces in 3 different subnets ? is routing required > between two interfaces of the same type ? > > Cheers, > > Gilles > > On 4/13

Re: [OMPI users] Possible bug in MPI_Barrier() ?

2016-04-18 Thread dpchoudh .
ode! Unite!! Occupy the kernel!!! On Sun, Apr 17, 2016 at 11:55 PM, Ralph Castain wrote: > Try adding -mca oob_tcp_if_include eno1 to your cmd line and see if that > makes a difference > > On Apr 17, 2016, at 8:43 PM, dpchoudh . wrote: > > Hello Gilles and all > > I am sorry

Re: [OMPI users] Possible bug in MPI_Barrier() ?

2016-04-18 Thread dpchoudh .
the executables have 99% of CPU privilege! Userspace code! Unite!! Occupy the kernel!!! On Mon, Apr 18, 2016 at 12:06 AM, dpchoudh . wrote: > Thank you for your suggestion, Ralph. But it did not make any difference. > > Let me say that my code is about a week stale. I just did a git pull

Re: [OMPI users] Possible bug in MPI_Barrier() ?

2016-04-18 Thread dpchoudh .
instead ? > did you double check there is no firewall running on your nodes ? > > Cheers, > > Gilles > > > > > > > On 4/18/2016 1:06 PM, dpchoudh . wrote: > > Thank you for your suggestion, Ralph. But it did not make any difference. > > Let me say th

Re: [OMPI users] Possible bug in MPI_Barrier() ?

2016-04-18 Thread dpchoudh .
n, just to make sure old stuff does not get in the way > > Cheers, > > Gilles > > > On 4/18/2016 2:12 PM, dpchoudh . wrote: > > Hello Gilles > > Thank you very much for your feedback. You are right that my original > stack trace was on code that was several weeks

Re: [OMPI users] Possible bug in MPI_Barrier() ?

2016-04-18 Thread dpchoudh .
st fails, this can hint to a firewall. >> >> Cheers, >> >> Gilles >> >> Gilles Gouaillardet wrote: >> sudo make uninstall >> will not remove modules that are no longer built >> sudo rm -rf /usr/local/lib/openmpi >> is safe though >> >>

[OMPI users] make install warns about 'common symbols'

2016-04-19 Thread dpchoudh .
Hello all While doing a 'make install' with some additional code written by me, I get the following message: WARNING! Common symbols found: Doing a search on previous mails, I found the following thread that is pertinent: https://www.open-mpi.org/community/lists/devel/2015/04/17220.php However,

Re: [OMPI users] make install warns about 'common symbols'

2016-04-19 Thread dpchoudh .
if the scope is only one source file > - always initialize global variables > > Cheers, > > Gilles > > > On 4/20/2016 11:48 AM, dpchoudh . wrote: > > Hello all > > While doing a 'make install' with some additional code written by me, I > get the fo

Re: [OMPI users] MPIRUN SEGMENTATION FAULT

2016-04-23 Thread dpchoudh .
Elio You should ask this question in the forum of the simulation program you are using. These failures have most likely nothing to do with MPI (or, at least, OpenMPI) so this is the wrong place for these questions. Here is a bit of suggestion: does your program run without MPI at all? (i.e. in a

Re: [OMPI users] track progress of a mpi gather

2016-04-24 Thread dpchoudh .
Hello I am not sure I am understanding your requirements correctly, but based on what I think they are, how about this: you do an MPI_Send() from all the non-root nodes to the root node and pack all the progress related data into this send. Use a special tag for this message to make it stand out from

Re: [OMPI users] track progress of a mpi gather

2016-04-24 Thread dpchoudh .
> MPI_Iscatter() and MPI_Igather()) and "both" collective and the progress > statuses > > Cheers, > > Gilles > > On Sunday, April 24, 2016, dpchoudh . wrote: > >> Hello >> >> >> I am not sure I am understanding your requirements correctly

[OMPI users] Cannot run a simple MPI program

2016-04-24 Thread dpchoudh .
Hello all Attached is a simple MPI program (a modified version of a similar program that was posted by another user). This program, when run on a single node machine, hangs most of the time, as follows: (in all cases, OS was CentOS 7) Scenario 1: OMPI v 1.10, single socket quad core machine, with

Re: [OMPI users] Cannot run a simple MPI program

2016-04-24 Thread dpchoudh .
les > > Cheers, > > Gilles > > > On 4/25/2016 7:34 AM, dpchoudh . wrote: > > Hello all > > Attached is a simple MPI program (a modified version of a similar program > that was posted by another user). This program, when run on a single node > machine, hangs most

Re: [OMPI users] Cannot run a simple MPI program

2016-04-24 Thread dpchoudh .
, George Bosilca wrote: > Add --mca pml ob1 to your mpirun command. > > George > > > On Sunday, April 24, 2016, dpchoudh . wrote: > >> Hello Gilles >> >> Thank you for finding the bug; it was not there in the original code; I >> added it while trying to

[OMPI users] Add release dates to release notes

2016-04-28 Thread dpchoudh .
Hello developers May I request that you add the release dates to the release notes (the NEWS file)? The reason I ask is that some one-of-a-kind hardware or obsolete hardware run very old versions of OpenMPI, and I am asked to maintain that version using a PC platform. I want to know what is the ti

Re: [OMPI users] libmpi_cxx

2018-03-28 Thread dpchoudh .
Hello Gilles and all Sorry if this is a bit off topic, but I am curious as to why the C++ bindings were dropped. Any pointers would be appreciated. Best regards Durga $man why dump woman? man: too many arguments On Wed, Mar 28, 2018 at 11:43 PM, Gilles Gouaillardet wrote: > Arthur, > > Try to > co

Re: [OMPI users] running mpi program between my PC and an ARM-architektur raspberry

2018-04-02 Thread dpchoudh .
Sorry for a pedantic follow up: Is this (heterogeneous cluster support) something that is specified by the MPI standard (perhaps as an optional component)? Do people know if MPICH, MVAPICH, Intel MPI, etc. support it? (I do realize this is an OpenMPI forum) The reason I ask is that I have a mini Li

Re: [OMPI users] problem

2018-05-10 Thread dpchoudh
What Jeff is suggesting is probably valgrind. However, in my experience, which is much less than that of most OpenMPI developers, a simple code inspection is often adequate. Here are the steps: 1. If you don't already have it, build a debug version of your code. If you are using gcc, you'd use a -g to CFL