Justin and all,

The root cause is indeed a bug that I fixed in https://github.com/open-mpi/ompi/pull/2135
I also had this patch applied to Homebrew, so if you re-install open-mpi, you should be fine.
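If you want a rough idea of whether your setup is at risk, the sketch below measures how many bytes $TMPDIR leaves for the rest of a Unix socket path. It is Python and not anything Open MPI ships. It assumes the 104-byte limit reported for OS X in this thread and conservatively reserves one byte for a terminating NUL. The names SUN_PATH_MAX, socket_path_fits and headroom are made up for illustration, and the exact path Open MPI builds is not reproduced here.

```python
import os
import tempfile

# Limit quoted in this thread for OS X (Yosemite). Other systems differ,
# so treat this as a parameter rather than a universal constant.
SUN_PATH_MAX = 104


def socket_path_fits(path, limit=SUN_PATH_MAX):
    """Return True if `path` plus one terminating NUL byte fits in `limit` bytes."""
    return len(os.fsencode(path)) + 1 <= limit


def headroom(tmpdir, limit=SUN_PATH_MAX):
    """Bytes left for whatever gets appended after `tmpdir` (separators included)."""
    return limit - 1 - len(os.fsencode(tmpdir))


if __name__ == "__main__":
    # Fall back to Python's notion of the temp directory if TMPDIR is unset.
    tmpdir = os.environ.get("TMPDIR", tempfile.gettempdir())
    print(f"TMPDIR={tmpdir!r} leaves {headroom(tmpdir)} bytes for the rest of the socket path")
```

A long default $TMPDIR under /var/folders/... leaves far fewer bytes than TMPDIR=/tmp, so the hostname and session directory names appended afterwards are more likely to push the full path over the limit.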
Cheers,

Gilles

For those who want to know more:

- Open MPI uses two Unix sockets, one created by oob/usock and one by pmix.
- To keep things simple, the oob/usock Unix socket path is based on $TMPDIR, the hostname and quite a few more characters. The default $TMPDIR on OS X is not short, so when we append the FQDN (which might not be short either) and other path components, the result may exceed the maximum path length allowed for a Unix socket (104 bytes on Yosemite). This path is currently silently truncated, so bad and hard-to-understand things can happen. The patch disqualifies oob/usock instead of silently truncating the path.
  A simple workaround is to
      export TMPDIR=/tmp
  A better workaround is to
      mpirun --mca oob ^usock ...
  or you can add to your environment
      export OMPI_MCA_oob=^usock
  and then use mpirun as usual.
- The pmix Unix socket path is only based on $TMPDIR plus a few extra characters.

Bottom line: unless your $TMPDIR is insanely long, you should be fine with one of these workarounds, or the patch available at https://github.com/open-mpi/ompi/pull/2135.patch, or by using the latest open-mpi from Homebrew.

On Fri, Sep 23, 2016 at 11:15 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Justin,
>
> the root cause could be the length of $TMPDIR that might cause some path
> being truncated.
>
> you can check that by simply using a custom $TMPDIR that has the same size
> than the original one
>
> which version of OSX are you running ?
>
> this might explain why Nathan nor i were able to reproduce the issue, and
> i'd like to understand why this
> issue went undetected by Open MPI
>
> Cheers,
>
> Gilles
>
> On 9/23/2016 3:12 AM, Justin Chang wrote:
>>
>> Oh, so setting this in my ~/.profile
>>
>> export TMPDIR=/tmp
>>
>> in fact solves my problem completely! Not sure why this is the case, but
>> thanks!
>>
>> Justin
>>
>> On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>>>
>>> Justin,
>>>
>>> i do not see this error on my laptop
>>>
>>> which version of OS X are you running ?
>>>
>>> can you try to
>>> TMPDIR=/tmp mpirun -n 1
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Thu, Sep 22, 2016 at 7:21 PM, Nathan Hjelm <hje...@me.com> wrote:
>>>>
>>>> FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI
>>>> 2.0.1 installed through homebrew:
>>>>
>>>> ✗ brew -v
>>>> Homebrew 1.0.0 (git revision c3105; last commit 2016-09-22)
>>>> Homebrew/homebrew-core (git revision 227e; last commit 2016-09-22)
>>>>
>>>> ✗ brew info openmpi
>>>>
>>>> open-mpi: stable 2.0.1 (bottled), HEAD
>>>> High performance message passing library
>>>> https://www.open-mpi.org/
>>>> Conflicts with: lcdf-typetools, mpich
>>>> /usr/local/Cellar/open-mpi/2.0.1 (688 files, 8.3M) *
>>>> Poured from bottle on 2016-09-22 at 03:53:35
>>>> From:
>>>> https://github.com/Homebrew/homebrew-core/blob/master/Formula/open-mpi.rb
>>>> ==> Dependencies
>>>> Required: libevent ✔
>>>> ==> Options
>>>> --c++11
>>>> Build using C++11 mode
>>>> --with-cxx-bindings
>>>> Enable C++ MPI bindings (deprecated as of MPI-3.0)
>>>> --with-java
>>>> Build with java support
>>>> --with-mpi-thread-multiple
>>>> Enable MPI_THREAD_MULTIPLE
>>>> --without-fortran
>>>> Build without fortran support
>>>> --HEAD
>>>> Install HEAD version
>>>>
>>>> ✗ type -p mpicc
>>>> mpicc is /usr/local/bin/mpicc
>>>>
>>>> ✗ mpirun --version
>>>> mpirun (Open MPI) 2.0.1
>>>>
>>>> Report bugs to http://www.open-mpi.org/community/help/
>>>>
>>>> ✗ mpirun ./ring_c
>>>> Process 0 sending 10 to 1, tag 201 (4 processes in ring)
>>>> Process 0 sent to 1
>>>> Process 0 decremented value: 9
>>>> Process 0 decremented value: 8
>>>> Process 0 decremented value: 7
>>>> Process 0 decremented value: 6
>>>> Process 0 decremented value: 5
>>>> Process 0 decremented value: 4
>>>> Process 0 decremented value: 3
>>>> Process 0 decremented value: 2
>>>> Process 0 decremented value: 1
>>>> Process 0 decremented value: 0
>>>> Process 0 exiting
>>>> Process 1 exiting
>>>> Process 2 exiting
>>>> Process 3 exiting
>>>>
>>>> -Nathan
>>>>
>>>>> On Sep 22, 2016, at 3:31 AM, Justin Chang <jychan...@gmail.com> wrote:
>>>>>
>>>>> I tried that and also deleted everything inside $TMPDIR. The error
>>>>> still persists
>>>>>
>>>>> On Thu, Sep 22, 2016 at 4:21 AM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>>>>>
>>>>>> Try removing the “pmix” entries as well
>>>>>>
>>>>>>> On Sep 22, 2016, at 2:19 AM, Justin Chang <jychan...@gmail.com> wrote:
>>>>>>>
>>>>>>> "mpirun -n 1" was just to demonstrate that I get those error messages.
>>>>>>> I ran a simple helloworld.c and it still gives those two messages.
>>>>>>>
>>>>>>> I did delete openmpi-sessions-* from my $TMPDIR but it doesn't solve
>>>>>>> the problem. Here's my $TMPDIR:
>>>>>>>
>>>>>>> ~ cd $TMPDIR
>>>>>>> ~ pwd
>>>>>>> /var/folders/jd/qh5zn6jn5kz_byz9gxz5kl2m0000gn/T
>>>>>>> ~ ls
>>>>>>> MediaCache
>>>>>>> TemporaryItems
>>>>>>> com.apple.AddressBook.ContactsAccountsService
>>>>>>> com.apple.AddressBook.InternetAccountsBridge
>>>>>>> com.apple.AirPlayUIAgent
>>>>>>> com.apple.BKAgentService
>>>>>>> com.apple.CalendarAgent
>>>>>>> com.apple.CalendarAgent.CalNCService
>>>>>>> com.apple.CloudPhotosConfiguration
>>>>>>> com.apple.DataDetectorsDynamicData
>>>>>>> com.apple.ICPPhotoStreamLibraryService
>>>>>>> com.apple.InputMethodKit.TextReplacementService
>>>>>>> com.apple.PhotoIngestService
>>>>>>> com.apple.Preview
>>>>>>> com.apple.Safari
>>>>>>> com.apple.SocialPushAgent
>>>>>>> com.apple.WeatherKitService
>>>>>>> com.apple.cloudphotosd
>>>>>>> com.apple.dt.XCDocumenter.XCDocumenterExtension
>>>>>>> com.apple.dt.XcodeBuiltInExtensions
>>>>>>> com.apple.geod
>>>>>>> com.apple.iCal.CalendarNC
>>>>>>> com.apple.lateragent
>>>>>>> com.apple.ncplugin.stocks
>>>>>>> com.apple.ncplugin.weather
>>>>>>> com.apple.notificationcenterui.WeatherSummary
>>>>>>> com.apple.photolibraryd
>>>>>>> com.apple.photomoments
>>>>>>> com.apple.quicklook.ui.helper
>>>>>>> com.apple.soagent
>>>>>>> com.getdropbox.dropbox.garcon
>>>>>>> icdd501
>>>>>>> ics21406
>>>>>>> openmpi-sessions-501@Justins-MacBook-Pro-2_0
>>>>>>> pmix-12195
>>>>>>> pmix-12271
>>>>>>> pmix-12289
>>>>>>> pmix-12295
>>>>>>> pmix-12304
>>>>>>> pmix-12313
>>>>>>> pmix-12367
>>>>>>> pmix-12397
>>>>>>> pmix-12775
>>>>>>> pmix-12858
>>>>>>> pmix-17118
>>>>>>> pmix-1754
>>>>>>> pmix-20632
>>>>>>> pmix-20793
>>>>>>> pmix-20849
>>>>>>> pmix-21019
>>>>>>> pmix-22316
>>>>>>> pmix-8129
>>>>>>> pmix-8494
>>>>>>> xcrun_db
>>>>>>> ~ rm -rf openmpi-sessions-501@Justins-MacBook-Pro-2_0
>>>>>>> ~ mpirun -n 1
>>>>>>> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] bind() failed on error Address already in use (48)
>>>>>>> [Justins-MacBook-Pro-2.local:22527] [[12992,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>> No executable was specified on the mpirun command line.
>>>>>>>
>>>>>>> Aborting.
>>>>>>>
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> and when I type "ls" the directory
>>>>>>> "openmpi-sessions-501@Justins-MacBook-Pro-2_0" reappeared. Unless
>>>>>>> there's a different directory I need to look for?
>>>>>>>
>>>>>>> On Thu, Sep 22, 2016 at 4:08 AM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> Maybe I’m missing something, but “mpirun -n 1” doesn’t include the
>>>>>>>> name of an application to execute.
>>>>>>>>
>>>>>>>> The error message prior to that error indicates that you have some
>>>>>>>> cruft sitting in your tmpdir. You just need to clean it out - look for
>>>>>>>> something that starts with “openmpi”
>>>>>>>>
>>>>>>>>> On Sep 22, 2016, at 1:45 AM, Justin Chang <jychan...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Dear all,
>>>>>>>>>
>>>>>>>>> So I upgraded/updated my Homebrew on my Macbook and installed Open MPI
>>>>>>>>> 2.0.1 using "brew install openmpi". However, when I open up a terminal
>>>>>>>>> and type "mpirun -n 1" I get the following messages:
>>>>>>>>>
>>>>>>>>> ~ mpirun -n 1
>>>>>>>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] bind() failed on error Address already in use (48)
>>>>>>>>> [Justins-MacBook-Pro-2.local:20793] [[13318,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> No executable was specified on the mpirun command line.
>>>>>>>>>
>>>>>>>>> Aborting.
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> I have never seen anything like the first two lines. I also installed
>>>>>>>>> python and mpi4py via pip, and when I still get the same messages:
>>>>>>>>>
>>>>>>>>> ~ python -c "from mpi4py import MPI"
>>>>>>>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] bind() failed on error Address already in use (48)
>>>>>>>>> [Justins-MacBook-Pro-2.local:20871] [[13496,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
>>>>>>>>>
>>>>>>>>> But now if I add "mpirun -n 1" I get the following:
>>>>>>>>>
>>>>>>>>> ~ mpirun -n 1 python -c "from mpi4py import MPI"
>>>>>>>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] bind() failed on error Address already in use (48)
>>>>>>>>> [Justins-MacBook-Pro-2.local:20935] [[13560,0],0] ORTE_ERROR_LOG: Error in file oob_usock_component.c at line 228
>>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] usock_peer_send_blocking: send() to socket 17 failed: Socket is not connected (57)
>>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] ORTE_ERROR_LOG: Unreachable in file oob_usock_connection.c at line 315
>>>>>>>>> [Justins-MacBook-Pro-2.local:20936] [[13560,1],0] orte_usock_peer_try_connect: usock_peer_send_connect_ack to proc [[13560,0],0] failed: Unreachable (-12)
>>>>>>>>> [Justins-MacBook-Pro-2:20936] *** Process received signal ***
>>>>>>>>> [Justins-MacBook-Pro-2:20936] Signal: Segmentation fault: 11 (11)
>>>>>>>>> [Justins-MacBook-Pro-2:20936] Signal code: (0)
>>>>>>>>> [Justins-MacBook-Pro-2:20936] Failing at address: 0x0
>>>>>>>>> -------------------------------------------------------
>>>>>>>>> Primary job terminated normally, but 1 process returned
>>>>>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>>>>>> -------------------------------------------------------
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun detected that one or more processes exited with non-zero status, thus causing
>>>>>>>>> the job to be terminated. The first process to do so was:
>>>>>>>>>
>>>>>>>>> Process name: [[13560,1],0]
>>>>>>>>> Exit code: 1
>>>>>>>>>
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Clearly something is wrong here. I already tried things like
>>>>>>>>> "rm -rf $TMPDIR/openmpi-sessions-*" but said directory keeps reappearing
>>>>>>>>> and the error persists. Why does this happen and how do I fix it? For
>>>>>>>>> what it's worth, here's some other information that may help:
>>>>>>>>>
>>>>>>>>> ~ mpicc --version
>>>>>>>>> Apple LLVM version 8.0.0 (clang-800.0.38)
>>>>>>>>> Target: x86_64-apple-darwin15.6.0
>>>>>>>>> Thread model: posix
>>>>>>>>> InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
>>>>>>>>>
>>>>>>>>> I tested Hello World with both mpicc and mpif90, and they still work
>>>>>>>>> despite showing those two error/warning messages.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Justin
>>>>>>>>> _______________________________________________
>>>>>>>>> users mailing list
>>>>>>>>> users@lists.open-mpi.org
>>>>>>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users