Re: [OMPI users] OMPI seg fault by a class with weird address.
Hi Jack,
1- Where is your main function, so we can see how you called your class?
2- I do not see the implementation of GetPosition, GetName, etc.

With best regards,
-Belaid.

From: dtustud...@hotmail.com
To: us...@open-mpi.org
Date: Mon, 14 Mar 2011 19:04:12 -0600
Subject: [OMPI users] OMPI seg fault by a class with weird address.

Hi,

I got a run-time error from an Open MPI C++ program. The following output is from gdb:

--------------------------------------------------------------------
Program received signal SIGSEGV, Segmentation fault.
0x2b3b0b81 in opal_memory_ptmalloc2_int_malloc ()
   from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0

At the point:

Breakpoint 9, Index::Index (this=0x7fffcb80) at src/index.cpp:20
20          Name(0) {}

The Index constructor has been called before this point with no problem:
--------------------------------------------------------------------
Breakpoint 9, Index::Index (this=0x117d800) at src/index.cpp:20
20          Name(0) {}
(gdb) c
Continuing.

Breakpoint 9, Index::Index (this=0x117d860) at src/index.cpp:20
20          Name(0) {}
(gdb) c
Continuing.
--------------------------------------------------------------------

It seems that the 0x7fffcb80 address is the problem, but I do not know the reason or how to remove the bug. Any help is really appreciated. Thanks.

The following is the Index definition:

class Index {
public:
    Index();
    Index(const Index& rhs);
    ~Index();
    Index& operator=(const Index& rhs);

    vector<int> GetPosition() const;
    vector<int> GetColumn() const;
    vector<int> GetYear() const;
    vector<string> GetName() const;
    int GetPosition(const int idx) const;
    int GetColumn(const int idx) const;
    int GetYear(const int idx) const;
    string GetName(const int idx) const;
    int GetSize() const;

    void Add(const int idx, const int col, const string& name);
    void Add(const int idx, const int col, const int year, const string& name);
    void Add(const int idx, const Step& col, const string& name);
    void WriteFile(const char* fileinput) const;

private:
    vector<int> Position;
    vector<int> Column;
    vector<int> Year;
    vector<string> Name;
};

// Constructors and destructor for the Index class
Index::Index() : Position(0), Column(0), Year(0), Name(0) {}

Index::Index(const Index& rhs)
    : Position(rhs.GetPosition()), Column(rhs.GetColumn()),
      Year(rhs.GetYear()), Name(rhs.GetName()) {}

Index::~Index() {}

Index& Index::operator=(const Index& rhs) {
    Position = rhs.GetPosition();
    Column = rhs.GetColumn();
    Year = rhs.GetYear();
    Name = rhs.GetName();
    return *this;
}
Re: [OMPI users] OMPI seg fault by a class with weird address.
Hi,

Because the code is very long, I just show the calling relationship of the functions:

main() {
    scheduler();
}

scheduler() {
    ImportIndices();
}

ImportIndices() {
    Index IdxNode;
    IdxNode = ReadFile("fileName");
}

Index ReadFile(const char* fileinput) {
    Index TempIndex;
    ...
}

vector<int> Index::GetPosition() const { return Position; }
vector<int> Index::GetColumn() const { return Column; }
vector<int> Index::GetYear() const { return Year; }
vector<string> Index::GetName() const { return Name; }
int Index::GetPosition(const int idx) const { return Position[idx]; }
int Index::GetColumn(const int idx) const { return Column[idx]; }
int Index::GetYear(const int idx) const { return Year[idx]; }
string Index::GetName(const int idx) const { return Name[idx]; }
int Index::GetSize() const { return Position.size(); }

The sequential code works well, and it has no scheduler().

The parallel code output from gdb:
--------------------------------------------------------------------
Breakpoint 1, myNeplanTaskScheduler(CNSGA2 *, int, int, int, ._85 *, char, int,
    message_para_to_workers_VecT &, MPI_Datatype, int &, int &,
    std::vector >, std::allocator > > > &,
    std::vector >, std::allocator > > > &,
    std::vector > &, int,
    std::vector >, std::allocator > > > &,
    MPI_Datatype, int, MPI_Datatype, int) (nsga2=0x118c490,
    popSize=<value optimized out>, nodeSize=<value optimized out>,
    myRank=<value optimized out>, myChildpop=0x1208d80, genCandTag=65 'A',
    generationNum=1, myPopParaVec=std::vector of length 4, capacity 4 = {...},
    message_to_master_type=0x7fffd540, myT1Flag=@0x7fffd68c,
    myT2Flag=@0x7fffd688,
    resultTaskPackageT1=std::vector of length 4, capacity 4 = {...},
    resultTaskPackageT2Pr=std::vector of length 4, capacity 4 = {...},
    xdataV=std::vector of length 4, capacity 4 = {...}, objSize=7,
    resultTaskPackageT12=std::vector of length 4, capacity 4 = {...},
    xdata_to_workers_type=0x121c410, myGenerationNum=1,
    Mpara_to_workers_type=0x121b9b0, nconNum=0)
    at src/nsga2/myNetplanScheduler.cpp:109
109         ImportIndices();
(gdb) c
Continuing.

Breakpoint 2, ImportIndices () at src/index.cpp:120
120         IdxNode = ReadFile("prepdata/idx_node.csv");
(gdb) c
Continuing.

Breakpoint 4, ReadFile (fileinput=0xd8663d "prepdata/idx_node.csv")
    at src/index.cpp:86
86          Index TempIndex;
(gdb) c
Continuing.

Breakpoint 5, Index::Index (this=0x7fffcb80) at src/index.cpp:20
20          Name(0) {}
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x2b3b0b81 in opal_memory_ptmalloc2_int_malloc ()
   from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0

--------------------------------------------------------------------
The backtrace output from the above parallel Open MPI code:

(gdb) bt
#0  0x2b3b0b81 in opal_memory_ptmalloc2_int_malloc ()
   from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
#1  0x2b3b2bd3 in opal_memory_ptmalloc2_malloc ()
   from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
#2  0x003f7c8bd1dd in operator new(unsigned long) ()
   from /usr/lib64/libstdc++.so.6
#3  0x004646a7 in __gnu_cxx::new_allocator::allocate (this=0x7fffcb80, __n=0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:88
#4  0x004646cf in std::_Vector_base >::_M_allocate (this=0x7fffcb80, __n=0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:127
#5  0x00464701 in std::_Vector_base >::_Vector_base (this=0x7fffcb80, __n=0, __a=...)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:113
#6  0x00464d0b in std::vector >::vector (this=0x7fffcb80, __n=0, __value=@0x7fffc968, __a=...)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:216
#7  0x004890d7 in Index::Index (this=0x7fffcb80)
---Type <return> to continue, or q <return> to quit---
    at src/index.cpp:20
#8  0x0048927a in ReadFile (fileinput=0xd8663d "prepdata/idx_node.csv")
    at src/index.cpp:86
#9  0x00489533 in ImportIndices () at src/index.cpp:120
#10 0x00445e0e in myNeplanTaskScheduler(CNSGA2 *, int, int, int, ._85 *, char, int,
    message_para_to_workers_VecT &, MPI_Datatype, int &, int &,
    std::vector >, std::allocator > > > &,
    std::vector >, std::allocator > > > &,
    std::vector > &, int,
    std::vector >, std::allocator > > > &,
    MPI_Datatype, int, MPI_Datatype, int) (nsga2=0x118c490,
    popSize=<value optimized out>, nodeSize=<value optimized out>,
    myRank=<value optimized out>, myChildpop=0x1208d80, genCandTag=65 'A',
    generationNum=1, myPopParaVec=std::vector of length 4, capacity 4 = {...},
    message_to_master_type=0x7fffd540, myT1Flag=@0x7fffd68c,
    myT2Flag=@0x7fffd688,
    resultTaskPackageT1=std::vector of length 4, capacity 4 = {...},
    resultTaskPackageT2Pr=std::vector of length 4, capacity 4 = {..
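For anyone who wants to try this locally, the fragments above can be assembled into a minimal self-contained reduction. The vector element types (int/string) and the return-by-value pattern are inferences from the getters and the gdb output, not the poster's actual code, so adjust to match:

    // reduction.cpp -- hypothetical minimal reproduction of the call path
    // main() -> scheduler() -> ImportIndices() -> ReadFile() -> Index copy.
    #include <string>
    #include <vector>
    using std::string;
    using std::vector;

    class Index {
    public:
        Index() : Position(0), Column(0), Year(0), Name(0) {}
        Index(const Index& rhs)
            : Position(rhs.Position), Column(rhs.Column),
              Year(rhs.Year), Name(rhs.Name) {}
        Index& operator=(const Index& rhs) {
            Position = rhs.Position; Column = rhs.Column;
            Year = rhs.Year; Name = rhs.Name;
            return *this;
        }
    private:
        vector<int> Position;
        vector<int> Column;
        vector<int> Year;
        vector<string> Name;
    };

    Index ReadFile(const char* /*fileinput*/) {
        Index TempIndex;   // constructed on the stack (the this=0x7fff... frame)
        return TempIndex;  // returned by value -> copy constructor
    }

    void ImportIndices() {
        Index IdxNode;
        IdxNode = ReadFile("fileName");  // copy assignment from the temporary
    }

    int main() {
        ImportIndices();   // scheduler() omitted; it only forwards the call
        return 0;
    }

If this compiles with mpic++ and runs cleanly under mpirun, the corruption is almost certainly coming from elsewhere in the real program.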
Re: [OMPI users] OMPI seg fault by a class with weird address.
Hi Jack,
I may need to see the whole code to decide, but my quick look suggests that ptmalloc is causing a problem with the STL vector allocation. ptmalloc is the Open MPI internal malloc library. Could you try to build Open MPI without memory management (using --without-memory-manager) and let us know the outcome? ptmalloc is not needed if you are not using an RDMA interconnect.

With best regards,
-Belaid.

From: dtustud...@hotmail.com
To: belaid_...@hotmail.com; us...@open-mpi.org
Subject: RE: [OMPI users] OMPI seg fault by a class with weird address.
Date: Tue, 15 Mar 2011 00:30:19 -0600

> Hi,
> Because the code is very long, I just show the calling relationship of
> the functions. [...]
> The sequential code works well, and it has no scheduler(). [...]
>
> Breakpoint 5, Index::Index (this=0x7fffcb80) at src/index.cpp:20
> 20          Name(0) {}
> (gdb) c
> Continuing.
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x2b3b0b81 in opal_memory_ptmalloc2_int_malloc ()
>    from /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0
> [... full gdb backtrace quoted above ...]
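For reference, the rebuild suggested above does not require root if Open MPI is installed into a user-writable prefix; the version and paths below are illustrative, matching the 1.3.4 install seen in the backtraces:

    # Hypothetical user-local rebuild without the ptmalloc memory manager
    tar xjf openmpi-1.3.4.tar.bz2
    cd openmpi-1.3.4
    ./configure --prefix=$HOME/openmpi-1.3.4-nomem --without-memory-manager
    make -j4 all && make install
    # Put $HOME/openmpi-1.3.4-nomem/bin and lib on PATH/LD_LIBRARY_PATH
    # before recompiling and rerunning the application.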
Re: [OMPI users] OpenMPI without IPoIB
On Monday, March 14, 2011 09:37:54 pm Bernardo F Costa wrote:
> Ok. Native ibverbs/openib is preferable although cannot be used by all
> applications (those who do not have a native ip interface).

Applications (in this context, at least) use the MPI interface. MPI in general, and Open MPI in particular, can and should run on top of verbs (btl:openib) or PSM (mtl:psm), for Mellanox or QLogic hardware respectively.

/Peter
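As a sketch (my_app is a placeholder, and component availability depends on how your Open MPI was built), the native paths can be selected explicitly with MCA parameters on the mpirun command line:

    # Run over native InfiniBand verbs, no IPoIB involved
    # (self and sm handle loopback and on-node shared memory)
    mpirun --mca btl openib,self,sm -np 4 ./my_app

    # QLogic hardware would instead use PSM via the cm PML
    mpirun --mca pml cm --mca mtl psm -np 4 ./my_app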
Re: [OMPI users] OMPI seg fault by a class with weird address.
You may also want to run your program through a memory-checking debugger such as valgrind to see if it turns up any other problems.

AFAIK, ptmalloc should be fine for use with STL vector allocation.

On Mar 15, 2011, at 4:00 AM, Belaid MOA wrote:
> Hi Jack,
> I may need to see the whole code to decide, but my quick look suggests
> that ptmalloc is causing a problem with the STL vector allocation.
> ptmalloc is the Open MPI internal malloc library. Could you try to build
> Open MPI without memory management (using --without-memory-manager) and
> let us know the outcome? ptmalloc is not needed if you are not using an
> RDMA interconnect.
>
> With best regards,
> -Belaid.
>
> [... earlier messages and gdb output quoted above ...]
[OMPI users] PGI 10.9 build failures
I am building OFED-1.5.3 on CentOS 5.5 (OFED-1.5.2 builds fine), and everything succeeds except for openmpi_pgi (Open MPI 1.4.3). For version 10.9 of the PGI compilers I get:

configure: WARNING: Your compiler does not support offsetof macro
configure: error: Configure: Cannot continue
error: Bad exit status from /var/tmp/rpm-tmp.66872 (%build)

I tried two fixes I found on the web for this offsetof error (about a year old), but they both failed in the same way.

We are licensed up to 11.1 for PGI, but both MVAPICH2 and Open MPI fail with it. It looks like that bug is fixed in 11.2.

Any idea what is wrong with openmpi and pgi 10.9?

thanks,
Ben
Re: [OMPI users] PGI 10.9 build failures
I'm afraid that this is a bug in the PGI compiler -- Open MPI uses the offsetof() macro in several places throughout its code base. This is why we put in the configure test that tells you that your compiler does not support it -- we got a lot of reports of this issue during Open MPI's build phase, so we decided to add a specific configure test that would tell you if your compiler was buggy.

Sorry. :-(

On Mar 15, 2011, at 7:09 AM, Ben Miller wrote:
> I am building OFED-1.5.3 on CentOS 5.5 (OFED-1.5.2 builds fine), and
> everything succeeds except for openmpi_pgi (Open MPI 1.4.3). For version
> 10.9 of the PGI compilers I get:
>
> configure: WARNING: Your compiler does not support offsetof macro
> configure: error: Configure: Cannot continue
> error: Bad exit status from /var/tmp/rpm-tmp.66872 (%build)
> [...]

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
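For anyone wanting to reproduce the failure outside of Open MPI's build, a standalone probe along these lines (an approximation only; the real configure test may differ) should compile cleanly and exit 0 with a working compiler:

    // offsetof_probe.cpp -- minimal check that offsetof works on a POD type
    #include <cstddef>

    struct probe {
        char c;
        double d;
    };

    int main() {
        // A buggy offsetof typically fails to compile or yields a bogus value;
        // d must sit at a nonzero offset after c.
        return (offsetof(probe, d) >= sizeof(char)) ? 0 : 1;
    }

Compile it with the suspect compiler, e.g. pgCC offsetof_probe.cpp -o probe && ./probe; echo $?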
Re: [OMPI users] OMPI seg fault by a class with weird address.
Thanks.

I do not have system administrator authorization, so I am afraid that I cannot rebuild Open MPI with --without-memory-manager.

Are there other ways to get around it? For example, using something else to replace ptmalloc?

Any help is really appreciated. Thanks.

From: belaid_...@hotmail.com
To: dtustud...@hotmail.com; us...@open-mpi.org
Subject: RE: [OMPI users] OMPI seg fault by a class with weird address.
Date: Tue, 15 Mar 2011 08:00:56 +0000

> Hi Jack,
> I may need to see the whole code to decide, but my quick look suggests
> that ptmalloc is causing a problem with the STL vector allocation.
> ptmalloc is the Open MPI internal malloc library. Could you try to build
> Open MPI without memory management (using --without-memory-manager) and
> let us know the outcome? ptmalloc is not needed if you are not using an
> RDMA interconnect.
> [...]
Re: [OMPI users] OMPI seg fault by a class with weird address.
Thanks.

From http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap I find that "Currently the wrappers are only buildable with mpiccs which are based on GNU GCC or Intel's C++ Compiler."

The cluster which I am working on uses the GNU Open MPI mpic++; I am not sure the valgrind MPI wrapper can work here.

I do not have system administrator authorization.

Are there other (open-source) memory checkers that can do this?

thanks
Jack

> Subject: Re: [OMPI users] OMPI seg fault by a class with weird address.
> From: jsquy...@cisco.com
> Date: Tue, 15 Mar 2011 06:19:53 -0400
>
> You may also want to run your program through a memory-checking debugger
> such as valgrind to see if it turns up any other problems.
>
> AFAIK, ptmalloc should be fine for use with STL vector allocation.
> [...]
Re: [OMPI users] OMPI seg fault by a class with weird address.
I -think- setting OMPI_MCA_memory_ptmalloc2_disable to 1 will turn off OMPI's memory wrappers without having to rebuild. Someone please correct me if I'm wrong :-).

For example (bash-like shell):

export OMPI_MCA_memory_ptmalloc2_disable=1

Hope that helps,

--
Samuel K. Gutierrez
Los Alamos National Laboratory

On Mar 15, 2011, at 9:19 AM, Jack Bryan wrote:
> Thanks.
> I do not have system administrator authorization, so I am afraid that I
> cannot rebuild Open MPI with --without-memory-manager.
> Are there other ways to get around it? For example, using something else
> to replace ptmalloc?
> [...]
Re: [OMPI users] OMPI seg fault by a class with weird address.
I have tried

export OMPI_MCA_memory_ptmalloc2_disable=1

It does not work; I get the same error.

thanks

From: sam...@lanl.gov
To: us...@open-mpi.org
Date: Tue, 15 Mar 2011 09:27:35 -0600
Subject: Re: [OMPI users] OMPI seg fault by a class with weird address.

> I -think- setting OMPI_MCA_memory_ptmalloc2_disable to 1 will turn off
> OMPI's memory wrappers without having to rebuild. Someone please correct
> me if I'm wrong :-).
>
> For example (bash-like shell):
> export OMPI_MCA_memory_ptmalloc2_disable=1
> [...]
Re: [OMPI users] OMPI seg fault by a class with weird address.
This should be the configure info about the Open MPI I am using:

-bash-3.2$ mpic++ -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --disable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-50)

thanks

From: sam...@lanl.gov
To: us...@open-mpi.org
Date: Tue, 15 Mar 2011 09:27:35 -0600
Subject: Re: [OMPI users] OMPI seg fault by a class with weird address.

> I -think- setting OMPI_MCA_memory_ptmalloc2_disable to 1 will turn off
> OMPI's memory wrappers without having to rebuild.
> [...]
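As an aside, mpic++ -v reports the configuration of the underlying GCC, not of Open MPI itself; the MPI library's own build information comes from ompi_info (field names vary by version, so take the grep pattern below as an assumption):

    # Show how the Open MPI installation itself was built
    ompi_info | grep -i configured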
Re: [OMPI users] OMPI seg fault by a class with weird address.
Hi,
I think it is time to see the actual code. :) Would it be possible to send us a part of the code that we can run and test with?

With best regards,
-Belaid.

From: dtustud...@hotmail.com
To: us...@open-mpi.org
Date: Tue, 15 Mar 2011 09:44:35 -0600
Subject: Re: [OMPI users] OMPI seg fault by a class with weird address.

> This should be the configure info about the Open MPI I am using:
>
> -bash-3.2$ mpic++ -v
> Using built-in specs.
> Target: x86_64-redhat-linux
> [...]
Re: [OMPI users] OMPI seg fault by a class with weird address.
You can:

mpirun -np 4 valgrind ./my_application

That is, you run 4 copies of valgrind, each with one instance of ./my_application. Then you'll get valgrind reports for your applications. You might want to dig into the valgrind command line options to have it dump the results to files with unique prefixes (e.g., PID and/or hostname) so that you get a unique report from each process.

If you disabled ptmalloc and you're still getting the same error, then it sounds like an application error. Check out what valgrind tells you.

On Mar 15, 2011, at 11:25 AM, Jack Bryan wrote:
> Thanks.
> From http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mpiwrap
> I find that "Currently the wrappers are only buildable with mpiccs which
> are based on GNU GCC or Intel's C++ Compiler."
> The cluster which I am working on uses the GNU Open MPI mpic++.
> I do not have system administrator authorization.
> Are there other (open-source) memory checkers that can do this?
> [...]
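One way to get those per-process files: valgrind's --log-file option expands %p to the PID, and %q{VAR} to an environment variable. The OMPI_COMM_WORLD_RANK variable below is set by Open MPI's launcher in 1.3-era releases, but treat the exact name as an assumption for your version:

    # One valgrind log per MPI process, keyed by PID
    mpirun -np 4 valgrind --log-file=vg.%p.log ./my_application

    # Or keyed by MPI rank, if the launcher exports OMPI_COMM_WORLD_RANK
    mpirun -np 4 valgrind --log-file=vg.rank-%q{OMPI_COMM_WORLD_RANK}.log ./my_application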
Re: [OMPI users] OpenMPI without IPoIB
On Mar 14, 2011, at 4:37 PM, Bernardo F Costa wrote:
> I've tried ibdiagnet and other OFED tools. I also tried to debug the
> network environment with simple jobs to measure bandwidth and latency.
> In most cases, I've seen high peaks in the measurements that come and go
> without any reason I could catch for now. I believe I should check the
> network configuration and make some tests on it. Does anybody here know
> some reference about configuring InfiniBand without IPoIB and/or the
> issues raised when doing this? If possible, I'd like to see ways of
> testing the configuration, or know about options that could increase
> fault tolerance. I know this is somewhat basic, but I am not a very
> experienced InfiniBand user.

You might want to ping your IB vendor and ask for some guidance.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] OpenMPI without IPoIB
I would recommend you to read the OFED (or Mellanox OFED) documentation. It will be a good starting point.

Regards,
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Mar 14, 2011, at 4:37 PM, Bernardo F Costa wrote:
> Ok. Native ibverbs/openib is preferable, although it cannot be used by
> all applications (those that do not have a native IP interface). I
> suppose that if I configure my network nodes to use IPoIB (by simply
> probing the ib_ipoib module) I'd still be able to use the native ibverbs
> interface without any delay caused by IPoIB on it. And this way other
> applications which aren't able to use native ibverbs could use the
> InfiniBand network as well. That should be the reason why some people
> use IPoIB, I believe: just to offer InfiniBand to all network
> applications. The main reason I've asked this question is that I have
> seen lots of references on the net on how to configure an InfiniBand
> network with IPoIB, but not many on doing the same without IPoIB. This
> made me believe configuring InfiniBand with IPoIB could be the popular
> option.
> I've tried ibdiagnet and other OFED tools. I also tried to debug the
> network environment with simple jobs to measure bandwidth and latency.
> In most cases, I've seen high peaks in the measurements that come and go
> without any reason I could catch for now. I believe I should check the
> network configuration and make some tests on it. Does anybody here know
> some reference about configuring InfiniBand without IPoIB and/or the
> issues raised when doing this? If possible, I'd like to see ways of
> testing the configuration, or know about options that could increase
> fault tolerance. I know this is somewhat basic, but I am not a very
> experienced InfiniBand user.
>
> 2011/3/14 :
>>
>> Please see my comment below.
>>
>> Pavel (Pasha) Shamis
>> ---
>> Application Performance Tools Group
>> Computer Science and Math Division
>> Oak Ridge National Laboratory
>>
>> On Mar 11, 2011, at 2:47 PM, Bernardo F Costa wrote:
>>
>>> I have found this thread from two years ago. I am somewhat lost on
>>> configuring an InfiniBand cluster for Open MPI. What is best: use
>>> IPoIB or the native InfiniBand ibverbs interface? For now I am using
>>> native
>>
>> The native openib/verbs interface will work much faster (up to 10x)
>> than IPoIB. IPoIB was designed for applications that do not have a
>> native IP interface.
>>
>>> InfiniBand without IPoIB. But I have lots of problems, especially
>>> with latency in the cluster.
>>
>> If you see latency problems over the native interface (verbs), then you
>> will apparently face the same problem with any other application over
>> verbs, including IPoIB.
>>
>> So using IPoIB instead of verbs is definitely not a workaround for you.
>>
>> I would suggest you run IB network debug tools, like ibdiagnet, in
>> order to analyze your network/latency problems.
>>
>> Regards,
>> Pasha
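As a concrete starting point, these OFED-level tools exercise the fabric without any IPoIB configured (tool names from standard OFED; exact options vary by release):

    # Fabric-wide discovery and error-counter scan
    ibdiagnet

    # Verify the HCA is visible and its ports are in PORT_ACTIVE state
    ibv_devinfo

    # Raw verbs latency/bandwidth sanity check between two nodes
    ibv_rc_pingpong                 # on the server node
    ibv_rc_pingpong <server-host>   # on the client node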
Re: [OMPI users] OMPI seg fault by a class with weird address.
Hi,

I have installed a new Open MPI 1.3.4, but I got more weird errors:

*** glibc detected *** /lustre/nsga2b: malloc(): memory corruption (fast): 0x1cafc450 ***
======= Backtrace: =========
/lib64/libc.so.6[0x3c50272aeb]
/lib64/libc.so.6(__libc_malloc+0x7a)[0x3c5027402a]
/usr/lib64/libstdc++.so.6(_Znwm+0x1d)[0x3c590bd17d]
/lustre/jxding/netplan49/nsga2b[0x445bc6]
/lustre/jxding/netplan49/nsga2b[0x44f43b]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3c5021d974]
/lustre/jxding/netplan49/nsga2b(__gxx_personality_v0+0x499)[0x443909]
======= Memory map: ========
00400000-00f33000 r-xp 00000000 6ac:e3210 685016360  /lustre/netplan49/nsga2b
01132000-0117e000 rwxp 00b32000 6ac:e3210 685016360  /lustre/netplan49/nsga2b
0117e000-01188000 rwxp 0117e000 00:00 0
1ca11000-1ca78000 rwxp 1ca11000 00:00 0
1ca78000-1ca79000 rwxp 1ca78000 00:00 0
1ca79000-1ca7a000 rwxp 1ca79000 00:00 0
1ca7a000-1cab8000 rwxp 1ca7a000 00:00 0
1cab8000-1cac7000 rwxp 1cab8000 00:00 0
1cac7000-1cacf000 rwxp 1cac7000 00:00 0
1cacf000-1cad0000 rwxp 1cacf000 00:00 0
1cad0000-1cad1000 rwxp 1cad0000 00:00 0
1cad1000-1cad2000 rwxp 1cad1000 00:00 0
1cad2000-1cada000 rwxp 1cad2000 00:00 0
1cada000-1cadc000 rwxp 1cada000 00:00 0
1cadc000-1cae0000 rwxp 1cadc000 00:00 0
...
3512600000-3512605000 r-xp 00000000 00:11 12043  /usr/lib64/librdmacm.so.1
3512605000-3512804000 ---p 00005000 00:11 12043  /usr/lib64/librdmacm.so.1
3512804000-3512805000 rwxp 00004000 00:11 12043  /usr/lib64/librdmacm.so.1
3512e00000-3512e0c000 r-xp 00000000 00:11 5545  /usr/lib64/libibverbs.so.1
3512e0c000-351300b000 ---p 0000c000 00:11 5545  /usr/lib64/libibverbs.so.1
351300b000-351300c000 rwxp 0000b000 00:11 5545  /usr/lib64/libibverbs.so.1
3c4f200000-3c4f21c000 r-xp 00000000 00:11 2853  /lib64/ld-2.5.so
3c4f41b000-3c4f41c000 r-xp 0001b000 00:11 2853  /lib64/ld-2.5.so
3c4f41c000-3c4f41d000 rwxp 0001c000 00:11 2853  /lib64/ld-2.5.so
3c50200000-3c5034c000 r-xp 00000000 00:11 897  /lib64/libc.so.6
3c5034c000-3c5054c000 ---p 0014c000 00:11 897  /lib64/libc.so.6
3c5054c000-3c50550000 r-xp 0014c000 00:11 897  /lib64/libc.so.6
3c50550000-3c50551000 rwxp 00150000 00:11 897  /lib64/libc.so.6
3c50551000-3c50556000 rwxp 3c50551000 00:00 0
3c50600000-3c50682000 r-xp 00000000 00:11 2924  /lib64/libm.so.6
3c50682000-3c50881000 ---p 00082000 00:11 2924  /lib64/libm.so.6
3c50881000-3c50882000 r-xp 00081000 00:11 2924  /lib64/libm.so.6
3c50882000-3c50883000 rwxp 00082000 00:11 2924  /lib64/libm.so.6
3c50a00000-3c50a02000 r-xp 00000000 00:11 923  /lib64/libdl.so.2
3c50a02000-3c50c02000 ---p 00002000 00:11 923  /lib64/libdl.so.2
3c50c02000-3c50c03000 r-xp 00002000 00:11 923  /lib64/libdl.so.2
3c50c03000-3c50c04000 rwxp 00003000 00:11 923  /lib64/libdl.so.2
3c50e00000-3c50e16000 r-xp 00000000 00:11 1011  /lib64/libpthread.so.0
...
2ae87b05e000-2ae87b075000 r-xp 00000000 6ac:e3210 686492235  /lustre/mpi_protocol_091117/openmpi134/lib/libmpi_cxx.so.0.0.0
2ae87b075000-2ae87b274000 ---p 00017000 6ac:e3210 686492235  /lustre/mpi_protocol_091117/openmpi134/lib/libmpi_cxx.so.0.0.0
2ae87b274000-2ae87b277000 rwxp 00016000 6ac:e3210 686492235  /lustre/mpi_protocol_091117/openmpi134/lib/libmpi_cxx.so.0.0.0
...
fff2fa38000-7fff2fa4e000 rwxp 7ffe9000 00:00 0  [stack]
ff60-ffe0 ---p 00:00 0  [vdso]

[n332:82320] *** Process received signal ***
[n332:82320] Signal: Aborted (6)
[n332:82320] Signal code: (-6)
[n332:82320] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
[n332:82320] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3c50230215]
[n332:82320] [ 2] /lib64/libc.so.6(abort+0x110) [0x3c50231cc0]
[n332:82320] [ 3] /lib64/libc.so.6 [0x3c5026a7fb]
[n332:82320] [ 4] /lib64/libc.so.6 [0x3c50272aeb]
[n332:82320] [ 5] /lib64/libc.so.6(__libc_malloc+0x7a) [0x3c5027402a]
[n332:82320] [ 6] /usr/lib64/libstdc++.so.6(_Znwm+0x1d) [0x3c590bd17d]
[n332:82320] [ 7] /lustre/jxding/netplan49/nsga2b [0x445bc6]
[n332:82320] [ 8] /lustre/jxding/netplan49/nsga2b [0x44f43b]
[n332:82320] [ 9] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
[n332:82320] [10] /lustre/nsga2b(__gxx_personality_v0+0x499) [0x443909]
[n332:82320] *** End of error message ***
=>> PBS: job killed: walltime 117 exceeded limit 90
mpirun: killing job...

> Subject: Re: [OMPI users] OMPI seg fault by a class with weird address.
> From:
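One generic glibc-level aid for localizing this kind of heap corruption (a general technique, not something suggested in this thread, and note that Open MPI's interposed ptmalloc may bypass it) is malloc's built-in consistency checking, which aborts at the first detected inconsistency instead of at a later allocation:

    # Enable glibc heap-consistency checks before launching the job
    export MALLOC_CHECK_=3
    mpirun -np 4 ./nsga2b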