[OMPI users] About valgrind and OpenMPI
Hi there,

I am using valgrind to help analyse my MPI program. I use the HDFS file system to read/write data. If I run the code without valgrind, it works correctly. However, if I run it with valgrind, for example

  mpirun -np 3 /usr/bin/valgrind --tool=callgrind ./myprogram /input_file /output_file

it returns the following information:

=
Exception in thread "main" java.lang.InternalError: processing event: 535548453
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:506)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1156)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1107)
at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1053)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:397)
at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:594)
at org.apache.hadoop.security.SecurityUtil.(SecurityUtil.java:67)
at org.apache.hadoop.net.NetUtils.makeSocketAddr(NetUtils.java:188)
at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:168)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:212)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:99)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:118)
at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:116)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:116)
Call to org.apache.hadoop.fs.Filesystem::get(URI, Configuration) failed!
=

By debugging, I found that the exception happens in hdfsConnect(), but I don't know how to fix it. Could anyone give me some advice, please?

--
Best Regards.
---
Xing FENG
PhD Candidate
Database Research Group

School of Computer Science and Engineering
University of New South Wales
NSW 2052, Sydney

Phone: (+61) 413 857 288
Re: [OMPI users] About valgrind and OpenMPI
Hmmm... I would guess you should talk to the Hadoop folks, as the problem seems to be a conflict between valgrind and HDFS. Does valgrind even support Java programs? I honestly have never tried to do that before.

On Oct 2, 2014, at 4:40 AM, XingFENG wrote:

> Hi there,
>
> I am using valgrind to help analyse my MPI program.
>
> I used hdfs file system to read/write data. And if I run the code without
> valgrind, it works correctly. However, if I run with valgrind, for example,
>
> mpirun -np 3 /usr/bin/valgrind --tool=callgrind ./myprogram /input_file /output_file
>
> it returns with following information
>
> =
> Exception in thread "main" java.lang.InternalError: processing event: 535548453
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:506)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
> at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1156)
> at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1107)
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1053)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:397)
> at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:594)
> at org.apache.hadoop.security.SecurityUtil.(SecurityUtil.java:67)
> at org.apache.hadoop.net.NetUtils.makeSocketAddr(NetUtils.java:188)
> at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:168)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:212)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:99)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
> at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:118)
> at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:116)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:116)
> Call to org.apache.hadoop.fs.Filesystem::get(URI, Configuration) failed!
> =
>
> By debugging, I found that the exception happens in hdfsConnect(). But I
> don't know how to fix it. Could anyone give me some advice, please?
>
> --
> Best Regards.
> ---
> Xing FENG
> PhD Candidate
> Database Research Group
>
> School of Computer Science and Engineering
> University of New South Wales
> NSW 2052, Sydney
>
> Phone: (+61) 413 857 288
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25425.php
[OMPI users] SENDRECV + MPI_TYPE_CREATE_STRUCT
Dear all,

I have a problem with MPI_TYPE_CREATE_STRUCT and, as a consequence, with SENDRECV.

I have this variable type:

type particle
   integer :: ip
   real    :: RP(2)
   real    :: QQ(4)
end type particle

When I compile in double precision with:

mpif90 -r8 -fpp -DPARALLEL *.f90

So when I create my own variable type for MPI, I have:

TYPES(1)=MPI_INTEGER          ! We have three variable types in the new variable
TYPES(2)=MPI_DOUBLE_PRECISION ! Integer and Real and Real
TYPES(3)=MPI_DOUBLE_PRECISION ! Integer and Real and Real
nBLOCKS(1)=1                  ! number of elements in each block
nBLOCKS(2)=2
nBLOCKS(3)=4
!
DISPLACEMENTS(1)=0
DISPLACEMENTS(2)=sizeof(dummy%ip)
DISPLACEMENTS(3)=sizeof(dummy%ip)+sizeof(dummy%RP(1))+sizeof(dummy%RP(2))
!
CALL MPI_TYPE_CREATE_STRUCT(3,nBLOCKS,DISPLACEMENTS,TYPES,MPI_PARTICLE_TYPE,PI%ierr)
CALL MPI_TYPE_COMMIT(MPI_PARTICLE_TYPE,MPI%ierr)

Am I right?

Thanks in advance for any kind of help.
Re: [OMPI users] General question about running single-node jobs.
Hi Ralph,

I've been troubleshooting this issue and communicating with Blue Waters support. It turns out that Q-Chem and OpenMPI are both trying to open sockets, and I get different error messages depending on which one fails.

As an aside, I don't know why Q-Chem needs sockets of its own to communicate between ranks; shouldn't OpenMPI be taking care of all that? (I'm unfamiliar with this part of the Q-Chem code base; maybe it's trying to duplicate some functionality?)

The Blue Waters support has indicated that there's a problem with their realm-specific IP addressing (RSIP) for the compute nodes, which they're working on fixing. I also tried running the same Q-Chem / OpenMPI job on a management node which I think has the same hardware (but not the RSIP), and the problem went away. So I think I'll shelve this problem for now, until Blue Waters support gets back to me with the fix. :)

Thanks,
- Lee-Ping

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Tuesday, September 30, 2014 1:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] General question about running single-node jobs.

Hi Ralph,

Thanks. I'll add some print statements to the code and try to figure out precisely where the failure is happening.

- Lee-Ping

On Sep 30, 2014, at 12:06 PM, Ralph Castain wrote:

On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang wrote:

Hi Ralph,

If so, then I should be able to (1) locate where the port number is defined in the code, and (2) randomize the port number every time it's called to work around the issue. What do you think?

That might work, depending on the code. I'm not sure what it is trying to connect to, and if that code knows how to handle arbitrary connections

The main reason why Q-Chem is using MPI is for executing parallel tasks on a single node. Thus, I think it's just the MPI ranks attempting to connect with each other on the same machine. This could be off the mark because I'm still a novice with respect to MPI concepts - but I am sure it is just one machine.

Your statement doesn't match what you sent us - you showed that it was your connection code that was failing, not ours. You wouldn't have gotten that far if our connections failed, as you would have failed in MPI_Init. You are clearly much further than that, as you already passed an MPI_Barrier before reaching the code in question.

You might check about those warnings - could be that QCLOCALSCR and QCREF need to be set for the code to work.

Thanks; I don't think these environment variables are the issue, but I will check again. The calculation runs without any problems on four different clusters (where I don't set these environment variables either); it's only broken on the Blue Waters compute node. Also, the calculation runs without any problems the first time it's executed on the BW compute node - it's only subsequent executions that give the error messages.

Thanks,
- Lee-Ping

On Sep 30, 2014, at 11:05 AM, Ralph Castain wrote:

On Sep 30, 2014, at 10:49 AM, Lee-Ping Wang wrote:

Hi Ralph,

Thank you. I think your diagnosis is probably correct. Are these sockets the same as TCP/UDP ports (though different numbers) that are used in web servers, email etc?

Yes

If so, then I should be able to (1) locate where the port number is defined in the code, and (2) randomize the port number every time it's called to work around the issue. What do you think?

That might work, depending on the code. I'm not sure what it is trying to connect to, and if that code knows how to handle arbitrary connections

You might check about those warnings - could be that QCLOCALSCR and QCREF need to be set for the code to work.

- Lee-Ping

On Sep 29, 2014, at 8:45 PM, Ralph Castain wrote:

I don't know anything about your application, or what the functions in your code are doing. I imagine it's possible that you are trying to open statically defined ports, which means that running the job again too soon could leave the OS thinking the socket is already busy. It takes awhile for the OS to release a socket resource.

On Sep 29, 2014, at 5:49 PM, Lee-Ping Wang wrote:

Here's another data point that might be useful: the error message is much more rare if I run my application on 4 cores instead of 8.

Thanks,
- Lee-Ping

On Sep 29, 2014, at 5:38 PM, Lee-Ping Wang wrote:

Sorry for my last email - I think I spoke too quickly. I realized after reading some more documentation that OpenMPI always uses TCP sockets for out-of-band communication, so it doesn't make sense for me to set OMPI_MCA_oob=^tcp.

That said, I am still running into a strange problem in my application when running on a specific machine (Blue Waters compute node); I don't see this problem on any other nodes. When I run the same job (~5 seconds) in rapid succession, I see the following error message on the second execution: /tmp/leepin
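[Editorial note: for readers hitting the same symptom, here is a minimal sketch (plain POSIX sockets in C, not taken from Q-Chem or Open MPI, with an invented function name) of the two ideas discussed above: binding to port 0 so the kernel chooses a free ephemeral port instead of a statically defined one, and setting SO_REUSEADDR so an immediate re-run is not blocked by a listener still sitting in TIME_WAIT.]

/* Sketch only: ephemeral-port listener with address reuse.
 * Assumes POSIX sockets; error handling kept minimal. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int open_listener(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return -1; }

    /* Allow rebinding while a previous socket lingers in TIME_WAIT. */
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(0);   /* 0 = let the kernel pick a free port */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return -1;
    }

    /* Find out which port was actually assigned, so it can be advertised
     * to the peer ranks by whatever mechanism the application already uses. */
    socklen_t len = sizeof(addr);
    getsockname(fd, (struct sockaddr *)&addr, &len);
    fprintf(stderr, "listening on port %d\n", ntohs(addr.sin_port));

    listen(fd, 16);
    return fd;
}

Whether this helps depends on how the application's own connection code tells the other ranks which port to dial; if the port number is hard-wired on both ends, both ends have to change.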
Re: [OMPI users] General question about running single-node jobs.
Hi Lee-Ping

Computational Chemistry is Greek to me. However, on p. 12 of the Q-Chem manual 3.2 (PDF online: http://www.q-chem.com/qchem-website/doc_for_web/qchem_manual_3.2.pdf) there are explanations of the meaning of QCSCRATCH and QCLOCALSCR, etc., which, as Ralph pointed out, seem to be a sticking point, and showed up in the warning messages, which I enclose below.

QCLOCALSCR specifies a local disk for IO. I wonder if the node(s) is (are) diskless, and whether this might cause the problem. Another possibility is that mpiexec may not be passing these environment variables. (Do you pass them on the mpiexec/mpirun command line?)

QCSCRATCH defines a directory for temporary files. If this is a network shared directory, could it be that some nodes are not mounting it correctly? Likewise, if your home directory or your job run directory are not mounted, that could be a problem. Or maybe you don't have write permission (sometimes this happens in /tmp, especially if it is a ramdir/tmpdir, which may also have a small size).

Your BlueWaters system administrator may be able to shed some light on these things.

Also, the Q-Chem manual says it is a pre-compiled executable, which as far as I know would require a matching version of OpenMPI. (Ralph, please correct me if I am wrong.)

However, you seem to have the source code; at least you sent a snippet of it. [With all those sockets being opened besides MPI ...] Did you recompile with OpenMPI? Did you add $OMPI/bin to PATH and $OMPI/lib to LD_LIBRARY_PATH, and are these environment variables propagated to the job execution nodes (especially those that are failing)?

Anyway, just a bunch of guesses ...
Gus Correa

*
QCSCRATCH   Defines the directory in which Q-Chem will store temporary files. Q-Chem will usually remove these files on successful completion of the job, but they can be saved, if so wished. Therefore, QCSCRATCH should not reside in a directory that will be automatically removed at the end of a job, if the files are to be kept for further calculations. Note that many of these files can be very large, and it should be ensured that the volume that contains this directory has sufficient disk space available. The QCSCRATCH directory should be periodically checked for scratch files remaining from abnormally terminated jobs. QCSCRATCH defaults to the working directory if not explicitly set. Please see section 2.6 for details on saving temporary files and consult your systems administrator.

QCLOCALSCR  On certain platforms, such as Linux clusters, it is sometimes preferable to write the temporary files to a disk local to the node. QCLOCALSCR specifies this directory. The temporary files will be copied to QCSCRATCH at the end of the job, unless the job is terminated abnormally. In such cases Q-Chem will attempt to remove the files in QCLOCALSCR, but may not be able to due to access restrictions. Please specify this variable only if required.
*

On 10/02/2014 02:08 PM, Lee-Ping Wang wrote:

Hi Ralph,

I’ve been troubleshooting this issue and communicating with Blue Waters support. It turns out that Q-Chem and OpenMPI are both trying to open sockets, and I get different error messages depending on which one fails.

As an aside, I don’t know why Q-Chem needs sockets of its own to communicate between ranks; shouldn’t OpenMPI be taking care of all that? (I’m unfamiliar with this part of the Q-Chem code base, maybe it’s trying to duplicate some functionality?)

The Blue Waters support has indicated that there’s a problem with their realm-specific IP addressing (RSIP) for the compute nodes, which they’re working on fixing. I also tried running the same Q-Chem / OpenMPI job on a management node which I think has the same hardware (but not the RSIP), and the problem went away. So I think I’ll shelve this problem for now, until Blue Waters support gets back to me with the fix. :)

Thanks,
-Lee-Ping

From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Lee-Ping Wang
Sent: Tuesday, September 30, 2014 1:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] General question about running single-node jobs.

Hi Ralph,

Thanks. I'll add some print statements to the code and try to figure out precisely where the failure is happening.

- Lee-Ping

On Sep 30, 2014, at 12:06 PM, Ralph Castain <r...@open-mpi.org> wrote:

On Sep 30, 2014, at 11:19 AM, Lee-Ping Wang <leep...@stanford.edu> wrote:

Hi Ralph,

If so, then I should be able to (1) locate where the port number is defined in the code, and (2) randomize the port number every time it's called to work around the issue. What do you think?

That might work, depending on the code. I'm not sure what it is trying to connect to, and if that code knows how to handle arbitrary connections

The main reason why Q-Chem is using MPI is for executing parallel tasks on
Re: [OMPI users] General question about running single-node jobs.
Hi Gus,

Thanks for the suggestions! I know that QCSCRATCH and QCLOCALSCR are not the problem. When I set QCSCRATCH="." and unset QCLOCALSCR, it writes all the scratch files to the current directory, which is the behavior I want. The environment variables are correctly passed in the mpirun command line. Since my jobs have a fair bit of I/O, I make sure to change to the locally mounted /tmp folder before running the calculations. I do have permissions to write in there. When I run jobs without OpenMPI they are stable on Blue Waters compute nodes, which suggests the issues are not due to the above.

I compiled Q-Chem from the source code, so I built OpenMPI 1.8.3 first and added $OMPI/bin to my PATH (and $OMPI/lib to LD_LIBRARY_PATH). I configured the Q-Chem build so it properly uses "mpicc", etc. The environment variables for OpenMPI are correctly set at runtime.

At this point, I think the main problem is a limitation on the networking in the compute nodes, and I believe Blue Waters support is currently working on this. I'll make sure to send an update if anything happens.

- Lee-Ping

On Oct 2, 2014, at 12:09 PM, Gus Correa wrote:

> Hi Lee-Ping
>
> Computational Chemistry is Greek to me.
>
> However, on p. 12 of the Q-Chem manual 3.2
> (PDF online http://www.q-chem.com/qchem-website/doc_for_web/qchem_manual_3.2.pdf)
> there are explanations of the meaning of QCSCRATCH and
> QCLOCALSCR, etc., which as Ralph pointed out, seem to be a sticking point,
> and showed up in the warning messages, which I enclose below.
>
> QCLOCALSCR specifies a local disk for IO.
> I wonder if the node(s) is (are) diskless, and this might cause the problem.
> Another possibility is that mpiexec may not be passing these
> environment variables.
> (Do you pass them in the mpiexec/mpirun command line?)
>
> QCSCRATCH defines a directory for temporary files.
> If this is a network shared directory, could it be that some nodes
> are not mounting it correctly?
> Likewise, if your home directory or your job run directory are not
> mounted that could be a problem.
> Or maybe you don't have write permission (sometimes this
> happens in /tmp, specially if it is a ramdir/tmpdir, which may also have a
> small size).
>
> Your BlueWaters system administrator may be able to shed some light
> on these things.
>
> Also the Q-Chem manual says it is a pre-compiled executable,
> which as far as I know would require a matching version of OpenMPI.
> (Ralph, please correct me if I am wrong.)
>
> However, you seem to have the source code, at least you sent a
> snippet of it. [With all those sockets being opened besides MPI ...]
>
> Did you recompile with OpenMPI?
> Did you add the $OMPI/bin to PATH and $OMPI/lib to LD_LIBRARY_PATH
> and are these environment variables propagated to the job execution nodes
> (specially those that are failing)?
>
> Anyway, just a bunch of guesses ...
> Gus Correa
>
> *
> QCSCRATCH   Defines the directory in which Q-Chem will store temporary files.
> Q-Chem will usually remove these files on successful completion of the job,
> but they can be saved, if so wished. Therefore, QCSCRATCH should not reside
> in a directory that will be automatically removed at the end of a job, if
> the files are to be kept for further calculations. Note that many of these
> files can be very large, and it should be ensured that the volume that
> contains this directory has sufficient disk space available.
> The QCSCRATCH directory should be periodically checked for scratch
> files remaining from abnormally terminated jobs. QCSCRATCH defaults
> to the working directory if not explicitly set. Please see section 2.6 for
> details on saving temporary files and consult your systems administrator.
>
> QCLOCALSCR  On certain platforms, such as Linux clusters, it is sometimes
> preferable to write the temporary files to a disk local to the node.
> QCLOCALSCR specifies this directory. The temporary files will be copied to
> QCSCRATCH at the end of the job, unless the job is terminated abnormally.
> In such cases Q-Chem will attempt to remove the files in QCLOCALSCR, but
> may not be able to due to access restrictions. Please specify this variable
> only if required.
> *
>
> On 10/02/2014 02:08 PM, Lee-Ping Wang wrote:
>> Hi Ralph,
>>
>> I’ve been troubleshooting this issue and communicating with Blue Waters
>> support. It turns out that Q-Chem and OpenMPI are both trying to open
>> sockets, and I get different error messages depending on which one fails.
>>
>> As an aside, I don’t know why Q-Chem needs sockets of its own to
>> communicate between ranks; shouldn’t OpenMPI be taking care of all
>> that? (I’m unfamiliar with this part of the Q-Chem code base, maybe
>> it’s trying to duplicate some functionality?)
>>
>> The Blue Waters support has indi
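[Editorial note: purely as an illustration of the lookup order described in the quoted manual text (QCLOCALSCR for node-local scratch if set, otherwise QCSCRATCH, otherwise the working directory), here is a small C sketch. It is not Q-Chem source code, and the function name is invented.]

/* Sketch only: resolve a scratch directory the way the quoted manual
 * text describes. Not taken from Q-Chem. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static const char *scratch_dir(char *buf, size_t len)
{
    const char *local = getenv("QCLOCALSCR");   /* node-local disk, optional */
    if (local && *local)
        return local;

    const char *scratch = getenv("QCSCRATCH");  /* shared scratch directory */
    if (scratch && *scratch)
        return scratch;

    /* Default: the working directory, as the manual states. */
    const char *cwd = getcwd(buf, len);
    return cwd ? cwd : ".";
}

int main(void)
{
    char buf[4096];
    printf("scratch directory: %s\n", scratch_dir(buf, sizeof buf));
    return 0;
}

With QCSCRATCH="." and QCLOCALSCR unset, as in the message above, this resolves to the current directory, which matches the behavior Lee-Ping reports.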
Re: [OMPI users] About valgrind and OpenMPI
Hi Ralph Castain,

Thanks very much for your reply. I am using libhdfs, a C API to HDFS. I will ask the Hadoop guys for help.

On Fri, Oct 3, 2014 at 12:14 AM, Ralph Castain wrote:

> Hmmm... I would guess you should talk to the Hadoop folks as the problem
> seems to be a conflict between valgrind and HDFS. Does valgrind even
> support Java programs? I honestly have never tried to do that before.
>
> On Oct 2, 2014, at 4:40 AM, XingFENG wrote:
>
> Hi there,
>
> I am using valgrind to help analyse my MPI program.
>
> I used hdfs file system to read/write data. And if I run the code without
> valgrind, it works correctly. However, if I run with valgrind, for example,
>
> mpirun -np 3 /usr/bin/valgrind --tool=callgrind ./myprogram /input_file /output_file
>
> it returns with following information
>
> =
> Exception in thread "main" java.lang.InternalError: processing event: 535548453
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:506)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
> at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
> at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:177)
> at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1156)
> at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1107)
> at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1053)
> at org.apache.hadoop.conf.Configuration.get(Configuration.java:397)
> at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:594)
> at org.apache.hadoop.security.SecurityUtil.(SecurityUtil.java:67)
> at org.apache.hadoop.net.NetUtils.makeSocketAddr(NetUtils.java:188)
> at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:168)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:212)
> at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:99)
> at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
> at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:118)
> at org.apache.hadoop.fs.FileSystem$1.run(FileSystem.java:116)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:116)
> Call to org.apache.hadoop.fs.Filesystem::get(URI, Configuration) failed!
> =
>
> By debugging, I found that the exception happens in hdfsConnect(). But I
> don't know how to fix it. Could anyone give me some advice, please?
>
> --
> Best Regards.
> ---
> Xing FENG
> PhD Candidate
> Database Research Group
>
> School of Computer Science and Engineering
> University of New South Wales
> NSW 2052, Sydney
>
> Phone: (+61) 413 857 288
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25425.php
>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25426.php

--
Best Regards.
---
Xing FENG
PhD Candidate
Database Research Group

School of Computer Science and Engineering
University of New South Wales
NSW 2052, Sydney

Phone: (+61) 413 857 288
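[Editorial note: since libhdfs is mentioned above, here is a minimal sketch of the C-side call where the failure surfaces, assuming the standard libhdfs header hdfs.h; the "default"/0 arguments are simply the usual way to fall back to the namenode configured in core-site.xml. It only illustrates that a JVM-side exception comes back to the C caller as a NULL handle from hdfsConnect(); it does not address the valgrind interaction itself.]

/* Sketch only: minimal libhdfs connection check (C API shipped with Hadoop). */
#include <stdio.h>
#include <stdlib.h>
#include "hdfs.h"

int main(void)
{
    /* "default" + port 0: use the filesystem configured in core-site.xml on
     * the CLASSPATH; replace with an explicit namenode host/port if needed. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        /* The embedded JVM threw an exception (as in the trace above);
         * on the C side all we see is a NULL filesystem handle. */
        fprintf(stderr, "hdfsConnect() failed\n");
        return EXIT_FAILURE;
    }

    /* ... hdfsOpenFile / hdfsRead / hdfsWrite as usual ... */

    hdfsDisconnect(fs);
    return EXIT_SUCCESS;
}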
Re: [OMPI users] SENDRECV + MPI_TYPE_CREATE_STRUCT
Hi Diego,

I don't know what CPU/compiler you are using or what the -r8 option means, but DISPLACEMENTS(2) and DISPLACEMENTS(3) are incorrect if integer is 4 bytes and real is 8 bytes. In this case there is usually a gap between ip and RP. See the description of datatype alignment in the MPI Standard.

Regards,
Takahiro

> Dear all.
> I have some problem with MPI_TYPE_CREATE_STRUCT and as a consequence
> with SENDRECV.
>
> I have this variable type
>
> type particle
>    integer :: ip
>    real    :: RP(2)
>    real    :: QQ(4)
> end type particle
>
> When I compile in double precision with:
>
> mpif90 -r8 -fpp -DPARALLEL *.f90
>
> So when I create my own variable type for MPI, I have
>
> TYPES(1)=MPI_INTEGER          ! We have three variable types in the new variable
> TYPES(2)=MPI_DOUBLE_PRECISION ! Integer and Real and Real
> TYPES(3)=MPI_DOUBLE_PRECISION ! Integer and Real and Real
> nBLOCKS(1)=1                  ! number of elements in each block
> nBLOCKS(2)=2
> nBLOCKS(3)=4
> !
> DISPLACEMENTS(1)=0
> DISPLACEMENTS(2)=sizeof(dummy%ip)
> DISPLACEMENTS(3)=sizeof(dummy%ip)+sizeof(dummy%RP(1))+sizeof(dummy%RP(2))
> !
> CALL MPI_TYPE_CREATE_STRUCT(3,nBLOCKS,DISPLACEMENTS,TYPES,MPI_PARTICLE_TYPE,PI%ierr)
> CALL MPI_TYPE_COMMIT(MPI_PARTICLE_TYPE,MPI%ierr)
>
> Am I right?
> Thanks, in advance, for any kind of help
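[Editorial note: to make the alignment point concrete, here is a sketch in C rather than the poster's Fortran, so treat it as an illustration and not a drop-in fix. Computing the displacements with MPI_Get_address picks up any padding between the integer and the first real automatically, whereas summing sizeof() does not; on a typical 4-byte-integer / 8-byte-real layout the offsets come out as 0, 8, 24 instead of 0, 4, 20. The Fortran equivalent is to call MPI_GET_ADDRESS on dummy, dummy%ip, dummy%RP(1) and dummy%QQ(1) and subtract the base address.]

/* Sketch only: build the particle datatype with MPI_Get_address.
 * To be called after MPI_Init. */
#include <mpi.h>

typedef struct {
    int    ip;
    double rp[2];   /* corresponds to real :: RP(2) with -r8 */
    double qq[4];   /* corresponds to real :: QQ(4) with -r8 */
} particle;

static MPI_Datatype make_particle_type(void)
{
    particle     dummy;
    int          blocklens[3] = {1, 2, 4};
    MPI_Datatype types[3]     = {MPI_INT, MPI_DOUBLE, MPI_DOUBLE};
    MPI_Aint     base, disps[3];

    MPI_Get_address(&dummy,       &base);
    MPI_Get_address(&dummy.ip,    &disps[0]);
    MPI_Get_address(&dummy.rp[0], &disps[1]);
    MPI_Get_address(&dummy.qq[0], &disps[2]);
    for (int i = 0; i < 3; ++i)
        disps[i] -= base;   /* offsets relative to the start of the struct,
                               including any compiler-inserted padding */

    MPI_Datatype particle_type;
    MPI_Type_create_struct(3, blocklens, disps, types, &particle_type);
    MPI_Type_commit(&particle_type);
    return particle_type;
}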