Hi all,
I am receiving the following error message:
[grid-admin@ng2 ~]$ cat dml_test.err
[hydra010:22914] [btl_gm_proc.c:191] error in converting global to local id
[hydra002:07435] [btl_gm_proc.c:191] error in converting global to local id
[hydra009:31492] [btl_gm_proc.c:191] error in converting global to local id
[hydra008:29253] [btl_gm_proc.c:191] error in converting global to local id
[hydra007:02552] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra003:07068] [btl_gm_proc.c:191] error in converting global to local id
[hydra005:27967] [btl_gm_proc.c:191] error in converting global to local id
[hydra006:19420] [btl_gm_proc.c:191] error in converting global to local id
[hydra010:22914] [btl_gm.c:489] send completed with unhandled gm error 18
[hydra010:22914] pml_ob1_sendreq.c:211 FATAL
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 22914 on
node hydra010 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------
[grid-admin@ng2 ~]$
I've searched and googled but found nothing that points me to where this
problem may lie. I've looked at the source code and can't see anything
glaringly obvious, and I am wondering whether this might be a GM issue.
GM itself does appear to start up OK:
GM: Version 2.1.30_Linux build 2.1.30_Linux root@hydra115:/usr/local/src/gm-2.1.30_Linux Tue Apr 27 12:29:17 CST 2010
GM: On i686, kernel version: 2.6.18-92.1.10.el5 #1 SMP Tue Aug 5 07:41:53 EDT 2008
GM: Highmem memory configuration:
GM: PFN_ZERO=0x0, PFN_MAX=0x7fffc, KERNEL_PFN_MAX=0x38000
GM: Memory available for registration: 259456 pages (1013 MBytes)
GM: MCP for unit 0: L9 4K
GM: LANai rate set to 132 MHz (max = 134 MHz)
GM: Board 0 supports 2815 remote nodes.
GM: Board 0 page hash cache has 16384 bins.
GM: Board 0 has 1 packet interfaces.
GM: NOTICE: /usr/local/src/gm-2.1.30_Linux/drivers/linux/kbuild/gm_arch_k.c:4828:():kernel
GM: ServerWorks chipset detected: avoiding PIO read.
GM: Allocated IRQ10
GM: 1 Myrinet board(s) found and initialized
Any ideas as to where to look would be most appreciated.
Thanks
--
David Logan
eResearch SA, ARCS Grid Administrator
Level 1, School of Physics and Chemistry
North Terrace, Adelaide, 5005
(W) 08 8303 7301
(M) 0458 631 117