Hi all,

I have a cluster with Torque and PVFS. I'm trying to test my environment with MPI-IO Test, but segfaults are occurring. Does anyone know what is happening? The error output is below:
Rank 1 Host campogrande03.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad bytes at file offset 0. Expected (null), received (null)
Rank 2 Host campogrande02.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad bytes at file offset 0. Expected (null), received (null)
[campogrande01:10646] *** Process received signal ***
Rank 0 Host campogrande04.dcc.ufrj.br WARNING ERROR 1207853304: 1 bad bytes at file offset 0. Expected (null), received (null)
Rank 0 Host campogrande04.dcc.ufrj.br WARNING ERROR 1207853304: 65537 bad bytes at file offset 0. Expected (null), received (null)
[campogrande04:05192] *** Process received signal ***
[campogrande04:05192] Signal: Segmentation fault (11)
[campogrande04:05192] Signal code: Address not mapped (1)
[campogrande04:05192] Failing at address: 0x10000
Rank 1 Host campogrande03.dcc.ufrj.br WARNING ERROR 1207853304: 65537 bad bytes at file offset 0. Expected (null), received (null)
[campogrande03:05377] *** Process received signal ***
[campogrande03:05377] Signal: Segmentation fault (11)
[campogrande03:05377] Signal code: Address not mapped (1)
[campogrande03:05377] Failing at address: 0x10000
[campogrande03:05377] [ 0] [0xffffe440]
[campogrande03:05377] [ 1] /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande03:05377] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande03:05377] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande03:05377] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande03:05377] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande03:05377] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande03:05377] [ 7] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande03:05377] [ 8] mpiIO_test [0x804a7e1]
[campogrande03:05377] *** End of error message ***
Rank 2 Host campogrande02.dcc.ufrj.br WARNING ERROR 1207853304: 65537 bad bytes at file offset 0. Expected (null), received (null)
[campogrande02:05187] *** Process received signal ***
[campogrande02:05187] Signal: Segmentation fault (11)
[campogrande02:05187] Signal code: Address not mapped (1)
[campogrande02:05187] Failing at address: 0x10000
[campogrande01:10646] Signal: Segmentation fault (11)
[campogrande01:10646] Signal code: Address not mapped (1)
[campogrande01:10646] Failing at address: 0x1a0000
[campogrande02:05187] [ 0] [0xffffe440]
[campogrande02:05187] [ 1] /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande02:05187] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande02:05187] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande02:05187] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande02:05187] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande02:05187] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande02:05187] [ 7] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande02:05187] [ 8] mpiIO_test [0x804a7e1]
[campogrande02:05187] *** End of error message ***
[campogrande04:05192] [ 0] [0xffffe440]
[campogrande04:05192] [ 1] /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande04:05192] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande04:05192] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande04:05192] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande04:05192] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande04:05192] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande04:05192] [ 7] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande04:05192] [ 8] mpiIO_test [0x804a7e1]
[campogrande04:05192] *** End of error message ***
[campogrande01:10646] [ 0] [0xffffe440]
[campogrande01:10646] [ 1] /lib/tls/i686/cmov/libc.so.6(vsnprintf+0xb4) [0xb7d5fef4]
[campogrande01:10646] [ 2] mpiIO_test(make_error_messages+0xcf) [0x80502e4]
[campogrande01:10646] [ 3] mpiIO_test(warning_msg+0x8c) [0x8050569]
[campogrande01:10646] [ 4] mpiIO_test(report_errs+0xe2) [0x804d413]
[campogrande01:10646] [ 5] mpiIO_test(read_write_file+0x594) [0x804d9c2]
[campogrande01:10646] [ 6] mpiIO_test(main+0x1d0) [0x804aa14]
[campogrande01:10646] [ 7] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe0) [0xb7d15050]
[campogrande01:10646] [ 8] mpiIO_test [0x804a7e1]
[campogrande01:10646] *** End of error message ***
mpiexec noticed that job rank 0 with PID 5192 on node campogrande04 exited on signal 11 (Segmentation fault).

--
Davi Vercillo Carneiro Garcia
Universidade Federal do Rio de Janeiro
Departamento de Ciência da Computação
DCC-IM/UFRJ - http://www.dcc.ufrj.br

"Good things come to those who... wait." - Debian Project
"A computer is like air conditioning: it becomes useless when you open windows." - Linus Torvalds
"Two things are infinite: the universe and human stupidity. And I'm not sure about the former." - Albert Einstein