I have been making small modifications to a software package by Sean
Eddy called Hmmer 3.  Its hardware acceleration was originally written
for SSE2, but since most of our compute nodes have only SSE1 and MMX, I
rewrote a few small sections to use just those instructions.  (And yes,
as far as I can tell, after each MMX use it executes emms before any
floating point operations run.)  On top of that, each binary can run
the programs in one of 3 modes: single threaded, threaded, or MPI
(using Ompi143, i.e. Open MPI 1.4.3).  For all other programs in this
package, everything works everywhere.  For one called "jackhmmer", the
results are in this table (+ = runs correctly, - = problems), where
exactly the same problem is run in each test (theoretically exercising
exactly the same routines, just under different threading control):

           SSE2   SSE1 
Single      +      +
Threaded    +      +
Ompi143     +      -

The negative result for the SSE1/Ompi143 combination occurs whether the
worker nodes are Athlon MP (SSE1 only) or Athlon64.  The test machine
for the single and threaded runs is a two-CPU Opteron 280 (4 cores
total).  Ompi143 is a 32-bit build everywhere (each node has its own
local copy).  No modifications whatsoever have been made to the main
jackhmmer.c file, which is where the various run methods are
implemented.
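To make the emms remark concrete, here is a minimal sketch of the pattern being described, not the actual Hmmer 3 code: the MMX integer work finishes with _mm_empty() (the intrinsic that emits emms) before any floating point arithmetic runs. The helper name mmx_add_mean is hypothetical.

```c
#include <assert.h>
#include <mmintrin.h>  /* MMX intrinsics (add -mmmx for 32-bit builds if not already enabled) */

/* Hypothetical helper: saturating add of two 4 x int16 vectors with MMX,
 * then the mean of the four lanes computed in floating point. */
static double mmx_add_mean(const short a[4], const short b[4])
{
    assert(a && b);

    __m64 va = _mm_set_pi16(a[3], a[2], a[1], a[0]);  /* lane 0 = a[0] */
    __m64 vb = _mm_set_pi16(b[3], b[2], b[1], b[0]);

    union { __m64 v; short lane[4]; } sum;
    sum.v = _mm_adds_pi16(va, vb);                    /* paddsw: saturating add */

    int total = sum.lane[0] + sum.lane[1] + sum.lane[2] + sum.lane[3];

    _mm_empty();  /* emms: return the shared x87/MMX registers to the FPU */

    return total / 4.0;  /* floating point only after emms */
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40} the lane sums are 11, 22, 33 and 44, so the function returns 27.5; with a = {32767, 0, 0, 0} and b = {1, 0, 0, 0} the first lane saturates at 32767 instead of wrapping, giving 8191.75.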

Now, if there were some intrinsic problem with my SSE1 code, it should
presumably also manifest in the Single and Threaded versions (the
thread control differs, but they all feed through the same underlying
functions) or in one of the other programs, and it doesn't.  Running
under valgrind in Single or Threaded mode produces no warnings.
Running the SSE2 build with valgrind under mpirun produces 3 warnings:
two related to OMPI itself, which appear in every OMPI program run
under valgrind, and one caused by an MPI_Send operation whose buffer
contains some uninitialized data (nothing toxic, just bytes in
fixed-length fields that were never set because a shorter string is
stored there):

==19802== Syscall param writev(vector[...]) points to uninitialised byte(s)
==19802==    at 0x4C77AC1: writev (in /lib/libc-2.10.1.so)
==19802==    by 0x8A069B5: mca_btl_tcp_frag_send (in /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
==19802==    by 0x8A0626E: mca_btl_tcp_endpoint_send (in /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
==19802==    by 0x8A01ADC: mca_btl_tcp_send (in /opt/ompi143.X32/lib/openmpi/mca_btl_tcp.so)
==19802==    by 0x7FA24A9: mca_pml_ob1_send_request_start_prepare (in /opt/ompi143.X32/lib/openmpi/mca_pml_ob1.so)
==19802==    by 0x7F98443: mca_pml_ob1_send (in /opt/ompi143.X32/lib/openmpi/mca_pml_ob1.so)
==19802==    by 0x4A8530F: PMPI_Send (in /opt/ompi143.X32/lib/libmpi.so.0.0.2)
==19802==    by 0x808D5F2: p7_oprofile_MPISend (mpi.c:101)
==19802==    by 0x805762E: main (jackhmmer.c:1149)
==19802==  Address 0x770bc9d is 15,101 bytes inside a block of size 15,389 alloc'd
==19802==    at 0x49E3A12: realloc (vg_replace_malloc.c:476)
==19802==    by 0x808D4E3: p7_oprofile_MPISend (mpi.c:88)
==19802==    by 0x805762E: main (jackhmmer.c:1149)
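The realloc() frame at mpi.c:88 in that trace points at one possible way to quiet the third, harmless warning: clear whatever bytes realloc() adds before packing shorter strings into fixed-length fields. This is a minimal sketch under that assumption; grow_and_clear and pack_fixed_field are hypothetical names, not functions from Hmmer 3 or Easel.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: grow a send buffer and zero the newly added bytes,
 * since realloc() leaves them indeterminate. On failure returns NULL
 * and the caller's original buffer is still valid. */
static char *grow_and_clear(char *buf, size_t old_n, size_t new_n)
{
    assert(new_n >= old_n);
    char *p = realloc(buf, new_n);
    if (p == NULL) return NULL;
    memset(p + old_n, 0, new_n - old_n);
    return p;
}

/* Hypothetical: copy s into a fixed-length field and zero-fill the rest,
 * so no padding byte reaches MPI_Send()/writev() uninitialized. */
static void pack_fixed_field(char *field, size_t field_len, const char *s)
{
    assert(field && s && field_len > 0);
    size_t n = strlen(s);
    if (n >= field_len) n = field_len - 1;  /* keep room for the terminator */
    memcpy(field, s, n);
    memset(field + n, 0, field_len - n);
}
```

Zeroing costs one memset per resize; since the warning is otherwise benign, this is only worthwhile if it removes noise that hides the warnings that matter.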

Running the SSE1 build the same way (valgrind under mpirun) shows the
same 3 warnings, plus many more like the following:

==9416== Conditional jump or move depends on uninitialised value(s)
==9416==    at 0x807FE3E: forward_engine (fwdback.c:420)
==9416==    by 0x8080051: p7_ForwardParser (fwdback.c:143)
==9416==    by 0x806C3CC: p7_Pipeline (p7_pipeline.c:590)
==9416==    by 0x80564F0: main (jackhmmer.c:1426)

Unfortunately this makes absolutely no sense.  Line 420 is

       if (xE > 1.0e4)

which tells us that xE wasn't set (fine).  So, testing for
uninitialized values with statements like:

  fprintf(stderr,"DEBUG xEv %lld\n",xEv);fflush(stderr);

(each of which generates its own uninitialized value message), I found
that the first uninitialized variable appears very early in the code,
right after this _mm_setzero_ps:

      register __m128 xEv;
      //other stuff that does not touch xEv
      xEv   = _mm_setzero_ps();
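As an aside on the debug probe: handing an __m128 to fprintf with %lld is not a well-defined way to print it, because the format expects a long long rather than a vector. A sketch of a more reliable probe, assuming GCC's xmmintrin.h: store the lanes with _mm_storeu_ps and print them as floats. If valgrind's memcheck.h header happens to be installed, VALGRIND_CHECK_MEM_IS_DEFINED also reports the first undefined byte when run under memcheck. dump_ps is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <xmmintrin.h>  /* SSE1 only */

#if defined(__has_include)
#  if __has_include(<valgrind/memcheck.h>)
#    include <valgrind/memcheck.h>
#    define HAVE_MEMCHECK_H 1
#  endif
#endif

/* Hypothetical: store the four float lanes of v into out[] and print them. */
static void dump_ps(const char *label, __m128 v, float out[4])
{
    assert(label && out);
    _mm_storeu_ps(out, v);  /* unaligned store, no alignment requirement on out */
#ifdef HAVE_MEMCHECK_H
    /* Under memcheck this reports the first undefined byte, if any;
     * run natively it does nothing and evaluates to 0. */
    (void) VALGRIND_CHECK_MEM_IS_DEFINED(out, 4 * sizeof(float));
#endif
    fprintf(stderr, "DEBUG %s = { %g %g %g %g }\n",
            label, out[0], out[1], out[2], out[3]);
    fflush(stderr);
}
```

For example, dump_ps("xEv", xEv, lanes) placed right after xEv = _mm_setzero_ps() should print four zeros.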

Now, this is hair-pulling for several reasons.  First, nothing of
substance was changed in this file (just some #defines that resolve to
the same values they had originally).  Second, this is an SSE1
operation even in the original unmodified code.  Third, it just isn't
possible for xEv to be uninitialized after xEv = _mm_setzero_ps(), yet
it is.  (Running valgrind with --smc-check=all reports nothing beyond
what it reports without that option.)  Here is the relevant section of
xmmintrin.h:

/* Create a vector of zeros.  */
extern __inline __m128 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_setzero_ps (void)
{
  return __extension__ (__m128){ 0.0f, 0.0f, 0.0f, 0.0f };
}
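One way to separate the code from the MPI environment might be a standalone program that repeats the xEv pattern, built with the same flags (-O1 -g -m32 -msse -mno-sse2) and run under valgrind on a worker node both directly and via mpirun. This is a hypothetical reproducer: hsum_ps_sse1 is a generic SSE1-only horizontal sum written for illustration, not Hmmer's or Easel's own routine, and exceeds_threshold only mirrors the test on line 420.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE1 intrinsics only; nothing here needs SSE2 */

/* Hypothetical SSE1 horizontal sum of the four lanes of v. */
static float hsum_ps_sse1(__m128 v)
{
    __m128 t = _mm_add_ps(v, _mm_movehl_ps(v, v));  /* lane0 = v0+v2, lane1 = v1+v3 */
    t = _mm_add_ss(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 1, 1, 1)));  /* lane0 += lane1 */
    float r;
    _mm_store_ss(&r, t);
    return r;
}

/* Hypothetical stand-in for the forward_engine() pattern: zero an
 * accumulator, add nvec vectors of 4 floats, reduce, test the threshold. */
static int exceeds_threshold(const float *x, int nvec, float *sum_out)
{
    assert(x && nvec >= 0);
    __m128 xEv = _mm_setzero_ps();
    for (int i = 0; i < nvec; i++)
        xEv = _mm_add_ps(xEv, _mm_loadu_ps(x + 4 * i));
    float xE = hsum_ps_sse1(xEv);
    if (sum_out) *sum_out = xE;
    return xE > 1.0e4f;  /* mirrors: if (xE > 1.0e4) */
}
```

With the inputs 1 through 8 (two vectors) the sum is 36 and the function returns 0; a single vector {5000, 5000, 1, 0} sums to 10001 and returns 1. If valgrind stays quiet when this runs under mpirun on the same node, the problem is more likely in how fwdback.o is built or linked than in the intrinsic itself.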

Of course, all of this nonsense is happening on a worker node, which
doesn't make getting to the root of the problem any easier.

The module where these uninitialized variables are seen was compiled like this:

mpicc -std=gnu99 -O1 -g -m32 -pthread -msse -mno-sse2 -DHAVE_CONFIG_H \
  -I../../easel -I../../easel -I. -I.. -I. -I../../src \
  -o fwdback.o -c fwdback.c

Building it on a 64-bit machine (hence the -m32) or on a 32-bit machine
gives the same result.
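Because mpicc is a wrapper, it may be worth confirming both what the compiler actually targeted and what each node's CPU reports; Open MPI's wrapper compilers print the underlying command with mpicc --showme. A small sketch, assuming GCC or a compatible compiler: the predefined macros __SSE__ and __SSE2__ reflect -msse/-mno-sse2, and __builtin_cpu_supports() queries the CPU the binary is running on. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Highest SSE level the compiler was allowed to use at build time. */
static int compiled_sse_level(void)
{
#if defined(__SSE2__)
    return 2;
#elif defined(__SSE__)
    return 1;
#else
    return 0;
#endif
}

/* Highest SSE level the running CPU reports (GCC/Clang x86 builtins). */
static int runtime_sse_level(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2")) return 2;
    if (__builtin_cpu_supports("sse"))  return 1;
#endif
    return 0;
}

/* Logs both levels; returns -1 if the binary assumes more than the CPU has. */
static int check_sse_build(FILE *log)
{
    assert(log);
    int c = compiled_sse_level(), r = runtime_sse_level();
    fprintf(log, "compiled SSE level %d, CPU SSE level %d\n", c, r);
    return c <= r ? 0 : -1;
}
```

A build made with -msse -mno-sse2 should report a compiled level of 1, and an Athlon MP node should report a CPU level of 1; calling check_sse_build(stderr) at startup on each rank would show whether any node disagrees.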

If any of you have seen something like this before and can suggest a way
to proceed I would be very grateful.

Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
