Re: [Discuss-gnuradio] Segfault with volk on 32 bit AMD

Frederick Stevens Mon, 19 Mar 2012 13:52:39 -0700

Tom,

New run using my simple "trace". See the attached files.


Cheers,

Fred
On 03/19/2012 11:26 AM, Tom Rondeau wrote:

On Mon, Mar 19, 2012 at 12:04 PM, Frederick Stevens <sk8tesgr...@gmail.com> wrote:


    Tom,

    See the attached file. I am running volk_profile now. If this is
    what you need, great; otherwise I will keep working on
    this with whatever suggestions you have.


    Cheers,

    Fred


That'll be a good start. We'll see if that tells us anything.

Thanks,
Tom

    On 03/19/2012 08:10 AM, Tom Rondeau wrote:

    On Sun, Mar 18, 2012 at 8:00 PM, Frederick Stevens
    <sk8tesgr...@gmail.com> wrote:

        volk_profile ran to completion. I am using the git source
        tree, updated just before the run. I commented out line
        38 of volk_profile.cc as you suggested and ran volk_profile
        under gdb. The output is in the attached text file. I have
        also attached the generated volk_config from ~/.volk/volk_config.


    Thanks. Strange that it's just that kernel, then. Can you put in
    some debug lines that print the size of the buffers
    being used and the 'number' variable in
    volk_32fc_32f_multiply_32fc_a when the crash occurs? I just want
    to see if the loop is trying to go beyond the bounds of the arrays.
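A minimal sketch of the kind of debug lines being asked for, assuming a C build where lv_32fc_t is the C99 `float complex` type. The function name and the `report_every` parameter are hypothetical, not part of VOLK. Printing how far each pointer has moved (rather than `sizeof` of the pointer, which is 4 for every pointer on a 32-bit build) is what would show a loop running past its buffers.

```c
#include <complex.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug copy of the generic kernel. float complex stands in for
 * lv_32fc_t (assumed to be the same C99 type in a C build). For this kernel
 * aPtr and cPtr should move 8 bytes per item and bPtr 4 bytes. */
static size_t multiply_32fc_32f_debug(float complex *cVector,
                                      const float complex *aVector,
                                      const float *bVector,
                                      unsigned int num_points,
                                      unsigned int report_every)
{
    float complex *cPtr = cVector;
    const float complex *aPtr = aVector;
    const float *bPtr = bVector;

    for (unsigned int number = 0; number < num_points; number++) {
        if (report_every != 0 && number % report_every == 0) {
            /* stderr is not fully buffered, so lines are less likely to be
             * lost when the process dies with SIGSEGV. */
            fprintf(stderr, "number=%u a_off=%td b_off=%td c_off=%td\n",
                    number,
                    (const char *)aPtr - (const char *)aVector,
                    (const char *)bPtr - (const char *)bVector,
                    (const char *)cPtr - (const char *)cVector);
        }
        *cPtr++ = (*aPtr++) * (*bPtr++);
    }
    /* Total bytes bPtr travelled; num_points * sizeof(float) if all is well. */
    return (size_t)((const char *)bPtr - (const char *)bVector);
}
```

Called with `report_every` set to, say, 1000, the `b_off` column should grow by 4000 between reports; a jump of 8000 would mean bPtr is moving two floats per item.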

        I noticed while running gnuradio-companion 3.5.1 (which
        works) that when I use a multiply block, Python prints
        this message:

         ./top_block.py
        >>> gr_fir_fff: using 3DNow!

        but volk_profile does not seem to recognize the 3DNow!
        processor extensions (it reports sse2 and sse3 on the
        32-bit Intel Atom machine).


    Yeah, that's fine. Without a 3DNow! kernel, Volk will just fall
    back on the generic implementation, the idea being that the
    generic version works for everyone. So we need to figure out
    why that's not true for your...
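The fallback idea can be sketched as a preference-ordered table that drops through to a generic kernel when no SIMD entry is usable on the CPU. Everything here (the table, `pick_impl`, the kernel signature) is hypothetical and only illustrates the concept; VOLK's real dispatcher, its architecture detection, and its volk_config handling are not reproduced.

```c
#include <stddef.h>

/* Hypothetical kernel signature: out[i] = a[i] * b[i] for n floats. */
typedef void (*mul_fn)(float *out, const float *a, const float *b,
                       unsigned int n);

/* Portable fallback that any CPU can run. */
static void mul_generic(float *out, const float *a, const float *b,
                        unsigned int n)
{
    for (unsigned int i = 0; i < n; i++)
        out[i] = a[i] * b[i];
}

/* One candidate implementation and whether this machine can run it. */
struct impl {
    const char *name;
    int available;
    mul_fn fn;
};

/* Walk the table in preference order and return the first usable entry;
 * if nothing matches (e.g. there is no 3DNow! kernel), use the generic one. */
static mul_fn pick_impl(const struct impl *table, size_t count,
                        const char **chosen)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].available) {
            *chosen = table[i].name;
            return table[i].fn;
        }
    }
    *chosen = "generic";
    return mul_generic;
}
```

The point of the design is that the generic entry needs no CPU support at all, so a crash inside it points at the kernel or its inputs rather than at feature detection.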

        Hope this helps!  Let me know if you want me to try anything
        else.  I'll let you know how things turn out on the other
        machine as well.


        Cheers,

        Fred


    Thanks.

    Tom


        On 03/18/2012 04:31 PM, Tom Rondeau wrote:

        On Fri, Mar 16, 2012 at 6:11 PM, Frederick Stevens
        <sk8tesgr...@gmail.com> wrote:

            Well, after a few restarts, here is my output. I did a
            fresh pull from git because I was getting errors about
            missing *.h files in gruel/src/swig or something
            like that. Hope this helps!


            RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a

            Program received signal SIGSEGV, Segmentation fault.
            0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
                aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
                at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
            74          *cPtr++ = (*aPtr++) * (*bPtr++);
            (gdb) bt
            #0  0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
                aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
                at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74


        Alright, Fred, there is definitely something strange going on here.
        My only guess is that something about your
        architecture/OS/whatever is being handled
        incorrectly and the buffers a, b, and c are not being
        generated correctly; perhaps the number of items is not doubled
        for the complex data type. (The tests before this
        one cover 16ic, or complex shorts, but this is the first
        complex float test.)
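The sizing question comes down to simple arithmetic: each complex float item is two floats, so buffers a and c need twice the bytes per item of buffer b. A small sketch for num_points=204600 from the trace above, with `float complex` standing in for lv_32fc_t; the helper names are hypothetical.

```c
#include <complex.h>
#include <stddef.h>

/* Bytes needed for vlen complex-float items (buffers a and c).
 * A complex float is laid out as two floats (real, imag). */
static size_t complex_buf_bytes(unsigned int vlen)
{
    return (size_t)vlen * sizeof(float complex);
}

/* Bytes needed for vlen float items (buffer b). */
static size_t float_buf_bytes(unsigned int vlen)
{
    return (size_t)vlen * sizeof(float);
}
```

With 4-byte floats that is 1636800 bytes each for a and c and 818400 bytes for b; a or c allocated with `sizeof(float)` per item would hold only half the data the loop writes or reads.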

        It's hard to tell whether it's the AMD chip, 32-bit,
        the Slackware version, the gcc version, etc. I
        don't have an AMD chip to test on, but I could load up a
        32-bit Slackware VM at least.

        How much work are you willing to put into this to help us
        nail this down?

        If you can follow through the volk_profile test code, we can
        start outputting more debug info. To start, I'd suggest
        going into volk/apps/volk_profile.cc, commenting out line
        38, rebuilding the application, and running the new volk_profile
        to see if it fails on any other kernels.

        Thanks,
        Tom



        _______________________________________________
        Discuss-gnuradio mailing list
        Discuss-gnuradio@gnu.org <mailto:Discuss-gnuradio@gnu.org>
        https://lists.gnu.org/mailman/listinfo/discuss-gnuradio




#ifndef INCLUDED_volk_32fc_32f_multiply_32fc_a_H
#define INCLUDED_volk_32fc_32f_multiply_32fc_a_H

#include <inttypes.h>
#include <stdio.h>

#ifdef LV_HAVE_SSE
#include <xmmintrin.h>
  /*!
    \brief Multiplies the input complex vector with the input float vector and stores the results in the third vector
    \param cVector The vector where the results will be stored
    \param aVector The complex vector to be multiplied
    \param bVector The vector containing the float values to be multiplied against each complex value in aVector
    \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector
  */
static inline void volk_32fc_32f_multiply_32fc_a_sse(lv_32fc_t* cVector, const lv_32fc_t* aVector, const float* bVector, unsigned int num_points){
    unsigned int number = 0;
    const unsigned int quarterPoints = num_points / 4;

    lv_32fc_t* cPtr = cVector;
    const lv_32fc_t* aPtr = aVector;
    const float* bPtr=  bVector;

    __m128 aVal1, aVal2, bVal, bVal1, bVal2, cVal;
    for(;number < quarterPoints; number++){
      
      aVal1 = _mm_load_ps((const float*)aPtr);
      aPtr += 2;
 
      aVal2 = _mm_load_ps((const float*)aPtr); 
      aPtr += 2;

      bVal = _mm_load_ps(bPtr);
      bPtr += 4;

      bVal1 = _mm_shuffle_ps(bVal, bVal, _MM_SHUFFLE(1,1,0,0));
      bVal2 = _mm_shuffle_ps(bVal, bVal, _MM_SHUFFLE(3,3,2,2));

      cVal = _mm_mul_ps(aVal1, bVal1);

      _mm_store_ps((float*)cPtr,cVal); // Store the results back into the C container
      cPtr += 2;

      cVal = _mm_mul_ps(aVal2, bVal2);

      _mm_store_ps((float*)cPtr,cVal); // Store the results back into the C container

      cPtr += 2;
    }

    number = quarterPoints * 4;
    for(;number < num_points; number++){
      *cPtr++ = (*aPtr++) * (*bPtr);
      bPtr++;
    }
}
#endif /* LV_HAVE_SSE */

#ifdef LV_HAVE_GENERIC
  /*!
    \brief Multiplies the input complex vector with the input float vector and stores the results in the third vector
    \param cVector The vector where the results will be stored
    \param aVector The complex vector to be multiplied
    \param bVector The vector containing the float values to be multiplied against each complex value in aVector
    \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector
  */
static inline void volk_32fc_32f_multiply_32fc_a_generic(lv_32fc_t* cVector, const lv_32fc_t* aVector, const float* bVector, unsigned int num_points){
  lv_32fc_t* cPtr = cVector;
  const lv_32fc_t* aPtr = aVector;
  const float* bPtr=  bVector;
  unsigned int number = 0;

  for(number = 0; number < num_points; number++){
    *cPtr++ = (*aPtr++) * (*bPtr++);
    // Debug trace: sizeof() gives the pointer size (4 on 32-bit), not the buffer size
    printf("%zu %zu %zu %u \n",sizeof(aPtr),sizeof(bPtr),sizeof(cPtr),number);
  }
}
#endif /* LV_HAVE_GENERIC */

#ifdef LV_HAVE_ORC
  /*!
    \brief Multiplies the input complex vector with the input float vector and stores the results in the third vector
    \param cVector The vector where the results will be stored
    \param aVector The complex vector to be multiplied
    \param bVector The vector containing the float values to be multiplied against each complex value in aVector
    \param num_points The number of values in aVector and bVector to be multiplied together and stored into cVector
  */
extern void volk_32fc_32f_multiply_32fc_a_orc_impl(lv_32fc_t* cVector, const lv_32fc_t* aVector, const float* bVector, unsigned int num_points);
static inline void volk_32fc_32f_multiply_32fc_a_orc(lv_32fc_t* cVector, const lv_32fc_t* aVector, const float* bVector, unsigned int num_points){
    volk_32fc_32f_multiply_32fc_a_orc_impl(cVector, aVector, bVector, num_points);
}
#endif /* LV_HAVE_ORC */



#endif /* INCLUDED_volk_32fc_32f_multiply_32fc_a_H */
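The SSE tail loop above already reads `*bPtr` and increments `bPtr` in separate statements. A sketch of the generic loop written the same way, on the assumption (not confirmed in this thread) that the compiler mishandles a post-increment inside a complex-times-float expression; the gdb disassembly later in this message appears to increment bPtr twice per item. The function name is hypothetical and `float complex` stands in for lv_32fc_t.

```c
#include <complex.h>

/* Generic complex-by-float multiply with every pointer increment moved out
 * of the arithmetic expression, mirroring the SSE tail loop. */
static inline void multiply_32fc_32f_generic_sep(float complex *cVector,
                                                 const float complex *aVector,
                                                 const float *bVector,
                                                 unsigned int num_points)
{
    float complex *cPtr = cVector;
    const float complex *aPtr = aVector;
    const float *bPtr = bVector;

    for (unsigned int number = 0; number < num_points; number++) {
        const float scale = *bPtr;  /* read bVector exactly once per item */
        *cPtr = (*aPtr) * scale;
        cPtr++;
        aPtr++;
        bPtr++;
    }
}
```

Because no operand of the multiply has a side effect, the compiler has nothing to duplicate even if it evaluates the real operand once per component.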

4 4 4 102298 
4 4 4 102299 
4 4 4 102300 
4 4 4 102301 
4 4 4 102302 
4 4 4 102303 
4 4 4 102304 
4 4 4 102305 
4 4 4 102306 
4 4 4 102307 
4 4 4 102308 
4 4 4 102309 
4 4 4 102310 
4 4 4 102311 
4 4 4 102312 
4 4 4 102313 
4 4 4 102314 
4 4 4 102315 
4 4 4 102316 
4 4 4 102317 
4 4 4 102318 
4 4 4 102319 
4 4 4 102320 
4 4 4 102321 
4 4 4 102322 
4 4 4 102323 
4 4 4 102324 
4 4 4 102325 
4 4 4 102326 
4 4 4 102327 
4 4 4 102328 
4 4 4 102329 
4 4 4 102330 
4 4 4 102331 
4 4 4 102332 
4 4 4 102333 
4 4 4 102334 
4 4 4 102335 
4 4 4 102336 
4 4 4 102337 
4 4 4 102338 
4 4 4 102339 
4 4 4 102340 
4 4 4 102341 
4 4 4 102342 
4 4 4 102343 
4 4 4 102344 
4 4 4 102345 
4 4 4 102346 
4 4 4 102347 
4 4 4 102348 
4 4 4 102349 
4 4 4 102350 
4 4 4 102351 
4 4 4 102352 
4 4 4 102353 
4 4 4 102354 
4 4 4 102355 
4 4 4 102356 
4 4 4 102357 
4 4 4 102358 
4 4 4 102359 
4 4 4 102360 
4 4 4 102361 
4 4 4 102362 
4 4 4 102363 
4 4 4 102364 
4 4 4 102365 
4 4 4 102366 
4 4 4 102367 
4 4 4 102368 
4 4 4 102369 
4 4 4 102370 
4 4 4 102371 
4 4 4 102372 
4 4 4 102373 
4 4 4 102374 
4 4 4 102375 
4 4 4 102376 
4 4 4 102377 
4 4 4 102378 
4 4 4 102379 
4 4 4 102380 
4 4 4 102381 
4 4 4 102382 
4 4 4 102383 
4 4 4 102384 
4 4 4 102385 
4 4 4 102386 
4 4 4 102387 
4 4 4 102388 
4 4 4 102389 
4 4 4 102390 
4 4 4 102391 
4 4 4 102392 
4 4 4 102393 
4 4 4 102394 
4 4 4 102395 
4 4 4 102396 
4 4 4 102397 
4 4 4 102398 

Program received signal SIGSEGV, Segmentation fault.
0xb7edbb81 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
    aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
    at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
74          *cPtr++ = (*aPtr++) * (*bPtr++);
(gdb) bt
#0  0xb7edbb81 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
    aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
    at /home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
#1  0xb7ed4d68 in volk_32fc_32f_multiply_32fc_a_manual (cVector=0xb7448008, 
    aVector=0xb7768008, bVector=0xb78f8008, num_points=204600, 
    arch=0x8079ac4 "generic")
    at /home/fred/extras/gnuradio/gnuradio/build/volk/lib/volk.c:749
#2  0x08064533 in run_cast_test3 (
    func=0x80595c0 <volk_32fc_32f_multiply_32fc_a_manual@plt>, buffs=..., 
    vlen=204600, iter=999, arch=...)
    at /home/fred/extras/gnuradio/gnuradio/volk/lib/qa_utils.cc:182
#3  0x08062770 in run_volk_tests (desc=..., 
    manual_func=0x80595c0 <volk_32fc_32f_multiply_32fc_a_manual@plt>, 
    name=..., tol=9.99999975e-05, scalar=..., vlen=204600, iter=1000, 
    best_arch_vector=0xbfffe714)
    at /home/fred/extras/gnuradio/gnuradio/volk/lib/qa_utils.cc:351
#4  0x0805b3d3 in main (argc=1, argv=0xbffff204)
    at /home/fred/extras/gnuradio/gnuradio/volk/apps/volk_profile.cc:38
(gdb) disassemble
Dump of assembler code for function volk_32fc_32f_multiply_32fc_a_generic:
   0xb7edbb39 <+0>:     push   %ebp
   0xb7edbb3a <+1>:     mov    %esp,%ebp
   0xb7edbb3c <+3>:     push   %ebx
   0xb7edbb3d <+4>:     sub    $0x24,%esp
   0xb7edbb40 <+7>:     call   0xb7edbb45 <volk_32fc_32f_multiply_32fc_a_generic+12>
   0xb7edbb45 <+12>:    pop    %ebx
   0xb7edbb46 <+13>:    add    $0xca753,%ebx
   0xb7edbb4c <+19>:    mov    0x8(%ebp),%eax
   0xb7edbb4f <+22>:    mov    %eax,-0xc(%ebp)
   0xb7edbb52 <+25>:    mov    0xc(%ebp),%eax
   0xb7edbb55 <+28>:    mov    %eax,-0x10(%ebp)
   0xb7edbb58 <+31>:    mov    0x10(%ebp),%eax
   0xb7edbb5b <+34>:    mov    %eax,-0x14(%ebp)
   0xb7edbb5e <+37>:    movl   $0x0,-0x18(%ebp)
   0xb7edbb65 <+44>:    movl   $0x0,-0x18(%ebp)
   0xb7edbb6c <+51>:    jmp    0xb7edbbd6 <volk_32fc_32f_multiply_32fc_a_generic+157>
   0xb7edbb6e <+53>:    mov    -0x10(%ebp),%eax
   0xb7edbb71 <+56>:    mov    (%eax),%ecx
   0xb7edbb73 <+58>:    mov    0x4(%eax),%edx
   0xb7edbb76 <+61>:    mov    %ecx,%eax
   0xb7edbb78 <+63>:    mov    %eax,-0x1c(%ebp)
   0xb7edbb7b <+66>:    flds   -0x1c(%ebp)
   0xb7edbb7e <+69>:    mov    -0x14(%ebp),%eax
=> 0xb7edbb81 <+72>:    flds   (%eax)
   0xb7edbb83 <+74>:    fmulp  %st,%st(1)
   0xb7edbb85 <+76>:    mov    %edx,-0x1c(%ebp)
   0xb7edbb88 <+79>:    flds   -0x1c(%ebp)
   0xb7edbb8b <+82>:    mov    -0x14(%ebp),%eax
   0xb7edbb8e <+85>:    flds   (%eax)
   0xb7edbb90 <+87>:    fmulp  %st,%st(1)
   0xb7edbb92 <+89>:    fxch   %st(1)
   0xb7edbb94 <+91>:    fstps  -0x1c(%ebp)
   0xb7edbb97 <+94>:    mov    -0x1c(%ebp),%ecx
   0xb7edbb9a <+97>:    fstps  -0x1c(%ebp)
   0xb7edbb9d <+100>:   mov    -0x1c(%ebp),%edx
   0xb7edbba0 <+103>:   mov    -0xc(%ebp),%eax
   0xb7edbba3 <+106>:   mov    %ecx,(%eax)
   0xb7edbba5 <+108>:   mov    %edx,0x4(%eax)
   0xb7edbba8 <+111>:   addl   $0x8,-0xc(%ebp)
   0xb7edbbac <+115>:   addl   $0x8,-0x10(%ebp)
   0xb7edbbb0 <+119>:   addl   $0x4,-0x14(%ebp)
   0xb7edbbb4 <+123>:   addl   $0x4,-0x14(%ebp)
   0xb7edbbb8 <+127>:   lea    -0x82e0(%ebx),%eax
   0xb7edbbbe <+133>:   sub    $0xc,%esp
   0xb7edbbc1 <+136>:   pushl  -0x18(%ebp)
   0xb7edbbc4 <+139>:   push   $0x4
   0xb7edbbc6 <+141>:   push   $0x4
   0xb7edbbc8 <+143>:   push   $0x4
   0xb7edbbca <+145>:   push   %eax
   0xb7edbbcb <+146>:   call   0xb7ecace0 <printf@plt>
   0xb7edbbd0 <+151>:   add    $0x20,%esp
   0xb7edbbd3 <+154>:   incl   -0x18(%ebp)
   0xb7edbbd6 <+157>:   mov    -0x18(%ebp),%eax
   0xb7edbbd9 <+160>:   cmp    0x14(%ebp),%eax
   0xb7edbbdc <+163>:   jb     0xb7edbb6e <volk_32fc_32f_multiply_32fc_a_generic+53>
   0xb7edbbde <+165>:   mov    -0x4(%ebp),%ebx
   0xb7edbbe1 <+168>:   leave  
   0xb7edbbe2 <+169>:   ret    
End of assembler dump.
(gdb)
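One way to read the dump above, offered as a hypothesis rather than a confirmed diagnosis: the three arguments are copied to -0xc(%ebp) (cPtr), -0x10(%ebp) (aPtr) and -0x14(%ebp) (bPtr). At +111 and +115 cPtr and aPtr each advance 8 bytes, but at +119 and +123 bPtr gets `addl $0x4` twice, and it is loaded twice (+72 and +85). The faulting instruction is `flds (%eax)` at +72, with %eax just loaded from bPtr, and the last counter printed is 102398, so the fault is in iteration 102399. The sketch below checks whether an 8-byte bPtr stride explains the addresses in the session; the macro and function names are hypothetical, and 4 KiB pages are assumed.

```c
#include <stdint.h>

/* Values copied from the gdb session above. */
#define B_VECTOR_BASE 0xb78f8008u  /* bVector argument */
#define NUM_POINTS    204600u      /* num_points argument */
#define X86_PAGE_SIZE 4096u        /* assumed 4 KiB pages */

/* Address held by bPtr at the start of iteration 'number' if the loop
 * advances it by 'stride' bytes per item (4 expected, 8 hypothesised). */
static uint32_t bptr_at(uint32_t number, uint32_t stride)
{
    return B_VECTOR_BASE + number * stride;
}

/* One past the last byte of bVector (NUM_POINTS floats). */
static uint32_t bvector_end(void)
{
    return B_VECTOR_BASE + NUM_POINTS * (uint32_t)sizeof(float);
}
```

With the expected 4-byte stride, bPtr would be at 0xb795c004 in iteration 102399, well inside the buffer. With an 8-byte stride it passes the end of bVector (0xb79bfce8) at iteration 102300, since 102300 × 8 = 204600 × 4, and reaches 0xb79c0000, the first byte of the next page, at iteration 102399. If that page is not mapped, this matches both the counter and the faulting load, and it would point at how `(*aPtr++) * (*bPtr++)` was compiled rather than at buffer sizing.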
