Tom,
New run using my simple "trace". See attached files.
Cheers,
Fred
On 03/19/2012 11:26 AM, Tom Rondeau wrote:
On Mon, Mar 19, 2012 at 12:04 PM, Frederick Stevens
<sk8tesgr...@gmail.com> wrote:
Tom,
See the attached file. I am running volk_profile now. If this is
what you need, then great; otherwise I will keep working on
this with whatever suggestions you have.
Cheers,
Fred
That'll be a good start. We'll see if that tells us anything.
Thanks,
Tom
On 03/19/2012 08:10 AM, Tom Rondeau wrote:
On Sun, Mar 18, 2012 at 8:00 PM, Frederick Stevens
<sk8tesgr...@gmail.com> wrote:
Volk_profile ran to completion. I am using the git source
tree updated just before I did the run. I commented out line
38 of volk_profile.cc as you suggested and ran volk_profile
under gdb. The output is in the attached text file. I have
also attached the generated volk_config from ~/.volk/volk_config.
Thanks. Strange that it's just that kernel, then. Can you put in
some debug lines that will print out the size of the buffers
being used and the 'number' variable in
volk_32fc_x2_multiply_32fc_a when the crash occurs? I just want
to see if the loop is trying to go beyond the bounds of the arrays.
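A minimal sketch of the kind of debug trace being asked for here, under stated assumptions: `ptr_in_bounds` and `trace_ptr` are hypothetical helpers, not part of VOLK, and the caller is assumed to know each buffer's element count. Note that `sizeof` applied to a pointer yields the size of the pointer itself (4 bytes on a 32-bit machine), not the size of the buffer it points into, so the buffer length has to be passed in separately. The trace goes to `stderr`, which is unbuffered by default, so the last line is not lost when the process crashes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of VOLK): returns 1 when `ptr` lies inside
 * the buffer of `count` elements of `elem_size` bytes starting at `base`. */
static int ptr_in_bounds(const void *base, size_t elem_size, size_t count,
                         const void *ptr)
{
    /* Unsigned subtraction wraps when ptr < base, so a single comparison
     * rejects pointers on either side of the buffer. */
    uintptr_t offset = (uintptr_t)ptr - (uintptr_t)base;
    return offset < (uintptr_t)(elem_size * count);
}

/* Hypothetical trace line: loop index, byte offset of one pointer from its
 * base, total buffer size in bytes, and an in-bounds verdict. */
static void trace_ptr(const char *name, const void *base, size_t elem_size,
                      size_t count, const void *ptr, unsigned int number)
{
    fprintf(stderr, "%s number=%u offset=%lu of %lu bytes %s\n",
            name, number,
            (unsigned long)((uintptr_t)ptr - (uintptr_t)base),
            (unsigned long)(elem_size * count),
            ptr_in_bounds(base, elem_size, count, ptr) ? "ok" : "OUT OF BOUNDS");
}
```

Inside a kernel loop, a call such as `trace_ptr("b", bVector, sizeof(float), num_points, bPtr, number)` placed before the multiply would report how far `bPtr` has moved from `bVector` on each iteration.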
I noted from running gnuradio-companion version 3.5.1 (which
works) that when I use a multiply block, this message from
python is generated:
./top_block.py
>>> gr_fir_fff: using 3DNow!
but volk_profile does not seem to recognize the 3DNow!
processor extensions (produces sse2 and sse3 messages on the
Intel Atom 32 bit machine).
Yeah, that's fine. Without a 3DNow! kernel, Volk will just fall
back on the generic implementation. The thought being that the
generic version will work for everyone. So we need to figure out
why that's not true for your...
Hope this helps! Let me know if you want me to try anything
else. I'll let you know how things turn out on the other
machine as well.
Cheers,
Fred
Thanks.
Tom
On 03/18/2012 04:31 PM, Tom Rondeau wrote:
On Fri, Mar 16, 2012 at 6:11 PM, Frederick Stevens
<sk8tesgr...@gmail.com> wrote:
Well, after a few restarts, here is my output. I did a
fresh pull from git because I was getting some errors
with missing *.h files in gruel/src/swig or something
like that. Hope this helps!
RUN_VOLK_TESTS: volk_32fc_32f_multiply_32fc_a
Program received signal SIGSEGV, Segmentation fault.
0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic
(cVector=0xb7448008,
aVector=0xb7768008, bVector=0xb78f8008,
num_points=204600)
at
/home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
74 *cPtr++ = (*aPtr++) * (*bPtr++);
(gdb) bt
#0 0xb7edbb74 in volk_32fc_32f_multiply_32fc_a_generic
(cVector=0xb7448008,
aVector=0xb7768008, bVector=0xb78f8008,
num_points=204600)
at
/home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
Alright, Fred, definitely something strange going on here.
My only guess is that for some reason on your
architecture/OS/whatever, something is being handled
incorrectly and the buffers a, b, and c are not getting
generated correctly, maybe something like it's not doubling
the number of items for the complex data type (before this
function test, there are 16ic, or complex shorts, being
tested, but this is the first complex float test).
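To make the buffer-size guess concrete, here is a small sketch of the byte counts one would expect for this kernel. It assumes 4-byte floats and uses the C99 `float complex` type as a stand-in for `lv_32fc_t`; the function names are hypothetical and are not taken from the VOLK test code.

```c
#include <complex.h>
#include <stddef.h>

/* Bytes needed for `num_points` elements of `elem_size` bytes each. */
static size_t buffer_bytes(size_t num_points, size_t elem_size)
{
    return num_points * elem_size;
}

/* Expected size of a complex-float buffer (aVector or cVector here).
 * float complex stands in for lv_32fc_t; that mapping is an assumption. */
static size_t complex_buffer_bytes(size_t num_points)
{
    return buffer_bytes(num_points, sizeof(float complex));
}

/* Expected size of the real-valued float buffer (bVector here). */
static size_t float_buffer_bytes(size_t num_points)
{
    return buffer_bytes(num_points, sizeof(float));
}
```

With num_points = 204600 from the backtrace, each complex buffer would need 1636800 bytes and the float buffer 818400 bytes, so a test harness that sized every buffer as plain floats would leave the complex buffers half as large as the kernel expects.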
It's hard to tell if it's something about it being an AMD
chip, 32-bit, Slackware version, gcc version, etc. And I
don't have an AMD chip to test on, but I could load up a
32-bit Slackware VM at least.
How much work are you willing to put into this to help us
nail this down?
If you can follow through the volk_profile test code, we can
start outputting more debug info. To start with, I'd suggest
going into volk/apps/volk_profile.cc and commenting out line
38, rebuild the application, and run this new volk_profile
to see if it fails on any other kernels.
Thanks,
Tom
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org <mailto:Discuss-gnuradio@gnu.org>
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
#ifndef INCLUDED_volk_32fc_32f_multiply_32fc_a_H
#define INCLUDED_volk_32fc_32f_multiply_32fc_a_H
#include <inttypes.h>
#include <stdio.h>
#ifdef LV_HAVE_SSE
#include <xmmintrin.h>
/*!
\brief Multiplies the input complex vector with the input float vector and
store their results in the third vector
\param cVector The vector where the results will be stored
\param aVector The complex vector to be multiplied
\param bVector The vectors containing the float values to be multiplied
against each complex value in aVector
\param num_points The number of values in aVector and bVector to be
multiplied together and stored into cVector
*/
static inline void volk_32fc_32f_multiply_32fc_a_sse(lv_32fc_t* cVector, const
lv_32fc_t* aVector, const float* bVector, unsigned int num_points){
  unsigned int number = 0;
  const unsigned int quarterPoints = num_points / 4;
  lv_32fc_t* cPtr = cVector;
  const lv_32fc_t* aPtr = aVector;
  const float* bPtr = bVector;
  __m128 aVal1, aVal2, bVal, bVal1, bVal2, cVal;
  for(;number < quarterPoints; number++){
    aVal1 = _mm_load_ps((const float*)aPtr);
    aPtr += 2;
    aVal2 = _mm_load_ps((const float*)aPtr);
    aPtr += 2;
    bVal = _mm_load_ps(bPtr);
    bPtr += 4;
    bVal1 = _mm_shuffle_ps(bVal, bVal, _MM_SHUFFLE(1,1,0,0));
    bVal2 = _mm_shuffle_ps(bVal, bVal, _MM_SHUFFLE(3,3,2,2));
    cVal = _mm_mul_ps(aVal1, bVal1);
    _mm_store_ps((float*)cPtr,cVal); // Store the results back into the C container
    cPtr += 2;
    cVal = _mm_mul_ps(aVal2, bVal2);
    _mm_store_ps((float*)cPtr,cVal); // Store the results back into the C container
    cPtr += 2;
  }
  number = quarterPoints * 4;
  for(;number < num_points; number++){
    *cPtr++ = (*aPtr++) * (*bPtr);
    bPtr++;
  }
}
#endif /* LV_HAVE_SSE */
#ifdef LV_HAVE_GENERIC
/*!
\brief Multiplies the input complex vector with the input float vector
and stores the results in the third vector
\param cVector The vector where the results will be stored
\param aVector The complex vector to be multiplied
\param bVector The vector containing the float values to be multiplied
against each complex value in aVector
\param num_points The number of values in aVector and bVector to be
multiplied together and stored into cVector
*/
static inline void volk_32fc_32f_multiply_32fc_a_generic(lv_32fc_t* cVector,
const lv_32fc_t* aVector, const float* bVector, unsigned int num_points){
  lv_32fc_t* cPtr = cVector;
  const lv_32fc_t* aPtr = aVector;
  const float* bPtr = bVector;
  unsigned int number = 0;
  for(number = 0; number < num_points; number++){
    *cPtr++ = (*aPtr++) * (*bPtr++);
    printf("%u %u %u %d \n",sizeof(aPtr),sizeof(bPtr),sizeof(cPtr),number);
  }
}
#endif /* LV_HAVE_GENERIC */
#ifdef LV_HAVE_ORC
/*!
\brief Multiplies the input complex vector with the input float vector
and stores the results in the third vector
\param cVector The vector where the results will be stored
\param aVector The complex vector to be multiplied
\param bVector The vector containing the float values to be multiplied
against each complex value in aVector
\param num_points The number of values in aVector and bVector to be
multiplied together and stored into cVector
*/
extern void volk_32fc_32f_multiply_32fc_a_orc_impl(lv_32fc_t* cVector, const
lv_32fc_t* aVector, const float* bVector, unsigned int num_points);
static inline void volk_32fc_32f_multiply_32fc_a_orc(lv_32fc_t* cVector, const
lv_32fc_t* aVector, const float* bVector, unsigned int num_points){
  volk_32fc_32f_multiply_32fc_a_orc_impl(cVector, aVector, bVector,
                                         num_points);
}
#endif /* LV_HAVE_ORC */
#endif /* INCLUDED_volk_32fc_32f_multiply_32fc_a_H */
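In the header above, the SSE kernel's tail loop multiplies by `*bPtr` and increments `bPtr` in a separate statement, while the generic kernel does both in one expression. The sketch below shows a generic-style loop written with array indexing, so that no pointer is modified inside the multiply expression. It is only an illustration, not a confirmed fix: the function name is hypothetical, and the C99 `float complex` type stands in for `lv_32fc_t`.

```c
#include <complex.h>

/* Hypothetical variant of the generic kernel: c[i] = a[i] * b[i], where a and
 * c are complex and b is real. Indexing keeps side effects out of the
 * arithmetic expression, so each input element is read exactly once. */
static void multiply_32fc_32f_indexed(float complex *c, const float complex *a,
                                      const float *b, unsigned int num_points)
{
    unsigned int i;
    for (i = 0; i < num_points; i++) {
        c[i] = a[i] * b[i];
    }
}
```

The same effect could be had by keeping the pointer style and moving each increment onto its own line, as the SSE tail loop already does for `bPtr`.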
4 4 4 102298
4 4 4 102299
4 4 4 102300
4 4 4 102301
4 4 4 102302
4 4 4 102303
4 4 4 102304
4 4 4 102305
4 4 4 102306
4 4 4 102307
4 4 4 102308
4 4 4 102309
4 4 4 102310
4 4 4 102311
4 4 4 102312
4 4 4 102313
4 4 4 102314
4 4 4 102315
4 4 4 102316
4 4 4 102317
4 4 4 102318
4 4 4 102319
4 4 4 102320
4 4 4 102321
4 4 4 102322
4 4 4 102323
4 4 4 102324
4 4 4 102325
4 4 4 102326
4 4 4 102327
4 4 4 102328
4 4 4 102329
4 4 4 102330
4 4 4 102331
4 4 4 102332
4 4 4 102333
4 4 4 102334
4 4 4 102335
4 4 4 102336
4 4 4 102337
4 4 4 102338
4 4 4 102339
4 4 4 102340
4 4 4 102341
4 4 4 102342
4 4 4 102343
4 4 4 102344
4 4 4 102345
4 4 4 102346
4 4 4 102347
4 4 4 102348
4 4 4 102349
4 4 4 102350
4 4 4 102351
4 4 4 102352
4 4 4 102353
4 4 4 102354
4 4 4 102355
4 4 4 102356
4 4 4 102357
4 4 4 102358
4 4 4 102359
4 4 4 102360
4 4 4 102361
4 4 4 102362
4 4 4 102363
4 4 4 102364
4 4 4 102365
4 4 4 102366
4 4 4 102367
4 4 4 102368
4 4 4 102369
4 4 4 102370
4 4 4 102371
4 4 4 102372
4 4 4 102373
4 4 4 102374
4 4 4 102375
4 4 4 102376
4 4 4 102377
4 4 4 102378
4 4 4 102379
4 4 4 102380
4 4 4 102381
4 4 4 102382
4 4 4 102383
4 4 4 102384
4 4 4 102385
4 4 4 102386
4 4 4 102387
4 4 4 102388
4 4 4 102389
4 4 4 102390
4 4 4 102391
4 4 4 102392
4 4 4 102393
4 4 4 102394
4 4 4 102395
4 4 4 102396
4 4 4 102397
4 4 4 102398
Program received signal SIGSEGV, Segmentation fault.
0xb7edbb81 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
at
/home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
74 *cPtr++ = (*aPtr++) * (*bPtr++);
(gdb) bt
#0 0xb7edbb81 in volk_32fc_32f_multiply_32fc_a_generic (cVector=0xb7448008,
aVector=0xb7768008, bVector=0xb78f8008, num_points=204600)
at
/home/fred/extras/gnuradio/gnuradio/volk/include/volk/volk_32fc_32f_multiply_32fc_a.h:74
#1 0xb7ed4d68 in volk_32fc_32f_multiply_32fc_a_manual (cVector=0xb7448008,
aVector=0xb7768008, bVector=0xb78f8008, num_points=204600,
arch=0x8079ac4 "generic")
at /home/fred/extras/gnuradio/gnuradio/build/volk/lib/volk.c:749
#2 0x08064533 in run_cast_test3 (
func=0x80595c0 <volk_32fc_32f_multiply_32fc_a_manual@plt>, buffs=...,
vlen=204600, iter=999, arch=...)
at /home/fred/extras/gnuradio/gnuradio/volk/lib/qa_utils.cc:182
#3 0x08062770 in run_volk_tests (desc=...,
manual_func=0x80595c0 <volk_32fc_32f_multiply_32fc_a_manual@plt>,
name=..., tol=9.99999975e-05, scalar=..., vlen=204600, iter=1000,
best_arch_vector=0xbfffe714)
at /home/fred/extras/gnuradio/gnuradio/volk/lib/qa_utils.cc:351
#4 0x0805b3d3 in main (argc=1, argv=0xbffff204)
at /home/fred/extras/gnuradio/gnuradio/volk/apps/volk_profile.cc:38
(gdb) disassemble
Dump of assembler code for function volk_32fc_32f_multiply_32fc_a_generic:
0xb7edbb39 <+0>: push %ebp
0xb7edbb3a <+1>: mov %esp,%ebp
0xb7edbb3c <+3>: push %ebx
0xb7edbb3d <+4>: sub $0x24,%esp
0xb7edbb40 <+7>: call 0xb7edbb45
<volk_32fc_32f_multiply_32fc_a_generic+12>
0xb7edbb45 <+12>: pop %ebx
0xb7edbb46 <+13>: add $0xca753,%ebx
0xb7edbb4c <+19>: mov 0x8(%ebp),%eax
0xb7edbb4f <+22>: mov %eax,-0xc(%ebp)
0xb7edbb52 <+25>: mov 0xc(%ebp),%eax
0xb7edbb55 <+28>: mov %eax,-0x10(%ebp)
0xb7edbb58 <+31>: mov 0x10(%ebp),%eax
0xb7edbb5b <+34>: mov %eax,-0x14(%ebp)
0xb7edbb5e <+37>: movl $0x0,-0x18(%ebp)
0xb7edbb65 <+44>: movl $0x0,-0x18(%ebp)
0xb7edbb6c <+51>: jmp 0xb7edbbd6
<volk_32fc_32f_multiply_32fc_a_generic+157>
0xb7edbb6e <+53>: mov -0x10(%ebp),%eax
0xb7edbb71 <+56>: mov (%eax),%ecx
0xb7edbb73 <+58>: mov 0x4(%eax),%edx
0xb7edbb76 <+61>: mov %ecx,%eax
0xb7edbb78 <+63>: mov %eax,-0x1c(%ebp)
0xb7edbb7b <+66>: flds -0x1c(%ebp)
0xb7edbb7e <+69>: mov -0x14(%ebp),%eax
=> 0xb7edbb81 <+72>: flds (%eax)
0xb7edbb83 <+74>: fmulp %st,%st(1)
0xb7edbb85 <+76>: mov %edx,-0x1c(%ebp)
0xb7edbb88 <+79>: flds -0x1c(%ebp)
0xb7edbb8b <+82>: mov -0x14(%ebp),%eax
0xb7edbb8e <+85>: flds (%eax)
0xb7edbb90 <+87>: fmulp %st,%st(1)
0xb7edbb92 <+89>: fxch %st(1)
0xb7edbb94 <+91>: fstps -0x1c(%ebp)
0xb7edbb97 <+94>: mov -0x1c(%ebp),%ecx
0xb7edbb9a <+97>: fstps -0x1c(%ebp)
0xb7edbb9d <+100>: mov -0x1c(%ebp),%edx
0xb7edbba0 <+103>: mov -0xc(%ebp),%eax
0xb7edbba3 <+106>: mov %ecx,(%eax)
0xb7edbba5 <+108>: mov %edx,0x4(%eax)
0xb7edbba8 <+111>: addl $0x8,-0xc(%ebp)
0xb7edbbac <+115>: addl $0x8,-0x10(%ebp)
0xb7edbbb0 <+119>: addl $0x4,-0x14(%ebp)
0xb7edbbb4 <+123>: addl $0x4,-0x14(%ebp)
0xb7edbbb8 <+127>: lea -0x82e0(%ebx),%eax
0xb7edbbbe <+133>: sub $0xc,%esp
0xb7edbbc1 <+136>: pushl -0x18(%ebp)
0xb7edbbc4 <+139>: push $0x4
0xb7edbbc6 <+141>: push $0x4
0xb7edbbc8 <+143>: push $0x4
0xb7edbbca <+145>: push %eax
0xb7edbbcb <+146>: call 0xb7ecace0 <printf@plt>
0xb7edbbd0 <+151>: add $0x20,%esp
0xb7edbbd3 <+154>: incl -0x18(%ebp)
0xb7edbbd6 <+157>: mov -0x18(%ebp),%eax
0xb7edbbd9 <+160>: cmp 0x14(%ebp),%eax
0xb7edbbdc <+163>: jb 0xb7edbb6e
<volk_32fc_32f_multiply_32fc_a_generic+53>
0xb7edbbde <+165>: mov -0x4(%ebp),%ebx
0xb7edbbe1 <+168>: leave
0xb7edbbe2 <+169>: ret
End of assembler dump.
(gdb)
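One possible reading of the dump above, offered as an observation rather than a confirmed diagnosis: the loop body contains two `addl $0x4,-0x14(%ebp)` instructions (at +119 and +123), and the faulting `flds (%eax)` loads through that same stack slot, which would be consistent with bPtr advancing 8 bytes per iteration instead of 4. The sketch below works through the arithmetic with the addresses from the backtrace. The helper names are hypothetical, the 4096-byte page size is an assumption, and the faulting iteration is taken to be 102399 because the last trace line printed is 102398.

```c
#include <stdint.h>

/* Address held by a pointer that started at `base` and has advanced
 * `stride_bytes` per iteration for `iter` completed iterations. */
static uint32_t ptr_at(uint32_t base, uint32_t stride_bytes, uint32_t iter)
{
    return base + stride_bytes * iter;
}

/* First iteration whose read starts at or past the end of a buffer of
 * `buf_bytes` bytes (ceiling division). */
static uint32_t first_overrun_iter(uint32_t buf_bytes, uint32_t stride_bytes)
{
    return (buf_bytes + stride_bytes - 1u) / stride_bytes;
}

/* 1 if `addr` sits exactly on a page boundary for the assumed page size. */
static int is_page_aligned(uint32_t addr, uint32_t page_bytes)
{
    return (addr % page_bytes) == 0u;
}
```

With bVector = 0xb78f8008 and a stride of 8, `ptr_at` gives 0xb79c0000 at iteration 102399, which is a 4096-byte page boundary. The float buffer holds 204600 * 4 = 818400 bytes, so `first_overrun_iter(818400, 8)` gives 102300, about half of num_points. Under these assumptions the loop would have been reading past the end of bVector for roughly a hundred iterations before reaching an unmapped page and raising SIGSEGV.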