On 1/8/19, Simon Nash <si...@cjnash.com> wrote: > Hello, > This is my first message to ffmpeg-devel, so please excuse me if > anything here doesn't conform to the normal mailing list conventions. > > I have encountered a problem with ffmpeg (a segmentation fault) that > occurs only when running ffmpeg on the Marvell Armada 370 processor. > > I used gdb to debug the cause of the problem and found that it was > caused by a malfunction in the floating-point unit of the Armada 370. > This is not specific to a single machine but appears to affect all > Armada 370 chips. > > The problem occurs in the following section of code in > libavfilter/af_afir.c (lines 441 to 447): > > for (ch = 0; ch < ctx->inputs[1]->channels; ch++) { > float *time = (float *)s->in[1]->extended_data[!s->one2many * ch]; > > for (i = 0; i < s->nb_taps; i++) > power += time[i] * time[i]; > } > s->gain = sqrtf(ch / power); > > The generated assembler code for the inner loop includes the following: > > 0x0018a8d4 <activate+1660>: add.w r0, r6, #100 ; 0x64 > 0x0018a8d8 <activate+1664>: movs r3, #0 > 0x0018a8da <activate+1666>: vldr s15, [r0, #-100] ; > 0xffffff9c > 0x0018a8de <activate+1670>: adds r3, #8 > 0x0018a8e0 <activate+1672>: vldr s8, [r0, #-96] ; 0xffffffa0 > 0x0018a8e4 <activate+1676>: cmp r3, lr > 0x0018a8e6 <activate+1678>: vldr s9, [r0, #-92] ; 0xffffffa4 > 0x0018a8ea <activate+1682>: pld [r0] > 0x0018a8ee <activate+1686>: vldr s10, [r0, #-88] ; 0xffffffa8 > 0x0018a8f2 <activate+1690>: vmla.f32 s12, s15, s15 > 0x0018a8f6 <activate+1694>: vldr s11, [r0, #-84] ; 0xffffffac > 0x0018a8fa <activate+1698>: vldr s13, [r0, #-80] ; 0xffffffb0 > 0x0018a8fe <activate+1702>: vldr s14, [r0, #-76] ; 0xffffffb4 > 0x0018a902 <activate+1706>: vldr s15, [r0, #-72] ; 0xffffffb8 > 0x0018a906 <activate+1710>: add.w r0, r0, #32 > > When the 32-bit floating-point multiply instruction > 0x0018a8f2 <activate+1690>: vmla.f32 s12, s15, s15 > at activate+1690 is executed, there is a segmentation fault. This > should never happen because the vmla.f32 instruction uses register > values only. This problem happens when the value of the operand in > s15 is small enough to cause an underflow when multiplied by itself > (7.04930082e-28 in this case). I have a console log showing the > state of the registers before and after single-stepping through the > vmla.f32 instruction and I would be pleased to make this available > in any convenient form. > > In other cases (when ffmpeg was compiled with a lower optimization > level for debugging purposes), the vmla.f32 instruction completes > but corrupts the unrelated register r7, causing a segmentation fault > on a subsequent instruction. It seems that vmla.f32 produces > unpredictable results on the Armada 370 when underflow occurs. > > I have investigated possible workarounds. The problem does not occur > if the 32-bit float value is converted to a 64-bit value and > multiplied using a vmla.f64 instruction producing a 64-bit result. > This is because there is no underflow from ther multiplication. > The following patch (diff file from git) does this. Changing to > 64-bit multiplication would (I believe) avoid the possibility of > underflow for any 32-bit coefficient value, which would mean that > the code works correctly on the Armada 370 as well as all other > processors. > > index 31919f6..3e69107 100644 > --- a/libavfilter/af_afir.c > +++ b/libavfilter/af_afir.c > @@ -375,6 +375,7 @@ static int convert_coeffs(AVFilterContext *ctx) > int left, offset = 0, part_size, max_part_size; > int ret, i, ch, n; > float power = 0; > + double power_d = 0; > > s->nb_taps = ff_inlink_queued_samples(ctx->inputs[1]); > if (s->nb_taps <= 0) > @@ -441,9 +442,12 @@ static int convert_coeffs(AVFilterContext *ctx) > for (ch = 0; ch < ctx->inputs[1]->channels; ch++) { > float *time = (float *)s->in[1]->extended_data[!s->one2many * > ch]; > > - for (i = 0; i < s->nb_taps; i++) > - power += time[i] * time[i]; > + for (i = 0; i < s->nb_taps; i++) { > + double coeff_d = (double)time[i]; > + power_d += coeff_d * coeff_d; // ensure no underflow > + } > } > + power = (float)power_d; > s->gain = sqrtf(ch / power); > break; > default: > > This produces the following assembler code, with vmla.f32 replaced with > vmla.f64: > > 0x0018a8d4 <activate+1660>: add.w r0, r6, #100 ; 0x64 > 0x0018a8d8 <activate+1664>: movs r3, #0 > 0x0018a8da <activate+1666>: vldr s12, [r0, #-100] ; > 0xffffff9c > 0x0018a8de <activate+1670>: adds r3, #8 > 0x0018a8e0 <activate+1672>: vldr s0, [r0, #-96] ; 0xffffffa0 > 0x0018a8e4 <activate+1676>: cmp r3, lr > 0x0018a8e6 <activate+1678>: vldr s2, [r0, #-92] ; 0xffffffa4 > 0x0018a8ea <activate+1682>: pld [r0] > 0x0018a8ee <activate+1686>: vldr s4, [r0, #-88] ; 0xffffffa8 > 0x0018a8f2 <activate+1690>: vcvt.f64.f32 d6, s12 > 0x0018a8f6 <activate+1694>: vcvt.f64.f32 d0, s0 > 0x0018a8fa <activate+1698>: vcvt.f64.f32 d1, s2 > 0x0018a8fe <activate+1702>: vcvt.f64.f32 d2, s4 > 0x0018a902 <activate+1706>: vmla.f64 d7, d6, d6 > 0x0018a906 <activate+1710>: vldr s6, [r0, #-84] ; 0xffffffac > 0x0018a90a <activate+1714>: vldr s8, [r0, #-80] ; 0xffffffb0 > 0x0018a90e <activate+1718>: vldr s10, [r0, #-76] ; 0xffffffb4 > 0x0018a912 <activate+1722>: vldr s12, [r0, #-72] ; 0xffffffb8 > 0x0018a916 <activate+1726>: add.w r0, r0, #32 > > I have tested this and it works on the Armada 370 as well as on > other ARMv7 platforms. > > The change to using 64-bit multiplication is unlikely to be a > performance issue because this code runs only during activation. > However, if there are concerns about using 64-bit multiplication > on processors that don't have the bug, I can modify the patch to > check whether the processor is the Armada 370 and use the current > 32-bit multiplication code when running on other processors. > > I can provide a test case if required. This will reproduce the bug > reliably if ffmpeg is running on an Armada 370 machine. > > I realize that this is not a bug in FFmpeg but I would like to ask > the FFmpeg developers to apply this patch (or equivalent) so that > ffmpeg is able to run correctly on the Armada 370. My application > (MinimServer) is using ffmpeg for various audio transformations > including applying FIR filters and has users with devices that use > the Armada 370 as well as users on many other platforms. > > Please advise me what I should do now to help make this change > happen. Many thanks. > > Best regards, > Simon
This could use doubles, but perhaps same issue could happen there too in another scenario. Others may be against it. So better wait for others opinions. _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel