Re: [FFmpeg-devel] [PATCH 2/2] avutil/float_dsp: add ff_vector_dmul_{sse2, avx}

2018-09-14 Thread Henrik Gramner
On Thu, Sep 13, 2018 at 3:08 PM, James Almer wrote: > +lea lenq, [lend*8 - mmsize*4] Is len guaranteed to be a multiple of mmsize/8? Otherwise this would cause misalignment. It will also break if len < mmsize/2. Also if you want a 32-bit result from lea it should be written as "lea len

Re: [FFmpeg-devel] [PATCH 2/2] avutil/float_dsp: add ff_vector_dmul_{sse2, avx}

2018-09-14 Thread Henrik Gramner
On Fri, Sep 14, 2018 at 4:51 PM, Henrik Gramner wrote: > I can't really think of any scenario where using a 32-bit register > address operand with a 64-bit destination for LEA is not a mistake. To clarify on this, using a 32-bit memory operand means the calculated effective address
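
To make the concern concrete, here is a rough C model of the two lea forms under discussion. It assumes the usual x86-64 rule that an address computed with 32-bit address size is truncated to 32 bits and then zero-extended into a 64-bit destination. The function names and the mmsize value used below are illustrative only and do not come from the patch.

```c
#include <stdint.h>

/* Models "lea lenq, [lend*8 - mmsize*4]": the address is computed in
 * 32 bits (wrapping modulo 2^32) and zero-extended into the 64-bit
 * destination, so a negative result becomes a large positive one. */
uint64_t lea_addr32_dst64(int32_t len, uint32_t mmsize)
{
    uint32_t ea = (uint32_t)len * 8u - mmsize * 4u;
    return (uint64_t)ea;
}

/* Models "lea lenq, [lenq*8 - mmsize*4]" with len already sign-extended:
 * the whole computation is done in 64 bits. */
int64_t lea_addr64_dst64(int32_t len, uint32_t mmsize)
{
    return (int64_t)len * 8 - (int64_t)mmsize * 4;
}
```

With mmsize = 32 and len = 8 (less than mmsize/2), the first form yields 0xFFFFFFC0 while the second yields -64, which is why mixing a 32-bit address operand with a 64-bit destination is rarely intended.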

Re: [FFmpeg-devel] [PATCH 2/2] avutil/float_dsp: add ff_vector_dmul_{sse2, avx}

2018-09-14 Thread Henrik Gramner
On Fri, Sep 14, 2018 at 3:26 PM, James Almer wrote: > On 9/14/2018 9:57 AM, Henrik Gramner wrote: >> Also if you want a 32-bit result from lea it should be written as "lea >> lend, [lenq*8 - mmsize*4]" which is equivalent but has a shorter >> opcode (e.g. always u

Re: [FFmpeg-devel] swscale/x86/rgb2rgb : port shuffle2103 to external asm

2018-10-09 Thread Henrik Gramner
On Mon, Oct 8, 2018 at 7:46 PM Martin Vignali wrote: > > Hello, > > Patch in attach port inline asm shuffle 2103 func (mmx/mmxext) to external > asm > and remove the inline asm version > > Martin Keeping both MMX and MMXEXT seems a bit excessive. Ideally both would be replaced with something more

Re: [FFmpeg-devel] [PATCH] avcodec/libx264: remove FF_CODEC_CAP_INIT_THREADSAFE flag

2018-10-21 Thread Henrik Gramner
Fixed in x264-sandbox. All uses of plain strtok() will be removed from x264 in the next push. I thought all of the strtok() uses in x264 had already been converted to strtok_r() but apparently that wasn't the case. Sorry about that.
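
For readers unfamiliar with the difference, a minimal sketch of the reentrant form follows. It is not x264 code; count_tokens is a hypothetical helper, and it assumes a POSIX system where strtok_r() is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>

/* Counts comma-separated tokens in a mutable string. strtok_r() keeps
 * its position in the caller-provided `save` pointer, whereas plain
 * strtok() keeps it in hidden static state shared by all callers. */
int count_tokens(char *s)
{
    char *save = NULL;
    int n = 0;
    for (char *tok = strtok_r(s, ",", &save); tok;
         tok = strtok_r(NULL, ",", &save))
        n++;
    return n;
}
```

Because the state lives in `save`, two threads (or two nested loops) can tokenize different strings at the same time without interfering with each other.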

Re: [FFmpeg-devel] [PATCH] avcodec/libx264: remove FF_CODEC_CAP_INIT_THREADSAFE flag

2018-10-23 Thread Henrik Gramner
On Tue, Oct 23, 2018 at 3:22 PM Derek Buitenhuis wrote: > I'd like to point out that this patch or some variant may be required anyway. > > libx264 only uses strtok_r or strtok_s if available on the platform. > > See: > https://git.videolan.org/?p=x264.git;a=blob;f=common/osdep.h;h=715ef8a00c01ad

Re: [FFmpeg-devel] [PATCH 5/5] checkasm: aarch64: Check for stack overflows

2020-05-15 Thread Henrik Gramner
All 5 lgtm. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Re: [FFmpeg-devel] [PATCH 4/4] avfilter/vf_v360: x86 SIMD for interpolations

2019-09-04 Thread Henrik Gramner
On Wed, Sep 4, 2019 at 10:01 PM James Almer wrote: > On 9/4/2019 4:28 PM, Paul B Mahol wrote: > > +vpmulld m3, m1, m0 > > +vpaddd m1, m3, m2 > > pmulld m1, m0 > paddd m1, m2 Could use pmaddwd instead as well, it's faster than pmulld on pretty much every CPU. >

Re: [FFmpeg-devel] [PATCH 4/4] avfilter/vf_v360: x86 SIMD for interpolations

2019-09-05 Thread Henrik Gramner
On Wed, Sep 4, 2019 at 9:29 PM Paul B Mahol wrote: > +movd xm6, [pd_255] > +vpbroadcastd m6, xm6 vpbroadcastd m6, [pd_255]

Re: [FFmpeg-devel] [PATCH V3 2/2] libswscale/x86/yuv2rgb: add ssse3 version

2019-12-16 Thread Henrik Gramner
On Wed, Dec 4, 2019 at 4:03 AM Ting Fu wrote: > +VBROADCASTSD y_offset, [pointer_c_ditherq + 8 * 8] > +VBROADCASTSD u_offset, [pointer_c_ditherq + 9 * 8] > +VBROADCASTSD v_offset, [pointer_c_ditherq + 10 * 8] > +VBROADCASTSD ug_coff, [pointer_c_ditherq + 7 * 8] > +VBROADCAS

Re: [FFmpeg-devel] [PATCH] Workaround to build ffmpeg on MacOs 10.15

2020-01-03 Thread Henrik Gramner
On Fri, Jan 3, 2020 at 7:37 PM Moritz Barsnick wrote: > On Fri, Jan 03, 2020 at 11:05:25 +0100, Timo Rothenpieler wrote: > > I think this was discussed on this list in the past. > > Not sure what the conclusion was, but I think an unconditional flag like > > this being added wasn't all that well r

Re: [FFmpeg-devel] [PATCH] ffmpeg: add -fpsmin to clamp output framerate

2021-06-14 Thread Henrik Gramner
On Mon, Jun 14, 2021 at 9:22 AM Matthias Neugebauer wrote: > Anything I can do to not land in spam? On another Google groups > mailing list I (and many others including the admin accounts) had > the same issue a couple of times. This is caused by sending emails from a domain with a DMARC reject o

Re: [FFmpeg-devel] [PATCH v2 3/9] avcodec/av1dec: support setup shear process

2021-07-06 Thread Henrik Gramner
On Mon, Jul 5, 2021 at 4:32 AM Fei Wang wrote: > +int64_t v, w; > +int32_t *param = &s->cur_frame.gm_params[idx][0]; ... > +v = param[4] * (1 << AV1_WARPEDMODEL_PREC_BITS); > +w = param[3] * param[4]; Possible integer overflow? Might need some int64_t casting before the mu
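
A small sketch of the kind of cast being suggested, assuming 32-bit operands as in the quoted code. mul_wide is a hypothetical helper, not part of the patch.

```c
#include <stdint.h>

/* Widen one operand first so the multiplication is carried out in
 * 64 bits. Writing (int64_t)(a * b) instead would multiply in 32 bits
 * and could overflow (undefined behaviour for signed int) before the
 * conversion to int64_t takes place. */
int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}
```

Assigning the product to an int64_t variable is not enough on its own, since the type of the destination does not affect how the multiplication is evaluated.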

Re: [FFmpeg-devel] [PATCH] avutil: Rename RSHIFT macro to ROUNDED_RSHIFT

2019-01-27 Thread Henrik Gramner
On Mon, Jan 21, 2019 at 9:54 PM James Almer wrote: > There's also no good way to deprecate a define and replace it with > another while informing the library user, so for something purely > cosmetic like this i don't think it's worth the trouble. Would it be possible to create a deprecated inline

Re: [FFmpeg-devel] [PATCH] avutil/mem: Mark DECLARE_ASM_ALIGNED as visibility("hidden") for __GNUC__

2019-03-13 Thread Henrik Gramner
On Wed, Feb 20, 2019 at 8:03 PM Fāng-ruì Sòng wrote: > --- a/libavutil/mem.h > +++ b/libavutil/mem.h > > +#if defined(__GNUC__) && !(defined(_WIN32) || defined(__CYGWIN__)) > +#define DECLARE_HIDDEN __attribute__ ((visibility ("hidden"))) > +#else > +#define DECLARE_HIDDEN > +#endif libav

Re: [FFmpeg-devel] [PATCH] avutil/x86inc: don't use movss in VBROADCASTSS macro when src and dst args are the same

2017-03-21 Thread Henrik Gramner
x86util, not x86inc. Otherwise LGTM.

Re: [FFmpeg-devel] [PATCH 1/5] avcodec/h264: change RETs into REP_RETs where appropriate

2017-04-05 Thread Henrik Gramner
On Wed, Apr 5, 2017 at 3:53 AM, James Darnley wrote: > call h264_idct_add8_mmx_plane > -RET > +RET ; TODO: check rep ret after a function call call followed by RET should be replaced by the TAIL_CALL macro instead which outputs a jmp instruction if there's no function epilogu

Re: [FFmpeg-devel] [PATCH 1/5] avcodec/h264: change RETs into REP_RETs where appropriate

2017-04-14 Thread Henrik Gramner
On Fri, Apr 14, 2017 at 1:19 PM, James Darnley wrote: > Do you want me to change this patch to add that? Either the same patch or a different one, pick whichever is most convenient for you.

Re: [FFmpeg-devel] [PATCH 2/6] avcodec/h264: change some labels to be macro-local

2017-04-15 Thread Henrik Gramner
What about just using strip -x on the assembly files to discard the local symbols?

Re: [FFmpeg-devel] [PATCH 5/5] x86: Add some additional cpuflag relations

2017-06-09 Thread Henrik Gramner
On Fri, Jun 9, 2017 at 1:05 AM, James Darnley wrote: >Where should I put the aesni define? Between sse42 and avx.

Re: [FFmpeg-devel] [PATCH 1/5] x86inc: Fix call with memory operands

2017-06-09 Thread Henrik Gramner
On Fri, Jun 9, 2017 at 1:04 AM, James Darnley wrote: > libavutil/x86/x86inc.asm | 6 +- Bump the date in the header to 2017 as well. That was done in x264 as part of an earlier commit but might as well squash it into this one.

Re: [FFmpeg-devel] [PATCH v3] mdct15: add assembly optimizations for the 15-point FFT

2017-06-22 Thread Henrik Gramner
On Fri, Jun 23, 2017 at 12:44 AM, Rostislav Pehlivanov wrote: > +%macro FFT5 3 ; %1 - in_offset, %2 - dst1 (64bit used), %3 - dst2 > +movddup xm0, [inq + 0*16 + 0 + %1] ; in[ 0].re, in[ 0].im, in[ 0].re, > in[ 0].im > +movsd xm1, [inq + 1*16 + 8 + %1] ; in[ 3].re, in[ 3].im, 0

Re: [FFmpeg-devel] [PATCH] Remove REP_RET usage throughout x86 asm files

2017-11-13 Thread Henrik Gramner
On Sun, Nov 12, 2017 at 9:59 PM, Rostislav Pehlivanov wrote: > No longer needed as AUTO_REP_RET deals with it on normal RETs. Only when the RET follows a branch instruction. If it's a branch target (that isn't by itself preceded by a branch instruction) there is no way of automatically detecting

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-27 Thread Henrik Gramner
On Sun, Nov 26, 2017 at 11:51 PM, James Darnley wrote: > -pd_0_int_min: times 2 dd 0, -2147483648 > -pq_int_min: times 2 dq -2147483648 > -pq_int_max: times 2 dq 2147483647 > +pd_0_int_min: times 4 dd 0, -2147483648 > +pq_int_min: times 4 dq -2147483648 > +pq_int_max: times 4 dq 21

Re: [FFmpeg-devel] avcodec/x86/bswapdsp : convert pb_bswap32 to ymm constant in order to simplify code

2017-11-27 Thread Henrik Gramner
On Sat, Nov 25, 2017 at 9:53 PM, Martin Vignali wrote: > Hello, > > In attach patch to convert pb_bswap32 to ymm constant > and remove the vbroadcasti128 part > > Speed seems to be similar to me This just wastes cache for no reason. A tiny amount, sure, but minor things tend to add up eventually

Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-27 Thread Henrik Gramner
>> Using 128-bit broadcasts is preferable over duplicating the constants >> to 256-bit unless there's a good reason for doing so since it wastes >> less cache and is faster on AMD CPU:s. > > What would that reason be? Afaik broadcasts are expensive, since they > both load from memory then splat dat

Re: [FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-11-28 Thread Henrik Gramner
On Mon, Nov 27, 2017 at 11:37 PM, James Almer wrote: > On 11/27/2017 7:33 PM, James Darnley wrote: >> If the condition was made "mmsize > 16" would this work correctly for >> zmm registers? (Assume I finally push my AVX-512 patches). > > No, there's no EVEX variant of vbroadcasti128. For that you

Re: [FFmpeg-devel] avutil/x86util : add macro for 128 bits constant load

2017-12-02 Thread Henrik Gramner
On Fri, Dec 1, 2017 at 9:03 PM, Martin Vignali wrote: > If no one have objections, i will push these patch tomorrow. > > Martin Follow James' suggestion to use >16 instead of ==32, otherwise OK.

Re: [FFmpeg-devel] avfilter/x86/vf_hflip : make macro and add AVX2

2017-12-13 Thread Henrik Gramner
On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali wrote: > +vpermq m1, [srcq + xq - mmsize + %3], 0x4e; flip each lane at > load > +vpermq m2, [srcq + xq - 2 * mmsize + %3], 0x4e; flip each lane at > load Would doing 2x 128-bit movu + 2x vinserti128 be faster?

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)

2017-12-13 Thread Henrik Gramner
On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali wrote: > the idea in AVX2 is to load 128bits of data (2x 64 bits) > then shuffle accross lane, the two 64 bits in the low part of each lane, to > keep the rest of the process similar > to the sse version What about using pmovzxbw instead of movu + vp

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 version for 8b func (WIP)

2017-12-17 Thread Henrik Gramner
On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali wrote: > 2017-12-13 17:37 GMT+01:00 Henrik Gramner : >> You could also do vextracti128 + 128-bit packuswb instead of 256-bit >> packuswb + vpermq. >> > Sorry don't understand this part > do you mean 128 bit pack

Re: [FFmpeg-devel] libavcodec/exr : add SIMD for reorder pixels (SSE and AVX2) v3 (WIP)

2017-09-10 Thread Henrik Gramner
On Sun, Sep 10, 2017 at 5:17 PM, Martin Vignali wrote: > +void (*reorder_pixels)(uint8_t *src, uint8_t *dst, int size); size should be ptrdiff_t instead of int since it's used as a 64-bit operand in the asm on x86-64 and the upper 32 bits are undefined otherwise. > +++ b/libavcodec/x86/exrds

Re: [FFmpeg-devel] [PATCH 3/3] avcodec/x86/lossless_videoencdsp: Fix warning: signed dword value exceeds bounds

2017-09-30 Thread Henrik Gramner
On Sat, Sep 30, 2017 at 12:58 AM, Michael Niedermayer wrote: > -and i, -2 * regsize > +and i, -(2 * regsize) regsize is defined to mmsize / 2 in the relevant case so the expression resolves to -2 * 16 / 2. In nasm integers are 64-bit and / is unsigned divisio
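
The arithmetic described here can be mimicked in C as a rough illustration. It assumes, as the reply states, that the assembler evaluates the expression in 64-bit integers with / acting as unsigned division, and that regsize expands textually to mmsize / 2. Both function names are made up for this sketch.

```c
#include <stdint.h>

/* "-2 * mmsize / 2": the product is negative, but the division treats
 * it as a huge unsigned 64-bit number. */
uint64_t and_mask_unparenthesized(uint64_t mmsize)
{
    uint64_t product = (uint64_t)-2 * mmsize; /* wraps modulo 2^64 */
    return product / 2;
}

/* "-(2 * mmsize / 2)": the division is applied to a positive value and
 * the negation happens afterwards. */
int64_t and_mask_parenthesized(uint64_t mmsize)
{
    return -(int64_t)(2 * mmsize / 2);
}
```

For mmsize = 16 the first form gives 0x7FFFFFFFFFFFFFF0 rather than -16, which does not fit in a signed dword and would explain the warning named in the subject line.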

Re: [FFmpeg-devel] libavcodec/exr : add x86 SIMD for predictor

2017-10-01 Thread Henrik Gramner
On Fri, Sep 22, 2017 at 11:12 PM, Martin Vignali wrote: > +static void predictor_scalar(uint8_t *src, ptrdiff_t size) > +{ > +uint8_t *t= src + 1; > +uint8_t *stop = src + size; > + > +while (t < stop) { > +int d = (int) t[-1] + (int) t[0] - 128; > +t[0] = d; > +

Re: [FFmpeg-devel] libavcodec/exr : add x86 SIMD for predictor

2017-10-01 Thread Henrik Gramner
On Sun, Oct 1, 2017 at 4:14 PM, James Almer wrote: > We normally use int for counters, and don't mix declaration and statements. > And in any case ptrdiff_t would be "more correct" for this. Ah right. C90, ugh. Too used to C99. Yeah, feel free to use whatever datatype that's most appropriate for

Re: [FFmpeg-devel] [PATCH]lavc/h264:Only check x264_build if it was set

2017-10-06 Thread Henrik Gramner
On Thu, Oct 5, 2017 at 8:31 AM, Carl Eugen Hoyos wrote: > Hi! > > Attached patch fixes ticket #6717. > > Please comment, Carl Eugen Signed numbers are converted to unsigned when compared to unsigned numbers which means -1 becomes UINT_MAX so this patch shouldn't actually change anything. #6717 i
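
A short C sketch of the conversion rule mentioned above. build_below and the threshold used in the usage note are hypothetical and only stand in for a comparison of a signed build number against an unsigned constant.

```c
#include <limits.h>

/* Returns 1 when `build` compares below `limit`. The explicit cast
 * spells out what the usual arithmetic conversions do implicitly for
 * `build < limit`: the signed operand becomes unsigned, so -1 is
 * compared as UINT_MAX. */
int build_below(int build, unsigned limit)
{
    return (unsigned)build < limit;
}
```

So build_below(-1, 151) is 0: an unset value of -1 already fails a "less than" check against any unsigned threshold without needing a separate test.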

[FFmpeg-devel] [PATCH] swscale: Reduce verbosity of misalignment reporting

2017-10-22 Thread Henrik Gramner
It's a bit overzealous to complain about misalignment with AV_LOG_WARNING, especially since memory bandwidth is much more likely to be the bottleneck compared to data alignment which the user may not even have control over. --- libswscale/swscale.c | 18 +++--- 1 file changed, 3 insert

Re: [FFmpeg-devel] [PATCH] swscale: Reduce verbosity of misalignment reporting

2017-10-29 Thread Henrik Gramner
On Sun, Oct 22, 2017 at 11:47 AM, Henrik Gramner wrote: > It's a bit overzealous to complain about misalignment with AV_LOG_WARNING, > especially since memory bandwidth is much more likely to be the bottleneck > compared to data alignment which the user may not even have contr

Re: [FFmpeg-devel] [PATCH 8/8] avcodec/v210enc: add AVX-512 10-bit line pack function

2017-10-30 Thread Henrik Gramner
On Mon, Oct 30, 2017 at 2:08 PM, James Darnley wrote: > +INIT_YMM avx512 ymm?

Re: [FFmpeg-devel] [PATCH 2/2] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-08 Thread Henrik Gramner
On Sat, Oct 8, 2016 at 5:20 PM, Rostislav Pehlivanov wrote: > +cglobal aac_quantize_bands, 8, 8, 7, out, in, scaled, size, Q34, is_signed, > maxval, rounding [...] > +movd m4, is_signedd movd is SSE2. Can be worked around by moving it through the stack though. [...] > +/* Can't pass

Re: [FFmpeg-devel] [PATCH v2] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-09 Thread Henrik Gramner
On Sun, Oct 9, 2016 at 2:15 PM, Rostislav Pehlivanov wrote: > +cglobal aac_quantize_bands, 6, 6, 6, out, in, scaled, size, is_signed, > maxval, Q34, rounding Now that this function is SSE2 you should explicitly use floating-point instructions to avoid bypass delays from transitioning between int

Re: [FFmpeg-devel] [PATCH v2] aacenc: add SIMD optimizations for abs_pow34 and quantization

2016-10-09 Thread Henrik Gramner
On Sun, Oct 9, 2016 at 5:04 PM, Michael Niedermayer wrote: > this segfaults on x86-32 I'm guessing due to unaligned local arrays in search_for_ms(): float M[128], S[128];

Re: [FFmpeg-devel] [PATCH] avutil/x86/emms: Document the emms_c() vs alloc/free relation.

2016-10-24 Thread Henrik Gramner
On Mon, Oct 24, 2016 at 8:09 PM, wm4 wrote: > B) add a compile time option that runs emms() unconditionally at the end >of each mmx asm block > > Since musl intentionally evades detection, neither can be enabled > automatically, probably. It would be interesting to see what the speed > impact

Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Henrik Gramner
On Mon, Oct 24, 2016 at 9:34 PM, wm4 wrote: > a ASM function must, according to the calling convention, reset the > MMX state when returning. > > What FFmpeg does here was misdesigned from the very start. The decision to issue emms manually instead of after every MMX function was a deliberate dec

Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-24 Thread Henrik Gramner
On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje wrote: > Good idea to reference Hendrik Gramner here, who keeps insisting we get rid > of all MMX code in ffmpeg (at least as an option) for future Intel CPUs in > which MMX will be deprecated. Replacing MMX with SSE2 is indeed the most "proper" f

Re: [FFmpeg-devel] [PATCH 09/13] avcodec/svq1dec: clear MMX state after MB decode loop

2016-10-25 Thread Henrik Gramner
On Tue, Oct 25, 2016 at 2:28 PM, wrote: > It would be nice to look at a benchmarking comparison, to be able to > see the actual practical performance gain of the decision not to follow > the ABI. Just a quick comparison from adding EMMS to a random MMX function (from x264, because I happened to

Re: [FFmpeg-devel] [PATCH] avformat/hls: Added subtitle support

2016-11-15 Thread Henrik Gramner
On Tue, Nov 15, 2016 at 1:39 PM, Franklin Phillips wrote: > Sorry, I forgot to mention that my first attempt was using git send-email, > when that didn't work, I tried mutt, same result using both clients. Might be spam-filter related. All your e-mails end up in my spam folder due to failing DMA

Re: [FFmpeg-devel] [PATCH 3/4] avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter

2016-12-07 Thread Henrik Gramner
On Wed, Dec 7, 2016 at 2:07 PM, James Darnley wrote: > Because a few instructions using 3 operand form should be quicker. The > fact that it doesn't show is no doubt down to the out of order execution > managing to do the moves earlier than written. Register-register moves are handled in the reg

[FFmpeg-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-25 Thread Henrik Gramner
When allocating stack space with an alignment requirement that is larger than the current stack alignment we need to store a copy of the original stack pointer in order to be able to restore it later. If we choose to use another register for this purpose we should not pick eax/rax since it can be o

Re: [FFmpeg-devel] [libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-26 Thread Henrik Gramner
On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje wrote: > I know I'm terribly nitpicking here for the limited scope of the comment, > but this only matters for functions that have a return value. Do you think > it makes sense to allow functions to opt out of this requirement if they > explicitly

Re: [FFmpeg-devel] [libav-devel] [PATCH] x86inc: Avoid using eax/rax for storing the stack pointer

2016-12-26 Thread Henrik Gramner
On Mon, Dec 26, 2016 at 2:52 PM, Ronald S. Bultje wrote: > Hm, OK, I think it affects unix64/x86-32 also when using 32-byte > alignment. We do use the stack pointer then. On 32-bit and UNIX64 it simply uses a different caller-saved register which doesn't require additional instructions. > I thi

Re: [FFmpeg-devel] [PATCH] vp9: add 32x32 idct AVX2 implementation.

2016-07-16 Thread Henrik Gramner
On Wed, Jul 13, 2016 at 6:37 PM, Ronald S. Bultje wrote: > +cglobal vp9_idct_idct_32x32_add, 4, 9, 16, 2048, dst, stride, block, eob [...] > +movd xm0, [blockq] > +mova m1, [pw_11585x2] > +pmulhrsw m0, m1 > +pmulhrsw m0, m1 > +

Re: [FFmpeg-devel] lurking bugs in the mmx-related assembler code (?)

2016-10-01 Thread Henrik Gramner
On Sat, Oct 1, 2016 at 5:37 PM, wrote: > musl libc which uses floating point in its malloc() implementation. That's honestly the real "WTF?" here. On Sat, Oct 1, 2016 at 5:56 PM, wrote: > On Sat, Oct 01, 2016 at 05:44:13PM +0200, wm4 wrote: >> AFAIK most MMX code in FFmpeg does not run emms (

Re: [FFmpeg-devel] lurking bugs in the mmx-related assembler code (?)

2016-10-01 Thread Henrik Gramner
Ensuring that emms is issued before every single libc function call is likely problematic. What if we simply document the requirement that C standard library functions are assumed to not modify the x87 FPU state unless specifically designated to handle floating-point numbers?

Re: [FFmpeg-devel] [PATCH] fate/source: Attempt to fix BSD sed

2016-02-12 Thread Henrik Gramner
On Fri, Feb 12, 2016 at 7:27 AM, Timothy Gu wrote: > -e 's/[^A-Za-z0-9]\{1\,\}/_/g' \ I don't think the comma is supposed to be escaped.

Re: [FFmpeg-devel] [PATCH] x86/vf_blend: Add SSE2 optimization for divide

2016-02-14 Thread Henrik Gramner
You could try doing 8 or 16 bytes per iteration instead of 4, it might be faster depending on how good your cpu is at OOE.

Re: [FFmpeg-devel] 2.9/3.0, 2.8.5, ...

2016-02-14 Thread Henrik Gramner
On Fri, Jan 1, 2016 at 3:19 PM, Michael Niedermayer wrote: > Hi all > > Its a while since 2.8 so unless there are objections i will make a > 2.9 or if people prefer a 3.0 within the next month or so The Ubuntu 16.04 LTS feature freeze is coming up next week, so it'd be nice to have a release befo

Re: [FFmpeg-devel] [PATCH 0/3] showcqt x86 optimization using intrinsic

2016-03-10 Thread Henrik Gramner
On Thu, Mar 10, 2016 at 12:01 PM, Ismail Donmez wrote: > On Thu, Mar 10, 2016 at 12:04 PM, wm4 wrote: >> We generally don't accept intrinsic in ffmpeg. > > Given this policy has roots from gcc 2.x times, it might be a good > idea to discuss it again in the context of gcc5 and clang 3.8 and > late

Re: [FFmpeg-devel] [PATCH 1/3] configure: Force mingw's ld to keep the reloc section

2016-03-19 Thread Henrik Gramner
On Sat, Mar 19, 2016 at 7:25 PM, Hendrik Leppkes wrote: > Then tell that to binutils to actually produce proper binaries, and > not this broken mess that it produces now. https://sourceware.org/bugzilla/show_bug.cgi?id=19011 Doesn't seem like anything has happened since that bug report though.

[FFmpeg-devel] [PATCH 1/4] x86inc: Fix AVX emulation of scalar float instructions

2016-04-18 Thread Henrik Gramner
Those instructions are not commutative since they only change the first element in the vector and leave the rest unmodified. --- libavutil/x86/x86inc.asm | 28 ++-- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86i
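
The point about commutativity can be illustrated with a scalar C model of one such instruction. This is only a sketch of the documented addss semantics; the vec4 type and the addss function are stand-ins and not code from x86inc.

```c
/* Four-lane vector standing in for an xmm register. */
typedef struct { float lane[4]; } vec4;

/* Model of legacy SSE "addss dst, src": only lane 0 is added,
 * lanes 1..3 keep whatever the destination held before. */
vec4 addss(vec4 dst, vec4 src)
{
    dst.lane[0] += src.lane[0];
    return dst;
}
```

Swapping the operands gives the same lane 0 but different upper lanes, so an emulation layer cannot freely reorder the sources the way it can for a full-width instruction such as addps.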

[FFmpeg-devel] [PATCH 2/4] x86inc: Fix AVX emulation of some instructions

2016-04-18 Thread Henrik Gramner
From: Anton Mitrofanov --- libavutil/x86/x86inc.asm | 44 1 file changed, 24 insertions(+), 20 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 22608ea..a53477b 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/

[FFmpeg-devel] [PATCH 3/4] x86inc: Improve handling of %ifid with multi-token parameters

2016-04-18 Thread Henrik Gramner
From: Anton Mitrofanov The yasm/nasm preprocessor only checks the first token, which means that parameters such as `dword [rax]` are treated as identifiers, which is generally not what we want. --- libavutil/x86/x86inc.asm | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/l

[FFmpeg-devel] [PATCH 0/4] x86inc: Sync changes from x264

2016-04-18 Thread Henrik Gramner
Anton Mitrofanov (3): x86inc: Fix AVX emulation of some instructions x86inc: Improve handling of %ifid with multi-token parameters x86inc: Enable AVX emulation in additional cases Henrik Gramner (1): x86inc: Fix AVX emulation of scalar float instructions libavutil/x86/x86inc.asm | 95

[FFmpeg-devel] [PATCH 4/4] x86inc: Enable AVX emulation in additional cases

2016-04-18 Thread Henrik Gramner
From: Anton Mitrofanov Allows emulation to work when dst is equal to src2 as long as the instruction is commutative, e.g. `addps m0, m1, m0`. --- libavutil/x86/x86inc.asm | 21 + 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavut

Re: [FFmpeg-devel] [PATCH 0/4] x86inc: Sync changes from x264

2016-04-20 Thread Henrik Gramner
On Tue, Apr 19, 2016 at 4:20 AM, Michael Niedermayer wrote: > should be ok Thanks, pushed.

Re: [FFmpeg-devel] [PATCH] vp9: add 16x16 idct avx2 (8-bit).

2016-07-11 Thread Henrik Gramner
On Mon, Jul 11, 2016 at 11:48 PM, Carl Eugen Hoyos wrote: > Ronald S. Bultje gmail.com> writes: > >> +%if ARCH_X86_64 > > Just curious: Why does this not work on x86-32? > Isn't there some asm magic that moves some > parameters to the stack if necessary? > > Carl Eugen Uses more than 8 vector re

Re: [FFmpeg-devel] [PATCH 3/4] x86util: import MOVHL macro

2017-02-14 Thread Henrik Gramner
On Mon, Feb 13, 2017 at 1:44 PM, James Darnley wrote: > Originally committed to x264 in 1637239a by Henrik Gramner who has > agreed to re-license it as LGPL. Original commit message follows. > > x86: Avoid some bypass delays and false dependencies > > A bypass delay o

Re: [FFmpeg-devel] [PATCH v4] mdct15: add assembly optimizations for the 15-point FFT

2017-06-23 Thread Henrik Gramner
On Fri, Jun 23, 2017 at 10:18 PM, Michael Niedermayer wrote: > seems to fail to build here: > > libavcodec/x86/mdct15.asm:116: error: invalid combination of opcode and > operands > libavcodec/x86/mdct15.asm:117: error: invalid combination of opcode and > operands > libavcodec/x86/mdct15.asm:118:

Re: [FFmpeg-devel] [PATCH 09/11] avcodec/x86: allow future 8-bit simple idct to have "DC only hack"

2017-06-24 Thread Henrik Gramner
On Mon, Jun 19, 2017 at 5:11 PM, James Darnley wrote: > +por m1, m8, m13 > +por m1, m12 > +por m1, [blockq+ 16] ; { row[1] }[0-7] > +por m1, [blockq+ 48] ; { row[3] }[0-7] > +por m1, [blockq+ 80] ; { row[5] }[0-7] > +por m1, [blockq

Re: [FFmpeg-devel] [WIP][PATCH]v2 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-06-25 Thread Henrik Gramner
On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev wrote: > +%define HADDPS_IS_FAST 0 > +%define PHADDD_IS_FAST 0 [...] > +haddps %1, %1 > +haddps %1, %1 [...] > + phaddd xmm%1,xmm%1 > + phaddd xmm%1,xmm%1 You can safely assume that those instru

Re: [FFmpeg-devel] [PATCH 1/2] checkasm: add sbrdsp tests

2017-06-29 Thread Henrik Gramner
On Fri, Jun 30, 2017 at 1:58 AM, Michael Niedermayer wrote: > Program received signal SIGSEGV, Segmentation fault. > 0x00684919 in ff_sbr_hf_gen_sse () >0x00684909 : sub%r9,%r8 > => 0x00684919 : movaps (%rsi,%r8,1),%xmm0 > r9 0xdeadbeef0080

Re: [FFmpeg-devel] [PATCH] avfilter: add LIBVMAF filter

2017-07-16 Thread Henrik Gramner
`./configure && make` results in "libavfilter/vf_libvmaf.c:29:21: fatal error: libvmaf.h: No such file or directory". I don't have libvmaf installed, but it configures it as enabled and detects it as installed anyway.

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-07-31 Thread Henrik Gramner
On Wed, Jul 26, 2017 at 4:56 PM, Ivan Kalvachev wrote: > +++ b/libavcodec/x86/opus_pvq_search.asm Generic minor stuff: Use rN instead of rNq for numbered registers (q suffix is used for named args only due to preprocessor limitations). Use the same "standard" vertical alignment rules as most ex

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-02 Thread Henrik Gramner
On Tue, Aug 1, 2017 at 11:46 PM, Ivan Kalvachev wrote: > On 7/31/17, Henrik Gramner wrote: >> Use rN instead of rNq for numbered registers (q suffix is used for >> named args only due to preprocessor limitations). > > Is this documented? Not sure, but there's probably

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-04 Thread Henrik Gramner
On Thu, Aug 3, 2017 at 11:36 PM, Ivan Kalvachev wrote: >> 1234_1234_1234_123 >> VBROADCASTSS ym1, xm1 >> BLENDVPS m1, m2, m3 >> >> is the most commonly used alignment. > > I see that a lot of .asm files use different alignments. > I'll try to pick something similar that I

Re: [FFmpeg-devel] [PATCH] Add macros used in opus_pvq_search to x86util.asm

2017-08-06 Thread Henrik Gramner
On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev wrote: > +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm > +%if cpuflag(avx2) > +vbroadcastss %1, %2; ymm, xmm > +%elif cpuflag(avx) > +%ifnum sizeof%2 ; avx1 register > +vpermilps xmm%1, xmm%2, q

Re: [FFmpeg-devel] [PATCH]v6 Opus Pyramid Vector Quantization Search in x86 SIMD asm

2017-08-06 Thread Henrik Gramner
On Sat, Aug 5, 2017 at 12:58 AM, Ivan Kalvachev wrote: > 8 packed, 8 scalar. > > Unless I miss something (and as I've said before, > I'm not confident enough to mess with that code.) > > (AVX does extend to 32 variants, but they are not > SSE compatible, so no need to emulate them.) Oh, right. I

Re: [FFmpeg-devel] [PATCH] avcodec/x86/hpeldsp: fix half pel interpolation

2018-04-27 Thread Henrik Gramner
On Fri, Apr 27, 2018 at 4:47 PM, Jerome Borsboom wrote: > In the put_no_rnd_pixels functions, the psubusb instruction subtracts one > from each > unsigned byte to correct for the rouding that the PAVGB instruction performs. > The psubusb > instruction, however, uses saturation when the value doe

Re: [FFmpeg-devel] [PATCH] avfilter/vf_overlay: add x86 SIMD for yuv444 format when main stream has no alpha

2018-04-30 Thread Henrik Gramner
On Mon, Apr 30, 2018 at 6:17 PM, Paul B Mahol wrote: > +.loop0: > +movu m1, [dq + xq] > +movu m2, [aq + xq] > +movu m3, [sq + xq] > + > +pshufb m1, [pb_b2dw] > +pshufb m2, [pb_b2dw] > +pshufb m3, [pb_b2dw] > +

Re: [FFmpeg-devel] [PATCH] avfilter/vf_overlay: add x86 SIMD

2018-05-01 Thread Henrik Gramner
On Tue, May 1, 2018 at 10:02 AM, Paul B Mahol wrote: > +cglobal overlay_row_22, 6, 8, 8, 0, d, da, s, a, w, al, r, x [...] > +movu m2, [aq+2*xq] > +pand m2, m3 > +movu m6, [aq+2*xq] > +pand m6, m7 > +psrlw m6, 8 > +p

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-12 Thread Henrik Gramner
On Thu, Jan 11, 2018 at 9:45 PM, Martin Vignali wrote: > +if (check_func(c.sub_left_predict, "sub_left_predict")) { > +call_ref(dst0, src0, stride, width, height); > +call_new(dst1, src0, stride, width, height); > +if (memcmp(dst0, dst1, width)) > +fail(); >

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-13 Thread Henrik Gramner
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali wrote: > i try to change int width -> ptrdiff_t width to remove movsxdifnidn > but i have a segfault if height > 1 I'm guessing due to > +declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *dst, const uint8_t > *src, > + ptr

Re: [FFmpeg-devel] [PATCH 3/3] avfilter/vf_framerate: add SIMD functions for frame blending

2018-01-14 Thread Henrik Gramner
On Sat, Jan 13, 2018 at 10:57 PM, Marton Balint wrote: > +.loop: > +movu m0, [src1q + xq] > +movu m1, [src2q + xq] > +punpckl%1%2 m5, m0, m2 ; 0e0f0g0h > +punpckh%1%2 m0, m2 ; 0a0b0c0d > +punpckl%1%2

Re: [FFmpeg-devel] avcodec/utvideoenc : add SIMD (SSSE3) for sub_left_pred

2018-01-14 Thread Henrik Gramner
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali wrote: > +#define randomize_buffers(buf, size) \ > +do { \ > +int j; \ > +uint8_t *tmp_buf = (uint8_t *)buf;\ > +for (j = 0; j < size; j++) \ > +

Re: [FFmpeg-devel] avfilter/x86/vf_blend : add avx2 for 8b func (v2)

2018-01-17 Thread Henrik Gramner
On Tue, Jan 16, 2018 at 11:33 PM, Martin Vignali wrote: > BLEND_INIT grainextract, 4 You could also try doing twice as much per iteration which might be more efficient, especially in avx2 since it avoids cross-lane shuffles. Applies to some other ones as well. E.g. something like: pxor

[FFmpeg-devel] [PATCH 0/5] x86inc: Sync changes from x264

2018-01-18 Thread Henrik Gramner
Henrik Gramner (5): x86inc: Enable AVX emulation for floating-point pseudo-instructions x86inc: Use .rdata instead of .rodata on Windows x86inc: Support creating global symbols from local labels x86inc: Correctly set mmreg variables x86inc: Drop cpuflags_slowctz libavutil/x86

[FFmpeg-devel] [PATCH 3/5] x86inc: Support creating global symbols from local labels

2018-01-18 Thread Henrik Gramner
index 57cd4d80de..de048f863d 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -4,9 +4,9 @@ ;* Copyright (C) 2005-2017 x264 project ;* ;* Authors: Loren Merritt +;* Henrik Gramner ;* Anton Mitrofanov ;* Fiona Glaser -;* Henrik

[FFmpeg-devel] [PATCH 2/5] x86inc: Use .rdata instead of .rodata on Windows

2018-01-18 Thread Henrik Gramner
The standard section for read-only data on Windows is .rdata. Nasm will flag non-standard sections as executable by default which isn't ideal. --- libavutil/x86/x86inc.asm | 4 1 file changed, 4 insertions(+) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 3b43dbc2e0..

[FFmpeg-devel] [PATCH 1/5] x86inc: Enable AVX emulation for floating-point pseudo-instructions

2018-01-18 Thread Henrik Gramner
There are 32 pseudo-instructions for each floating-point comparison instruction, but only 8 of them are actually valid in legacy-encoded mode. The remaining 24 require the use of VEX-encoded (v-prefixed) instructions and can therefore be disregarded for this purpose. --- libavutil/x86/x86inc.asm
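
As a rough illustration of the 8-versus-32 split described in the commit message, the sketch below lists the eight comparison predicates (immediate values 0 to 7) that exist in the legacy SSE encoding and builds the corresponding pseudo-instruction name. The helper legacy_cmp_name() is hypothetical and is not part of x86inc; it only shows which names the legacy encoding can express.

```c
#include <assert.h>
#include <stdio.h>

/* Predicates 0..7 of CMPPS/CMPPD/CMPSS/CMPSD. These are the only ones
 * that can be encoded without a VEX prefix. */
static const char *const legacy_predicates[8] = {
    "eq", "lt", "le", "unord", "neq", "nlt", "nle", "ord"
};

/* Hypothetical helper (not from x86inc): writes a pseudo-instruction name
 * such as "cmpltps" into buf. suffix is one of "ps", "pd", "ss", "sd".
 * Returns the name length, or -1 for predicates 8..31, which require the
 * VEX-encoded (v-prefixed) form. */
static int legacy_cmp_name(char *buf, size_t size, unsigned predicate,
                           const char *suffix)
{
    assert(buf && suffix);
    if (predicate >= 8)
        return -1;
    return snprintf(buf, size, "cmp%s%s", legacy_predicates[predicate], suffix);
}
```

Calling legacy_cmp_name(buf, sizeof(buf), 1, "ps") fills buf with "cmpltps", while any predicate of 8 or above is rejected because no legacy encoding exists for it.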

[FFmpeg-devel] [PATCH 5/5] x86inc: Drop cpuflags_slowctz

2018-01-18 Thread Henrik Gramner
--- libavutil/x86/x86inc.asm | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm index 438863042f..5044ee86f0 100644 --- a/libavutil/x86/x86inc.asm +++ b/libavutil/x86/x86inc.asm @@ -827,9 +827,8 @@ BRANCH_INSTR jz, je, jnz,

[FFmpeg-devel] [PATCH 4/5] x86inc: Correctly set mmreg variables

2018-01-18 Thread Henrik Gramner
;* ;* Authors: Loren Merritt ;* Henrik Gramner @@ -892,6 +892,36 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg, jge, jng, jnge, ja, jae, %undef %1%2 %endmacro +%macro DEFINE_MMREGS 1 ; mmtype +%assign %%prev_mmregs 0 +%ifdef num_mmregs +%assign

Re: [FFmpeg-devel] [PATCH 0/5] x86inc: Sync changes from x264

2018-01-20 Thread Henrik Gramner
Pushed.

Re: [FFmpeg-devel] [PATCH] Moves yuv2yuvX_sse3 to yasm, unrolls main loop and other small optimizations for ~20% speedup.

2020-11-17 Thread Henrik Gramner
On Mon, Nov 16, 2020 at 11:03 AM Alan Kelly wrote: > +cglobal yuv2yuvX, 6, 7, 16, filter, filterSize, dest, dstW, dither, offset, > src Only 8 xmm registers are used, so 8 should be used instead of 16 here. Otherwise it causes unnecessary spilling of registers on 64-bit Windows. > +%if ARCH_X86_
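
To put a number on the spilling mentioned above: in the 64-bit Windows calling convention xmm0 to xmm5 are volatile and xmm6 to xmm15 are callee-saved. The sketch below assumes the function prologue preserves every declared register from xmm6 upward, which is how the register count in cglobal is understood here; win64_xmm_to_save() is a hypothetical helper, not an x86inc macro.

```c
#include <assert.h>

/* Win64 ABI: xmm0-xmm5 are volatile, xmm6-xmm15 must be preserved. */
enum { WIN64_VOLATILE_XMM = 6, XMM_REG_COUNT = 16 };

/* Hypothetical helper: number of xmm registers a prologue has to save
 * when a function declares that it uses `declared` xmm registers,
 * assuming every declared register from xmm6 upward is saved. */
static int win64_xmm_to_save(int declared)
{
    assert(declared >= 0 && declared <= XMM_REG_COUNT);
    return declared > WIN64_VOLATILE_XMM ? declared - WIN64_VOLATILE_XMM : 0;
}
```

Under that assumption, declaring 16 registers costs ten saves and restores per call, while declaring the 8 that are actually used costs two.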

Re: [FFmpeg-devel] Discrepancy between comments for AVX512 flags

2022-08-27 Thread Henrik Gramner
> On Sat, Aug 27, 2022 at 12:04 AM James Darnley wrote: > I think the feature selection is fine as-is, if you want to clarify > the comments go ahead. AVX512 wouldn't be useful with a subset even > smaller than what the plain AVX512 is looking for (there are also no > CPUs with any smaller set, afa

Re: [FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly

2022-09-02 Thread Henrik Gramner
On Fri, Sep 2, 2022 at 7:55 AM Lynne wrote: > +movd xmm4, strided > +neg t2d > +movd xmm5, t2d > +SPLATD xmm4 > +SPLATD xmm5 > +vperm2f128 m4, m4, m4, 0x00 ; +stride splatted > +vperm2f128 m5, m5, m5, 0x00 ; -stride splatted movd xm4, strided pxor m5, m5 vpbr

Re: [FFmpeg-devel] [PATCH v3] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

2022-09-06 Thread Henrik Gramner
On Tue, Aug 23, 2022 at 10:43 AM wrote: > +.loop1: > +pxor m4, m4 > +pxor m5, m5 > + > +;Gx > +SOBEL_MUL_16 0, data_n1, 4 > +SOBEL_MUL_16 1, data_n2, 4 > +SOBEL_MUL_16 2, data_n1, 4 > +SOBEL_ADD_16 6, 4 > +SOBEL_MUL_16 7, data_p2, 4 > +SOBEL_ADD_16 8, 4 > + > [.

Re: [FFmpeg-devel] [PATCH v2] x86/tx_float: Fix building for platforms with a symbol prefix

2022-09-06 Thread Henrik Gramner
LGTM.

Re: [FFmpeg-devel] [PATCH v4] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI

2022-09-19 Thread Henrik Gramner
On Wed, Sep 7, 2022 at 8:47 AM wrote: > +.loop1: > +pxor m4, m4 > +pxor m5, m5 Those zero-initializations are redundant. Aside from that the asm LGTM.

Re: [FFmpeg-devel] [PATCH 2/4] lavc/pthread_frame: set worker thread names

2022-10-18 Thread Henrik Gramner
On Tue, Oct 18, 2022 at 6:54 PM Anton Khirnov wrote: > +static void thread_set_name(PerThreadContext *p) > +{ > +AVCodecContext *avctx = p->avctx; > +int idx = p - p->parent->threads; > +char name[16]; > + > +snprintf(name, sizeof(name), "d:%.7s:ft%d", avctx->codec->name, idx); > +
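
The reply is cut off in this listing, so the following is only a sketch of how the quoted format string behaves with a 16-byte buffer, not a statement of what the review said. It reuses the "d:%.7s:ft%d" format from the patch; format_thread_name() and its return convention are hypothetical. The fixed parts and the 7-character codec name take up to 12 characters, leaving room for a thread index of at most three digits before snprintf() truncates.

```c
#include <assert.h>
#include <stdio.h>

/* Same buffer size as the quoted patch (15 characters plus the NUL). */
enum { THREAD_NAME_SIZE = 16 };

/* Hypothetical helper: formats the name like the quoted patch does and
 * returns 1 if the result had to be truncated, 0 otherwise. */
static int format_thread_name(char name[THREAD_NAME_SIZE],
                              const char *codec_name, int idx)
{
    int len;

    assert(name && codec_name);
    /* "%.7s" keeps at most the first 7 characters of the codec name. */
    len = snprintf(name, THREAD_NAME_SIZE, "d:%.7s:ft%d", codec_name, idx);
    return len < 0 || len >= THREAD_NAME_SIZE;
}
```

With codec name "mpeg2video" and index 12 the result is "d:mpeg2vi:ft12", which fits; with index 1000 the formatted string needs 16 characters, so the last digit is dropped and the helper reports truncation.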

Re: [FFmpeg-devel] [PATCH] RFC: v210enc optimisations and initial AVX-512

2022-10-21 Thread Henrik Gramner
On Fri, Oct 21, 2022 at 5:41 AM Kieran Kunhya wrote: > > Hi, > > Please see attached an attempt to optimise the 8-bit input to v210enc to > reduce the number of shuffles. > This comes at the cost of having to extract the middle element and perform > a DWORD shift on it and then reinserting it. > I
