On Thu, Sep 13, 2018 at 3:08 PM, James Almer wrote:
> +lea lenq, [lend*8 - mmsize*4]
Is len guaranteed to be a multiple of mmsize/8? Otherwise this would
cause misalignment. It will also break if len < mmsize/2.
Also if you want a 32-bit result from lea it should be written as "lea
lend, [lenq*8 - mmsize*4]" which is equivalent but has a shorter opcode.
On Fri, Sep 14, 2018 at 4:51 PM, Henrik Gramner wrote:
> I can't really think of any scenario where using a 32-bit register
> address operand with a 64-bit destination for LEA is not a mistake.
To clarify on this, using a 32-bit memory operand means the calculated
effective address
On Fri, Sep 14, 2018 at 3:26 PM, James Almer wrote:
> On 9/14/2018 9:57 AM, Henrik Gramner wrote:
>> Also if you want a 32-bit result from lea it should be written as "lea
>> lend, [lenq*8 - mmsize*4]" which is equivalent but has a shorter
>> opcode (e.g. always u
On Mon, Oct 8, 2018 at 7:46 PM Martin Vignali wrote:
>
> Hello,
>
> Patch in attach port inline asm shuffle 2103 func (mmx/mmxext) to external
> asm
> and remove the inline asm version
>
> Martin
Keeping both MMX and MMXEXT seems a bit excessive. Ideally both would
be replaced with something more
Fixed in x264-sandbox. All uses of plain strtok() will be removed from
x264 in the next push.
I thought all of the strtok() uses in x264 had already been converted
to strtok_r(), but apparently that wasn't the case. Sorry about that.
On Tue, Oct 23, 2018 at 3:22 PM Derek Buitenhuis
wrote:
> I'd like to point out that this patch or some variant may be required anyway.
>
> libx264 only uses strtok_r or strtok_s if available on the platform.
>
> See:
> https://git.videolan.org/?p=x264.git;a=blob;f=common/osdep.h;h=715ef8a00c01ad
All 5 lgtm.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
On Wed, Sep 4, 2019 at 10:01 PM James Almer wrote:
> On 9/4/2019 4:28 PM, Paul B Mahol wrote:
> > +vpmulld m3, m1, m0
> > +vpaddd m1, m3, m2
>
> pmulld m1, m0
> paddd m1, m2
Could use pmaddwd instead as well, it's faster than pmulld on pretty
much every CPU.
>
On Wed, Sep 4, 2019 at 9:29 PM Paul B Mahol wrote:
> +movd xm6, [pd_255]
> +vpbroadcastd m6, xm6
vpbroadcastd m6, [pd_255]
On Wed, Dec 4, 2019 at 4:03 AM Ting Fu wrote:
> +VBROADCASTSD y_offset, [pointer_c_ditherq + 8 * 8]
> +VBROADCASTSD u_offset, [pointer_c_ditherq + 9 * 8]
> +VBROADCASTSD v_offset, [pointer_c_ditherq + 10 * 8]
> +VBROADCASTSD ug_coff, [pointer_c_ditherq + 7 * 8]
> +VBROADCAS
On Fri, Jan 3, 2020 at 7:37 PM Moritz Barsnick wrote:
> On Fri, Jan 03, 2020 at 11:05:25 +0100, Timo Rothenpieler wrote:
> > I think this was discussed on this list in the past.
> > Not sure what the conclusion was, but I think an unconditional flag like
> > this being added wasn't all that well r
On Mon, Jun 14, 2021 at 9:22 AM Matthias Neugebauer wrote:
> Anything I can do to not land in spam? On another Google groups
> mailing list I (and many others including the admin accounts) had
> the same issue a couple of times.
This is caused by sending emails from a domain with a DMARC reject o
On Mon, Jul 5, 2021 at 4:32 AM Fei Wang wrote:
> +int64_t v, w;
> +int32_t *param = &s->cur_frame.gm_params[idx][0];
...
> +v = param[4] * (1 << AV1_WARPEDMODEL_PREC_BITS);
> +w = param[3] * param[4];
Possible integer overflow? Might need some int64_t casting before the
mu
On Mon, Jan 21, 2019 at 9:54 PM James Almer wrote:
> There's also no good way to deprecate a define and replace it with
> another while informing the library user, so for something purely
> cosmetic like this i don't think it's worth the trouble.
Would it be possible to create a deprecated inline
On Wed, Feb 20, 2019 at 8:03 PM Fāng-ruì Sòng
wrote:
> --- a/libavutil/mem.h
> +++ b/libavutil/mem.h
>
> +#if defined(__GNUC__) && !(defined(_WIN32) || defined(__CYGWIN__))
> +#define DECLARE_HIDDEN __attribute__ ((visibility ("hidden")))
> +#else
> +#define DECLARE_HIDDEN
> +#endif
libav
x86util, not x86inc. Otherwise LGTM.
On Wed, Apr 5, 2017 at 3:53 AM, James Darnley wrote:
> call h264_idct_add8_mmx_plane
> -RET
> +RET ; TODO: check rep ret after a function call
call followed by RET should be replaced by the TAIL_CALL macro instead
which outputs a jmp instruction if there's no function epilogu
On Fri, Apr 14, 2017 at 1:19 PM, James Darnley wrote:
> Do you want me to change this patch to add that?
Either the same patch or a different one, pick whichever is most
convenient for you.
What about just using strip -x on the assembly files to discard the
local symbols?
On Fri, Jun 9, 2017 at 1:05 AM, James Darnley wrote:
>Where should I put the aesni define?
Between sse42 and avx.
On Fri, Jun 9, 2017 at 1:04 AM, James Darnley wrote:
> libavutil/x86/x86inc.asm | 6 +-
Bump the date in the header to 2017 as well. That was done in x264 as
part of an earlier commit but might as well squash it into this one.
On Fri, Jun 23, 2017 at 12:44 AM, Rostislav Pehlivanov
wrote:
> +%macro FFT5 3 ; %1 - in_offset, %2 - dst1 (64bit used), %3 - dst2
> +movddup xm0, [inq + 0*16 + 0 + %1] ; in[ 0].re, in[ 0].im, in[ 0].re, in[ 0].im
> +movsd xm1, [inq + 1*16 + 8 + %1] ; in[ 3].re, in[ 3].im, 0
On Sun, Nov 12, 2017 at 9:59 PM, Rostislav Pehlivanov
wrote:
> No longer needed as AUTO_REP_RET deals with it on normal RETs.
Only when the RET follows a branch instruction. If it's a branch
target (that isn't by itself preceded by a branch instruction) there
is no way of automatically detecting
On Sun, Nov 26, 2017 at 11:51 PM, James Darnley wrote:
> -pd_0_int_min: times 2 dd 0, -2147483648
> -pq_int_min: times 2 dq -2147483648
> -pq_int_max: times 2 dq 2147483647
> +pd_0_int_min: times 4 dd 0, -2147483648
> +pq_int_min: times 4 dq -2147483648
> +pq_int_max: times 4 dq 21
On Sat, Nov 25, 2017 at 9:53 PM, Martin Vignali
wrote:
> Hello,
>
> In attach patch to convert pb_bswap32 to ymm constant
> and remove the vbroadcasti128 part
>
> Speed seems to be similar to me
This just wastes cache for no reason. A tiny amount, sure, but minor
things tend to add up eventually
>> Using 128-bit broadcasts is preferable over duplicating the constants
>> to 256-bit unless there's a good reason for doing so since it wastes
>> less cache and is faster on AMD CPUs.
>
> What would that reason be? Afaik broadcasts are expensive, since they
> both load from memory then splat dat
On Mon, Nov 27, 2017 at 11:37 PM, James Almer wrote:
> On 11/27/2017 7:33 PM, James Darnley wrote:
>> If the condition was made "mmsize > 16" would this work correctly for
>> zmm registers? (Assume I finally push my AVX-512 patches).
>
> No, there's no EVEX variant of vbroadcasti128. For that you
On Fri, Dec 1, 2017 at 9:03 PM, Martin Vignali wrote:
> If no one have objections, i will push these patch tomorrow.
>
> Martin
Follow James' suggestion to use >16 instead of ==32, otherwise OK.
On Wed, Dec 13, 2017 at 6:07 AM, Martin Vignali
wrote:
> +vpermq m1, [srcq + xq - mmsize + %3], 0x4e ; flip each lane at load
> +vpermq m2, [srcq + xq - 2 * mmsize + %3], 0x4e ; flip each lane at load
Would doing 2x 128-bit movu + 2x vinserti128 be faster?
On Sat, Dec 9, 2017 at 1:11 PM, Martin Vignali wrote:
> the idea in AVX2 is to load 128bits of data (2x 64 bits)
> then shuffle accross lane, the two 64 bits in the low part of each lane, to
> keep the rest of the process similar
> to the sse version
What about using pmovzxbw instead of movu + vp
On Thu, Dec 14, 2017 at 11:16 AM, Martin Vignali
wrote:
> 2017-12-13 17:37 GMT+01:00 Henrik Gramner :
>> You could also do vextracti128 + 128-bit packuswb instead of 256-bit
>> packuswb + vpermq.
>>
> Sorry don't understand this part
> do you mean 128 bit pack
On Sun, Sep 10, 2017 at 5:17 PM, Martin Vignali
wrote:
> +void (*reorder_pixels)(uint8_t *src, uint8_t *dst, int size);
size should be ptrdiff_t instead of int since it's used as a 64-bit
operand in the asm on x86-64 and the upper 32 bits are undefined
otherwise.
> +++ b/libavcodec/x86/exrds
On Sat, Sep 30, 2017 at 12:58 AM, Michael Niedermayer
wrote:
> -and i, -2 * regsize
> +and i, -(2 * regsize)
regsize is defined to mmsize / 2 in the relevant case so the
expression resolves to -2 * 16 / 2
In nasm integers are 64-bit and / is unsigned division
On Fri, Sep 22, 2017 at 11:12 PM, Martin Vignali
wrote:
> +static void predictor_scalar(uint8_t *src, ptrdiff_t size)
> +{
> +uint8_t *t= src + 1;
> +uint8_t *stop = src + size;
> +
> +while (t < stop) {
> +int d = (int) t[-1] + (int) t[0] - 128;
> +t[0] = d;
> +
On Sun, Oct 1, 2017 at 4:14 PM, James Almer wrote:
> We normally use int for counters, and don't mix declaration and statements.
> And in any case ptrdiff_t would be "more correct" for this.
Ah right. C90, ugh. Too used to C99.
Yeah, feel free to use whatever datatype that's most appropriate for
On Thu, Oct 5, 2017 at 8:31 AM, Carl Eugen Hoyos wrote:
> Hi!
>
> Attached patch fixes ticket #6717.
>
> Please comment, Carl Eugen
Signed numbers are converted to unsigned when compared to unsigned
numbers which means -1 becomes UINT_MAX so this patch shouldn't
actually change anything.
#6717 i
It's a bit overzealous to complain about misalignment with AV_LOG_WARNING,
especially since memory bandwidth is much more likely to be the bottleneck
compared to data alignment which the user may not even have control over.
---
libswscale/swscale.c | 18 +++---
1 file changed, 3 insert
On Sun, Oct 22, 2017 at 11:47 AM, Henrik Gramner wrote:
> It's a bit overzealous to complain about misalignment with AV_LOG_WARNING,
> especially since memory bandwidth is much more likely to be the bottleneck
> compared to data alignment which the user may not even have contr
On Mon, Oct 30, 2017 at 2:08 PM, James Darnley wrote:
> +INIT_YMM avx512
ymm?
On Sat, Oct 8, 2016 at 5:20 PM, Rostislav Pehlivanov
wrote:
> +cglobal aac_quantize_bands, 8, 8, 7, out, in, scaled, size, Q34, is_signed,
> maxval, rounding
[...]
> +movd m4, is_signedd
movd is SSE2. Can be worked around by moving it through the stack though.
[...]
> +/* Can't pass
On Sun, Oct 9, 2016 at 2:15 PM, Rostislav Pehlivanov
wrote:
> +cglobal aac_quantize_bands, 6, 6, 6, out, in, scaled, size, is_signed,
> maxval, Q34, rounding
Now that this function is SSE2 you should explicitly use
floating-point instructions to avoid bypass delays from transitioning
between int
On Sun, Oct 9, 2016 at 5:04 PM, Michael Niedermayer
wrote:
> this segfaults on x86-32
I'm guessing due to unaligned local arrays in search_for_ms():
float M[128], S[128];
On Mon, Oct 24, 2016 at 8:09 PM, wm4 wrote:
> B) add a compile time option that runs emms() unconditionally at the end
>of each mmx asm block
>
> Since musl intentionally evades detection, neither can be enabled
> automatically, probably. It would be interesting to see what the speed
> impact
On Mon, Oct 24, 2016 at 9:34 PM, wm4 wrote:
> a ASM function must, according to the calling convention, reset the
> MMX state when returning.
>
> What FFmpeg does here was misdesigned from the very start.
The decision to issue emms manually instead of after every MMX
function was a deliberate dec
On Mon, Oct 24, 2016 at 9:59 PM, Ronald S. Bultje wrote:
> Good idea to reference Hendrik Gramner here, who keeps insisting we get rid
> of all MMX code in ffmpeg (at least as an option) for future Intel CPUs in
> which MMX will be deprecated.
Replacing MMX with SSE2 is indeed the most "proper" f
On Tue, Oct 25, 2016 at 2:28 PM, wrote:
> It would be nice to look at a benchmarking comparison, to be able to
> see the actual practical performance gain of the decision not to follow
> the ABI.
Just a quick comparison from adding EMMS to a random MMX function
(from x264, because I happened to
On Tue, Nov 15, 2016 at 1:39 PM, Franklin Phillips
wrote:
> Sorry, I forgot to mention that my first attempt was using git send-email,
> when that didn't work, I tried mutt, same result using both clients.
Might be spam-filter related. All your e-mails end up in my spam
folder due to failing DMA
On Wed, Dec 7, 2016 at 2:07 PM, James Darnley wrote:
> Because a few instructions using 3 operand form should be quicker. The
> fact that it doesn't show is no doubt down to the out of order execution
> managing to do the moves earlier than written.
Register-register moves are handled in the reg
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.
If we chose to use another register for this purpose we should not pick
eax/rax since it can be o
On Mon, Dec 26, 2016 at 2:32 AM, Ronald S. Bultje wrote:
> I know I'm terribly nitpicking here for the limited scope of the comment,
> but this only matters for functions that have a return value. Do you think
> it makes sense to allow functions to opt out of this requirement if they
> explicitly
On Mon, Dec 26, 2016 at 2:52 PM, Ronald S. Bultje wrote:
> Hm, OK, I think it affects unix64/x86-32 also when using 32-byte
> alignment. We do use the stack pointer then.
On 32-bit and UNIX64 it simply uses a different caller-saved register
which doesn't require additional instructions.
> I thi
On Wed, Jul 13, 2016 at 6:37 PM, Ronald S. Bultje wrote:
> +cglobal vp9_idct_idct_32x32_add, 4, 9, 16, 2048, dst, stride, block, eob
[...]
> +movd xm0, [blockq]
> +mova m1, [pw_11585x2]
> +pmulhrsw m0, m1
> +pmulhrsw m0, m1
> +
On Sat, Oct 1, 2016 at 5:37 PM, wrote:
> musl libc which uses floating point in its malloc() implementation.
That's honestly the real "WTF?" here.
On Sat, Oct 1, 2016 at 5:56 PM, wrote:
> On Sat, Oct 01, 2016 at 05:44:13PM +0200, wm4 wrote:
>> AFAIK most MMX code in FFmpeg does not run emms (
Ensuring that emms is issued before every single libc function call is
likely problematic.
What if we simply document the requirement that C standard library
functions are assumed to not modify the x87 FPU state unless
specifically designated to handle floating-point numbers?
On Fri, Feb 12, 2016 at 7:27 AM, Timothy Gu wrote:
> -e 's/[^A-Za-z0-9]\{1\,\}/_/g' \
I don't think the comma is supposed to be escaped.
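For reference, with the stray backslash removed the BRE interval reads \{1,\}, and the expression collapses each run of non-alphanumerics to a single underscore (the sample input is made up):

```shell
# POSIX BRE: the comma inside an interval must not be escaped.
echo 'foo--bar!!baz' | sed -e 's/[^A-Za-z0-9]\{1,\}/_/g'
```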
You could try doing 8 or 16 bytes per iteration instead of 4, it might
be faster depending on how good your cpu is at OOE.
On Fri, Jan 1, 2016 at 3:19 PM, Michael Niedermayer
wrote:
> Hi all
>
> Its a while since 2.8 so unless there are objections i will make a
> 2.9 or if people prefer a 3.0 within the next month or so
The Ubuntu 16.04 LTS feature freeze is coming up next week, so it'd be
nice to have a release befo
On Thu, Mar 10, 2016 at 12:01 PM, Ismail Donmez wrote:
> On Thu, Mar 10, 2016 at 12:04 PM, wm4 wrote:
>> We generally don't accept intrinsic in ffmpeg.
>
> Given this policy has roots from gcc 2.x times, it might be a good
> idea to discuss it again in the context of gcc5 and clang 3.8 and
> late
On Sat, Mar 19, 2016 at 7:25 PM, Hendrik Leppkes wrote:
> Then tell that to binutils to actually produce proper binaries, and
> not this broken mess that it produces now.
https://sourceware.org/bugzilla/show_bug.cgi?id=19011
Doesn't seem like anything has happened since that bug report though.
_
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.
---
libavutil/x86/x86inc.asm | 28 ++--
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86i
From: Anton Mitrofanov
---
libavutil/x86/x86inc.asm | 44
1 file changed, 24 insertions(+), 20 deletions(-)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 22608ea..a53477b 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/
From: Anton Mitrofanov
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
---
libavutil/x86/x86inc.asm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/l
Anton Mitrofanov (3):
x86inc: Fix AVX emulation of some instructions
x86inc: Improve handling of %ifid with multi-token parameters
x86inc: Enable AVX emulation in additional cases
Henrik Gramner (1):
x86inc: Fix AVX emulation of scalar float instructions
libavutil/x86/x86inc.asm | 95
From: Anton Mitrofanov
Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.
---
libavutil/x86/x86inc.asm | 21 +
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/libavutil/x86/x86inc.asm b/libavut
On Tue, Apr 19, 2016 at 4:20 AM, Michael Niedermayer
wrote:
> should be ok
Thanks, pushed.
On Mon, Jul 11, 2016 at 11:48 PM, Carl Eugen Hoyos wrote:
> Ronald S. Bultje gmail.com> writes:
>
>> +%if ARCH_X86_64
>
> Just curious: Why does this not work on x86-32?
> Isn't there some asm magic that moves some
> parameters to the stack if necessary?
>
> Carl Eugen
Uses more than 8 vector re
On Mon, Feb 13, 2017 at 1:44 PM, James Darnley wrote:
> Originally committed to x264 in 1637239a by Henrik Gramner who has
> agreed to re-license it as LGPL. Original commit message follows.
>
> x86: Avoid some bypass delays and false dependencies
>
> A bypass delay o
On Fri, Jun 23, 2017 at 10:18 PM, Michael Niedermayer
wrote:
> seems to fail to build here:
>
> libavcodec/x86/mdct15.asm:116: error: invalid combination of opcode and
> operands
> libavcodec/x86/mdct15.asm:117: error: invalid combination of opcode and
> operands
> libavcodec/x86/mdct15.asm:118:
On Mon, Jun 19, 2017 at 5:11 PM, James Darnley wrote:
> +por m1, m8, m13
> +por m1, m12
> +por m1, [blockq+ 16] ; { row[1] }[0-7]
> +por m1, [blockq+ 48] ; { row[3] }[0-7]
> +por m1, [blockq+ 80] ; { row[5] }[0-7]
> +por m1, [blockq
On Sat, Jun 24, 2017 at 10:39 PM, Ivan Kalvachev wrote:
> +%define HADDPS_IS_FAST 0
> +%define PHADDD_IS_FAST 0
[...]
> +haddps %1, %1
> +haddps %1, %1
[...]
> + phaddd xmm%1,xmm%1
> + phaddd xmm%1,xmm%1
You can safely assume that those instru
On Fri, Jun 30, 2017 at 1:58 AM, Michael Niedermayer
wrote:
> Program received signal SIGSEGV, Segmentation fault.
> 0x00684919 in ff_sbr_hf_gen_sse ()
>0x00684909 : sub%r9,%r8
> => 0x00684919 : movaps (%rsi,%r8,1),%xmm0
> r9 0xdeadbeef0080
`./configure && make` results in "libavfilter/vf_libvmaf.c:29:21:
fatal error: libvmaf.h: No such file or directory".
I don't have libvmaf installed, but it configures it as enabled and
detects it as installed anyway.
On Wed, Jul 26, 2017 at 4:56 PM, Ivan Kalvachev wrote:
> +++ b/libavcodec/x86/opus_pvq_search.asm
Generic minor stuff:
Use rN instead of rNq for numbered registers (q suffix is used for
named args only due to preprocessor limitations).
Use the same "standard" vertical alignment rules as most ex
On Tue, Aug 1, 2017 at 11:46 PM, Ivan Kalvachev wrote:
> On 7/31/17, Henrik Gramner wrote:
>> Use rN instead of rNq for numbered registers (q suffix is used for
>> named args only due to preprocessor limitations).
>
> Is this documented?
Not sure, but there's probably
On Thu, Aug 3, 2017 at 11:36 PM, Ivan Kalvachev wrote:
>> 1234_1234_1234_123
>> VBROADCASTSS ym1, xm1
>> BLENDVPS m1, m2, m3
>>
>> is the most commonly used alignment.
>
> I see that a lot of .asm files use different alignments.
> I'll try to pick something similar that I
On Sat, Aug 5, 2017 at 9:10 PM, Ivan Kalvachev wrote:
> +%macro VBROADCASTSS 2 ; dst xmm/ymm, src m32/xmm
> +%if cpuflag(avx2)
> +vbroadcastss %1, %2; ymm, xmm
> +%elif cpuflag(avx)
> +%ifnum sizeof%2 ; avx1 register
> +vpermilps xmm%1, xmm%2, q
On Sat, Aug 5, 2017 at 12:58 AM, Ivan Kalvachev wrote:
> 8 packed, 8 scalar.
>
> Unless I miss something (and as I've said before,
> I'm not confident enough to mess with that code.)
>
> (AVX does extend to 32 variants, but they are not
> SSE compatible, so no need to emulate them.)
Oh, right. I
On Fri, Apr 27, 2018 at 4:47 PM, Jerome Borsboom
wrote:
> In the put_no_rnd_pixels functions, the psubusb instruction subtracts one
> from each
> unsigned byte to correct for the rouding that the PAVGB instruction performs.
> The psubusb
> instruction, however, uses saturation when the value doe
On Mon, Apr 30, 2018 at 6:17 PM, Paul B Mahol wrote:
> +.loop0:
> +movu m1, [dq + xq]
> +movu m2, [aq + xq]
> +movu m3, [sq + xq]
> +
> +pshufb m1, [pb_b2dw]
> +pshufb m2, [pb_b2dw]
> +pshufb m3, [pb_b2dw]
> +
On Tue, May 1, 2018 at 10:02 AM, Paul B Mahol wrote:
> +cglobal overlay_row_22, 6, 8, 8, 0, d, da, s, a, w, al, r, x
[...]
> +movu m2, [aq+2*xq]
> +pand m2, m3
> +movu m6, [aq+2*xq]
> +pand m6, m7
> +psrlw m6, 8
> +p
On Thu, Jan 11, 2018 at 9:45 PM, Martin Vignali
wrote:
> +if (check_func(c.sub_left_predict, "sub_left_predict")) {
> +call_ref(dst0, src0, stride, width, height);
> +call_new(dst1, src0, stride, width, height);
> +if (memcmp(dst0, dst1, width))
> +fail();
>
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali
wrote:
> i try to change int width -> ptrdiff_t width to remove movsxdifnidn
> but i have a segfault if height > 1
I'm guessing due to
> +declare_func_emms(AV_CPU_FLAG_MMX, void, uint8_t *dst, const uint8_t
> *src,
> + ptr
On Sat, Jan 13, 2018 at 10:57 PM, Marton Balint wrote:
> +.loop:
> +movu m0, [src1q + xq]
> +movu m1, [src2q + xq]
> +punpckl%1%2 m5, m0, m2 ; 0e0f0g0h
> +punpckh%1%2 m0, m2 ; 0a0b0c0d
> +punpckl%1%2
On Sat, Jan 13, 2018 at 5:22 PM, Martin Vignali
wrote:
> +#define randomize_buffers(buf, size) \
> +do { \
> +int j; \
> +uint8_t *tmp_buf = (uint8_t *)buf;\
> +for (j = 0; j < size; j++) \
> +
On Tue, Jan 16, 2018 at 11:33 PM, Martin Vignali
wrote:
> BLEND_INIT grainextract, 4
You could also try doing twice as much per iteration which might be
more efficient, especially in avx2 since it avoids cross-lane
shuffles. Applies to some other ones as well.
E.g. something like:
pxor
Henrik Gramner (5):
x86inc: Enable AVX emulation for floating-point pseudo-instructions
x86inc: Use .rdata instead of .rodata on Windows
x86inc: Support creating global symbols from local labels
x86inc: Correctly set mmreg variables
x86inc: Drop cpuflags_slowctz
libavutil/x86
index 57cd4d80de..de048f863d 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -4,9 +4,9 @@
;* Copyright (C) 2005-2017 x264 project
;*
;* Authors: Loren Merritt
+;* Henrik Gramner
;* Anton Mitrofanov
;* Fiona Glaser
-;* Henrik
The standard section for read-only data on Windows is .rdata. Nasm will
flag non-standard sections as executable by default which isn't ideal.
---
libavutil/x86/x86inc.asm | 4
1 file changed, 4 insertions(+)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 3b43dbc2e0..
There are 32 pseudo-instructions for each floating-point comparison
instruction, but only 8 of them are actually valid in legacy-encoded mode.
The remaining 24 requires the use of VEX-encoded (v-prefixed) instructions
and can therefore be disregarded for this purpose.
---
libavutil/x86/x86inc.asm
---
libavutil/x86/x86inc.asm | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/libavutil/x86/x86inc.asm b/libavutil/x86/x86inc.asm
index 438863042f..5044ee86f0 100644
--- a/libavutil/x86/x86inc.asm
+++ b/libavutil/x86/x86inc.asm
@@ -827,9 +827,8 @@ BRANCH_INSTR jz, je, jnz,
;*
;* Authors: Loren Merritt
;* Henrik Gramner
@@ -892,6 +892,36 @@ BRANCH_INSTR jz, je, jnz, jne, jl, jle, jnl, jnle, jg,
jge, jng, jnge, ja, jae,
%undef %1%2
%endmacro
+%macro DEFINE_MMREGS 1 ; mmtype
+%assign %%prev_mmregs 0
+%ifdef num_mmregs
+%assign
Pushed.
On Mon, Nov 16, 2020 at 11:03 AM Alan Kelly
wrote:
> +cglobal yuv2yuvX, 6, 7, 16, filter, filterSize, dest, dstW, dither, offset,
> src
Only 8 xmm registers are used, so 8 should be used instead of 16 here.
Otherwise it causes unnecessary spilling of registers on 64-bit
Windows.
> +%if ARCH_X86_
> On Sat, Aug 27, 2022 at 12:04 AM James Darnley wrote:
> I think the feature selection is fine as-is, if you want to clarify
> the comments go ahead. AVX512 wouldn't be useful with a subset even
> smaller then what the plain AVX512 is looking for (there is also no
> CPUs with any smaller set, afa
On Fri, Sep 2, 2022 at 7:55 AM Lynne wrote:
> +movd xmm4, strided
> +neg t2d
> +movd xmm5, t2d
> +SPLATD xmm4
> +SPLATD xmm5
> +vperm2f128 m4, m4, m4, 0x00 ; +stride splatted
> +vperm2f128 m5, m5, m5, 0x00 ; -stride splatted
movd xm4, strided
pxor m5, m5
vpbr
On Tue, Aug 23, 2022 at 10:43 AM wrote:
> +.loop1:
> +pxor m4, m4
> +pxor m5, m5
> +
> +;Gx
> +SOBEL_MUL_16 0, data_n1, 4
> +SOBEL_MUL_16 1, data_n2, 4
> +SOBEL_MUL_16 2, data_n1, 4
> +SOBEL_ADD_16 6, 4
> +SOBEL_MUL_16 7, data_p2, 4
> +SOBEL_ADD_16 8, 4
> +
> [.
LGTM.
On Wed, Sep 7, 2022 at 8:47 AM wrote:
> +.loop1:
> +pxor m4, m4
> +pxor m5, m5
Those zero-initializations are redundant. Aside from that the asm LGTM.
On Tue, Oct 18, 2022 at 6:54 PM Anton Khirnov wrote:
> +static void thread_set_name(PerThreadContext *p)
> +{
> +AVCodecContext *avctx = p->avctx;
> +int idx = p - p->parent->threads;
> +char name[16];
> +
> +snprintf(name, sizeof(name), "d:%.7s:ft%d", avctx->codec->name, idx);
> +
On Fri, Oct 21, 2022 at 5:41 AM Kieran Kunhya wrote:
>
> Hi,
>
> Please see attached an attempt to optimise the 8-bit input to v210enc to
> reduce the number of shuffles.
> This comes at the cost of having to extract the middle element and perform
> a DWORD shift on it and then reinserting it.
> I