[FFmpeg-devel] [PATCH] Refactor Developer Docs, update dev list section (v2)

2017-11-26 Thread Jim DeLaHunt
Previously, the Developer Documentation
 contained a single chapter,
1. Developer Guide, with all content under that single
chapter. Thus the document structure was one level deeper
and more complicated than it needed to be.  It differed
from similar documents such as /faq.html, which have
multiple chapters.

Also, the Developer Documentation had instructions to
subscribe to the ffmpeg-cvslog email list. But that is
no longer accurate. For the purposes in this section --
review of patches, discussion of development issues --
ffmpeg-devel is the appropriate email list. Some developers
may want to monitor ffmpeg-cvslog, but it is not mandatory
for all contributors.

1. In doc/developer.texi, eliminate the single chapter,
and promote each section underneath to chapter, and
each subsection to section. Thus content and relative
structure remain the same, but the overall structure is
simpler.  Anchors within the page remain the same.

2. In doc/developer.texi, add a new section about
ffmpeg-devel, based on existing text from ffmpeg-cvslog
section regarding discussion of patches and of
development issues.

3. In doc/developer.texi, rewrite the ffmpeg-cvslog section
to match the current usage of ffmpeg-cvslog. Some developers
choose to follow this list, but it is not mandatory.

See ffmpeg-devel thread about the first version of this patch, at
.
I believe all comments to date in that thread are addressed
with this patch.

I believe there were no links to the eliminated "Developer
Documentation" chapter, based on a search of the source
code.

There are a lot of improvements possible to the
Developer Documentation page, beyond this refactoring.
However, making those improvements is a much bigger
and more difficult task.  This change is "low hanging
fruit".

Signed-off-by: Jim DeLaHunt 
---
 doc/developer.texi | 74 +++---
 1 file changed, 43 insertions(+), 31 deletions(-)

diff --git a/doc/developer.texi b/doc/developer.texi
index a7b4f1d737..bdcce015d3 100644
--- a/doc/developer.texi
+++ b/doc/developer.texi
@@ -10,9 +10,7 @@
 
 @contents
 
-@chapter Developers Guide
-
-@section Notes for external developers
+@chapter Notes for external developers
 
 This document is mostly useful for internal FFmpeg developers.
 External developers who need to use the API in their application should
@@ -30,7 +28,7 @@ For more detailed legal information about the use of FFmpeg in
 external programs read the @file{LICENSE} file in the source tree and
 consult @url{https://ffmpeg.org/legal.html}.
 
-@section Contributing
+@chapter Contributing
 
 There are 3 ways by which code gets into FFmpeg.
 @itemize @bullet
@@ -47,9 +45,9 @@ The developer making the commit and the author are responsible for their changes
 and should try to fix issues their commit causes.
 
 @anchor{Coding Rules}
-@section Coding Rules
+@chapter Coding Rules
 
-@subsection Code formatting conventions
+@section Code formatting conventions
 
 There are the following guidelines regarding the indentation in files:
 
@@ -74,7 +72,7 @@ The presentation is one inspired by 'indent -i4 -kr -nut'.
 The main priority in FFmpeg is simplicity and small code size in order to
 minimize the bug count.
 
-@subsection Comments
+@section Comments
 Use the JavaDoc/Doxygen  format (see examples below) so that code documentation
 can be generated automatically. All nontrivial functions should have a comment
 above them explaining what the function does, even if it is just one sentence.
@@ -114,7 +112,7 @@ int myfunc(int my_parameter)
 ...
 @end example
 
-@subsection C language features
+@section C language features
 
 FFmpeg is programmed in the ISO C90 language with a few additional
 features from ISO C99, namely:
@@ -160,7 +158,7 @@ mixing statements and declarations;
 GCC statement expressions (@samp{(x = (@{ int y = 4; y; @})}).
 @end itemize
 
-@subsection Naming conventions
+@section Naming conventions
 All names should be composed with underscores (_), not CamelCase. For example,
 @samp{avfilter_get_video_buffer} is an acceptable function name and
 @samp{AVFilterGetVideo} is not. The exception from this are type names, like
@@ -204,7 +202,7 @@ letter as they are reserved by the C standard. Names starting with @code{_}
 are reserved at the file level and may not be used for externally visible
 symbols. If in doubt, just avoid names starting with @code{_} altogether.
 
-@subsection Miscellaneous conventions
+@section Miscellaneous conventions
 
 @itemize @bullet
 @item
@@ -216,7 +214,7 @@ Casts should be used only when necessary. Unneeded parentheses
 should also be avoided if they don't make the code easier to understand.
 @end itemize
 
-@subsection Editor configuration
+@section Editor configuration
 In order to configure Vim to follow FFmpeg formatting conventions, paste
 the following snippet into your @file{.vimrc}:
 @example
@@ -249,9 +247,9 @@ For Emacs,

Re: [FFmpeg-devel] How to preserve @subheading directives, and styling, when compiling doc/developer.texi ?

2017-11-26 Thread Jim DeLaHunt

On 2017-11-21 01:20, Jim DeLaHunt wrote:


Hello, doc maintainers:

Could I get some help with the Texinfo compilation which produces 
/developer.html, please?


Page  is a document which has some 
section headings displayed in green text, and some subheadings 
displayed in grey text. Section headings appear in the Table of 
Contents, but subheadings do not. For instance, at 
, the 
section begins:





   _1.4.3 Documentation/Other_


   _Subscribe to the ffmpeg-cvslog mailing list._


It is important to do this as the diffs of all commits are sent there 
and reviewed by all the other developers. Bugs and possible 
improvements or general questions regarding commits are discussed 
there. We expect you to react if problems with your code are uncovered.




I believe this page corresponds to file /doc/developer.html in the 
FFmpeg build tree, which is compiled from source in 
/doc/developer.texi via "make doc".


When I do "make doc" and generate /doc/developer.html from the current 
*master* branch (commit ba98f84), I get a different-looking HTML file. 
The section title is in light grey text, and the subheading is missing 
entirely.  At 
, 
the section begins:





   _1.4.3 Documentation/Other_

It is important to do this as the diffs of all commits are sent there 
and reviewed by all the other developers. Bugs and possible 
improvements or general questions regarding commits are discussed 
there. We expect you to react if problems with your code are uncovered.



The section headings come from @section and @subsection directives. 
The subheadings come from @subheading directives. In 
doc/developer.texi, the source of this section reads,



@subsection Documentation/Other
@subheading Subscribe to the ffmpeg-cvslog mailing list.
It is important to do this as the diffs of all commits are sent there and
reviewed by all the other developers. Bugs and possible improvements or
general questions regarding commits are discussed there. We expect you to
react if problems with your code are uncovered.


So, it appears that when compiling on my local machine, the 
@subheading directives are discarded, and the styling differs from 
what is on ffmpeg.org.


How can I compile /doc/developer.html from the FFmpeg sources so that:

1. the subheadings are included, and

2. the styling matches what is at ffmpeg.org/developer.html

Thanks in advance for your help.  Best regards,
   —Jim DeLaHunt


Hearing no suggestions, I've filed a bug report:

*HTML generation differences (@subheading missing, formatting) in local 
build compared to official build* (#6872) 



So, hopefully the issue will be tracked and investigated in due course.

Best regards,
  —Jim DeLaHunt


--
--Jim DeLaHunt, j...@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
  multilingual websites consultant

  355-1027 Davie St, Vancouver BC V6E 4L2, Canada
 Canada mobile +1-604-376-8953

___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH] hls demuxer: add option to defer parsing of variants

2017-11-26 Thread Rainer Hochecker
fixed mem leak pointed out by Steven
 
---
 doc/demuxers.texi |   5 +
 libavformat/hls.c | 304 --
 2 files changed, 209 insertions(+), 100 deletions(-)

diff --git a/doc/demuxers.texi b/doc/demuxers.texi
index 73dc0feec1..634b122e10 100644
--- a/doc/demuxers.texi
+++ b/doc/demuxers.texi
@@ -316,6 +316,11 @@ segment index to start live streams at (negative values are from the end).
 @item max_reload
 Maximum number of times a insufficient list is attempted to be reloaded.
 Default value is 1000.
+
+@item load_all_variants
+If 0, only the first variant/playlist is loaded on open. All other variants
+get disabled and can be enabled by setting discard option in program.
+Default value is 1.
 @end table
 
 @section image2
diff --git a/libavformat/hls.c b/libavformat/hls.c
index 786934af03..c42e0b0f95 100644
--- a/libavformat/hls.c
+++ b/libavformat/hls.c
@@ -112,6 +112,7 @@ struct playlist {
 int n_segments;
 struct segment **segments;
 int needed, cur_needed;
+int parsed;
 int cur_seq_no;
 int64_t cur_seg_offset;
 int64_t last_load_time;
@@ -206,6 +207,7 @@ typedef struct HLSContext {
 int strict_std_compliance;
 char *allowed_extensions;
 int max_reload;
+int load_all_variants;
 } HLSContext;
 
 static int read_chomp_line(AVIOContext *s, char *buf, int maxlen)
@@ -314,6 +316,9 @@ static struct playlist *new_playlist(HLSContext *c, const char *url,
 pls->is_id3_timestamped = -1;
 pls->id3_mpegts_timestamp = AV_NOPTS_VALUE;
 
+pls->index = c->n_playlists;
+pls->parsed = 0;
+pls->needed = 0;
 dynarray_add(&c->playlists, &c->n_playlists, pls);
 return pls;
 }
@@ -721,6 +726,7 @@ static int parse_playlist(HLSContext *c, const char *url,
 free_segment_list(pls);
 pls->finished = 0;
 pls->type = PLS_TYPE_UNSPECIFIED;
+pls->parsed = 1;
 }
 while (!avio_feof(in)) {
 read_chomp_line(in, line, sizeof(line));
@@ -1377,23 +1383,41 @@ reload:
 static void add_renditions_to_variant(HLSContext *c, struct variant *var,
   enum AVMediaType type, const char *group_id)
 {
-int i;
+int i, j;
+int found;
 
 for (i = 0; i < c->n_renditions; i++) {
 struct rendition *rend = c->renditions[i];
 
 if (rend->type == type && !strcmp(rend->group_id, group_id)) {
 
-if (rend->playlist)
+if (rend->playlist) {
 /* rendition is an external playlist
  * => add the playlist to the variant */
-dynarray_add(&var->playlists, &var->n_playlists, rend->playlist);
-else
+found = 0;
+for (j = 0; j < var->n_playlists; j++) {
+if (var->playlists[j] == rend->playlist) {
+found = 1;
+break;
+}
+}
+if (!found)
+dynarray_add(&var->playlists, &var->n_playlists, rend->playlist);
+} else {
 /* rendition is part of the variant main Media Playlist
  * => add the rendition to the main Media Playlist */
-dynarray_add(&var->playlists[0]->renditions,
- &var->playlists[0]->n_renditions,
- rend);
+found = 0;
+for (j = 0; j < var->playlists[0]->n_renditions; j++) {
+if (var->playlists[0]->renditions[j] == rend) {
+found = 1;
+break;
+}
+}
+if (!found)
+dynarray_add(&var->playlists[0]->renditions,
+ &var->playlists[0]->n_renditions,
+ rend);
+}
 }
 }
 }
@@ -1631,6 +1655,124 @@ static int hls_close(AVFormatContext *s)
 return 0;
 }
 
+static int init_playlist(HLSContext *c, struct playlist *pls)
+{
+AVInputFormat *in_fmt = NULL;
+int highest_cur_seq_no = 0;
+int ret;
+int i;
+
+if (!(pls->ctx = avformat_alloc_context())) {
+return AVERROR(ENOMEM);
+}
+
+if (pls->n_segments == 0)
+return 0;
+
+pls->needed = 1;
+pls->parent = c->ctx;
+
+/*
+ * If this is a live stream and this playlist looks like it is one segment
+ * behind, try to sync it up so that every substream starts at the same
+ * time position (so e.g. avformat_find_stream_info() will see packets from
+ * all active streams within the first few seconds). This is not very generic,
+ * though, as the sequence numbers are technically independent.
+ */
+highest_cur_seq_no = 0;
+for (i = 0; i < c->n_playlists; i++) {
+struct playlist *pls = c->playlists[i];
+if (!pls->parsed)
+continue;
+if (pls->cur_seq_no > highest_cur_seq_no)
+   

Re: [FFmpeg-devel] [PATCH] hls demuxer: add option to defer parsing of variants

2017-11-26 Thread Steven Liu
2017-11-26 18:46 GMT+08:00 Rainer Hochecker :
> fixed mem leak pointed out by Steven
Hi Rainer,

I'm not sure that it is a memleak, but it looks like one when reading
the code. That code was already in hls.c before this patch, and nobody
has reported it as a memleak.
If it is a memleak, using goto may be a better way, because
the workflow below has checks for failed resource allocation. I will point
it out based on your patch.
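
The goto approach suggested here can be sketched roughly as follows. This is
only an illustration of the general pattern, not code from hls.c: struct
demo_playlist, demo_playlist_init(), demo_playlist_free() and the fail_at
parameter (which merely simulates a failed allocation) are hypothetical names
made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a playlist-like object; not FFmpeg's struct. */
struct demo_playlist {
    char *url;
    int  *segments;
};

/*
 * Returns 0 on success or a negative errno-style code on failure.
 * fail_at is a test hook invented for this sketch: 1 simulates a failure
 * of the first allocation, 2 of the second, 0 lets both proceed.
 * Every error path jumps to one label, so nothing allocated earlier leaks.
 */
static int demo_playlist_init(struct demo_playlist *pls, size_t n_segments,
                              int fail_at)
{
    int ret = 0;

    pls->url      = NULL;
    pls->segments = NULL;

    pls->url = fail_at == 1 ? NULL : malloc(64);
    if (!pls->url) {
        ret = -ENOMEM;
        goto fail;
    }

    pls->segments = fail_at == 2 ? NULL
                                 : calloc(n_segments, sizeof(*pls->segments));
    if (!pls->segments) {
        ret = -ENOMEM;
        goto fail;
    }

    return 0;

fail:
    /* free(NULL) is a no-op, so a single cleanup block covers all cases. */
    free(pls->segments);
    free(pls->url);
    pls->segments = NULL;
    pls->url      = NULL;
    return ret;
}

/* Releases everything owned by the object and resets its pointers. */
static void demo_playlist_free(struct demo_playlist *pls)
{
    free(pls->segments);
    free(pls->url);
    pls->segments = NULL;
    pls->url      = NULL;
}
```

The point of the single label is that a later allocation failure cannot forget
to release an earlier allocation, which seems to be the kind of leak being
discussed.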
>
> ---
>  doc/demuxers.texi |   5 +
>  libavformat/hls.c | 304 
> --
>  2 files changed, 209 insertions(+), 100 deletions(-)
>
> diff --git a/doc/demuxers.texi b/doc/demuxers.texi
> index 73dc0feec1..634b122e10 100644
> --- a/doc/demuxers.texi
> +++ b/doc/demuxers.texi
> @@ -316,6 +316,11 @@ segment index to start live streams at (negative values 
> are from the end).
>  @item max_reload
>  Maximum number of times a insufficient list is attempted to be reloaded.
>  Default value is 1000.
> +
> +@item load_all_variants
> +If 0, only the first variant/playlist is loaded on open. All other variants
> +get disabled and can be enabled by setting discard option in program.
> +Default value is 1.
>  @end table
>
>  @section image2
> diff --git a/libavformat/hls.c b/libavformat/hls.c
> index 786934af03..c42e0b0f95 100644
> --- a/libavformat/hls.c
> +++ b/libavformat/hls.c
> @@ -112,6 +112,7 @@ struct playlist {
>  int n_segments;
>  struct segment **segments;
>  int needed, cur_needed;
> +int parsed;
>  int cur_seq_no;
>  int64_t cur_seg_offset;
>  int64_t last_load_time;
> @@ -206,6 +207,7 @@ typedef struct HLSContext {
>  int strict_std_compliance;
>  char *allowed_extensions;
>  int max_reload;
> +int load_all_variants;
>  } HLSContext;
>
>  static int read_chomp_line(AVIOContext *s, char *buf, int maxlen)
> @@ -314,6 +316,9 @@ static struct playlist *new_playlist(HLSContext *c, const 
> char *url,
>  pls->is_id3_timestamped = -1;
>  pls->id3_mpegts_timestamp = AV_NOPTS_VALUE;
>
> +pls->index = c->n_playlists;
> +pls->parsed = 0;
> +pls->needed = 0;
>  dynarray_add(&c->playlists, &c->n_playlists, pls);
>  return pls;
>  }
> @@ -721,6 +726,7 @@ static int parse_playlist(HLSContext *c, const char *url,
>  free_segment_list(pls);
>  pls->finished = 0;
>  pls->type = PLS_TYPE_UNSPECIFIED;
> +pls->parsed = 1;
>  }
>  while (!avio_feof(in)) {
>  read_chomp_line(in, line, sizeof(line));
> @@ -1377,23 +1383,41 @@ reload:
>  static void add_renditions_to_variant(HLSContext *c, struct variant *var,
>enum AVMediaType type, const char 
> *group_id)
>  {
> -int i;
> +int i, j;
> +int found;
>
>  for (i = 0; i < c->n_renditions; i++) {
>  struct rendition *rend = c->renditions[i];
>
>  if (rend->type == type && !strcmp(rend->group_id, group_id)) {
>
> -if (rend->playlist)
> +if (rend->playlist) {
>  /* rendition is an external playlist
>   * => add the playlist to the variant */
> -dynarray_add(&var->playlists, &var->n_playlists, 
> rend->playlist);
> -else
> +found = 0;
> +for (j = 0; j < var->n_playlists; j++) {
> +if (var->playlists[j] == rend->playlist) {
> +found = 1;
> +break;
> +}
> +}
> +if (!found)
> +dynarray_add(&var->playlists, &var->n_playlists, 
> rend->playlist);
> +} else {
>  /* rendition is part of the variant main Media Playlist
>   * => add the rendition to the main Media Playlist */
> -dynarray_add(&var->playlists[0]->renditions,
> - &var->playlists[0]->n_renditions,
> - rend);
> +found = 0;
> +for (j = 0; j < var->playlists[0]->n_renditions; j++) {
> +if (var->playlists[0]->renditions[j] == rend) {
> +found = 1;
> +break;
> +}
> +}
> +if (!found)
> +dynarray_add(&var->playlists[0]->renditions,
> + &var->playlists[0]->n_renditions,
> + rend);
> +}
>  }
>  }
>  }
> @@ -1631,6 +1655,124 @@ static int hls_close(AVFormatContext *s)
>  return 0;
>  }
>
> +static int init_playlist(HLSContext *c, struct playlist *pls)
> +{
> +AVInputFormat *in_fmt = NULL;
> +int highest_cur_seq_no = 0;
> +int ret;
> +int i;
> +
> +if (!(pls->ctx = avformat_alloc_context())) {
> +return AVERROR(ENOMEM);
> +}
> +
> +if (pls->n_segments == 0)
> +return 0;
> +
> +pls->needed = 1;
> +pls->parent = c->ctx;

Re: [FFmpeg-devel] [PATCH] Refactor Developer Docs, update dev list section (v2)

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 9:31 GMT+01:00 Jim DeLaHunt :

> -@subsection Documentation/Other
> +@section Documentation/Other
> +@subheading Subscribe to the ffmpeg-devel mailing list.
> +It is important to be subscribed to the

Of course it is important but I would much, much prefer
if people send their patches without being subscribed
than not sending their patches because it is implied
that they cannot send patches if they don't want to
subscribe.

> +@uref{https://lists.ffmpeg.org/mailman/listinfo/ffmpeg-devel, ffmpeg-devel}
> +mailing list, because any patch you contribute must be sent there

No:
I believe it is very important that trivial patches are not sent
to the development mailing list - its volume is already so big
that some patches are sadly (!) forgotten.

> +and reviewed by the other developers. They may have comments about your
> +contribution. We expect you see those comments, and to improve your 
> contribution
> +if requested.

Yes.

But if people are not interested in improving their contribution,
I would still prefer the patches to be sent.

> +Also, this list is where bugs and possible improvements or

I believe this is misleading or even wrong.

> +general questions regarding commits are discussed. That may be helpful
> +information as you write your contribution. Finally, by being a list
> +subscriber your contribution will be posted immediately to the list,
> +without the moderation hold which messages from non-subscribers experience.
> +
>  @subheading Subscribe to the ffmpeg-cvslog mailing list.
> -It is important to do this as the diffs of all commits are sent there and
> -reviewed by all the other developers. Bugs and possible improvements or
> -general questions regarding commits are discussed there. We expect you to
> -react if problems with your code are uncovered.
> +Diffs of all commits are sent to the
> +@uref{https://lists.ffmpeg.org/mailman/listinfo/ffmpeg-cvslog, ffmpeg-cvslog}
> +mailing list. Some developers read this list to review all code base changes
> +from all sources. Subscribing to this list is not mandatory, if
> +all you want to do is submit a patch here and there.

I am (still) against this change.

Sorry, Carl Eugen


Re: [FFmpeg-devel] [PATCH] Refactor Developer Docs, update dev list section (v2)

2017-11-26 Thread Paul B Mahol
On 11/26/17, Carl Eugen Hoyos  wrote:
> 2017-11-26 9:31 GMT+01:00 Jim DeLaHunt :
>
>> -@subsection Documentation/Other
>> +@section Documentation/Other
>> +@subheading Subscribe to the ffmpeg-devel mailing list.
>> +It is important to be subscribed to the
>
> Of course it is important but I would much, much prefer
> if people send their patches without being subscribed
> than not sending their patches because it is implied
> that they cannot send patches if they don't want to
> subscribe.
>
>> +@uref{https://lists.ffmpeg.org/mailman/listinfo/ffmpeg-devel,
>> ffmpeg-devel}
>> +mailing list, because any patch you contribute must be sent there
>
> No:
> I believe it is very important that trivial patches are not sent
> to the development mailing list - its volume is already so big
> that some patches are sadly (!) forgotten.
>
>> +and reviewed by the other developers. They may have comments about your
>> +contribution. We expect you see those comments, and to improve your
>> contribution
>> +if requested.
>
> Yes.
>
> But if people are not interested in improving their contribution,
> I would still prefer the patches to be sent.
>
>> +Also, this list is where bugs and possible improvements or
>
> I believe this is misleading or even wrong.
>
>> +general questions regarding commits are discussed. That may be helpful
>> +information as you write your contribution. Finally, by being a list
>> +subscriber your contribution will be posted immediately to the list,
>> +without the moderation hold which messages from non-subscribers
>> experience.
>> +
>>  @subheading Subscribe to the ffmpeg-cvslog mailing list.
>> -It is important to do this as the diffs of all commits are sent there and
>> -reviewed by all the other developers. Bugs and possible improvements or
>> -general questions regarding commits are discussed there. We expect you to
>> -react if problems with your code are uncovered.
>> +Diffs of all commits are sent to the
>> +@uref{https://lists.ffmpeg.org/mailman/listinfo/ffmpeg-cvslog,
>> ffmpeg-cvslog}
>> +mailing list. Some developers read this list to review all code base
>> changes
>> +from all sources. Subscribing to this list is not mandatory, if
>> +all you want to do is submit a patch here and there.
>
> I am (still) against this change.
>
> Sorry, Carl Eugen

Your opinions are irrelevant.


Re: [FFmpeg-devel] [PATCH] Refactor Developer Docs, update dev list section (v2)

2017-11-26 Thread Nicolas George
Paul B Mahol (2017-11-26):
> Your opinions are irrelevant.

# Be friendly and respectful towards others and third parties.
# Treat others the way you yourself want to be treated.

Please stop trampling the code of conduct.

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH] Refactor Developer Docs, update dev list section (v2)

2017-11-26 Thread Paul B Mahol
On 11/26/17, Nicolas George  wrote:
> Paul B Mahol (2017-11-26):
>> Your opinions are irrelevant.
>
> # Be friendly and respectful towards others and third parties.
> # Treat others the way you yourself want to be treated.
>
> Please stop trampling the code of conduct.

Please stop being extremely rude and ignorant of other people's work.


Re: [FFmpeg-devel] [PATCH 0/1][TOOL][HACK] Allocation NULL check fuzzer

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 2:05 GMT+01:00 Derek Buitenhuis :
> On 11/26/2017 12:14 AM, Carl Eugen Hoyos wrote:
>> I am of course in favour of such checks but is there an allocator we support
>> that actually returns NULL on oom?
>
> Anything that doesn't use overcommit. Windows is the big obvious one here. 
> Also
> various UNIX-like things, and even Linux is not guaranteed to return non-NULL,
> depending on how the kernel is set up (e.g. on some RHELs I think, or on
> plenty of embedded setups.)
>
> Some libcs will fail if the requested size is outside of the allowed range.

Thank you for the explanation!

Carl Eugen
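
Since allocation can return NULL as explained above, callers need a check,
and that check is only exercised when a failure actually occurs. The sketch
below shows one simple way to force such a failure in a test. It assumes
nothing about the tool posted in this thread: demo_malloc(),
demo_fail_countdown and demo_strdup() are hypothetical names invented for
the example.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical failure-injection knob:
 *   < 0  never fail,
 *   == 0 fail the next call (then reset to "never fail"),
 *   > 0  let that many calls succeed before failing one.
 */
static long demo_fail_countdown = -1;

/* Wrapper around malloc() that can be told to return NULL on demand. */
static void *demo_malloc(size_t size)
{
    if (demo_fail_countdown == 0) {
        demo_fail_countdown = -1;
        return NULL;
    }
    if (demo_fail_countdown > 0)
        demo_fail_countdown--;
    return malloc(size);
}

/* Code under test: copies a string and reports allocation failure as NULL
 * instead of dereferencing a NULL pointer. */
static char *demo_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = demo_malloc(len);

    if (!copy)
        return NULL;
    memcpy(copy, s, len);
    return copy;
}
```

Setting demo_fail_countdown to 0 before calling demo_strdup() makes the
NULL branch run even on a system that overcommits memory.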


Re: [FFmpeg-devel] [PATCH 00/15] OpenCL infrastructure, filters

2017-11-26 Thread hydra3333
OK, the experimental -opencl_bench option has been removed by recent 
commits.

Thank you for the updates to OpenCL.

It cross-compiles OK, however I'm not sure about the replacement 
functionality.  So, 2 things.


1. May I enquire how one ascertains the numbers to use, like this?
  -opencl_options platform_idx=1:device_idx=0

The old -opencl_bench told me I have these devices - OpenCL using the nvidia 
750Ti is the one I wish to use.

platform_idx  device_idx  device_name                              runtime
1             0           GeForce GTX 750 Ti                       1801
0             0           Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz  11210

2. I must have misinterpreted -init_hw_device and -filter_hw_device as well 
for use in Windows 10 (please see the error below)
  -opencl_options platform_idx=1:device_idx=0 -init_hw_device 
opencl -filter_hw_device opencl

Advice would be most welcome.

"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -opencl_options 
platform_idx=1:device_idx=0 -init_hw_device opencl -filter_hw_device 
opencl -i ".\test_01.mpg" -an -map_metadata -1 -sws_flags 
lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter:v 
yadif=0:0:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,setdar=dar=16/9 
-r 25 -c:v h264_nvenc -preset slow -bf 2 -g 50 -refs 3 -rc:v 
vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -movflags 
+faststart -profile:v high -level 4.1 -pixel_format yuv420p -y 
".\test_01.newest.MP4"
ffmpeg version N-89088-gce001bb8fc Copyright (c) 2000-2017 the FFmpeg 
developers

 built with gcc 7.2.0 (GCC)
 configuration: --arch=x86_64 --target-os=mingw32 --cross-prefix=/home/u/Desktop/ffmpeg-windows-build-helpers-withOpenCL-master/sandbox/cross_compilers/mingw-w64-x86_64/bin/x86_64-w64-mingw32- 
--pkg-config=pkg-config --pkg-config-flags=--static --enable-gray --enable-version3 
--disable-debug --disable-doc --disable-htmlpages --disable-manpages --disable-podpages 
--disable-txtpages --disable-w32threads --enable-nvenc --enable-cuda --enable-cuvid 
--enable-d3d11va --enable-libsoxr --enable-fontconfig --enable-libass --enable-libbluray 
--enable-iconv --enable-libtwolame --enable-libzvbi --enable-libcaca --enable-libmodplug 
--extra-libs=-lstdc++ --extra-libs=-lpng --extra-libs=-loleaut32 --enable-libmp3lame 
--enable-version3 --enable-zlib --enable-librtmp --enable-libvorbis --enable-libtheora 
--enable-libspeex --enable-libopenjpeg --enable-gnutls --enable-libgsm --enable-libfreetype 
--enable-libopus --enable-bzlib --enable-libopencore-amrnb --enable-libopencore-amrwb 
--enable-libvo-amrwbenc --enable-libvpx --enable-libilbc --enable-libwavpack 
--enable-libwebp --enable-libgme --enable-dxva2 --enable-gray --enable-libopenh264 
--enable-libmysofa --enable-libflite --enable-lzma --enable-libsnappy --enable-libzimg 
--enable-libbs2b --enable-gmp --enable-libfribidi --enable-cross-compile --enable-pic 
--extra-libs=-lpsapi --extra-libs=-lspeexdsp --disable-schannel --extra-cflags=-DLIBTWOLAME_STATIC 
--extra-cflags=-DMODPLUG_STATIC --extra-cflags=-DCACA_STATIC --enable-gpl --enable-avisynth 
--enable-frei0r --enable-filter=frei0r --enable-librubberband --enable-libvidstab 
--enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libmfx 
--enable-avresample --enable-libcdio --extra-cflags='-mtune=generic' --extra-cflags=-O3 
--enable-static --disable-shared --prefix=/home/u/Desktop/ffmpeg-windows-build-helpers-withOpenCL-master/sandbox/cross_compilers/mingw-w64-x86_64/x86_64-w64-mingw32 
--enable-nonfree --enable-decklink --enable-libfdk-aac --enable-opencl --enable-runtime-cpudetect 
--extra-libs=-lcrypt32 --extra-libs=-lshlwapi --extra-libs=-lstdc++ --extra-libs=-lass 
--extra-libs=-lfontconfig --extra-libs=-lexpat --extra-libs=-lfribidi --extra-libs=-lfreetype 
--extra-libs=-lharfbuzz --extra-libs=-lbz2 --extra-libs=-llzma --extra-libs=-liconv 
--extra-libs=-lcdio --extra-libs=-lcdio_paranoia --extra-libs=-lz --extra-libs=-lpsapi 
--extra-libs=-lspeexdsp --extra-libs=-lgdi32 --extra-libs=-lwinmm


 Invalid device specification "opencl": unknown device type
Failed to set value 'opencl' for option 'init_hw_device': Invalid argument
Error parsing global options: Invalid argument



Re: [FFmpeg-devel] [PATCH 2/2] vorbisenc: Check the return value of av_frame_clone

2017-11-26 Thread Derek Buitenhuis
On 11/24/2017 7:27 PM, Derek Buitenhuis wrote:
> Prevents a segfault when alloc fails.
> 
> Signed-off-by: Derek Buitenhuis 
> ---
>  libavcodec/vorbisenc.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)

If there are no objections, I'll push this today after a FATE run.

- Derek


Re: [FFmpeg-devel] [PATCH 3/3] udp: Actually fail when we're missing required options, like the "warning" says.

2017-11-26 Thread Derek Buitenhuis
On 11/22/2017 3:28 PM, Derek Buitenhuis wrote:
> Signed-off-by: Derek Buitenhuis 
> ---
> There was no reasoning in the commit that added this, so maybe someone on
> the list has some insights.
> ---
>  libavformat/udp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Ping.

- Derek


[FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Carl Eugen Hoyos
Hi!

Attached patch adds a missing dependency to libvmaf, I don't know if
other threads also work.

Please comment, Carl Eugen
From 4ee0fe8778c67a0f623b352a257e68485dd1d559 Mon Sep 17 00:00:00 2001
From: Carl Eugen Hoyos 
Date: Sun, 26 Nov 2017 14:29:27 +0100
Subject: [PATCH] configure: libvmaf depends on pthreads.

---
 configure |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/configure b/configure
index 0cc97eb..46ff3e8 100755
--- a/configure
+++ b/configure
@@ -3290,7 +3290,7 @@ uspp_filter_deps="gpl avcodec"
 vaguedenoiser_filter_deps="gpl"
 vidstabdetect_filter_deps="libvidstab"
 vidstabtransform_filter_deps="libvidstab"
-libvmaf_filter_deps="libvmaf"
+libvmaf_filter_deps="libvmaf threads"
 zmq_filter_deps="libzmq"
 zoompan_filter_deps="swscale"
 zscale_filter_deps="libzimg const_nan"
-- 
1.7.10.4



Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 1:35 PM, Carl Eugen Hoyos wrote:
> Attached patch adds a missing dependency to libvmaf, I don't know if
> other threads also work.

This should also be filed as a bug against libvmaf, since its pkg-config
file isn't complete, then.

- Derek


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 14:40 GMT+01:00 Derek Buitenhuis :
> On 11/26/2017 1:35 PM, Carl Eugen Hoyos wrote:
>> Attached patch adds a missing dependency to libvmaf, I don't know if
>> other threads also work.
>
> This should also be filed as a bug against libvmaf, since its pkg-config
> file isn't complete, then.

Sorry, I don't understand how pkg-config is related to the missing
dependency of our configure script: Please explain.

Thank you, Carl Eugen


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 1:50 PM, Carl Eugen Hoyos wrote:
> Sorry, I don't understand how pkg-config is related to the missing
> dependency of our configure script: Please explain.

Because pthreads is a dependency of libvmaf.

Looking at libvmaf, it does list pthreads as a dependency:

https://github.com/Netflix/vmaf/blob/master/wrapper/libvmaf.pc

It should work as-is. Is there a way I can reproduce this?

- Derek


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 14:53 GMT+01:00 Derek Buitenhuis :
> On 11/26/2017 1:50 PM, Carl Eugen Hoyos wrote:
>> Sorry, I don't understand how pkg-config is related to the missing
>> dependency of our configure script: Please explain.
>
> Because pthreads is a dependency of libvmaf.
>
> Looking at libvmaf, it does list pthreads as a dependency:
>
> https://github.com/Netflix/vmaf/blob/master/wrapper/libvmaf.pc
>
> It should work as-is. Is there a way I can reproduce this?

The following produces an ffmpeg binary that loads pthreads
at start-time:
$ ./configure --enable-libvmaf --disable-pthreads

Carl Eugen


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Nicolas George
Carl Eugen Hoyos (2017-11-26):
> The following produces an ffmpeg binary that loads pthreads
> at start-time:
> $ ./configure --enable-libvmaf --disable-pthreads

That is not a problem. The fact that it produces a binary with pthreads
symbols in it is.

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 14:59 GMT+01:00 Nicolas George :
> Carl Eugen Hoyos (2017-11-26):
>> The following produces an ffmpeg binary that loads pthreads
>> at start-time:
>> $ ./configure --enable-libvmaf --disable-pthreads
>
> That is not a problem. The fact that it produces a binary with
> pthreads symbols in it is.

The way I understand (Linux) dynamic libraries, one implies
the other.

Carl Eugen


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Nicolas George
Carl Eugen Hoyos (2017-11-26):
> The way I understand (Linux) dynamic libraries, one implies
> the other.

Yes, but not the other way around.

If a library uses threads internally but provides an interface that can
be used without threads, then it is not our problem.

If a library does not use threads but our calls to the library require
threads, then we must consider it a dependency.

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 2:02 PM, Carl Eugen Hoyos wrote:
> The way I understand (Linux) dynamic libraries, one implies
> the other.

The problem is that libvmaf's .pc file put all of its deps in
Libs instead of splitting them out into Libs.private, which
is used only when static linking. Stuff like -lpthread is
only needed if static linking, and stuff like -lstdc++ is
just wrong on any system that doesn't use libstdc++ (and
also only used for static linking).

I assume this is what Nicolas is referring to.

- Derek


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 15:05 GMT+01:00 Nicolas George :
> Carl Eugen Hoyos (2017-11-26):
>> The way I understand (Linux) dynamic libraries, one implies
>> the other.
>
> Yes, but not the other way around.
>
> If a library uses threads internally but provides an interface that can
> be used without threads, then it is not our problem.
>
> If a library does not use threads but our calls to the library require
> threads, then we must consider it a dependency.

In this case the library and our interface both depend on pthreads
but configure ignores this.

Carl Eugen


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 2:05 PM, Nicolas George wrote:
> If a library does not use threads but our calls to the library require
> threads, then we must consider it a dependency.

Netflix made their pkg-config file incorrectly, which causes this. It should
be fixed there, but do we want to work around that in the mean time like this?
I have no real strong opinion on that, I just thought it should be reported
upstream so they could fix it.

-Derek


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 15:06 GMT+01:00 Derek Buitenhuis :
> On 11/26/2017 2:02 PM, Carl Eugen Hoyos wrote:
>> The way I understand (Linux) dynamic libraries, one implies
>> the other.
>
> The problem is that libvmaf's .pc file put all of its deps in
> Libs instead of splitting them out into Libs.private, which
> is used only when static linking.

> Stuff like -lpthread is
> only needed if static linking, and stuff like -lstdc++ is
> just wrong on any system that doesn't use libstdc++ (and
> also only used for static linking).

As said before, it is non-trivial to find such a system
(it worked fine here on osx when I tested last).

> I assume this is what Nicolas is referring to.

So is the patch ok?

Carl Eugen


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 2:06 PM, Carl Eugen Hoyos wrote:
> In this case the library and our interface both depend on pthreads
> but configure ignores this.

Sorry, I wasn't aware our own wrapper code uses pthreads too. Patch should
be OK then.

Upstream pkg-config file is still broken, though. :)

- Derek


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Nicolas George
Derek Buitenhuis (2017-11-26):
> The problem is that libvmaf's .pc file put all of its deps in
> Libs instead of splitting them out into Libs.private, which
> is used only when static linking. Stuff like -lpthread is
> only needed if static linking, and stuff like -lstdc++ is
> just wrong on any system that doesn't use libstdc++ (and
> also only used for static linking).
> 
> I assume this is what Nicolas is referring to.

No, the problem is that you are not speaking of the same thing.

$ grep -c pthread libavfilter/*vmaf*
libavfilter/vf_libvmaf.c:22
libavfilter/vf_vmafmotion.c:0
libavfilter/vmaf_motion.h:0

-> our code depends on pthreads for this filter, it must be expressed
in configure: Carl Eugen's patch is right, there is no need to bugreport
anything. His explanations later were wrong, but it may only be caused
by the confusion you brought.

What you describe is possibly true, but not related.

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 2:09 PM, Carl Eugen Hoyos wrote:
> As said before, it is non-trivial to find such a system
> (it worked fine here on osx when I tested last).

OS X does not ship with libstdc++, IIRC. You must have been using a
ports-build toolchain? Using clang-cl or MSVC on windows will also not
use libstdc++. FreeBSD does not use libstdc++.

- Derek


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 2:10 PM, Nicolas George wrote:
> -> our code depends on pthreads for this filter, it must be expressed
> in configure: Carl Eugen's patch is right, there is no need to bugreport
> anything. His explanations later were wrong, but it may only be caused
> by the confusion you brought.

Yeah I didn't realize that. Apologies on that.

> 
> What you describe is possibly true, but not related.

Correct - I will still report it upstream.

- Derek


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 15:11 GMT+01:00 Derek Buitenhuis :
> On 11/26/2017 2:09 PM, Carl Eugen Hoyos wrote:
>> As said before, it is non-trivial to find such a system
>> (it worked fine here on osx when I tested last).
>
> OS X does not ship with libstdc++, IIRC. You must
> have been using a ports-build toolchain?

I believe I have explained before that I always only use
vanilla toolchains because that's the only thing users
typically have access to.

Carl Eugen


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 2:15 PM, Carl Eugen Hoyos wrote:
> I believe I have explained before that I always only use
> vanilla toolchains because that's the only thing users
> typically have access to.

It's possible the OS X version was outdated then, since the
system clang has used libc++ as default since OS X 10.9.

But I digress, this is now off-topic, and no longer related to
the patch at hand.

- Derek


Re: [FFmpeg-devel] [PATCH]configure: libvmaf depends on pthreads

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 15:10 GMT+01:00 Nicolas George :

> $ grep -c pthread libavfilter/*vmaf*
> libavfilter/vf_libvmaf.c:22
> libavfilter/vf_vmafmotion.c:0
> libavfilter/vmaf_motion.h:0
>
> -> our code depends on pthreads for this filter, it must be expressed
> in configure: Carl Eugen's patch is right

Is the correct dependency pthreads or threads?

Carl Eugen


Re: [FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode

2017-11-26 Thread Michael Niedermayer
On Tue, Nov 21, 2017 at 10:48:19PM +0100, Marton Balint wrote:
> 
> 
> On Thu, 9 Nov 2017, James Cowgill wrote:
> 
> >Hi,
> >
> >On 09/11/17 14:02, Hendrik Leppkes wrote:
> >>On Thu, Nov 9, 2017 at 1:21 PM, James Cowgill  wrote:
> >>>In commit 061a0c14bb57 ("decode: restructure the core decoding code"), the
> >>>deprecated avcodec_decode_* APIs were reworked so that they called into the
> >>>new avcodec_send_packet / avcodec_receive_frame API. This had the side effect
> >>>of prohibiting sending new packets containing data after a drain
> >>>packet, but in previous versions of FFmpeg this "worked" and some
> >>>applications relied on it.
> >>>
> >>>To restore some compatibility, reset the codec if we receive a new non-drain
> >>>packet using the old API after draining has completed. While this does
> >>>not give the same behaviour as the old API did, in the majority of cases
> >>>it works and it does not require changes to any other part of the decoding
> >>>code.
> >>>
> >>>Fixes ticket #6775
> >>>Signed-off-by: James Cowgill 
> >>>---
> >>> libavcodec/decode.c | 5 +
> >>> 1 file changed, 5 insertions(+)
> >>>
> >>>diff --git a/libavcodec/decode.c b/libavcodec/decode.c
> >>>index 86fe5aef52..2f1932fa85 100644
> >>>--- a/libavcodec/decode.c
> >>>+++ b/libavcodec/decode.c
> >>>@@ -726,6 +726,11 @@ static int compat_decode(AVCodecContext *avctx, AVFrame *frame,
> >>>
> >>> av_assert0(avci->compat_decode_consumed == 0);
> >>>
> >>>+if (avci->draining_done && pkt && pkt->size != 0) {
> >>>+av_log(avctx, AV_LOG_WARNING, "Got unexpected packet after EOF\n");
> >>>+avcodec_flush_buffers(avctx);
> >>>+}
> >>>+
> >>
> >>I don't think this is a good idea. Draining and not flushing
> >>afterwards is a bug in the calling code, and even before recent
> >>changes it would result in inconsistent behavior and even crashes
> >>(with select decoders).
> >
> >I am fully aware that this will only trigger if the calling code is
> >buggy. I am trying to avoid silent breakage of those applications doing
> >this when upgrading to ffmpeg 3.4.
> >
> >I was looking at the documentation of avcodec_decode_* recently because
> >of this and I had some trouble deciding if using the API this way was
> >incorrect. I expect the downstreams affected thought that what they were
> >doing was fine and then got angry when ffmpeg suddenly "broke" their
> >code. This patch at least allows some sort of "transitional period"
> >until downstreams update.
> 
> I think the intent was to flush the codec by passing the NULL
> packets to it, so it makes a lot of sense to actually do that.
> Especially since by implicitly doing a flush, we can avoid the
> undefined behaviour/crashes on the codec side.
> 
> Also this is only compatibility code, which probably will be removed
> at the next bump, I see no harm in making it as compatible as
> realistically possible.

I agree, and I would appreciate it if this gets resolved one way or another
so I can make the release.

Thanks

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The educated differ from the uneducated as much as the living from the
dead. -- Aristotle 




Re: [FFmpeg-devel] [PATCH 0/1][TOOL][HACK] Allocation NULL check fuzzer

2017-11-26 Thread Derek Buitenhuis
On 11/25/2017 12:07 AM, Michael Niedermayer wrote:
> I do not know that, but I would be surprised if null dereference tests
> were unwelcome
> 
> oss-fuzz will already report null dereferences and OOM conditions, as
> well as undefined behavior. So in some sense various points on the map
> surrounding this here are already tested for

Locally, I've made this work with something like:

configure --malloc-prefix=fuzzer_ --extra-libs=-lallocfuzz

I'll push that library up to a git repo some time today.

Should be pretty easy to integrate into oss-fuzz like this, I think?

- Derek
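
For readers following the thread: the configure line above makes libavutil call allocator functions whose names carry the given prefix, so a failure-injecting allocator only has to provide those symbols. The following is a minimal sketch of that idea, not the library Derek mentions. The counter scheme and the fuzzer_set_failure() helper are assumptions made purely for illustration, and depending on the configuration, prefixed aligned-allocation entry points may be needed as well.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter-based failure injection; not the actual library. */
static unsigned long alloc_calls  = 0;  /* allocations attempted so far */
static unsigned long fail_at_call = 0;  /* 1-based call to fail; 0 = never */

/* Hypothetical helper: choose which allocation fails and restart counting. */
void fuzzer_set_failure(unsigned long nth)
{
    alloc_calls  = 0;
    fail_at_call = nth;
}

/* Returns nonzero exactly once, on the selected allocation call. */
static int should_fail(void)
{
    return ++alloc_calls == fail_at_call;
}

/* Prefixed replacements matching the "fuzzer_" prefix from the thread. */
void *fuzzer_malloc(size_t size)
{
    return should_fail() ? NULL : malloc(size);
}

void *fuzzer_realloc(void *ptr, size_t size)
{
    return should_fail() ? NULL : realloc(ptr, size);
}

void fuzzer_free(void *ptr)
{
    free(ptr);
}
```

A fuzzing harness could then iterate over the failure index, running the same input once per allocation site to check that each NULL return is handled.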


Re: [FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode

2017-11-26 Thread James Almer
On 11/21/2017 6:48 PM, Marton Balint wrote:
> 
> 
> On Thu, 9 Nov 2017, James Cowgill wrote:
> 
>> Hi,
>>
>> On 09/11/17 14:02, Hendrik Leppkes wrote:
>>> On Thu, Nov 9, 2017 at 1:21 PM, James Cowgill 
>>> wrote:
>>>> In commit 061a0c14bb57 ("decode: restructure the core decoding code"), the
>>>> deprecated avcodec_decode_* APIs were reworked so that they called into the
>>>> new avcodec_send_packet / avcodec_receive_frame API. This had the side effect
>>>> of prohibiting sending new packets containing data after a drain
>>>> packet, but in previous versions of FFmpeg this "worked" and some
>>>> applications relied on it.
>>>>
>>>> To restore some compatibility, reset the codec if we receive a new non-drain
>>>> packet using the old API after draining has completed. While this does
>>>> not give the same behaviour as the old API did, in the majority of cases
>>>> it works and it does not require changes to any other part of the decoding
>>>> code.
>>>>
>>>> Fixes ticket #6775
>>>> Signed-off-by: James Cowgill
>>>> ---
>>>>  libavcodec/decode.c | 5 +
>>>>  1 file changed, 5 insertions(+)
>>>>
>>>> diff --git a/libavcodec/decode.c b/libavcodec/decode.c
>>>> index 86fe5aef52..2f1932fa85 100644
>>>> --- a/libavcodec/decode.c
>>>> +++ b/libavcodec/decode.c
>>>> @@ -726,6 +726,11 @@ static int compat_decode(AVCodecContext *avctx, AVFrame *frame,
>>>>
>>>>      av_assert0(avci->compat_decode_consumed == 0);
>>>>
>>>> +    if (avci->draining_done && pkt && pkt->size != 0) {
>>>> +        av_log(avctx, AV_LOG_WARNING, "Got unexpected packet after EOF\n");
>>>> +        avcodec_flush_buffers(avctx);
>>>> +    }
>>>> +
>>>
>>> I don't think this is a good idea. Draining and not flushing
>>> afterwards is a bug in the calling code, and even before recent
>>> changes it would result in inconsistent behavior and even crashes
>>> (with select decoders).
>>
>> I am fully aware that this will only trigger if the calling code is
>> buggy. I am trying to avoid silent breakage of those applications doing
>> this when upgrading to ffmpeg 3.4.
>>
>> I was looking at the documentation of avcodec_decode_* recently because
>> of this and I had some trouble deciding if using the API this way was
>> incorrect. I expect the downstreams affected thought that what they were
>> doing was fine and then got angry when ffmpeg suddenly "broke" their
>> code. This patch at least allows some sort of "transitional period"
>> until downstreams update.
> 
> I think the intent was to flush the codec by passing the NULL packets to
> it, so it makes a lot of sense to actually do that. Especially since by
> implicitly doing a flush, we can avoid the undefined behaviour/crashes
> on the codec side.
> 
> Also this is only compatibility code, which probably will be removed at
> the next bump, I see no harm in making it as compatible as realistically
> possible.

The old decode API is not scheduled for removal right now probably
because 99% of decoders need to be ported.
This compat code was written so the old API becomes a wrapper for the
new rather than the other way around, as it was up to 3.3. Supposedly a
good portion of the versatility of the new API would be handicapped
otherwise.

Personally, I think this should be left as is. It is a good incentive
for downstream to migrate to the new API, as they technically were
misusing the old API to begin with.
Between fixing their old API usage and migrating, the choice should be
obvious.
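
The behaviour being debated can be modelled without libavcodec. The sketch below is a toy state machine, not FFmpeg code: ToyDecoder, toy_flush() and toy_compat_decode() are hypothetical names, and draining is assumed to complete in a single call. It shows only the proposed rule from the quoted patch, namely that a data packet arriving after draining has completed triggers an implicit reset instead of an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the decoder state; all names here are hypothetical. */
typedef struct ToyDecoder {
    bool draining_done;   /* set once EOF has been fully drained */
    int  flush_count;     /* how many implicit resets were performed */
    int  packets_decoded; /* non-empty packets accepted */
} ToyDecoder;

/* Equivalent of an explicit flush: leave the drained state. */
static void toy_flush(ToyDecoder *d)
{
    d->draining_done = false;
    d->flush_count++;
}

/*
 * Model of the proposed compat behaviour: size == 0 is a drain packet,
 * anything else carries data. Always returns 0 in this toy model.
 */
int toy_compat_decode(ToyDecoder *d, size_t size)
{
    if (d->draining_done && size != 0)
        toy_flush(d);  /* data after EOF: reset instead of failing */

    if (size == 0) {
        d->draining_done = true;  /* toy model: draining completes at once */
        return 0;
    }
    d->packets_decoded++;
    return 0;
}
```

Under the alternative favoured in the message above, the first branch would instead report an error and leave the reset to the caller.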


Re: [FFmpeg-devel] [PATCH] lavf/mov: fix huge alloc in mov_read_ctts

2017-11-26 Thread John Stebbins
On 11/25/2017 05:03 PM, James Almer wrote:
> On 11/25/2017 10:00 PM, John Stebbins wrote:
>> On 11/25/2017 04:03 PM, Carl Eugen Hoyos wrote:
>>> 2017-11-25 21:11 GMT+01:00 John Stebbins :
>>>> An invalid file may cause huge alloc.  Delay expansion of ctts entries
>>>> until the number of samples is known in mov_build_index.
>>> Please mention zhao dongzhuo from ADlab of Venustech who found this
>>> issue, I can confirm that the memory allocation gets fixed.
>>>
>>>
>> Sure. Should I amend the commit message to add something like, "Thanks to 
>> zhao dongzhuo from ADlab of Venustech for
>> reporting this issue"?
> Or just a "Found-by:" line.
>
>

Is there some git magic for this, or is this just something you add manually to 
the bottom of the commit message?

-- 
John  GnuPG fingerprint: D0EC B3DB C372 D1F1 0B01  83F0 49F1 D7B2 60D4 D0F7






Re: [FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode

2017-11-26 Thread Nicolas George
James Almer (2017-11-26):
> The old decode API is not scheduled for removal right now probably
> because 99% of decoders need to be ported.

I think this statement contains some confusion that is harmful to the
discussion.

There are two interfaces worth considering in this discussion: the
application -> library interface, i.e. the avcodec_decode_*dio()
functions, and the framework -> decoder interface, i.e. the decode /
receive_frame / ... callbacks.

When you are stating "because 99% of decoders need to be ported", you
are referring to the framework-decoder interface. On the other hand, the
misuse of the API that is at the origin of this thread is related to the
application-library interface.

We could deprecate avcodec_decode_*dio() right now even though the
decoders are not all ported.

Regards,

-- 
  Nicolas George




Re: [FFmpeg-devel] [PATCH] lavf/mov: fix huge alloc in mov_read_ctts

2017-11-26 Thread Derek Buitenhuis
On 11/26/2017 3:10 PM, John Stebbins wrote:
> Is there some git magic for this, or is this just something you add manually 
> to the bottom of the commit message?

It's just manual.

- Derek


[FFmpeg-devel] [PATCH] lavf/mov: fix huge alloc in mov_read_ctts

2017-11-26 Thread John Stebbins
An invalid file may cause huge alloc.  Delay expansion of ctts entries
until the number of samples is known in mov_build_index.

Found-by: zhao dongzhuo, AD-lab of Venustech
---
 libavformat/mov.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/libavformat/mov.c b/libavformat/mov.c
index ddb1e59b85..7a7fd13099 100644
--- a/libavformat/mov.c
+++ b/libavformat/mov.c
@@ -2896,7 +2896,7 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 {
 AVStream *st;
 MOVStreamContext *sc;
-unsigned int i, j, entries, ctts_count = 0;
+unsigned int i, entries, ctts_count = 0;
 
 if (c->fc->nb_streams < 1)
 return 0;
@@ -2929,9 +2929,8 @@ static int mov_read_ctts(MOVContext *c, AVIOContext *pb, MOVAtom atom)
 continue;
 }
 
-/* Expand entries such that we have a 1-1 mapping with samples. */
-for (j = 0; j < count; j++)
-add_ctts_entry(&sc->ctts_data, &ctts_count, &sc->ctts_allocated_size, 1, duration);
+add_ctts_entry(&sc->ctts_data, &ctts_count, &sc->ctts_allocated_size,
+   count, duration);
 
 av_log(c->fc, AV_LOG_TRACE, "count=%d, duration=%d\n",
 count, duration);
@@ -3580,6 +3579,8 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
 unsigned int stps_index = 0;
 unsigned int i, j;
 uint64_t stream_size = 0;
+MOVStts *ctts_data_old = sc->ctts_data;
+unsigned int ctts_count_old = sc->ctts_count;
 
 if (sc->elst_count) {
 int i, edit_start_index = 0, multiple_edits = 0;
@@ -3648,6 +3649,28 @@ static void mov_build_index(MOVContext *mov, AVStream *st)
 }
 st->index_entries_allocated_size = (st->nb_index_entries + sc->sample_count) * sizeof(*st->index_entries);
 
+if (ctts_data_old) {
+// Expand ctts entries such that we have a 1-1 mapping with samples
+if (sc->sample_count >= UINT_MAX / sizeof(*sc->ctts_data))
+return;
+sc->ctts_count = 0;
+sc->ctts_allocated_size = 0;
+sc->ctts_data = av_fast_realloc(NULL, &sc->ctts_allocated_size,
+sc->sample_count * sizeof(*sc->ctts_data));
+if (!sc->ctts_data) {
+av_free(ctts_data_old);
+return;
+}
+for (i = 0; i < ctts_count_old &&
+sc->ctts_count < sc->sample_count; i++)
+for (j = 0; j < ctts_data_old[i].count &&
+sc->ctts_count < sc->sample_count; j++)
+add_ctts_entry(&sc->ctts_data, &sc->ctts_count,
+   &sc->ctts_allocated_size, 1,
+   ctts_data_old[i].duration);
+av_free(ctts_data_old);
+}
+
 for (i = 0; i < sc->chunk_count; i++) {
 int64_t next_offset = i+1 < sc->chunk_count ? sc->chunk_offsets[i+1] : INT64_MAX;
 current_offset = sc->chunk_offsets[i];
-- 
2.14.3
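
The core idea of the patch, bounding the expanded table by the known sample count rather than by the counts stored in the file, can be illustrated in isolation. This is a simplified sketch under stated assumptions, not the code from mov.c: RunEntry and expand_runs() are hypothetical names, and plain malloc() stands in for FFmpeg's allocators.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for a run-length (count, duration) entry. */
typedef struct RunEntry {
    unsigned int count;
    int          duration;
} RunEntry;

/*
 * Expand run-length entries into one value per sample, never producing
 * more than sample_count values. Returns a malloc'ed array (caller frees)
 * and stores the number of values written in *out_count. Returns NULL on
 * allocation failure or if sample_count is zero or too large.
 */
int *expand_runs(const RunEntry *runs, unsigned int nb_runs,
                 unsigned int sample_count, unsigned int *out_count)
{
    unsigned int i, j, n = 0;
    int *out;

    *out_count = 0;
    if (sample_count == 0 || sample_count >= UINT_MAX / sizeof(*out))
        return NULL;

    /* The allocation depends on sample_count only, not on file-supplied counts. */
    out = malloc(sample_count * sizeof(*out));
    if (!out)
        return NULL;

    for (i = 0; i < nb_runs && n < sample_count; i++)
        for (j = 0; j < runs[i].count && n < sample_count; j++)
            out[n++] = runs[i].duration;

    *out_count = n;
    return out;
}
```

With this shape, an entry claiming billions of samples costs nothing extra, because both loops stop as soon as sample_count values have been written.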



Re: [FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode

2017-11-26 Thread James Almer
On 11/26/2017 12:19 PM, Nicolas George wrote:
> James Almer (2017-11-26):
>> The old decode API is not scheduled for removal right now probably
>> because 99% of decoders need to be ported.
> 
> I think this statement contains some confusion that is harmful to the
> discussion.
> 
> There are two interfaces worth considering in this discussion: the
> application -> library interface, i.e. the avcodec_decode_*dio()
> functions, and the framework -> decoder interface, i.e. the decode /
> receive_frame / ... callbacks.
> 
> When you are stating "because 99% of decoders need to be ported", you
> are referring to the framework-decoder interface. On the other hand, the
> misuse of the API that is at the origin of this thread is related to the
> application-library interface.

Yes, my bad. Got the public API and the internal callbacks mixed. So
ignore that part.
Guess then that the functions did not get a removal schedule because
they are still too ubiquitous downstream.

My second paragraph stands, in any case. I consider this a good chance
to get downstreams to migrate.

> 
> We could deprecate avcodec_decode_*dio() right now even though the
> decoders are not all ported.

They are already deprecated. I assume you meant remove.

> 
> Regards,
> 
> 
> 


Re: [FFmpeg-devel] [PATCH 00/15] OpenCL infrastructure, filters

2017-11-26 Thread hydra3333
-Original Message- 
From: hydra3...@gmail.com

Sent: Sunday, November 26, 2017 11:37 PM
To: ffmpeg-devel@ffmpeg.org
Cc: hydra
Subject: Re: [FFmpeg-devel] [PATCH 00/15] OpenCL infrastructure, filters

OK, the experimental -opencl_bench option has been removed by recent
commits.
Thank you for the updates to OpenCL.
...

The old -opencl_bench told me I have these devices - OpenCL using the nvidia
750Ti is the one I wish to use.
platform_idx    device_idx    device_name    runtime
1    0    GeForce GTX 750 Ti    1801
0    0    Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz    11210

2. I must have misinterpreted -init_hw_device and -filter_hw_device as well
for use in Windows 10 (please see the error below)
  -opencl_options platform_idx=1:device_idx=0 -init_hw_device
opencl -filter_hw_device opencl
Advice would be most welcomed.

...

--

Oh.   Damn, I used the wrong build for testing that :(  Sorry.

Still, This
-opencl_options platform_idx=1:device_idx=0
no longer works at all.

 Unrecognized option 'opencl_options'.
 Error splitting the argument list: Option not found

I wonder if someone could explain how to use the equivalent new syntax on 
Windows ? 




Re: [FFmpeg-devel] [PATCH 00/15] OpenCL infrastructure, filters

2017-11-26 Thread Mark Thompson
On 26/11/17 13:07, hydra3...@gmail.com wrote:
> OK, the experimental -opencl_bench option has been removed by recent commits.
> Thank you for the updates to OpenCL.
> 
> It cross-compiles OK, however I'm not sure about the replacement 
> functionality.  So, 2 things.
> 
> 1. may I enquire how one ascertains the numbers to use like this ?
>   -opencl_options platform_idx=1:device_idx=0

Most systems have a "clinfo" program which will list all the queryable 
properties of the available devices.

> 
> The old -opencl_bench told me I have these devices - OpenCL using the nvidia 
> 750Ti is the one I wish to use.
> platform_idx    device_idx    device_name    runtime
> 1    0    GeForce GTX 750 Ti    1801
> 0    0    Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz    11210
> 
> 2. I must have misinterpreted -init_hw_device and -filter_hw_device as well 
> for use in Windows 10 (please see the error below)
>   -opencl_options platform_idx=1:device_idx=0 -init_hw_device opencl 
> -filter_hw_device opencl
> Advice would be most welcomed.

I've added documentation for the opencl device as an option to -init_hw_device 
in 
.

> 
> "C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -opencl_options 
> platform_idx=1:device_idx=0 -init_hw_device opencl -filter_hw_device opencl 
> -i ".\test_01.mpg" -an -map_metadata -1 -sws_flags 
> lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter:v 
> yadif=0:0:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,setdar=dar=16/9 
> -r 25 -c:v h264_nvenc -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq 
> -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -movflags +faststart 
> -profile:v high -level 4.1 -pixel_format yuv420p -y ".\test_01.newest.MP4"
> ffmpeg version N-89088-gce001bb8fc Copyright (c) 2000-2017 the FFmpeg 
> developers
>  built with gcc 7.2.0 (GCC)
>  configuration: --arch=x86_64 --target-os=mingw32 
> --cross-prefix=/home/u/Desktop/ffmpeg-windows-build-helpers-withOpenCL-master/sandbox/cross_compilers/mingw-w64-x86_64/bin/x86_64-w64-mingw32-
>  --pkg-config=pkg-config --pkg-config-flags=--static --enable-gray 
> --enable-version3 --disable-debug --disable-doc --disable-htmlpages 
> --disable-manpages --disable-podpages --disable-txtpages --disable-w32threads 
> --enable-nvenc --enable-cuda --enable-cuvid --enable-d3d11va --enable-libsoxr 
> --enable-fontconfig --enable-libass --enable-libbluray --enable-iconv 
> --enable-libtwolame --enable-libzvbi --enable-libcaca --enable-libmodplug 
> --extra-libs=-lstdc++ --extra-libs=-lpng --extra-libs=-loleaut32 
> --enable-libmp3lame --enable-version3 --enable-zlib --enable-librtmp 
> --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libopenjpeg 
> --enable-gnutls --enable-libgsm --enable-libfreetype --enable-libopus 
> --enable-bzlib --enable-libopencore-amrnb --enable-libopencore-amrwb
> --enable-libvo-amrwbenc --enable-libvpx --enable-libilbc --enable-libwavpack 
> --enable-libwebp --enable-libgme --enable-dxva2 --enable-gray 
> --enable-libopenh264 --enable-libmysofa --enable-libflite --enable-lzma 
> --enable-libsnappy --enable-libzimg --enable-libbs2b --enable-gmp 
> --enable-libfribidi --enable-cross-compile --enable-pic --extra-libs=-lpsapi 
> --extra-libs=-lspeexdsp --disable-schannel --extra-cflags=-DLIBTWOLAME_STATIC 
> --extra-cflags=-DMODPLUG_STATIC --extra-cflags=-DCACA_STATIC --enable-gpl 
> --enable-avisynth --enable-frei0r --enable-filter=frei0r 
> --enable-librubberband --enable-libvidstab --enable-libx264 --enable-libx265 
> --enable-libxavs --enable-libxvid --enable-libmfx --enable-avresample 
> --enable-libcdio --extra-cflags='-mtune=generic' --extra-cflags=-O3 
> --enable-static --disable-shared 
> --prefix=/home/u/Desktop/ffmpeg-windows-build-helpers-withOpenCL-master/sandbox/cross_compilers/mingw-w64-x86_64/x86_64-w64-mingw32
>  --enable-nonfree --enable-decklink
> --enable-libfdk-aac --enable-opencl --enable-runtime-cpudetect 
> --extra-libs=-lcrypt32 --extra-libs=-lshlwapi --extra-libs=-lstdc++ 
> --extra-libs=-lass --extra-libs=-lfontconfig --extra-libs=-lexpat 
> --extra-libs=-lfribidi --extra-libs=-lfreetype --extra-libs=-lharfbuzz 
> --extra-libs=-lbz2 --extra-libs=-llzma --extra-libs=-liconv 
> --extra-libs=-lcdio --extra-libs=-lcdio_paranoia --extra-libs=-lz 
> --extra-libs=-lpsapi --extra-libs=-lspeexdsp --extra-libs=-lgdi32 
> --extra-libs=-lwinmm
> 
>  Invalid device specification "opencl": unknown device type
> Failed to set value 'opencl' for option 'init_hw_device': Invalid argument
> Error parsing global options: Invalid argument
From docs:

-init_hw_device type[=name][:device[,key=value...]]
Initialise a new hardware device of type type called name, using the given 
device parameters. If no name is specified it will receive a default name of 
the form "type%d". 

-filter_hw_device name
Pass the hardware device called name to all filters in any filter graph.

So, eit

Re: [FFmpeg-devel] [PATCH 00/15] OpenCL infrastructure, filters

2017-11-26 Thread hydra3333
Most systems have a "clinfo" program which will list all the queryable 
properties of the available devices.



From docs:


-init_hw_device type[=name][:device[,key=value...]]
   Initialise a new hardware device of type type called name, using the 
given device parameters. If no name is specified it will receive a default 
name of the form "type%d".


-filter_hw_device name
   Pass the hardware device called name to all filters in any filter graph.

So, either give it a name explicitly or deduce what the name will be:

-init_hw_device opencl=arbitrary_name:1.0 -filter_hw_device arbitrary_name

or

-init_hw_device opencl:1.0 -filter_hw_device opencl0

(There is some thought of adding an option -opencl_device working 
like -vaapi_device to avoid the two steps for simple upload.  The most 
useful cases all use device derivation/mapping to keep things on the GPU 
side, though, so I'm not entirely sure whether it's actually wanted.)


- Mark
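
To make the quoted syntax concrete, here is a rough sketch of how a specification of the form type[=name][:device[,key=value...]] splits into its parts, including the default "type%d" name. It is not FFmpeg's parser: DeviceSpec and parse_device_spec() are hypothetical, the index used for the default name is simply passed in by the caller, and the key=value options are ignored.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the parsed pieces; not FFmpeg's own structure. */
typedef struct DeviceSpec {
    char type[32];
    char name[64];
    char device[64];  /* text after ':' up to the first ',' (may be empty) */
} DeviceSpec;

/*
 * Split "type[=name][:device[,key=value...]]". When no name is given,
 * build the default "type%d" from `index`. Returns 0 on success, -1 on a
 * missing or over-long field.
 */
int parse_device_spec(const char *spec, int index, DeviceSpec *out)
{
    size_t type_len = strcspn(spec, "=:");
    const char *p = spec + type_len;

    if (type_len == 0 || type_len >= sizeof(out->type))
        return -1;
    memcpy(out->type, spec, type_len);
    out->type[type_len] = '\0';

    if (*p == '=') {
        /* Explicit name: everything up to the next ':' or end of string. */
        size_t name_len = strcspn(++p, ":");
        if (name_len == 0 || name_len >= sizeof(out->name))
            return -1;
        memcpy(out->name, p, name_len);
        out->name[name_len] = '\0';
        p += name_len;
    } else {
        /* Default name of the form "type%d". */
        snprintf(out->name, sizeof(out->name), "%s%d", out->type, index);
    }

    out->device[0] = '\0';
    if (*p == ':') {
        size_t dev_len = strcspn(++p, ",");
        if (dev_len >= sizeof(out->device))
            return -1;
        memcpy(out->device, p, dev_len);
        out->device[dev_len] = '\0';
    }
    return 0;
}
```

Applied to the two examples above, "opencl=arbitrary_name:1.0" yields the name arbitrary_name, while "opencl:1.0" with index 0 yields opencl0, which is the name then handed to -filter_hw_device.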

--

OK and Thank you, Mark.

One last question if I may,

Most systems have a "clinfo" program which will list all the queryable 
properties of the available devices.


In both of these cases
-init_hw_device opencl=arbitrary_name:1.0
-init_hw_device opencl:1.0

the device is seen to be "1.0" ... how does one determine that device "id" 
in Windows 10 in order to use it ?
(I have another PC with Intel graphics plus a 1050Ti and need to specify 
the 1050Ti)


I tried to use a clinfo command in DOS under Win10 ... but no go.
Display Adaptor properties didn't show anything obvious to an end-user.
GPU Caps Viewer didn't show anything useful.
ffmpeg -init_hw_device list showed
Supported hardware device types:
cuda
dxva2
qsv
d3d11va
opencl

Thank you.



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


[FFmpeg-devel] [PATCH] avformat/matroskaenc: actually enforce the stream limit

2017-11-26 Thread James Almer
Prevents out of array accesses. Addresses ticket #6873

Signed-off-by: James Almer 
---
 libavformat/matroskaenc.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/libavformat/matroskaenc.c b/libavformat/matroskaenc.c
index dad6d6c93f..06126781f8 100644
--- a/libavformat/matroskaenc.c
+++ b/libavformat/matroskaenc.c
@@ -1859,6 +1859,13 @@ static int mkv_write_header(AVFormatContext *s)
 av_dict_get(s->metadata, "alpha_mode", NULL, 0))
 version = 4;
 
+if (s->nb_streams > MAX_TRACKS) {
+av_log(s, AV_LOG_ERROR,
+   "At most %d streams are supported for muxing in Matroska\n",
+   MAX_TRACKS);
+return AVERROR(EINVAL);
+}
+
 for (i = 0; i < s->nb_streams; i++) {
 if (s->streams[i]->codecpar->codec_id == AV_CODEC_ID_ATRAC3 ||
 s->streams[i]->codecpar->codec_id == AV_CODEC_ID_COOK ||
-- 
2.15.0
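
The check added by the patch above follows a common pattern: validate a caller-supplied count against a fixed-size table before any loop indexes into it. A minimal, self-contained sketch of that pattern follows. All names here (`SKETCH_MAX_TRACKS`, `SketchMuxer`, `sketch_write_header`) are hypothetical, and the limit of 4 is illustrative only, not the value of `MAX_TRACKS` in matroskaenc.c.

```c
#include <errno.h>
#include <stdio.h>

#define SKETCH_MAX_TRACKS 4  /* illustrative only; not the muxer's real limit */

/* Hypothetical muxer state with a fixed-size per-track table. */
struct SketchMuxer {
    int track_ids[SKETCH_MAX_TRACKS];
};

/* Returns 0 on success, or -EINVAL when nb_streams would overflow
 * the fixed-size track table. The check runs before any indexing. */
static int sketch_write_header(struct SketchMuxer *m, unsigned nb_streams)
{
    unsigned i;

    if (nb_streams > SKETCH_MAX_TRACKS) {
        fprintf(stderr, "At most %d streams are supported\n",
                SKETCH_MAX_TRACKS);
        return -EINVAL;
    }
    for (i = 0; i < nb_streams; i++)
        m->track_ids[i] = (int)i + 1;
    return 0;
}
```

Rejecting the input up front keeps every later loop over the streams within the table's bounds.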



[FFmpeg-devel] fate/hap : add test for hap encoding

2017-11-26 Thread Martin Vignali
Hello,

The attached patches add a test for Hap encoding, which is currently not covered (patch 0002),
and move the decoding tests to a separate file (patch 0001).

Decoding can be tested with
make fate-hap SAMPLES=fate-suite/

and encoding can be tested with
make fate-hapenc SAMPLES=fate-suite/

Hap encoding needs ffmpeg compiled with libsnappy (--enable-libsnappy).


Martin


0001-fate-hap-move-decoding-test-to-a-separate-file.patch
Description: Binary data


0002-fate-hap-add-test-for-hap-encoding.patch
Description: Binary data


Re: [FFmpeg-devel] [PATCH 00/15] OpenCL infrastructure, filters

2017-11-26 Thread Mark Thompson
On 26/11/17 16:14, hydra3...@gmail.com wrote:
> Most systems have a "clinfo" program which will list all the queryable 
> properties of the available devices.
> 
>> From docs:
> 
> -init_hw_device type[=name][:device[,key=value...]]
>    Initialise a new hardware device of type type called name, using the given 
> device parameters. If no name is specified it will receive a default name of 
> the form "type%d".
> 
> -filter_hw_device name
>    Pass the hardware device called name to all filters in any filter graph.
> 
> So, either give it a name explicitly or deduce what the name will be:
> 
> -init_hw_device opencl=arbitrary_name:1.0 -filter_hw_device arbitrary_name
> 
> or
> 
> -init_hw_device opencl:1.0 -filter_hw_device opencl0
> 
> (There is some thought of adding an option -opencl_device working like 
> -vaapi_device to avoid the two steps for simple upload.  The most useful 
> cases all use device derivation/mapping to keep things on the GPU side, 
> though, so I'm not entirely sure whether it's actually wanted.)
> 
> - Mark
> 
> --
> 
> OK and Thank you, Mark.
> 
> One last question if I may,
> 
>> Most systems have a "clinfo" program which will list all the queryable 
>> properties of the available devices.
> 
> In both of these cases
> -init_hw_device opencl=arbitrary_name:1.0
> -init_hw_device opencl:1.0
> 
> the device is seen to be "1.0" ... how does one determine that device "id" in 
> Windows 10 in order to use it ?
> (I have another PC with Intel graphics plus a 1050Ti and need to specify 
> the 1050Ti)
> 
> I tried to use a clinfo command in DOS under Win10 ... but no go.
> Display Adaptor properties didn't show anything obvious to an end-user.
> GPU Caps Viewer didn't show anything useful.
> ffmpeg -init_hw_device list showed
> Supported hardware device types:
> cuda
> dxva2
> qsv
> d3d11va
> opencl

The devices found will be listed when run with "-v verbose" (or higher).  So, 
if you pass no additional options as in

ffmpeg -v verbose -init_hw_device opencl

you will get a list of all the devices with their names and indices.
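
The option syntax quoted from the docs, `type[=name][:device[,key=value...]]`, can be illustrated with a small parser. This is only a sketch under stated assumptions, not ffmpeg's own parsing code: the struct and function names are hypothetical, the trailing `key=value` pairs are ignored, and the index used for the default name `type%d` is simply passed in by the caller.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the parsed pieces; not an FFmpeg type. */
struct HwDeviceSpec {
    char type[32];
    char name[64];
    char device[64];
};

/* Parse "type[=name][:device[,key=value...]]".
 * default_index builds the default name "type%d" when no name is given.
 * Returns 0 on success, -1 on malformed or oversized input. */
static int parse_hw_device_spec(const char *spec, int default_index,
                                struct HwDeviceSpec *out)
{
    size_t type_len = strcspn(spec, "=:");
    const char *p = spec + type_len;

    if (type_len == 0 || type_len >= sizeof(out->type))
        return -1;
    memcpy(out->type, spec, type_len);
    out->type[type_len] = '\0';
    out->name[0] = '\0';
    out->device[0] = '\0';

    if (*p == '=') {
        /* Explicit name: everything up to the next ':' */
        size_t name_len;
        p++;
        name_len = strcspn(p, ":");
        if (name_len == 0 || name_len >= sizeof(out->name))
            return -1;
        memcpy(out->name, p, name_len);
        out->name[name_len] = '\0';
        p += name_len;
    } else {
        /* No name given: fall back to "type%d" */
        snprintf(out->name, sizeof(out->name), "%s%d",
                 out->type, default_index);
    }

    if (*p == ':') {
        /* Device string: everything up to the first ',' */
        size_t dev_len;
        p++;
        dev_len = strcspn(p, ",");
        if (dev_len >= sizeof(out->device))
            return -1;
        memcpy(out->device, p, dev_len);
        out->device[dev_len] = '\0';
    }
    return 0;
}
```

With this sketch, `opencl=arbitrary_name:1.0` yields the name `arbitrary_name`, while `opencl:1.0` with index 0 yields the name `opencl0`, matching the two command lines discussed in the thread.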

- Mark


[FFmpeg-devel] avcodec/huffyuvenc : try to call dsp with aligned data, and remove code duplication

2017-11-26 Thread Martin Vignali
Hello,

Patches attached.

0001-avcodec-huffyuvenc-increase-scalar-loop-count
and
0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds

Since diff_bytes and diff_bytes16 have AVX2 versions, increase the scalar
loop count so that the aligned version is called in most cases.



0002-avcodec-huffyuvenc-remove-code-duplication-in
removes some code duplication, for width < 32 and for the initial scalar loop.


Passes the FATE tests for me (x86_64, macOS 10.12).

Martin


0001-avcodec-huffyuvenc-increase-scalar-loop-count.patch
Description: Binary data


0002-avcodec-huffyuvenc-remove-code-duplication-in.patch
Description: Binary data


0003-avcodec-huffyuvenc-sub_left_prediction_bgr32-call-ds.patch
Description: Binary data


[FFmpeg-devel] avcodec/utvideodec : use dsp add_median_pred for second line

2017-11-26 Thread Martin Vignali
Hello,

Patch attached.

The dsp function needs 16-byte-aligned data.
Process only the start of the line in scalar code, and call the dsp function for the rest,
instead of processing the entire line in scalar code.
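
The idea described above (a short scalar prologue up to the next 16-byte boundary, then one call to the aligned routine for the remainder) can be sketched generically. This is a minimal illustration, not the code from the attached patch: it uses a plain byte-wise add rather than median prediction, and `add_bytes_aligned16`, `scalar_prefix_len` and `add_bytes_line` are hypothetical names. The "aligned" routine is plain C here so the sketch runs anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a SIMD routine that requires dst to be 16-byte aligned. */
static void add_bytes_aligned16(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
}

/* Number of bytes to handle in scalar code before dst reaches a
 * 16-byte boundary, capped at n. */
static size_t scalar_prefix_len(const uint8_t *dst, size_t n)
{
    size_t misalign = (size_t)((uintptr_t)dst & 15);
    size_t prefix = misalign ? 16 - misalign : 0;
    return prefix < n ? prefix : n;
}

/* Process the unaligned head in scalar code, then hand the aligned
 * remainder to the bulk routine. */
static void add_bytes_line(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i, prefix = scalar_prefix_len(dst, n);

    for (i = 0; i < prefix; i++)
        dst[i] = (uint8_t)(dst[i] + src[i]);
    add_bytes_aligned16(dst + prefix, src + prefix, n - prefix);
}
```

The scalar loop runs for at most 15 bytes, so nearly the whole line goes through the fast path regardless of where the line starts.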


Passes make fate-utvideo for me.

Martin


0001-avcodec-utvideodec-use-dsp-add_median_pred-for-secon.patch
Description: Binary data


Re: [FFmpeg-devel] [PATCH] avfilter/vf_tile: add init_padding option

2017-11-26 Thread Nicolas George
Paul B Mahol (2017-11-24):
> Signed-off-by: Paul B Mahol 
> ---
>  doc/filters.texi  |  4 
>  libavfilter/vf_tile.c | 12 +++-
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/filters.texi b/doc/filters.texi
> index 76929e4db5..11ce0482c2 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -14497,6 +14497,10 @@ is "black".
>  @item overlap
>  Set the number of frames to overlap when tiling several successive frames 
> together.
>  The value must be between @code{0} and @var{nb_frames - 1}.
> +

> +@item init_padding
> +Set the number of input frames to initially consume before displaying first 
> output frame.
> +The value must be between @code{0} and @var{nb_frames - 1}.

The documentation says that; the code does the opposite.

>  @end table
>  
>  @subsection Examples
> diff --git a/libavfilter/vf_tile.c b/libavfilter/vf_tile.c
> index 7717ce12e7..c78fa611dd 100644
> --- a/libavfilter/vf_tile.c
> +++ b/libavfilter/vf_tile.c
> @@ -38,6 +38,7 @@ typedef struct TileContext {
>  unsigned margin;
>  unsigned padding;
>  unsigned overlap;
> +unsigned init_padding;
>  unsigned current;
>  unsigned nb_frames;
>  FFDrawContext draw;
> @@ -62,6 +63,8 @@ static const AVOption tile_options[] = {
>  { "color",   "set the color of the unused area", OFFSET(rgba_color), 
> AV_OPT_TYPE_COLOR, {.str = "black"}, .flags = FLAGS },
>  { "overlap", "set how many frames to overlap for each render", 
> OFFSET(overlap),
>  AV_OPT_TYPE_INT, {.i64 = 0}, 0, INT_MAX, FLAGS },
> +{ "init_padding", " set how many frames to initially pad", 
> OFFSET(init_padding),
> +AV_OPT_TYPE_INT, {.i64 = 0}, 0, INT_MAX, FLAGS },
>  { NULL }
>  };
>  
> @@ -99,6 +102,13 @@ static av_cold int init(AVFilterContext *ctx)
>  tile->overlap = tile->nb_frames - 1;
>  }
>  
> +if (tile->init_padding >= tile->nb_frames) {
> +av_log(ctx, AV_LOG_WARNING, "init_padding must be less than %d\n", 
> tile->nb_frames);

> +tile->current = 0;

Unnecessary and confusing.

> +} else {
> +tile->current = tile->init_padding;
> +}
> +
>  return 0;
>  }
>  
> @@ -201,7 +211,7 @@ static int filter_frame(AVFilterLink *inlink, AVFrame 
> *picref)
>  tile->out_ref->height = outlink->h;
>  
>  /* fill surface once for margin/padding */

> -if (tile->margin || tile->padding)
> +if (tile->margin || tile->padding || tile->init_padding != 0)

This change should only be applied to the first frame.

>  ff_fill_rectangle(&tile->draw, &tile->blank,
>tile->out_ref->data,
>tile->out_ref->linesize,

Regards,

-- 
  Nicolas George


signature.asc
Description: Digital signature


Re: [FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode

2017-11-26 Thread Marton Balint



On Sun, 26 Nov 2017, James Almer wrote:

> On 11/26/2017 12:19 PM, Nicolas George wrote:
>> James Almer (2017-11-26):
>>> The old decode API is not scheduled for removal right now probably
>>> because 99% of decoders need to be ported.
>>
>> I think this statement contains some confusion that is harmful to the
>> discussion.
>>
>> There are two interfaces worth considering in this discussion: the
>> application -> library interface, i.e. the avcodec_decode_*dio()
>> functions, and the framework -> decoder interface, i.e. the decode /
>> receive_frame / ... callbacks.
>>
>> When you are stating "because 99% of decoders need to be ported", you
>> are referring to the framework-decoder interface. On the other hand, the
>> misuse of the API that is at the origin of this thread is related to the
>> application-library interface.
>
> Yes, my bad. Got the public API and the internal callbacks mixed. So
> ignore that part.
> Guess then that the functions did not get a removal schedule because
> they are still too ubiquitous downstream.
>
> My second paragraph stands, in any case. I consider this a good chance
> to get downstreams to migrate.


Okay, I am exaggerating a bit, but unconditionally returning 
AVERROR(ENOSYS) would be an even better incentive, no? :)


We can blame API usage (we should rather blame unclear documentation), but 
no matter how we put it, with the change, we broke the user experience of 
two major projects. If fixing it (at least partially) is so easy, I still 
don't see why we should not do that.


People who still oppose this change, please respond.

Thanks,
Marton


Re: [FFmpeg-devel] [PATCH] lavc: reset codec on receiving packet after EOF in compat_decode

2017-11-26 Thread Nicolas George
Marton Balint (2017-11-26):
> Okay, I am exaggerating a bit, but unconditionally returning AVERROR(ENOSYS)
> would be an even better incentive, no? :)

For invalid uses of the API that can be easily avoided by the
application (like not explicitly passing NULL to a function), a hard
crash is even better.

Regards,

-- 
  Nicolas George


signature.asc
Description: Digital signature


Re: [FFmpeg-devel] [PATCH] hls demuxer: add option to defer parsing of variants

2017-11-26 Thread Rainer Hochecker
>> +/*
>> + * If this is a live stream and this playlist looks like it is one 
>> segment
>> + * behind, try to sync it up so that every substream starts at the same
>> + * time position (so e.g. avformat_find_stream_info() will see packets 
>> from
>> + * all active streams within the first few seconds). This is not very 
>> generic,
>> + * though, as the sequence numbers are technically independent.
>> + */
>> +highest_cur_seq_no = 0;
>> +for (i = 0; i < c->n_playlists; i++) {
>> +struct playlist *pls = c->playlists[i];
>> +if (!pls->parsed)
>> +continue;
>> +if (pls->cur_seq_no > highest_cur_seq_no)
>> +highest_cur_seq_no = pls->cur_seq_no;
>> +}
>> +if (!pls->finished && pls->cur_seq_no == highest_cur_seq_no - 1 &&
>> +highest_cur_seq_no < pls->start_seq_no + pls->n_segments) {
>> +pls->cur_seq_no = highest_cur_seq_no;
>> +}
>> +
>> +pls->read_buffer = av_malloc(INITIAL_BUFFER_SIZE);
>> +if (!pls->read_buffer){
>> +ret = AVERROR(ENOMEM);
>> +avformat_free_context(pls->ctx);
>> +pls->ctx = NULL;
>> +return ret;
>> +}
>> +ffio_init_context(&pls->pb, pls->read_buffer, INITIAL_BUFFER_SIZE, 0, 
>> pls,
>> +  read_data, NULL, NULL);
>> +pls->pb.seekable = 0;
>> +ret = av_probe_input_buffer(&pls->pb, &in_fmt, pls->segments[0]->url,
>> +NULL, 0, 0);
>> +if (ret < 0) {
>> +/* Free the ctx - it isn't initialized properly at this point,
>> + * so avformat_close_input shouldn't be called. If
>> + * avformat_open_input fails below, it frees and zeros the
>> + * context, so it doesn't need any special treatment like this. */
>> +av_log(c->ctx, AV_LOG_ERROR, "Error when loading first segment 
>> '%s'\n", pls->segments[0]->url);
>> +avformat_free_context(pls->ctx);
>> +pls->ctx = NULL;
>> +return ret;
> Won't pls->read_buffer leak here?
>
>

Yes, it looks like it. This is already an issue in the current code.
Nevertheless, I will fix it here.


Re: [FFmpeg-devel] [PATCH] avfilter: add normalize filter

2017-11-26 Thread Richard Ling
Thanks Paul.

Thanks also to all reviewers for your comments! It's very helpful to have
extra sets of eyes to find my bugs.

Moritz is right, there is an unused #define, I will try to find time to
patch. Or maybe Paul can remove it

Regards
R.


Re: [FFmpeg-devel] [PATCH] hls demuxer: add option to defer parsing of variants

2017-11-26 Thread Rainer Hochecker
2017-11-26 12:04 GMT+01:00 Steven Liu :
> 2017-11-26 18:46 GMT+08:00 Rainer Hochecker :
>> fixed mem leak pointed out by Steven
> Hi Rainer,
>
> I'm not sure that it is a memleak, but it looks like one when reading
> the code. That code was already in hls.c before this patch, but nobody
> has reported a memleak.
> If it is a memleak, using goto may be a better way, because
> the workflow below has checks for failed resource allocation; I will point
> it out based on your patch.
>>

Hi Steven,

As soon as you associate the probe buffer with the context, the context takes care
of the allocated resources. That's most likely the reason why pb was not
cleared here. It is assigned to the context right after this block.
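
The ownership rule described above can be sketched in isolation: until a buffer has been handed to a context, every error path must free it explicitly; after the handoff, the context's own cleanup is responsible. This is a simplified model with hypothetical names (`SketchPlaylist`, `sketch_open`, `sketch_close`) and plain `malloc`/`free`, not the hls.c code or the libavformat API.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a playlist with a probe/read buffer. */
struct SketchPlaylist {
    unsigned char *read_buffer;
    int owned_by_context;  /* set once the buffer has been handed over */
};

/* probe_ok simulates whether probing the first segment succeeds.
 * Returns 0 on success, a negative value on failure. */
static int sketch_open(struct SketchPlaylist *pls, size_t size, int probe_ok)
{
    pls->read_buffer = malloc(size);
    if (!pls->read_buffer)
        return -1;

    if (!probe_ok) {
        /* The buffer has not been handed to a context yet, so nobody
         * else will free it: release it on this error path. */
        free(pls->read_buffer);
        pls->read_buffer = NULL;
        return -2;
    }

    /* From here on the context is responsible for the buffer. */
    pls->owned_by_context = 1;
    return 0;
}

/* Models the context's cleanup releasing the buffer it owns. */
static void sketch_close(struct SketchPlaylist *pls)
{
    if (pls->owned_by_context) {
        free(pls->read_buffer);
        pls->read_buffer = NULL;
        pls->owned_by_context = 0;
    }
}
```

Setting the pointer to NULL after freeing it makes a later cleanup pass harmless.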

Rainer


Re: [FFmpeg-devel] [PATCH] hls demuxer: add option to defer parsing of variants

2017-11-26 Thread Rainer Hochecker

Variants are presented as programs and can be loaded later by
setting discard flags on the program. Currently Kodi chooses the
program that best matches the desired bit rate.

Rainer


[FFmpeg-devel] [PATCH 1/3] ffmpeg: use avformat_init_output to initialize output files

2017-11-26 Thread James Almer
Postpone writing the header until the first output packet is ready to be
written.
This makes sure any stream parameter change that could take place while
processing an input frame will be taken into account when writing the
output file header.

Signed-off-by: James Almer 
---
 fftools/ffmpeg.c | 31 ++-
 fftools/ffmpeg.h |  3 +++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/fftools/ffmpeg.c b/fftools/ffmpeg.c
index 0c16e75ab0..07476e88e7 100644
--- a/fftools/ffmpeg.c
+++ b/fftools/ffmpeg.c
@@ -700,7 +700,7 @@ static void write_packet(OutputFile *of, AVPacket *pkt, 
OutputStream *ost, int u
 ost->frame_number++;
 }
 
-if (!of->header_written) {
+if (!of->initialized) {
 AVPacket tmp_pkt = {0};
 /* the muxer is not initialized yet, buffer the packet */
 if (!av_fifo_space(ost->muxing_queue)) {
@@ -804,6 +804,17 @@ static void write_packet(OutputFile *of, AVPacket *pkt, 
OutputStream *ost, int u
   );
 }
 
+if (!of->header_written) {
+ret = avformat_write_header(s, &of->opts);
+if (ret < 0) {
+av_log(NULL, AV_LOG_ERROR,
+   "Could not write header for output file #%d: %s\n",
+   ost->file_index, av_err2str(ret));
+exit_program(1);
+}
+of->header_written = 1;
+}
+
 ret = av_interleaved_write_frame(s, pkt);
 if (ret < 0) {
 print_error("av_interleaved_write_frame()", ret);
@@ -2756,7 +2767,7 @@ static void print_sdp(void)
 AVFormatContext **avc;
 
 for (i = 0; i < nb_output_files; i++) {
-if (!output_files[i]->header_written)
+if (!output_files[i]->initialized)
 return;
 }
 
@@ -2947,16 +2958,26 @@ static int check_init_output_file(OutputFile *of, int 
file_index)
 
 of->ctx->interrupt_callback = int_cb;
 
-ret = avformat_write_header(of->ctx, &of->opts);
+ret = avformat_init_output(of->ctx, &of->opts);
 if (ret < 0) {
 av_log(NULL, AV_LOG_ERROR,
-   "Could not write header for output file #%d "
+   "Could not initialize output file #%d "
"(incorrect codec parameters ?): %s\n",
file_index, av_err2str(ret));
 return ret;
 }
 //assert_avoptions(of->opts);
-of->header_written = 1;
+of->initialized = ret;
+if (!ret) {
+ret = avformat_write_header(of->ctx, &of->opts);
+if (ret < 0) {
+av_log(NULL, AV_LOG_ERROR,
+   "Could not write header for output file #%d: %s\n",
+   file_index, av_err2str(ret));
+return ret;
+}
+of->initialized = of->header_written = 1;
+}
 
 av_dump_format(of->ctx, file_index, of->ctx->filename, 1);
 
diff --git a/fftools/ffmpeg.h b/fftools/ffmpeg.h
index e0977e1bf1..c46ffd8b03 100644
--- a/fftools/ffmpeg.h
+++ b/fftools/ffmpeg.h
@@ -571,6 +571,9 @@ typedef struct OutputFile {
 
 int shortest;
 
+// avformat_init_output() has been called for this file
+int initialized;
+// avformat_write_header() has been called for this file
 int header_written;
 } OutputFile;
 
-- 
2.15.0
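
The patch above splits one flag into two states, "initialized" and "header written", and defers the header to the first packet actually written. A simplified model of that state machine is sketched below. The struct and functions are hypothetical and leave out the case where the header is written immediately after initialization; the real code calls avformat_init_output() and avformat_write_header().

```c
/* Hypothetical model of an output file's two-step start-up. */
struct SketchOutputFile {
    int initialized;      /* the init step has run */
    int header_written;   /* the header has been emitted */
    int headers_emitted;  /* counters, for demonstration only */
    int packets_emitted;
};

static void sketch_init_output(struct SketchOutputFile *of)
{
    of->initialized = 1;
}

/* Returns -1 if the output is not initialized yet (a real caller would
 * buffer the packet instead), 0 once the packet has been written. */
static int sketch_write_packet(struct SketchOutputFile *of)
{
    if (!of->initialized)
        return -1;

    if (!of->header_written) {
        /* First packet: emit the header now, with whatever stream
         * parameters are current at this point. */
        of->headers_emitted++;
        of->header_written = 1;
    }
    of->packets_emitted++;
    return 0;
}
```

Because the header is emitted lazily, any parameter change that happens between initialization and the first packet is reflected in it, and it is still written exactly once.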



[FFmpeg-devel] [PATCH 3/3] avformat: deprecate AVFMT_FLAG_AUTO_BSF

2017-11-26 Thread James Almer
The bitstream filters inserted by this option should not be optional.
They are needed to successfully mux files in some cases, and to prevent
muxing broken files in others.

This is more in line with AVCodec.bsfs()

Signed-off-by: James Almer 
---
 libavformat/avformat.h  | 4 +++-
 libavformat/mux.c   | 3 ---
 libavformat/options_table.h | 6 --
 libavformat/version.h   | 3 +++
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/libavformat/avformat.h b/libavformat/avformat.h
index 4f2798a871..e39e5e4ada 100644
--- a/libavformat/avformat.h
+++ b/libavformat/avformat.h
@@ -1449,7 +1449,9 @@ typedef struct AVFormatContext {
 #endif
 #define AVFMT_FLAG_FAST_SEEK   0x8 ///< Enable fast, but inaccurate seeks 
for some formats
 #define AVFMT_FLAG_SHORTEST   0x10 ///< Stop muxing when the shortest 
stream stops.
-#define AVFMT_FLAG_AUTO_BSF   0x20 ///< Add bitstream filters as requested 
by the muxer
+#if FF_API_LAVF_AUTO_BSF_FLAG
+#define AVFMT_FLAG_AUTO_BSF   0x20 ///< Deprecated, does nothing.
+#endif
 
 /**
  * Maximum size of the data read from input for determining
diff --git a/libavformat/mux.c b/libavformat/mux.c
index ebb9102f11..382c20fc63 100644
--- a/libavformat/mux.c
+++ b/libavformat/mux.c
@@ -825,9 +825,6 @@ static int do_packet_auto_bsf(AVFormatContext *s, AVPacket 
*pkt) {
 AVStream *st = s->streams[pkt->stream_index];
 int i, ret;
 
-if (!(s->flags & AVFMT_FLAG_AUTO_BSF))
-return 1;
-
 if (s->oformat->check_bitstream) {
 if (!st->internal->bitstream_checked) {
 if ((ret = s->oformat->check_bitstream(s, pkt)) < 0)
diff --git a/libavformat/options_table.h b/libavformat/options_table.h
index b8fa47c6fd..8b1e6f3d18 100644
--- a/libavformat/options_table.h
+++ b/libavformat/options_table.h
@@ -39,7 +39,7 @@ static const AVOption avformat_options[] = {
 {"probesize", "set probing size", OFFSET(probesize), AV_OPT_TYPE_INT64, {.i64 
= 500 }, 32, INT64_MAX, D},
 {"formatprobesize", "number of bytes to probe file format", 
OFFSET(format_probesize), AV_OPT_TYPE_INT, {.i64 = PROBE_BUF_MAX}, 0, 
INT_MAX-1, D},
 {"packetsize", "set packet size", OFFSET(packet_size), AV_OPT_TYPE_INT, {.i64 
= DEFAULT }, 0, INT_MAX, E},
-{"fflags", NULL, OFFSET(flags), AV_OPT_TYPE_FLAGS, {.i64 = AVFMT_FLAG_AUTO_BSF 
}, INT_MIN, INT_MAX, D|E, "fflags"},
+{"fflags", NULL, OFFSET(flags), AV_OPT_TYPE_FLAGS, {.i64 = 0 }, INT_MIN, 
INT_MAX, D|E, "fflags"},
 {"flush_packets", "reduce the latency by flushing out packets immediately", 0, 
AV_OPT_TYPE_CONST, {.i64 = AVFMT_FLAG_FLUSH_PACKETS }, INT_MIN, INT_MAX, E, 
"fflags"},
 {"ignidx", "ignore index", 0, AV_OPT_TYPE_CONST, {.i64 = AVFMT_FLAG_IGNIDX }, 
INT_MIN, INT_MAX, D, "fflags"},
 {"genpts", "generate pts", 0, AV_OPT_TYPE_CONST, {.i64 = AVFMT_FLAG_GENPTS }, 
INT_MIN, INT_MAX, D, "fflags"},
@@ -57,7 +57,9 @@ static const AVOption avformat_options[] = {
 {"seek2any", "allow seeking to non-keyframes on demuxer level when supported", 
OFFSET(seek2any), AV_OPT_TYPE_BOOL, {.i64 = 0 }, 0, 1, D},
 {"bitexact", "do not write random/volatile data", 0, AV_OPT_TYPE_CONST, { .i64 
= AVFMT_FLAG_BITEXACT }, 0, 0, E, "fflags" },
 {"shortest", "stop muxing with the shortest stream", 0, AV_OPT_TYPE_CONST, { 
.i64 = AVFMT_FLAG_SHORTEST }, 0, 0, E, "fflags" },
-{"autobsf", "add needed bsfs automatically", 0, AV_OPT_TYPE_CONST, { .i64 = 
AVFMT_FLAG_AUTO_BSF }, 0, 0, E, "fflags" },
+#if FF_API_LAVF_AUTO_BSF_FLAG
+{"autobsf", "deprecated, does nothing", 0, AV_OPT_TYPE_CONST, { .i64 = 
AVFMT_FLAG_AUTO_BSF }, 0, 0, E, "fflags" },
+#endif
 {"analyzeduration", "specify how many microseconds are analyzed to probe the 
input", OFFSET(max_analyze_duration), AV_OPT_TYPE_INT64, {.i64 = 0 }, 0, 
INT64_MAX, D},
 {"cryptokey", "decryption key", OFFSET(key), AV_OPT_TYPE_BINARY, {.dbl = 0}, 
0, 0, D},
 {"indexmem", "max memory used for timestamp index (per stream)", 
OFFSET(max_index_size), AV_OPT_TYPE_INT, {.i64 = 1<<20 }, 0, INT_MAX, D},
diff --git a/libavformat/version.h b/libavformat/version.h
index feb1461c41..d2427dd875 100644
--- a/libavformat/version.h
+++ b/libavformat/version.h
@@ -82,6 +82,9 @@
 #ifndef FF_API_OLD_AVIO_EOF_0
 #define FF_API_OLD_AVIO_EOF_0   (LIBAVFORMAT_VERSION_MAJOR < 59)
 #endif
+#ifndef FF_API_LAVF_AUTO_BSF_FLAG
+#define FF_API_LAVF_AUTO_BSF_FLAG   (LIBAVFORMAT_VERSION_MAJOR < 59)
+#endif
 
 
 #ifndef FF_API_R_FRAME_RATE
-- 
2.15.0
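
The deprecation pattern used in the patch above (a version-gated `FF_API_*` macro that keeps the old flag defined until a major bump, while the behaviour it used to control becomes unconditional) can be shown in a standalone form. Every identifier below is hypothetical, and the version number 58 is illustrative only.

```c
/* Illustrative major version; the guard below turns false at 59. */
#define SKETCH_VERSION_MAJOR 58

#ifndef SKETCH_API_AUTO_FLAG
#define SKETCH_API_AUTO_FLAG (SKETCH_VERSION_MAJOR < 59)
#endif

#define SKETCH_FLAG_SHORTEST 0x10
#if SKETCH_API_AUTO_FLAG
#define SKETCH_FLAG_AUTO     0x20  /* deprecated, does nothing */
#endif

/* Reports whether the deprecated flag is still part of the API. */
static int sketch_auto_flag_available(void)
{
    return SKETCH_API_AUTO_FLAG;
}

/* The behaviour formerly gated by the flag is now always on, so the
 * flags argument no longer influences the result. */
static int sketch_filtering_enabled(int flags)
{
    (void)flags;
    return 1;
}
```

Existing callers that still set the flag keep compiling until the next major bump, and they get the same behaviour as callers that never set it.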



[FFmpeg-devel] [PATCH 2/3] avformat/mux: stop delaying writing the header

2017-11-26 Thread James Almer
Every bitstream filter behaves as intended now, so there's no need to
wait for the first packet of every stream.

Signed-off-by: James Almer 
---
 libavformat/avformat.h |  2 +-
 libavformat/internal.h |  6 -
 libavformat/mux.c  | 52 --
 libavformat/options_table.h|  2 +-
 libavformat/tests/fifo_muxer.c | 52 --
 tests/ref/fate/fifo-muxer-tst  |  1 -
 6 files changed, 12 insertions(+), 103 deletions(-)

diff --git a/libavformat/avformat.h b/libavformat/avformat.h
index 322210fae0..4f2798a871 100644
--- a/libavformat/avformat.h
+++ b/libavformat/avformat.h
@@ -1449,7 +1449,7 @@ typedef struct AVFormatContext {
 #endif
 #define AVFMT_FLAG_FAST_SEEK   0x8 ///< Enable fast, but inaccurate seeks 
for some formats
 #define AVFMT_FLAG_SHORTEST   0x10 ///< Stop muxing when the shortest 
stream stops.
-#define AVFMT_FLAG_AUTO_BSF   0x20 ///< Wait for packet data before 
writing a header, and add bitstream filters as requested by the muxer
+#define AVFMT_FLAG_AUTO_BSF   0x20 ///< Add bitstream filters as requested 
by the muxer
 
 /**
  * Maximum size of the data read from input for determining
diff --git a/libavformat/internal.h b/libavformat/internal.h
index fcd47840a5..36a57214ce 100644
--- a/libavformat/internal.h
+++ b/libavformat/internal.h
@@ -120,12 +120,6 @@ struct AVFormatInternal {
 
 int avoid_negative_ts_use_pts;
 
-/**
- * Whether or not a header has already been written
- */
-int header_written;
-int write_header_ret;
-
 /**
  * Timestamp of the end of the shortest stream.
  */
diff --git a/libavformat/mux.c b/libavformat/mux.c
index b1244c67f3..ebb9102f11 100644
--- a/libavformat/mux.c
+++ b/libavformat/mux.c
@@ -458,25 +458,6 @@ static void flush_if_needed(AVFormatContext *s)
 }
 }
 
-static int write_header_internal(AVFormatContext *s)
-{
-if (!(s->oformat->flags & AVFMT_NOFILE) && s->pb)
-avio_write_marker(s->pb, AV_NOPTS_VALUE, AVIO_DATA_MARKER_HEADER);
-if (s->oformat->write_header) {
-int ret = s->oformat->write_header(s);
-if (ret >= 0 && s->pb && s->pb->error < 0)
-ret = s->pb->error;
-s->internal->write_header_ret = ret;
-if (ret < 0)
-return ret;
-flush_if_needed(s);
-}
-s->internal->header_written = 1;
-if (!(s->oformat->flags & AVFMT_NOFILE) && s->pb)
-avio_write_marker(s->pb, AV_NOPTS_VALUE, AVIO_DATA_MARKER_UNKNOWN);
-return 0;
-}
-
 int avformat_init_output(AVFormatContext *s, AVDictionary **options)
 {
 int ret = 0;
@@ -515,11 +496,18 @@ int avformat_write_header(AVFormatContext *s, 
AVDictionary **options)
 if ((ret = avformat_init_output(s, options)) < 0)
 return ret;
 
-if (!(s->oformat->check_bitstream && s->flags & AVFMT_FLAG_AUTO_BSF)) {
-ret = write_header_internal(s);
+if (!(s->oformat->flags & AVFMT_NOFILE) && s->pb)
+avio_write_marker(s->pb, AV_NOPTS_VALUE, AVIO_DATA_MARKER_HEADER);
+if (s->oformat->write_header) {
+int ret = s->oformat->write_header(s);
+if (ret >= 0 && s->pb && s->pb->error < 0)
+ret = s->pb->error;
 if (ret < 0)
 goto fail;
+flush_if_needed(s);
 }
+if (!(s->oformat->flags & AVFMT_NOFILE) && s->pb)
+avio_write_marker(s->pb, AV_NOPTS_VALUE, AVIO_DATA_MARKER_UNKNOWN);
 
 if (!s->internal->streams_initialized) {
 if ((ret = init_pts(s)) < 0)
@@ -739,12 +727,6 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt)
 }
 }
 
-if (!s->internal->header_written) {
-ret = s->internal->write_header_ret ? s->internal->write_header_ret : 
write_header_internal(s);
-if (ret < 0)
-goto fail;
-}
-
 if ((pkt->flags & AV_PKT_FLAG_UNCODED_FRAME)) {
 AVFrame *frame = (AVFrame *)pkt->data;
 av_assert0(pkt->size == UNCODED_FRAME_PACKET_SIZE);
@@ -760,8 +742,6 @@ static int write_packet(AVFormatContext *s, AVPacket *pkt)
 ret = s->pb->error;
 }
 
-fail:
-
 if (ret < 0) {
 pkt->pts = pts_backup;
 pkt->dts = dts_backup;
@@ -894,11 +874,6 @@ int av_write_frame(AVFormatContext *s, AVPacket *pkt)
 
 if (!pkt) {
 if (s->oformat->flags & AVFMT_ALLOW_FLUSH) {
-if (!s->internal->header_written) {
-ret = s->internal->write_header_ret ? 
s->internal->write_header_ret : write_header_internal(s);
-if (ret < 0)
-return ret;
-}
 ret = s->oformat->write_packet(s, NULL);
 flush_if_needed(s);
 if (ret >= 0 && s->pb && s->pb->error < 0)
@@ -1282,14 +1257,8 @@ int av_write_trailer(AVFormatContext *s)
 goto fail;
 }
 
-if (!s->internal->header_written) {
-ret = s->internal->write_header_ret ? s->internal->write_header_ret : 
write_

[FFmpeg-devel] avcodec/utvideodec : add x86 SIMD (SSSE3) for gradient prediction

2017-11-26 Thread Martin Vignali
Hello,

The attached patches add SIMD (SSSE3) for gradient prediction
and a checkasm test.

Checkasm result (width = 1024) (Kaby Lake, macOS 10.12)
add_gradient_pred_c: 1708.8
add_gradient_pred_ssse3: 533.0

Benchmark on a 3 min HD file in gradient (422)
without SIMD:
bench: utime=102.695s
bench: maxrss=102592512kB

with SIMD:
bench: utime=91.712s
bench: maxrss=102543360kB



I will add an AVX2 version later (it needs more cleanup before submitting, and will
conflict with another patch, the add_left_pred AVX2 version).

This new dsp function can probably also be used by the magicyuv decoder.


I'm not sure about the best way in asm to load a uint8_t into every byte of an
xmm register.

Comments welcome.

Martin
Jokyo Images


0001-avcodec-utvideodec-add-SIMD-SSSE3-for-gradient_pred.patch
Description: Binary data


0002-checkasm-llviddsp-add-test-for-add_gradient_pred.patch
Description: Binary data


Re: [FFmpeg-devel] [PATCH 3/3] avformat: deprecate AVFMT_FLAG_AUTO_BSF

2017-11-26 Thread Clément Bœsch
On Sun, Nov 26, 2017 at 05:51:04PM -0300, James Almer wrote:
> The bitstream filters inserted by this option should not be optional.

Will ffmpeg error out if it's built without the required bsf? (or is there
a hard dep in the configure?)

-- 
Clément B.


signature.asc
Description: PGP signature


Re: [FFmpeg-devel] [PATCH] Refactor Developer Docs, update dev list section (v2)

2017-11-26 Thread Jim DeLaHunt

On 2017-11-26 04:38, Paul B Mahol wrote:
> On 11/26/17, Nicolas George  wrote:
>> Paul B Mahol (2017-11-26):
>>> Your opinions are irrelevant.
>> # Be friendly and respectful towards others and third parties.
>> # Treat others the way you yourself want to be treated.
>>
>> Please stop trampling the code of conduct.
> Please stop being extremely rude and ignorant of other people's work.

Paul, I am new on this list, but it seems to me that the only one being 
rude in this thread is you. I certainly hope this behaviour is not how 
you interpret "Be friendly and respectful towards others and third parties".


--
--Jim DeLaHunt, j...@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
  multilingual websites consultant

  355-1027 Davie St, Vancouver BC V6E 4L2, Canada
 Canada mobile +1-604-376-8953



Re: [FFmpeg-devel] [PATCH] Refactor Developer Docs, update dev list section (v2)

2017-11-26 Thread Paul B Mahol
On 11/26/17, Jim DeLaHunt  wrote:
> On 2017-11-26 04:38, Paul B Mahol wrote:
>> On 11/26/17, Nicolas George  wrote:
>>> Paul B Mahol (2017-11-26):
>>>> Your opinions are irrelevant.
>>> # Be friendly and respectful towards others and third parties.
>>> # Treat others the way you yourself want to be treated.
>>>
>>> Please stop trampling the code of conduct.
>> Please stop being extremely rude and ignorant of other people's work.
> Paul, I am new on this list, but it seems to me that the only one being
> rude in this thread is you. I certainly hope this behaviour is not how
> you interpret "Be friendly and respectful towards others and third parties".

I'm not the one who started it, and since you are new to the list you missed
lots of flame wars...


[FFmpeg-devel] Policy on ffmpeg-devel list and contributions [was: Re: [PATCH] Refactor Developer Docs, update dev list section (v2)]

2017-11-26 Thread Jim DeLaHunt

On 2017-11-26 03:42, Carl Eugen Hoyos wrote:

> 2017-11-26 9:31 GMT+01:00 Jim DeLaHunt :
>
>> -@subsection Documentation/Other
>> +@section Documentation/Other
>> +@subheading Subscribe to the ffmpeg-devel mailing list.
>> +It is important to be subscribed to the
>
> Of course it is important but I would much, much prefer
> if people send their patches without being subscribed
> than not sending their patches because it is implied
> that they cannot send patches if they don't want to
> subscribe
> But if people are not interested in improving their contribution,
> I would still prefer the patches to be sent.


So, how realistic is this concern about non-subscribers sending patches 
to ffmpeg-devel?  Does it actually happen? Can you point to, say, three 
patches in the last six months which were sent by non-subscribers to 
ffmpeg-devel and were applied to the code base?


Given how so many of the patches submitted by subscribers who know the 
unwritten rules are subjected to veto and revision, I would be surprised 
if many non-subscribers who are ignorant of the unwritten rules would 
produce something satisfactory.


That said, would your concern be addressed if I were to add this sentence:

   However, it is more important to the project that we receive your
   patch than that you be subscribed to the ffmpeg-devel list. If you
   have a patch, and don't want to subscribe and discuss the patch,
   then please do send it to the list.

(I am tempted to add a phrase like, "If you want to send your patch to 
ffmpeg-devel without discussion, as if  abandoning your baby on the 
steps of the orphanage, please do; one of the kind caregivers on the 
list may pick it up and find it a good home."  But this is probably too 
snarky to be appropriate.)



+@uref{https://lists.ffmpeg.org/mailman/listinfo/ffmpeg-devel, ffmpeg-devel}
+mailing list, because any patch you contribute must be sent there

No:
I believe it is very important that trivial patches are not sent
to the development mailing list - its volume is already so big
that some patches are sadly (!) forgotten.

Tell me more about the procedure for trivial patches. I have not seen
this documented, and I don't know about it. Does this apply to 
occasional contributors, or only to trusted experienced ffmpeg project 
members with commit privileges to the repository?


The proposed text does not distinguish between occasional contributors 
and experienced project members. Maybe it should. I believe that the 
main audience of `doc/developer.html` is new and occasional 
contributors, because the experienced members will have internalised all 
the undocumented norms, and won't be referring to this page.


What revised wording do you propose for the above phrase "any patch you 
contribute must be sent there"?



+Also, this list is where bugs and possible improvements or

I believe this is misleading or even wrong.

Oh?  I took this wording from the existing
 regarding the 
ffmpeg-cvslog list:
"Bugs and possible improvements or general questions regarding commits 
are discussed there."

What is misleading or wrong about this wording? What is your objection?

What alternate wording would you propose for this sentence, which 
describes why contributors should pay attention to the content of 
ffmpeg-devel?

+general questions regarding commits are discussed. That may be helpful
+information as you write your contribution. Finally, by being a list
+subscriber your contribution will be posted immediately to the list,
+without the moderation hold which messages from non-subscribers experience.
+


[...]

I think what is important about this new section is that it describes 
the policy and importance of the ffmpeg-devel list. It's interesting 
that the project had not put this into words in the current 
documentation. I'm trying to do that.  Carl Eugen, you are quick to 
object to what you don't like about proposed wording. I think it's 
especially important that you suggest wording that does capture what you 
do support. You obviously care.


Best regards,
 —Jim DeLaHunt

--
--Jim DeLaHunt, j...@jdlh.com http://blog.jdlh.com/ (http://jdlh.com/)
  multilingual websites consultant

  355-1027 Davie St, Vancouver BC V6E 4L2, Canada
 Canada mobile +1-604-376-8953



Re: [FFmpeg-devel] [PATCH 3/3] avformat: deprecate AVFMT_FLAG_AUTO_BSF

2017-11-26 Thread James Almer
On 11/26/2017 6:16 PM, Clément Bœsch wrote:
> On Sun, Nov 26, 2017 at 05:51:04PM -0300, James Almer wrote:
>> The bitstream filters inserted by this option should not be optional.
> 
> Will ffmpeg error out if it's built without the required bsf? (or is there
> a hard dep in the configure?)

Mmh, no, as is it will silently keep going.

I'll resend the patch making the muxers select the required bsfs.
Although admittedly there is no way to guarantee they will be available
even with that, since they are in a different library and you could
always dynamically load a different lavc.

Maybe we could emit a warning at runtime if a bsf is required but not
compiled in?
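
A minimal sketch of the runtime warning suggested above, not FFmpeg code.
The registry compiled_in_bsfs and the helpers find_bsf() and
check_required_bsf() are hypothetical stand-ins; in libavcodec the lookup
would presumably go through av_bsf_get_by_name() and the message through
av_log().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical list of bitstream filters built into this binary. */
static const char *const compiled_in_bsfs[] = {
    "aac_adtstoasc", "h264_mp4toannexb"
};

/* Return 1 if the named filter is in the hypothetical registry, else 0. */
static int find_bsf(const char *name)
{
    size_t n = sizeof(compiled_in_bsfs) / sizeof(compiled_in_bsfs[0]);
    for (size_t i = 0; i < n; i++)
        if (!strcmp(compiled_in_bsfs[i], name))
            return 1;
    return 0;
}

/* Warn instead of silently continuing when a muxer's required filter is
 * missing. Returns 0 if the filter is available, -1 otherwise. */
static int check_required_bsf(const char *muxer, const char *bsf)
{
    if (find_bsf(bsf))
        return 0;
    fprintf(stderr, "warning: muxer '%s' needs bitstream filter '%s', "
                    "which is not compiled in\n", muxer, bsf);
    return -1;
}
```

Whether the caller should treat the -1 as fatal or just carry on after the
warning is the open question in this thread; the sketch leaves that choice
to the caller.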


[FFmpeg-devel] Accurately describing ffmpeg-cvslog list [was: Re: [PATCH] Refactor Developer Docs, update dev list section (v2)]

2017-11-26 Thread Jim DeLaHunt

On 2017-11-26 03:42, Carl Eugen Hoyos wrote:


2017-11-26 9:31 GMT+01:00 Jim DeLaHunt :
[...]

+
  @subheading Subscribe to the ffmpeg-cvslog mailing list.
-It is important to do this as the diffs of all commits are sent there and
-reviewed by all the other developers. Bugs and possible improvements or
-general questions regarding commits are discussed there. We expect you to
-react if problems with your code are uncovered.
+Diffs of all commits are sent to the
+@uref{https://lists.ffmpeg.org/mailman/listinfo/ffmpeg-cvslog, ffmpeg-cvslog}
+mailing list. Some developers read this list to review all code base changes
+from all sources. Subscribing to this list is not mandatory, if
+all you want to do is submit a patch here and there.

I am (still) against this change.


OK, what specifically are you against?  More important, what are you in 
favour of?


It's difficult for me to read your mind via email.  Would you please 
read the existing section, "Subscribe to the ffmpeg-cvslog mailing 
list."[1], and give wording which to you describes accurately the 
current reality?


I'll observe that we have already heard other opinions:

 * Paul[2]: "Not at all. To be a contributor, it is not needed to
   subscribe to [ffmpeg-cvslog] list."
 * Timo[3]: "Usually if a discussion comes up the mail from cvslog is
   replied to on [ffmpeg-devel] list, so no actual discussion happens
   on the automatic cvslog list."

I don't have strong feelings on the policy for the -cvslog list, except 
that the current documentation is clearly inaccurate in describing the 
current reality. That is obvious even on a short exposure to the 
community.  "Bugs and possible improvements or general questions 
regarding commits" are /not/ discussed "there", on ffmpeg-cvslog. The 
statement "We expect you to react if problems with your code are 
uncovered." is correct, but more accurately describes behaviour on 
ffmpeg-devel, not ffmpeg-cvslog.



Sorry, Carl Eugen


It's great that you care.  What wording do you support?

Best regards,
 —Jim DeLaHunt

[1] 
[2] 

[3] 






Re: [FFmpeg-devel] [PATCH 12/17] vaapi_decode: Ignore the profile when not useful

2017-11-26 Thread Mark Thompson
On 24/11/17 16:50, Philip Langdale wrote:
> On Fri, 24 Nov 2017 00:51:29 +
> Mark Thompson  wrote:
> 
>> Enables VP8 decoding - the decoder places the bitstream version
>> in the profile field, which we want to ignore.
>> ---
>>  libavcodec/vaapi_decode.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavcodec/vaapi_decode.c b/libavcodec/vaapi_decode.c
>> index d36ef906a2..572b3a40ac 100644
>> --- a/libavcodec/vaapi_decode.c
>> +++ b/libavcodec/vaapi_decode.c
>> @@ -324,7 +324,8 @@ static int vaapi_decode_make_config(AVCodecContext *avctx,
>>          int profile_match = 0;
>>          if (avctx->codec_id != vaapi_profile_map[i].codec_id)
>>              continue;
>> -        if (avctx->profile == vaapi_profile_map[i].codec_profile)
>> +        if (avctx->profile == vaapi_profile_map[i].codec_profile ||
>> +            vaapi_profile_map[i].codec_profile == FF_PROFILE_UNKNOWN)
>>              profile_match = 1;
>>          for (j = 0; j < profile_count; j++) {
>>              if (vaapi_profile_map[i].va_profile == profile_list[j]) {
> 
> First 12 parts look good.

First 12 applied; I have a bit more to do on MJPEG hwaccel for the rest.

Given how many small things got touched here it is quite likely that something 
has broken with this - I've tried to get some testing on all of the affected 
platforms, but do tell me if you find anything further and I'll try to fix it 
asap.

Thanks to everyone who commented on / reviewed this series :)

- Mark
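
The matching rule in the quoted hunk can be sketched in isolation as
follows. This is an illustration under assumptions, not the code from
vaapi_decode.c: ProfileMapEntry, PROFILE_UNKNOWN and profile_matches() are
hypothetical names, with PROFILE_UNKNOWN standing in for
FF_PROFILE_UNKNOWN.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for FF_PROFILE_UNKNOWN. */
#define PROFILE_UNKNOWN (-99)

typedef struct ProfileMapEntry {
    int codec_id;
    int codec_profile;   /* PROFILE_UNKNOWN means "matches any profile" */
    int va_profile;
} ProfileMapEntry;

/* Return 1 if some entry for codec_id matches the profile exactly or is
 * a wildcard entry, 0 otherwise. */
static int profile_matches(const ProfileMapEntry *map, size_t n,
                           int codec_id, int profile)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].codec_id != codec_id)
            continue;
        if (map[i].codec_profile == profile ||
            map[i].codec_profile == PROFILE_UNKNOWN)
            return 1;
    }
    return 0;
}
```

With a wildcard entry, a decoder that stores something other than a real
profile in the profile field (as described for VP8 above) still finds a
match.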


[FFmpeg-devel] [PATCH] avcodec: Implement vp8 nvdec hwaccel

2017-11-26 Thread Philip Langdale
Signed-off-by: Philip Langdale 
---
 Changelog  |  2 +-
 configure  |  2 ++
 libavcodec/Makefile|  1 +
 libavcodec/hwaccels.h  |  1 +
 libavcodec/nvdec.c |  1 +
 libavcodec/nvdec_vp8.c | 97 ++
 libavcodec/version.h   |  3 +-
 libavcodec/vp8.c   |  6 
 8 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100644 libavcodec/nvdec_vp8.c

diff --git a/Changelog b/Changelog
index e3092e211f..4db1d57721 100644
--- a/Changelog
+++ b/Changelog
@@ -13,7 +13,7 @@ version :
 - PCE support for extended channel layouts in the AAC encoder
 - native aptX encoder and decoder
 - Raw aptX muxer and demuxer
-- NVIDIA NVDEC-accelerated H.264, HEVC, MPEG-1/2/4, VC1 and VP9 hwaccel decoding
+- NVIDIA NVDEC-accelerated H.264, HEVC, MPEG-1/2/4, VC1, VP8 and VP9 hwaccel decoding
 - Intel QSV-accelerated overlay filter
 - mcompand audio filter
 - acontrast audio filter
diff --git a/configure b/configure
index bc00b71489..e5fa61e83d 100755
--- a/configure
+++ b/configure
@@ -2748,6 +2748,8 @@ vc1_vaapi_hwaccel_deps="vaapi"
 vc1_vaapi_hwaccel_select="vc1_decoder"
 vc1_vdpau_hwaccel_deps="vdpau"
 vc1_vdpau_hwaccel_select="vc1_decoder"
+vp8_nvdec_hwaccel_deps="nvdec"
+vp8_nvdec_hwaccel_select="vp8_decoder"
 vp8_vaapi_hwaccel_deps="vaapi VAPictureParameterBufferVP8"
 vp8_vaapi_hwaccel_select="vp8_decoder"
 vp9_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_VP9"
diff --git a/libavcodec/Makefile b/libavcodec/Makefile
index 640edfb590..ca7960cdf4 100644
--- a/libavcodec/Makefile
+++ b/libavcodec/Makefile
@@ -872,6 +872,7 @@ OBJS-$(CONFIG_VC1_NVDEC_HWACCEL)  += nvdec_vc1.o
 OBJS-$(CONFIG_VC1_QSV_HWACCEL)+= qsvdec_other.o
 OBJS-$(CONFIG_VC1_VAAPI_HWACCEL)  += vaapi_vc1.o
 OBJS-$(CONFIG_VC1_VDPAU_HWACCEL)  += vdpau_vc1.o
+OBJS-$(CONFIG_VP8_NVDEC_HWACCEL)  += nvdec_vp8.o
 OBJS-$(CONFIG_VP8_VAAPI_HWACCEL)  += vaapi_vp8.o
 OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL)+= dxva2_vp9.o
 OBJS-$(CONFIG_VP9_DXVA2_HWACCEL)  += dxva2_vp9.o
diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
index cefd2b15be..420e2feeea 100644
--- a/libavcodec/hwaccels.h
+++ b/libavcodec/hwaccels.h
@@ -60,6 +60,7 @@ extern const AVHWAccel ff_vc1_dxva2_hwaccel;
 extern const AVHWAccel ff_vc1_nvdec_hwaccel;
 extern const AVHWAccel ff_vc1_vaapi_hwaccel;
 extern const AVHWAccel ff_vc1_vdpau_hwaccel;
+extern const AVHWAccel ff_vp8_nvdec_hwaccel;
 extern const AVHWAccel ff_vp8_vaapi_hwaccel;
 extern const AVHWAccel ff_vp9_d3d11va_hwaccel;
 extern const AVHWAccel ff_vp9_d3d11va2_hwaccel;
diff --git a/libavcodec/nvdec.c b/libavcodec/nvdec.c
index da4451a739..c7a02ff40f 100644
--- a/libavcodec/nvdec.c
+++ b/libavcodec/nvdec.c
@@ -58,6 +58,7 @@ static int map_avcodec_id(enum AVCodecID id)
 case AV_CODEC_ID_MPEG2VIDEO: return cudaVideoCodec_MPEG2;
 case AV_CODEC_ID_MPEG4:  return cudaVideoCodec_MPEG4;
 case AV_CODEC_ID_VC1:return cudaVideoCodec_VC1;
+case AV_CODEC_ID_VP8:return cudaVideoCodec_VP8;
 case AV_CODEC_ID_VP9:return cudaVideoCodec_VP9;
 case AV_CODEC_ID_WMV3:   return cudaVideoCodec_VC1;
 }
diff --git a/libavcodec/nvdec_vp8.c b/libavcodec/nvdec_vp8.c
new file mode 100644
index 00..6fc0ac7ded
--- /dev/null
+++ b/libavcodec/nvdec_vp8.c
@@ -0,0 +1,97 @@
+/*
+ * VP8 HW decode acceleration through NVDEC
+ *
+ * Copyright (c) 2017 Philip Langdale
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#include "avcodec.h"
+#include "nvdec.h"
+#include "decode.h"
+#include "internal.h"
+#include "vp8.h"
+
+static unsigned char safe_get_ref_idx(VP8Frame *frame)
+{
+    return frame ? ff_nvdec_get_ref_idx(frame->tf.f) : 255;
+}
+
+static int nvdec_vp8_start_frame(AVCodecContext *avctx, const uint8_t *buffer, uint32_t size)
+{
+    VP8Context *h = avctx->priv_data;
+
+    NVDECContext  *ctx = avctx->internal->hwaccel_priv_data;
+    CUVIDPICPARAMS *pp = &ctx->pic_params;
+    FrameDecodeData *fdd;
+    NVDECFrame *cf;
+    AVFrame *cur_frame = h->framep[VP56_FRAME_CURRENT]->tf.f;
+
+    int ret;
+
+    ret = ff_nvdec_start_frame(avctx, cur_frame);
+    if (ret < 0)
+        return ret;
+
+    fdd = (FrameDecodeData*)cur_frame->private_

Re: [FFmpeg-devel] [PATCH v3 3/3] error_resilience: remove avpriv_atomic usage

2017-11-26 Thread Michael Niedermayer
On Sat, Nov 25, 2017 at 05:01:57PM +, Rostislav Pehlivanov wrote:
> Signed-off-by: Rostislav Pehlivanov 
> ---
>  libavcodec/error_resilience.c | 20 ++--
>  libavcodec/error_resilience.h |  3 ++-
>  2 files changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/libavcodec/error_resilience.c b/libavcodec/error_resilience.c
> index 0c7f29d171..8f172beca6 100644
> --- a/libavcodec/error_resilience.c
> +++ b/libavcodec/error_resilience.c
> @@ -807,7 +807,7 @@ void ff_er_frame_start(ERContext *s)
>  
>  memset(s->error_status_table, ER_MB_ERROR | VP_START | ER_MB_END,
> s->mb_stride * s->mb_height * sizeof(uint8_t));
> -s->error_count= 3 * s->mb_num;
> +atomic_init(&s->error_count, 3 * s->mb_num);
>  s->error_occurred = 0;
>  }
>  
> @@ -852,20 +852,20 @@ void ff_er_add_slice(ERContext *s, int startx, int starty,
>  mask &= ~VP_START;
>  if (status & (ER_AC_ERROR | ER_AC_END)) {
>  mask   &= ~(ER_AC_ERROR | ER_AC_END);
> -avpriv_atomic_int_add_and_fetch(&s->error_count, start_i - end_i - 1);
> +atomic_fetch_add(&s->error_count, start_i - end_i - 1);
>  }
>  if (status & (ER_DC_ERROR | ER_DC_END)) {
>  mask   &= ~(ER_DC_ERROR | ER_DC_END);
> -avpriv_atomic_int_add_and_fetch(&s->error_count, start_i - end_i - 1);
> +atomic_fetch_add(&s->error_count, start_i - end_i - 1);
>  }
>  if (status & (ER_MV_ERROR | ER_MV_END)) {
>  mask   &= ~(ER_MV_ERROR | ER_MV_END);
> -avpriv_atomic_int_add_and_fetch(&s->error_count, start_i - end_i - 1);
> +atomic_fetch_add(&s->error_count, start_i - end_i - 1);
>  }
>  
>  if (status & ER_MB_ERROR) {
>  s->error_occurred = 1;
> -avpriv_atomic_int_set(&s->error_count, INT_MAX);
> +atomic_store(&s->error_count, INT_MAX);
>  }
>  
>  if (mask == ~0x7F) {
> @@ -878,7 +878,7 @@ void ff_er_add_slice(ERContext *s, int startx, int starty,
>  }
>  
>  if (end_i == s->mb_num)
> -avpriv_atomic_int_set(&s->error_count, INT_MAX);
> +atomic_store(&s->error_count, INT_MAX);
>  else {
>  s->error_status_table[end_xy] &= mask;
>  s->error_status_table[end_xy] |= status;
> @@ -893,7 +893,7 @@ void ff_er_add_slice(ERContext *s, int startx, int starty,
>  prev_status &= ~ VP_START;
>  if (prev_status != (ER_MV_END | ER_DC_END | ER_AC_END)) {
>  s->error_occurred = 1;
> -avpriv_atomic_int_set(&s->error_count, INT_MAX);
> +atomic_store(&s->error_count, INT_MAX);
>  }
>  }
>  }
> @@ -910,10 +910,10 @@ void ff_er_frame_end(ERContext *s)
>  
>  /* We do not support ER of field pictures yet,
>   * though it should not crash if enabled. */
> -if (!s->avctx->error_concealment || s->error_count == 0||
> +if (!s->avctx->error_concealment || !atomic_load(&s->error_count)  ||
>  s->avctx->lowres   ||
>  !er_supported(s)   ||
> -s->error_count == 3 * s->mb_width *
> +atomic_load(&s->error_count) == 3 * s->mb_width *
>(s->avctx->skip_top + s->avctx->skip_bottom)) {
>  return;
>  }
> @@ -927,7 +927,7 @@ void ff_er_frame_end(ERContext *s)
>  if (   mb_x == s->mb_width
>  && s->avctx->codec_id == AV_CODEC_ID_MPEG2VIDEO
>  && (FFALIGN(s->avctx->height, 16)&16)
> -&& s->error_count == 3 * s->mb_width * (s->avctx->skip_top + s->avctx->skip_bottom + 1)
> +&& atomic_load(&s->error_count) == 3 * s->mb_width * (s->avctx->skip_top + s->avctx->skip_bottom + 1)
>  ) {
>  av_log(s->avctx, AV_LOG_DEBUG, "ignoring last missing slice\n");
>  return;

Looking at this again, I suspect these could use a laxer
memory ordering.
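
To illustrate the remark about memory ordering: a sketch, assuming C11
<stdatomic.h>, of a counter updated with memory_order_relaxed. The struct
ERCounter and the er_counter_*() helpers are hypothetical and not taken
from the patch. Relaxed operations remain atomic but impose no ordering on
surrounding memory accesses, so this only fits if readers synchronize by
some other means (for example a thread join) before trusting the value.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the error_count field of ERContext. */
typedef struct ERCounter {
    atomic_int error_count;
} ERCounter;

/* Initialise the counter before any other thread can see it. */
static void er_counter_start(ERCounter *c, int mb_num)
{
    atomic_init(&c->error_count, 3 * mb_num);
}

/* Like atomic_fetch_add(), but without sequential consistency. */
static void er_counter_add(ERCounter *c, int delta)
{
    atomic_fetch_add_explicit(&c->error_count, delta, memory_order_relaxed);
}

/* Mark the frame as damaged beyond counting. */
static void er_counter_poison(ERCounter *c)
{
    atomic_store_explicit(&c->error_count, INT_MAX, memory_order_relaxed);
}

/* Read the current value with relaxed ordering. */
static int er_counter_get(ERCounter *c)
{
    return atomic_load_explicit(&c->error_count, memory_order_relaxed);
}
```

The plain atomic_fetch_add(), atomic_store() and atomic_load() calls in the
quoted patch default to memory_order_seq_cst, which is the stricter
ordering this comment refers to.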

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Concerning the gods, I have no means of knowing whether they exist or not
or of what sort they may be, because of the obscurity of the subject, and
the brevity of human life -- Protagoras




[FFmpeg-devel] [PATCH] avcodec/error_resilience: Use atomic set on writing error_occurred per slice

2017-11-26 Thread Michael Niedermayer
This is more correct if multiple slices are handled in parallel

Signed-off-by: Michael Niedermayer 
---
 libavcodec/error_resilience.c | 6 +++---
 libavcodec/error_resilience.h | 2 +-
 libavcodec/h263dec.c  | 2 +-
 libavcodec/h264_slice.c   | 4 ++--
 libavcodec/mpegvideo.c| 3 ++-
 libavcodec/vc1dec.c   | 2 +-
 6 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/libavcodec/error_resilience.c b/libavcodec/error_resilience.c
index 8f172beca6..c9da30b84b 100644
--- a/libavcodec/error_resilience.c
+++ b/libavcodec/error_resilience.c
@@ -808,7 +808,7 @@ void ff_er_frame_start(ERContext *s)
 memset(s->error_status_table, ER_MB_ERROR | VP_START | ER_MB_END,
s->mb_stride * s->mb_height * sizeof(uint8_t));
 atomic_init(&s->error_count, 3 * s->mb_num);
-s->error_occurred = 0;
+atomic_init(&s->error_occurred, 0);
 }
 
 static int er_supported(ERContext *s)
@@ -864,7 +864,7 @@ void ff_er_add_slice(ERContext *s, int startx, int starty,
 }
 
 if (status & ER_MB_ERROR) {
-s->error_occurred = 1;
+atomic_store_explicit(&s->error_occurred, 1, memory_order_relaxed);
 atomic_store(&s->error_count, INT_MAX);
 }
 
@@ -892,7 +892,7 @@ void ff_er_add_slice(ERContext *s, int startx, int starty,
 
 prev_status &= ~ VP_START;
 if (prev_status != (ER_MV_END | ER_DC_END | ER_AC_END)) {
-s->error_occurred = 1;
+atomic_store_explicit(&s->error_occurred, 1, memory_order_relaxed);
 atomic_store(&s->error_count, INT_MAX);
 }
 }
diff --git a/libavcodec/error_resilience.h b/libavcodec/error_resilience.h
index 664a765659..5c000e13d1 100644
--- a/libavcodec/error_resilience.h
+++ b/libavcodec/error_resilience.h
@@ -62,7 +62,7 @@ typedef struct ERContext {
 ptrdiff_t b8_stride;
 
 atomic_int error_count;
-int error_occurred;
+atomic_int error_occurred;
 uint8_t *error_status_table;
 uint8_t *er_temp_buffer;
 int16_t *dc_val[3];
diff --git a/libavcodec/h263dec.c b/libavcodec/h263dec.c
index b222de793b..6fa8a657a4 100644
--- a/libavcodec/h263dec.c
+++ b/libavcodec/h263dec.c
@@ -637,7 +637,7 @@ retry:
 if (ff_h263_resync(s) < 0)
 break;
 if (prev_y * s->mb_width + prev_x < s->mb_y * s->mb_width + s->mb_x)
-s->er.error_occurred = 1;
+atomic_store_explicit(&s->er.error_occurred, 1, memory_order_relaxed);
 }
 
 if (s->msmpeg4_version < 4 && s->h263_pred)
diff --git a/libavcodec/h264_slice.c b/libavcodec/h264_slice.c
index da76b9293f..5b37596d81 100644
--- a/libavcodec/h264_slice.c
+++ b/libavcodec/h264_slice.c
@@ -2480,7 +2480,7 @@ static void decode_finish_row(const H264Context *h, H264SliceContext *sl)
 
 ff_h264_draw_horiz_band(h, sl, top, height);
 
-if (h->droppable || sl->h264->slice_ctx[0].er.error_occurred)
+if (h->droppable || atomic_load_explicit(&sl->h264->slice_ctx[0].er.error_occurred, memory_order_relaxed))
 return;
 
 ff_thread_report_progress(&h->cur_pic_ptr->tf, top + height - 1,
@@ -2532,7 +2532,7 @@ static int decode_slice(struct AVCodecContext *avctx, void *arg)
 int prev_status = h->slice_ctx[0].er.error_status_table[h->slice_ctx[0].er.mb_index2xy[start_i - 1]];
 prev_status &= ~ VP_START;
 if (prev_status != (ER_MV_END | ER_DC_END | ER_AC_END))
-h->slice_ctx[0].er.error_occurred = 1;
+atomic_store_explicit(&h->slice_ctx[0].er.error_occurred, 1, memory_order_relaxed);
 }
 }
 
diff --git a/libavcodec/mpegvideo.c b/libavcodec/mpegvideo.c
index 2eb19c21bb..2581589bb7 100644
--- a/libavcodec/mpegvideo.c
+++ b/libavcodec/mpegvideo.c
@@ -49,6 +49,7 @@
 #include "thread.h"
 #include "wmv2.h"
 #include 
+#include 
 
 static void dct_unquantize_mpeg1_intra_c(MpegEncContext *s,
int16_t *block, int n, int qscale)
@@ -2592,6 +2593,6 @@ void ff_set_qscale(MpegEncContext * s, int qscale)
 
 void ff_mpv_report_decode_progress(MpegEncContext *s)
 {
-if (s->pict_type != AV_PICTURE_TYPE_B && !s->partitioned_frame && !s->er.error_occurred)
+if (s->pict_type != AV_PICTURE_TYPE_B && !s->partitioned_frame && !atomic_load_explicit(&s->er.error_occurred, memory_order_relaxed))
 ff_thread_report_progress(&s->current_picture_ptr->tf, s->mb_y, 0);
 }
diff --git a/libavcodec/vc1dec.c b/libavcodec/vc1dec.c
index 96b8bb5364..648c7370fe 100644
--- a/libavcodec/vc1dec.c
+++ b/libavcodec/vc1dec.c
@@ -1058,7 +1058,7 @@ static int vc1_decode_frame(AVCodecContext *avctx, void *data,
 get_bits_count(&s->gb), s->gb.size_in_bits);
 //  if (get_bits_count(&s->gb) > buf_size * 8)
 //  return -1;
-if(s->er.error_occurred && s->pict_type == AV_PICTURE_TYPE_B) {
+if(atomic_load_explicit(&s->er.error_occurred, memory_order_relaxed) && s->pict_type == AV_PICTURE_TYPE_B) {
 ret = AVER

Re: [FFmpeg-devel] [PATCH] avcodec: Implement vp8 nvdec hwaccel

2017-11-26 Thread Mark Thompson
On 26/11/17 22:04, Philip Langdale wrote:
> Signed-off-by: Philip Langdale 
> ---
>  Changelog  |  2 +-
>  configure  |  2 ++
>  libavcodec/Makefile|  1 +
>  libavcodec/hwaccels.h  |  1 +
>  libavcodec/nvdec.c |  1 +
>  libavcodec/nvdec_vp8.c | 97 ++
>  libavcodec/version.h   |  3 +-
>  libavcodec/vp8.c   |  6 
>  8 files changed, 111 insertions(+), 2 deletions(-)
>  create mode 100644 libavcodec/nvdec_vp8.c
> 
> diff --git a/Changelog b/Changelog
> index e3092e211f..4db1d57721 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -13,7 +13,7 @@ version :
>  - PCE support for extended channel layouts in the AAC encoder
>  - native aptX encoder and decoder
>  - Raw aptX muxer and demuxer
> -- NVIDIA NVDEC-accelerated H.264, HEVC, MPEG-1/2/4, VC1 and VP9 hwaccel decoding
> +- NVIDIA NVDEC-accelerated H.264, HEVC, MPEG-1/2/4, VC1, VP8 and VP9 hwaccel decoding
>  - Intel QSV-accelerated overlay filter
>  - mcompand audio filter
>  - acontrast audio filter
> diff --git a/configure b/configure
> index bc00b71489..e5fa61e83d 100755
> --- a/configure
> +++ b/configure
> @@ -2748,6 +2748,8 @@ vc1_vaapi_hwaccel_deps="vaapi"
>  vc1_vaapi_hwaccel_select="vc1_decoder"
>  vc1_vdpau_hwaccel_deps="vdpau"
>  vc1_vdpau_hwaccel_select="vc1_decoder"
> +vp8_nvdec_hwaccel_deps="nvdec"
> +vp8_nvdec_hwaccel_select="vp8_decoder"
>  vp8_vaapi_hwaccel_deps="vaapi VAPictureParameterBufferVP8"
>  vp8_vaapi_hwaccel_select="vp8_decoder"
>  vp9_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_VP9"
> diff --git a/libavcodec/Makefile b/libavcodec/Makefile
> index 640edfb590..ca7960cdf4 100644
> --- a/libavcodec/Makefile
> +++ b/libavcodec/Makefile
> @@ -872,6 +872,7 @@ OBJS-$(CONFIG_VC1_NVDEC_HWACCEL)  += nvdec_vc1.o
>  OBJS-$(CONFIG_VC1_QSV_HWACCEL)+= qsvdec_other.o
>  OBJS-$(CONFIG_VC1_VAAPI_HWACCEL)  += vaapi_vc1.o
>  OBJS-$(CONFIG_VC1_VDPAU_HWACCEL)  += vdpau_vc1.o
> +OBJS-$(CONFIG_VP8_NVDEC_HWACCEL)  += nvdec_vp8.o
>  OBJS-$(CONFIG_VP8_VAAPI_HWACCEL)  += vaapi_vp8.o
>  OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL)+= dxva2_vp9.o
>  OBJS-$(CONFIG_VP9_DXVA2_HWACCEL)  += dxva2_vp9.o
> diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
> index cefd2b15be..420e2feeea 100644
> --- a/libavcodec/hwaccels.h
> +++ b/libavcodec/hwaccels.h
> @@ -60,6 +60,7 @@ extern const AVHWAccel ff_vc1_dxva2_hwaccel;
>  extern const AVHWAccel ff_vc1_nvdec_hwaccel;
>  extern const AVHWAccel ff_vc1_vaapi_hwaccel;
>  extern const AVHWAccel ff_vc1_vdpau_hwaccel;
> +extern const AVHWAccel ff_vp8_nvdec_hwaccel;
>  extern const AVHWAccel ff_vp8_vaapi_hwaccel;
>  extern const AVHWAccel ff_vp9_d3d11va_hwaccel;
>  extern const AVHWAccel ff_vp9_d3d11va2_hwaccel;
> diff --git a/libavcodec/nvdec.c b/libavcodec/nvdec.c
> index da4451a739..c7a02ff40f 100644
> --- a/libavcodec/nvdec.c
> +++ b/libavcodec/nvdec.c
> @@ -58,6 +58,7 @@ static int map_avcodec_id(enum AVCodecID id)
>  case AV_CODEC_ID_MPEG2VIDEO: return cudaVideoCodec_MPEG2;
>  case AV_CODEC_ID_MPEG4:  return cudaVideoCodec_MPEG4;
>  case AV_CODEC_ID_VC1:return cudaVideoCodec_VC1;
> +case AV_CODEC_ID_VP8:return cudaVideoCodec_VP8;
>  case AV_CODEC_ID_VP9:return cudaVideoCodec_VP9;
>  case AV_CODEC_ID_WMV3:   return cudaVideoCodec_VC1;
>  }
> diff --git a/libavcodec/nvdec_vp8.c b/libavcodec/nvdec_vp8.c
> new file mode 100644
> index 00..6fc0ac7ded
> --- /dev/null
> +++ b/libavcodec/nvdec_vp8.c
> @@ -0,0 +1,97 @@
> +/*
> + * VP8 HW decode acceleration through NVDEC
> + *
> + * Copyright (c) 2017 Philip Langdale
> + *
> + * This file is part of FFmpeg.
> + *
> + * FFmpeg is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * FFmpeg is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with FFmpeg; if not, write to the Free Software
> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> + */
> +
> +#include "avcodec.h"
> +#include "nvdec.h"
> +#include "decode.h"
> +#include "internal.h"
> +#include "vp8.h"
> +
> +static unsigned char safe_get_ref_idx(VP8Frame *frame)
> +{
> +return frame ? ff_nvdec_get_ref_idx(frame->tf.f) : 255;
> +}
> +
> +static int nvdec_vp8_start_frame(AVCodecContext *avctx, const uint8_t *buffer, uint32_t size)
> +{
> +VP8Context *h = avctx->priv_data;
> +
> +NVDECContext  *ctx = avctx->internal->hwaccel_priv_data;
> +CUVIDPICPARAMS

[FFmpeg-devel] [PATCH 5/8] lavc/x86/flac_dsp_gpl: cosmetic whitespace alignment

2017-11-26 Thread James Darnley
---
 libavcodec/x86/flac_dsp_gpl.asm | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index 4d212ed212..952fc8b86b 100644
--- a/libavcodec/x86/flac_dsp_gpl.asm
+++ b/libavcodec/x86/flac_dsp_gpl.asm
@@ -75,42 +75,42 @@ neg  orderq
 %if cpuflag(avx)
 vbroadcastss m2, [coefsq+posj*4]
 %else
-movd   m2, [coefsq+posj*4] ; c = coefs[j]
-SPLATD m2
+movd m2, [coefsq+posj*4] ; c = coefs[j]
+SPLATD   m2
 %endif
 %if cpuflag(avx)
-vpmulld m1, m2, [smpq+negj*4-4]
-vpmulld m5, m2, [smpq+negj*4-4+mmsize]
-vpmulld m7, m2, [smpq+negj*4-4+mmsize*2]
-vpaddd  m0, m1
-vpaddd  m4, m5
-vpaddd  m6, m7
+vpmulld  m1,  m2, [smpq+negj*4-4]
+vpmulld  m5,  m2, [smpq+negj*4-4+mmsize]
+vpmulld  m7,  m2, [smpq+negj*4-4+mmsize*2]
+vpaddd   m0,  m1
+vpaddd   m4,  m5
+vpaddd   m6,  m7
 %else
-movu   m1, [smpq+negj*4-4] ; s = smp[i-j-1]
-movu   m5, [smpq+negj*4-4+mmsize]
-movu   m7, [smpq+negj*4-4+mmsize*2]
-pmulld m1,  m2
-pmulld m5,  m2
-pmulld m7,  m2
-paddd  m0,  m1 ; p += c * s
-paddd  m4,  m5
-paddd  m6,  m7
+movu m1, [smpq+negj*4-4] ; s = smp[i-j-1]
+movu m5, [smpq+negj*4-4+mmsize]
+movu m7, [smpq+negj*4-4+mmsize*2]
+pmulld   m1,  m2
+pmulld   m5,  m2
+pmulld   m7,  m2
+paddd    m0,  m1 ; p += c * s
+paddd    m4,  m5
+paddd    m6,  m7
 %endif
 
 dec negj
 inc posj
 jnz .looporder
 
-psrad  m0, xm3  ; p >>= shift
+psrad  m0, xm3   ; p >>= shift
 psrad  m4, xm3
 psrad  m6, xm3
 movu   m1,[smpq]
 movu   m5,[smpq+mmsize]
 movu   m7,[smpq+mmsize*2]
-psubd  m1, m0  ; smp[i] - p
+psubd  m1, m0; smp[i] - p
 psubd  m5, m4
 psubd  m7, m6
-movu  [resq],  m1  ; res[i] = smp[i] - (p >> shift)
+movu  [resq],  m1; res[i] = smp[i] - (p >> shift)
 movu  [resq+mmsize], m5
 movu  [resq+mmsize*2], m7
 
-- 
2.15.0



[FFmpeg-devel] [PATCH 0/8] left-overs of an ancient patch set for the flac encoder

2017-11-26 Thread James Darnley
Three years ago I was writing some assembly to speed up the flac encoder.  I got
part of the set committed at that time.  Since then the encoder had a small
overhaul and a major bugfix.  That all meant this set needed a little work to
bring it back on top of master.  I did most of that work in August and finished
it today.

Some of you have been bugging me off and on about finishing it so here it is.
Enjoy, review, critique, whatever.  When people have signed off on it I will
push the set, after addressing issues people have with it.

That bugfix I mentioned was e609cfd697.  It made the benchmarking I originally
did a little less useful because both types of the lpc coder are used for both
sample depths (16 and 24).  That does make the 32-bit version more useful though
because it gets used with 16-bit samples when the intermediates overflow 32
bits.
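
For readers without the assembly at hand, here is a scalar sketch of the
computation the comments in flac_dsp_gpl.asm describe: p += c * s, then
p >>= shift, clip, and res[i] = smp[i] - p. It is an illustration under
assumptions, not the project's reference C code: the names
lpc_residual_ref() and clip_int32() are made up here, and copying the first
`order` samples unchanged is assumed. The accumulator is 64 bits wide,
which is the case needed when the intermediates overflow 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a 64-bit prediction to the int32_t range. */
static int32_t clip_int32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* Scalar sketch: res[i] = smp[i] - clip(p >> shift),
 * with p = sum over j of coefs[j] * smp[i-j-1]. */
static void lpc_residual_ref(int32_t *res, const int32_t *smp, int len,
                             int order, const int32_t *coefs, int shift)
{
    /* Assumption: the first `order` samples are warm-up, copied as-is. */
    for (int i = 0; i < order && i < len; i++)
        res[i] = smp[i];

    for (int i = order; i < len; i++) {
        int64_t p = 0;               /* 64-bit so c * s cannot overflow */
        for (int j = 0; j < order; j++)
            p += (int64_t)coefs[j] * smp[i - j - 1];
        /* >> on a negative value is assumed to be an arithmetic shift.
         * The subtraction is done unsigned so that it wraps instead of
         * overflowing a signed int. */
        res[i] = (int32_t)((uint32_t)smp[i] -
                           (uint32_t)clip_int32(p >> shift));
    }
}
```

With order 1, coefs = {2} and shift = 1 the predictor is simply the
previous sample, so the residual is the first difference of the input.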

James Darnley (8):
  avcodec/flac: document limitations of the LPC encoder
  avcodec/flac: add AVX2 version of the 16-bit LPC encoder
  avcodec/flac: add SSE4.2 version of the 32-bit lpc encoder
  avcodec/flac: partially unroll loop in flac_enc_lpc_32
  lavc/x86/flac_dsp_gpl: cosmetic whitespace alignment
  lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder
  lavc/flacenc: add AVX2 version of the 32-bit LPC encoder
  checkasm: add tests for flacenc lpc coder

 libavcodec/flacdsp.h|   8 ++
 libavcodec/flacenc.c|   2 +-
 libavcodec/x86/flac_dsp_gpl.asm | 267 +---
 libavcodec/x86/flacdsp_init.c   |  13 ++
 tests/checkasm/flacdsp.c|  72 +++
 5 files changed, 343 insertions(+), 19 deletions(-)

-- 
2.15.0



[FFmpeg-devel] [PATCH 4/8] avcodec/flac: partially unroll loop in flac_enc_lpc_32

2017-11-26 Thread James Darnley
Now does 6 samples per iteration, up from 2.

From 1.6 to 2.1 times faster again.  2.5 to 3.9 times faster overall.
Runtime is reduced by a further 4 to 17%.  Reduced by 9 to 65% overall.

Same conditions as previously.
---
 libavcodec/x86/flac_dsp_gpl.asm | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index 618306eb5f..4d212ed212 100644
--- a/libavcodec/x86/flac_dsp_gpl.asm
+++ b/libavcodec/x86/flac_dsp_gpl.asm
@@ -152,13 +152,13 @@ RET
 %macro FUNCTION_BODY_32 0
 
 %if ARCH_X86_64
-cglobal flac_enc_lpc_32, 5, 7, 4, mmsize, res, smp, len, order, coefs
+cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
 DECLARE_REG_TMP 5, 6
 %define length r2d
 
 movsxd orderq, orderd
 %else
-cglobal flac_enc_lpc_32, 5, 6, 4, mmsize, res, smp, len, order, coefs
+cglobal flac_enc_lpc_32, 5, 6, 8, mmsize, res, smp, len, order, coefs
 DECLARE_REG_TMP 2, 5
 %define length r2mp
 %endif
@@ -190,6 +190,8 @@ mova  [rsp],m4; save sign extend mask
 
 .looplen:
 pxor m0,   m0
+pxor m4,   m4
+pxor m6,   m6
 mov  posj, orderq
 xor  negj, negj
 
@@ -197,23 +199,41 @@ mova  [rsp],m4; save sign extend mask
 movd   m2,  [coefsq+posj*4] ; c = coefs[j]
 SPLATD m2
 pmovzxdq m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
+pmovzxdq m5,  [smpq+negj*4-4+mmsize/2]
+pmovzxdq m7,  [smpq+negj*4-4+mmsize]
 pmuldq m1,   m2
+pmuldq m5,   m2
+pmuldq m7,   m2
 paddq  m0,   m1 ; p += c * s
+paddq  m4,   m5
+paddq  m6,   m7
 
 dec negj
 inc posj
 jnz .looporder
 
 HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
+HACK_PSRAQ m4, m3, [rsp], m2
+HACK_PSRAQ m6, m3, [rsp], m2
 CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
+CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
+CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
 pshufd  m0,m0, q0020 ; pack into first 2 dwords
+pshufd  m4,m4, q0020
+pshufd  m6,m6, q0020
 movh    m1,   [smpq]
+movh    m5,   [smpq+mmsize/2]
+movh    m7,   [smpq+mmsize]
 psubd   m1,m0   ; smp[i] - p
+psubd   m5,m4
+psubd   m7,m6
 movh   [resq], m1   ; res[i] = smp[i] - (p >> shift)
+movh   [resq+mmsize/2], m5
+movh   [resq+mmsize], m7
 
-add resq,   mmsize/2
-add smpq,   mmsize/2
-sub length, mmsize/8
+add resq,   (3*mmsize)/2
+add smpq,   (3*mmsize)/2
+sub length, (3*mmsize)/8
 jg .looplen
 RET
 
-- 
2.15.0
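
The unrolling in this patch can be illustrated in scalar C. This is only a sketch under assumptions, not the code from the patch: the function names, the helper `sub_wrap`, and the use of three scalar accumulators (standing in for the three vector accumulators m0, m4 and m6) are all hypothetical. It follows the arithmetic given in the assembly comments: p += c * s, p >>= shift, clip, then res[i] = smp[i] - p.

```c
#include <assert.h>
#include <stdint.h>

/* Clip a 64-bit prediction to the int32_t range (the CLIPQ step). */
static int32_t clip_int32(int64_t v)
{
    if (v < INT32_MIN) return INT32_MIN;
    if (v > INT32_MAX) return INT32_MAX;
    return (int32_t)v;
}

/* Wrapping 32-bit subtraction, mirroring what psubd does. */
static int32_t sub_wrap(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b);
}

/* Reference version: one output sample per outer iteration. */
void lpc32_encode_ref(int32_t *res, const int32_t *smp, int len, int order,
                      const int32_t *coefs, int shift)
{
    assert(order >= 1 && order <= 32 && len >= order);
    for (int i = 0; i < order; i++)
        res[i] = smp[i];                      /* warm-up samples are copied */
    for (int i = order; i < len; i++) {
        int64_t p = 0;
        for (int j = 0; j < order; j++)
            p += (int64_t)coefs[j] * smp[i - j - 1];   /* p += c * s */
        /* Assumption: >> on a negative value is an arithmetic shift. */
        res[i] = sub_wrap(smp[i], clip_int32(p >> shift));
    }
}

/* Unrolled version: three independent accumulators share each
 * coefficient load, so the inner loop runs once per three outputs. */
void lpc32_encode_unrolled3(int32_t *res, const int32_t *smp, int len,
                            int order, const int32_t *coefs, int shift)
{
    int i;
    assert(order >= 1 && order <= 32 && len >= order);
    for (i = 0; i < order; i++)
        res[i] = smp[i];
    for (; i + 3 <= len; i += 3) {
        int64_t p0 = 0, p1 = 0, p2 = 0;
        for (int j = 0; j < order; j++) {
            int64_t c = coefs[j];             /* loaded once, used three times */
            p0 += c * smp[i - j - 1];
            p1 += c * smp[i - j];
            p2 += c * smp[i - j + 1];
        }
        res[i]     = sub_wrap(smp[i],     clip_int32(p0 >> shift));
        res[i + 1] = sub_wrap(smp[i + 1], clip_int32(p1 >> shift));
        res[i + 2] = sub_wrap(smp[i + 2], clip_int32(p2 >> shift));
    }
    for (; i < len; i++) {                    /* scalar tail */
        int64_t p = 0;
        for (int j = 0; j < order; j++)
            p += (int64_t)coefs[j] * smp[i - j - 1];
        res[i] = sub_wrap(smp[i], clip_int32(p >> shift));
    }
}
```

The sketch handles the tail with a scalar loop. The assembly in the patch instead keeps looping until the remaining length is no longer positive, so it can write past the requested length and relies on padding in the output buffer.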



[FFmpeg-devel] [PATCH 2/8] avcodec/flac: add AVX2 version of the 16-bit LPC encoder

2017-11-26 Thread James Darnley
When compared to the SSE4 version, runtime is reduced by 0.5 to 20%.
After a bug fix long, long ago in e609cfd697, the 16-bit LPC encoder is
used so little that the runtime reduction figure is no longer correct.
The function itself is around 2 times faster.  (As one might expect for
doing twice as many samples every iteration.)
---
 libavcodec/flacenc.c|  2 +-
 libavcodec/x86/flac_dsp_gpl.asm | 32 +++-
 libavcodec/x86/flacdsp_init.c   |  5 +
 3 files changed, 33 insertions(+), 6 deletions(-)

diff --git a/libavcodec/flacenc.c b/libavcodec/flacenc.c
index 170c3caf48..cf25982c91 100644
--- a/libavcodec/flacenc.c
+++ b/libavcodec/flacenc.c
@@ -88,7 +88,7 @@ typedef struct FlacSubframe {
 uint64_t rc_sums[32][MAX_PARTITIONS];
 
 int32_t samples[FLAC_MAX_BLOCKSIZE];
-int32_t residual[FLAC_MAX_BLOCKSIZE+11];
+int32_t residual[FLAC_MAX_BLOCKSIZE+23];
 } FlacSubframe;
 
 typedef struct FlacFrame {
diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index e285158185..c461c666be 100644
--- a/libavcodec/x86/flac_dsp_gpl.asm
+++ b/libavcodec/x86/flac_dsp_gpl.asm
@@ -24,7 +24,8 @@
 
 SECTION .text
 
-INIT_XMM sse4
+%macro FUNCTION_BODY_16 0
+
 %if ARCH_X86_64
 cglobal flac_enc_lpc_16, 5, 7, 8, 0, res, smp, len, order, coefs
 DECLARE_REG_TMP 5, 6
@@ -51,7 +52,7 @@ lea  resq,   [resq+orderq*4]
 lea  smpq,   [smpq+orderq*4]
 lea  coefsq, [coefsq+orderq*4]
 sub  length,  orderd
-movd m3,  r5m
+movd xm3, r5m
 neg  orderq
 
 %define posj t0q
@@ -65,8 +66,20 @@ neg  orderq
 xor  negj, negj
 
 .looporder:
+%if cpuflag(avx)
+vbroadcastss m2, [coefsq+posj*4]
+%else
 movd   m2, [coefsq+posj*4] ; c = coefs[j]
 SPLATD m2
+%endif
+%if cpuflag(avx)
+vpmulld m1, m2, [smpq+negj*4-4]
+vpmulld m5, m2, [smpq+negj*4-4+mmsize]
+vpmulld m7, m2, [smpq+negj*4-4+mmsize*2]
+vpaddd  m0, m1
+vpaddd  m4, m5
+vpaddd  m6, m7
+%else
 movu   m1, [smpq+negj*4-4] ; s = smp[i-j-1]
 movu   m5, [smpq+negj*4-4+mmsize]
 movu   m7, [smpq+negj*4-4+mmsize*2]
@@ -76,14 +89,15 @@ neg  orderq
 paddd  m0,  m1 ; p += c * s
 paddd  m4,  m5
 paddd  m6,  m7
+%endif
 
 dec  negj
 inc  posj
 jnz .looporder
 
-psrad  m0, m3  ; p >>= shift
-psrad  m4, m3
-psrad  m6, m3
+psrad  m0, xm3  ; p >>= shift
+psrad  m4, xm3
+psrad  m6, xm3
 movu   m1,[smpq]
 movu   m5,[smpq+mmsize]
 movu   m7,[smpq+mmsize*2]
@@ -99,3 +113,11 @@ neg  orderq
 sub length, (3*mmsize)/4
 jg .looplen
 RET
+
+%endmacro
+
+INIT_XMM sse4
+FUNCTION_BODY_16
+
+INIT_YMM avx2
+FUNCTION_BODY_16
diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
index 1971f81b8d..0a5c01859f 100644
--- a/libavcodec/x86/flacdsp_init.c
+++ b/libavcodec/x86/flacdsp_init.c
@@ -28,6 +28,7 @@ void ff_flac_lpc_32_xop(int32_t *samples, const int coeffs[32], int order,
 int qlevel, int len);
 
 void ff_flac_enc_lpc_16_sse4(int32_t *, const int32_t *, int, int, const int32_t *,int);
+void ff_flac_enc_lpc_16_avx2(int32_t *, const int32_t *, int, int, const int32_t *,int);
 
 #define DECORRELATE_FUNCS(fmt, opt)                                                   \
 void ff_flac_decorrelate_ls_##fmt##_##opt(uint8_t **out, int32_t **in, int channels, \
@@ -110,6 +111,10 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
 if (CONFIG_GPL)
 c->lpc16_encode = ff_flac_enc_lpc_16_sse4;
 }
+if (EXTERNAL_AVX2(cpu_flags)) {
+if (CONFIG_GPL)
+c->lpc16_encode = ff_flac_enc_lpc_16_avx2;
+}
 #endif
 #endif /* HAVE_X86ASM */
 }
-- 
2.15.0
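
The change of the residual padding from +11 to +23 is consistent with the loop writing whole groups of three vector registers. The following arithmetic sketch rests on an assumption the patch does not spell out: that the loop only stops once the remaining sample count is no longer positive, so the final iteration may store up to one group minus one sample past the requested length. The function names are hypothetical.

```c
#include <assert.h>

/* Samples handled per loop iteration: `vectors` registers of `mmsize`
 * bytes each, with 4 bytes per int32 sample. */
int samples_per_iteration(int mmsize, int vectors)
{
    assert(mmsize > 0 && mmsize % 4 == 0 && vectors > 0);
    return vectors * mmsize / 4;
}

/* Worst case: one sample is left when the last iteration starts, and a
 * full group is stored anyway. */
int worst_case_overshoot(int mmsize, int vectors)
{
    return samples_per_iteration(mmsize, vectors) - 1;
}
```

With mmsize = 16 (XMM) and three registers this gives 11, and with mmsize = 32 (YMM) it gives 23, matching the old and new array sizes.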



[FFmpeg-devel] [PATCH 8/8] checkasm: add tests for flacenc lpc coder

2017-11-26 Thread James Darnley
---
 tests/checkasm/flacdsp.c | 72 
 1 file changed, 72 insertions(+)

diff --git a/tests/checkasm/flacdsp.c b/tests/checkasm/flacdsp.c
index dccb54d672..08e5e264ea 100644
--- a/tests/checkasm/flacdsp.c
+++ b/tests/checkasm/flacdsp.c
@@ -20,13 +20,16 @@
 
 #include 
 #include "checkasm.h"
+#include "libavcodec/flac.h"
 #include "libavcodec/flacdsp.h"
 #include "libavutil/common.h"
 #include "libavutil/internal.h"
 #include "libavutil/intreadwrite.h"
+#include "libavcodec/mathops.h"
 
 #define BUF_SIZE 256
 #define MAX_CHANNELS 8
+#define BLOCKSIZE 4608
 
 #define randomize_buffers() \
 do {\
@@ -53,6 +56,23 @@ static void check_decorrelate(uint8_t **ref_dst, uint8_t **ref_src, uint8_t **ne
 bench_new(new_dst, (int32_t **)new_src, channels, BUF_SIZE / sizeof(int32_t), 8);
 }
 
+static void randomize_coefs(int32_t coef[32], int bits)
+{
+    int i;
+    for (i = 0; i < 32; i++)
+        coef[i] = sign_extend(rnd(), bits);
+}
+
+static void randomize_audio(int32_t *a, int32_t *b, int bits)
+{
+    int i;
+    for (i = 0; i < BLOCKSIZE; i++) {
+        int32_t value = sign_extend(rnd(), bits);
+        a[i] = value;
+        b[i] = value;
+    }
+}
+
 void checkasm_check_flacdsp(void)
 {
 LOCAL_ALIGNED_16(uint8_t, ref_dst, [BUF_SIZE*MAX_CHANNELS]);
@@ -87,4 +107,56 @@ void checkasm_check_flacdsp(void)
 }
 
 report("decorrelate");
+
+    if (check_func(h.lpc16_encode, "flacdsp.lpc16_encode")) {
+        int32_t samples_ref[BLOCKSIZE];
+        int32_t samples_new[BLOCKSIZE];
+        int32_t residual_ref[BLOCKSIZE+23];
+        int32_t residual_new[BLOCKSIZE+23];
+        int32_t coefs[32];
+        declare_func(void, int32_t *res, const int32_t *smp, int len, int order,
+                     const int32_t coefs[32], int shift);
+        int order;
+
+        randomize_audio(samples_ref, samples_new, 16);
+        randomize_coefs(coefs, 16);
+        for (order = 1; order < 32; order++) {
+            int shift = rnd() & 15;
+            call_ref(residual_ref, samples_ref, BLOCKSIZE, order, coefs, shift);
+            call_new(residual_new, samples_new, BLOCKSIZE, order, coefs, shift);
+            if (memcmp(samples_ref, samples_new, sizeof samples_ref)
+                || memcmp(residual_ref, residual_new, BLOCKSIZE * sizeof(int32_t))) {
+                fprintf(stderr, "failed at order= %d\n", order);
+                fail();
+            }
+            bench_new(residual_new, samples_new, BLOCKSIZE, order, coefs, shift);
+        }
+    }
+    report("flacdsp.lpc16_encode");
+
+    if (check_func(h.lpc32_encode, "flacdsp.lpc32_encode")) {
+        int32_t samples_ref[BLOCKSIZE];
+        int32_t samples_new[BLOCKSIZE];
+        int32_t residual_ref[BLOCKSIZE+23];
+        int32_t residual_new[BLOCKSIZE+23];
+        int32_t coefs[32];
+        declare_func(void, int32_t *res, const int32_t *smp, int len, int order,
+                     const int32_t coefs[32], int shift);
+        int order;
+
+        randomize_audio(samples_ref, samples_new, 24);
+        randomize_coefs(coefs, 24);
+        for (order = 1; order < 32; order++) {
+            int shift = rnd() & 15;
+            call_ref(residual_ref, samples_ref, BLOCKSIZE, order, coefs, shift);
+            call_new(residual_new, samples_new, BLOCKSIZE, order, coefs, shift);
+            if (memcmp(samples_ref, samples_new, sizeof samples_ref)
+                || memcmp(residual_ref, residual_new, BLOCKSIZE * sizeof(int32_t))) {
+                fprintf(stderr, "failed at order= %d\n", order);
+                fail();
+            }
+            bench_new(residual_new, samples_new, BLOCKSIZE, order, coefs, shift);
+        }
+    }
+    report("flacdsp.lpc32_encode");
 }
-- 
2.15.0
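
The test helpers above rely on sign-extending a random word so that the generated samples and coefficients fit in 16 or 24 bits. A minimal sketch of that operation follows; `sign_extend_sketch` is a hypothetical stand-in written for illustration and is not the libavcodec implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low `bits` bits of val and treat bit (bits - 1) as the sign
 * bit, yielding a value in [-2^(bits-1), 2^(bits-1)). */
int32_t sign_extend_sketch(uint32_t val, int bits)
{
    uint32_t mask, sign, low;
    assert(bits >= 1 && bits <= 32);
    if (bits == 32)
        return (int32_t)val;
    mask = (1u << bits) - 1;      /* selects the low `bits` bits */
    sign = 1u << (bits - 1);      /* position of the sign bit */
    low  = val & mask;
    /* Flipping the sign bit and subtracting its weight maps
     * [0, 2^bits) onto the signed range. */
    return (int32_t)(low ^ sign) - (int32_t)sign;
}
```

For example, a random word whose low 16 bits are 0xFFFF becomes -1, so every generated value is a valid 16-bit sample whatever the upper bits were.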



[FFmpeg-devel] [PATCH 3/8] avcodec/flac: add SSE4.2 version of the 32-bit lpc encoder

2017-11-26 Thread James Darnley
From 1.3 to 2.5 times faster.  Runtime reduced by 4 to 58%.  As with the
16-bit version the speed-up generally increases with compression_level.

Also like the 16-bit version, it is not used with levels less than 3.

After the bug fix long, long ago in e609cfd697, this 32-bit LPC
encoder is heavily used with 16-bit samples.
---
 libavcodec/x86/flac_dsp_gpl.asm | 106 
 libavcodec/x86/flacdsp_init.c   |   5 ++
 2 files changed, 111 insertions(+)

diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index c461c666be..618306eb5f 100644
--- a/libavcodec/x86/flac_dsp_gpl.asm
+++ b/libavcodec/x86/flac_dsp_gpl.asm
@@ -22,6 +22,12 @@
 
 %include "libavutil/x86/x86util.asm"
 
+SECTION_RODATA
+
+pd_0_int_min: times  2 dd 0, -2147483648
+pq_int_min:   times  2 dq -2147483648
+pq_int_max:   times  2 dq  2147483647
+
 SECTION .text
 
 %macro FUNCTION_BODY_16 0
@@ -116,8 +122,108 @@ RET
 
 %endmacro
 
+%macro PMINSQ 3
+pcmpgtq %3, %2, %1
+pand    %1, %3
+pandn   %3, %2
+por     %1, %3
+%endmacro
+
+%macro PMAXSQ 3
+pcmpgtq %3, %1, %2
+pand    %1, %3
+pandn   %3, %2
+por     %1, %3
+%endmacro
+
+%macro CLIPQ 4 ; reg, min, max, tmp
+PMAXSQ %1, %2, %4
+PMINSQ %1, %3, %4
+%endmacro
+
+%macro HACK_PSRAQ 4 ; dst, src (shift), sign extend mask, tmp
+pxor    %4, %4 ; zero
+pcmpgtq %4, %1 ; mask where 0 > dst
+pand    %4, %3 ; mask & sign extend mask
+psrlq   %1, %2 ; dst >>= shift
+por     %1, %4 ; dst | mask
+%endmacro
+
+%macro FUNCTION_BODY_32 0
+
+%if ARCH_X86_64
+cglobal flac_enc_lpc_32, 5, 7, 4, mmsize, res, smp, len, order, coefs
+DECLARE_REG_TMP 5, 6
+%define length r2d
+
+movsxd orderq, orderd
+%else
+cglobal flac_enc_lpc_32, 5, 6, 4, mmsize, res, smp, len, order, coefs
+DECLARE_REG_TMP 2, 5
+%define length r2mp
+%endif
+
+; Here we assume that the maximum order value is 32.  This means that we only
+; need to copy a maximum of 32 samples.  Therefore we let the preprocessor
+; unroll this loop and copy all 32.
+%assign iter 0
+%rep 32/(mmsize/4)
+movu  m0, [smpq+iter]
+movu [resq+iter],  m0
+%assign iter iter+mmsize
+%endrep
+
+lea    resq,   [resq+orderq*4]
+lea    smpq,   [smpq+orderq*4]
+lea    coefsq, [coefsq+orderq*4]
+sub    length,  orderd
+movd   m3,  r5m
+neg    orderq
+
+movu   m4, [pd_0_int_min] ; load 1 bit
+psrad  m4,  m3            ; turn that into shift+1 bits
+pslld  m4,  1             ; reduce that
+mova  [rsp],m4            ; save sign extend mask
+
+%define posj t0q
+%define negj t1q
+
+.looplen:
+pxor m0,   m0
+mov  posj, orderq
+xor  negj, negj
+
+.looporder:
+movd   m2,  [coefsq+posj*4] ; c = coefs[j]
+SPLATD m2
+pmovzxdq m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
+pmuldq m1,   m2
+paddq  m0,   m1 ; p += c * s
+
+dec  negj
+inc  posj
+jnz .looporder
+
+HACK_PSRAQ m0, m3, [rsp], m2 ; p >>= shift
+CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
+pshufd  m0,    m0, q0020 ; pack into first 2 dwords
+movh    m1,   [smpq]
+psubd   m1,    m0   ; smp[i] - p
+movh   [resq], m1   ; res[i] = smp[i] - (p >> shift)
+
+add resq,   mmsize/2
+add smpq,   mmsize/2
+sub length, mmsize/8
+jg .looplen
+RET
+
+%endmacro ; FUNCTION_BODY_32
+
 INIT_XMM sse4
 FUNCTION_BODY_16
 
+INIT_XMM sse42
+FUNCTION_BODY_32
+
 INIT_YMM avx2
 FUNCTION_BODY_16
diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
index 0a5c01859f..f827186c26 100644
--- a/libavcodec/x86/flacdsp_init.c
+++ b/libavcodec/x86/flacdsp_init.c
@@ -29,6 +29,7 @@ void ff_flac_lpc_32_xop(int32_t *samples, const int coeffs[32], int order,
 
 void ff_flac_enc_lpc_16_sse4(int32_t *, const int32_t *, int, int, const int32_t *,int);
 void ff_flac_enc_lpc_16_avx2(int32_t *, const int32_t *, int, int, const int32_t *,int);
+void ff_flac_enc_lpc_32_sse42(int32_t *, const int32_t *, int, int, const int32_t *,int);
 
 #define DECORRELATE_FUNCS(fmt, opt)                                                   \
 void ff_flac_decorrelate_ls_##fmt##_##opt(uint8_t **out, int32_t **in, int channels, \
@@ -111,6 +112,10 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
 if (CONFIG_GPL)
 c->lpc16_encode = ff_flac_enc_lpc_16_sse4;
 }
+if (EXTERNAL_SSE42(cpu_flags)) {
+if (CONFIG_GPL)
+c->lpc32_encode = ff_flac_enc_lpc_32_sse42;
+}
 if (EXTERNAL_AVX2(cpu_flags)) {
 if (CONFIG_GPL)
 c->lpc16_encode = ff_flac_enc_lpc_16_avx2;
-- 
2.15.0
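
The helper macros in this patch work around two operations on signed 64-bit lanes that this instruction set does not offer directly: min/max and an arithmetic right shift. The scalar C below is a sketch of the same ideas and not a translation of the assembly. All function names are hypothetical, and the sign mask is computed directly here rather than built with psrad and pslld as in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bitwise select: take bits of a where mask is set, bits of b elsewhere.
 * This is the pand / pandn / por sequence. */
static int64_t select64(uint64_t mask, int64_t a, int64_t b)
{
    return (int64_t)(((uint64_t)a & mask) | ((uint64_t)b & ~mask));
}

/* PMINSQ idea: mask = (b > a), then keep a where the mask is set. */
int64_t min64_blend(int64_t a, int64_t b)
{
    uint64_t mask = b > a ? ~UINT64_C(0) : 0;   /* pcmpgtq */
    return select64(mask, a, b);
}

/* PMAXSQ idea: mask = (a > b), then keep a where the mask is set. */
int64_t max64_blend(int64_t a, int64_t b)
{
    uint64_t mask = a > b ? ~UINT64_C(0) : 0;   /* pcmpgtq */
    return select64(mask, a, b);
}

/* CLIPQ idea: clamp to the int32_t range using the two helpers. */
int64_t clip_to_int32(int64_t v)
{
    return min64_blend(max64_blend(v, INT32_MIN), INT32_MAX);
}

/* HACK_PSRAQ idea: do a logical shift, then OR the vacated top bits
 * back in when the input was negative. */
int64_t sra64_via_logical(int64_t v, unsigned shift)
{
    uint64_t sign_bits, negative;
    assert(shift < 64);
    sign_bits = shift ? ~UINT64_C(0) << (64 - shift) : 0;  /* top `shift` bits */
    negative  = v < 0 ? ~UINT64_C(0) : 0;                  /* mask where 0 > v */
    return (int64_t)(((uint64_t)v >> shift) | (negative & sign_bits));
}
```

In the assembly the sign-extension mask depends only on the shift, so it is computed once before the loop and kept on the stack; the sketch recomputes it on each call for simplicity.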



[FFmpeg-devel] [PATCH 1/8] avcodec/flac: document limitations of the LPC encoder

2017-11-26 Thread James Darnley
State that the maximum value of order is 32.  This limit is used in both
C and x86 assembly code.
---
 libavcodec/flacdsp.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/libavcodec/flacdsp.h b/libavcodec/flacdsp.h
index 7bb0dd0e9a..90fd3f04b5 100644
--- a/libavcodec/flacdsp.h
+++ b/libavcodec/flacdsp.h
@@ -30,6 +30,14 @@ typedef struct FLACDSPContext {
   int qlevel, int len);
 void (*lpc32)(int32_t *samples, const int coeffs[32], int order,
   int qlevel, int len);
+
+/**
+ * These encoder functions support a maximum order of 32.
+ *
+ * This limit is used:
+ * - when CONFIG_SMALL is 0 to unroll a loop in the C template.
+ * - when SSE4 (or newer) is available on x86 to unroll a copy loop.
+ */
 void (*lpc16_encode)(int32_t *res, const int32_t *smp, int len, int order,
  const int32_t coefs[32], int shift);
 void (*lpc32_encode)(int32_t *res, const int32_t *smp, int len, int order,
-- 
2.15.0
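
One consequence of the documented limit is that an implementation may copy a fixed 32 warm-up samples regardless of the actual order, as the x86 code does with its unrolled copy loop. A minimal sketch of that pattern follows; `MAX_LPC_ORDER` and `copy_warmup_samples` are hypothetical names, and memcpy stands in for the unrolled vector copy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LPC_ORDER 32   /* assumption: local name for the documented limit */

/* Copy the warm-up samples with a fixed-size copy.  Both buffers must
 * hold at least MAX_LPC_ORDER elements, even when order is smaller;
 * entries at index >= order are expected to be overwritten later by
 * real residuals. */
void copy_warmup_samples(int32_t *res, const int32_t *smp, int order)
{
    assert(order >= 1 && order <= MAX_LPC_ORDER);
    memcpy(res, smp, MAX_LPC_ORDER * sizeof(*res));
}
```

The trade-off is a branch-free copy in exchange for a hard upper bound on order and a minimum buffer size, which is why the limit is worth documenting in the header.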



[FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder

2017-11-26 Thread James Darnley
Around 1.1 times faster and reduces runtime by up to 6%.
---
 libavcodec/x86/flac_dsp_gpl.asm | 91 -
 1 file changed, 72 insertions(+), 19 deletions(-)

diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index 952fc8b86b..91989ce560 100644
--- a/libavcodec/x86/flac_dsp_gpl.asm
+++ b/libavcodec/x86/flac_dsp_gpl.asm
@@ -152,13 +152,13 @@ RET
 %macro FUNCTION_BODY_32 0
 
 %if ARCH_X86_64
-cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
+cglobal flac_enc_lpc_32, 5, 7, 8, mmsize*4, res, smp, len, order, coefs
 DECLARE_REG_TMP 5, 6
 %define length r2d
 
 movsxd orderq, orderd
 %else
-cglobal flac_enc_lpc_32, 5, 6, 8, mmsize, res, smp, len, order, coefs
+cglobal flac_enc_lpc_32, 5, 6, 8, mmsize*4, res, smp, len, order, coefs
 DECLARE_REG_TMP 2, 5
 %define length r2mp
 %endif
@@ -189,18 +189,23 @@ mova  [rsp],m4; save sign extend mask
 %define negj t1q
 
 .looplen:
+; process "odd" samples
 pxor m0,   m0
 pxor m4,   m4
 pxor m6,   m6
 mov  posj, orderq
 xor  negj, negj
 
-.looporder:
+.looporder1:
 movd   m2,  [coefsq+posj*4] ; c = coefs[j]
 SPLATD m2
-pmovzxdq m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
-pmovzxdq m5,  [smpq+negj*4-4+mmsize/2]
-pmovzxdq m7,  [smpq+negj*4-4+mmsize]
+movu   m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
+movu   m5,  [smpq+negj*4-4+mmsize]
+movu   m7,  [smpq+negj*4-4+mmsize*2]
+; Rather than explicitly unpack adjacent samples into qwords we can let
+; the pmuldq instruction unpack the 0th and 2nd samples for us when it
+; does its multiply.  This saves an unpack for every sample in the inner
+; loop meaning it should be (much) quicker.
 pmuldq m1,   m2
 pmuldq m5,   m2
 pmuldq m7,   m2
@@ -210,7 +215,7 @@ mova  [rsp],m4; save sign extend mask
 
 dec  negj
 inc  posj
-jnz .looporder
+jnz .looporder1
 
 HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
 HACK_PSRAQ m4, m3, [rsp], m2
@@ -218,22 +223,70 @@ mova  [rsp],m4; save sign extend mask
 CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
 CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
 CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
-pshufd  m0,m0, q0020 ; pack into first 2 dwords
-pshufd  m4,m4, q0020
-pshufd  m6,m6, q0020
-movhm1,   [smpq]
-movhm5,   [smpq+mmsize/2]
-movhm7,   [smpq+mmsize]
+movu    m1,   [smpq]
+movu    m5,   [smpq+mmsize]
+movu    m7,   [smpq+mmsize*2]
 psubd   m1,m0   ; smp[i] - p
 psubd   m5,m4
 psubd   m7,m6
-movh   [resq], m1   ; res[i] = smp[i] - (p >> shift)
-movh   [resq+mmsize/2], m5
-movh   [resq+mmsize], m7
+mova   [rsp+mmsize], m1   ; res[i] = smp[i] - (p >> shift)
+mova   [rsp+mmsize*2], m5
+mova   [rsp+mmsize*3], m7
+
+; process "even" samples
+pxor m0,   m0
+pxor m4,   m4
+pxor m6,   m6
+mov  posj, orderq
+xor  negj, negj
+
+.looporder2:
+movd   m2,  [coefsq+posj*4] ; c = coefs[j]
+SPLATD m2
+movu   m1,  [smpq+negj*4] ; s = smp[i-j-1]
+movu   m5,  [smpq+negj*4+mmsize]
+movu   m7,  [smpq+negj*4+mmsize*2]
+pmuldq m1,   m2
+pmuldq m5,   m2
+pmuldq m7,   m2
+paddq  m0,   m1 ; p += c * s
+paddq  m4,   m5
+paddq  m6,   m7
+
+dec  negj
+inc  posj
+jnz .looporder2
+
+HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
+HACK_PSRAQ m4, m3, [rsp], m2
+HACK_PSRAQ m6, m3, [rsp], m2
+CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
+CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
+CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
+movu    m1,   [smpq+4]
+movu    m5,   [smpq+4+mmsize]
+movu    m7,   [smpq+4+mmsize*2]
+psubd   m1,m0   ; smp[i] - p
+psubd   m5,m4
+psubd   m7,m6
+
+; interleave odd and even samples
+pslldq  m1, 4
+pslldq  m5, 4
+pslldq  m7, 4
+
+pblendw m1, [rsp+mmsize], q0303
+pblendw m5, [rsp+mmsize*2], q0303
+pblendw m7, [rsp+mmsize*3], q0303
+
+movu [resq], m1
+movu [resq+mmsize], m5
+movu [resq+mmsize*2], m7
+
+add resq,   3*mmsize
+add smpq,   3*mmsize
+sub length, (3*mmsize)/4
 
-add resq,   (3*mmsize)/2
-add smpq,   (3*mmsize)/2
-sub length, (3*mmsize)/8
 jg .looplen
 RET
 
-- 
2.15.0
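
The trick described in the comment of this patch depends on pmuldq multiplying only dword lanes 0 and 2 of each 128-bit register. Running the inner loop twice, with the second pass starting one sample later, covers both parities, and the results are then interleaved. The C below is a scalar sketch of that scheme for one group of four outputs. The names `pmuldq_sketch` and `predict4_two_pass` are hypothetical, and the shift, clip and subtract steps are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate pmuldq on one 128-bit register: signed 32x32 -> 64-bit
 * products of dword lanes 0 and 2; lanes 1 and 3 are ignored. */
static void pmuldq_sketch(int64_t out[2], const int32_t a[4], const int32_t b[4])
{
    out[0] = (int64_t)a[0] * b[0];
    out[1] = (int64_t)a[2] * b[2];
}

/* Compute the predictions for outputs i .. i+3 in two passes.
 * Requires smp[i - order .. i + 3] to be readable. */
void predict4_two_pass(int64_t p[4], const int32_t *smp, int i, int order,
                       const int32_t *coefs)
{
    int64_t first[2] = {0, 0}, second[2] = {0, 0}, prod[2];
    assert(order >= 1 && order <= 32 && i >= order);
    for (int j = 0; j < order; j++) {
        /* broadcast the coefficient to all four lanes */
        int32_t c[4] = { coefs[j], coefs[j], coefs[j], coefs[j] };
        /* first pass: lanes 0 and 2 contribute to outputs i and i+2 */
        pmuldq_sketch(prod, smp + i - j - 1, c);
        first[0] += prod[0];
        first[1] += prod[1];
        /* second pass, one sample later: outputs i+1 and i+3 */
        pmuldq_sketch(prod, smp + i - j, c);
        second[0] += prod[0];
        second[1] += prod[1];
    }
    /* interleave the two passes back into sample order */
    p[0] = first[0];
    p[1] = second[0];
    p[2] = first[1];
    p[3] = second[1];
}
```

The sketch folds both passes into a single loop over j for brevity, whereas the patch runs two separate inner loops (.looporder1 and .looporder2) and parks the first pass's results on the stack, which is what the extra stack space is used for.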



[FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Darnley
When compared to the SSE4.2 version, runtime is reduced by 1 to 26%.  The
function itself is around 2 times faster.
---
 libavcodec/x86/flac_dsp_gpl.asm | 56 +++--
 libavcodec/x86/flacdsp_init.c   |  5 +++-
 2 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
index 91989ce560..749e66dec8 100644
--- a/libavcodec/x86/flac_dsp_gpl.asm
+++ b/libavcodec/x86/flac_dsp_gpl.asm
@@ -22,11 +22,11 @@
 
 %include "libavutil/x86/x86util.asm"
 
-SECTION_RODATA
+SECTION_RODATA 32
 
-pd_0_int_min: times  2 dd 0, -2147483648
-pq_int_min:   times  2 dq -2147483648
-pq_int_max:   times  2 dq  2147483647
+pd_0_int_min: times  4 dd 0, -2147483648
+pq_int_min:   times  4 dq -2147483648
+pq_int_max:   times  4 dq  2147483647
 
 SECTION .text
 
@@ -123,7 +123,10 @@ RET
 %endmacro
 
 %macro PMINSQ 3
-pcmpgtq %3, %2, %1
+mova    %3, %2
+; We cannot use the 3-operand format because the memory location cannot be
+; the second operand, only the third.
+pcmpgtq %3, %1
 pand    %1, %3
 pandn   %3, %2
 por     %1, %3
@@ -177,11 +180,11 @@ lea    resq,   [resq+orderq*4]
 lea    smpq,   [smpq+orderq*4]
 lea    coefsq, [coefsq+orderq*4]
 sub    length,  orderd
-movd   m3,  r5m
+movd   xm3, r5m
 neg    orderq
 
 movu   m4, [pd_0_int_min] ; load 1 bit
-psrad  m4,  m3; turn that into shift+1 bits
+psrad  m4,  xm3   ; turn that into shift+1 bits
 pslld  m4,  1 ; reduce that
 mova  [rsp],m4; save sign extend mask
 
@@ -197,8 +200,20 @@ mova  [rsp],m4; save sign extend mask
 xor  negj, negj
 
 .looporder1:
+%if cpuflag(avx)
+vbroadcastss m2, [coefsq+posj*4]
+%else
 movd   m2,  [coefsq+posj*4] ; c = coefs[j]
 SPLATD m2
+%endif
+%if cpuflag(avx)
+vpmuldq  m1, m2, [smpq+negj*4-4]
+vpmuldq  m5, m2, [smpq+negj*4-4+mmsize]
+vpmuldq  m7, m2, [smpq+negj*4-4+mmsize*2]
+vpaddq   m0, m1
+vpaddq   m4, m5
+vpaddq   m6, m7
+%else
 movu   m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
 movu   m5,  [smpq+negj*4-4+mmsize]
 movu   m7,  [smpq+negj*4-4+mmsize*2]
@@ -212,14 +227,15 @@ mova  [rsp],m4; save sign extend mask
 paddq  m0,   m1 ; p += c * s
 paddq  m4,   m5
 paddq  m6,   m7
+%endif
 
 dec  negj
 inc  posj
 jnz .looporder1
 
-HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
-HACK_PSRAQ m4, m3, [rsp], m2
-HACK_PSRAQ m6, m3, [rsp], m2
+HACK_PSRAQ m0, xm3, [rsp], m2; p >>= shift
+HACK_PSRAQ m4, xm3, [rsp], m2
+HACK_PSRAQ m6, xm3, [rsp], m2
 CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
 CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
 CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
@@ -241,8 +257,20 @@ mova  [rsp],m4; save sign extend mask
 xor  negj, negj
 
 .looporder2:
+%if cpuflag(avx)
+vbroadcastss m2, [coefsq+posj*4]
+%else
 movd   m2,  [coefsq+posj*4] ; c = coefs[j]
 SPLATD m2
+%endif
+%if cpuflag(avx)
+vpmuldq  m1, m2, [smpq+negj*4]
+vpmuldq  m5, m2, [smpq+negj*4+mmsize]
+vpmuldq  m7, m2, [smpq+negj*4+mmsize*2]
+vpaddq   m0, m1
+vpaddq   m4, m5
+vpaddq   m6, m7
+%else
 movu   m1,  [smpq+negj*4] ; s = smp[i-j-1]
 movu   m5,  [smpq+negj*4+mmsize]
 movu   m7,  [smpq+negj*4+mmsize*2]
@@ -252,14 +280,15 @@ mova  [rsp],m4; save sign extend mask
 paddq  m0,   m1 ; p += c * s
 paddq  m4,   m5
 paddq  m6,   m7
+%endif
 
 dec  negj
 inc  posj
 jnz .looporder2
 
-HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
-HACK_PSRAQ m4, m3, [rsp], m2
-HACK_PSRAQ m6, m3, [rsp], m2
+HACK_PSRAQ m0, xm3, [rsp], m2; p >>= shift
+HACK_PSRAQ m4, xm3, [rsp], m2
+HACK_PSRAQ m6, xm3, [rsp], m2
 CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
 CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
 CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
@@ -300,3 +329,4 @@ FUNCTION_BODY_32
 
 INIT_YMM avx2
 FUNCTION_BODY_16
+FUNCTION_BODY_32
diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
index f827186c26..fbe70894a0 100644
--- a/libavcodec/x86/flacdsp_init.c
+++ b/libavcodec/x86/flacdsp_init.c
@@ -30,6 +30,7 @@ void ff_flac_lpc_32_xop(int32_t *samples, const int coeffs[32], int order,
 void ff_flac_enc_lpc_16_sse4(int32_t *, const int32_t *, int, int, const int32_t *,int);
 void ff_flac_enc_lpc_16_avx2(int32_t *, const int32_t *, int, int, const int32_t *,int);
 void ff_flac_enc_lpc_32_sse42(int32_t *, const int32_t *, int, int, const int32_t *,int);
+void ff_flac_enc_lpc_32_avx2(int32_t *, const int32_t *, int, int, const int32_t *,int);
 
 #define DECORRELATE_FUNCS(fmt, opt)  

Re: [FFmpeg-devel] [PATCH] avformat/matroskaenc: actually enforce the stream limit

2017-11-26 Thread Michael Niedermayer
On Sun, Nov 26, 2017 at 02:03:09PM -0300, James Almer wrote:
> Prevents out of array accesses. Addresses ticket #6873
> 
> Signed-off-by: James Almer 
> ---
>  libavformat/matroskaenc.c | 7 +++
>  1 file changed, 7 insertions(+)

LGTM

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

It is dangerous to be right in matters on which the established authorities
are wrong. -- Voltaire




Re: [FFmpeg-devel] [PATCH 5/8] lavc/x86/flac_dsp_gpl: cosmetic whitespace alignment

2017-11-26 Thread Rostislav Pehlivanov
On 26 November 2017 at 22:51, James Darnley  wrote:

> ---
>  libavcodec/x86/flac_dsp_gpl.asm | 40 --
> --
>  1 file changed, 20 insertions(+), 20 deletions(-)
>
> diff --git a/libavcodec/x86/flac_dsp_gpl.asm
> b/libavcodec/x86/flac_dsp_gpl.asm
> index 4d212ed212..952fc8b86b 100644
> --- a/libavcodec/x86/flac_dsp_gpl.asm
> +++ b/libavcodec/x86/flac_dsp_gpl.asm
> @@ -75,42 +75,42 @@ neg  orderq
>  %if cpuflag(avx)
>  vbroadcastss m2, [coefsq+posj*4]
>  %else
> -movd   m2, [coefsq+posj*4] ; c = coefs[j]
> -SPLATD m2
> +movd m2, [coefsq+posj*4] ; c = coefs[j]
> +SPLATD   m2
>  %endif
>  %if cpuflag(avx)
> -vpmulld m1, m2, [smpq+negj*4-4]
> -vpmulld m5, m2, [smpq+negj*4-4+mmsize]
> -vpmulld m7, m2, [smpq+negj*4-4+mmsize*2]
> -vpaddd  m0, m1
> -vpaddd  m4, m5
> -vpaddd  m6, m7
> +vpmulld  m1,  m2, [smpq+negj*4-4]
> +vpmulld  m5,  m2, [smpq+negj*4-4+mmsize]
> +vpmulld  m7,  m2, [smpq+negj*4-4+mmsize*2]
> +vpaddd   m0,  m1
> +vpaddd   m4,  m5
> +vpaddd   m6,  m7
>  %else
> -movu   m1, [smpq+negj*4-4] ; s = smp[i-j-1]
> -movu   m5, [smpq+negj*4-4+mmsize]
> -movu   m7, [smpq+negj*4-4+mmsize*2]
> -pmulld m1,  m2
> -pmulld m5,  m2
> -pmulld m7,  m2
> -paddd  m0,  m1 ; p += c * s
> -paddd  m4,  m5
> -paddd  m6,  m7
> +movu m1, [smpq+negj*4-4] ; s = smp[i-j-1]
> +movu m5, [smpq+negj*4-4+mmsize]
> +movu m7, [smpq+negj*4-4+mmsize*2]
> +pmulld   m1,  m2
> +pmulld   m5,  m2
> +pmulld   m7,  m2
> +padddm0,  m1 ; p += c * s
> +padddm4,  m5
> +padddm6,  m7
>  %endif
>
>  decnegj
>  incposj
>  jnz .looporder
>
> -psrad  m0, xm3  ; p >>= shift
> +psrad  m0, xm3   ; p >>= shift
>  psrad  m4, xm3
>  psrad  m6, xm3
>  movu   m1,[smpq]
>  movu   m5,[smpq+mmsize]
>  movu   m7,[smpq+mmsize*2]
> -psubd  m1, m0  ; smp[i] - p
> +psubd  m1, m0; smp[i] - p
>  psubd  m5, m4
>  psubd  m7, m6
> -movu  [resq],  m1  ; res[i] = smp[i] - (p >> shift)
> +movu  [resq],  m1; res[i] = smp[i] - (p >> shift)
>  movu  [resq+mmsize], m5
>  movu  [resq+mmsize*2], m7
>
> --
> 2.15.0
>
>

lgtm, should have just pushed this


Re: [FFmpeg-devel] [PATCH 3/8] avcodec/flac: add SSE4.2 version of the 32-bit lpc encoder

2017-11-26 Thread Carl Eugen Hoyos
2017-11-26 23:51 GMT+01:00 James Darnley :

> +if (EXTERNAL_SSE42(cpu_flags)) {
> +if (CONFIG_GPL)
> +c->lpc32_encode = ff_flac_enc_lpc_32_sse42;
> +}

Any objections over "if (CONFIG_GPL && EXTERNAL_..)"?

Carl Eugen
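
The suggestion amounts to folding the nested test into a single condition. The sketch below shows that the two forms select the same thing; `config_gpl` and `has_sse42` are hypothetical stand-ins for CONFIG_GPL and EXTERNAL_SSE42(cpu_flags), and the return value stands in for assigning the function pointer.

```c
#include <assert.h>

/* Nested form, as in the patch: returns 1 when the SSE4.2 function
 * would be installed. */
int select_nested(int config_gpl, int has_sse42)
{
    int selected = 0;
    assert((config_gpl == 0 || config_gpl == 1) &&
           (has_sse42 == 0 || has_sse42 == 1));
    if (has_sse42) {
        if (config_gpl)
            selected = 1;
    }
    return selected;
}

/* Combined form, as suggested in the reply. */
int select_combined(int config_gpl, int has_sse42)
{
    int selected = 0;
    if (config_gpl && has_sse42)
        selected = 1;
    return selected;
}
```

Because the configuration flag is a compile-time constant, placing it first lets the compiler drop the whole branch when GPL code is disabled.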


Re: [FFmpeg-devel] [PATCH] tests/checkasm/float_dsp: Increase allowed difference for float_dsp.vector_dmul

2017-11-26 Thread Michael Niedermayer
On Sun, Nov 26, 2017 at 12:47:22AM +0100, Michael Niedermayer wrote:
> On Sun, Nov 26, 2017 at 12:10:38AM +0100, Michael Niedermayer wrote:
> > On Fri, Nov 24, 2017 at 11:37:36PM -0300, James Almer wrote:
> > > On 10/29/2017 11:57 AM, Michael Niedermayer wrote:
> > > > The choosen value is the lowest power of 2 that allows 1000 iterations 
> > > > of fate-checkasm-float_dsp
> > > > to pass on x86-32
> > > 
> > > Ticket #6848 reports this value is still not enough. Maybe something
> > > like 1.0e-12 or 1.0e-13 instead?
> > 
> > ok, ill push it with 1e-12
> 
> Or do people prefer this: (this should be more correct)
> 
> commit 67ba87a320faba623c0b35a0692adb916860ac40 (HEAD -> master)
> Author: Michael Niedermayer 
> Date:   Sun Oct 29 15:26:50 2017 +0100
> 
> tests/checkasm/float_dsp: Increase allowed difference for 
> float_dsp.vector_dmul
> 
> Tested for 1 iterations on x86-32
> 
> Fixes: Ticket6848
> 
> Signed-off-by: Michael Niedermayer 

I'll push a variant of this so that it gets fixed. We can change it
later if people prefer something else;
leaving it open is bad ...
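
For context, the kind of check being tuned in this thread compares reference and optimized outputs within an absolute tolerance. The sketch below is a generic illustration and not the checkasm code; `doubles_match` is a hypothetical name, and 1.0e-12 is the value proposed earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 when every element of out is within eps of ref.
 * A NaN difference counts as a mismatch. */
int doubles_match(const double *ref, const double *out, size_t n, double eps)
{
    assert(eps >= 0.0);
    for (size_t i = 0; i < n; i++) {
        double d = ref[i] - out[i];
        if (d < 0)
            d = -d;
        if (!(d <= eps))
            return 0;
    }
    return 1;
}
```

A difference of about 5e-13 passes with eps = 1.0e-12 but fails with 1.0e-13, which is the sort of margin under discussion.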

[...]

-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

If you drop bombs on a foreign country and kill a hundred thousand
innocent people, expect your government to call the consequence
"unprovoked inhuman terrorist attacks" and use it to justify dropping
more bombs and killing more people. The technology changed, the idea is old.




Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread Rostislav Pehlivanov
On 26 November 2017 at 22:51, James Darnley  wrote:

> When compared to the SSE4.2 version runtime, is reduced by 1 to 26%.  The
> function itself is around 2 times faster.
> ---
>  libavcodec/x86/flac_dsp_gpl.asm | 56 ++
> +--
>  libavcodec/x86/flacdsp_init.c   |  5 +++-
>  2 files changed, 47 insertions(+), 14 deletions(-)
>
> diff --git a/libavcodec/x86/flac_dsp_gpl.asm
> b/libavcodec/x86/flac_dsp_gpl.asm
> index 91989ce560..749e66dec8 100644
> --- a/libavcodec/x86/flac_dsp_gpl.asm
> +++ b/libavcodec/x86/flac_dsp_gpl.asm
> @@ -22,11 +22,11 @@
>
>  %include "libavutil/x86/x86util.asm"
>
> -SECTION_RODATA
> +SECTION_RODATA 32
>
> -pd_0_int_min: times  2 dd 0, -2147483648
> -pq_int_min:   times  2 dq -2147483648
> -pq_int_max:   times  2 dq  2147483647
> +pd_0_int_min: times  4 dd 0, -2147483648
> +pq_int_min:   times  4 dq -2147483648
> +pq_int_max:   times  4 dq  2147483647
>
>  SECTION .text
>
> @@ -123,7 +123,10 @@ RET
>  %endmacro
>
>  %macro PMINSQ 3
> -pcmpgtq %3, %2, %1
> +mova    %3, %2
> +; We cannot use the 3-operand format because the memory location cannot be
> +; the second operand, only the third.
> +pcmpgtq %3, %1
>

I don't get it, how did it work before then?


>  pand%1, %3
>  pandn   %3, %2
>  por %1, %3
> @@ -177,11 +180,11 @@ learesq,   [resq+orderq*4]
>  leasmpq,   [smpq+orderq*4]
>  leacoefsq, [coefsq+orderq*4]
>  sublength,  orderd
> -movd   m3,  r5m
> +movd   xm3, r5m
>  negorderq
>
>  movu   m4, [pd_0_int_min] ; load 1 bit
> -psrad  m4,  m3; turn that into shift+1 bits
> +psrad  m4,  xm3   ; turn that into shift+1 bits
>  pslld  m4,  1 ; reduce that
>  mova  [rsp],m4; save sign extend mask
>
> @@ -197,8 +200,20 @@ mova  [rsp],m4; save sign extend mask
>  xor  negj, negj
>
>  .looporder1:
> +%if cpuflag(avx)
> +vbroadcastss m2, [coefsq+posj*4]
> +%else
>  movd   m2,  [coefsq+posj*4] ; c = coefs[j]
>  SPLATD m2
> +%endif
> +%if cpuflag(avx)
> +vpmuldq  m1, m2, [smpq+negj*4-4]
> +vpmuldq  m5, m2, [smpq+negj*4-4+mmsize]
> +vpmuldq  m7, m2, [smpq+negj*4-4+mmsize*2]
> +vpaddq   m0, m1
> +vpaddq   m4, m5
> +vpaddq   m6, m7
>

Why force VEX encoding for these instructions, on avx no less?


> +%else
>  movu   m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
>  movu   m5,  [smpq+negj*4-4+mmsize]
>  movu   m7,  [smpq+negj*4-4+mmsize*2]
> @@ -212,14 +227,15 @@ mova  [rsp],m4; save sign extend mask
>  paddq  m0,   m1 ; p += c * s
>  paddq  m4,   m5
>  paddq  m6,   m7
> +%endif
>
>  decnegj
>  incposj
>  jnz .looporder1
>
> -HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
> -HACK_PSRAQ m4, m3, [rsp], m2
> -HACK_PSRAQ m6, m3, [rsp], m2
> +HACK_PSRAQ m0, xm3, [rsp], m2; p >>= shift
> +HACK_PSRAQ m4, xm3, [rsp], m2
> +HACK_PSRAQ m6, xm3, [rsp], m2
>  CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
>  CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
>  CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
> @@ -241,8 +257,20 @@ mova  [rsp],m4; save sign extend mask
>  xor  negj, negj
>
>  .looporder2:
> +%if cpuflag(avx)
> +vbroadcastss m2, [coefsq+posj*4]
> +%else
>  movd   m2,  [coefsq+posj*4] ; c = coefs[j]
>  SPLATD m2
> +%endif
> +%if cpuflag(avx)
> +vpmuldq  m1, m2, [smpq+negj*4]
> +vpmuldq  m5, m2, [smpq+negj*4+mmsize]
> +vpmuldq  m7, m2, [smpq+negj*4+mmsize*2]
> +vpaddq   m0, m1
> +vpaddq   m4, m5
> +vpaddq   m6, m7
> +%else
>  movu   m1,  [smpq+negj*4] ; s = smp[i-j-1]
>  movu   m5,  [smpq+negj*4+mmsize]
>  movu   m7,  [smpq+negj*4+mmsize*2]
> @@ -252,14 +280,15 @@ mova  [rsp],m4; save sign extend mask
>  paddq  m0,   m1 ; p += c * s
>  paddq  m4,   m5
>  paddq  m6,   m7
> +%endif
>
>  decnegj
>  incposj
>  jnz .looporder2
>
> -HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
> -HACK_PSRAQ m4, m3, [rsp], m2
> -HACK_PSRAQ m6, m3, [rsp], m2
> +HACK_PSRAQ m0, xm3, [rsp], m2; p >>= shift
> +HACK_PSRAQ m4, xm3, [rsp], m2
> +HACK_PSRAQ m6, xm3, [rsp], m2
>  CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
>  CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
>  CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
> @@ -300,3 +329,4 @@ FUNCTION_BODY_32
>
>  INIT_YMM avx2
>  FUNCTION_BODY_16
> +FUNCTION_BODY_32
> diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
> index f827186c26..fbe70894a0 100644
> --- a/libavcodec/x86/flacdsp_init.c
> +++ b/libavcodec/x86/flacdsp_init.c
> @@ -30,6 +30,7 @@ void ff_flac_lpc_32_xop(int32_t *samples, const int

Re: [FFmpeg-devel] [PATCH] tests/checkasm/float_dsp: Increase allowed difference for float_dsp.vector_dmul

2017-11-26 Thread James Almer
On 11/26/2017 8:09 PM, Michael Niedermayer wrote:
> On Sun, Nov 26, 2017 at 12:47:22AM +0100, Michael Niedermayer wrote:
>> On Sun, Nov 26, 2017 at 12:10:38AM +0100, Michael Niedermayer wrote:
>>> On Fri, Nov 24, 2017 at 11:37:36PM -0300, James Almer wrote:
>>>> On 10/29/2017 11:57 AM, Michael Niedermayer wrote:
>>>>> The choosen value is the lowest power of 2 that allows 1000 iterations of
>>>>> fate-checkasm-float_dsp
>>>>> to pass on x86-32
>>>>
>>>> Ticket #6848 reports this value is still not enough. Maybe something
>>>> like 1.0e-12 or 1.0e-13 instead?
>>>
>>> ok, I'll push it with 1e-12
>>
>> Or do people prefer this: (this should be more correct)
>>
>> commit 67ba87a320faba623c0b35a0692adb916860ac40 (HEAD -> master)
>> Author: Michael Niedermayer 
>> Date:   Sun Oct 29 15:26:50 2017 +0100
>>
>> tests/checkasm/float_dsp: Increase allowed difference for 
>> float_dsp.vector_dmul
>>
>> Tested for 1 iterations on x86-32
>>
>> Fixes: Ticket6848
>>
>> Signed-off-by: Michael Niedermayer 
> 
> I'll push a variant of this so that it gets fixed. We can change it
> later if people prefer something else;
> leaving it open is bad ...

Any solution is fine with me. Thanks for fixing it.
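
For readers following the thread: the value being tuned is an absolute bound on how far the assembly result may differ from the C reference. A minimal sketch of that kind of check, assuming hypothetical helper names (`near_abs_eps`, `array_near_abs_eps`) that are not taken from checkasm itself, and using the 1e-12 bound mentioned above only as an example argument:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when a and b differ by less than eps. */
static bool near_abs_eps(double a, double b, double eps)
{
    assert(eps >= 0.0);
    double d = a - b;
    if (d < 0.0)
        d = -d;              /* absolute difference without needing libm */
    return d < eps;
}

/* Hypothetical helper: compare a reference array against a tested array. */
static bool array_near_abs_eps(const double *ref, const double *out,
                               size_t len, double eps)
{
    for (size_t i = 0; i < len; i++)
        if (!near_abs_eps(ref[i], out[i], eps))
            return false;
    return true;
}
```

A bound that is too tight makes the test fail on platforms whose floating point rounding differs slightly (x86-32 in this thread); a bound that is too loose stops the test from catching real bugs.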
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder

2017-11-26 Thread Rostislav Pehlivanov
On 26 November 2017 at 22:51, James Darnley  wrote:

> Around 1.1 times faster and reduces runtime by up to 6%.
> ---
>  libavcodec/x86/flac_dsp_gpl.asm | 91 ++
> ++-
>  1 file changed, 72 insertions(+), 19 deletions(-)
>
> diff --git a/libavcodec/x86/flac_dsp_gpl.asm
> b/libavcodec/x86/flac_dsp_gpl.asm
> index 952fc8b86b..91989ce560 100644
> --- a/libavcodec/x86/flac_dsp_gpl.asm
> +++ b/libavcodec/x86/flac_dsp_gpl.asm
> @@ -152,13 +152,13 @@ RET
>  %macro FUNCTION_BODY_32 0
>
>  %if ARCH_X86_64
> -cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
> +cglobal flac_enc_lpc_32, 5, 7, 8, mmsize*4, res, smp, len, order, coefs
>

Why x4, shouldn't this be x2?


>  DECLARE_REG_TMP 5, 6
>  %define length r2d
>
>  movsxd orderq, orderd
>  %else
> -cglobal flac_enc_lpc_32, 5, 6, 8, mmsize, res, smp, len, order, coefs
> +cglobal flac_enc_lpc_32, 5, 6, 8, mmsize*4, res, smp, len, order, coefs
>  DECLARE_REG_TMP 2, 5
>  %define length r2mp
>  %endif
> @@ -189,18 +189,23 @@ mova  [rsp],m4; save sign extend mask
>  %define negj t1q
>
>  .looplen:
> +; process "odd" samples
>  pxor m0,   m0
>  pxor m4,   m4
>  pxor m6,   m6
>  mov  posj, orderq
>  xor  negj, negj
>
> -.looporder:
> +.looporder1:
>  movd   m2,  [coefsq+posj*4] ; c = coefs[j]
>  SPLATD m2
> -pmovzxdq m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
> -pmovzxdq m5,  [smpq+negj*4-4+mmsize/2]
> -pmovzxdq m7,  [smpq+negj*4-4+mmsize]
> +movu   m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
> +movu   m5,  [smpq+negj*4-4+mmsize]
> +movu   m7,  [smpq+negj*4-4+mmsize*2]
> +; Rather than explicitly unpack adjacent samples into qwords we can let
> +; the pmuldq instruction unpack the 0th and 2nd samples for us when it
> +; does its multiply.  This saves an unpack for every sample in the inner
> +; loop meaning it should be (much) quicker.
>  pmuldq m1,   m2
>  pmuldq m5,   m2
>  pmuldq m7,   m2
> @@ -210,7 +215,7 @@ mova  [rsp],m4; save sign extend mask
>
>  dec negj
>  inc posj
> -jnz .looporder
> +jnz .looporder1
>
>  HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
>  HACK_PSRAQ m4, m3, [rsp], m2
> @@ -218,22 +223,70 @@ mova  [rsp],m4; save sign extend mask
>  CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
>  CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
>  CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
> -pshufd  m0,m0, q0020 ; pack into first 2 dwords
> -pshufd  m4,m4, q0020
> -pshufd  m6,m6, q0020
> -movh m1,   [smpq]
> -movh m5,   [smpq+mmsize/2]
> -movh m7,   [smpq+mmsize]
> +movu m1,   [smpq]
> +movu m5,   [smpq+mmsize]
> +movu m7,   [smpq+mmsize*2]
>  psubd   m1,m0   ; smp[i] - p
>  psubd   m5,m4
>  psubd   m7,m6
> -movh   [resq], m1   ; res[i] = smp[i] - (p >> shift)
> -movh   [resq+mmsize/2], m5
> -movh   [resq+mmsize], m7
> +mova   [rsp+mmsize], m1   ; res[i] = smp[i] - (p >> shift)
> +mova   [rsp+mmsize*2], m5
> +mova   [rsp+mmsize*3], m7
> +
> +; process "even" samples
> +pxor m0,   m0
> +pxor m4,   m4
> +pxor m6,   m6
> +mov  posj, orderq
> +xor  negj, negj
> +
> +.looporder2:
> +movd   m2,  [coefsq+posj*4] ; c = coefs[j]
> +SPLATD m2
> +movu   m1,  [smpq+negj*4] ; s = smp[i-j-1]
> +movu   m5,  [smpq+negj*4+mmsize]
> +movu   m7,  [smpq+negj*4+mmsize*2]
> +pmuldq m1,   m2
> +pmuldq m5,   m2
> +pmuldq m7,   m2
> +paddq  m0,   m1 ; p += c * s
> +paddq  m4,   m5
> +paddq  m6,   m7
> +
> +dec negj
> +inc posj
> +jnz .looporder2
> +
> +HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
> +HACK_PSRAQ m4, m3, [rsp], m2
> +HACK_PSRAQ m6, m3, [rsp], m2
> +CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
> +CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
> +CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
> +movu m1,   [smpq+4]
> +movu m5,   [smpq+4+mmsize]
> +movu m7,   [smpq+4+mmsize*2]
> +psubd   m1,m0   ; smp[i] - p
> +psubd   m5,m4
> +psubd   m7,m6
> +
> +; interleave odd and even samples
> +pslldq  m1, 4
> +pslldq  m5, 4
> +pslldq  m7, 4
> +
> +pblendw m1, [rsp+mmsize], q0303
> +pblendw m5, [rsp+mmsize*2], q0303
> +pblendw m7, [rsp+mmsize*3], q0303
> +
> +movu [resq], m1
> +movu [resq+mmsize], m5
> +movu [resq+mmsize*2], m7
> +
> +add resq,3*mmsize
> +add smpq,3*mmsize
> +sub length, (3*mmsize)/4
>
> -add resq,   (3*mmsize)/2
> -add smpq,   (3*mmsize)/2
> -sub length, (

Re: [FFmpeg-devel] [PATCH 1/8] avcodec/flac: document limitations of the LPC encoder

2017-11-26 Thread Rostislav Pehlivanov
On 26 November 2017 at 22:51, James Darnley  wrote:

> State that the maximum value of order is 32.  This limit is used in both
> C and x86 assembly code.
> ---
>  libavcodec/flacdsp.h | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/libavcodec/flacdsp.h b/libavcodec/flacdsp.h
> index 7bb0dd0e9a..90fd3f04b5 100644
> --- a/libavcodec/flacdsp.h
> +++ b/libavcodec/flacdsp.h
> @@ -30,6 +30,14 @@ typedef struct FLACDSPContext {
>int qlevel, int len);
>  void (*lpc32)(int32_t *samples, const int coeffs[32], int order,
>int qlevel, int len);
> +
> +/**
> + * These encoder functions support a maximum order of 32.
> + *
> + * This limit is used:
> + * - when CONFIG_SMALL is 0 to unroll a loop in the C template.
> + * - when SSE4 (or newer) is available on x86 to unroll a copy loop.
> + */
>  void (*lpc16_encode)(int32_t *res, const int32_t *smp, int len, int order,
>   const int32_t coefs[32], int shift);
>  void (*lpc32_encode)(int32_t *res, const int32_t *smp, int len, int order,
> --
> 2.15.0
>

lgtm, should have just pushed


Re: [FFmpeg-devel] [PATCH 2/8] avcodec/flac: add AVX2 version of the 16-bit LPC encoder

2017-11-26 Thread Rostislav Pehlivanov
On 26 November 2017 at 22:51, James Darnley  wrote:

> When compared to the SSE4 version, runtime is reduced by 0.5 to 20%.
> After a bug fix long, long ago in e609cfd697 the 16-bit lpc encoder is
> used so little that the runtime reduction is no longer correct.  The
> function itself is around 2 times faster.  (As one might expect for
> doing twice as many samples every iteration.)
> ---
>  libavcodec/flacenc.c|  2 +-
>  libavcodec/x86/flac_dsp_gpl.asm | 32 +++-
>  libavcodec/x86/flacdsp_init.c   |  5 +
>  3 files changed, 33 insertions(+), 6 deletions(-)
>
> diff --git a/libavcodec/flacenc.c b/libavcodec/flacenc.c
> index 170c3caf48..cf25982c91 100644
> --- a/libavcodec/flacenc.c
> +++ b/libavcodec/flacenc.c
> @@ -88,7 +88,7 @@ typedef struct FlacSubframe {
>  uint64_t rc_sums[32][MAX_PARTITIONS];
>
>  int32_t samples[FLAC_MAX_BLOCKSIZE];
> -int32_t residual[FLAC_MAX_BLOCKSIZE+11];
> +int32_t residual[FLAC_MAX_BLOCKSIZE+23];
>  } FlacSubframe;
>
>  typedef struct FlacFrame {
> diff --git a/libavcodec/x86/flac_dsp_gpl.asm
> b/libavcodec/x86/flac_dsp_gpl.asm
> index e285158185..c461c666be 100644
> --- a/libavcodec/x86/flac_dsp_gpl.asm
> +++ b/libavcodec/x86/flac_dsp_gpl.asm
> @@ -24,7 +24,8 @@
>
>  SECTION .text
>
> -INIT_XMM sse4
> +%macro FUNCTION_BODY_16 0
> +
>  %if ARCH_X86_64
>  cglobal flac_enc_lpc_16, 5, 7, 8, 0, res, smp, len, order, coefs
>  DECLARE_REG_TMP 5, 6
> @@ -51,7 +52,7 @@ lea  resq,   [resq+orderq*4]
>  lea  smpq,   [smpq+orderq*4]
>  lea  coefsq, [coefsq+orderq*4]
>  sub  length,  orderd
> -movd m3,  r5m
> +movd xm3, r5m
>  neg  orderq
>
>  %define posj t0q
> @@ -65,8 +66,20 @@ neg  orderq
>  xor  negj, negj
>
>  .looporder:
> +%if cpuflag(avx)
> +vbroadcastss m2, [coefsq+posj*4]
> +%else
>  movd   m2, [coefsq+posj*4] ; c = coefs[j]
>  SPLATD m2
> +%endif
> +%if cpuflag(avx)
> +vpmulld m1, m2, [smpq+negj*4-4]
> +vpmulld m5, m2, [smpq+negj*4-4+mmsize]
> +vpmulld m7, m2, [smpq+negj*4-4+mmsize*2]
> +vpaddd  m0, m1
> +vpaddd  m4, m5
> +vpaddd  m6, m7
>

Same as the 32bit lpc avx2 patch


> +%else
>  movu   m1, [smpq+negj*4-4] ; s = smp[i-j-1]
>  movu   m5, [smpq+negj*4-4+mmsize]
>  movu   m7, [smpq+negj*4-4+mmsize*2]
> @@ -76,14 +89,15 @@ neg  orderq
>  paddd  m0,  m1 ; p += c * s
>  paddd  m4,  m5
>  paddd  m6,  m7
> +%endif
>
>  dec negj
>  inc posj
>  jnz .looporder
>
> -psrad  m0, m3  ; p >>= shift
> -psrad  m4, m3
> -psrad  m6, m3
> +psrad  m0, xm3  ; p >>= shift
> +psrad  m4, xm3
> +psrad  m6, xm3
>  movu   m1,[smpq]
>  movu   m5,[smpq+mmsize]
>  movu   m7,[smpq+mmsize*2]
> @@ -99,3 +113,11 @@ neg  orderq
>  sub length, (3*mmsize)/4
>  jg .looplen
>  RET
> +
> +%endmacro
> +
> +INIT_XMM sse4
> +FUNCTION_BODY_16
> +
> +INIT_YMM avx2
> +FUNCTION_BODY_16
> diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
> index 1971f81b8d..0a5c01859f 100644
> --- a/libavcodec/x86/flacdsp_init.c
> +++ b/libavcodec/x86/flacdsp_init.c
> @@ -28,6 +28,7 @@ void ff_flac_lpc_32_xop(int32_t *samples, const int coeffs[32], int order,
>  int qlevel, int len);
>
>  void ff_flac_enc_lpc_16_sse4(int32_t *, const int32_t *, int, int, const int32_t *,int);
> +void ff_flac_enc_lpc_16_avx2(int32_t *, const int32_t *, int, int, const int32_t *,int);
>
>  #define DECORRELATE_FUNCS(fmt, opt) \
>  void ff_flac_decorrelate_ls_##fmt##_##opt(uint8_t **out, int32_t **in, int channels, \
> @@ -110,6 +111,10 @@ av_cold void ff_flacdsp_init_x86(FLACDSPContext *c, enum AVSampleFormat fmt, int
>  if (CONFIG_GPL)
>  c->lpc16_encode = ff_flac_enc_lpc_16_sse4;
>  }
> +if (EXTERNAL_AVX2(cpu_flags)) {
> +if (CONFIG_GPL)
>

yeah, just combine them, if someone wants to add non-gpl asm this is the
least of their problems


> +c->lpc16_encode = ff_flac_enc_lpc_16_avx2;
> +}
>  #endif
>  #endif /* HAVE_X86ASM */
>  }
> --
> 2.15.0
>


Re: [FFmpeg-devel] [PATCH 4/8] avcodec/flac: partially unroll loop in flac_enc_lpc_32

2017-11-26 Thread Rostislav Pehlivanov
On 26 November 2017 at 22:51, James Darnley  wrote:

> Now does 6 samples per iteration, up from 2.
>
> From 1.6 to 2.1 times faster again.  2.5 to 3.9 times faster overall.
> Runtime is reduced by a further 4 to 17%.  Reduced by 9 to 65% overall.
>
> Same conditions as previously.
> ---
>  libavcodec/x86/flac_dsp_gpl.asm | 30 +-
>  1 file changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/libavcodec/x86/flac_dsp_gpl.asm
> b/libavcodec/x86/flac_dsp_gpl.asm
> index 618306eb5f..4d212ed212 100644
> --- a/libavcodec/x86/flac_dsp_gpl.asm
> +++ b/libavcodec/x86/flac_dsp_gpl.asm
> @@ -152,13 +152,13 @@ RET
>  %macro FUNCTION_BODY_32 0
>
>  %if ARCH_X86_64
> -cglobal flac_enc_lpc_32, 5, 7, 4, mmsize, res, smp, len, order, coefs
> +cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
>  DECLARE_REG_TMP 5, 6
>  %define length r2d
>
>  movsxd orderq, orderd
>  %else
> -cglobal flac_enc_lpc_32, 5, 6, 4, mmsize, res, smp, len, order, coefs
> +cglobal flac_enc_lpc_32, 5, 6, 8, mmsize, res, smp, len, order, coefs
>  DECLARE_REG_TMP 2, 5
>  %define length r2mp
>  %endif
> @@ -190,6 +190,8 @@ mova  [rsp],m4; save sign extend mask
>
>  .looplen:
>  pxor m0,   m0
> +pxor m4,   m4
> +pxor m6,   m6
>  mov  posj, orderq
>  xor  negj, negj
>
> @@ -197,23 +199,41 @@ mova  [rsp],m4; save sign extend mask
>  movd   m2,  [coefsq+posj*4] ; c = coefs[j]
>  SPLATD m2
>  pmovzxdq m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
> +pmovzxdq m5,  [smpq+negj*4-4+mmsize/2]
> +pmovzxdq m7,  [smpq+negj*4-4+mmsize]
>  pmuldq m1,   m2
> +pmuldq m5,   m2
> +pmuldq m7,   m2
>  paddq  m0,   m1 ; p += c * s
> +paddq  m4,   m5
> +paddq  m6,   m7
>
>  dec negj
>  inc posj
>  jnz .looporder
>
>  HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
> +HACK_PSRAQ m4, m3, [rsp], m2
> +HACK_PSRAQ m6, m3, [rsp], m2
>  CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
> +CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
> +CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
>  pshufd  m0,m0, q0020 ; pack into first 2 dwords
> +pshufd  m4,m4, q0020
> +pshufd  m6,m6, q0020
>  movh m1,   [smpq]
> +movh m5,   [smpq+mmsize/2]
> +movh m7,   [smpq+mmsize]
>  psubd   m1,m0   ; smp[i] - p
> +psubd   m5,m4
> +psubd   m7,m6
>  movh   [resq], m1   ; res[i] = smp[i] - (p >> shift)
> +movh   [resq+mmsize/2], m5
> +movh   [resq+mmsize], m7
>
> -add resq,   mmsize/2
> -add smpq,   mmsize/2
> -sub length, mmsize/8
> +add resq,   (3*mmsize)/2
> +add smpq,   (3*mmsize)/2
> +sub length, (3*mmsize)/8
>  jg .looplen
>  RET
>
> --
> 2.15.0
>

lgtm, tnx


Re: [FFmpeg-devel] [PATCH 3/8] avcodec/flac: add SSE4.2 version of the 32-bit lpc encoder

2017-11-26 Thread James Almer
On 11/26/2017 8:07 PM, Carl Eugen Hoyos wrote:
> 2017-11-26 23:51 GMT+01:00 James Darnley :
> 
>> +if (EXTERNAL_SSE42(cpu_flags)) {
>> +if (CONFIG_GPL)
>> +c->lpc32_encode = ff_flac_enc_lpc_32_sse42;
>> +}
> 
> Any objections over "if (CONFIG_GPL && EXTERNAL_..)"?
> 
> Carl Eugen

I prefer it as is. It's not only similar to other checks around it, but
also if someone decides to write an lgpl sse4.2 function they will not
have to change the existing statement or add a duplicate one.
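
The two forms discussed here select the same function today; the difference is only in how easily another function can be added later. A minimal sketch of both, using hypothetical stand-ins (`CONFIG_GPL_SKETCH`, `CPU_SSE42_SKETCH`, `lpc32_c`, `lpc32_gpl_sse42`) rather than the real FFmpeg macros and functions:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the build-time flag and the CPU feature bit. */
#define CONFIG_GPL_SKETCH 1
#define CPU_SSE42_SKETCH  0x1

typedef int (*lpc_fn)(void);
static int lpc32_c(void)         { return 0; }  /* plain C fallback */
static int lpc32_gpl_sse42(void) { return 1; }  /* GPL-only SIMD version */

/* Nested form, as in the patch: the CPU test is the outer block. */
static lpc_fn pick_nested(int cpu_flags)
{
    lpc_fn fn = lpc32_c;
    if (cpu_flags & CPU_SSE42_SKETCH) {
        if (CONFIG_GPL_SKETCH)
            fn = lpc32_gpl_sse42;
        /* an LGPL SSE4.2 function could be assigned here
         * without touching the CPU check above */
    }
    assert(fn != NULL);
    return fn;
}

/* Combined form, as suggested in the review. */
static lpc_fn pick_combined(int cpu_flags)
{
    lpc_fn fn = lpc32_c;
    if (CONFIG_GPL_SKETCH && (cpu_flags & CPU_SSE42_SKETCH))
        fn = lpc32_gpl_sse42;
    assert(fn != NULL);
    return fn;
}
```

Since the flag is a compile-time constant in this sketch, a compiler can fold either form to the same code, so the choice is about readability and future edits rather than speed.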


Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Darnley
On 2017-11-27 00:13, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley  wrote:
>> @@ -123,7 +123,10 @@ RET
>>  %endmacro
>>
>>  %macro PMINSQ 3
>> -pcmpgtq %3, %2, %1
>> +mova %3, %2
>> +; We cannot use the 3-operand format because the memory location cannot be
>> +; the second operand, only the third.
>> +pcmpgtq %3, %1
>>
> 
> I don't get it, how did it work before then?

Easy.  3-operand instructions were never generated using it meaning it
was always emulated with a move.
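
For anyone reading along, the shape of the macro after this change is "copy, compare, then select with and/andn/or". A scalar sketch of that sequence for a single 64-bit lane, written in C purely as an illustration (the helper names `cmpgt_mask` and `min_via_mask` are made up here, and the unsigned-to-signed conversion relies on the usual two's complement behaviour):

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit lane of a signed greater-than compare:
 * all ones if a > b, otherwise zero. */
static uint64_t cmpgt_mask(int64_t a, int64_t b)
{
    return a > b ? UINT64_MAX : 0;
}

/* Scalar model of the select sequence for one lane; returns min(a, b). */
static int64_t min_via_mask(int64_t a, int64_t b)
{
    uint64_t mask   = cmpgt_mask(b, a);        /* copy + compare: mask = (b > a) */
    uint64_t keep_a = (uint64_t)a & mask;      /* and:  a where a is the smaller */
    uint64_t keep_b = ~mask & (uint64_t)b;     /* andn: b everywhere else        */
    int64_t  r      = (int64_t)(keep_a | keep_b);  /* or: merge the two halves   */
    assert(r == (a < b ? a : b));
    return r;
}
```

The explicit copy into the temporary is what the move in the patch does; with it in place only 2-operand compares are needed.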

>> @@ -197,8 +200,20 @@ mova  [rsp],m4; save sign extend mask
>>  xor  negj, negj
>>
>>  .looporder1:
>> +%if cpuflag(avx)
>> +vbroadcastss m2, [coefsq+posj*4]
>> +%else
>>  movd   m2,  [coefsq+posj*4] ; c = coefs[j]
>>  SPLATD m2
>> +%endif
>> +%if cpuflag(avx)
>> +vpmuldq  m1, m2, [smpq+negj*4-4]
>> +vpmuldq  m5, m2, [smpq+negj*4-4+mmsize]
>> +vpmuldq  m7, m2, [smpq+negj*4-4+mmsize*2]
>> +vpaddq   m0, m1
>> +vpaddq   m4, m5
>> +vpaddq   m6, m7
>>
> 
> Why force VEX encoding for these instructions, on avx no less?

Not sure.  Legacy code written before I knew what I was doing?  Perhaps
some issue arose with the assembler or x86inc at that time and this is
how I worked around it.



Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Almer
On 11/26/2017 7:51 PM, James Darnley wrote:
> When compared to the SSE4.2 version, runtime is reduced by 1 to 26%.  The
> function itself is around 2 times faster.
> ---
>  libavcodec/x86/flac_dsp_gpl.asm | 56 
> +++--
>  libavcodec/x86/flacdsp_init.c   |  5 +++-
>  2 files changed, 47 insertions(+), 14 deletions(-)
> 
> diff --git a/libavcodec/x86/flac_dsp_gpl.asm b/libavcodec/x86/flac_dsp_gpl.asm
> index 91989ce560..749e66dec8 100644
> --- a/libavcodec/x86/flac_dsp_gpl.asm
> +++ b/libavcodec/x86/flac_dsp_gpl.asm
> @@ -22,11 +22,11 @@
>  
>  %include "libavutil/x86/x86util.asm"
>  
> -SECTION_RODATA
> +SECTION_RODATA 32
>  
> -pd_0_int_min: times  2 dd 0, -2147483648
> -pq_int_min:   times  2 dq -2147483648
> -pq_int_max:   times  2 dq  2147483647
> +pd_0_int_min: times  4 dd 0, -2147483648
> +pq_int_min:   times  4 dq -2147483648
> +pq_int_max:   times  4 dq  2147483647
>  
>  SECTION .text
>  
> @@ -123,7 +123,10 @@ RET
>  %endmacro
>  
>  %macro PMINSQ 3
> -pcmpgtq %3, %2, %1
> +mova %3, %2
> +; We cannot use the 3-operand format because the memory location cannot be
> +; the second operand, only the third.
> +pcmpgtq %3, %1
>  pand %1, %3
>  pandn   %3, %2
>  por %1, %3
> @@ -177,11 +180,11 @@ lea resq,   [resq+orderq*4]
>  lea smpq,   [smpq+orderq*4]
>  lea coefsq, [coefsq+orderq*4]
>  sub length,  orderd
> -movd   m3,  r5m
> +movd   xm3, r5m
>  neg orderq
>  
>  movu   m4, [pd_0_int_min] ; load 1 bit
> -psrad  m4,  m3; turn that into shift+1 bits
> +psrad  m4,  xm3   ; turn that into shift+1 bits
>  pslld  m4,  1 ; reduce that
>  mova  [rsp],m4; save sign extend mask
>  
> @@ -197,8 +200,20 @@ mova  [rsp],m4; save sign extend mask
>  xor  negj, negj
>  
>  .looporder1:
> +%if cpuflag(avx)

Either avx2, or check instead for mmsize == 32

> +vbroadcastss m2, [coefsq+posj*4]

vpbroadcastd. Or just use the VPBROADCASTD macro to cover both the avx2
and sse4 cases without ifdeffery.

> +%else
>  movd   m2,  [coefsq+posj*4] ; c = coefs[j]
>  SPLATD m2
> +%endif
> +%if cpuflag(avx)
> +vpmuldq  m1, m2, [smpq+negj*4-4]
> +vpmuldq  m5, m2, [smpq+negj*4-4+mmsize]
> +vpmuldq  m7, m2, [smpq+negj*4-4+mmsize*2]
> +vpaddq   m0, m1
> +vpaddq   m4, m5
> +vpaddq   m6, m7
> +%else
>  movu   m1,  [smpq+negj*4-4] ; s = smp[i-j-1]
>  movu   m5,  [smpq+negj*4-4+mmsize]
>  movu   m7,  [smpq+negj*4-4+mmsize*2]
> @@ -212,14 +227,15 @@ mova  [rsp],m4; save sign extend mask
>  paddq  m0,   m1 ; p += c * s
>  paddq  m4,   m5
>  paddq  m6,   m7
> +%endif
>  
>  dec negj
>  inc posj
>  jnz .looporder1
>  
> -HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
> -HACK_PSRAQ m4, m3, [rsp], m2
> -HACK_PSRAQ m6, m3, [rsp], m2
> +HACK_PSRAQ m0, xm3, [rsp], m2; p >>= shift
> +HACK_PSRAQ m4, xm3, [rsp], m2
> +HACK_PSRAQ m6, xm3, [rsp], m2
>  CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
>  CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
>  CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
> @@ -241,8 +257,20 @@ mova  [rsp],m4; save sign extend mask
>  xor  negj, negj
>  
>  .looporder2:
> +%if cpuflag(avx)
> +vbroadcastss m2, [coefsq+posj*4]

Same

> +%else
>  movd   m2,  [coefsq+posj*4] ; c = coefs[j]
>  SPLATD m2
> +%endif
> +%if cpuflag(avx)
> +vpmuldq  m1, m2, [smpq+negj*4]
> +vpmuldq  m5, m2, [smpq+negj*4+mmsize]
> +vpmuldq  m7, m2, [smpq+negj*4+mmsize*2]
> +vpaddq   m0, m1
> +vpaddq   m4, m5
> +vpaddq   m6, m7
> +%else
>  movu   m1,  [smpq+negj*4] ; s = smp[i-j-1]
>  movu   m5,  [smpq+negj*4+mmsize]
>  movu   m7,  [smpq+negj*4+mmsize*2]
> @@ -252,14 +280,15 @@ mova  [rsp],m4; save sign extend mask
>  paddq  m0,   m1 ; p += c * s
>  paddq  m4,   m5
>  paddq  m6,   m7
> +%endif
>  
>  dec negj
>  inc posj
>  jnz .looporder2
>  
> -HACK_PSRAQ m0, m3, [rsp], m2; p >>= shift
> -HACK_PSRAQ m4, m3, [rsp], m2
> -HACK_PSRAQ m6, m3, [rsp], m2
> +HACK_PSRAQ m0, xm3, [rsp], m2; p >>= shift
> +HACK_PSRAQ m4, xm3, [rsp], m2
> +HACK_PSRAQ m6, xm3, [rsp], m2
>  CLIPQ   m0,   [pq_int_min], [pq_int_max], m2 ; clip(p >> shift)
>  CLIPQ   m4,   [pq_int_min], [pq_int_max], m2
>  CLIPQ   m6,   [pq_int_min], [pq_int_max], m2
> @@ -300,3 +329,4 @@ FUNCTION_BODY_32
>  
>  INIT_YMM avx2
>  FUNCTION_BODY_16
> +FUNCTION_BODY_32
> diff --git a/libavcodec/x86/flacdsp_init.c b/libavcodec/x86/flacdsp_init.c
> index f827186c26..fbe70894a0 100644
> --- a/libavcodec/x86/flacdsp_init.c
> +++ b/libavcodec/x86/flacdsp_init.c
> @@ 

Re: [FFmpeg-devel] [PATCH 6/8] lavc/x86/flac_dsp_gpl: partially unroll 32-bit LPC encoder

2017-11-26 Thread James Darnley
On 2017-11-27 00:17, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley  wrote:
>> @@ -152,13 +152,13 @@ RET
>>  %macro FUNCTION_BODY_32 0
>>
>>  %if ARCH_X86_64
>> -cglobal flac_enc_lpc_32, 5, 7, 8, mmsize, res, smp, len, order, coefs
>> +cglobal flac_enc_lpc_32, 5, 7, 8, mmsize*4, res, smp, len, order, coefs
>>
> 
> Why x4, shouldn't this be x2?

I write 3 more mm registers to the stack.  The first slot holds the sign
extension mask for my hacked qword arithmetic shift, added in the first
32-bit patch.  The 3 new slots store the "odd" values created in the first
inner loop.

I admit that this is a rather ugly construction for a little speed gain
but I think I've seen other ugly things since writing this.
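
To make the stack usage concrete, here is a scalar C model of one loop iteration. It is only a sketch under stated assumptions: 16-byte registers (4 dwords each), 3 registers per iteration, and made-up names (`Scratch`, `residual`, `encode_block`). Slot 0 stands in for the sign extension mask and is not used by the scalar arithmetic; slots 1 to 3 hold the results of the first pass until the second pass interleaves them.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4, REGS = 3 };   /* assumed: 16-byte registers, 3 per iteration */

/* Scratch area modelled on the mmsize*4 reservation:
 * slot 0 = sign extension mask, slots 1..3 = first-pass output. */
typedef struct { int32_t slot[1 + REGS][LANES]; } Scratch;

/* Scalar residual: smp[i] - clip(sum(coefs[j] * smp[i-j-1]) >> shift). */
static int32_t residual(const int32_t *smp, const int32_t *coefs,
                        int order, int shift, int i)
{
    int64_t p = 0;
    for (int j = 0; j < order; j++)
        p += (int64_t)coefs[j] * smp[i - j - 1];
    p >>= shift;                 /* arithmetic shift assumed for negative p */
    if (p < INT32_MIN) p = INT32_MIN;
    if (p > INT32_MAX) p = INT32_MAX;
    return (int32_t)(smp[i] - p);
}

/* Produce LANES*REGS residuals; smp must have `order` valid samples before it. */
static void encode_block(int32_t *res, const int32_t *smp, const int32_t *coefs,
                         int order, int shift, Scratch *st)
{
    assert(order >= 1 && shift >= 0);

    /* Pass 1: lanes 0 and 2 of each register, parked in scratch slots 1..3. */
    for (int v = 0; v < REGS; v++)
        for (int l = 0; l < LANES; l += 2)
            st->slot[1 + v][l] = residual(smp, coefs, order, shift, v * LANES + l);

    /* Pass 2: lanes 1 and 3, interleaved with the parked values on output. */
    for (int v = 0; v < REGS; v++)
        for (int l = 0; l < LANES; l++)
            res[v * LANES + l] = (l & 1)
                ? residual(smp, coefs, order, shift, v * LANES + l)
                : st->slot[1 + v][l];
}
```

In this model the four slots correspond to the four register-sized areas reserved on the stack, which is where the x4 comes from.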



Re: [FFmpeg-devel] [PATCH 7/8] lavc/flacenc: add AVX2 version of the 32-bit LPC encoder

2017-11-26 Thread James Almer
On 11/26/2017 8:13 PM, Rostislav Pehlivanov wrote:
> On 26 November 2017 at 22:51, James Darnley  wrote:
> 
>> When compared to the SSE4.2 version, runtime is reduced by 1 to 26%.  The
>> function itself is around 2 times faster.
>> ---
>>  libavcodec/x86/flac_dsp_gpl.asm | 56 ++
>> +--
>>  libavcodec/x86/flacdsp_init.c   |  5 +++-
>>  2 files changed, 47 insertions(+), 14 deletions(-)
>>
>> diff --git a/libavcodec/x86/flac_dsp_gpl.asm
>> b/libavcodec/x86/flac_dsp_gpl.asm
>> index 91989ce560..749e66dec8 100644
>> --- a/libavcodec/x86/flac_dsp_gpl.asm
>> +++ b/libavcodec/x86/flac_dsp_gpl.asm
>> @@ -22,11 +22,11 @@
>>
>>  %include "libavutil/x86/x86util.asm"
>>
>> -SECTION_RODATA
>> +SECTION_RODATA 32
>>
>> -pd_0_int_min: times  2 dd 0, -2147483648
>> -pq_int_min:   times  2 dq -2147483648
>> -pq_int_max:   times  2 dq  2147483647
>> +pd_0_int_min: times  4 dd 0, -2147483648
>> +pq_int_min:   times  4 dq -2147483648
>> +pq_int_max:   times  4 dq  2147483647
>>
>>  SECTION .text
>>
>> @@ -123,7 +123,10 @@ RET
>>  %endmacro
>>
>>  %macro PMINSQ 3
>> -pcmpgtq %3, %2, %1
>> +mova %3, %2
>> +; We cannot use the 3-operand format because the memory location cannot be
>> +; the second operand, only the third.
>> +pcmpgtq %3, %1
>>
> 
> I don't get it, how did it work before then?
> 
> 
>>  pand %1, %3
>>  pandn   %3, %2
>>  por %1, %3
>> @@ -177,11 +180,11 @@ lea resq,   [resq+orderq*4]
>>  lea smpq,   [smpq+orderq*4]
>>  lea coefsq, [coefsq+orderq*4]
>>  sub length,  orderd
>> -movd   m3,  r5m
>> +movd   xm3, r5m
>>  neg orderq
>>
>>  movu   m4, [pd_0_int_min] ; load 1 bit
>> -psrad  m4,  m3; turn that into shift+1 bits
>> +psrad  m4,  xm3   ; turn that into shift+1 bits
>>  pslld  m4,  1 ; reduce that
>>  mova  [rsp],m4; save sign extend mask
>>
>> @@ -197,8 +200,20 @@ mova  [rsp],m4; save sign extend mask
>>  xor  negj, negj
>>
>>  .looporder1:
>> +%if cpuflag(avx)
>> +vbroadcastss m2, [coefsq+posj*4]
>> +%else
>>  movd   m2,  [coefsq+posj*4] ; c = coefs[j]
>>  SPLATD m2
>> +%endif
>> +%if cpuflag(avx)
>> +vpmuldq  m1, m2, [smpq+negj*4-4]
>> +vpmuldq  m5, m2, [smpq+negj*4-4+mmsize]
>> +vpmuldq  m7, m2, [smpq+negj*4-4+mmsize*2]
>> +vpaddq   m0, m1
>> +vpaddq   m4, m5
>> +vpaddq   m6, m7
>>
> 
> Why force VEX encoding for these instructions, on avx no less?

It's avx2 and using ymm regs, not avx.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] avcodec: Implement vp8 nvdec hwaccel

2017-11-26 Thread Philip Langdale
On Sun, 26 Nov 2017 22:35:58 +
Mark Thompson  wrote:

> On 26/11/17 22:04, Philip Langdale wrote:
> > Signed-off-by: Philip Langdale 
> > ---
> >  Changelog  |  2 +-
> >  configure  |  2 ++
> >  libavcodec/Makefile|  1 +
> >  libavcodec/hwaccels.h  |  1 +
> >  libavcodec/nvdec.c |  1 +
> >  libavcodec/nvdec_vp8.c | 97 ++
> >  libavcodec/version.h   |  3 +-
> >  libavcodec/vp8.c       |  6
> >  8 files changed, 111 insertions(+), 2 deletions(-)
> >  create mode 100644 libavcodec/nvdec_vp8.c
> > 
> > diff --git a/Changelog b/Changelog
> > index e3092e211f..4db1d57721 100644
> > --- a/Changelog
> > +++ b/Changelog
> > @@ -13,7 +13,7 @@ version :
> >  - PCE support for extended channel layouts in the AAC encoder
> >  - native aptX encoder and decoder
> >  - Raw aptX muxer and demuxer
> > -- NVIDIA NVDEC-accelerated H.264, HEVC, MPEG-1/2/4, VC1 and VP9 hwaccel decoding
> > +- NVIDIA NVDEC-accelerated H.264, HEVC, MPEG-1/2/4, VC1, VP8 and VP9 hwaccel decoding
> >  - Intel QSV-accelerated overlay filter
> >  - mcompand audio filter
> >  - acontrast audio filter
> > diff --git a/configure b/configure
> > index bc00b71489..e5fa61e83d 100755
> > --- a/configure
> > +++ b/configure
> > @@ -2748,6 +2748,8 @@ vc1_vaapi_hwaccel_deps="vaapi"
> >  vc1_vaapi_hwaccel_select="vc1_decoder"
> >  vc1_vdpau_hwaccel_deps="vdpau"
> >  vc1_vdpau_hwaccel_select="vc1_decoder"
> > +vp8_nvdec_hwaccel_deps="nvdec"
> > +vp8_nvdec_hwaccel_select="vp8_decoder"
> >  vp8_vaapi_hwaccel_deps="vaapi VAPictureParameterBufferVP8"
> >  vp8_vaapi_hwaccel_select="vp8_decoder"
> >  vp9_d3d11va_hwaccel_deps="d3d11va DXVA_PicParams_VP9"
> > diff --git a/libavcodec/Makefile b/libavcodec/Makefile
> > index 640edfb590..ca7960cdf4 100644
> > --- a/libavcodec/Makefile
> > +++ b/libavcodec/Makefile
> > @@ -872,6 +872,7 @@ OBJS-$(CONFIG_VC1_NVDEC_HWACCEL)  += nvdec_vc1.o
> >  OBJS-$(CONFIG_VC1_QSV_HWACCEL)    += qsvdec_other.o
> >  OBJS-$(CONFIG_VC1_VAAPI_HWACCEL)  += vaapi_vc1.o
> >  OBJS-$(CONFIG_VC1_VDPAU_HWACCEL)  += vdpau_vc1.o
> > +OBJS-$(CONFIG_VP8_NVDEC_HWACCEL)  += nvdec_vp8.o
> >  OBJS-$(CONFIG_VP8_VAAPI_HWACCEL)  += vaapi_vp8.o
> >  OBJS-$(CONFIG_VP9_D3D11VA_HWACCEL)+= dxva2_vp9.o
> >  OBJS-$(CONFIG_VP9_DXVA2_HWACCEL)  += dxva2_vp9.o
> > diff --git a/libavcodec/hwaccels.h b/libavcodec/hwaccels.h
> > index cefd2b15be..420e2feeea 100644
> > --- a/libavcodec/hwaccels.h
> > +++ b/libavcodec/hwaccels.h
> > @@ -60,6 +60,7 @@ extern const AVHWAccel ff_vc1_dxva2_hwaccel;
> >  extern const AVHWAccel ff_vc1_nvdec_hwaccel;
> >  extern const AVHWAccel ff_vc1_vaapi_hwaccel;
> >  extern const AVHWAccel ff_vc1_vdpau_hwaccel;
> > +extern const AVHWAccel ff_vp8_nvdec_hwaccel;
> >  extern const AVHWAccel ff_vp8_vaapi_hwaccel;
> >  extern const AVHWAccel ff_vp9_d3d11va_hwaccel;
> >  extern const AVHWAccel ff_vp9_d3d11va2_hwaccel;
> > diff --git a/libavcodec/nvdec.c b/libavcodec/nvdec.c
> > index da4451a739..c7a02ff40f 100644
> > --- a/libavcodec/nvdec.c
> > +++ b/libavcodec/nvdec.c
> > @@ -58,6 +58,7 @@ static int map_avcodec_id(enum AVCodecID id)
> >  case AV_CODEC_ID_MPEG2VIDEO: return cudaVideoCodec_MPEG2;
> >  case AV_CODEC_ID_MPEG4:  return cudaVideoCodec_MPEG4;
> >  case AV_CODEC_ID_VC1:return cudaVideoCodec_VC1;
> > +case AV_CODEC_ID_VP8:return cudaVideoCodec_VP8;
> >  case AV_CODEC_ID_VP9:return cudaVideoCodec_VP9;
> >  case AV_CODEC_ID_WMV3:   return cudaVideoCodec_VC1;
> >  }
> > diff --git a/libavcodec/nvdec_vp8.c b/libavcodec/nvdec_vp8.c
> > new file mode 100644
> > index 00..6fc0ac7ded
> > --- /dev/null
> > +++ b/libavcodec/nvdec_vp8.c
> > @@ -0,0 +1,97 @@
> > +/*
> > + * VP8 HW decode acceleration through NVDEC
> > + *
> > + * Copyright (c) 2017 Philip Langdale
> > + *
> > + * This file is part of FFmpeg.
> > + *
> > + * FFmpeg is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU Lesser General Public
> > + * License as published by the Free Software Foundation; either
> > > + * version 2.1 of the License, or (at your option) any later version.
> > > + *
> > > + * FFmpeg is distributed in the hope that it will be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > > + * Lesser General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU Lesser General Public
> > > + * License along with FFmpeg; if not, write to the Free Software
> > > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include "avcodec.h"
> > +#include "nvdec.h"
> > +#include "decode.h"
> > +#include "internal.h"
> > +#include "vp8.h"
> > +
> > +static unsigned char safe_get_ref_idx(VP8Frame *frame)
> > +{
> > +return frame ? ff_nvdec_g

Re: [FFmpeg-devel] [PATCH 1/2] avcodec/kgv1dec: Check that there is enough input for maximum RLE compression

2017-11-26 Thread Michael Niedermayer
On Wed, Nov 22, 2017 at 09:00:57PM +0100, Michael Niedermayer wrote:
> Fixes: Timeout
> Fixes: 4271/clusterfuzz-testcase-4676667768307712
> 
> Found-by: continuous fuzzing process 
> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/kgv1dec.c | 3 +++
>  1 file changed, 3 insertions(+)

will apply

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Opposition brings concord. Out of discord comes the fairest harmony.
-- Heraclitus




Re: [FFmpeg-devel] [PATCH] libavformat/mov: Replace duplicate stream_nb check by assert

2017-11-26 Thread Michael Niedermayer
On Wed, Nov 22, 2017 at 08:19:45PM +, Derek Buitenhuis wrote:
> On 11/22/2017 8:09 PM, Michael Niedermayer wrote:
> > not much, no
> > it's a non-static function though
> > I can remove the check completely if that's preferred?
> 
> I guess leave it since it's non-static.
> 
> LGTM.

ok, will apply

thx

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety -- Benjamin Franklin




Re: [FFmpeg-devel] [PATCH] avcodec/h264idct_template: Fix integer overflow in ff_h264_idct8_add

2017-11-26 Thread Michael Niedermayer
On Mon, Nov 20, 2017 at 02:58:15PM +0100, Michael Niedermayer wrote:
> Fixes: signed integer overflow: 452986184 - -2113885312 cannot be represented 
> in type 'int'
> Fixes: 4196/clusterfuzz-testcase-minimized-5580648594014208
> 
> Found-by: continuous fuzzing process 
> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/h264idct_template.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)

applied

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Let us carefully observe those good qualities wherein our enemies excel us
and endeavor to excel them, by avoiding what is faulty, and imitating what
is excellent in them. -- Plutarch




Re: [FFmpeg-devel] [PATCH] avcodec/mlpdsp: Fix signed integer overflow, 2nd try

2017-11-26 Thread Michael Niedermayer
On Mon, Nov 20, 2017 at 09:26:48PM +0100, Michael Niedermayer wrote:
> The outputted bits should match what is used in the lossless check
> 
> Fixes: runtime error: signed integer overflow: -538697856 * 256 cannot be 
> represented in type 'int'
> Fixes: 4326/clusterfuzz-testcase-minimized-5689449645080576
> 
> Found-by: continuous fuzzing process 
> https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
> Signed-off-by: Michael Niedermayer 
> ---
>  libavcodec/mlpdsp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

applied

[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes




Re: [FFmpeg-devel] [PATCH 3/3] tests/fate-run: Use -bitexact

2017-11-26 Thread Michael Niedermayer
On Sun, Oct 22, 2017 at 01:41:58AM +0200, Michael Niedermayer wrote:
> Signed-off-by: Michael Niedermayer 
> ---
>  tests/fate-run.sh | 24 
>  1 file changed, 12 insertions(+), 12 deletions(-)

will apply


[...]
-- 
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Frequently ignored answer#1 FFmpeg bugs should be sent to our bugtracker. User
questions about the command line tools should be sent to the ffmpeg-user ML.
And questions about how to use libav* should be sent to the libav-user ML.




Re: [FFmpeg-devel] Added HW H.264 and HEVC encoding for AMD GPUs based on AMF SDK

2017-11-26 Thread Mark Thompson
On 22/11/17 23:28, mmironov wrote:
> From c669277afd764903d3da09d92a263d0fb58e24b1 Mon Sep 17 00:00:00 2001
> From: mmironov 
> Date: Tue, 14 Nov 2017 17:54:24 -0500
> Subject: [PATCH] Added HW H.264 and HEVC encoding for AMD GPUs based on AMF
>  SDK
> 
> Signed-off-by: mmironov 
> ---
>  Changelog|1 +
>  compat/amd/amfsdkenc.h   | 1755 
> ++
>  configure|   18 +-
>  libavcodec/Makefile  |4 +
>  libavcodec/allcodecs.c   |2 +
>  libavcodec/amfenc.c  |  596 
>  libavcodec/amfenc.h  |  143 
>  libavcodec/amfenc_h264.c |  397 +++
>  libavcodec/amfenc_hevc.c |  327 +
>  9 files changed, 3242 insertions(+), 1 deletion(-)
>  create mode 100644 compat/amd/amfsdkenc.h
>  create mode 100644 libavcodec/amfenc.c
>  create mode 100644 libavcodec/amfenc.h
>  create mode 100644 libavcodec/amfenc_h264.c
>  create mode 100644 libavcodec/amfenc_hevc.c

A few minor fixups below.  I would be happy to apply this if it didn't contain 
the external header.

Thanks,

- Mark


> diff --git a/Changelog b/Changelog
> index 68829f2..e5e5ffd 100644
> --- a/Changelog
> +++ b/Changelog
> @@ -15,6 +15,7 @@ version :
>  - Raw aptX muxer and demuxer
>  - NVIDIA NVDEC-accelerated H.264, HEVC and VP9 hwaccel decoding
>  - Intel QSV-accelerated overlay filter
> +- AMD NW H.264 and HEVC encoders

NW?

>  
>  
>  version 3.4:
> diff --git a/compat/amd/amfsdkenc.h b/compat/amd/amfsdkenc.h
> new file mode 100644
> index 000..282656d
> --- /dev/null
> +++ b/compat/amd/amfsdkenc.h
> @@ -0,0 +1,1755 @@
> ...
> diff --git a/configure b/configure
> index 3788f26..a562a2a 100755
> --- a/configure
> +++ b/configure
> @@ -303,6 +303,7 @@ External library support:
>--disable-zlib   disable zlib [autodetect]
>  
>The following libraries provide various hardware acceleration features:
> +  --disable-amfdisable AMF video encoding code [autodetect]
>--disable-audiotoolbox   disable Apple AudioToolbox code [autodetect]
>--disable-cuda   disable dynamically linked Nvidia CUDA code 
> [autodetect]
>--enable-cuda-sdkenable CUDA features that require the CUDA SDK 
> [no]
> @@ -1639,6 +1640,7 @@ EXTERNAL_LIBRARY_LIST="
>  "
>  
>  HWACCEL_AUTODETECT_LIBRARY_LIST="
> +amf
>  audiotoolbox
>  crystalhd
>  cuda
> @@ -2781,12 +2783,15 @@ scale_npp_filter_deps="cuda libnpp"
>  scale_cuda_filter_deps="cuda_sdk"
>  thumbnail_cuda_filter_deps="cuda_sdk"
>  
> +amf_deps_any="libdl LoadLibrary"
> +
>  nvenc_deps="cuda"
>  nvenc_deps_any="libdl LoadLibrary"
>  nvenc_encoder_deps="nvenc"
>  
>  h263_v4l2m2m_decoder_deps="v4l2_m2m h263_v4l2_m2m"
>  h263_v4l2m2m_encoder_deps="v4l2_m2m h263_v4l2_m2m"
> +h264_amf_encoder_deps="amf"
>  h264_crystalhd_decoder_select="crystalhd h264_mp4toannexb_bsf h264_parser"
>  h264_cuvid_decoder_deps="cuvid"
>  h264_cuvid_decoder_select="h264_mp4toannexb_bsf"
> @@ -2803,6 +2808,7 @@ 
> h264_vaapi_encoder_deps="VAEncPictureParameterBufferH264"
>  h264_vaapi_encoder_select="cbs_h264 vaapi_encode"
>  h264_v4l2m2m_decoder_deps="v4l2_m2m h264_v4l2_m2m"
>  h264_v4l2m2m_encoder_deps="v4l2_m2m h264_v4l2_m2m"
> +hevc_amf_encoder_deps="amf"
>  hevc_cuvid_decoder_deps="cuvid"
>  hevc_cuvid_decoder_select="hevc_mp4toannexb_bsf"
>  hevc_mediacodec_decoder_deps="mediacodec"
> @@ -6164,9 +6170,12 @@ if enabled x86; then
>  mingw32*|mingw64*|win32|win64|linux|cygwin*)
>  ;;
>  *)
> -disable cuda cuvid nvdec nvenc
> +disable cuda cuvid nvdec nvenc amf
>  ;;
>  esac
> +if test $target_os = "linux"; then
> +disable amf
> +fi
>  else
>  disable cuda cuvid nvdec nvenc

amf here too?

>  fi
> @@ -6179,6 +6188,13 @@ void f(void) { struct { const GUID guid; } s[] = { { 
> NV_ENC_PRESET_HQ_GUID } };
>  int main(void) { return 0; }
>  EOF
>  
> +enabled amf &&
> +check_cc -I$source_path <<EOF
> +#include "compat/amd/amfsdkenc.h"
> +AMFFactory *factory;
> +int main(void) { return 0; }
> +EOF
> +
>  # Funny iconv installations are not unusual, so check it after all flags 
> have been set
>  if enabled libc_iconv; then
>  check_func_headers iconv.h iconv
> diff --git a/libavcodec/Makefile b/libavcodec/Makefile
> index 2476aec..9bbb60e 100644
> --- a/libavcodec/Makefile
> +++ b/libavcodec/Makefile
> @@ -55,6 +55,7 @@ OBJS = ac3_parser.o 
> \
>  OBJS-$(CONFIG_AANDCTTABLES)+= aandcttab.o
>  OBJS-$(CONFIG_AC3DSP)  += ac3dsp.o ac3.o ac3tab.o
>  OBJS-$(CONFIG_ADTS_HEADER) += adts_header.o mpeg4audio.o
> +OBJS-$(CONFIG_AMF) += amfenc.o
>  OBJS-$(CONFIG_AUDIO_FRAME_QUEUE)   += audio_frame_queue.o
>  OBJS-$(CONFIG_AUDIODSP)+= audiodsp.o
>  OBJS-$(CONFIG_BLOCKDSP)+= blockdsp.o
> @@ -332,6 +333,7 @@ OBJS-$(CONFIG_H263_ENCODER)+= mpeg
