On 07/08/2016 10:17 PM, Nicolas George wrote:
On Primidi, 21 Messidor, year CCXXIV, Jörn Heusipp wrote:

Regarding AVProbeData:
Looking at AVProbeData, I can see no (optional) field describing the file
size:
typedef struct AVProbeData {
    const char *filename;
    unsigned char *buf;    /**< Buffer must have AVPROBE_PADDING_SIZE of extra allocated bytes filled with zero. */
    int buf_size;          /**< Size of buf except extra allocated bytes */
    const char *mime_type; /**< mime_type, when known. */
} AVProbeData;
Sadly, that makes it rather useless for probing module formats. Some module
formats have no magic bytes at all, or very weak ones that require verifying
other simple parts of the header before any meaningful probing result can be
determined. Some even require seeking through the file and verifying later
parts, and some have a footer that may need to be verified.
Because of this situation, the libopenmpt I/O layer absolutely needs to know
the file size in order to do anything useful. In our adaptation layer for
streams (like stdin or HTTP without length information), we lazily pre-cache
the whole file until hitting EOF as soon as the size is required or the code
wants to seek to the end. As any kind of streaming does not apply to module
formats, this was (and is) a sane design choice for libopenmpt.

The probing infrastructure cannot provide the file size: for all we know, it
could be an infinite stream coming from a live capture device.

Well, I think it could optionally provide it if it knows, as an additional hint to probe functions. I do understand, however, that supporting probing for such non-streamable formats is probably not a primary design goal of ffmpeg. I'm not arguing for changing ffmpeg internals here.


Naively, I would suggest assuming that AVProbeData contains the whole file
and try probing that. That would be like trying to play a file that was
truncated on disk for some reason. If the probing fails, libavformat will
read some more and try again until it reaches the probe size limit.

I was not aware of that strategy used by ffmpeg. Skimming through the ffmpeg code, you seem to be referring to av_probe_input_buffer2().
I think your suggestion can work.
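
To make the retry strategy concrete, here is a minimal sketch of "probe a prefix as if it were the whole file, and read more if that fails". It is not FFmpeg's actual code: the names probe_fn, probe_with_growing_buffer and example_probe are hypothetical, and doubling the window on each retry is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical probe callback: returns a score > 0 if buf[0..size) looks
 * like a supported file when treated as if it were the complete file. */
typedef int (*probe_fn)(const unsigned char *buf, size_t size);

/* Hypothetical example probe: reports a match only when a marker byte at
 * offset 3000 is visible in the window it was given. */
static int example_probe(const unsigned char *buf, size_t size)
{
    return (size > 3000 && buf[3000] == 0x7F) ? 100 : 0;
}

/* Probe a growing prefix of the input. The window is doubled after each
 * failed attempt until the probe succeeds, the data runs out, or the
 * window reaches max_probe_size. Returns the first positive score, or 0. */
static int probe_with_growing_buffer(const unsigned char *data, size_t data_size,
                                     size_t initial_size, size_t max_probe_size,
                                     probe_fn probe)
{
    size_t limit  = data_size < max_probe_size ? data_size : max_probe_size;
    size_t window = initial_size < limit ? initial_size : limit;

    if (window == 0 && limit > 0)
        window = 1; /* avoid getting stuck on a zero-sized window */

    for (;;) {
        int score = probe(data, window);
        if (score > 0)
            return score;
        if (window >= limit)
            return 0; /* nothing more to show the probe function */
        window = (window > limit / 2) ? limit : window * 2;
    }
}
```

With a probe function that needs to see offset 3000, a 2048-byte first attempt fails and the 4096-byte retry succeeds, provided the probe size limit allows it.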


For an obscure format (I mean it without any disrespect, only as a fact with
regard to FFmpeg's usual use), false negatives on the probing are not that
big a problem: it will fall back to using the extension, and if even that
fails the user can still specify "-f libopenmpt" or the equivalent in their
application.

These formats are obscure, but they exist, are old, and are not going to change, so we are stuck with this mess (no disrespect taken ;). If false negatives are acceptable for ffmpeg, your suggestion of just pretending that AVProbeData contains the whole file will work for almost all cases. I do not think that we (libopenmpt) can guarantee that this will not also introduce false positives, though they should be really rare. I cannot think of such a case right now, but I also do not think that we want to make the promise made by our API more specific for the truncated-data case. I also was not aware of the fallback to the file extension, which totally makes sense. Specifying libopenmpt explicitly would of course be easy for users of the ffmpeg program itself, but maybe not so much for users who use libavformat through some other library, framework, or player.

I did a really quick, non-exhaustive check on some files, and libopenmpt currently gives pretty reliable positive probing results for files truncated to 4096 bytes. 1024 is not sufficient as standard ProTracker MOD files have the magic bytes "M.K." at offset 1080.
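
As a small illustration of why 1024 bytes cannot be enough for this case: in the commonly documented 31-sample MOD layout, offset 1080 is 20 (song name) + 31 * 30 (sample headers) + 2 (song length and restart byte) + 128 (order list). The sketch below only checks the "M.K." tag; the function name check_mk_magic is hypothetical and this is not how libopenmpt implements its probing.

```c
#include <stddef.h>
#include <string.h>

#define MOD_MAGIC_OFFSET 1080 /* 20 + 31 * 30 + 2 + 128 */
#define MOD_MAGIC_SIZE   4

/* Returns 1 if "M.K." is present at offset 1080, 0 if it is absent,
 * and -1 if the buffer is too short to decide either way. */
static int check_mk_magic(const unsigned char *buf, size_t size)
{
    if (size < MOD_MAGIC_OFFSET + MOD_MAGIC_SIZE)
        return -1;
    return memcmp(buf + MOD_MAGIC_OFFSET, "M.K.", MOD_MAGIC_SIZE) == 0;
}
```

A 1024-byte buffer always yields -1 here, so a probe limited to that size can never confirm a ProTracker MOD by its magic bytes.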

Calling openmpt_could_open_propability() only once more than 4096 bytes are available would be a trade-off. On the one hand, it would improve performance (libopenmpt would not be called at all if other ffmpeg demuxers had already probed successfully with less data) and might reduce false positives. On the other hand, it would increase false negatives: there are perfectly valid and usable module files smaller than 4096 bytes, and for some formats even files smaller than 1024 bytes are possible.


Regarding probing performance:

A thing to consider when discussing probing performance: the most important
thing is the speed of the obviously negative answers. A Matroska file or an MP3
file does not look at all like a tracker file, and libopenmpt should be able
to figure it out very quickly. Only when the file actually looks like a
tracker file should it make extra checks to be sure.

If that condition is met, then probing performance is probably not an issue:
users playing non-tracker files will not suffer from it.

I wish we could have this kind of early-reject in our probing function, but module formats without any file magic bytes at all prohibit such an implementation.

For formats which have magic bytes, we of course check these first and continue with the next format if they do not match. We still have to check the formats which have no or only bad magic numbers afterwards, though. The format used by the original Ultimate Soundtracker (M15: a 4-channel, 15-sample MOD format without any file magic bytes at all) is the most problematic here (the following explanation may require rough prior knowledge of the fundamental concepts of module files). It starts with 20 bytes of supposedly ASCII characters which represent the song name, followed by 15 sample headers, each containing another 22 bytes of name plus length, volume, tuning, and loop information. That is followed by the pattern order list, then the raw pattern data, and finally the raw sample data.
For probing, libopenmpt looks at the song name, sample headers, and order list and checks everything for plausibility using heuristics. We can reject some cases where values have a limited allowed range, but ultimately it ends up being just guessing. See CSoundFile::ReadM15 in https://source.openmpt.org/browse/openmpt/trunk/OpenMPT/soundlib/Load_mod.cpp. Whatever smart heuristics we come up with, this prohibits any kind of early reject and is basically guaranteed to trigger false positives in some cases.
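
To illustrate what such plausibility guessing looks like, here is a deliberately simplified sketch. It is not the heuristic from CSoundFile::ReadM15. The field sizes are assumptions taken from the commonly documented MOD layout (30-byte sample headers with the volume at byte 25, a song length byte and one further byte before a 128-entry order list), and the chosen limits (volume at most 64, pattern numbers below 64) as well as all names are hypothetical.

```c
#include <stddef.h>

enum {
    M15_TITLE_SIZE         = 20,
    M15_NUM_SAMPLES        = 15,
    M15_SAMPLE_HEADER_SIZE = 30,  /* assumed: name 22, length 2, tuning 1, volume 1, loop 4 */
    M15_SAMPLE_NAME_SIZE   = 22,
    M15_VOLUME_OFFSET      = 25,  /* assumed position inside a sample header */
    M15_ORDER_ENTRIES      = 128,
    M15_SONG_LENGTH_OFFSET = M15_TITLE_SIZE + M15_NUM_SAMPLES * M15_SAMPLE_HEADER_SIZE, /* 470 */
    M15_ORDERS_OFFSET      = M15_SONG_LENGTH_OFFSET + 2,                                /* 472 */
    M15_HEADER_SIZE        = M15_ORDERS_OFFSET + M15_ORDER_ENTRIES                      /* 600 */
};

/* A text field is accepted if every byte is NUL or printable ASCII. */
static int is_plausible_text(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0 && (p[i] < 0x20 || p[i] > 0x7E))
            return 0;
    return 1;
}

/* Returns 1 if the header passes all range checks, 0 if any check fails,
 * and -1 if fewer than M15_HEADER_SIZE bytes are available. */
static int m15_header_plausible(const unsigned char *buf, size_t size)
{
    if (size < M15_HEADER_SIZE)
        return -1;
    if (!is_plausible_text(buf, M15_TITLE_SIZE))
        return 0;
    for (int i = 0; i < M15_NUM_SAMPLES; i++) {
        const unsigned char *s = buf + M15_TITLE_SIZE + i * M15_SAMPLE_HEADER_SIZE;
        if (!is_plausible_text(s, M15_SAMPLE_NAME_SIZE))
            return 0;
        if (s[M15_VOLUME_OFFSET] > 64)
            return 0;
    }
    if (buf[M15_SONG_LENGTH_OFFSET] < 1 || buf[M15_SONG_LENGTH_OFFSET] > M15_ORDER_ENTRIES)
        return 0;
    for (int i = 0; i < M15_ORDER_ENTRIES; i++)
        if (buf[M15_ORDERS_OFFSET + i] >= 64)
            return 0;
    return 1;
}
```

Note that 600 bytes of zeros with a single byte set to 1 at offset 470 already pass every check, and that no check can reject the data before most of the header has been examined. Both effects are the ones described above: no early reject, and unavoidable false positives.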

As we are (for historic reasons) currently using the same code paths for probing as for actually loading a module file, and just return early after the initial checks, we still instantiate all kinds of internal structures, which involves multiple memory allocations, before even looking at the file data at all. In theory, we could improve this by duplicating the probing code into some explicit probing functionality, but this is probably not something we will do in the short term, and we may not do it at all.


Another possibility, in case the libopenmpt probing turns out to be too slow to be acceptable for ffmpeg, might be to check directly in the ffmpeg libopenmpt demuxer for magic bytes of formats that turn out to be wrongly detected as module files too often, before even calling openmpt_could_open_propability() at all.
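
A minimal sketch of such a pre-filter follows. The list of signatures is purely illustrative (a real list would come from observed false positives), and the names known_non_module_magics and has_known_non_module_magic are hypothetical.

```c
#include <stddef.h>
#include <string.h>

struct magic {
    const char *bytes;
    size_t      len;
};

/* Illustrative signatures of common non-module formats. */
static const struct magic known_non_module_magics[] = {
    { "\x1A\x45\xDF\xA3", 4 }, /* EBML header (Matroska, WebM) */
    { "ID3",              3 }, /* ID3v2 tag, typically in front of MP3 data */
    { "OggS",             4 }, /* Ogg page */
    { "fLaC",             4 }, /* FLAC stream marker */
};

/* Returns 1 if the buffer starts with one of the listed signatures, so the
 * caller can skip the more expensive libopenmpt probing; 0 otherwise. */
static int has_known_non_module_magic(const unsigned char *buf, size_t size)
{
    size_t count = sizeof known_non_module_magics / sizeof known_non_module_magics[0];

    for (size_t i = 0; i < count; i++) {
        const struct magic *m = &known_non_module_magics[i];
        if (size >= m->len && memcmp(buf, m->bytes, m->len) == 0)
            return 1;
    }
    return 0;
}
```

The trade-off is that a magic-less module whose song name happens to start with one of these byte sequences would then be rejected, so every entry adds a small risk of false negatives.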


In case libopenmpt tends to detect too many false positives, this could be handled by tuning the relative probing scores of libopenmpt and possibly other affected demuxers.
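
One way to picture that tuning, as a sketch only: map the probability reported by libopenmpt onto a demuxer score with an adjustable ceiling. This assumes the probability lies between 0.0 and 1.0; probability_to_score and its max_score parameter are hypothetical, and which ceiling would be appropriate is left open.

```c
/* Map a probability in [0.0, 1.0] to an integer score in [0, max_score],
 * rounding to the nearest integer. Values outside the range are clamped,
 * and NaN is treated as "no match". */
static int probability_to_score(double probability, int max_score)
{
    if (!(probability > 0.0)) /* also true for NaN */
        return 0;
    if (probability > 1.0)
        probability = 1.0;
    return (int)(probability * max_score + 0.5);
}
```

Lowering max_score for libopenmpt would let demuxers with strong magic bytes win whenever both claim the same data, while a module that nothing else recognizes would still be detected.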


By the way, any thought on getting openmpt into Debian? libmodplug has the
very significant advantage of only being an apt-get install away.

It's on the roadmap, and we are in contact with a fellow Debian developer, but he has been rather busy with other things.


Regards,
Jörn
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
