On 3/14/2021 7:34 AM, Anton Khirnov wrote:
Quoting James Almer (2021-03-12 18:24:47)

The padding exists AFAIK because something, like an optimized bitstream
reader that tries to process several bytes at the same time, may end up
reading or writing pass the reported end of the buffer.
That means that if reading and it's not all zeroes, it could in theory
have unpredictable results. Hence why everything always zeroes the
padding, even av_shrink_packet().

On that topic, I've been wondering about eliminating this requirement.
Does anyone know which code it is precisely that depends on the padding
being zeroed? Is this optimization really worth it?
It seems rather silly to jump through all these hoops for an
unmeasurable benefit in one decoder.

Some subtitle demuxers didn't look at pkt->size and depended on the padding to work as a 0 string terminator, but Marton fixed that in a patchset sent yesterday.

Beyond that, i think the v210 decoder and encoder read and write past the end of the buffer if you use the simd functions.


It would also give us zero-copy packet splitting.


And yes, what you describe is what some bitstream filters and the
matroska demuxer do. They just create several packet references pointing
to the same data buffer, but using different offsets for the data
pointer. They all have "padding", but in many cases said padding is the
beginning of the payload of another packet, and that's not ideal.



I'm sure there are other similar valid scenarios where attempting to
shrink a non writable packet is not "bad API usage".


If you want to shrink a writable packet, then maybe you don't even
need zero padding, because the buffer possibly already contains some
defined value, and the main reason of zero padding is avoiding
reading uninitialized data...

If i allocate a payload of size 1024, ultimately fill only 512 bytes,
then resize it to that value (typical scenario in lavf demuxers), if i
don't zero the bytes after the 512 offset then they will remain
uninitialized.

I am not sure I see your point here, if the data after the padding is
uninitalized, that is not a problem. I made a typo above, and meant
non-writable packet in my comment. And my reasoning is that if a packet
is non-writable, it already contains initialized data with a zero
padding. If you shrink that by reducing pkt->size only, you will not
have uninitialized data, only the padding will not be zeroed. And that
is preferable to copying the data only for zeroing the padding, because
- as far as I know - the padding does not have to be zeroed, it only
have to be initialized.

I agree that it is not nice that av_shrink_packet() writes something to
the packet, people may not think about it and misuse it instead of
directly decreaseing pkt->size when they need a partial packet. That is
why I suggested the assert(). One might argue that it kind of breaks
behaviour, but I'd say it does not break it too much, and writing to a
non-writable packet was broken in the first place.

Alternatively we can change av_shrink_packet() to only zero the padding
if the packet is writable, which has the benefit that it will do what
people generally expect, no matter if you throw a writable or a
non-writable packet to it.

A third alternative is to introduce void av_shrink_packet2() in which
you explicitly document that a writable packet is needed and do the
assert there, and deprecate av_shrink_packet().

Not a fan of functions with a 2 suffix. And to be honest, I really don't
care about what we do with av_shrink_packet().
Do you want to keep it around? Ok. Want to deprecate and remove it?
Better. Want to add an assert to it? Not a fan and it may result in
unexpected crashes for library users, but whatever.

I would suggest keeping av_shrink_packet() with a big scary warning that
it does not check for ownership and it is the caller's responsibility to
ensure that writing to the packet is safe.

If we can remove the zero-padding requirement, or the padding requirement altogether, then that would no longer be necessary.


Also, I'd like to emphasise that av_*_is_writable() is not gospel. It is
merely a convention that is used "by default", when you don't have more
accurate information.
Some bits of code may use other conventions and consider a buffer
writable even if av_buffer_is_writable() returns 0. For example this is
at the core of frame threading, where a reference to a frame currently
being decoded is propagated to other threads before decoding finishes.
The owner thread then writes to the frame because frame-mt conventions
allow it to, even though there are multiple references to the frame.


_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to