Morning all

Since PHP 7.1 the unpack() function has had a (still undocumented) optional
third argument that lets the caller specify the offset in the input data
where parsing should start. While this is a useful feature, unpack() itself
gives no indication of how many bytes of the input were consumed for some
format specifiers, such as Z*, f, d and anything else that does not consume
a universally constant amount of data.

It is typically possible to determine this externally, but not without some
clumsy measurement: either of the returned value or, in the case of
system-dependent numeric types, of the length of the string returned by
pack() for those specifiers. It can also get complicated when using things
like x and X, which adjust the offset without producing data in the
returned value.

Additionally, computing the new position in the input buffer separately
from the format string risks the two diverging if one is modified and the
other is either not updated, or updated incorrectly.

Many binary data formats are sufficiently complex that unpacking a large
structure requires multiple calls to unpack(), as often there are nuances
that cannot be directly expressed with the current specifier format, such
as strings prefixed with a length indicator.
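
To illustrate the length-prefix case, here is a minimal sketch. The record
layout (a 16-bit little-endian length, the string bytes, then a 32-bit
little-endian id) and the names len, text and id are assumptions made up for
this example, not taken from any real format:

```php
<?php
// Hypothetical record: uint16 LE length, <length> bytes of text, uint32 LE id.
$data = pack('v', 5) . 'hello' . pack('V', 42);

$offset = 0;

// First call: read the length prefix. 'v' is always 2 bytes wide.
$head = unpack('vlen', $data, $offset);
$offset += 2;

// Second call: the string width is only known now, so the format string
// has to be built at runtime from the value just read.
$body = unpack('a' . $head['len'] . 'text/Vid', $data, $offset);
$offset += $head['len'] + 4; // a<len> plus the 4 bytes of 'V'

var_dump($body['text'], $body['id'], $offset);
```

Both offset adjustments are maintained by hand alongside the format strings,
which is the part that has to be kept in sync manually.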

Here is some code that demonstrates the problem:

    /* This is the only way to know for certain how big a float is on the
       local system */
    define('FLOAT_WIDTH', strlen(pack('f', 0.0)));

    /* an exaggerated example using two variable width codes and a code that
       does not produce output but modifies the input buffer offset
       (elements are named so the results do not overwrite each other) */
    $pieces = unpack('fvalue/X/Z*name', $data, $offset);

    /* we now have to modify the offset before we can continue to unpack
       data */
    $offset += FLOAT_WIDTH              // f
             - 1                        // X
             + strlen($pieces['name']); // Z*

I would like to look at adding a fourth optional argument, taken by
reference, which will be populated with the number of buffer bytes consumed
by the unpack() operation. This would enable the above code to be rewritten
like so:

    $pieces = unpack('fvalue/X/Z*name', $data, $offset, $consumed);
    $offset += $consumed;

Not only is this code much simpler and less susceptible to breakage, it is
(IMHO) clearer to read as well.

Does anyone have any objections to/thoughts about this? If not I will work
up a patch in the coming week.

Thanks, Chris
