On Tue, Sep 14, 2021 at 06:19:03PM +0000, Glenn Washburn wrote: > On Tue, 14 Sep 2021 16:27:55 +0200 > Daniel Kiper <dki...@net-space.pl> wrote: > > > On Fri, Sep 10, 2021 at 04:03:23PM +0000, Glenn Washburn wrote: > > > This code was broken by commit 3f05d693 ("malloc: Use overflow > > > checking primitives where we do complex allocations"), which added > > > overflow checking in many areas. The problem here is that the > > > changes update the local variable sz, which was already in use and > > > which was not updated before the change. So the code using sz was > > > getting a different value of than it would have previously for the > > > same UDF image. This causes the logic getting the destination of > > > the symlink to not realize that its gotten the full destination, > > > but keeps trying to read past the end of the destination. The bytes > > > after the end are generally NULL padding bytes, but that's not a > > > valid component type (ECMA-167 14.16.1.1). So grub_udf_read_symlink > > > branches to error logic, returning NULL, instead of the symlink > > > destination path. > > > > > > The result of this bug is that the UDF filesystem tests were > > > failing in the symlink test with the grub-fstest error message: > > > > > > grub-fstest: error: cannot open `(loop0)/sym': invalid symlink. > > > > > > This change stores the result of doubling sz in another local > > > variable s, so as not to modify sz. Also remove unnecessary > > > grub_add, which increased the output by 1 to account for a NULL > > > byte. This isn't needed because an output buffer of size twice sz > > > is already guaranteed to be more than enough to contain the path > > > components converted to UTF-8. The worst case upper- bound for the > > > needed output buffer size is (sz-4)*1.5, where 4 is the size > > > > I think 4 comes from ECMA-167 spec. Could you add a reference to it > > here? The number of paragraph would be perfect... > > Its 14.16.1 basically in the same place as the reference earlier, which > is why I didn't include it. But, yes, I can include it.
Yes, please. > > > of a path component header and 1.5 is the maximum growth in bytes > > > when converting from 2-byte unicode code-points to UTF-8 (from 2 > > > bytes to 3). > > > > Could you explain how did you come up with the 1.5 value? It would be > > nice if you refer to a spec or something like that. > > There is no spec that I know of (but would be happy to know of one). Its > something I've deduced based on my understanding of Unicode, UTF-8, and > UTF-16. All unicode code points less than or equal to 2 bytes (code > points <0x10000) can be represented in UTF-8 by a maximum of 3 bytes > [1]. Longer UTF-16 encodings don't matter because those will be 4 bytes > or longer. The maximum number of bytes for a UTF-8 encoding of a unicode The [1] says: Since Errata DCN-5157, the range of code points was expanded to all code points from Unicode 4.0 (or any newer or older version), which includes Plane 1-16 characters such as Emoji. So, I think your assumption about longer encodings is incorrect for the UDF. > code point is 4 bytes. So the longer UTF-16 encodings can only be equal > to or longer than the UTF-8 encoding, thus the UTF-16 -> UTF-8 would be > shrinking or maintaining the length of the original byte string. Since > the worst case growth is 2 bytes to 3, that's 1.5 times the original > string size. QED. > > Do you want all that in there? I could just remove that part about the > 1.5 too. > > Here's an SO question that addresses this. Yes, unofficial, but I think > adds some weight as to the correctness of the logic above. > > https://stackoverflow.com/questions/55056322/maximum-utf-8-string-size-given-utf-16-size > > Glenn > > [1] https://en.wikipedia.org/wiki/UTF-8#Encoding Daniel [1] https://en.wikipedia.org/wiki/Universal_Disk_Format#Character_set _______________________________________________ Grub-devel mailing list Grub-devel@gnu.org https://lists.gnu.org/mailman/listinfo/grub-devel