On 14.02.25 at 16:40, Laurențiu Leahu-Vlăducu wrote:
> This patch series fixes bug #3256:
>
> 1. It ensures that general config files (e.g. storage.cfg) are decoded
>    from UTF-8 when deserialized. Previously, no decoding happened,
>    meaning that Perl interpreted the string as single bytes instead of
>    Unicode code points. Note: while I would have preferred to decode
>    the text right after reading from the file, there are some Perl
>    functions like Digest::SHA::sha1_hex that expect bytes
>    instead of UTF-8.

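To make the difference concrete, here is a minimal sketch of what strict UTF-8 decoding means for bytes already on disk, next to a lenient variant that falls back to Latin-1. It is written in Rust only because the series mentions Rust as the other code base. The function names `decode_strict` and `decode_lenient` are hypothetical and not taken from the patches, and the Latin-1 fallback is an assumption about what backward compatibility could look like, not something the series implements.

```rust
/// Strict decode (hypothetical helper): returns None for any byte
/// sequence that is not valid UTF-8.
fn decode_strict(raw: &[u8]) -> Option<&str> {
    std::str::from_utf8(raw).ok()
}

/// Lenient decode (hypothetical helper): prefer UTF-8, otherwise assume
/// Latin-1 (ISO-8859-1). This fallback is an assumption for illustration.
fn decode_lenient(raw: &[u8]) -> String {
    match std::str::from_utf8(raw) {
        // Valid UTF-8: take the text as-is.
        Ok(text) => text.to_owned(),
        // Not valid UTF-8: map every byte to the code point of the same
        // value (U+0000..=U+00FF), which is exactly the Latin-1 mapping
        // and therefore cannot fail.
        Err(_) => raw.iter().map(|&b| char::from(b)).collect(),
    }
}
```

With these definitions, `decode_strict(b"/root/\xf6")` yields `None`, because a lone `0xF6` byte is not valid UTF-8, while `decode_lenient` turns the same bytes into `/root/ö`. The fallback is a heuristic: a Latin-1 file whose bytes happen to form valid UTF-8 would still be read as UTF-8.
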
What about pre-existing configs that are not UTF-8? Not breaking those
is very important here.

> 2. It ensures that general config files are explicitly encoded
>    as UTF-8 before serialization to prevent similar issues the other
>    way around.
>
> 3. It adds a unit test to prevent similar issues from happening in
>    the future.
>
> 4. It fixes the PBS storage plugin for serializing/deserializing the
>    password, similar to points 1 and 2, but for the case where the
>    password itself contains Unicode characters.
>
> For more information on this topic, please read:
> https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?
>
> I'm sending this patch series to begin a discussion on how to handle
> encodings in our config files, and eventually also other relevant
> files. In my opinion, we should handle them consistently as UTF-8,
> also over both Perl and Rust code.

Yes, that is the long-term plan AFAIK, but right now existing config
files might be encoded differently.

> Due to the fact that Linux uses UTF-8 encoding by default since
> a long time, as well as browsers* and other software, I doubt that
> we have to worry too much about other encodings
> like Latin-1 (ISO-8859-1). However, according to the
> Perl documentation, Perl could have deserialized such a string
> in the past (since it's the default in Perl when not decoding
> explicitly), and it is no longer able to after the fixes included
> in this patch series.

Unfortunately, we do. E.g.

> [I] root@pve8a1 ~# pct set 112 --mp1 /root/ö,mp=/o
> [I] root@pve8a1 ~# file /etc/pve/lxc/112.conf
> /etc/pve/lxc/112.conf: ISO-8859 text

> We have to ask ourselves:
>
> a. Do we want to define, in general, that configuration files should
>    always be serialized and deserialized as UTF-8? If yes, should we
>    consider this a breaking change?

Yes, see above.

> b. Do we want to introduce any backward-compatibility for existing
>    config files?
>    In other words, assume that older files might have
>    used other encodings in the past. To be honest, I didn't test
>    Latin-1 encoded files yet, so I'm not sure how (or if) our
>    current code would handle it.

Yes, we certainly need to.

> There are further parsers and plugins that I still need to modify,
> but I first wanted to get your feedback on this subject.
>
>
> * With browsers I mean the encoding in HTML and not the JavaScript
>   internals with its UTF-16 encoding.


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel