This patch series fixes bug #3256:

1. It ensures that general config files (e.g. storage.cfg) are decoded
   from UTF-8 when deserialized. Previously, no decoding happened, so
   Perl treated every byte of the string as a separate character
   instead of interpreting the bytes as Unicode code points.
   Note: while I would have preferred to decode the text right after
   reading it from the file, some Perl functions such as
   Digest::SHA::sha1_hex expect raw bytes rather than decoded text.
2. It ensures that general config files are explicitly encoded as UTF-8
   before serialization, to prevent the same problem in the other
   direction.
3. It adds a unit test to prevent similar issues from happening in the
   future.
4. It fixes how the PBS storage plugin serializes and deserializes the
   password, similar to points 1 and 2, but for the case where the
   password itself contains Unicode characters.

For more information on this topic, please read:
https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?

I'm sending this patch series to begin a discussion on how to handle
encodings in our config files, and eventually also in other relevant
files. In my opinion, we should handle them consistently as UTF-8,
across both Perl and Rust code.

Since Linux has used UTF-8 by default for a long time, as have
browsers* and other software, I doubt that we have to worry too much
about other encodings like Latin-1 (ISO-8859-1). However, according to
the Perl documentation, Perl could have deserialized such a string in
the past (since Latin-1 is what Perl assumes when not decoding
explicitly), and it is no longer able to after the fixes included in
this patch series.

We have to ask ourselves:

a. Do we want to define, in general, that configuration files should
   always be serialized and deserialized as UTF-8? If yes, should we
   consider this a breaking change?
b. Do we want to introduce any backward compatibility for existing
   config files? In other words, should we assume that older files
   might have used other encodings in the past?

To be honest, I haven't tested Latin-1 encoded files yet, so I'm not
sure how (or if) our current code would handle them.

There are further parsers and plugins that I still need to modify, but
I first wanted to get your feedback on this subject.

* By browsers I mean the encoding used in HTML, not JavaScript's
  internal UTF-16 strings.
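
To make the ordering in points 1 and 2 and the concern in question b
concrete, here is a minimal, language-neutral sketch in Python. It is
not the code from the patches (those are Perl); the function names are
hypothetical, and the Latin-1 fallback is only one possible answer to
question b, not something this series implements.

```python
import hashlib


def parse_config(raw):
    """Hash the raw bytes first, then decode them as UTF-8 (point 1)."""
    # The digest is computed over bytes, which is why decoding cannot
    # happen immediately after reading the file.
    digest = hashlib.sha1(raw).hexdigest()
    # Strict decoding: raises UnicodeDecodeError on invalid UTF-8.
    text = raw.decode("utf-8")
    return digest, text


def serialize_config(text):
    """Explicitly encode as UTF-8 before writing (point 2)."""
    return text.encode("utf-8")


def decode_with_fallback(raw):
    """Hypothetical compatibility path for question b: try UTF-8 first,
    and fall back to Latin-1 for older files."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this never fails.
        return raw.decode("latin-1")
```

A fallback like this cannot distinguish a Latin-1 file whose bytes
happen to form valid UTF-8, so it would be a heuristic rather than a
guarantee.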

pve-common:

Laurențiu Leahu-Vlăducu (2):
  fix #3256: SectionConfig: ensure UTF-8 encoding for general configs
  SectionConfig: add unit test for UTF-8 configs

 src/PVE/SectionConfig.pm    | 10 +++++++---
 test/section_config_test.pl | 25 +++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 3 deletions(-)

pve-storage:

Laurențiu Leahu-Vlăducu (1):
  fix #3256: Storage: PBS: ensure passwords are saved and loaded as UTF-8

 src/PVE/Storage/PBSPlugin.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.39.5

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel