[pve-devel] [RFC PATCH pve-storage/common] fix #3256: allow special characters in storage-related config files

Laurențiu Leahu-Vlăducu Fri, 14 Feb 2025 08:07:07 -0800

This patch series fixes bug #3256:

1. It ensures that general config files (e.g. storage.cfg) are decoded
   from UTF-8 when deserialized. Previously, no decoding happened,
   meaning that Perl interpreted each byte as a separate character
   instead of decoding the text into Unicode code points. Note: while I
   would have preferred to decode the text right after reading it from
   the file, some Perl functions, like Digest::SHA::sha1_hex, expect
   byte strings instead of decoded character strings.


2. It ensures that general config files are explicitly encoded
   as UTF-8 before serialization, to prevent the same issue in the
   opposite direction.

3. It adds a unit test to prevent similar issues from happening in
   the future.

4. It fixes how the PBS storage plugin serializes and deserializes the
   password, similar to points 1 and 2, for the case where the password
   itself contains Unicode characters.
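The ordering described in points 1, 2 and 4 can be illustrated with a
short Python sketch. It is only an analogy, not the Perl code from the
patches. It hashes the raw bytes first, decodes them as UTF-8 for
parsing, and encodes the text back to UTF-8 before writing. The
function names and the sample config line are hypothetical.

```python
import hashlib


def parse_config(raw: bytes):
    """Hypothetical reader: digest over raw bytes, then decode to text."""
    # Digests operate on bytes (hashlib raises TypeError for str), so the
    # digest is computed before decoding, like Digest::SHA in the Perl code.
    digest = hashlib.sha1(raw).hexdigest()
    # Decode explicitly so that 'ä' becomes one code point, not two bytes.
    text = raw.decode("utf-8")
    return text, digest


def write_config(text: str) -> bytes:
    """Hypothetical writer: encode text as UTF-8 before writing it out."""
    return text.encode("utf-8")


raw = "key Bäckerei\n".encode("utf-8")  # hypothetical sample line
text, digest = parse_config(raw)

# Without decoding, each byte counts as one character (the Latin-1 view),
# which turns the two UTF-8 bytes of 'ä' into the characters 'Ã¤'.
undecoded = raw.decode("latin-1")

print(len(text.strip()), len(undecoded.strip()))
print(write_config(text) == raw)
```

Because the digest is taken over the exact bytes on disk, it stays the
same regardless of how the text is decoded afterwards.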

For more information on this topic, please read:
https://perldoc.perl.org/perlunifaq#When-should-I-decode-or-encode?

I'm sending this patch series to start a discussion on how to handle
encodings in our config files, and eventually in other relevant files
as well. In my opinion, we should consistently handle them as UTF-8,
across both our Perl and Rust code.

Since Linux, browsers* and most other software have used UTF-8 by
default for a long time, I doubt that we have to worry much about
other encodings like Latin-1 (ISO-8859-1). However, according to the
Perl documentation, Perl could have deserialized such a string in the
past (since Latin-1 is what Perl assumes by default when a string is
not decoded explicitly), and it can no longer do so after the fixes
included in this patch series.
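To make the Latin-1 concern concrete, here is a Python sketch, again
only an analogy for the Perl behavior. It shows what happens when a file
written as Latin-1 is read with explicit UTF-8 decoding. Depending on
the CHECK argument passed to Perl's Encode::decode, malformed input is
either replaced with U+FFFD (the default) or causes a die. The strict
and lenient modes below correspond to those two outcomes.

```python
# 'ä' is encoded as the single byte 0xE4 in Latin-1.
latin1_bytes = "Bäckerei".encode("latin-1")

# Strict UTF-8 decoding rejects the lone 0xE4 byte.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Lenient decoding keeps going but replaces the byte with U+FFFD,
# silently losing the original character.
lenient = latin1_bytes.decode("utf-8", errors="replace")

print(strict_ok, repr(lenient))
```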

We have to ask ourselves:

a. Do we want to define, in general, that configuration files should
   always be serialized and deserialized as UTF-8? If yes, should we
   consider this a breaking change?

b. Do we want to provide any backward compatibility for existing
   config files? In other words, should we assume that older files
   might have been written in other encodings? To be honest, I haven't
   tested Latin-1 encoded files yet, so I'm not sure how (or whether)
   our current code would handle them.
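If we decide to stay backward compatible (question b), one possible
approach is to try strict UTF-8 first and fall back to Latin-1 only
when that fails. It is shown here as a hypothetical Python sketch, not
as a proposal for the actual Perl code. The fallback works because
every byte sequence is valid Latin-1. However, it cannot distinguish a
genuine Latin-1 file from a corrupted UTF-8 one, and a Latin-1 file
whose bytes happen to form valid UTF-8 would be decoded as UTF-8.

```python
def decode_legacy_config(raw: bytes) -> str:
    """Hypothetical helper: prefer UTF-8, fall back to Latin-1 for old files."""
    try:
        # Strict mode raises on any byte sequence that is not valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte 0x00-0xFF to a code point, so it never fails.
        return raw.decode("latin-1")


utf8_file = "Bäckerei".encode("utf-8")
latin1_file = "Bäckerei".encode("latin-1")

print(decode_legacy_config(utf8_file) == decode_legacy_config(latin1_file))
```

If such a fallback were combined with always encoding as UTF-8 on
write, old files would be converted to UTF-8 the next time they are
saved.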

There are further parsers and plugins that I still need to modify,
but I first wanted to get your feedback on this subject.


* By browsers, I mean the encoding used in HTML, not JavaScript's
internal UTF-16 string representation.


pve-common:

Laurențiu Leahu-Vlăducu (2):
  fix #3256: SectionConfig: ensure UTF-8 encoding for general configs
  SectionConfig: add unit test for UTF-8 configs

 src/PVE/SectionConfig.pm    | 10 +++++++---
 test/section_config_test.pl | 25 +++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 3 deletions(-)


pve-storage:

Laurențiu Leahu-Vlăducu (1):
  fix #3256: Storage: PBS: ensure passwords are saved and loaded as
    UTF-8

 src/PVE/Storage/PBSPlugin.pm | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

-- 
2.39.5



_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
