Hi,

A couple of small early-morning (for me) comments below... not for or against the idea of percent encoding, but as a little bit of food for thought while pondering how to handle Unicode in package names and/or store paths.

On Friday, August 25th, 2023 at 2:01 PM, Eidvilas Markevičius <markeviciuseidvi...@gmail.com> wrote:

> Although now, just a few hours later, I'm having second thoughts on
> this. When you really think about it, it's very unlikely that some
> user would prefer typing something like
>
> guix install
> %D0%B8%D0%BC%D0%B0%D0%B3%D0%B8%D0%BD%D0%B0%D1%80%D0%B8-%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC
>
> over
>
> guix install имагинари-програм

I imagine that, for usability, the percent encoding (or other encoding or transliteration) of non-ASCII characters could be handled transparently, i.e. for "guix install имагинари-програм", guix would translate "имагинари-програм" to the encoded form for operations. And if the escape character (e.g. the "%" in percent encoding) isn't also a valid character for store or package names, then the values can be handled transparently in both directions. For example, "guix install git", "guix install %67%69%74", and "guix install g%69t" would all install git.

> even if they don't have the russian (or whatever other language)
> keyboard layout set up on their system, so just for accessibility
> purposes, the solution wouldn't be all that great.

> It would also make
> store name unnecessarily long (they're already long as is), and
> there's a 255 char limit for filenames that we have to keep in mind as
> well. Searching the store using standard utilities such as find and
> grep would too, as a consequence,

I split out the quote above as a bit of reference. I agree that we have to keep in mind the 255 char limit for filenames. Still, even with percent encoding turning each escaped byte into 3 bytes (and, iirc, most non-Latin characters taking multiple bytes in UTF-8) and the store hash taking up a 33-byte prefix (counting the dash), 255 chars is quite a bit. Specifically, the extracted quote above, without the "> " prefixes and with line breaks treated as single characters, is exactly 255 characters.
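
To make the two points above a bit more concrete, here is a rough Python sketch. It is purely illustrative and not Guix code: `normalize` is a hypothetical helper, and it assumes that "%" is never allowed as a literal character in package or store names, so every "%XX" can safely be read as an escape. The 33-byte prefix and the 255 limit are the figures mentioned above.

```python
from urllib.parse import quote, unquote

STORE_PREFIX_LEN = 33  # store hash plus the dash, as counted above
NAME_MAX = 255         # filename length limit discussed in the thread


def normalize(name: str) -> str:
    """Hypothetical helper: decode any percent escapes, then re-encode
    canonically. Assumes "%" never appears literally in a name."""
    return quote(unquote(name), safe="")


# All three spellings collapse to the same canonical name, "git".
for spelling in ("git", "%67%69%74", "g%69t"):
    print(spelling, "->", normalize(spelling))

# Size accounting for the example name from the thread.
name = "имагинари-програм"
encoded = normalize(name)
remaining = NAME_MAX - STORE_PREFIX_LEN - len(encoded)
print(len(name), "characters,",
      len(name.encode("utf-8")), "UTF-8 bytes,",
      len(encoded), "bytes once percent-encoded,",
      remaining, "bytes left under the limit")

# The extracted quote, without "> " prefixes, one character per line break.
quoted = "\n".join([
    "It would also make",
    "store name unnecessarily long (they're already long as is), and",
    "there's a 255 char limit for filenames that we have to keep in mind as",
    "well. Searching the store using standard utilities such as find and",
    "grep would too, as a consequence,",
])
print(len(quoted))  # 255
```

With these assumptions, the 17-character example name takes 33 bytes in UTF-8 and 97 bytes once percent-encoded, which leaves 125 bytes for the version and anything else after the hash prefix.
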
(I find a bit of readable text to be helpful for wrapping my brain around a value like "255 characters".)

Cheers,
Kaelyn

> break... There's just too many
> problems with this.
>
> I believe what Julien proposed is the most reasonable solution:
> unrestrict unicode characters in the store and (maybe) make it a
> project policy to not put unicode characters inside package names
> (however, personally I wouldn't be against that either).
>
> Now ensuring that URIs don't break, especially for substitute
> provision, should also be taken into consideration, but this can be
> handled separately.
>
> On Fri, Aug 25, 2023 at 12:14 PM Eidvilas Markevičius
> markeviciuseidvi...@gmail.com wrote:
>
> > On Fri, Aug 25, 2023 at 11:37 AM Nathan Dehnel ncdeh...@gmail.com wrote:
> >
> > > What you could do is implement percent encoding:
> > > https://en.wikipedia.org/wiki/Percent-encoding
> > > -Allows you to store package titles in any language in an encoded form
> > > -Allows the titles to be typed on latin keyboards
> > > -Allows the packages to be accessed through URIs in the future without
> > > causing problems
> >
> > Now that's an idea. I didn't really think of that. Although it'd
> > probably be trickier to implement in order to make all the tooling
> > compatible. I think that might be a good solution nonetheless.