Re: Relaxing the restrictions for store item names

Kaelyn Fri, 25 Aug 2023 09:33:28 -0700

Hi,

A couple of small early-morning (for me) comments below... not for or against 
the idea of percent encoding, but as a little bit of food for thought while 
pondering how to handle Unicode in package names and/or store paths.

On Friday, August 25th, 2023 at 2:01 PM, Eidvilas Markevičius 
<markeviciuseidvi...@gmail.com> wrote:

> Although now, just a few hours later, I'm having second thoughts on
> this. When you really think about it, it's very unlikely that some
> user would prefer typing something like
> 
> guix install %D0%B8%D0%BC%D0%B0%D0%B3%D0%B8%D0%BD%D0%B0%D1%80%D0%B8-%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC
> 
> over
> 
> guix install имагинари-програм

I imagine that, for usability, the percent encoding (or other encoding or 
transliteration) of non-ASCII characters could be handled transparently, i.e. 
for "guix install имагинари-програм", guix would translate "имагинари-програм" 
to the encoded form for operations. And if the escape character (e.g. the "%" 
in percent encoding) isn't also a valid character for store or package names, 
then encoded and plain spellings of a name can be decoded unambiguously. For 
example, "guix install git", "guix install %67%69%74", and "guix install 
g%69t" would all install git.
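
A minimal sketch of that transparent handling, in Python rather than Guix's 
own Guile and purely as an illustration. The helper names `canonical_name` and 
`store_form` are hypothetical, not anything that exists in guix, and the 
sketch assumes "%" is never a legal character in a package name, so decoding 
is unambiguous.

```python
from urllib.parse import quote, unquote


def canonical_name(user_input: str) -> str:
    """Decode any percent escapes so every spelling maps to one name."""
    return unquote(user_input)


def store_form(user_input: str) -> str:
    """Return the ASCII-only, percent-encoded form of a package name.

    Decoding first means already-encoded input is not encoded twice.
    """
    return quote(canonical_name(user_input), safe="")


# All three spellings resolve to the same package name.
for spelling in ("git", "%67%69%74", "g%69t"):
    print(spelling, "->", canonical_name(spelling))

# A name typed in Cyrillic is turned into its encoded form for operations.
print(store_form("имагинари-програм"))
```

With something like this, the user types whichever spelling is convenient and 
only the tooling ever sees the encoded form.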

> even if they don't have the Russian (or whatever other language)
> keyboard layout set up on their system, so just for accessibility
> purposes, the solution wouldn't be all that great.

> It would also make
> store name unnecessarily long (they're already long as is), and
> there's a 255 char limit for filenames that we have to keep in mind as
> well. Searching the store using standard utilities such as find and
> grep would too, as a consequence,

I split out the quote above as a bit of reference. I agree that we have to 
keep the 255 char limit for filenames in mind: percent encoding turns each 
encoded byte into 3 bytes (and iirc most non-Latin characters have multi-byte 
encodings in UTF-8), and the store hash takes up a 33 byte prefix (counting 
the dash). Even so, 255 chars is still quite a bit. 
Specifically, the extracted quote above--without the "> " prefixes and with 
line breaks treated as single characters--is exactly 255 characters. (I find a 
bit of readable text to be helpful for wrapping my brain around a value like 
"255 characters".)
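
To put rough numbers on that, here is a small Python sketch. Python and the 
names `HASH_PREFIX_LEN`, `encoded_len`, and `fits_in_store` are my own choices 
for illustration, not anything from guix. It assumes the 33 byte hash prefix 
and the 255 char filename limit mentioned above, and it ignores the version 
string that a real store item name would also carry.

```python
from urllib.parse import quote

NAME_MAX = 255          # filename length limit discussed above
HASH_PREFIX_LEN = 33    # store hash plus the dash, as counted above


def encoded_len(name: str) -> int:
    """Length in bytes of the percent-encoded (pure ASCII) name."""
    return len(quote(name, safe=""))


def fits_in_store(name: str) -> bool:
    """True if hash prefix plus encoded name stays within NAME_MAX."""
    return HASH_PREFIX_LEN + encoded_len(name) <= NAME_MAX


name = "имагинари-програм"
print(len(name.encode("utf-8")))     # raw UTF-8 size in bytes
print(encoded_len(name))             # size after percent encoding
print(NAME_MAX - HASH_PREFIX_LEN)    # bytes left for the name itself
print(fits_in_store(name))
```

For the example name, 33 bytes of UTF-8 grow to 97 bytes once encoded, out of 
the 222 bytes left after the hash prefix.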

Cheers,
Kaelyn

> break... There's just too many
> problems with this.
> 
> I believe what Julien proposed is the most reasonable solution:
> unrestrict unicode characters in the store and (maybe) make it a
> project policy to not put unicode characters inside package names
> (however, personally I wouldn't be against that either).
> 
> Now ensuring that URIs don't break, especially for substitute
> provision, should also be taken into consideration, but this can be
> handled separately.
> 
> On Fri, Aug 25, 2023 at 12:14 PM Eidvilas Markevičius
> markeviciuseidvi...@gmail.com wrote:
> 
> > On Fri, Aug 25, 2023 at 11:37 AM Nathan Dehnel ncdeh...@gmail.com wrote:
> > 
> > > What you could do is implement percent encoding:
> > > https://en.wikipedia.org/wiki/Percent-encoding
> > > -Allows you to store package titles in any language in an encoded form
> > > -Allows the titles to be typed on latin keyboards
> > > -Allows the packages to be accessed through URIs in the future without
> > > causing problems
> > 
> > Now that's an idea. I hadn't really thought of that. Although it'd
> > probably be trickier to implement in order to make all the tooling
> > compatible. I think that might be a good solution nonetheless.
