Dan B. wrote:
> Bob Proulx wrote:
> >So as you can see whitespace isn't safe to use in URLs.  This is
> >basically the same as for Unix filenames.
>
> They're not quite the same:

Not quite the same is basically the same here.  :-)

The question of the topic was:

  ... what about urls?  They come from the Unix world, and are full
  of underscores and question marks and equal signs.  Then there are
  emails, all of which require the @ sign.  Not complaining, just
  asking.

I think "basically the same" describes things adequately.  People
with a Unix background wouldn't normally include spaces in either of
file names or URLs, or other related "handles" to data.  If you do
then they are much more of a pain to manipulate in shell scripts.
And so you just don't do it and don't think about whether it is
technically possible or not.  A lot of scripts don't handle
whitespace because there wasn't a need to put the effort into making
them handle whitespace.  They were good enough for the task
regardless.

> In URIs, it's not that whitespace "isn't safe to use"; it's simply
> that whitespace is not allowed, period.  (Yes, encodings of whitespace
> characters are allowed, but that encoding still contains no actual
> whitespace characters.)

No.  Actually it was exactly that, "unsafe".  *Exactly* as I said.
RFC 1738 "The space character is unsafe because ..."  Literally they
are documented as being "unsafe".  Later RFCs have clarified this
somewhat.

But regardless of being unsafe most software does actually allow
them.  (I sometimes see them inappropriately used in slug lines.)

  wget -O- "http://www.example.com/one two three.html"

And even though the space hasn't been included in the possible
characters RFC 3986 includes this statement:

  Using <> angle brackets around each URI is especially recommended
  as a delimiting style for a reference that contains embedded
  whitespace.

> Additionally, various other syntaxes and protocols build on that
> consistently (e.g., since URIs can never contain space characters,
> HTTP uses space characters as delimiters around URI references).

There is a difference between the URL containing something and
interpreting the start and end of the URL from context.  RFC 3986
describes this in detail.

> Unfortunately, on the other hand, Unix filenames have no
> corresponding specification, at least one that is followed
> consistently.  The kernel and file systems allow spaces, and
> some utilities/commands/scripts/etc. do, but many don't.

The Unix filesystem allows all characters except for the zero
character.  Because the zero character delimits the end of the string
it cannot be used in the string.  And of course the '/' is used as a
directory separator.  If an application doesn't allow other
characters then it is arguably a bug in that application.  (However
the application may document its limitations and stop there.)  Core
utilities will of course be okay but I am sure that fringe
applications have bugs in them.

Bob
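
P.S.  To illustrate the point about scripts that do and don't handle
whitespace, here is a rough sketch.  The temporary directory and the
file names are made up for the demonstration, and the -print0 and -0
options are assumed to be available, as they are in GNU and BSD find
and xargs.  It counts the same two files three ways: by word-splitting
the output of ls, by quoting a glob, and by passing NUL-delimited
names, NUL being the one byte a file name cannot contain.

```shell
#!/bin/sh
# Sketch only: the directory and file names here are hypothetical.
tmpdir=$(mktemp -d) || exit 1
touch "$tmpdir/one two three.html" "$tmpdir/plain.html"

# Naive: the unquoted $(ls ...) is split on whitespace, so the name
# with spaces is seen as three separate words.
naive=0
for f in $(ls "$tmpdir"); do
    naive=$((naive + 1))
done

# Safe: a glob yields one word per file and "$f" keeps it intact.
safe=0
for f in "$tmpdir"/*; do
    [ -e "$f" ] && safe=$((safe + 1))
done

# Safe across a pipe: names are separated by NUL bytes, and the inner
# sh reports how many arguments xargs handed to it.
viaxargs=$(find "$tmpdir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "naive=$naive safe=$safe viaxargs=$viaxargs"
rm -rf "$tmpdir"
```

The naive loop counts four words for two files, while the other two
count two.  Nothing here is specific to spaces; the quoted and
NUL-delimited forms should cope with tabs and newlines in names too.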