On Sat, 5 Mar 2022 at 02:02, Tim Chase <python.l...@tim.thechases.com> wrote:
>
> On 2022-03-04 11:55, Chris Angelico wrote:
> > In MS-DOS, it was perfectly possible to have spaces in file names
>
> DOS didn't allow space (0x20) in filenames unless you hacked it by
> hex-editing your filesystem (which I may have done a couple times).
> However it did allow you to use 0xFF in filenames which *appeared* as
> a space in most character-sets.

Hmm, I'm not sure which APIs worked which way, but I do believe that I
messed something up at one point and made a file with an included
space (not FF, an actual 20) in it. Maybe it's something to do with
the (ancient) FCB-based calls. It was tricky to get rid of that file,
though I think it turned out that it could be removed by globbing,
putting a question mark where the space was.

(Of course, internally, MS-DOS considered that the base name was
padded to eight with spaces, and the extension padded to three with
spaces, so "READ.ME" would be "READ\x20\x20\x20\x20ME\x20", but that
doesn't count, since anything that enumerates the contents of a
directory would translate that into the way humans think of it.)

> I may have caused a mild bit of consternation in school computer labs
> doing this. ;-)

Nice :)

> > Windows forbade a bunch of characters in file names
>
> Both DOS and Windows also had certain reserved filenames
>
> https://www.howtogeek.com/fyi/windows-10-still-wont-let-you-use-these-file-names-reserved-in-1974/
>
> that could cause issues if passed to programs.

Yup. All because, way back in the day, they didn't want to demand the
colon. If you actually *want* to use the printer device, for instance,
you could get a hard copy of a directory listing like this:

DIR >LPT1:

and it's perfectly clear that you don't want to create a file called
"LPT1", you want to send it to the printer. But noooooo it had to be
that you could just write "LPT1" and it would go to the printer.

> To this day, if you poke around on microsoft.com and change random
> bits of URLs to include one of those reserved filenames in the GET
> path, you'll often trigger a 5xx error rather than a 404 that you
> receive with random jibberish in the same place.
>
> https://microsoft.com/…/asdfjkl → 404
> https://microsoft.com/…/lpt1 → 5xx
> https://microsoft.com/…/asdfjkl/some/path → 404
> https://microsoft.com/…/lpt1/some/path → 5xx
>
> Just in case you aspire to stir up some trouble.
>

In theory, file system based URLs could be parsed such that, if you
ever hit one of those, it returns "Directory not found". In
practice... apparently they didn't do that.

As a side point, I've been increasingly avoiding any sort of system
whereby I take anything from the user and hand it to the file system.
The logic is usually more like:

If path matches "/static/%s":
1) Get a full directory listing of the declared static-files directory
2) Search that for the token given
3) If not found, return 404
4) Return the contents of the file, with cache markers

Since Windows will never return "lpt1" in that directory listing, I
would simply never find it, never even try to open it.

This MIGHT be an issue with something that accepts file *uploads*, but
I've been getting paranoid about those too, so, uhh... my file upload
system now creates URLs that look like this:

https://sikorsky.rosuav.com/static/upload-49497888-6bede802d13c8d2f7b92ca9fac7c

That was uploaded as "pie.gif" but stored on the file system as
~/stillebot/httpstatic/uploads/49497888-6bede802d13c8d2f7b92ca9fac7c
with some metadata stored elsewhere about the user-specified file
name. So hey, if you were to try to upload a file that had an NTFS
invalid character in it, I wouldn't even notice.

Maybe I'm *too* paranoid, but at least I don't have to worry about
file system attacks.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
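
For anyone who wants to see the shape of that in Python, here is a rough sketch of the listing-based lookup and the opaque upload names described above. It is not the actual stillebot code, which isn't shown in this post. The function names (`find_static`, `store_upload`), the `prefix` argument, and the in-memory `metadata` dict are all hypothetical, and serving the file with cache markers (step 4) is left to whatever web framework is in use.

```python
import os
import secrets


def find_static(static_dir, token):
    """Return the path of the file named `token` in static_dir, or None.

    None means "not found": the caller would answer with a 404.
    """
    # Steps 1-2: get a directory listing and search it for the token.
    # The user-supplied token is only ever compared against names the OS
    # itself returned; it is never used to build a path.
    for entry in os.scandir(static_dir):
        if entry.is_file() and entry.name == token:
            return entry.path
    # Step 3: not in the listing, so there is nothing to open.
    return None


def store_upload(upload_dir, prefix, data, original_name, metadata):
    """Store `data` under a server-generated name and return that name.

    `prefix` is an assumption standing in for the numeric part of the
    example URL; its real meaning is not stated in the post.
    """
    # 14 random bytes -> 28 hex digits, matching the length in the example.
    stored_name = f"{prefix}-{secrets.token_hex(14)}"
    # "xb" = binary, exclusive creation: fail rather than overwrite.
    with open(os.path.join(upload_dir, stored_name), "xb") as f:
        f.write(data)
    # The user-specified file name is kept only as metadata (hypothetical
    # dict here) and never reaches the file system.
    metadata[stored_name] = original_name
    return stored_name
```

With this shape, a request for "lpt1" or "../something" simply fails to match any listed name, and an upload called "pie.gif" (or anything containing characters NTFS rejects) only ever exists as a metadata string.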