> On 9 Oct 2024, at 04:49, Tatsuo Ishii <is...@postgresql.org> wrote:
> Besides nbsp, there are tons of confusing Unicode
> characters out there. For example there are many "hyphen like
> characters".

Using characters which look alike is known in the field of internet security as a homograph attack [0], where for example a URL visually passes for postgresql.org but in fact leads to an attacker. That sort of attack clearly doesn't apply to our docs though.

However, what might cause similar problems is if we use a Unicode character in example code which the reader could be expected to copy/paste into psql and run, which then (at best) causes a syntax error. We could probably build tooling to catch this (most likely not too hard in XSLT) but the ROI for that might be unfavourable. Even with tooling, committer caution is needed to ensure we don't publish examples that might cause unintended side effects when executed by copy/paste.

What separates nbsp is that it may affect the rendering in an unintuitive way by forcing two words to not break even if the viewport is too narrow to fit. Catching such characters seems worthwhile since it's also quite doable with a trivial grep.

--
Daniel Gustafsson

[0] https://en.wikipedia.org/wiki/IDN_homograph_attack
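
For illustration only, here is a minimal sketch of the kind of check discussed above, written in Python rather than grep or XSLT. The watch list `SUSPECT` and the function name `find_suspects` are hypothetical and not part of any existing PostgreSQL tooling; the set of characters is just a small sample of space-like and hyphen-like code points, not a complete list.

```python
import unicodedata

# Hypothetical watch list (an assumption, not an exhaustive set):
# characters easily mistaken for an ASCII space or hyphen-minus.
SUSPECT = {
    "\u00a0",  # NO-BREAK SPACE
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2212",  # MINUS SIGN
}


def find_suspects(text):
    """Return (line number, column, Unicode name) for each suspect character."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in SUSPECT:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


if __name__ == "__main__":
    # Example input: SQL that looks fine but contains two look-alike characters.
    sample = "SELECT 1\u00a0+ 2;\nSELECT 3 \u2212 1;"
    for line_no, col, name in find_suspects(sample):
        print(f"line {line_no}, column {col}: {name}")
```

Reporting the Unicode name rather than the raw character makes the hit readable even when the character itself is visually indistinguishable from its ASCII counterpart.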