[Github-comments] Re: [geany/geany] Geany encoding determination broken? (#2910)

Colomban Wendling via Github-comments Tue, 24 Mar 2026 04:41:42 -0700

b4n left a comment (geany/geany#2910)

@dwrs


> "It" is only a single thing - Geany is in a dire need of a switch that 
> disables auto-detection […].

Do you mean force encoding for *everything* or non-Unicode files?  For 
non-Unicode files it's already possible (and it does work), but IIUC we don't 
support the "encoding" you'd like, "ASCII with >= 0x80 bytes shown as 
placeholders", right?  I'm afraid this isn't really an encoding I know of, but 
as mentioned above you can partially trick Geany to do that -- but it'll cause 
issues.

Actually, there is no such thing as "no encoding", and "8-bit ASCII" is 
supposed to *only* have < 0x80 bytes in it -- it's just regular 7-bit ASCII 
stored in 8-bit values. So IIUC what you want to see supported is invalid 
encoding.
Fortunately (or unfortunately), most "encodings" around (e.g. ISO-8859 family) 
are hopelessly dumb: they're single 8-bit values that map to a 256-entry table. 
So everything is valid in these encodings, it's just potentially not showing 
what you'd like.
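
To illustrate the point about table-based encodings, here is a minimal Python sketch (not Geany's code; the helper name `is_strict_ascii` is made up for this example). It uses Python's built-in `iso-8859-1`, `iso-8859-15` and `ascii` codecs to show that every byte decodes in a single-byte table encoding, while strict ASCII rejects anything >= 0x80:

```python
# Every possible byte value, 0x00..0xFF.
all_bytes = bytes(range(256))

# ISO-8859-1 is a plain 256-entry table, so decoding never fails
# and the round trip gives the original bytes back.
text = all_bytes.decode("iso-8859-1")
round_trip = text.encode("iso-8859-1")


def is_strict_ascii(data: bytes) -> bool:
    """Hypothetical helper: True only if every byte is < 0x80."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# The same byte is "valid" in both tables, it just shows something else.
currency_latin1 = b"\xa4".decode("iso-8859-1")    # U+00A4 CURRENCY SIGN
currency_latin9 = b"\xa4".decode("iso-8859-15")   # U+20AC EURO SIGN
```

So decoding "succeeds" either way; nothing in the bytes says which table was meant.
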
Additionally, Geany has a limitation with embedded NUL bytes (0x0), mostly due 
to the technical reasons of the C language which makes it pretty hard to handle 
those (it's totally doable, but a lot of the language's own libraries can't be 
used which means being extremely careful -- and, as things stand, rewriting a 
lot of things inside Geany).
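
A small sketch of why embedded NULs are awkward with C-style strings, shown through Python's `ctypes` module rather than Geany's actual code (the variable names are only for illustration). Reading the buffer as a NUL-terminated string silently stops at the first 0x00, while keeping an explicit length preserves everything:

```python
import ctypes

data = b"ab\x00cd"  # 5 bytes with an embedded NUL in the middle

# A C-style char buffer holding those bytes (ctypes appends a trailing NUL).
buf = ctypes.create_string_buffer(data)

# Treated as a NUL-terminated C string: everything after the first 0x00 is lost.
c_string_view = buf.value

# Treated as "pointer + explicit length": all 5 bytes survive.
length_aware_view = buf.raw[:len(data)]
```

Any code path that relies on the terminator rather than a stored length would see only the `c_string_view` part.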

I'm not sure in which case not detecting UTF-8 or UTF-16 files automatically 
would be advisable *for an IDE* (as @elextr said, it's not a hex editor): those 
files are pretty strict in their structure, so are pretty unlikely to get 
misdetected but for pathological cases (extremely short, or highly unlikely 
coincidence).
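
As a rough illustration of how strict UTF-8 is (this is plain strict decoding in Python, not Geany's detection logic, and `is_valid_utf8` is a made-up helper name): typical Latin-1 text fails UTF-8 validation, and it takes an unlikely byte sequence to pass by coincidence.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes are well-formed UTF-8."""
    try:
        data.decode("utf-8")  # strict error handling by default
        return True
    except UnicodeDecodeError:
        return False


latin1_text = "café".encode("iso-8859-1")  # b"caf\xe9": 0xE9 lacks continuation bytes
utf8_text = "café".encode("utf-8")         # b"caf\xc3\xa9"

# Pathological coincidence: the Latin-1 text "Ã©" has the same bytes
# as the UTF-8 encoding of "é", so it passes validation too.
coincidence = "Ã©".encode("iso-8859-1")    # b"\xc3\xa9"
```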

Anyway, if I understand correctly, what you'd like is
1. The ability to display files with all >= 0x80 bytes shown as placeholders for 
their byte value
2. The ability to force selecting this for non-Unicode files
3. The ability to force selecting this for Unicode files as well

There is a workaround for points 1 and 2 mentioned above, but it has serious 
limitations.  We *could* potentially make the opening option force this as 
well, but again, it leads to other issues.
There's no support for option 3 at the moment, but with a reasonable 
explanation on why it'd be needed, it could potentially be added -- yet again, 
I'd like to see a *real* use case where that would be helpful.

A better solution for loading your *"8-bit ASCII" files with invalid bytes* 
would be to have an encoding that can map those back and forth to Unicode 
placeholders -- because again, Geany wants the *buffer* to be UTF-8.
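
A minimal sketch of what such a back-and-forth mapping could look like, in Python. Everything here is an assumption for illustration: the names `bytes_to_buffer` and `buffer_to_bytes`, and the choice of a Private Use Area range starting at U+E000 as placeholders. It is not an existing Geany encoding.

```python
PLACEHOLDER_BASE = 0xE000  # hypothetical choice: start of the BMP Private Use Area


def bytes_to_buffer(data: bytes) -> str:
    """Map ASCII bytes to themselves and bytes >= 0x80 to placeholder code points."""
    return "".join(
        chr(b) if b < 0x80 else chr(PLACEHOLDER_BASE + b) for b in data
    )


def buffer_to_bytes(text: str) -> bytes:
    """Inverse mapping; rejects characters that have no byte equivalent."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)
        elif PLACEHOLDER_BASE + 0x80 <= cp <= PLACEHOLDER_BASE + 0xFF:
            out.append(cp - PLACEHOLDER_BASE)
        else:
            raise ValueError(f"U+{cp:04X} cannot be saved in this encoding")
    return bytes(out)


raw = b"key=\xff\x80 ok"
text = bytes_to_buffer(raw)
utf8_buffer = text.encode("utf-8")   # the in-memory buffer stays valid UTF-8
restored = buffer_to_bytes(text)
```

Private-use code points are used here because they encode to valid UTF-8. Python's own `surrogateescape` error handler follows a similar idea, but it produces lone surrogates, which are not valid in a strict UTF-8 buffer.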

> Forced auto-detection might suffice for what people call an _app_, but not 
> for a _tool_, let alone a development one. It's a lightweight IDE, not an 
> Office app. We need a switch that makes Geany not trying to be a smartass 
> about our data.

I don't get this.  An IDE is for *editing code* (mostly), and code is mostly 
plain text files. I expect my IDE to *Do the Right Thing™* when I ask it to 
open a file from whichever project I need to work on, I don't want to worry 
that some silly developers thought programs could be written in anything other 
than UTF-8 🙂  And for me, it's getting things done as I need them 99.99+% of 
the time -- it's just the highly occasional case of selecting the incorrect CP, 
which anyways is virtually impossible to get right 100% of the time, and here 
you'd be happy that Geany doesn't try to be too smart about it.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/issues/2910#issuecomment-4117508998
You are receiving this because you are subscribed to this thread.

Message ID: <geany/geany/issues/2910/[email protected]>
