https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81114
Bug ID: 81114
Summary: GNAT mishandles filenames with UTF8 chars on
case-insensitive filesystems
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: ada
Assignee: unassigned at gcc dot gnu.org
Reporter: simon at pushface dot org
Target Milestone: ---
Build: x86_64-apple-darwin16
Created attachment 41575
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41575&action=edit
Demonstrator (with BOM)
The attached demonstrator contains two files, each with a UTF-8
BOM. One file, pack3_user.adb, contains
with Páck3;
procedure Pack3_User is
begin
   null;
end Pack3_User;
while the other, páck3.ads, contains just
package Páck3 is
end Páck3;
There is no problem compiling on Linux (Debian Jessie). However, on
Darwin and Windows, we get
$ gnatmake -c -f pack3_user.adb
gcc -c pack3_user.adb
gnatmake: "p?ck3.ads" not found
This is perhaps partly explained by looking at pack3_user.ali:
====================
V "GNAT Lib v8"
M P W=8
P ZX
RN
U pack3_user%b pack3_user.adb be67fdbd NE OO SU
W pUe1ck3%s p?ck3.ads p?ck3.ali [A]
D p?ck3.ads 20170615165452 7221d8b1 páck3%s [B]
D pack3_user.adb 20170616143450 cc46250c pack3_user%b
D system.ads 20161018202953 085b6ffb system%s
X 1 páck3.ads [C]
[...]
====================
Lines [A] and [B] show that GNAT is sometimes confused about the file
name: it is written as p?ck3.ads. Elsewhere it gets the name right
(the last field of [B], and [C]).
The ALI file is written by Lib.Writ.Write_ALI. In two places it says
if not File_Names_Case_Sensitive then
   Get_Name_String (Fname);
   To_Lower (Name_Buffer (1 .. Name_Len));  <<<<<<<<<
   Fname := Name_Find;
end if;
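which is clearly the Wrong Thing to do if the file name is not
ASCII. In the ALI file above, the small-a-acute, which should be
encoded in UTF-8 as C3 A1, has been written as E3 A1. This is what
one would expect if To_Lower treats each byte as a Latin-1 character:
C3 is capital A-tilde, whose lower-case form is E3, and A1 has no
case.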
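The byte-level effect can be reproduced outside the compiler. The
following is only a sketch: it uses Ada.Characters.Handling.To_Lower
as a stand-in for the To_Lower called in Lib.Writ (an assumption; I
have not checked that the two behave identically), and Lower_Demo and
Put_Hex are names made up for the illustration. It prints the bytes of
the UTF-8 file name before and after lower-casing; the second line
should show E3 A1 where the first shows C3 A1.

```ada
with Ada.Characters.Handling;
with Ada.Text_IO;

procedure Lower_Demo is
   use Ada.Text_IO;

   Hex_Digits : constant String := "0123456789ABCDEF";

   --  Print each byte of S as two hex digits followed by a space.
   procedure Put_Hex (S : String) is
   begin
      for I in S'Range loop
         Put (Hex_Digits (Character'Pos (S (I)) / 16 + 1));
         Put (Hex_Digits (Character'Pos (S (I)) mod 16 + 1));
         Put (' ');
      end loop;
      New_Line;
   end Put_Hex;

   --  The UTF-8 bytes of "páck3.ads", spelled out so that this source
   --  file stays pure ASCII; small-a-acute is the pair C3 A1.
   Name : constant String :=
     'p' & Character'Val (16#C3#) & Character'Val (16#A1#) & "ck3.ads";
begin
   Put_Hex (Name);

   --  Assumption: a Latin-1 To_Lower, applied byte by byte, as a
   --  stand-in for what Write_ALI does to Name_Buffer.
   Put_Hex (Ada.Characters.Handling.To_Lower (Name));
end Lower_Demo;
```

Saved as lower_demo.adb, this builds with "gnatmake lower_demo.adb".
The E3 A1 pair followed by an ASCII 'c' is not valid UTF-8, which
would account for the '?' in the gnatmake message.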
Setting the undocumented environment variable
GNAT_FILE_NAME_CASE_SENSITIVE changes things:
$ GNAT_FILE_NAME_CASE_SENSITIVE=1 gnatmake -c -f pack3_user.adb
gcc -c pack3_user.adb
gcc -c páck3.ads
so it's clear that the problem lies in this region.
Interestingly, [B] and [C] above show that the compiler does
understand how to lower-case extended characters in strings. I haven't
yet been able to find where this is done.