Edit report at https://bugs.php.net/bug.php?id=48507&edit=1
ID:                 48507
Comment by:         jamie dot kahgee at gmail dot com
Reported by:        krynble at yahoo dot com dot br
Summary:            fgetcsv() ignoring special characters
Status:             Not a bug
Type:               Bug
Package:            Filesystem function related
Operating System:   Unix
PHP Version:        5.*
Block user comment: N
Private report:     N

 New Comment:

rasmus@php, eswald@middil

I had the same problem. Running fgetcsv() from the CLI showed no error, and everything worked and output as expected. It was only when I ran the same script on the same file through Apache that my output would not display.

(Ã) was the specific character I was dealing with, at the start of a string, that was not showing.

After I set the locale in the script, everything worked as expected through Apache, and my strings started parsing and displaying correctly.

setlocale(LC_ALL, 'en_US.UTF-8');

Hopefully this can help you.


Previous Comments:
------------------------------------------------------------------------
[2012-02-13 05:16:35] ras...@php.net

eswald@middil, I am not able to reproduce your results with either en_US.UTF-8 or C with a UTF-8 input file:

~> echo $LANG
en_US.UTF-8
~> file utf8.txt
utf8.txt: UTF-8 Unicode text
~> cat utf8.txt
a,"a",é,"é",óú,"óú",ó&ú,"ó&ú"
~> php -r "print_r(fgetcsv(fopen('./utf8.txt','r')));"
Array
(
    [0] => a
    [1] => a
    [2] => é
    [3] => é
    [4] => óú
    [5] => óú
    [6] => ó&ú
    [7] => ó&ú
)

I don't see any corruption. I can understand problems with charsets that are not low-ASCII compatible with a low-ASCII delimiter, but I don't see why this UTF-8 case would break.

------------------------------------------------------------------------
[2012-02-13 01:46:59] figura at hotbox dot ru

setlocale() might solve the issue, but I do not see any reason to make fgetcsv() depend on locale settings. The format is straightforward and clear. This "feature" is especially confusing when the string is read in UTF-8 format.
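
The setlocale() workaround discussed in the comments above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a confirmed fix: the helper name parse_csv_line_utf8() is hypothetical, the 'C.UTF-8' fallback candidate is an assumption about what may be installed on the host, and a php://memory stream stands in for the CSV file. setlocale() returns false when none of the requested locales is available, so the sketch reports which locale was actually selected; comparing that value between the CLI and Apache may help explain differing results.

```php
<?php
// Hypothetical helper (not part of PHP): try to select a UTF-8 locale,
// then parse a single CSV line with fgetcsv().
function parse_csv_line_utf8($line, array $locales = array('en_US.UTF-8', 'C.UTF-8'))
{
    // setlocale() accepts an array of candidates and returns the name of the
    // locale it selected, or false if none of them is installed.
    $selected = setlocale(LC_ALL, $locales);

    // Use an in-memory stream in place of a real file (assumption for the demo).
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $line);
    rewind($stream);

    // Delimiter, enclosure and escape are passed explicitly.
    $fields = fgetcsv($stream, 0, ',', '"', '\\');
    fclose($stream);

    return array('locale' => $selected, 'fields' => $fields);
}

// Same sample line as in the report: unquoted and quoted non-ASCII values.
$result = parse_csv_line_utf8("a,\"a\",é,\"é\",óú,\"óú\",ó&ú,\"ó&ú\"\n");

var_dump($result['locale']);   // false means no candidate locale was available
print_r($result['fields']);
```

If the reported locale is false, the requested locale is not installed on that host, and the call changes nothing. Passing LC_CTYPE instead of LC_ALL would be a narrower change, but LC_ALL is used here to match the call quoted in the comment.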
------------------------------------------------------------------------
[2012-01-26 19:55:01] eswald at middil dot com

Tested with LANG=C, input file encoding of UTF-8. Also tested with LANG=C, input file encoding of cp1252, with identical results, except that the output characters (what was left of them) were also cp1252.

------------------------------------------------------------------------
[2012-01-26 19:50:26] eswald at middil dot com

Confirmed with php5 (5.3.6-13ubuntu3.2 on Oneiric Ocelot); can be worked around by quoting the value with quotation marks. For example, the line

a,"a",é,"é",óú,"óú",ó&ú,"ó&ú"

yields

array (
  0 => 'a',
  1 => 'a',
  2 => '',
  3 => 'é',
  4 => '',
  5 => 'óú',
  6 => '&ú',
  7 => 'ó&ú',
)

Note the corruption in elements 2, 4, and 6, but not in their quoted counterparts 3, 5, and 7.

------------------------------------------------------------------------
[2012-01-18 11:53:48] tero dot tasanen at gmail dot com

I can also confirm that this is an actual bug. File encoding UTF-8, locale settings are set correctly and characters like äöå are dropped from the beginning of the csv column.

Tested with php versions 5.2.6, 5.2.10, 5.3.6

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=48507