Edit report at https://bugs.php.net/bug.php?id=48507&edit=1
ID: 48507
Comment by: max dot wildgrube at web dot de
Reported by: krynble at yahoo dot com dot br
Summary: fgetcsv() ignoring special characters
Status: Bogus
Type: Bug
Package: Filesystem function related
Operating System: Unix
PHP Version: 5.*
Block user comment: N
Private report: N
New Comment:
The problem also appears if the special character is preceded by a blank. This
blank also disappears.
I use this ugly workaround:
1. First, read the complete CSV file into a variable: $import
2. $import = preg_replace ("{(^|\t)([€-ÿ ])}m", "$1~~$2", $import);
3. After fgetcsv, for each $field of the row array: $field = str_replace ('~~', '', $field);
This means: before using fgetcsv, insert a magic sequence (e.g. ~~) at the
beginning of every field that begins with a blank or a special character; after
parsing with fgetcsv, remove it from each field.
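The steps above could be sketched roughly as follows. This is only an illustration, not the commenter's exact code: the names MARKER and parse_with_marker() are hypothetical, the byte range \x80-\xFF is an assumption about what "special char" means here, and an in-memory stream stands in for the real file.

```php
<?php
// Hypothetical marker; it must not occur in the real data.
const MARKER = '~~';

// Hypothetical helper: parse a delimited string with the marker workaround.
function parse_with_marker(string $import, string $delimiter = "\t"): array
{
    // Step 2: prefix every field that starts with a blank or a byte in
    // 0x80-0xFF (assumed meaning of "special char") with the marker.
    $pattern = '{(^|' . preg_quote($delimiter) . ')([\x80-\xFF ])}m';
    $import  = preg_replace($pattern, '$1' . MARKER . '$2', $import);

    // fgetcsv() needs a stream, so write the string to php://memory.
    $fh = fopen('php://memory', 'w+');
    fwrite($fh, $import);
    rewind($fh);

    $rows = [];
    while (($row = fgetcsv($fh, 0, $delimiter, '"', '\\')) !== false) {
        // Step 3: remove the marker from each parsed field.
        $rows[] = array_map(
            function ($field) {
                return str_replace(MARKER, '', (string) $field);
            },
            $row
        );
    }
    fclose($fh);

    return $rows;
}
```

Note that str_replace() removes every occurrence of the marker, so a field that legitimately contains ~~ would be altered; a sequence that cannot occur in the data is needed.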
Max.
Previous Comments:
------------------------------------------------------------------------
[2011-07-08 08:39:50] php-bug-48507 at bsrealm dot net
This IS a bug. Whatever the locale is, I expect this function to read everything
between delimiter characters without stripping the contents. Besides, the docs
say that files in one-byte encoding would be read wrong, and this is a different
case. This bug causes a serious portability issue. In my case, this function was
used to read a custom database that stored descriptions entered by users.
Some descriptions were in UTF-8 encoding. The function just had to read whatever
was between delimiter characters; it worked like that on Windows hosting and
stopped working after moving to Unix hosting. Note that the file itself is not
UTF-8 encoded and it should not be. It is not related to locale. It must read
data, even if it's binary, between delimiters.
------------------------------------------------------------------------
[2011-02-26 02:46:32] gjorgjioski at gmail dot com
This is a short example:
kategorija širina platišč število
read:
kategorija
irina platišč
tevilo
expected:
kategorija
širina platišč
število
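A reproduction along these lines might look like the sketch below. It assumes the script is saved as UTF-8, writes a tab-delimited line like the sample to a temporary file (the temp file is an assumption, not part of the report), and prints a hex dump of each parsed field. Whether a mismatch shows up depends on the PHP version and the active locale.

```php
<?php
// Tab-delimited sample line, UTF-8 encoded (assumes this file is UTF-8).
$line = "kategorija\tširina platišč\tštevilo\n";

// Write the line to a temporary file so fgetcsv() reads from a real stream.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, $line);

$fh  = fopen($path, 'r');
$row = fgetcsv($fh, 0, "\t", '"', '\\');
fclose($fh);
unlink($path);

// Compare each parsed field byte-for-byte with the expected value.
$expected = ['kategorija', 'širina platišč', 'število'];
foreach ($expected as $i => $want) {
    $got = isset($row[$i]) ? (string) $row[$i] : '';
    printf("%d: %s (%s)\n", $i, bin2hex($got), $got === $want ? 'ok' : 'MISMATCH');
}
```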
------------------------------------------------------------------------
[2011-02-26 02:36:32] gjorgjioski at gmail dot com
This bug also occurs when the file is in UTF-8 (a tab-delimited file using the
characters š, č). I can provide an example.
------------------------------------------------------------------------
[2010-05-19 13:39:52] pahan at hubbitus dot spb dot su
> Quote from the docs:
> Note: Locale setting is taken into account by this function. If LANG is e.g.
> en_US.UTF-8, files in one-byte encoding are read wrong by this function.
OK, a bug documented as "are read wrong by this function" is better than nothing.
But do you plan to fix this wrong behaviour?
------------------------------------------------------------------------
[2010-05-18 11:03:42] [email protected]
Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php
Quote from the docs:
Note: Locale setting is taken into account by this function. If LANG is e.g.
en_US.UTF-8, files in one-byte encoding are read wrong by this function.
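Given that note, one thing a caller could try is to set LC_CTYPE to a locale matching the file's encoding before calling fgetcsv() and to restore it afterwards. The sketch below is only an illustration: read_csv_with_locale() is a hypothetical helper, the locale names passed in are assumptions that may not be installed on a given system, and nothing in this report confirms that this avoids the problem in every case.

```php
<?php
// Hypothetical helper: read a CSV file with LC_CTYPE temporarily switched.
function read_csv_with_locale(string $path, array $locales, string $delimiter = ','): array
{
    // Passing "0" only queries the current LC_CTYPE setting.
    $previous = setlocale(LC_CTYPE, '0');

    // Try each candidate in order; setlocale() returns false if none is available,
    // in which case the current locale simply stays in effect.
    setlocale(LC_CTYPE, $locales);

    $rows = [];
    $fh = fopen($path, 'r');
    while (($row = fgetcsv($fh, 0, $delimiter, '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($fh);

    // Restore the locale that was active before the call.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $rows;
}
```

For a UTF-8 file this might be called as read_csv_with_locale($path, ['en_US.UTF-8', 'C.UTF-8'], "\t"), assuming one of those locales exists on the host.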
------------------------------------------------------------------------
The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
https://bugs.php.net/bug.php?id=48507