Edit report at https://bugs.php.net/bug.php?id=55763&edit=1
ID: 55763
Comment by: alotacents at gmail dot com
Reported by: talk at alexmingoia dot com
Summary: str_getcsv incorrectly handles line-breaks inside fields
Status: Open
Type: Bug
Package: Strings related
Operating System: OS X 10.6
PHP Version: 5.3.8
Block user comment: N
Private report: N
New Comment:
To split the string into record lines, I used a regular expression that does
not split inside double quotes, instead of using str_getcsv for that step. Then
I called str_getcsv on each line.
Example:
$s2=<<<EOD
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
EOD;
$lines = preg_split('/[\r\n]{1,2}(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/',$s2);
print_r($lines);
It outputs:
Array (
[0] => Year,Make,Model,Description,Price
[1] => 1997,Ford,E350,"ac, abs, moon",3000.00
[2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
[3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
[4] => 1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00
)
To further convert each line into an array of fields:
$data = array();
foreach($lines as $row) {
$data[] = str_getcsv($row);
}
print_r($data);
which will output:
Array (
[0] => Array (
[0] => Year
[1] => Make
[2] => Model
[3] => Description
[4] => Price
)
[1] => Array (
[0] => 1997
[1] => Ford
[2] => E350
[3] => ac, abs, moon
[4] => 3000.00
)
[2] => Array (
[0] => 1999
[1] => Chevy
[2] => Venture "Extended Edition"
[3] =>
[4] => 4900.00
)
[3] => Array (
[0] => 1999
[1] => Chevy
[2] => Venture "Extended Edition, Very Large"
[3] =>
[4] => 5000.00
)
[4] => Array (
[0] => 1996
[1] => Jeep
[2] => Grand Cherokee
[3] => MUST SELL! air, moon roof, loaded
[4] => 4799.00
)
)
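As an alternative to the regular-expression split above, the same records can be read with fgetcsv, which keeps line breaks that sit inside quoted fields. This is only a minimal sketch, assuming the whole CSV text is already in a string; parse_csv_string is a hypothetical helper name, not a PHP built-in, and the delimiter, enclosure and escape values are the usual defaults passed explicitly.

```php
<?php
// Hypothetical helper (not part of PHP): parse a complete CSV document held
// in a string into an array of rows, each row being an array of fields.
function parse_csv_string($csv)
{
    // Copy the string into a temporary stream so fgetcsv can read it.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $csv);
    rewind($stream);

    $rows = array();
    // A length of 0 means "no line length limit".
    while (($fields = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        // fgetcsv reports a blank line as an array holding a single null.
        if ($fields === array(null)) {
            continue;
        }
        $rows[] = $fields;
    }

    fclose($stream);
    return $rows;
}
```

Calling parse_csv_string($s2) on the sample data should give one row per record, with the line break preserved inside the Description field of the last record instead of collapsed or split.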
Previous Comments:
------------------------------------------------------------------------
[2012-04-27 03:11:17] darren at dcook dot org
The problem can also be shown with the example from the Wikipedia page
(http://en.wikipedia.org/wiki/Comma-separated_values):
$s2=<<<EOD
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
EOD;
$lines=str_getcsv($s2,"\n");
print_r($lines);
It outputs:
Array
(
[0] => Year,Make,Model,Description,Price
[1] => 1997,Ford,E350,"ac, abs, moon",3000.00
[2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
[3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
[4] => 1996,Jeep,Grand Cherokee,"MUST SELL!
[5] => air, moon roof, loaded",4799.00
)
But it should output:
Array
(
[0] => Year,Make,Model,Description,Price
[1] => 1997,Ford,E350,"ac, abs, moon",3000.00
[2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
[3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
[4] => 1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
)
------------------------------------------------------------------------
[2011-09-22 16:45:02] talk at alexmingoia dot com
Sorry... expected output should be
array(4) {
[0]=>
string(15) "Name,Desc,Email"
[1]=>
string(4) "Alex"
[2]=>
string(18) "Is a PHP
developer
"
[3]=>
string(16) "[email protected]"
}
------------------------------------------------------------------------
[2011-09-22 16:41:15] talk at alexmingoia dot com
Description:
------------
RFC 4180 states that fields can contain line breaks as long as they are properly
enclosed by double-quotes.
str_getcsv treats line-breaks inside of enclosed fields as new records in the
CSV.
Setting 'auto_detect_line_ending' to TRUE or using "\r\n" instead of "\n" still
produces incorrect results.
Test script:
---------------
$csv = file_get_contents('test.csv');
$csvArray = str_getcsv($csv, "\n");
var_dump($csvArray);
Expected result:
----------------
array(4) {
[0]=>
string(15) "Name,Desc,Email"
[1]=>
string(4) "Alex"
[2]=>
string(18) "Is a PHP developer"
[3]=>
string(16) "[email protected]"
}
Actual result:
--------------
array(4) {
[0]=>
string(15) "Name,Desc,Email"
[1]=>
string(14) "Alex,"Is a PHP"
[2]=>
string(9) "developer"
[3]=>
string(17) ",[email protected]"
}
------------------------------------------------------------------------