On 2003/12/13, at 5:51, Ilia Alshanetsky wrote:

> How about we add mb_fgetcsv(), which would have full multi-byte support
> (including delimiters). I'd imagine for people who need to parse multi-byte
> csv files, full functionality is more important than speed. As for the
> fgetcsv() in ext/standard/, we can port the 4.3.X code (copy & paste really)
> and let PHP 5 users benefit from a faster fgetcsv() for common applications.
> What do you think?

I disagree, for the following reasons:


1) Quite a few people *actually* use fgetcsv() with
   multibyte characters on a regular basis. Meanwhile,
   applications written by those who don't use such
   characters don't (and won't) call multibyte-specific
   functions, and that's the problem: it greatly hurts
   the portability of those applications.
2) IMO speed is not the key factor here. People would
   rather have trustworthy behaviour.
3) The fgetcsv() implementation in the stable branch is
   now too complicated to add a new feature to, and it
   is also hard to maintain. We should be able to
   eliminate the mblen() calls to get acceptable
   performance. See the attached result.
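
To illustrate point 1, here is a minimal C sketch of why a byte-wise
scanner is not trustworthy on multibyte input. It is not code from either
branch. It assumes Shift_JIS data, where the two-byte character 0x95 0x5C
ends in the same byte value as an ASCII backslash, and it uses a
hypothetical helper, sjis_char_len(), as a locale-independent stand-in for
mblen(). The names find_bytewise() and find_charwise() are also made up
for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in for mblen(): length in bytes of the Shift_JIS
 * character starting at p. Assumes lead bytes are 0x81-0x9F and 0xE0-0xFC. */
static size_t sjis_char_len(const unsigned char *p, size_t remaining)
{
    unsigned char c = p[0];
    int is_lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);

    /* A lead byte at the very end of the buffer is treated as one byte. */
    return (is_lead && remaining >= 2) ? 2 : 1;
}

/* Byte-wise scan: offset of the first byte equal to needle, or -1.
 * It cannot tell a real backslash from a trail byte with the same value. */
static long find_bytewise(const unsigned char *buf, size_t len,
                          unsigned char needle)
{
    size_t i;

    for (i = 0; i < len; i++) {
        if (buf[i] == needle) {
            return (long)i;
        }
    }
    return -1;
}

/* Character-wise scan: steps over whole characters, so only a
 * single-byte character can match the needle. */
static long find_charwise(const unsigned char *buf, size_t len,
                          unsigned char needle)
{
    size_t i = 0;

    while (i < len) {
        size_t n = sjis_char_len(buf + i, len - i);

        if (n == 1 && buf[i] == needle) {
            return (long)i;
        }
        i += n;
    }
    return -1;
}
```

For the six bytes 0x95 0x5C ',' 'a' 0x5C 'b', find_bytewise(line, 6, 0x5C)
returns 1, which is inside the two-byte character, while
find_charwise(line, 6, 0x5C) returns 4, the real backslash. The
per-character length check is the kind of extra work that the mblen()
calls add.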

Moriyoshi

p.s. fgetcsv() in the stable branch still seems to segfault with
     the attached test case (segfault.php.txt).

[The benchmark result]
My code with mblen() (on php5-csv):

real    0m1.389s
user    0m1.330s
sys     0m0.060s

Ditto without mblen():

real    0m0.396s
user    0m0.350s
sys     0m0.040s

Your code (on php4-csv):

real    0m0.332s
user    0m0.270s
sys     0m0.060s

<?php
$file = '/tmp/test.csv';
$fp = fopen($file, 'w');
fwrite($fp, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb, cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc,ddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd\n");
fclose($fp);
for ($i = 0; $i < 4000; $i++) {
    $fp = fopen($file, 'r');
    // fseek() takes (handle, offset, whence); SEEK_SET is 0, so behaviour is unchanged
    fseek($fp, 0, SEEK_SET);
    fgetcsv($fp, filesize($file));
    fclose($fp);
}
?>
Index: ext/standard/php_string.h
===================================================================
RCS file: /repository/php-src/ext/standard/php_string.h,v
retrieving revision 1.83
diff -u -r1.83 php_string.h
--- ext/standard/php_string.h   10 Dec 2003 21:23:35 -0000      1.83
+++ ext/standard/php_string.h   12 Dec 2003 21:16:09 -0000
@@ -144,15 +144,7 @@
 #define strerror php_strerror
 #endif
 
-#ifndef HAVE_MBLEN
-# define php_mblen(ptr, len) 1
-#else
-# if defined(_REENTRANT) && defined(HAVE_MBRLEN) && defined(HAVE_MBSTATE_T)
-#  define php_mblen(ptr, len) ((ptr) == NULL ? mbsinit(&BG(mblen_state)): (int)mbrlen(ptr, len, &BG(mblen_state)))
-# else
-#  define php_mblen(ptr, len) mblen(ptr, len)
-# endif
-#endif
+#define php_mblen(ptr, len) 1
 
 void register_string_constants(INIT_FUNC_ARGS);
 
<?php
$file = '/tmp/test.csv';
$fp = fopen($file, 'w');
fwrite($fp, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb, cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc, \"ddddddddddddd\\\"dddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddddd\"\n");
fclose($fp);
$fp = fopen($file, 'r');
fseek($fp, 0, SEEK_SET);
fgetcsv($fp, filesize($file));
fclose($fp);
?>


-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
