Hi,

I'm using PHP 4.0.4pl1 (I have not tried PHP 4.0.5).

I am parsing a large file (>100 MB) with the following (simplified)
piece of code:

$continue=true;
while($continue && ($fbuffer=fread($fp,4096))){
  // ...
  if(!xml_parse($xml_parser,$fbuffer,feof($fp))){
    // report error
  }
  if(...){
    $continue=false;
  }
}

I discovered discrepancies in the final result: some elements or attributes
were lost, and some data was truncated. At first the behavior looked
"random". After adding some debug code, I eventually concluded that the
problem occurs when the end of the read buffer falls right in the middle of
a tag or of character data. I modified the code as follows:

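$unparsed_buffer="";
$continue=true;
while($continue && ($fbuffer=fread($fp,4096))){
  // ...
  // Prepend the tail held back from the previous chunk.
  $fbuffer=$unparsed_buffer.$fbuffer;
  $last_tag_pos=strrpos($fbuffer,">");
  if($last_tag_pos===false){
    // No ">" in the buffer at all: hold all of it back.
    $unparsed_buffer=$fbuffer;
    $fbuffer="";
  } else {
    // Parse up to and including the last ">", keep the rest for later.
    $unparsed_buffer=substr($fbuffer,$last_tag_pos+1);
    $fbuffer=substr($fbuffer,0,$last_tag_pos+1);
  }
  if(!xml_parse($xml_parser,$fbuffer,feof($fp))){
    // report error
  }
  if(...){
    $continue=false;
  }
}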
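The same workaround can be packaged as a reusable function. This is a minimal sketch for current PHP rather than 4.0.x: `parse_split_on_gt()`, `$chunk_size` and the handler closures are hypothetical names, the elided stop condition (`if(...)`) is left out, and once the stream is exhausted it passes whatever is still held back to xml_parse with isFinal set to true, so trailing data is not silently dropped.

```php
<?php
// Minimal sketch (current PHP, not 4.0.x) of the "cut after the last '>'"
// workaround. parse_split_on_gt() is a hypothetical helper name.
function parse_split_on_gt($parser, $fp, int $chunk_size = 4096): bool
{
    $pending = '';                     // tail held back from the previous chunk
    while (!feof($fp)) {
        $chunk = fread($fp, $chunk_size);
        if ($chunk === false) {
            return false;              // read error
        }
        $buffer = $pending . $chunk;
        $pos = strrpos($buffer, '>');
        if ($pos === false) {
            $pending = $buffer;        // no '>' yet: keep it all for the next round
            continue;
        }
        // Parse up to and including the last '>', hold back the rest.
        $pending = substr($buffer, $pos + 1);
        if (!xml_parse($parser, substr($buffer, 0, $pos + 1), false)) {
            return false;
        }
    }
    // Stream exhausted: hand over the remainder and mark the input as final.
    return xml_parse($parser, $pending, true) === 1;
}

// Usage: record element names and text while reading 5-byte chunks from memory.
$trace = '';
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$trace) { $trace .= "<$name>"; },
    function ($p, $name) use (&$trace) { $trace .= "</$name>"; }
);
xml_set_character_data_handler($parser, function ($p, $data) use (&$trace) {
    $trace .= $data;                   // text may arrive in several calls
});

$fp = fopen('php://memory', 'r+');
fwrite($fp, '<root><item id="1">hello world</item><item id="2">second</item></root>');
rewind($fp);
$ok = parse_split_on_gt($parser, $fp, 5);
fclose($fp);
```

With 5-byte reads, the text node "hello world" spans several chunks that contain no ">", which exercises the hold-everything branch.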

That solved the problems I was having.

My impression is that passing false as the third ("isFinal") parameter of
xml_parse means "don't complain if the input ends in the middle of a
structure" or a similar XML-level construct, but that each buffer you pass
must still end "cleanly" (for instance, "<element_name>" must not be cut
into "<eleme" and "nt_name>").
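Whether xml_parse really needs each buffer to end after a complete tag can be checked directly on a given PHP build. The harness below is a hypothetical sketch for current PHP, not 4.0.x (`trace_events()` is an illustrative name). It feeds the same document to the parser in one piece and then in small slices that routinely cut tags and attribute values in half, always with isFinal set to false until a last empty call with isFinal set to true, and compares a flat trace of the resulting events. Character data is appended rather than overwritten, because a parser may deliver one run of text in several handler calls (for example at chunk boundaries).

```php
<?php
// Hypothetical harness: parse one document in slices of $chunk_size bytes and
// return a flat trace of the parser events so different runs can be compared.
function trace_events(string $xml, int $chunk_size): string
{
    $trace = '';
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$trace) {
            ksort($attrs);                         // stable attribute order
            $trace .= "<$name";
            foreach ($attrs as $k => $v) {
                $trace .= " $k=\"$v\"";
            }
            $trace .= '>';
        },
        function ($p, $name) use (&$trace) {
            $trace .= "</$name>";
        }
    );
    // Appending makes the trace independent of how text is split across calls.
    xml_set_character_data_handler($parser, function ($p, $data) use (&$trace) {
        $trace .= $data;
    });

    // Feed arbitrary slices with isFinal = false, as a plain fread() loop would.
    foreach (str_split($xml, $chunk_size) as $chunk) {
        if (!xml_parse($parser, $chunk, false)) {
            return 'error: ' . xml_error_string(xml_get_error_code($parser));
        }
    }
    if (!xml_parse($parser, '', true)) {
        return 'error: ' . xml_error_string(xml_get_error_code($parser));
    }
    return $trace;
}

$xml = '<catalog><book id="b1" lang="en">First title</book>'
     . '<book id="b2">Second title</book></catalog>';

$reference = trace_events($xml, strlen($xml));   // whole document in one slice
foreach ([2, 3, 5, 7, 64] as $size) {
    $result = trace_events($xml, $size);
    echo $size, ': ', $result === $reference ? 'same' : 'DIFFERENT', PHP_EOL;
}
```

If every slice size reports `same`, the parser on that build copes with tags split across calls by itself; a `DIFFERENT` line would support the hypothesis above.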

Or did I miss a piece of documentation somewhere?

--
Dominique Hermsdorff

-- 
PHP General Mailing List (http://www.php.net/)