Conrad Herrmann created SOLR-4227:
-------------------------------------

             Summary: StreamingUpdateSolrServer does not buffer 
OutputStreamWriter with BufferedWriter, causing encoding explosion
                 Key: SOLR-4227
                 URL: https://issues.apache.org/jira/browse/SOLR-4227
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 3.2
         Environment: Java 1.6, Linux.  I am running SOLR 3.2, but the code 
doesn't seem different in 3.5.
            Reporter: Conrad Herrmann


org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer line 112 is:
  OutputStreamWriter writer = new OutputStreamWriter(out, "UTF-8");
and then we call
  req.writeXML( writer ); 
Because the writer is not buffered, the XML writer invokes the UTF-8 encoder 
for each atom it writes, for example in 
org.apache.solr.common.util.XML.writeXML:
  out.write('<');
Each such call makes the stream encoder allocate a char array to hold the 
character, and sun.nio.cs.StreamEncoder.implWrite then allocates a CharBuffer 
to wrap it.  All of that for a single character.

This is particularly a problem when many threads (100?) are writing to the 
SOLR server, because together they rapidly consume all available CPU.

It would be helpful to wrap the writer in a BufferedWriter, so that encoding 
happens only when the buffer fills or is flushed.  The Javadoc for 
OutputStreamWriter recommends this: 
"For top efficiency, consider wrapping an OutputStreamWriter within a 
BufferedWriter so as to avoid frequent converter invocations."
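
The suggested change could look roughly like the following minimal sketch. It 
is not the actual SolrJ code: the class BufferedXmlWriterSketch and its 
helpers newBufferedUtf8Writer and writeSample are hypothetical names made up 
for illustration, and a ByteArrayOutputStream stands in for the HTTP request 
stream. Only the wrapping of OutputStreamWriter in BufferedWriter reflects the 
proposal.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class BufferedXmlWriterSketch {

    // Hypothetical helper (not part of SolrJ): wraps the UTF-8
    // OutputStreamWriter in a BufferedWriter so that small writes are
    // collected in a char buffer before they reach the encoder.
    static Writer newBufferedUtf8Writer(OutputStream out) throws IOException {
        return new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
    }

    // Hypothetical stand-in for req.writeXML(writer): emits a few small
    // pieces the way an XML writer would, then flushes.
    static byte[] writeSample() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        Writer writer = newBufferedUtf8Writer(sink);
        writer.write('<');      // single-char write lands in the buffer
        writer.write("add");
        writer.write('>');
        writer.write("</add>");
        writer.flush();         // buffered chars are encoded and pushed out here
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new String(writeSample(), "UTF-8"));
    }
}
```

One consequence of buffering is that the writer has to be flushed (or closed) 
before the request body is treated as complete; otherwise the tail of the 
document may still be sitting in the buffer.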

