[ 
https://issues.apache.org/jira/browse/SOLR-16265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558690#comment-17558690
 ] 

Chris M. Hostetter commented on SOLR-16265:
-------------------------------------------

Given that {{Http2SolrClient.createRequest}} is private and only used in a few 
places, it seems like we could probably rethink its API a bit to avoid needing 
the {{ByteArrayOutputStream}} and let the {{ContentWriter}} write directly to 
an {{OutputStreamContentProvider}} ... but skimming the jetty docs on 
{{ContentProvider}} I gather this might have some behavior changes relating to 
retries.

So perhaps we could have our own custom {{ContentProvider}} that works similarly 
to {{OutputStreamContentProvider}} but makes a new call to 
{{ContentWriter.write(...)}} each time the {{iterator()}} is called?
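
A rough, stdlib-only sketch of that idea, not tied to the real Jetty or Solr types: {{ReplayableContent}} and its nested {{Writer}} interface are hypothetical names standing in for a custom {{ContentProvider}} and for {{ContentWriter}}. It assumes the provider only needs to behave as an {{Iterable<ByteBuffer>}}, and it uses a pipe plus a producer thread so that each {{iterator()}} call re-runs the writer and streams the bytes in bounded chunks. Error propagation and cleanup of abandoned passes are deliberately left out.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch: content that is regenerated on every iterator() call. */
class ReplayableContent implements Iterable<ByteBuffer> {

  /** Hypothetical stand-in for the "push" style ContentWriter.write(OutputStream). */
  interface Writer {
    void write(OutputStream os) throws IOException;
  }

  private static final int CHUNK = 8192;
  private final Writer writer;

  ReplayableContent(Writer writer) {
    this.writer = writer;
  }

  @Override
  public Iterator<ByteBuffer> iterator() {
    final PipedInputStream in = new PipedInputStream(CHUNK);
    final PipedOutputStream out = connect(in);

    // Each pass gets its own producer thread that re-runs the writer.
    Thread producer = new Thread(() -> {
      try (OutputStream os = out) {
        writer.write(os);
      } catch (IOException e) {
        // Sketch only: a real provider must surface this failure to the
        // consumer instead of ending the stream early.
      }
    });
    producer.setDaemon(true); // an abandoned pass must not keep the JVM alive
    producer.start();

    return new Iterator<ByteBuffer>() {
      private ByteBuffer pending;
      private boolean done;

      @Override
      public boolean hasNext() {
        if (pending != null) return true;
        if (done) return false;
        byte[] chunk = new byte[CHUNK];
        try {
          int n = in.read(chunk); // blocks until data or end of stream
          if (n < 0) {
            done = true;
            return false;
          }
          pending = ByteBuffer.wrap(chunk, 0, n);
          return true;
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }

      @Override
      public ByteBuffer next() {
        if (!hasNext()) throw new NoSuchElementException();
        ByteBuffer b = pending;
        pending = null;
        return b;
      }
    };
  }

  private static PipedOutputStream connect(PipedInputStream in) {
    try {
      return new PipedOutputStream(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Iterating the same instance twice invokes the writer twice, which is the retry-friendly behavior described above, and only about one pipe buffer plus one chunk is held in memory per pass. The cost is an extra thread per pass, so whether this is a net win over buffering is an open question.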

But in the meantime, just switching the {{ByteArrayOutputStream}} to use the 
existing {{BinaryRequestWriter.BAOS}} class would eliminate the {{byte[]}} copy 
and give a quick/small improvement ... so I'll open a sub-task for that.
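
To illustrate where the copy comes from, here is a minimal sketch using only the JDK. {{ExposedBaos}} is a hypothetical stand-in written for this example (it is not the Solr class itself); it assumes that a {{BAOS}}-style class works by exposing the protected {{buf}} field of {{ByteArrayOutputStream}} so callers can read the valid region in place.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in: exposes the internal buffer instead of copying it. */
class ExposedBaos extends ByteArrayOutputStream {

  /** Returns the live backing array; only the first size() bytes are valid. */
  byte[] getbuf() {
    return buf;
  }

  /** Wraps the valid region of the backing array without copying. */
  ByteBuffer asByteBuffer() {
    return ByteBuffer.wrap(buf, 0, count);
  }
}

class ExposedBaosDemo {
  public static void main(String[] args) {
    ExposedBaos baos = new ExposedBaos();
    byte[] payload = "<add><doc/></add>".getBytes(StandardCharsets.UTF_8);
    baos.write(payload, 0, payload.length);

    byte[] copy = baos.toByteArray();      // allocates a second array of size() bytes
    ByteBuffer view = baos.asByteBuffer(); // shares the stream's own array

    System.out.println("copy is a new array: " + (copy != baos.getbuf()));
    System.out.println("view shares the array: " + (view.array() == baos.getbuf()));
    System.out.println("view length: " + view.remaining() + ", size(): " + baos.size());
  }
}
```

Because the backing array is usually larger than {{size()}}, whatever consumes it has to accept an offset/length or a {{ByteBuffer}} rather than a bare {{byte[]}}; which Jetty provider fits that best is not something this sketch settles.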

> reduce memory usage of ContentWriter based requests in Http2SolrClient
> ----------------------------------------------------------------------
>
>                 Key: SOLR-16265
>                 URL: https://issues.apache.org/jira/browse/SOLR-16265
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Priority: Major
>
> I recently noticed the code below exists in 
> {{Http2SolrClient.createRequest}}...
> {code}
> if (contentWriter != null) {
>   Request req = httpClient.newRequest(url + wparams.toQueryString()).method(method);
>   ByteArrayOutputStream baos = new ByteArrayOutputStream();
>   contentWriter.write(baos);
>   // TODO reduce memory usage
>   return req.content(
>       new BytesContentProvider(contentWriter.getContentType(), baos.toByteArray()));
> {code}
> * AFAICT there is no (other) existing jira discussing this TODO
> * This method is called for most "simple" HTTP2 based requests
> ** {{Http2SolrClient}} or {{CloudHttp2SolrClient}} -- but not 
> {{ConcurrentUpdateHttp2SolrClient}}
> * This block triggers for anything with a {{ContentWriter}}
> ** i.e. all {{UpdateRequests}} ... and in theory other custom requests
> * Part of the issue seems to be that this code repurposes the 
> {{ContentWriter}} "push" style API into a "pull" style Jetty client API
> ** Even though {{Http2SolrClient}} has other code used only by 
> {{ConcurrentUpdateHttp2SolrClient}} ({{initOutStream(...)}}) which does 
> leverage a "push" style Jetty client API: {{OutputStreamContentProvider}}
> * But more silly: we make one (serialized) {{byte[]}} of the data in memory 
> inside the {{ByteArrayOutputStream}} then we call {{toByteArray()}} which 
> makes a second copy of the {{byte[]}}.



