eliminating one or two in memory copies of the response payload

WJCarpenter Fri, 27 Feb 2004 17:11:34 -0800

I'm developing an obsession with getting rid of unnecessary copies
during response processing.  I believe the most common case still has
two in-memory versions of the response payload (before the parsing is
done, which you could count as a 3rd version).  I also think that in
the most common case at least one and maybe both of these could be
eliminated.  By "most common case", I mean an RPC call where the
response is a single-part XML (I didn't look at Message).  These
copies don't matter when the response is small, but I have some
multi-megabyte use cases, as do others I've heard mentioned here.
Those start to hurt under load.


Here are some notes I took while step debugging through the current
CVS code.

1.  Constructor to TransportMessage reads entire after-header response into
    a byte[] (starting around line #168).  In fact, for large
    responses, you get lots of collectible memory since the target
    buffer gets reallocated as it grows.  It starts at 4k and doubles
    on reallocation.  As a finishing stroke, the entire thing is
    reallocated at the end to make the size of the byte[] array
    exactly match the byte count.  So, for a "just under" 2 MB
    envelope, this will be scratch buffers of 4k, 8k, 16k, 32k, 64k,
    128k, 256k, 512k, 1024k, and 2048k; that's in addition to the
    non-collectible 2 MB byte[] for the final results.

2.  HTTPUtils.post, near line #675, calls TransportMessage.read() but
    ignores the String return value.  Internally,
    TransportMessage.read, near line #342, doesn't actually "read"
    anything, but constructs and keeps a reference to the SOAP
    envelope as a String (from the already read byte[]) and also calls
    SOAPContext.setRootPart with that same String.  Obviously, this
    makes a copy of the payload inside the String object.

3.  Call.invoke, near line #334 makes a call to
    Call.getEnvelopeString, which in turn, near line #261, calls
    SOAPContext.getEnvelope.  The reason for calling
    Call.getEnvelopeString is so that the String is available for use
    as part of reporting a parsing problem.  (The actual parsing
    exception is discarded in that case.)

4.  Call.invoke then passes the String from item 3 to
    XMLParseUtils.parse.
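
To put a number on item 1, here is a minimal sketch that assumes only
the growth policy described there (start at 4k, double on each
reallocation).  The class and method names are made up for
illustration and are not part of the Apache SOAP code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache SOAP: models a buffer that
// starts at `initial` bytes and doubles until it can hold `payload` bytes.
class BufferGrowth {

    /** Every capacity allocated on the way to holding the payload. */
    static List<Long> capacities(long initial, long payload) {
        List<Long> caps = new ArrayList<>();
        long cap = initial;
        caps.add(cap);
        while (cap < payload) {
            cap *= 2;
            caps.add(cap);
        }
        return caps;
    }

    /** Sum of all growth buffers, excluding the final exact-size copy. */
    static long totalScratchBytes(long initial, long payload) {
        long total = 0;
        for (long c : capacities(initial, payload)) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        long payload = 2L * 1024 * 1024 - 1; // "just under" 2 MB
        System.out.println(capacities(4096, payload));
        System.out.println(totalScratchBytes(4096, payload) + " scratch bytes");
    }
}
```

For a payload one byte short of 2 MB this lists ten buffers, 4k
through 2048k, totalling a little under 4 MB of allocations before the
final exact-size copy is made.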

Given the above, I'm thinking of cranking out a patch to do these
things:

A.  Constructor to TransportMessage will keep a reference to the
    InputStream and only read it into a byte[] when it has to.  In the
    usual case, I think it will never have to.

B.  HTTPUtils.post won't call TransportMessage.read() but will instead
    call some new method that returns void.  It will still have the
    side effect of calling SOAPContext.setRootPart, but instead of
    passing a byte[], it will use one of the overloads and pass a
    MimeBodyPart (constructed from a DataSource in turn constructed
    from the original SocketInputStream).

C.  Skip the call to Call.getEnvelopeString from item #3 above.
    Having the text of the SOAP envelope in the message about a
    parsing problem seems to me of frankly dubious value and so not
    worth forcing the read into a byte[] and conversion to String.

D.  Make the call to the overload of XMLParseUtils.parse that takes an
    InputSource, where that InputSource would be constructed from the
    original SocketInputStream.
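
As a rough illustration of item D, this sketch parses straight from an
InputStream by way of an InputSource, with no intermediate byte[] or
String.  It uses plain JAXP rather than XMLParseUtils, and the class
name and the in-memory stand-in for the socket stream are assumptions
made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class, not part of Apache SOAP.
class StreamParseSketch {

    /** Parse directly from the stream; the parser pulls bytes as it needs them. */
    static Document parse(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(in));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Note: the envelope text is not available here for the error message.
            throw new IllegalStateException("could not parse response", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<SOAP-ENV:Envelope"
            + " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\">"
            + "<SOAP-ENV:Body/></SOAP-ENV:Envelope>";
        // Stand-in for the socket's input stream.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        Document doc = parse(in);
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

The catch block shows the trade-off from item C: once the parser reads
from the stream, a parse failure can no longer quote the envelope text
unless something buffered it along the way.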

I believe all of the above can be done with a reasonably small patch,
and for the usual case, the XML parser will be reading directly from
the SocketInputStream.  I'm imagining a few places where the state of
the payload can be one of InputStream, byte[], or String, with
on-demand conversion through that progression.  I also believe I will
be able to do this such that the non-usual cases won't suffer (they'll
really just end up forcing the conversions on demand where they would
have happened unconditionally in the current code).
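
The InputStream, byte[], String progression might look something like
the sketch below.  This is only an illustration of the idea, not the
patch itself; LazyPayload and its method names are hypothetical, and
the real change would live inside TransportMessage and SOAPContext.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical holder, not part of Apache SOAP: the payload starts as a
// stream and is converted to byte[] and then String only when asked.
class LazyPayload {
    private InputStream stream; // null once read or handed out
    private byte[] bytes;       // filled in by asBytes()
    private String text;        // filled in by asString()
    private final Charset charset;

    LazyPayload(InputStream stream, Charset charset) {
        this.stream = stream;
        this.charset = charset;
    }

    /** Usual case: give the untouched stream to the parser, no copies. */
    InputStream asStream() {
        if (stream != null) {
            InputStream s = stream;
            stream = null;
            return s;
        }
        if (bytes == null) {
            throw new IllegalStateException("stream already handed out");
        }
        return new ByteArrayInputStream(bytes);
    }

    /** Forces the first conversion: drain the stream into a byte[]. */
    byte[] asBytes() {
        if (bytes == null) {
            if (stream == null) {
                throw new IllegalStateException("stream already handed out");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            try {
                int n;
                while ((n = stream.read(chunk)) != -1) {
                    out.write(chunk, 0, n);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            bytes = out.toByteArray();
            stream = null;
        }
        return bytes;
    }

    /** Forces the second conversion: decode the byte[] into a String. */
    String asString() {
        if (text == null) {
            text = new String(asBytes(), charset);
        }
        return text;
    }

    boolean isBuffered() {
        return bytes != null;
    }

    public static void main(String[] args) {
        byte[] raw = "<a/>".getBytes(StandardCharsets.UTF_8);
        LazyPayload p = new LazyPayload(new ByteArrayInputStream(raw), StandardCharsets.UTF_8);
        System.out.println(p.isBuffered()); // nothing has been copied yet
        System.out.println(p.asString());   // forces both conversions
    }
}
```

Callers that only parse would use asStream() and never trigger a copy;
callers that need the text pay for the conversions at the point where
they ask for it.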
-- 
[EMAIL PROTECTED] (WJCarpenter)    PGP 0x91865119
38 95 1B 69 C9 C6 3D 25    73 46 32 04 69 D6 ED F3
