>>> Calling save() from the servlet would tie up the request-processing thread 
>>> until the save completes. That's where you get your 18-hour response times, 
>>> which is not very HTTP-friendly.
>> Certainly don't want to pay for 18 EC2 hours of idle.
> 
> So your clients spin-up an EC2 instance just to send the request to your 
> server? That sounds odd.
> 
Maybe more than you want to hear:  I fill an AWS queue with job definitions.  
Each job runs on a separate EC2 instance, pulls an id or two from the job 
def/command line, and requests data from the database.  It uses that data to 
run simulations and sends the analysis of the simulations back to the database.  
If I didn’t spin the work off to the ThreadPoolExec, the “large” version would 
have to wait for many, many records to be saved.  I avoid this.  (I actually 
had to look back to see where the “18 hours” came from...)
>>> 
>> The two payloads are impls of a base class. Jackson/ObjectMapper unravels 
>> them to Type. Type.save();
> 
> Okay, so it looks like Type.save() is what needs to be called in a separate 
> thread (well, submitted to a job scheduler; just get it off the 
> request-processing thread so you can return a 200 response to the client).
> 

Yes, I think I’m covered once I re-establish TPExec.
>>>> That’s the thinking behind the question of accessing a ThreadPoolExecutor 
>>>> via JNDI.  I know my existing impl does queue jobs (so the load can be 
>>>> greater than the capacity to handle requests).  I worry that without 
>>>> off-loading, Tomcat would just spin up more servlet threads and exhaust 
>>>> resources.  I can lose a client, but would rather not lose the server 
>>>> (that loses all clients...)
>>> 
>>> Agreed: rejecting a single request is preferred over the service coming 
>>> down -- and all its in-flight jobs with it.
>>> 
>>> So I think you want something like this:
>>> 
>>> servlet {
>>>   post {
>>>     // Buffer all our input data
>>>     long bufferSize = request.getContentLengthLong();
>>>     if(bufferSize > Integer.MAX_VALUE || bufferSize < 0) {
>>>       bufferSize = 8192; // Reasonable default?
>>>     }
>>>     ByteArrayOutputStream buffer = new ByteArrayOutputStream((int)bufferSize);
>>> 
>>>     InputStream in = request.getInputStream();
>>>     int count;
>>>     byte[] buf = new byte[8192];
>>>     while(-1 != (count = in.read(buf))) {
>>>         buffer.write(buf, 0, count);
>>>     }
>>> 
>>>     // All data read: tell the client we are good to go
>>>     Job job = new Job(buffer);
>>>     try {
>>>       sharedExecutor.submit(job); // Fire and forget
>>> 
>>>       response.setStatus(200); // Ok
>>>     } catch (RejectedExecutionException ree) {
>>>       response.setStatus(503); // Service Unavailable
>>>     }
>>>   }
>>> }
>>> 
>> This is working:
>> 
>>   protected void doPost(HttpServletRequest req, HttpServletResponse resp)
>>       /*throws ServletException, IOException*/ {
>>     lookupHostAndPort();
>>     Connection conn = null;
>>     try {
>>       ObjectMapper jsonMapper = JsonMapper.builder()
>>           .addModule(new JavaTimeModule()).build();
>>       jsonMapper.setSerializationInclusion(Include.NON_NULL);
>>       try {
>>         AbstractPayload payload =
>>             jsonMapper.readValue(req.getInputStream(), AbstractPayload.class);
>>         logger.error("received payload");
>>         String redoUrl = String.format("jdbc:postgresql://%s:%d/%s",
>>             getDbHost(), getDbPort(), getDbName(req));
>>         Connection copyConn = DriverManager.getConnection(redoUrl,
>>             getDbRole(req), getDbRole(req) + getExtension());
> 
> So it's here you cannot pool the connections? What about:
> 
>    Context ctx = new InitialContext();
> 
>    DataSource ds = (DataSource)ctx.lookup("java:/comp/env/jdbc/" + 
> getJNDIName(req));

I’ll see if I need this (if I’m never getting a pooled connection, maybe not).  
But JNDI is not a good place for the “second investigator’s name (et al)”.
> 
> Then you can define your per-user connection pools in JNDI and get the 
> benefit of connection-pooling.
> 
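> Something like this in context.xml, one Resource per user/role (a sketch; 
> the names, credentials, and pool sizes here are placeholders, not your 
> actual config):
> 
>    <Resource name="jdbc/investigator1"
>              auth="Container"
>              type="javax.sql.DataSource"
>              driverClassName="org.postgresql.Driver"
>              url="jdbc:postgresql://dbhost:5432/dbname"
>              username="investigator1"
>              password="changeme"
>              maxTotal="8"
>              maxIdle="4"/>
> 
> Then the lookup key is whatever your getJNDIName(req) returns.
> 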
>>         payload.setConnection(copyConn);
>>         payload.write();
> 
> Is the above call the one that takes hours?

The beginning of it, for sure.  The COPY work happens pleasantly quickly but 
does need its own db connection.  Payload says thanks, then goes on to use the 
temp tables filled by COPY to write to the real tables.  This is the slow part, 
as we can be talking about millions of records inserted into, or updating, a 
table with indexes.  (This is done in 1/16ths. Don’t ask how.)
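
For context, the fast leg is PostgreSQL COPY via the JDBC driver, roughly like 
this (a simplified sketch; the staging table and CSV details are invented, not 
my actual schema):

  // Unwrap the PostgreSQL connection to reach the driver's COPY API.
  org.postgresql.copy.CopyManager copyManager =
      copyConn.unwrap(org.postgresql.PGConnection.class).getCopyAPI();
  copyConn.createStatement().execute(
      "CREATE TEMP TABLE staging (LIKE real_table INCLUDING DEFAULTS)");
  // COPY streams rows in bulk -- the blindingly fast part.
  copyManager.copyIn("COPY staging FROM STDIN WITH (FORMAT csv)",
      new java.io.StringReader(csvData));
  // The slow part follows: merging staging into the indexed real tables.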
> 
>>         // HERE THE CLIENT IS WAITING FOR THE SAVE.  Though there can
>>         // be a lot of data, COPY is blindingly fast.
> 
> Maybe the payload.write() is not slow. Maybe? After this you don't do 
> anything else...
> 
>>             resp.setContentType("plain/text");
>>             resp.setStatus(200);
>>             resp.getOutputStream().write("SGS_OK".getBytes());
>>             resp.getOutputStream().flush();
>>             resp.getOutputStream().close();
>>           }
>>             //Client can do squat at this point.
>>           catch
>>    (com.fasterxml.jackson.databind.exc.MismatchedInputException mie) {
>>             logger.error("transform failed: " + mie.getMessage());
>>             resp.setContentType("plain/text");
>>             resp.setStatus(461);
>>             String emsg = "PAYLOAD NOT
>>    SAVED\n%s\n".format(mie.getMessage());
>>             resp.getOutputStream().write(emsg.getBytes());
>>             resp.getOutputStream().flush();
>>             resp.getOutputStream().close();
>>           }
>>         }
>>         catch (IOException | SQLException ioe) {
>>         etc }
>>> Obviously, the job needs to know how to execute itself (making it Runnable 
>>> means you can use the various Executors Java provides). Also, you need to 
>>> decide what to do about creating the executor.
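>>> 
>>> A sketch of what that Job could look like (the mapper and logger here are 
>>> assumed to exist as fields; the payload calls are from your code above):
>>> 
>>> class Job implements Runnable {
>>>   private final ByteArrayOutputStream buffer;
>>> 
>>>   Job(ByteArrayOutputStream buffer) {
>>>     this.buffer = buffer;
>>>   }
>>> 
>>>   @Override
>>>   public void run() {
>>>     try {
>>>       // Off the request thread now: parse and do the slow save.
>>>       AbstractPayload payload =
>>>           mapper.readValue(buffer.toByteArray(), AbstractPayload.class);
>>>       payload.write();
>>>     } catch (Exception e) {
>>>       // The client already got its 200; all we can do is log.
>>>       logger.error("job failed", e);
>>>     }
>>>   }
>>> }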
>>> 
>>> I used the ByteArrayOutputStream above to avoid the complexity of 
>>> re-scaling buffers in example code. If you have huge buffers and you need 
>>> to convert to byte[] at the end, then you are going to need 2x heap space 
>>> to do it. Yuck. Consider implementing the auto-re-sizing byte-array 
>>> yourself and avoiding ByteArrayOutputStream.
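>>> 
>>> A minimal sketch of that idea (untested): ByteArrayOutputStream's buf and 
>>> count fields are protected, so a subclass can hand Jackson the backing 
>>> array directly, skipping the defensive copy that toByteArray() makes:
>>> 
>>> class ExposedBaos extends ByteArrayOutputStream {
>>>   ExposedBaos(int size) { super(size); }
>>> 
>>>   byte[] backingArray() { return buf; } // no copy
>>>   int length() { return count; }
>>> }
>>> 
>>> // Later: mapper.readValue(baos.backingArray(), 0, baos.length(), ...)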
>>> 
>>> There isn't anything magic about JNDI. You could also put the thread pool 
>>> directly into your servlet:
>>> 
>>> servlet {
>>>   ThreadPoolExecutor sharedExecutor;
>>>   constructor() {
>>>     sharedExecutor = new ThreadPoolExecutor(...);
>>>   }
>>>   ...
>>> }
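>>> 
>>> For the "..." you want a bounded queue, so a full backlog becomes the 
>>> RejectedExecutionException (and your 503) rather than eating the heap. 
>>> The numbers below are invented; tune them to your workload:
>>> 
>>> sharedExecutor = new ThreadPoolExecutor(
>>>     2, 4,                                  // core / max threads
>>>     60, TimeUnit.SECONDS,                  // idle-thread keep-alive
>>>     new ArrayBlockingQueue<>(32),          // bounded: full => reject
>>>     new ThreadPoolExecutor.AbortPolicy()); // reject => exception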
>>> 
>> Yes, I see now that the single real instance of the servlet can master the 
>> sharedExecutor.
>> I have reliable threadpool code at hand.  I don't need to separate the job 
>> types: in practice all the big ones are done first; they define the small 
>> ones.  It's when I'm spectacularly successful and two (2) investigators want 
>> to use the system ...
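>> Roughly this shape, I think (a sketch, not the real code):
>> 
>>   @Override
>>   public void init() throws ServletException {
>>     sharedExecutor = new ThreadPoolExecutor(...);
>>   }
>> 
>>   @Override
>>   public void destroy() {
>>     sharedExecutor.shutdown(); // let queued jobs drain on undeploy
>>   }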
> 
> Sounds good.
> 
> But I am still confused as to what is taking 18 hours. None of the calls 
> above look like they should take a long time, given your comments.

I think I've explained the slow part above.  TL;DR: DB writes are expensive.
> 
>>> If you want to put those executors into JNDI, you are welcome to do so, but 
>>> there is no particular reason to. If it's convenient to configure a thread 
>>> pool executor via some JNDI injection something-or-other, feel free to use 
>>> that.
>>> 
>>> But ultimately, you are just going to get a reference to the executor and 
>>> drop the job on it.
>>> 
>>>> Next up is SSL.  One of the reasons I must switch from my naked socket 
>>>> impl.
>>> 
>>> Nah, you can do TLS on a naked socket. But I think using Tomcat embedded 
>>> (or not) will save you the trouble of having to learn a whole lot and write 
>>> a lot of code.
>>> 
>> No thanks.
>>> TLS should be fairly easy to get going in Tomcat as long as you already 
>>> understand how to create a key+certificate.
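>>> 
>>> For reference, the server.xml side is roughly this on Tomcat 8.5+ (the 
>>> paths and password are placeholders):
>>> 
>>> <Connector port="8443" SSLEnabled="true"
>>>            protocol="org.apache.coyote.http11.Http11NioProtocol">
>>>   <SSLHostConfig>
>>>     <Certificate certificateKeystoreFile="conf/keystore.p12"
>>>                  certificateKeystorePassword="changeit"
>>>                  type="RSA"/>
>>>   </SSLHostConfig>
>>> </Connector>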
>>> 
>> I've made keys/certs in previous lives (not to say I understand them). I'm 
>> waiting to hear on whether or not I'll be able to self-sign etc. Talking to 
>> AWS Monday about the security/HIPAA things.
> 
> AWS may tell you that simply terminating TLS at the load-balancer (which is 
> fall-off-a-log easy; they will even auto-renew with an AWS-signed CA) is 
> sufficient for your needs. You may not have to configure Tomcat for TLS at 
> all.
> 
I will definitely bring this up.  Thanks.
>> I'm sure I'll be back, but I think I can move forward.  Much appreciated.
> 

> Any time.
> 
> -chris
> 
Same threat as before ;)
Thanks a ton,

rjs



---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
