You need to split the request handler from the request processor: receive the request in mod_perl, then queue it to a separate application that does the actual heavy lifting.
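A rough sketch of that split, assuming a Redis list as the job queue (the Redis server, key names, and `My::PredictProxy` package are illustrative choices, not anything from your setup; Gearman, ZeroMQ, or a plain Unix socket would serve the same role):

```perl
# Hypothetical mod_perl2 handler: accept the POSTed features, hand them to a
# pool of prediction workers over a queue, and wait briefly for the answer.
package My::PredictProxy;

use strict;
use warnings;
use Apache2::RequestRec ();
use Apache2::RequestIO  ();
use Apache2::Const -compile => qw(OK);
use Redis;
use JSON::XS qw(encode_json);

# One persistent connection per Apache child process.
my $redis = Redis->new(server => 'localhost:6379');

sub handler {
    my $r = shift;

    # Read the raw POST body (the ~1000 features).
    my $len  = $r->headers_in->{'Content-Length'} || 0;
    my $body = '';
    $r->read($body, $len) if $len;

    # Enqueue the job under a unique id and block (with a timeout)
    # until a worker pushes the result onto a per-job reply list.
    my $job_id = join ':', 'job', $$, time(), int(rand(1_000_000));
    $redis->lpush('predict:jobs',
                  encode_json({ id => $job_id, features => $body }));
    my (undef, $result) = $redis->blpop("predict:result:$job_id", 1);

    $r->content_type('application/json');
    $r->print(defined $result ? $result : '{"error":"timeout"}');
    return Apache2::Const::OK;
}

1;

# The processor side is a separate long-running daemon that keeps the
# LR/GBDT model resident in memory, e.g.:
#   while (1) {
#       my (undef, $json) = $redis->blpop('predict:jobs', 0);
#       my $job = decode_json($json);
#       $redis->lpush("predict:result:$job->{id}",
#                     run_model($job->{features}));   # run_model: your code
#   }
```

The point of the split is that Apache children become cheap I/O proxies while the model stays loaded in a fixed pool of worker processes, so throughput scales with the number of workers rather than with Apache children, and a slow prediction never ties up a whole mod_perl interpreter.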
On Wed, Jan 8, 2020 at 8:59 PM Wesley Peng <wes...@magenta.de> wrote:
> Hello,
>
> We are running LR[1], GBDT[2], and similar algorithms in MP2 handlers.
> For each request, about 1000 features are passed into the handlers as
> arguments via HTTP POST.
> Each request waits about 100ms for a response, because the calculation
> is not cheap.
> My question is: how can we improve the throughput through architectural
> optimization?
> Yes, we know there are prediction frameworks such as TFS[3] and RT[4],
> but we haven't used TensorFlow yet.
>
> [1] https://en.wikipedia.org/wiki/LR_parser
> [2] https://en.wikipedia.org/wiki/Gradient_boosting
> [3] https://www.tensorflow.org/tfx/guide/serving
> [4] https://developer.nvidia.com/tensorrt
>
> Thanks.