Hi.

What tactics can I apply in the following scenario?

I have a Spark ML pipeline of 10 stages doing simple text processing. I fit
the pipeline on training data, do some modelling on the transformed data,
and store the results.

I also have a web server where I receive requests. For each request (a
single-row DataFrame), I transform it with the same fitted pipeline and take
the corresponding action. The problem: calling Spark for a single row takes
less than 1 second, but under higher load Spark becomes a major bottleneck.

One solution I can think of is a Scala re-implementation of the same
pipeline that uses the model generated above to process the requests. But
this duplicates code and hence adds a maintenance burden.

Is there any way to call the same pipeline's transform in a lightweight
manner, just for a single row, so that it handles requests concurrently and
Spark does not remain a bottleneck?

Thanks
Jatin
