Hi, what tactics can I apply for the following scenario?
I have a pipeline of 10 stages doing simple text processing. I fit the pipeline on my training data, run some modelling on the transformed output, and store the results.

I also have a web server that receives requests. For each request (a single-row dataframe), I transform it through the same fitted pipeline and then take the appropriate action. The problem: transforming a single row takes under a second, but under higher load Spark becomes a major bottleneck.

One solution I can think of is a plain Scala re-implementation of the same pipeline, driven by the model generated above, but that duplicates code and adds a maintenance burden. Is there any way to call the same pipeline's transform in a lightweight manner, just for a single row, so that requests are handled concurrently and Spark is no longer a bottleneck?

Thanks,
Jatin