Thanks, so the number of executor slots = the number of concurrent HTTP calls I can make. I can't boost the number of simultaneous HTTP calls within a single executor beyond its task slots; I mean, I can't go beyond the threshold set by the total number of executor slots.
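For anyone finding this thread later, here is a minimal Scala sketch of the distributed-call pattern discussed below. The endpoint https://api.example.com/employees/<id> and the object and column names are hypothetical stand-ins; the point is that the GET runs inside mapPartitions, so the calls execute on the executors, and repartition caps how many run at once (still bounded by the available task slots):

import org.apache.spark.sql.SparkSession

object DistributedHttpGetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("distributed-http-get").getOrCreate()
    import spark.implicits._

    // Stand-in for the dynamic, per-run DataFrame of employees
    val employees = Seq((1, "alice"), (2, "bob")).toDF("employee_id", "employee_name")

    val responses = employees
      .repartition(8) // at most 8 tasks -> at most 8 simultaneous calls
      .mapPartitions { rows =>
        rows.map { row =>
          val id = row.getInt(0)
          // Hypothetical endpoint; each row's GET runs on the executor owning this partition
          val conn = new java.net.URL(s"https://api.example.com/employees/$id")
            .openConnection().asInstanceOf[java.net.HttpURLConnection]
          conn.setRequestMethod("GET")
          val body = scala.io.Source.fromInputStream(conn.getInputStream).mkString
          conn.disconnect()
          (id, body) // (employee_id, raw response body)
        }
      }
      .toDF("employee_id", "response")

    responses.show(truncate = false)
    spark.stop()
  }
}

Each partition here processes its rows sequentially; if you need more concurrency than you have slots, an async client such as OkHttp (which Jerry mentions below) can fan out requests within each partition.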
On Thu, May 14, 2020 at 6:26 PM Sean Owen <sro...@gmail.com> wrote:

> Default is not 200, but the number of executor slots. Yes, you can
> only simultaneously execute as many tasks as you have slots,
> regardless of the number of partitions.
>
> On Thu, May 14, 2020, 5:19 PM Chetan Khatri
> <chetan.opensou...@gmail.com> wrote:
>
>> Thanks Sean, Jerry.
>>
>> The default number of Spark DataFrame partitions is 200, right?
>> Does it have a relationship with the number of cores? With 8 cores
>> and 4 workers, isn't it the case that I can make only 8 * 4 = 32
>> HTTP calls? Because in Spark, "number of partitions = number of
>> cores" is untrue.
>>
>> Thanks
>>
>> On Thu, May 14, 2020 at 6:11 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> Yes, any code that you apply with Spark runs in the executors.
>>> You would be running as many HTTP clients as you have partitions.
>>>
>>> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov
>>> <grapesmo...@gmail.com> wrote:
>>>
>>>> I believe that if you do this within the context of an operation
>>>> that is already parallelized, such as a map, the work will be
>>>> distributed to the executors and they will do it in parallel. I
>>>> could be wrong about this, though, as I never investigated this
>>>> specific use case.
>>>>
>>>> On Thu, May 14, 2020 at 5:24 PM Chetan Khatri
>>>> <chetan.opensou...@gmail.com> wrote:
>>>>
>>>>> Thanks for the quick response.
>>>>>
>>>>> I am curious to know whether it would pull data in parallel for
>>>>> 100+ HTTP requests or run only on the driver node. The request
>>>>> body would be part of the DataFrame. Think of it like this: I
>>>>> have a DataFrame of employee_id, employee_name; an HTTP GET call
>>>>> has to be made for each employee_id, and the DataFrame is
>>>>> dynamic for each Spark job run.
>>>>>
>>>>> Does it make sense?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov
>>>>> <grapesmo...@gmail.com> wrote:
>>>>>
>>>>>> Hi Chetan,
>>>>>>
>>>>>> You can pretty much use any client to do this. When I was using
>>>>>> Spark at a previous job, we used OkHttp, but I'm sure there are
>>>>>> plenty of others. In our case, we had a startup phase in which
>>>>>> we gathered metadata via a REST API and then broadcast it to
>>>>>> the workers. I think if you need all the workers to have access
>>>>>> to whatever you're getting from the API, that's the way to do
>>>>>> it.
>>>>>>
>>>>>> Jerry
>>>>>>
>>>>>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri
>>>>>> <chetan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Spark Users,
>>>>>>>
>>>>>>> How can I invoke a REST API call from Spark code so that it is
>>>>>>> not only running on the Spark driver but distributed /
>>>>>>> parallel?
>>>>>>>
>>>>>>> Spark with Scala is my tech stack.
>>>>>>>
>>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>> http://www.google.com/profiles/grapesmoker
>>>>
>>>> --
>>>> http://www.google.com/profiles/grapesmoker
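And for reference, a minimal sketch of the startup-phase broadcast pattern Jerry describes above, assuming a hypothetical metadata endpoint and made-up names: the driver fetches the metadata once over REST, then broadcasts it so every executor reads a local copy instead of repeating the call per task.

import org.apache.spark.sql.SparkSession

object BroadcastMetadataSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("broadcast-metadata").getOrCreate()
    import spark.implicits._

    // One REST call, on the driver, before any distributed work starts
    // (https://api.example.com/metadata is a hypothetical endpoint)
    val metadataJson =
      scala.io.Source.fromURL("https://api.example.com/metadata").mkString
    val metadataBc = spark.sparkContext.broadcast(metadataJson)

    val employeeIds = Seq(1, 2, 3).toDS()
    val tagged = employeeIds.map { id =>
      // metadataBc.value resolves locally on each executor; no extra HTTP call
      (id, metadataBc.value.length)
    }
    tagged.toDF("employee_id", "metadata_length").show()

    spark.stop()
  }
}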