The default is not 200; it is the number of executor slots. And yes, you can only execute as many tasks simultaneously as you have slots, regardless of the number of partitions.
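
For what it's worth, here is a minimal sketch of how the per-employee GET calls could be pushed onto the executors. It is not a tested recipe, and several things in it are assumptions: the base URL `https://api.example.com`, the `/employees/<id>` path, the helper `employeeUrl`, and the inline sample rows are all made up for illustration, and `employee_id` is assumed to be a long. It uses the JDK 11+ `java.net.http` client only to avoid an extra dependency; OkHttp or any other client would work the same way. The `repartition(32)` just mirrors the 8 cores x 4 workers from the question.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

import org.apache.spark.sql.SparkSession

object ParallelRestCalls {
  // Hypothetical URL scheme; replace with the real endpoint.
  def employeeUrl(baseUrl: String, id: Long): String = s"$baseUrl/employees/$id"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("parallel-rest-calls").getOrCreate()
    import spark.implicits._

    // Stand-in for the DataFrame that the real job builds on each run.
    val employees = Seq((1L, "alice"), (2L, "bob")).toDF("employee_id", "employee_name")

    val baseUrl = "https://api.example.com" // assumption, not a real service
    val slots = 32                          // assumption: 4 workers x 8 cores

    val responses = employees
      .select($"employee_id").as[Long]
      .repartition(slots) // at most `slots` tasks run at once anyway
      .mapPartitions { ids =>
        // This closure runs on an executor: one client per partition,
        // reused for every id in that partition.
        val client = HttpClient.newHttpClient()
        ids.map { id =>
          val request = HttpRequest.newBuilder(URI.create(employeeUrl(baseUrl, id))).GET().build()
          val response = client.send(request, HttpResponse.BodyHandlers.ofString())
          (id, response.statusCode(), response.body())
        }
      }
      .toDF("employee_id", "status", "body")

    responses.show(truncate = false)
    spark.stop()
  }
}
```

Creating the client inside `mapPartitions` rather than on the driver avoids having to serialize it, and the number of concurrent calls is then bounded by the number of slots, not by the partition count. Nothing here handles retries, timeouts, or rate limits on the API side.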

On Thu, May 14, 2020, 5:19 PM Chetan Khatri <[email protected]> wrote:

> Thanks Sean, Jerry.
>
> Default Spark DataFrame partitions are 200 right? does it have
> relationship with number of cores? 8 cores - 4 workers. is not it like I
> can do only 8 * 4 = 32 http calls. Because in Spark number of partitions =
> number cores is untrue.
>
> Thanks
>
> On Thu, May 14, 2020 at 6:11 PM Sean Owen <[email protected]> wrote:
>
>> Yes any code that you write in code that you apply with Spark runs in
>> the executors. You would be running as many HTTP clients as you have
>> partitions.
>>
>> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov <[email protected]>
>> wrote:
>> >
>> > I believe that if you do this within the context of an operation that
>> > is already parallelized such as a map, the work will be distributed to
>> > executors and they will do it in parallel. I could be wrong about this
>> > as I never investigated this specific use case, though.
>> >
>> > On Thu, May 14, 2020 at 5:24 PM Chetan Khatri <[email protected]> wrote:
>> >>
>> >> Thanks for the quick response.
>> >>
>> >> I am curious to know whether would it be parallel pulling data for
>> >> 100+ HTTP request or it will only go on Driver node? the post body
>> >> would be part of DataFrame. Think as I have a data frame of
>> >> employee_id, employee_name now the http GET call has to be made for
>> >> each employee_id and DataFrame is dynamic for each spark job run.
>> >>
>> >> Does it make sense?
>> >>
>> >> Thanks
>> >>
>> >>
>> >> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov <[email protected]> wrote:
>> >>>
>> >>> Hi Chetan,
>> >>>
>> >>> You can pretty much use any client to do this. When I was using
>> >>> Spark at a previous job, we used OkHttp, but I'm sure there are
>> >>> plenty of others. In our case, we had a startup phase in which we
>> >>> gathered metadata via a REST API and then broadcast it to the
>> >>> workers. I think if you need all the workers to have access to
>> >>> whatever you're getting from the API, that's the way to do it.
>> >>>
>> >>> Jerry
>> >>>
>> >>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri <[email protected]> wrote:
>> >>>>
>> >>>> Hi Spark Users,
>> >>>>
>> >>>> How can I invoke the Rest API call from Spark Code which is not
>> >>>> only running on Spark Driver but distributed / parallel?
>> >>>>
>> >>>> Spark with Scala is my tech stack.
>> >>>>
>> >>>> Thanks
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> http://www.google.com/profiles/grapesmoker
>> >
>> >
>> >
>> > --
>> > http://www.google.com/profiles/grapesmoker
>> >
