Re: Calling HTTP Rest APIs from Spark Job

Sean Owen Thu, 14 May 2020 15:27:28 -0700

The default is not 200, but the number of executor slots. Yes, you can only
execute as many tasks simultaneously as you have slots, regardless of the
number of partitions.
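
To make the arithmetic concrete, here is a rough sketch in plain Scala (no Spark needed). It assumes one executor per worker using all 8 cores, and one blocking HTTP call per task. The object and method names are made up for illustration; `spark.task.cpus` is the real setting (default 1) that decides how many cores one task occupies. The "waves" figure is an idealization, since real tasks do not finish in lockstep.

```scala
object SlotMath {
  // Total task slots: each executor can run (cores / spark.task.cpus) tasks at once.
  def slots(executors: Int, coresPerExecutor: Int, taskCpus: Int = 1): Int =
    executors * (coresPerExecutor / taskCpus)

  // Tasks in flight at the same time: capped by slots, not by partition count.
  def concurrentTasks(partitions: Int, totalSlots: Int): Int =
    math.min(partitions, totalSlots)

  // Approximate number of scheduling rounds needed to work through all partitions.
  def waves(partitions: Int, totalSlots: Int): Int =
    (partitions + totalSlots - 1) / totalSlots

  def main(args: Array[String]): Unit = {
    val total = slots(executors = 4, coresPerExecutor = 8)
    println(s"slots = $total")                                       // 32
    println(s"concurrent tasks = ${concurrentTasks(200, total)}")    // 32
    println(s"waves for 200 partitions = ${waves(200, total)}")      // 7
  }
}
```

So with 200 partitions on 32 slots, at most 32 calls are in flight at once and the remaining partitions wait for a free slot.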


On Thu, May 14, 2020, 5:19 PM Chetan Khatri <[email protected]>
wrote:

> Thanks Sean, Jerry.
>
> The default number of Spark DataFrame partitions is 200, right? Is it
> related to the number of cores? With 4 workers of 8 cores each, wouldn't I
> be limited to 8 * 4 = 32 concurrent HTTP calls? After all, in Spark the
> number of partitions does not have to equal the number of cores.
>
> Thanks
>
> On Thu, May 14, 2020 at 6:11 PM Sean Owen <[email protected]> wrote:
>
>> Yes, any code that you write in a function that you apply with Spark runs
>> on the executors. You would be running as many HTTP clients as you have
>> partitions.
>>
>> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov <[email protected]>
>> wrote:
>> >
>> > I believe that if you do this within the context of an operation that
>> is already parallelized such as a map, the work will be distributed to
>> executors and they will do it in parallel. I could be wrong about this as I
>> never investigated this specific use case, though.
>> >
>> > On Thu, May 14, 2020 at 5:24 PM Chetan Khatri <
>> [email protected]> wrote:
>> >>
>> >> Thanks for the quick response.
>> >>
>> >> I am curious whether the data for 100+ HTTP requests would be pulled in
>> parallel, or whether the calls would only run on the driver node. The POST
>> body would be part of the DataFrame. Say I have a DataFrame of employee_id
>> and employee_name: an HTTP GET call has to be made for each employee_id, and
>> the DataFrame contents are different for each Spark job run.
>> >>
>> >> Does it make sense?
>> >>
>> >> Thanks
>> >>
>> >>
>> >> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov <[email protected]>
>> wrote:
>> >>>
>> >>> Hi Chetan,
>> >>>
>> >>> You can pretty much use any client to do this. When I was using Spark
>> at a previous job, we used OkHttp, but I'm sure there are plenty of others.
>> In our case, we had a startup phase in which we gathered metadata via a
>> REST API and then broadcast it to the workers. I think if you need all the
>> workers to have access to whatever you're getting from the API, that's the
>> way to do it.
>> >>>
>> >>> Jerry
>> >>>
>> >>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri <
>> [email protected]> wrote:
>> >>>>
>> >>>> Hi Spark Users,
>> >>>>
>> >>>> How can I invoke REST API calls from Spark code so that they run not
>> only on the Spark driver but distributed / in parallel?
>> >>>>
>> >>>> Spark with Scala is my tech stack.
>> >>>>
>> >>>> Thanks
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> http://www.google.com/profiles/grapesmoker
>> >
>> >
>> >
>> > --
>> > http://www.google.com/profiles/grapesmoker
>>
>
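
For the employee_id case discussed in the thread, a minimal sketch of the idea might look like the following. It is only an illustration under assumptions: the endpoint `baseUrl`, the object name `EmployeeLookup`, the helper `callPerId`, the sample rows, and the choice of 32 partitions are all hypothetical, and `scala.io.Source.fromURL` stands in for a real HTTP client such as OkHttp. The function passed to `mapPartitions` runs on the executors, one task per partition.

```scala
import org.apache.spark.sql.SparkSession
import scala.io.Source

object EmployeeLookup {
  // Hypothetical endpoint; replace with the real API.
  val baseUrl = "https://example.com/api/employees/"

  // Simple blocking GET that returns the response body as a string.
  def fetch(url: String): String = {
    val src = Source.fromURL(url, "UTF-8")
    try src.mkString finally src.close()
  }

  // Per-partition logic, kept free of Spark so it can be tested with a stub `get`.
  def callPerId(ids: Iterator[Long], get: String => String): Iterator[(Long, String)] =
    ids.map(id => (id, get(baseUrl + id)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("employee-lookup").getOrCreate()
    import spark.implicits._

    // Stand-in for the real, per-run DataFrame.
    val employees = Seq((1L, "alice"), (2L, "bob")).toDF("employee_id", "employee_name")

    val responses = employees
      .select($"employee_id").as[Long]
      .repartition(32) // assumed: roughly one partition per available slot
      .mapPartitions { ids =>
        // Runs on an executor. A reusable client (e.g. OkHttp) could be
        // created here once per partition instead of once per row.
        callPerId(ids, fetch)
      }
      .toDF("employee_id", "response_body")

    responses.show(truncate = false)
    spark.stop()
  }
}
```

Keeping `callPerId` separate from the Spark plumbing means the request logic can be checked locally with a fake `get` function before it is pointed at a real service.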
