Thanks, so the number of executor slots = the number of concurrent HTTP calls I can make. I can't boost the number of simultaneous HTTP calls within a single executor beyond its task slots; I mean, I can't go beyond the threshold set by the total number of executor slots.
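For anyone finding this thread later, here is a minimal Scala sketch of the distributed-call pattern discussed below. The endpoint https://api.example.com/employees/<id> and the object and column names are hypothetical stand-ins; the point is that the GET runs inside mapPartitions, so the calls execute on the executors, and repartition caps how many run at once (still bounded by the available task slots):

import org.apache.spark.sql.SparkSession

object DistributedHttpGetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("distributed-http-get").getOrCreate()
    import spark.implicits._

    // Stand-in for the dynamic, per-run DataFrame of employees
    val employees = Seq((1, "alice"), (2, "bob")).toDF("employee_id", "employee_name")

    val responses = employees
      .repartition(8) // at most 8 tasks -> at most 8 simultaneous calls
      .mapPartitions { rows =>
        rows.map { row =>
          val id = row.getInt(0)
          // Hypothetical endpoint; each row's GET runs on the executor owning this partition
          val conn = new java.net.URL(s"https://api.example.com/employees/$id")
            .openConnection().asInstanceOf[java.net.HttpURLConnection]
          conn.setRequestMethod("GET")
          val body = scala.io.Source.fromInputStream(conn.getInputStream).mkString
          conn.disconnect()
          (id, body) // (employee_id, raw response body)
        }
      }
      .toDF("employee_id", "response")

    responses.show(truncate = false)
    spark.stop()
  }
}

Each partition here processes its rows sequentially; if you need more concurrency than you have slots, an async client such as OkHttp (which Jerry mentions below) can fan out requests within each partition.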
On Thu, May 14, 2020 at 6:26 PM Sean Owen <sro...@gmail.com> wrote:

> Default is not 200, but the number of executor slots. Yes, you can
> only simultaneously execute as many tasks as you have slots,
> regardless of the number of partitions.
>
> On Thu, May 14, 2020, 5:19 PM Chetan Khatri
> <chetan.opensou...@gmail.com> wrote:
>
>> Thanks Sean, Jerry.
>>
>> The default number of Spark DataFrame partitions is 200, right?
>> Does it have a relationship with the number of cores? With 8 cores
>> and 4 workers, isn't it the case that I can make only 8 * 4 = 32
>> HTTP calls? Because in Spark, "number of partitions = number of
>> cores" is untrue.
>>
>> Thanks
>>
>> On Thu, May 14, 2020 at 6:11 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> Yes, any code that you apply with Spark runs in the executors.
>>> You would be running as many HTTP clients as you have partitions.
>>>
>>> On Thu, May 14, 2020 at 4:31 PM Jerry Vinokurov
>>> <grapesmo...@gmail.com> wrote:
>>>
>>>> I believe that if you do this within the context of an operation
>>>> that is already parallelized, such as a map, the work will be
>>>> distributed to the executors and they will do it in parallel. I
>>>> could be wrong about this, though, as I never investigated this
>>>> specific use case.
>>>>
>>>> On Thu, May 14, 2020 at 5:24 PM Chetan Khatri
>>>> <chetan.opensou...@gmail.com> wrote:
>>>>
>>>>> Thanks for the quick response.
>>>>>
>>>>> I am curious to know whether it would pull data in parallel for
>>>>> 100+ HTTP requests or run only on the driver node. The request
>>>>> body would be part of the DataFrame. Think of it like this: I
>>>>> have a DataFrame of employee_id, employee_name; an HTTP GET call
>>>>> has to be made for each employee_id, and the DataFrame is
>>>>> dynamic for each Spark job run.
>>>>>
>>>>> Does it make sense?
>>>>>
>>>>> Thanks
>>>>>
>>>>> On Thu, May 14, 2020 at 5:12 PM Jerry Vinokurov
>>>>> <grapesmo...@gmail.com> wrote:
>>>>>
>>>>>> Hi Chetan,
>>>>>>
>>>>>> You can pretty much use any client to do this. When I was using
>>>>>> Spark at a previous job, we used OkHttp, but I'm sure there are
>>>>>> plenty of others. In our case, we had a startup phase in which
>>>>>> we gathered metadata via a REST API and then broadcast it to
>>>>>> the workers. I think if you need all the workers to have access
>>>>>> to whatever you're getting from the API, that's the way to do
>>>>>> it.
>>>>>>
>>>>>> Jerry
>>>>>>
>>>>>> On Thu, May 14, 2020 at 5:03 PM Chetan Khatri
>>>>>> <chetan.opensou...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Spark Users,
>>>>>>>
>>>>>>> How can I invoke a REST API call from Spark code so that it is
>>>>>>> not only running on the Spark driver but distributed /
>>>>>>> parallel?
>>>>>>>
>>>>>>> Spark with Scala is my tech stack.
>>>>>>>
>>>>>>> Thanks
>>>>>>
>>>>>> --
>>>>>> http://www.google.com/profiles/grapesmoker
>>>>
>>>> --
>>>> http://www.google.com/profiles/grapesmoker
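And for reference, a minimal sketch of the startup-phase broadcast pattern Jerry describes above, assuming a hypothetical metadata endpoint and made-up names: the driver fetches the metadata once over REST, then broadcasts it so every executor reads a local copy instead of repeating the call per task.

import org.apache.spark.sql.SparkSession

object BroadcastMetadataSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("broadcast-metadata").getOrCreate()
    import spark.implicits._

    // One REST call, on the driver, before any distributed work starts
    // (https://api.example.com/metadata is a hypothetical endpoint)
    val metadataJson =
      scala.io.Source.fromURL("https://api.example.com/metadata").mkString
    val metadataBc = spark.sparkContext.broadcast(metadataJson)

    val employeeIds = Seq(1, 2, 3).toDS()
    val tagged = employeeIds.map { id =>
      // metadataBc.value resolves locally on each executor; no extra HTTP call
      (id, metadataBc.value.length)
    }
    tagged.toDF("employee_id", "metadata_length").show()

    spark.stop()
  }
}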