Hi Gourav,

Thanks for the suggestion, I'll check it out.
Regards,
Kartik

On Thu, Jul 1, 2021 at 5:38 AM Gourav Sengupta <gourav.sengu...@gmail.com> wrote:

> Hi,
>
> I think that reading Matei Zaharia's book "Spark: The Definitive Guide" will be a good starting point.
>
> Regards,
> Gourav Sengupta
>
> On Wed, Jun 30, 2021 at 3:47 PM Kartik Ohri <kartikohr...@gmail.com> wrote:
>
>> Hi all!
>>
>> I am working on a Pyspark application and would like suggestions on how it should be structured.
>>
>> We have a number of possible jobs, organized in modules. There is also a "RequestConsumer" class <https://github.com/metabrainz/listenbrainz-server/blob/master/listenbrainz_spark/request_consumer/request_consumer.py> which consumes from a messaging queue. Each message contains the name of the job to invoke and the arguments to be passed to it. Messages are put into the message queue by cronjobs, manually, etc.
>>
>> We submit a zip file containing all Python files to a Spark cluster running on YARN and ask it to run the RequestConsumer. This <https://github.com/metabrainz/listenbrainz-server/blob/master/docker/start-spark-request-consumer.sh#L23-L34> is the exact spark-submit command for the interested. The results of the jobs are collected <https://github.com/metabrainz/listenbrainz-server/blob/master/listenbrainz_spark/request_consumer/request_consumer.py#L120-L122> by the request consumer and pushed into another queue.
>>
>> My question is whether this type of structure makes sense. Should the RequestConsumer instead run independently of Spark and invoke spark-submit scripts when it needs to trigger a job? Or is there another recommendation?
>>
>> Thank you all in advance for taking the time to read this email and helping.
>>
>> Regards,
>> Kartik.
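For readers following along, the dispatch pattern the quoted question describes (a long-running consumer that maps a job name in each message to a function and forwards the result to a response queue) can be sketched roughly as below. All names here (QUERY_REGISTRY, register, handle_message, the "echo" job) are illustrative assumptions, not the actual listenbrainz-server code, and the real RequestConsumer would additionally hold a shared SparkSession and talk to a real message broker.

```python
import json

# Registry mapping job names to callables. In the real application each
# job would live in its own module; this sketch keeps everything inline.
QUERY_REGISTRY = {}

def register(name):
    """Decorator that adds a job function to the registry under `name`."""
    def wrapper(func):
        QUERY_REGISTRY[name] = func
        return func
    return wrapper

@register("echo")
def echo(message):
    # Stand-in for a real Spark job that would run DataFrame queries.
    return {"echo": message}

def handle_message(raw_message):
    """Decode one queue message, dispatch to the named job, and return
    the result ready to be published to the response queue."""
    request = json.loads(raw_message)
    job = QUERY_REGISTRY.get(request["query"])
    if job is None:
        return {"error": "unknown job: " + request["query"]}
    return job(**request.get("params", {}))

# A cronjob or manual trigger would enqueue a message like this one;
# the consumer loop would call handle_message for each delivery.
result = handle_message(json.dumps({"query": "echo",
                                    "params": {"message": "hi"}}))
```

One consequence of this single-process design, relevant to the question asked, is that all jobs share one Spark application and its resources; the alternative of invoking spark-submit per job trades that sharing for isolation between jobs.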