Hi,

Would help if you could include a high level architecture diagram.

So, as I understand it, you are running either a single broker with 6
partitions, or 6 brokers with one partition each (the default).

You said you are using continuous trigger mode, meaning, as an example:


                     .foreach(ForeachWriter())
                     .trigger(Trigger.Continuous("1 second"))

with a 1-second checkpointing interval.


So the ForeachWriter class will contain the filter/transformation logic,
and an instance of it will run on every executor.
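
A rough PySpark sketch of that pattern is below (the Java ForeachWriter
API is the equivalent; the class name, topic and broker address are made
up, and the Kafka source needs the usual spark-sql-kafka package):

    from pyspark.sql import SparkSession

    class PropertyAwareWriter:
        """Runs on the executors; one instance per partition per epoch."""

        def open(self, partition_id, epoch_id):
            # open the connection to the external system here
            return True

        def process(self, row):
            # custom filter/transformation logic applied to every row
            pass

        def close(self, error):
            # release the connection here
            pass

    spark = SparkSession.builder.appName("continuous-example").getOrCreate()

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")   # made-up address
          .option("subscribe", "mytopic")                      # made-up topic
          .load())

    (df.writeStream
       .foreach(PropertyAwareWriter())
       .trigger(continuous="1 second")   # 1 sec checkpoint interval in continuous mode
       .start()
       .awaitTermination())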


To watch for changes to S3 files, you need an orchestrator integrated
with Spark. *So this is all event driven* - something like Airflow with a
file sensor. I am not sure what granularity you need, but you may be able
to use micro-batching as well, for example:


                     .foreachBatch(SendToBigQuery) \
                     .trigger(processingTime='2 seconds') \
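
If you go the orchestrator route instead, a minimal Airflow sketch of the
file-sensor idea could look like the following (the DAG name, bucket and
key are made up, and the S3KeySensor import path depends on your Airflow
version and the Amazon provider package):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

    with DAG("watch_s3_properties",
             start_date=datetime(2021, 3, 1),
             schedule_interval="@hourly",
             catchup=False) as dag:

        # wait until the properties file appears/changes in S3
        wait_for_change = S3KeySensor(
            task_id="wait_for_properties_file",
            bucket_name="my-config-bucket",     # made-up bucket
            bucket_key="app/properties.json",   # made-up key
            poke_interval=60,
        )

        # signal the running Spark application, e.g. by rewriting a flag file it polls
        notify_spark = BashOperator(
            task_id="notify_spark_app",
            bash_command="echo 'signal the Spark application here'",
        )

        wait_for_change >> notify_spark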

Whichever way you go, that class (or the foreachBatch function) within
Spark should be able to poll for changes and apply the correct logic where
necessary, along the lines of the sketch below.
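
A rough, untested sketch of that idea with foreachBatch follows; the
bucket, key, column and BigQuery table names are all made up, and the
BigQuery write assumes the spark-bigquery connector is available. The
function body runs on the driver once per micro-batch, so re-reading the
S3 properties file there means every batch picks up the latest setting
without restarting the query.

    import json

    import boto3

    def send_to_bigquery(batch_df, batch_id):
        # poll the S3 properties file at the start of every micro-batch (driver side)
        s3 = boto3.client("s3")
        obj = s3.get_object(Bucket="my-config-bucket", Key="app/properties.json")  # made-up location
        props = json.loads(obj["Body"].read())

        # switch the filter/transformation behaviour based on the external event
        if props.get("mode") == "failover":
            out = batch_df.filter("region = 'secondary'")   # made-up column/values
        else:
            out = batch_df.filter("region = 'primary'")

        # write the transformed batch to the external system
        # (the connector may need extra options, e.g. a temporary GCS bucket)
        (out.write
            .format("bigquery")
            .option("table", "mydataset.mytable")            # made-up table
            .mode("append")
            .save())

    # df is the streaming DataFrame built from the Kafka source earlier
    (df.writeStream
       .foreachBatch(send_to_bigquery)
       .trigger(processingTime='2 seconds')
       .option("checkpointLocation", "s3://my-bucket/checkpoints/")   # made-up path
       .start()
       .awaitTermination())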

HTH


   View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 27 Mar 2021 at 12:46, shahrajesh2006 <shahrajesh2...@gmail.com>
wrote:

> I am using Spark (Java) Structured Streaming in Continuous Trigger Mode,
> connecting to a Kafka broker. The use case is very simple: do some custom
> filter/transformation using a simple Java method and ingest the data into
> an external system. Kafka has 6 partitions, so the application is running
> 6 executors. I have a requirement to change the behavior of the
> filter/transformation logic each executor is doing based on an external
> event (for example a property change in an S3 file). This is towards the
> goal of building a highly resilient architecture where the Spark
> application runs in two cloud regions and reacts to an external event.
>
> What is the best way to send a signal to a running Spark application and
> propagate the same to each executor?
>
> The approach I have in mind is to create a new component which periodically
> refreshes the S3 file to look for the trigger event. This component will be
> integrated with the logic running on each Spark executor JVM.
>
> Please advise.
>
> Thanks and Regards
>
>
>
>
>
