Sorry, I don't have a diagram to share. Your understanding of how I am using the Spark application is right: it's a Kafka topic with 6 partitions, so Spark is able to create 6 parallel consumers/executors.
The thought of using Airflow is interesting; I will explore that option more. The other idea, using a ProcessingTime trigger (every 60 seconds) to build a new query that loads the data from the S3 file, and then using its results alongside the ContinuousTrigger query, is something I will try as well. Thanks again!
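For reference, here is a minimal sketch of the two-trigger idea described above: one continuous-mode query reading the 6-partition Kafka topic, plus a separate micro-batch query on a 60-second ProcessingTime trigger that re-reads the reference file from S3. The topic name, broker address, and S3 path are placeholders, not values from this thread. Note that continuous processing supports only map-like operations (no joins or aggregations), so combining the S3 data with streaming results would have to happen on the micro-batch side, e.g. inside foreachBatch.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object TwoTriggerSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("two-trigger-sketch").getOrCreate()

    // Continuous query: low-latency pass-through over the 6-partition topic.
    // With 6 Kafka partitions, Spark runs 6 parallel consumer tasks.
    val kafkaStream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder
      .option("subscribe", "events")                    // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING) AS value")     // map-like ops only

    kafkaStream.writeStream
      .format("console")
      .trigger(Trigger.Continuous("1 second")) // checkpoint interval, not batch size
      .start()

    // Micro-batch query fired every 60 seconds: each trigger re-reads the
    // reference file from S3 so later batch logic sees fresh side data.
    val refresher = spark.readStream
      .format("rate") // placeholder driver stream just to fire the trigger
      .load()

    refresher.writeStream
      .trigger(Trigger.ProcessingTime("60 seconds"))
      .foreachBatch { (_: DataFrame, _: Long) =>
        val ref = spark.read.json("s3a://bucket/reference.json") // hypothetical path
        ref.createOrReplaceTempView("reference")
      }
      .start()

    spark.streams.awaitAnyTermination()
  }
}
```

This is only a sketch under those assumptions; in practice the refresh could also be driven from outside Spark (e.g. the Airflow option mentioned above) rather than by a dummy rate stream.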