I am using Spark (Java) Structured Streaming in continuous trigger mode,
connecting to a Kafka broker. The use case is very simple: do some custom
filtering/transformation using a plain Java method and ingest the data into an
external system. The Kafka topic has 6 partitions, so the application is
running 6 executors. I h
Sorry, I don't have a diagram to share. Your understanding of how I am using
the Spark application is correct.
It's a Kafka topic with 6 partitions, so Spark is able to create 6 parallel
consumers/executors.
The thought of using Airflow is interesting; I will explore that option
further. The other idea of using
I tried to create a Dataset by loading a file and to pass it as an argument to
the Java method, as below:

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

Dataset<Row> propertiesFile;  // Dataset created by loading a JSON property file
Dataset<Row> streamingQuery;  // Dataset for the streaming query

streamingQuery.map(
        (MapFunction<Row, Row>) row -> myfunction(row, propertiesFile),
        Encoders.kryo(Row.class)); // encoder choice depends on myfunction's return type
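As far as I understand, this cannot work as written: a Dataset can't be referenced inside another Dataset's lambda, because the function is serialized to the executors, where no SparkSession exists. A rough sketch of what I may try instead, assuming the properties file is small: collect it to the driver as a plain serializable Map and capture that in the lambda (the key/value column names and myfunction's String-returning, Map-taking signature are my assumptions here):

import java.util.Map;
import java.util.stream.Collectors;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;

// Collect the small properties Dataset to the driver as a plain Map;
// a java.util.Map is serializable and safe to capture in the lambda.
Map<String, String> props = propertiesFile.collectAsList().stream()
        .collect(Collectors.toMap(
                r -> r.getString(r.fieldIndex("key")),     // assumed column name
                r -> r.getString(r.fieldIndex("value")))); // assumed column name

Dataset<String> transformed = streamingQuery.map(
        (MapFunction<Row, String>) row -> myfunction(row, props), // assumed signature
        Encoders.STRING());

This only works if the properties fit comfortably in driver memory, which in my case they should, since it is just a JSON property file.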