Hi Andra,
Thanks for the detail. So basically you are doing an ETL on the incoming
stream.
As I understand it, you have account, product and metric in your streaming data.
Is your data likely to be skewed (non-uniform) because a particular account
or product is over-represented? What key(s) are used in yo
Hi,
Sure!
Application:
- Spark version 2.4
- Kafka stream (DStream, from Kafka 0.8 brokers)
- 7 executors, 2 cores, 3700M memory size
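
For reference, the executor settings above could be written out roughly as below. This is only a sketch: the helper names `executor_conf` and `total_resources` are made up for illustration, and it assumes that "3700M memory size" means the per-executor heap (`spark.executor.memory`) and that each task uses one core (the default `spark.task.cpus`).

```python
# Hypothetical helpers, not part of Spark: they only format the settings
# listed above and compute the totals the cluster has to provide.

def executor_conf(instances=7, cores=2, memory="3700M"):
    """Return spark-submit --conf arguments for the executor settings."""
    settings = {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores),
        # Assumption: 3700M is the per-executor heap size.
        "spark.executor.memory": memory,
    }
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


def total_resources(instances=7, cores=2, memory_mb=3700):
    """Total cores (parallel task slots at one core per task) and heap."""
    return {"cores": instances * cores, "memory_mb": instances * memory_mb}


if __name__ == "__main__":
    print(" ".join(executor_conf()))
    print(total_resources())
```

Under those assumptions the job has 14 task slots in total, so a stage with more than 14 partitions runs in several waves.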
Logic:
- Process initialises a dataframe that contains metrics per
account/product (e.g. {"account": "A", "product": "X123", "metric": 51})
- After initiali