Hello,

I am building a data pipeline that uses Spark Structured Streaming with the Delta Lake project and runs on Kubernetes. Because of this, my output files are written only in Parquet format. Since I have been asked to use Prometheus and Grafana to build a dashboard for this pipeline, I run another small Spark job that converts the output into JSON so that I can feed it into Grafana. I can see that this step is redundant, but given the importance of the Delta Lake project, I cannot write my data directly as JSON. I therefore need some help/guidelines/opinions on how to move forward from here.
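For context, the conversion step is essentially a small batch job like the following sketch (the paths and app name are placeholders, not my real configuration):

# convert_to_json.py -- trimmed-down sketch of the follow-up batch job
# described above; paths and app name are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("parquet-to-json")
         .getOrCreate())

# Read the Parquet files produced by the streaming query's Delta sink.
df = spark.read.parquet("/data/output/parquet")

# Rewrite the same rows as line-delimited JSON for downstream ingestion.
(df.write
   .mode("overwrite")
   .json("/data/output/json"))

spark.stop()

It does nothing more than read the Parquet files the Delta sink produced and rewrite the same rows as line-delimited JSON, which is why the step feels redundant to me.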
I would appreciate it if Spark users could suggest some practices with respect to the following questions:

1. Since I cannot get direct JSON output from Spark Structured Streaming with the Delta sink, is there a better way to convert Parquet into JSON? Or should I keep only Parquet?
2. Will I need to write a custom exporter for Prometheus so that Grafana can read the time-series data? (A rough sketch of what I have in mind is in the P.S. below.)
3. Is there a better dashboard alternative to Grafana for this requirement?
4. Since the pipeline is going to run on Kubernetes, I am trying to avoid InfluxDB as the time-series database and to go with Prometheus instead. Is this approach correct?

Thanks,
Ani
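P.S. For question 2, here is a rough sketch of the kind of exporter I have in mind. The metric name, port, and value source are made-up placeholders, not working code from my pipeline; it just shows an HTTP endpoint that Prometheus could scrape and that Grafana could then query.

# exporter_sketch.py -- rough sketch for question 2; metric name, port,
# and value source are placeholders.
import random
import time

from prometheus_client import Gauge, start_http_server

# Hypothetical gauge; a real exporter would derive this from the
# pipeline's output (e.g. row counts per micro-batch).
ROWS_PROCESSED = Gauge(
    "pipeline_rows_processed",
    "Rows processed by the streaming pipeline (placeholder source)")

def read_latest_row_count():
    # Placeholder: a real implementation would read this from the
    # pipeline's output or the streaming query's progress metrics.
    return random.randint(0, 1000)

if __name__ == "__main__":
    start_http_server(9108)  # endpoint Prometheus would scrape
    while True:
        ROWS_PROCESSED.set(read_latest_row_count())
        time.sleep(15)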