Hello,

I am trying to build a data pipeline that uses Spark Structured Streaming
with the Delta Lake project and runs on Kubernetes. Because of this, my
output files are only in Parquet format. Since I am asked to use Prometheus
and Grafana to build the dashboard for this pipeline, I run another small
Spark job that converts the output into JSON so that I can feed it into
Grafana. I can see that this step is redundant, but given the importance of
the Delta Lake project, I cannot write my data directly as JSON. Therefore
I need some help/guidelines/opinions on moving forward from here.
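
To illustrate, the extra conversion job is essentially a batch read of the
Parquet output followed by a JSON write. A minimal sketch (the paths and
app name below are placeholders, not my real values):

    import org.apache.spark.sql.SparkSession

    object ParquetToJson {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("parquet-to-json")
          .getOrCreate()

        // Read the Parquet files written by the streaming job
        // ("/data/output" is a placeholder path).
        val df = spark.read.parquet("/data/output")

        // Rewrite the same data as JSON for the dashboard step.
        df.write.mode("overwrite").json("/data/output-json")

        spark.stop()
      }
    }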

I would appreciate it if Spark users could suggest some practices to
follow with respect to the following questions -

   1. Since I cannot get direct JSON output from Spark Structured
   Streaming, is there a better way to convert Parquet into JSON? Or should
   I keep only Parquet?
   2. Will I need to write a custom Prometheus exporter so that Grafana
   can read this time-series data? (A rough sketch of what I have in mind
   follows after this list.)
   3. Is there a better dashboard alternative to Grafana for this
   requirement?
   4. Since the pipeline is going to run on Kubernetes, I am trying to
   avoid InfluxDB as the time-series database and go with Prometheus
   instead. Is this approach correct?
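
On question 2, this is the kind of exporter I have in mind: a minimal
sketch using the Prometheus Java simpleclient (the metric name and port
are placeholders I made up for illustration):

    import io.prometheus.client.Gauge
    import io.prometheus.client.exporter.HTTPServer

    object PipelineExporter {
      // A gauge that Prometheus can scrape; the metric name is hypothetical.
      val rowsProcessed: Gauge = Gauge.build()
        .name("pipeline_rows_processed")
        .help("Rows processed by the pipeline")
        .register()

      def main(args: Array[String]): Unit = {
        // Expose /metrics on port 9091 for Prometheus to scrape.
        val server = new HTTPServer(9091)
        // ... update rowsProcessed from the pipeline's results here ...
      }
    }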

Thanks,
Ani