Hi Ani,

Prometheus is not well suited for ingesting explicit time-series data; its purpose is technical monitoring. If you want to monitor your Spark jobs with Prometheus, you can publish the metrics so that Prometheus can scrape them. What you are probably looking for is a time-series database that you can push metrics to.
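To illustrate what "publish the metrics so that Prometheus can scrape them" means in practice, here is a minimal sketch using only the Python standard library and the Prometheus text exposition format. The metric names, the `job` label, the `serve()` helper and port 8000 are all hypothetical. For a real Spark job you would more likely use an existing exporter or the official `prometheus_client` library rather than hand-rolling this.

```python
"""Minimal pull-style metrics endpoint in the Prometheus text exposition format.

Hypothetical example: metric names, the `job` label and port 8000 are
illustrative assumptions, not part of any existing Spark integration.
"""
import math
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metrics: name -> (type, help text, current value).
# A real job would update these values as micro-batches complete.
METRICS = {
    "pipeline_rows_written_total": ("counter", "Rows written by the streaming query.", 0),
    "pipeline_last_batch_duration_seconds": ("gauge", "Duration of the last micro-batch.", 0.0),
}
LABELS = {"job": "delta-pipeline"}


def _escape_label(value: str) -> str:
    # Label values must escape backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def _escape_help(text: str) -> str:
    # HELP text must escape backslash and newline.
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def _format_value(value: float) -> str:
    # The text format spells special floats as NaN, +Inf and -Inf.
    value = float(value)
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "+Inf" if value > 0 else "-Inf"
    return repr(value)


def render_metrics(metrics: dict, labels: dict) -> str:
    """Render metrics in the Prometheus text exposition format (version 0.0.4)."""
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{_escape_label(v)}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    lines = []
    for name, (metric_type, help_text, value) in metrics.items():
        lines.append(f"# HELP {name} {_escape_help(help_text)}")
        lines.append(f"# TYPE {name} {metric_type}")
        lines.append(f"{name}{label_str} {_format_value(value)}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current METRICS snapshot on GET /metrics."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(METRICS, LABELS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # Blocks forever; Prometheus would then be configured to scrape
    # http://<host>:<port>/metrics on its own schedule.
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` exposes only the current snapshot of each value; Prometheus pulls it on its scrape interval. That is why this model suits operational monitoring of the job itself, not loading explicit historical rows from the Delta output.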
Looking for an alternative to Grafana only makes sense if you find that Grafana is not well suited to your visualization use case. As said above, at a quick glance it sounds like you should look for an alternative to Prometheus instead. For time series, you could look at TimescaleDB or InfluxDB. Other databases, such as regular SQL databases or Cassandra, lack up/downsampling capabilities, which can lead to large query responses and force the client to post-process the data.

Kind regards,

On Wed, 26 Feb 2020 at 02:23, Aniruddha P Tekade <ateka...@binghamton.edu> wrote:
> Hello,
>
> I am trying to build a data pipeline that uses Spark Structured Streaming
> with the Delta project and runs on Kubernetes. Due to this, I get my output
> files only in Parquet format. Since I am asked to use Prometheus and Grafana
> for building the dashboard for this pipeline, I run another small Spark
> job and convert the output into JSON so that I would be able to insert it
> into Grafana. Although I can see that this step is redundant, considering
> the importance of the Delta Lake project, I cannot write my data directly
> into JSON. Therefore I need some help/guidelines/opinions about moving
> forward from here.
>
> I would appreciate it if the Spark user(s) could provide me some practices
> to follow with respect to the following questions:
>
> 1. Since I cannot have direct JSON output from Spark structured
>    streams, is there any better way to convert Parquet into JSON? Or should
>    I keep only Parquet?
> 2. Will I need to write some custom exporter for Prometheus so as to
>    make Grafana read those time-series data?
> 3. Is there any better dashboard alternative than Grafana for this
>    requirement?
> 4. Since the pipeline is going to run on Kubernetes, I am trying to
>    avoid InfluxDB as the time-series database and go with Prometheus. Is
>    this approach correct?
>
> Thanks,
> Ani

--
Roland Johann
Software Developer/Data Engineer

phenetic GmbH
Lütticher Straße 10, 50674 Köln, Germany

Mobile: +49 172 365 26 46
Mail: roland.joh...@phenetic.io
Web: phenetic.io

Commercial register: Amtsgericht Köln (HRB 92595)
Managing directors: Roland Johann, Uwe Reimann