Hi Roland,

Thank you for your reply; that's quite helpful. I think I should try InfluxDB then. But I am curious: in the case of Prometheus, would writing a custom exporter be a good choice and serve the purpose efficiently? Grafana is not something I want to drop.
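For reference, below is a minimal sketch of what such a custom exporter could look like, using only the Python standard library to serve metrics in the Prometheus text exposition format over HTTP. The metric name `pipeline_rows_processed_total`, the port, and the `read_current_value()` stub are illustrative placeholders, not part of any real pipeline:

```python
# Minimal custom Prometheus exporter sketch (standard library only).
# It exposes a /metrics endpoint in the Prometheus text exposition
# format so a Prometheus server can scrape it.
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_current_value():
    # Placeholder: a real exporter would read the latest value from the
    # Spark job's output (e.g. a file, state store, or database).
    return 42.0


def render_metrics(value):
    # Prometheus text exposition format: HELP/TYPE comments plus a sample.
    return (
        "# HELP pipeline_rows_processed_total Rows processed by the job.\n"
        "# TYPE pipeline_rows_processed_total counter\n"
        f"pipeline_rows_processed_total {value}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(read_current_value()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 9100 is just an example; pick any free port and point a
    # Prometheus scrape_config at it.
    HTTPServer(("", 9100), MetricsHandler).serve_forever()
```

That said, this only makes sense for current gauge/counter-style values being scraped on an interval, which matches your point that Prometheus is a pull-based monitoring system rather than a store you push historical time series into.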
Best,
Aniruddha

On Tue, Feb 25, 2020 at 11:36 PM Roland Johann <roland.joh...@phenetic.io> wrote:

> Hi Ani,
>
> Prometheus is not well suited for ingesting explicit time-series data. Its
> purpose is technical monitoring. If you want to monitor your Spark jobs
> with Prometheus, you can publish the metrics so Prometheus can scrape them.
> What you are probably looking for is a time-series database that you can
> push metrics to.
>
> Look for an alternative to Grafana only if you find that Grafana is not
> well suited for the visualization side of your use case.
>
> As said earlier, at a quick glance it sounds like you should look for an
> alternative to Prometheus.
>
> For time series you can look at TimescaleDB or InfluxDB. Other databases,
> such as ordinary SQL databases or Cassandra, lack up/downsampling
> capabilities, which can lead to large query responses and force the client
> to post-process.
>
> Kind regards,
>
> Aniruddha P Tekade <ateka...@binghamton.edu> wrote on Wed, Feb 26, 2020
> at 02:23:
>
>> Hello,
>>
>> I am trying to build a data pipeline that uses Spark Structured Streaming
>> with the Delta project and runs on Kubernetes. Because of this, I get my
>> output files only in Parquet format. Since I am asked to use Prometheus
>> and Grafana to build the dashboard for this pipeline, I run another small
>> Spark job that converts the output to JSON so that I can feed it to
>> Grafana. I can see that this step is redundant, but given the importance
>> of the Delta Lake project, I cannot write my data directly to JSON.
>> Therefore I need some help/guidelines/opinions about moving forward from
>> here.
>>
>> I would appreciate it if Spark users could suggest some practices to
>> follow with respect to the following questions:
>>
>> 1. Since I cannot get direct JSON output from Spark structured streams,
>> is there any better way to convert Parquet to JSON? Or should I keep
>> only Parquet?
>> 2. Will I need to write a custom exporter for Prometheus so that Grafana
>> can read the time-series data?
>> 3. Is there any better dashboard alternative to Grafana for this
>> requirement?
>> 4. Since the pipeline is going to run on Kubernetes, I am trying to
>> avoid InfluxDB as the time-series database and go with Prometheus. Is
>> this approach correct?
>>
>> Thanks,
>> Ani
>>
> --
> Roland Johann
> Software Developer/Data Engineer
>
> phenetic GmbH
> Lütticher Straße 10, 50674 Köln, Germany
>
> Mobil: +49 172 365 26 46
> Mail: roland.joh...@phenetic.io
> Web: phenetic.io
>
> Handelsregister: Amtsgericht Köln (HRB 92595)
> Geschäftsführer: Roland Johann, Uwe Reimann