date:20210429

Re: Avoiding OutOfMemoryError for large batch-jobs

2021-04-29 Thread Thomas Fredriksen(External)

No, we are not seeing anything specific about request timeouts in the logs. Typically, the only thing we do see in the logs is the following: 21/04/21 21:44:52 ERROR ScalaDriverLocal: User Code Stack Trace: java.lang.RuntimeException: org.apache.spark.SparkException: Job aborted due to stage

Re: Exporting beam custom metrics to Prometheus

2021-04-29 Thread Yu Zhang

Hi Feba, I did a POC based on Beam’s doc 10.5 [1] and the Flink metrics reporter [2]. You can add the metrics reporter configuration to flink-conf.yaml so that metrics can be scraped from the port you configured. Be sure to set disable_metrics [3] to false. [1] https://beam.apache.org/documentation/prog

Question on printing out a PCollection

2021-04-29 Thread Tao Li

Hi Beam community, The notebook console from Google Cloud defines a show() API to display a PCollection, which is very neat: https://cloud.google.com/dataflow/docs/guides/interactive-pipeline-development If we are using a regular Jupyter notebook to run a Beam app, how can we print out a PCollect