Re: Classloader memory leak on job restart (FlinkRunner)

2019-01-17 Thread Maximilian Michels
I'm glad that solved your GC problem. I think dipose() is a good place, it is meant for cleanup. In your case the DoFn is a NOOP, so the PipelineOptions are probably loaded through your UnboundedSource. If both happen to be scheduled in the same TaskManager that is fine. However, just for prec

Re: Classloader memory leak on job restart (FlinkRunner)

2019-01-17 Thread Maximilian Michels
Hi Daniel, hi Juan, @Daniel Thanks a lot for investigating and reporting the issue. Your analysis looks convincing, it may be that Jackson is holding on to the Classloader. Beam uses Jackson to parse the FlinkPipelineOptions. Have you already tried to call TypeFactory.defaultInstance().clearC

Re: Spark

2019-01-17 Thread Juan Carlos Garcia
Hi Matt, during the time we were using Spark with Beam, the solution was always to pack the jar and use the spark-submit command pointing to your main class which will do `pipeline.run`. The spark-submit command have a flag to decide how to run it (--deploy-mode), whether to launch the job on the

Spark

2019-01-17 Thread Matt Casters
Dear Beam friends, Now that I've got cool data integration (Kettle-beam) scenarios running on DataFlow with sample data sets in Google (Files, Pub/Sub, BigQuery, Streaming, Windowing, ...) I thought it was time to also give Apache Spark some attention. The thing I have some trouble with it figuri

Re: KafkaIO not commiting offsets when using withMaxReadTime

2019-01-17 Thread Jozef Vilcek
Hello, was there any progress on this or JIRA I can follow? I could use bounded processing over KafkaIO too. Thanks, Jozef On Thu, Jan 10, 2019 at 4:57 PM Alexey Romanenko wrote: > Don’t you think that we could have some race condition there since, > according to initial issue description, some