I'm glad that solved your GC problem. I think dispose() is a good place; it is
meant for cleanup.
In your case the DoFn is a NOOP, so the PipelineOptions are probably loaded
through your UnboundedSource. If both happen to be scheduled in the same
TaskManager, that is fine. However, just for prec
Hi Daniel, hi Juan,
@Daniel Thanks a lot for investigating and reporting the issue.
Your analysis looks convincing; it may be that Jackson is holding on to the
ClassLoader. Beam uses Jackson to parse the FlinkPipelineOptions.
Have you already tried to call TypeFactory.defaultInstance().clearCache()?
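For illustration, a minimal sketch of where such a call could live, assuming the leak really comes from Jackson's TypeFactory cache; the DoFn below and its placement are only an example, not something from the original thread:

    import com.fasterxml.jackson.databind.type.TypeFactory;
    import org.apache.beam.sdk.transforms.DoFn;

    // Example DoFn: passes elements through unchanged, but drops Jackson's
    // cached type references when the runner tears the instance down.
    public class PassThroughFn<T> extends DoFn<T, T> {

      @ProcessElement
      public void processElement(ProcessContext c) {
        c.output(c.element());
      }

      // On the Flink runner, teardown typically runs when the operator is
      // disposed, so this is a natural place for ClassLoader-related cleanup.
      @Teardown
      public void teardown() {
        TypeFactory.defaultInstance().clearCache();
      }
    }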
Hi Matt, during the time we were using Spark with Beam, the solution was
always to package the jar and use the spark-submit command pointing to your
main class, which will do `pipeline.run`.
The spark-submit command has a flag (--deploy-mode) to decide how to run it:
whether to launch the driver on the client machine or inside the cluster.
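To make that concrete, a rough sketch of what such a main class could look like; the class name and I/O paths are placeholders, not from the original mail:

    import org.apache.beam.runners.spark.SparkPipelineOptions;
    import org.apache.beam.runners.spark.SparkRunner;
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;

    // Bundled into a fat jar and launched with something like:
    //   spark-submit --class SparkMain --deploy-mode cluster my-bundled-pipeline.jar
    public class SparkMain {
      public static void main(String[] args) {
        SparkPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(SparkPipelineOptions.class);
        options.setRunner(SparkRunner.class);

        Pipeline pipeline = Pipeline.create(options);
        pipeline
            .apply(TextIO.read().from("hdfs:///tmp/input/*"))       // placeholder input
            .apply(TextIO.write().to("hdfs:///tmp/output/part"));   // placeholder output

        // spark-submit hands control to this main method; run() submits the
        // translated pipeline to the Spark cluster.
        pipeline.run().waitUntilFinish();
      }
    }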
Dear Beam friends,
Now that I've got cool data integration (Kettle-beam) scenarios running on
DataFlow with sample data sets in Google (Files, Pub/Sub, BigQuery,
Streaming, Windowing, ...) I thought it was time to also give Apache Spark
some attention.
The thing I have some trouble with is figuring
Hello,
Was there any progress on this, or a JIRA I can follow? I could use bounded
processing over KafkaIO too.
Thanks,
Jozef
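For reference, one way to approximate bounded reads today is KafkaIO's withMaxNumRecords or withMaxReadTime settings; a minimal sketch, with placeholder brokers and topic that are not from the original thread:

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.kafka.KafkaIO;
    import org.apache.beam.sdk.io.kafka.KafkaRecord;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class BoundedKafkaReadExample {
      public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        // withMaxNumRecords (or withMaxReadTime) caps the read, so the
        // otherwise unbounded Kafka source behaves as a bounded one.
        PCollection<KafkaRecord<String, String>> records =
            p.apply(
                KafkaIO.<String, String>read()
                    .withBootstrapServers("localhost:9092")         // placeholder
                    .withTopic("my-topic")                          // placeholder
                    .withKeyDeserializer(StringDeserializer.class)
                    .withValueDeserializer(StringDeserializer.class)
                    .withMaxNumRecords(100_000));

        p.run().waitUntilFinish();
      }
    }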
On Thu, Jan 10, 2019 at 4:57 PM Alexey Romanenko
wrote:
> Don’t you think that we could have some race condition there since,
> according to the initial issue description, some