Re: PCollectionViews$SimplePCollectionView.hashCode allocates memory

2018-10-31 Thread Vojtech Janota
> Ismaël > On Wed, Oct 31, 2018 at 2:18 PM Vojtech Janota wrote: > Hi, > I'm currently profiling memory consumption of our Beam pipeline and have noticed that org.apache.beam.sdk.values.PCollectionViews$SimplePC

PCollectionViews$SimplePCollectionView.hashCode allocates memory

2018-10-31 Thread Vojtech Janota
Hi, I'm currently profiling memory consumption of our Beam pipeline and have noticed that org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode() makes noticeable heap allocations. The implementation is: return Objects.hash(tag); That itself translates to: return Arrays.
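To make the allocation concrete: `Objects.hash` is a varargs method, so `Objects.hash(tag)` wraps `tag` in a fresh `Object[]` on every call before delegating to `Arrays.hashCode`. The sketch below (the class and method names are hypothetical, not Beam code) contrasts that with `Objects.hashCode(tag)`, which takes a single argument and needs no array. Because `Arrays.hashCode` over a one-element array evaluates to `31 + hashCode(element)`, the expression `31 + Objects.hashCode(tag)` reproduces the exact value of `Objects.hash(tag)`, in case existing hash values must stay stable. Whether the JIT eliminates the array through escape analysis depends on the JVM and call site, so the profiler remains the final judge.

```java
import java.util.Objects;

public class TagHashDemo {
    // Allocating form: Objects.hash is varargs, so each call builds a new
    // Object[]{tag} and then runs Arrays.hashCode over it.
    static int allocatingHash(Object tag) {
        return Objects.hash(tag);
    }

    // Non-allocating form with the same result: Arrays.hashCode starts at 1 and
    // computes 31 * 1 + (element == null ? 0 : element.hashCode()).
    static int nonAllocatingHash(Object tag) {
        return 31 + Objects.hashCode(tag);
    }

    // Simplest alternative, if matching the old hash values is not required.
    static int plainHash(Object tag) {
        return Objects.hashCode(tag);
    }

    public static void main(String[] args) {
        Object tag = "side-input-tag";
        System.out.println(allocatingHash(tag) == nonAllocatingHash(tag));   // true
        System.out.println(allocatingHash(null) == nonAllocatingHash(null)); // true
    }
}
```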

[BEAM-960] Backoff in the DirectRunner if no work is available

2018-08-30 Thread Vojtech Janota
Hi beamers, I would like to contribute a fix for the following issue: - https://issues.apache.org/jira/browse/BEAM-690 The corresponding PR: - https://github.com/apache/beam/pull/6303 I tried to follow the approach suggested in the comments of the said ticket and any feedback is appreciated
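The subject line refers to backing off in the DirectRunner when no work is available. As a rough illustration only, a capped exponential backoff for an idle polling loop might look like the sketch below. `IdleBackoff`, its parameters, and the `tryDoWork` callback are hypothetical names; this is not the code in the linked PR.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper: capped exponential backoff for an idle polling loop. */
public class IdleBackoff {
    private final long minSleepMillis;
    private final long maxSleepMillis;
    private long currentMillis;

    public IdleBackoff(long minSleepMillis, long maxSleepMillis) {
        this.minSleepMillis = minSleepMillis;
        this.maxSleepMillis = maxSleepMillis;
        this.currentMillis = minSleepMillis;
    }

    /** Returns the delay for this empty poll, then doubles it up to the cap. */
    public long nextDelayMillis() {
        long delay = currentMillis;
        currentMillis = Math.min(currentMillis * 2, maxSleepMillis);
        return delay;
    }

    /** Called when work is found, so the next idle period starts from the minimum. */
    public void reset() {
        currentMillis = minSleepMillis;
    }

    /** One scheduling step: returns true if work was done, otherwise sleeps and returns false. */
    public boolean step(BooleanSupplier tryDoWork) throws InterruptedException {
        if (tryDoWork.getAsBoolean()) {
            reset();
            return true;
        }
        Thread.sleep(nextDelayMillis());
        return false;
    }

    public static void main(String[] args) {
        IdleBackoff backoff = new IdleBackoff(1, 64);
        StringBuilder delays = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            delays.append(backoff.nextDelayMillis()).append(' ');
        }
        System.out.println(delays.toString().trim()); // 1 2 4 8 16 32 64 64
    }
}
```

Resetting to the minimum delay as soon as work appears keeps latency low once elements arrive again, while the cap bounds how long an idle loop can sleep between polls.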

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Vojtech Janota
wait Eugene's feedback. > I remember we had some performance regression on the direct runner identified thanks to Nexmark, but it has been addressed by reverting a change. > Good catch anyway!

Re: Performance issue in Beam 2.4 onwards

2018-07-09 Thread Vojtech Janota
't currently have such a thing. > In this case, using coders to clone values is more correct. In a distributed environment, using encode/decode is the only way to copy values, and the DirectRunner is trying to ensure that your code is correct in a distributed environment
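The quoted reply explains why the DirectRunner copies values by encoding and decoding them. The sketch below illustrates the idea with plain Java I/O only: `SimpleCoder`, `StringListCoder` and `cloneViaCoder` are hypothetical stand-ins loosely modeled on a coder's encode/decode pair, not Beam APIs. Because the clone is rebuilt from bytes, it shares no state with the original, which is the guarantee a distributed runner provides when it moves elements between workers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoderCloneDemo {
    /** Hypothetical minimal coder: turns a value into bytes and back. */
    interface SimpleCoder<T> {
        void encode(T value, OutputStream out) throws IOException;
        T decode(InputStream in) throws IOException;
    }

    /** Encodes a List<String> as a size prefix followed by each string in modified UTF-8. */
    static final class StringListCoder implements SimpleCoder<List<String>> {
        @Override
        public void encode(List<String> value, OutputStream out) throws IOException {
            DataOutputStream data = new DataOutputStream(out);
            data.writeInt(value.size());
            for (String s : value) {
                data.writeUTF(s);
            }
            data.flush();
        }

        @Override
        public List<String> decode(InputStream in) throws IOException {
            DataInputStream data = new DataInputStream(in);
            int size = data.readInt();
            List<String> result = new ArrayList<>(size);
            for (int i = 0; i < size; i++) {
                result.add(data.readUTF());
            }
            return result;
        }
    }

    /** Copies a value by a full encode/decode round trip through its coder. */
    static <T> T cloneViaCoder(SimpleCoder<T> coder, T value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        coder.encode(value, bytes);
        return coder.decode(new ByteArrayInputStream(bytes.toByteArray()));
    }

    public static void main(String[] args) throws IOException {
        List<String> original = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> copy = cloneViaCoder(new StringListCoder(), original);
        copy.add("c"); // mutating the copy leaves the original untouched
        System.out.println(original + " " + copy); // [a, b] [a, b, c]
    }
}
```

Each copy pays for a full serialization and deserialization, so the cost grows with element size and with how often values are cloned.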

Performance issue in Beam 2.4 onwards

2018-07-09 Thread Vojtech Janota
Hi, We have been using Apache Beam in our project for some time now. Since our datasets are of modest size, we have so far used the DirectRunner, as the computation easily fits onto a single machine. Recently we upgraded Beam from 2.2 to 2.4 and found that the performance of our pipelines drastically deteriorated