Ismaël
> On Wed, Oct 31, 2018 at 2:18 PM Vojtech Janota
> wrote:
> >
> > Hi,
> >
> > I'm currently profiling memory consumption of our Beam pipeline and have
> noticed that
> >
> >
> org.apache.beam.sdk.values.PCollectionViews$SimplePC
Hi,
I'm currently profiling memory consumption of our Beam pipeline and have
noticed that
org.apache.beam.sdk.values.PCollectionViews$SimplePCollectionView.hashCode()
makes noticeable heap allocations. The implementation is:
return Objects.hash(tag);
That itself translates to:
return Arrays.
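
For readers following the allocation argument, here is a minimal sketch of why a single-argument `Objects.hash(...)` call can allocate. It is not Beam's code: the class name, method names, and the tag value are hypothetical. It relies only on the documented JDK behaviour that `Objects.hash(Object...)` delegates to `Arrays.hashCode(Object[])`, so the varargs call wraps its argument in a fresh array. Whether the JIT removes that array through escape analysis depends on the JVM and the call site.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical illustration class; not part of the Beam SDK.
class HashAllocationSketch {
    // Varargs call: the compiler wraps `tag` in a new Object[] on every call.
    static int viaVarargs(Object tag) {
        return Objects.hash(tag);
    }

    // What the varargs call amounts to, written out explicitly.
    static int viaExplicitArray(Object tag) {
        return Arrays.hashCode(new Object[] {tag});
    }

    // Single-argument alternative with no array. Note that it returns a
    // different value (it lacks the 31 * 1 seed term from Arrays.hashCode).
    static int viaSingleArg(Object tag) {
        return Objects.hashCode(tag);
    }

    public static void main(String[] args) {
        Object tag = "hypothetical-tag";
        System.out.println(viaVarargs(tag) == viaExplicitArray(tag));
        System.out.println(viaVarargs(tag) == 31 + viaSingleArg(tag));
    }
}
```

Since the two forms return different values, swapping one for the other changes the hash itself, which is harmless only if nothing persists or compares the old values.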
Hi beamers,
I would like to contribute a fix for the following issue:
- https://issues.apache.org/jira/browse/BEAM-690
The corresponding PR:
- https://github.com/apache/beam/pull/6303
I tried to follow the approach suggested in the comments of that ticket
and any feedback is appreciate
wait Eugene's feedback.
>> >
>> > I remember we had some performance regression on the direct runner
>> > identified thanks to Nexmark, but it has been addressed by
>> reverting a
>> > change.
>> >
>> > Good catch anyway !
't
> currently have such a thing.
>
> In this case, using coders to clone values is more correct. In a
> distributed environment using encode/decode is the only way to copy values,
> and the DirectRunner is trying to ensure that your code is correct in a
> distributed environment
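
To illustrate the encode/decode point, here is a rough sketch of copying a value by round-tripping it through bytes. It uses plain Java serialization as a stand-in for a Beam coder, which is an assumption made to keep the example self-contained; the class and method names are hypothetical, and this is not how the DirectRunner is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical illustration class; Java serialization stands in for a coder.
class EncodeDecodeCloneSketch {
    // Copies a value by encoding it to bytes and decoding a new instance,
    // which is the only kind of copy available between separate workers.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T cloneByEncoding(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("value could not be round-tripped", e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(Arrays.asList("a", "b"));
        ArrayList<String> copy = cloneByEncoding(original);
        copy.add("c"); // mutating the copy must not affect the original
        System.out.println(original.size());
        System.out.println(copy.size());
    }
}
```

A copy made this way shares no state with the original, so code that mutates an element after emitting it, or that depends on state the encoding does not capture, fails on a single machine instead of only after deployment to a cluster. The price is the encode/decode work per element.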
Hi,
We have been using Apache Beam in our project for some time now. Since our
datasets are of modest size, we have so far used the DirectRunner, as the
computation easily fits onto a single machine. Recently we upgraded Beam
from 2.2 to 2.4 and found that the performance of our pipelines drastically
deterio