+1 to ban Streams usage.

 

> 9 сент. 2021 г., в 02:59, Valentin Kulichenko <valentin.kuliche...@gmail.com> 
> написал(а):
> 
> Pavel,
> 
> Quite frankly, I think we used to lean into performance too much. We
> generally preferred it over data consistency, project modularity and code
> readability. Performance, of course, plays a very important rule in Ignite,
> but it's possible to overdo anything.
> 
> There are certainly parts of the project that can benefit from features
> like Stream API, without significant concern over performance. CLI is an
> obvious example, but I'm sure it's not the only one.
> 
> That said, I don't think that banning something is productive. At the same
> time, we should make sure we pay more attention to performance during
> reviews. Maybe we should have a checklist of major things to look for? Not
> as a set of strict rules, but more as a guideline for contributors and
> committers.
> 
> We might also want to start benchmarking the code and tracking the progress.
> 
> -Val
> 
> On Wed, Sep 8, 2021 at 12:09 PM Pavel Tupitsyn <ptupit...@apache.org> wrote:
> 
>> Alexander, Ivan,
>> 
>>> not very productive to assume that 100% of your code is
>>> on the hot path
>> 
>> That is exactly what we should be doing.
>> When I joined Ignite community 6 years ago, this was the prevalent mindset.
>> 
>> I'm not sure which part of Ignite can be considered "not on a hot path".
>> Create/alter table (mentioned above) should perform well too.
>> 
>>> measured first and only then optimized
>>> https://wiki.c2.com/?OptimizeLater
>> 
>> Extra allocations are a "death by a thousand cuts".
>> They add up here and there, and then there are GC pauses.
>> This would be hard to "optimize later".
>> 
>> It is common for perf-oriented projects to avoid some techniques.
>> For example, LINQ (streams analog from C# with similar perf issues) is
>> avoided in libraries and compilers [1].
>> 
>> [1] https://github.com/dotnet/roslyn/blob/main/CONTRIBUTING.md
>> 
>> On Wed, Sep 8, 2021 at 9:49 PM Valentin Kulichenko <
>> valentin.kuliche...@gmail.com> wrote:
>> 
>>> I don't think we should ban anything. Streams API is just a tool in the
>>> toolbox - it should be used appropriately. It's up to the contributor and
>>> reviewer(s) to identify whether a particular usage might cause
>> performance
>>> issues.
>>> 
>>> -Val
>>> 
>>> On Wed, Sep 8, 2021 at 8:01 AM Alexander Polovtcev <
>>> alexpolovt...@gmail.com>
>>> wrote:
>>> 
>>>> -1
>>>> I think that it is not very productive to assume that 100% of your code
>>> is
>>>> on the hot path, it would be impossible to write and maintain. Humans
>> are
>>>> not very good at guessing where the performance bottlenecks are, so the
>>>> performance of the possible hot paths should be measured first and only
>>>> then optimized and documented.
>>>> 
>>>> On Wed, Sep 8, 2021 at 5:53 PM Ivan Pavlukhin <vololo...@gmail.com>
>>> wrote:
>>>> 
>>>>> Does not this trivial strategy work for us?
>>>>> https://wiki.c2.com/?OptimizeLater
>>>>> 
>>>>> 2021-09-08 13:52 GMT+03:00, Andrey Gura <ag...@apache.org>:
>>>>>> Agree that any additional object creation on a hot path could be a
>>>>>> problem. So hot path should not contain stream API and any other
>>>>>> potentially problem code (e.g. iterator instead of for by index).
>>>>>> 
>>>>>> On Wed, Sep 8, 2021 at 1:45 PM Pavel Tupitsyn <
>> ptupit...@apache.org>
>>>>> wrote:
>>>>>>> 
>>>>>>> Ok, maybe a total ban is overkill, but right now streams are used
>>> even
>>>>> on
>>>>>>> some hot paths like getAllAsync [1].
>>>>>>> 
>>>>>>> Another issue is that Collectors.toList() and other variants don't
>>>>> accept
>>>>>>> capacity, and we end up with unnecessary reallocations of
>> underlying
>>>>>>> arrays.
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://github.com/apache/ignite-3/blob/1d7d703ff2b18234b15a9a7b03104fbb73388edf/modules/table/src/main/java/org/apache/ignite/internal/table/KVBinaryViewImpl.java#L83
>>>>>>> 
>>>>>>> On Wed, Sep 8, 2021 at 1:06 PM Konstantin Orlov <
>>> kor...@gridgain.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi!
>>>>>>>> 
>>>>>>>> Agree with Ivan that it’s an overkill. Code readability and
>>>>>>>> maintainability should have
>>>>>>>> the same priority as the performance (with some exceptions of
>>>> course).
>>>>>>>> 
>>>>>>>> BTW the result of your benchmark looks quite strange. The
>>>> performance
>>>>>>>> penalty on
>>>>>>>> my laptop (Core i7 9750H, 6 cores up to 4.50 GHz) is 25%, not 8
>>>> times:
>>>>>>>> 
>>>>>>>> Benchmark                         Mode  Cnt      Score     Error
>>>>>>>> Units
>>>>>>>> JmhIncrementBenchmark.loopSum    thrpt   10  32347.819 ± 676.548
>>>>>>>> ops/ms
>>>>>>>> JmhIncrementBenchmark.streamSum  thrpt   10  24459.196 ± 610.152
>>>>>>>> ops/ms
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Konstantin Orlov
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 8 Sep 2021, at 12:23, Ivan Bessonov <bessonov...@gmail.com
>>> 
>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Hello Igniters,
>>>>>>>>> 
>>>>>>>>> I object, banning streams is an overkill. I would argue that
>>> most
>>>> of
>>>>>>>>> the
>>>>>>>>> code
>>>>>>>>> is not on hot paths and that allocations in TLAB don't create
>>> much
>>>>>>>> pressure
>>>>>>>>> on GC.
>>>>>>>>> 
>>>>>>>>> Streams must be used cautiously, developers should know
>> whether
>>>> they
>>>>>>>>> write hot methods or not. And if methods are not hot, code
>>>>> simplicity
>>>>>>>> must
>>>>>>>>> be
>>>>>>>>> the first priority. I don't want Ignite 3 code to look like
>>>> Ignite 2
>>>>>>>> code,
>>>>>>>>> where
>>>>>>>>> people would iterate over Lists using explicit access by
>>> indexes,
>>>>>>>> because it
>>>>>>>>> saves them a single Iterator allocation. That's absurd.
>>>>>>>>> 
>>>>>>>>> ср, 8 сент. 2021 г. в 11:43, Pavel Tupitsyn <
>>> ptupit...@apache.org
>>>>> :
>>>>>>>>> 
>>>>>>>>>> Igniters,
>>>>>>>>>> 
>>>>>>>>>> Java streams are known to be slower and cause more GC
>> pressure
>>>> than
>>>>>>>>>> an
>>>>>>>>>> equivalent loop.
>>>>>>>>>> Below is a simple filter/map/reduce scenario (code [1]):
>>>>>>>>>> 
>>>>>>>>>> * Benchmark
>>>>> Mode
>>>>>>>> Cnt
>>>>>>>>>>   Score     Error   Units
>>>>>>>>>> 
>>>>>>>>>> * StreamVsLoopBenchmark.loopSum
>>>>>>>>>> thrpt
>>>>>>>>  3
>>>>>>>>>> 7987.016 ± 293.013  ops/ms
>>>>>>>>>> * StreamVsLoopBenchmark.loopSum:·gc.alloc.rate
>>>>>>>>>> thrpt
>>>>>>>>  3
>>>>>>>>>>  ≈ 10⁻⁴            MB/sec
>>>>>>>>>> * StreamVsLoopBenchmark.loopSum:·gc.count
>>>>>>>>>> thrpt
>>>>>>>>  3
>>>>>>>>>>     ≈ 0            counts
>>>>>>>>>> 
>>>>>>>>>> * StreamVsLoopBenchmark.streamSum
>>>>>>>>>> thrpt
>>>>>>>>  3
>>>>>>>>>> 1060.244 ±  36.485  ops/ms
>>>>>>>>>> * StreamVsLoopBenchmark.streamSum:·gc.alloc.rate
>>>>>>>>>> thrpt
>>>>>>>>  3
>>>>>>>>>> 315.819 ±  10.844  MB/sec
>>>>>>>>>> * StreamVsLoopBenchmark.streamSum:·gc.count
>>>>>>>>>> thrpt
>>>>>>>>  3
>>>>>>>>>>  55.000            counts
>>>>>>>>>> 
>>>>>>>>>> Loop is several times faster and does not allocate at all.
>>>>>>>>>> 
>>>>>>>>>> 1. Performance is one of the most important features of our
>>>>> product.
>>>>>>>>>> 2. Most of our APIs will be on the hot path.
>>>>>>>>>> 
>>>>>>>>>> One can argue about performance differences in real-world
>>>>> scenarios,
>>>>>>>>>> but increasing GC pressure just to make the code a little bit
>>>> nicer
>>>>>>>>>> is
>>>>>>>>>> unacceptable.
>>>>>>>>>> 
>>>>>>>>>> I propose to ban streams usage in the codebase (except for
>> the
>>>>>>>>>> tests).
>>>>>>>>>> 
>>>>>>>>>> Thoughts, objections?
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> 
>>>> https://gist.github.com/ptupitsyn/5934bbbf8f92ac4937e534af9386da97
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Sincerely yours,
>>>>>>>>> Ivan Bessonov
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Best regards,
>>>>> Ivan Pavlukhin
>>>>> 
>>>> 
>>>> 
>>>> --
>>>> With regards,
>>>> Aleksandr Polovtcev
>>>> 
>>> 
>> 

Reply via email to