+1 to ban Streams usage.
> 9 сент. 2021 г., в 02:59, Valentin Kulichenko <valentin.kuliche...@gmail.com> > написал(а): > > Pavel, > > Quite frankly, I think we used to lean into performance too much. We > generally preferred it over data consistency, project modularity and code > readability. Performance, of course, plays a very important rule in Ignite, > but it's possible to overdo anything. > > There are certainly parts of the project that can benefit from features > like Stream API, without significant concern over performance. CLI is an > obvious example, but I'm sure it's not the only one. > > That said, I don't think that banning something is productive. At the same > time, we should make sure we pay more attention to performance during > reviews. Maybe we should have a checklist of major things to look for? Not > as a set of strict rules, but more as a guideline for contributors and > committers. > > We might also want to start benchmarking the code and tracking the progress. > > -Val > > On Wed, Sep 8, 2021 at 12:09 PM Pavel Tupitsyn <ptupit...@apache.org> wrote: > >> Alexander, Ivan, >> >>> not very productive to assume that 100% of your code is >>> on the hot path >> >> That is exactly what we should be doing. >> When I joined Ignite community 6 years ago, this was the prevalent mindset. >> >> I'm not sure which part of Ignite can be considered "not on a hot path". >> Create/alter table (mentioned above) should perform well too. >> >>> measured first and only then optimized >>> https://wiki.c2.com/?OptimizeLater >> >> Extra allocations are a "death by a thousand cuts". >> They add up here and there, and then there are GC pauses. >> This would be hard to "optimize later". >> >> It is common for perf-oriented projects to avoid some techniques. >> For example, LINQ (streams analog from C# with similar perf issues) is >> avoided in libraries and compilers [1]. >> >> [1] https://github.com/dotnet/roslyn/blob/main/CONTRIBUTING.md >> >> On Wed, Sep 8, 2021 at 9:49 PM Valentin Kulichenko < >> valentin.kuliche...@gmail.com> wrote: >> >>> I don't think we should ban anything. Streams API is just a tool in the >>> toolbox - it should be used appropriately. It's up to the contributor and >>> reviewer(s) to identify whether a particular usage might cause >> performance >>> issues. >>> >>> -Val >>> >>> On Wed, Sep 8, 2021 at 8:01 AM Alexander Polovtcev < >>> alexpolovt...@gmail.com> >>> wrote: >>> >>>> -1 >>>> I think that it is not very productive to assume that 100% of your code >>> is >>>> on the hot path, it would be impossible to write and maintain. Humans >> are >>>> not very good at guessing where the performance bottlenecks are, so the >>>> performance of the possible hot paths should be measured first and only >>>> then optimized and documented. >>>> >>>> On Wed, Sep 8, 2021 at 5:53 PM Ivan Pavlukhin <vololo...@gmail.com> >>> wrote: >>>> >>>>> Does not this trivial strategy work for us? >>>>> https://wiki.c2.com/?OptimizeLater >>>>> >>>>> 2021-09-08 13:52 GMT+03:00, Andrey Gura <ag...@apache.org>: >>>>>> Agree that any additional object creation on a hot path could be a >>>>>> problem. So hot path should not contain stream API and any other >>>>>> potentially problem code (e.g. iterator instead of for by index). >>>>>> >>>>>> On Wed, Sep 8, 2021 at 1:45 PM Pavel Tupitsyn < >> ptupit...@apache.org> >>>>> wrote: >>>>>>> >>>>>>> Ok, maybe a total ban is overkill, but right now streams are used >>> even >>>>> on >>>>>>> some hot paths like getAllAsync [1]. >>>>>>> >>>>>>> Another issue is that Collectors.toList() and other variants don't >>>>> accept >>>>>>> capacity, and we end up with unnecessary reallocations of >> underlying >>>>>>> arrays. >>>>>>> >>>>>>> [1] >>>>>>> >>>>> >>>> >>> >> https://github.com/apache/ignite-3/blob/1d7d703ff2b18234b15a9a7b03104fbb73388edf/modules/table/src/main/java/org/apache/ignite/internal/table/KVBinaryViewImpl.java#L83 >>>>>>> >>>>>>> On Wed, Sep 8, 2021 at 1:06 PM Konstantin Orlov < >>> kor...@gridgain.com> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi! >>>>>>>> >>>>>>>> Agree with Ivan that it’s an overkill. Code readability and >>>>>>>> maintainability should have >>>>>>>> the same priority as the performance (with some exceptions of >>>> course). >>>>>>>> >>>>>>>> BTW the result of your benchmark looks quite strange. The >>>> performance >>>>>>>> penalty on >>>>>>>> my laptop (Core i7 9750H, 6 cores up to 4.50 GHz) is 25%, not 8 >>>> times: >>>>>>>> >>>>>>>> Benchmark Mode Cnt Score Error >>>>>>>> Units >>>>>>>> JmhIncrementBenchmark.loopSum thrpt 10 32347.819 ± 676.548 >>>>>>>> ops/ms >>>>>>>> JmhIncrementBenchmark.streamSum thrpt 10 24459.196 ± 610.152 >>>>>>>> ops/ms >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> Regards, >>>>>>>> Konstantin Orlov >>>>>>>> >>>>>>>> >>>>>>>>> On 8 Sep 2021, at 12:23, Ivan Bessonov <bessonov...@gmail.com >>> >>>>> wrote: >>>>>>>>> >>>>>>>>> Hello Igniters, >>>>>>>>> >>>>>>>>> I object, banning streams is an overkill. I would argue that >>> most >>>> of >>>>>>>>> the >>>>>>>>> code >>>>>>>>> is not on hot paths and that allocations in TLAB don't create >>> much >>>>>>>> pressure >>>>>>>>> on GC. >>>>>>>>> >>>>>>>>> Streams must be used cautiously, developers should know >> whether >>>> they >>>>>>>>> write hot methods or not. And if methods are not hot, code >>>>> simplicity >>>>>>>> must >>>>>>>>> be >>>>>>>>> the first priority. I don't want Ignite 3 code to look like >>>> Ignite 2 >>>>>>>> code, >>>>>>>>> where >>>>>>>>> people would iterate over Lists using explicit access by >>> indexes, >>>>>>>> because it >>>>>>>>> saves them a single Iterator allocation. That's absurd. >>>>>>>>> >>>>>>>>> ср, 8 сент. 2021 г. в 11:43, Pavel Tupitsyn < >>> ptupit...@apache.org >>>>> : >>>>>>>>> >>>>>>>>>> Igniters, >>>>>>>>>> >>>>>>>>>> Java streams are known to be slower and cause more GC >> pressure >>>> than >>>>>>>>>> an >>>>>>>>>> equivalent loop. >>>>>>>>>> Below is a simple filter/map/reduce scenario (code [1]): >>>>>>>>>> >>>>>>>>>> * Benchmark >>>>> Mode >>>>>>>> Cnt >>>>>>>>>> Score Error Units >>>>>>>>>> >>>>>>>>>> * StreamVsLoopBenchmark.loopSum >>>>>>>>>> thrpt >>>>>>>> 3 >>>>>>>>>> 7987.016 ± 293.013 ops/ms >>>>>>>>>> * StreamVsLoopBenchmark.loopSum:·gc.alloc.rate >>>>>>>>>> thrpt >>>>>>>> 3 >>>>>>>>>> ≈ 10⁻⁴ MB/sec >>>>>>>>>> * StreamVsLoopBenchmark.loopSum:·gc.count >>>>>>>>>> thrpt >>>>>>>> 3 >>>>>>>>>> ≈ 0 counts >>>>>>>>>> >>>>>>>>>> * StreamVsLoopBenchmark.streamSum >>>>>>>>>> thrpt >>>>>>>> 3 >>>>>>>>>> 1060.244 ± 36.485 ops/ms >>>>>>>>>> * StreamVsLoopBenchmark.streamSum:·gc.alloc.rate >>>>>>>>>> thrpt >>>>>>>> 3 >>>>>>>>>> 315.819 ± 10.844 MB/sec >>>>>>>>>> * StreamVsLoopBenchmark.streamSum:·gc.count >>>>>>>>>> thrpt >>>>>>>> 3 >>>>>>>>>> 55.000 counts >>>>>>>>>> >>>>>>>>>> Loop is several times faster and does not allocate at all. >>>>>>>>>> >>>>>>>>>> 1. Performance is one of the most important features of our >>>>> product. >>>>>>>>>> 2. Most of our APIs will be on the hot path. >>>>>>>>>> >>>>>>>>>> One can argue about performance differences in real-world >>>>> scenarios, >>>>>>>>>> but increasing GC pressure just to make the code a little bit >>>> nicer >>>>>>>>>> is >>>>>>>>>> unacceptable. >>>>>>>>>> >>>>>>>>>> I propose to ban streams usage in the codebase (except for >> the >>>>>>>>>> tests). >>>>>>>>>> >>>>>>>>>> Thoughts, objections? >>>>>>>>>> >>>>>>>>>> [1] >>>>>>>>>> >>>> https://gist.github.com/ptupitsyn/5934bbbf8f92ac4937e534af9386da97 >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> Sincerely yours, >>>>>>>>> Ivan Bessonov >>>>>>>> >>>>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> >>>>> Best regards, >>>>> Ivan Pavlukhin >>>>> >>>> >>>> >>>> -- >>>> With regards, >>>> Aleksandr Polovtcev >>>> >>> >>