https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118033
--- Comment #3 from Dmytro Ovdiienko ---
I believe in 99% of cases whatever is passed to assert() is a legal expression
that returns bool. And there is an opportunity to optimize the output assembly
if we want to reuse that expression f
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118033
--- Comment #1 from Dmytro Ovdiienko ---
I'm not sure how to handle side effects caused by the expression. The
code in the expression must not be executed; the compiler should use it only
for optimization.
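As a rough illustration of the idea, here is a minimal sketch of an assert-like macro that becomes an optimizer hint when `NDEBUG` is defined. The macro name `HYPOTHETICAL_ASSERT` and the function `scaleByHalf` are made up for this example and are not part of GCC or libc. Unlike what the comment above asks for, this simple variant does evaluate the expression, so it is only suitable for side-effect-free conditions; the C++23 `[[assume(expr)]]` attribute is specified not to evaluate its operand and may be a closer fit.

```cpp
#include <cstdlib>

// HYPOTHETICAL_ASSERT is an illustrative name, not a GCC or libc macro.
#if defined(NDEBUG)
// Release build: promise the optimizer that the condition holds.
// NOTE: the expression IS evaluated here, so it must be side-effect free.
#define HYPOTHETICAL_ASSERT(expr) ((expr) ? (void)0 : __builtin_unreachable())
#else
// Debug build: abort when the condition is violated.
#define HYPOTHETICAL_ASSERT(expr) ((expr) ? (void)0 : std::abort())
#endif

// Hypothetical example function: with the hint in place, the compiler
// may treat x as non-negative when lowering the signed division.
int scaleByHalf(int x)
{
    HYPOTHETICAL_ASSERT(x >= 0);
    return x / 2;
}
```

Whether the generated assembly actually improves depends on the compiler version and optimization level, so it is worth checking the output before relying on this.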
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---
Could you define the `assert` macro as follows when the `NDEBUG` macro is
defined:
#if defined(NDEBUG
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---
Hello,
I'd like to propose an idea that could improve application performance.
The idea is related to `constexpr` math, which can be perform
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98840
--- Comment #4 from Dmitriy Ovdienko ---
What if we introduce a new ABI version and encode it into the function name
(name mangling)?
And then have two options:
* Either compile code and store both versions into lib file (ABI v1 and v2).
Applies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98840
--- Comment #3 from Dmitriy Ovdienko ---
> This is not a GCC bug.
No, it is not. But can we improve that?
That approach increases the binary size: if `baz` is called from many places,
every call site adds to it.
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---
I'm trying to evaluate the overhead of `unique_ptr` and I do not understand
why GCC executes the destructor of the `uniqu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641
--- Comment #7 from Dmitriy Ovdienko ---
If I change the body of the loop like this, it also works
```
while ('\x01' != *ptr)
{
    result = result * 10 - '0' + *ptr++;
}
```
Looks like an integer overflow happens on the last iteration and the compiler
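To make the difference between the two loop bodies concrete, here is a minimal sketch that converts the character to a digit before adding it, so no intermediate value exceeds the final result. It assumes the input consists only of decimal digits terminated by `'\x01'` and that the value fits in `int`; the name `ParseDigitFirst` is hypothetical and does not appear in the bug report.

```cpp
// Hypothetical variant of the parser from this thread.
// Assumption: input is decimal digits followed by '\x01', value fits in int.
int ParseDigitFirst(char const* ptr) noexcept
{
    int result = 0;
    while ('\x01' != *ptr)
    {
        // Subtract '0' first: `result * 10 + *ptr` could exceed INT_MAX
        // for inputs near the limit even though the final value fits,
        // and signed overflow is undefined behavior.
        int const digit = *ptr++ - '0';
        result = result * 10 + digit;
    }
    return result;
}
```

For example, `ParseDigitFirst("2147483647\x01")` never forms a value above 2147483647, whereas adding the raw character code `'7'` (55) to 2147483640 before subtracting `'0'` would.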
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641
--- Comment #6 from Dmitriy Ovdienko ---
This code does not work
```
#include
int Parse1(char const* ptr) noexcept
{
    int result = 0;
    while ('\x01' != *ptr)
    {
        result = result * 10 + *ptr++ - '0';
    }
    return result;
}
i
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641
Dmitriy Ovdienko changed:
What|Removed |Added
Status|RESOLVED|UNCONFIRMED
Resolution|INVALI
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641
--- Comment #4 from Dmitriy Ovdienko ---
It happens with 2147483646, 2147483647 and std::numeric_limits<int>::min().
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97641
--- Comment #1 from Dmitriy Ovdienko ---
OS: Windows 10
Distribution: MSys2 (https://www.msys2.org/)
Version: (Rev4, Built by MSYS2 project) 10.2.0
I tried to reproduce this issue on https://gcc.godbolt.org/. gcc (trunk) is
also unable to compil
++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---
The g++ optimizer produces wrong code when -O3 is used. With -O2 and -O1 the
app works as expected.
Expected output: matches
Actual output: does not match
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #42 from Dmitriy Ovdienko ---
> The master branch has been updated by Jonathan Wakely :
Very good commit and comment. I hope this change, made for a synthetic
benchmark, won't affect real production applications.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #41 from Dmitriy Ovdienko ---
@Jonathan
Did you have a chance to review why a default-constructed M-B-R works faster
than one constructed with an initial buffer size?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #38 from Dmitriy Ovdienko ---
Wohoo! Accepted:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #37 from Dmitriy Ovdienko ---
> I assume you can't just preallocate a buffer for the pool?
I dunno... here is a requirement:
* When possible, use default GC; otherwise use per node allocation or use a
library memory pool.
* As a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #36 from Dmitriy Ovdienko ---
> It doesn't seem to make much difference.
It is visible in the assembly. If you use __unlikelly, the compiler moves
this code out of the hot path, minimizing the number of instructions to decode.
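The effect described here can be tried on a small stand-alone allocator. The sketch below is not the libstdc++ implementation; `BumpPool` and its members are hypothetical names, and it uses GCC's `__builtin_expect` to mark the refill branch as unlikely so the compiler is free to place it away from the hot path.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bump allocator, loosely modeled on the idea of a
// monotonic buffer; NOT the libstdc++ code discussed in this thread.
class BumpPool
{
public:
    explicit BumpPool(std::size_t blockSize) : blockSize_(blockSize)
    {
        refill(blockSize_);
    }

    void* allocate(std::size_t bytes, std::size_t alignment)
    {
        void* p = std::align(alignment, bytes, cur_, avail_);
        // Hint: running out of space in the current block is expected to be rare.
        if (__builtin_expect(p == nullptr, 0))
        {
            std::size_t const needed = bytes + alignment;
            refill(needed > blockSize_ ? needed : blockSize_);
            p = std::align(alignment, bytes, cur_, avail_);
        }
        // Consume the bytes just handed out.
        cur_ = static_cast<char*>(cur_) + bytes;
        avail_ -= bytes;
        return p;
    }

    std::size_t blocks() const { return blocks_.size(); }

private:
    // Cold path: grab a fresh block and make it the current one.
    void refill(std::size_t size)
    {
        blocks_.push_back(std::make_unique<char[]>(size));
        cur_ = blocks_.back().get();
        avail_ = size;
    }

    std::size_t blockSize_;
    std::vector<std::unique_ptr<char[]>> blocks_;
    void* cur_ = nullptr;
    std::size_t avail_ = 0;
};
```

Comparing the assembly of `allocate` with and without the hint (for example on Compiler Explorer) shows whether the refill code actually moves; the result may vary by compiler version and flags.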
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #33 from Dmitriy Ovdienko ---
@Jonathan Wakely
I have an idea to improve the code of p_m_r.
I expect that allocations are a very rare operation. If true, it makes sense to
add __unlikelly to `if (!__p)` inside the `do_allocate` member func
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #32 from Dmitriy Ovdienko ---
What bothers me is why Rust generates fewer CPU cache-references.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #31 from Dmitriy Ovdienko ---
Created attachment 49202
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49202&action=edit
Modified solution with 2-level memory pools
I believe I'm done with this task. Attached is a solution based
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #30 from Dmitriy Ovdienko ---
> Dividing estimated size by 2 to counter the over-allocation effect:
Good idea... but it smells bad :)
What if someone changes the allocation algorithm...?
> Since the poolSize function actually returns siz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #29 from Dmitriy Ovdienko ---
The table above isn't readable. Values for `cache-references` .. faults
are divided by 1000.
Sorry for the flood.
| CPU counter | Rust | C++ before |C++ now | C++ malloc |
|--|-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #28 from Dmitriy Ovdienko ---
Added CPU counters for malloc-based allocator as a base point
| CPU counter | Rust | C++ before | C++ now | C++ malloc |
|--|---:|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #27 from Dmitriy Ovdienko ---
Following are the CPU counters I got for the improved C++ test vs. Rust and the
original C++ test by Danial Klimkin
| CPU counter | Rust | C++ before |C++ now |
|--|-
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #26 from Dmitriy Ovdienko ---
Created attachment 49201
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49201&action=edit
Modified solution (thread per iteration)
Attached is code similar to what the Rust sample is doing (parallel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #21 from Dmitriy Ovdienko ---
> This is only the second time
> I've ever received any indication anybody is even using the
> header, so I've not wasted my time tuning it.
I used it to create an order book :). It helped me to impro
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #10 from Dmitriy Ovdienko ---
Looks like I know why the C++ sample does not use all the CPU resources.
C++ does not load the threads equally. The last thread gets the heaviest task
(MAX_DEPTH) and performs N iterations alone.
Rust code instea
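One common way to even out this kind of imbalance is to let worker threads pull small units of work from a shared counter instead of assigning one fixed task per thread. The sketch below illustrates that general technique only; it is not the code from either benchmark entry, and `simulateTask` and `runBalanced` are hypothetical names standing in for the real tree-building work.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for "build and check trees of a given depth".
long long simulateTask(int depth)
{
    return 1LL << depth;
}

// Workers repeatedly claim the next unprocessed task index, so a thread
// that finishes early picks up more work instead of sitting idle.
std::vector<long long> runBalanced(int minDepth, int maxDepth, unsigned workers)
{
    int const count = maxDepth - minDepth + 1;
    std::vector<long long> results(static_cast<std::size_t>(count), 0);
    std::atomic<int> next{0};

    auto worker = [&] {
        for (int i = next.fetch_add(1); i < count; i = next.fetch_add(1))
        {
            // Heaviest tasks are handed out first; each index is written
            // by exactly one thread, so no further locking is needed.
            results[static_cast<std::size_t>(i)] = simulateTask(maxDepth - i);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
    {
        pool.emplace_back(worker);
    }
    for (auto& t : pool)
    {
        t.join();
    }
    return results;
}
```

In a real benchmark the unit of work would need to be finer than one depth level (for example, a batch of iterations) for the heaviest depth to be shared across threads.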
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #8 from Dmitriy Ovdienko ---
Same as above for Depth = 19
| | PMR |Malloc |
|---|---|---|
| cache-references |16,571,923 |16,260,256 |
| cache-misse
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #7 from Dmitriy Ovdienko ---
Following are CPU counters for single threaded code. Pre-allocation is enabled.
Memory pool is created inside the loop.
```cpp
int poolSize(int depth)
{
return (1 << (depth + 1)) * sizeof(Node);
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #6 from Dmitriy Ovdienko ---
> looking at cache-misses counter does not make sense here
Well, if you compare Rust and C++, the cache-misses CPU counter differs
dramatically... and page-faults too... while the number of instructions is the
sa
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #4 from Dmitriy Ovdienko ---
Created attachment 49190
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49190&action=edit
Modified solution with custom allocator based on malloc (simplified, single
threaded)
Attached is a benchmar
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #3 from Dmitriy Ovdienko ---
Created attachment 49189
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49189&action=edit
Original implementation (simplified, single threaded)
Attached is a simplified original version of the bench
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #2 from Dmitriy Ovdienko ---
Created attachment 49185
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49185&action=edit
Modified solution with custom allocator based on malloc
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #1 from Dmitriy Ovdienko ---
Created attachment 49184
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49184&action=edit
Original implementation with preallocated buffer
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: dmitriy.ovdienko at gmail dot com
Target Milestone: ---
Created attachment 49183
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49183&action=edit
Original implementation
36 matches