If it works, that's fine; I'll just keep using vtune. I only work on x86 
anyway. That said, I found another strange thing: my program has 13 goroutines 
as soon as it starts. It's so peculiar; I simply can't understand why.

This is my code:

[image: 2021-02-02 15-45-01 的屏幕截图.png]
And this is the result, which surprised me. I think I know why my program is 
slow: the number of goroutines is too high. But I found that calling the 
GOMAXPROCS function has no effect on it, which really confuses me.
My example does not do anything, so as I understand it the number of 
goroutines should be just 1.
[image: 2021-02-02 15-45-49 的屏幕截图.png]
On Tuesday, 2 February 2021 at 15:27:45 UTC+8 Amnon wrote:

> Vtune is very useful for squeezing the ultimate performance out of Go 
> programs, once you have done
> the usual optimisation, minimized allocations, I/O, etc. 
>
> pprof is more than adequate for the average programmer. But when you need 
> to super-optimise 
> functions which implement math kernels, crypto functions, video codecs, 
> etc., then without a HW performance-counter-based profiler such as vtune 
> or Linux perf (https://perf.wiki.kernel.org/index.php/Main_Page) you are 
> shooting in the dark.
> vtune not only tells you which functions are taking the most time, but WHY 
> these are taking a long time,
> how long the code is spending waiting for cache misses, and the different 
> kind of stall cycles which 
> kill performance on a modern CPU.
>
> Vtune or perf is also a great tool for teaching us about processors, and 
> helping us understand what influences
> the rate at which instructions are executed by them.
>
> The problem with vtune is that it is quite unfriendly and expensive (more 
> than $3000 for a single floating license)!
> It also does not work on ARM processors (such as Apple M1).
>
> There has been a proposal to add performance counters to pprof.
>
> https://go.googlesource.com/proposal/+/refs/changes/08/219508/2/design/36821-perf-counter-pprof.md
> If accepted, this would give the power of vtune to the masses for free.
>
> On Tuesday, 2 February 2021 at 06:37:37 UTC nnsm...@gmail.com wrote:
>
>> One more question: is it effective to use vtune to tune Go programs? I am 
>> afraid that vtune is not suitable, although Intel claims it is effective.
>> On Tuesday, 2 February 2021 at 14:32:40 UTC+8 颜文泽 wrote:
>>
>>> Thanks. It's not an in-memory DB, but my current test does not involve I/O. 
>>> I'll take time to look at your information, thanks a lot. Also, I found that 
>>> many of the functions with a high CPI rate are runtime functions. Is the 
>>> overhead of these functions unavoidable? The following diagram is for a 
>>> single goroutine:
>>> [image: 2021-02-02 14-25-33 的屏幕截图.png]
>>> The following chart is for 8 goroutines:
>>> [image: 2021-02-02 14-25-56 的屏幕截图.png]
>>> On Tuesday, 2 February 2021 at 14:27:39 UTC+8 ren...@ix.netcom.com wrote:
>>>
>>>> Unless it is an in memory database, I would expect the IO costs to 
>>>> dwarf the cpu costs, but I guess a lot depends on how you define 
>>>> ‘analytical processing’.
>>>>
>>>> In my experience, “out of the box” performance of Go routines in IO 
>>>> processing is outstanding.
>>>>
>>>> For the cpu bound case, I think with threads, cpu assignments (cpuset), 
>>>> etc. you can probably create a higher performing system in some cases - 
>>>> but it’s a lot of work.
>>>>
>>>> Even without that, I think the scheduler in most Linux systems is more 
>>>> mature than the Go scheduler, and makes better choices for cache affinity, 
>>>> etc. It’s very hard to design a high performance cpu bound system that 
>>>> runs on a general purpose OS or language/platform. Without knowledge of the 
>>>> OLAP DB design it is very hard to make a recommendation.
>>>>
>>>> This is some suggested reading to help you in your journey 
>>>> https://dave.cheney.net/high-performance-go-workshop/dotgo-paris.html
>>>>
>>>> On Feb 2, 2021, at 12:07 AM, 颜文泽 <nnsm...@gmail.com> wrote:
>>>>
>>>> I don't know much about the internal implementation of Go, sorry. I 
>>>> was a C programmer and I tried to implement the original logic (an OLAP 
>>>> database) by using goroutines as a thread replacement. But I found that I 
>>>> would encounter bottlenecks, and I don't know how to solve them. Maybe I 
>>>> should study the implementation of goroutines before I can write the right 
>>>> code.
>>>>
>>>> On Tuesday, 2 February 2021 at 12:21:44 UTC+8 ren...@ix.netcom.com wrote:
>>>>
>>>>> You wrote “I found that cache misses from routines switching is also a 
>>>>> headache”. 
>>>>>
>>>>> They would not be switching if they are cpu bound and there are fewer 
>>>>> of them than the number of cpus. Remember too that you need some % of the 
>>>>> cpus to execute the runtime GC code and other housekeeping. 
>>>>>
>>>>> > On Feb 1, 2021, at 10:04 PM, 颜文泽 <nnsm...@gmail.com> wrote: 
>>>>> > 
>>>>> > I found that cache misses from routines switching is also a headache 
>>>>>
>>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "golang-nuts" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to golang-nuts...@googlegroups.com.
>>>>
>>>> To view this discussion on the web visit 
>>>> https://groups.google.com/d/msgid/golang-nuts/35bccad0-64a9-4796-bc3f-a9cdb8c82961n%40googlegroups.com

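Regarding the pprof route mentioned in the quoted message above, here is a minimal sketch of collecting a CPU profile with the standard `runtime/pprof` package. `busyWork` and `profileCPU` are hypothetical names, and the loop is only a stand-in for a real CPU-bound kernel. This is the ordinary sampling profiler; it does not read hardware performance counters.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os"
	"runtime/pprof"
)

// busyWork is a hypothetical CPU-bound stand-in for a real kernel:
// it sums the squares of 0..n-1.
func busyWork(n int) uint64 {
	var sum uint64
	for i := 0; i < n; i++ {
		sum += uint64(i) * uint64(i)
	}
	return sum
}

// profileCPU runs fn with the CPU profiler enabled and returns the
// profile in the format that "go tool pprof" reads.
func profileCPU(fn func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	fn()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	var result uint64
	data, err := profileCPU(func() { result = busyWork(200_000_000) })
	if err != nil {
		log.Fatal(err)
	}
	// Write the profile next to the binary; the file name is arbitrary.
	if err := os.WriteFile("cpu.pprof", data, 0o644); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("result=%d, profile bytes=%d\n", result, len(data))
}
```

The resulting file can be inspected with `go tool pprof cpu.pprof`, for example with the `top` command at its prompt.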
