If it works, that's fine; I'll just keep using vtune. I only work on x86 anyway. That said, I found another surprise: my program has 13 goroutines as soon as it starts. It's so peculiar. I simply can't understand why.
This is my code:
[image: 2021-02-02 15-45-01 的屏幕截图.png]
And this is the result; it's surprising. I think I know why my program is slow: the number of goroutines is too high. But I found that the GOMAXPROCS function doesn't seem to work, which really confuses me. My example does not do anything, so my understanding is that the number of goroutines should be only 1.
[image: 2021-02-02 15-45-49 的屏幕截图.png]

On Tuesday, February 2, 2021 at 3:27:45 PM UTC+8, Amnon wrote:

> Vtune is very useful for squeezing the ultimate performance out of Go
> programs, once you have done the usual optimisation: minimized
> allocations, io, etc.
>
> pprof is more than adequate for the average programmer. But when you need
> to super-optimise functions which implement math kernels, crypto
> functions, video codecs etc., then without a HW performance counter based
> profiler such as vtune or linux perf
> (https://perf.wiki.kernel.org/index.php/Main_Page) you are shooting in
> the dark.
> vtune not only tells you which functions are taking the most time, but WHY
> they are taking a long time: how long the code is spending waiting for
> cache misses, and the different kinds of stall cycles which kill
> performance on a modern CPU.
>
> Vtune or perf is also a great tool for teaching us about processors, and
> helping us understand what influences the rate at which instructions are
> executed by them.
>
> The problem with vtune is that it is quite unfriendly and expensive (more
> than $3000 for a single floating license)!
> It also does not work on ARM processors (such as Apple M1).
>
> There has been a proposal to add performance counters to pprof.
>
> https://go.googlesource.com/proposal/+/refs/changes/08/219508/2/design/36821-perf-counter-pprof.md
> If accepted, this would give the power of vtune to the masses for free.
>
> On Tuesday, 2 February 2021 at 06:37:37 UTC nnsm...@gmail.com wrote:
>
>> One more question: is it effective to use vtune to tune Go programs?
>> I am afraid that vtune is not suitable, although Intel claims it is
>> effective.
>>
>> On Tuesday, February 2, 2021 at 2:32:40 PM UTC+8, 颜文泽 wrote:
>>
>>> Thanks, it's not an in-memory db, but my current test does not involve
>>> io. I'll take time to look at your information, thanks a lot. Also, I
>>> found that many of the functions with a high CPI rate are runtime
>>> functions; is the overhead of these functions unavoidable? The
>>> following diagram is for a single goroutine:
>>> [image: 2021-02-02 14-25-33 的屏幕截图.png]
>>> The following chart is for 8 goroutines:
>>> [image: 2021-02-02 14-25-56 的屏幕截图.png]
>>>
>>> On Tuesday, February 2, 2021 at 2:27:39 PM UTC+8, ren...@ix.netcom.com wrote:
>>>
>>>> Unless it is an in memory database, I would expect the IO costs to
>>>> dwarf the cpu costs, but I guess a lot depends on how you define
>>>> ‘analytical processing’.
>>>>
>>>> In my experience, “out of the box” performance of Go routines in IO
>>>> processing is outstanding.
>>>>
>>>> For the cpu bound case, I think with threads, cpu assignments
>>>> (cpuset), etc. you can probably create a higher performing system in
>>>> some cases - but it’s a lot of work.
>>>>
>>>> Even without that, I think the scheduler in most Linux systems is more
>>>> mature than the Go scheduler, and makes better choices for cache
>>>> affinity, etc. It’s very hard to design a high performance cpu bound
>>>> system that runs on a general purpose OS or language/platform. Without
>>>> knowledge of the olap db design it is very hard to make a
>>>> recommendation.
>>>>
>>>> This is some suggested reading to help you in your journey:
>>>> https://dave.cheney.net/high-performance-go-workshop/dotgo-paris.html
>>>>
>>>> On Feb 2, 2021, at 12:07 AM, 颜文泽 <nnsm...@gmail.com> wrote:
>>>>
>>>> I don't know much about the internal implementation of Go, sorry. I
>>>> was a C programmer and I tried to implement the original logic (an
>>>> olap database) by using goroutines as a thread replacement. But I
>>>> found that I would encounter bottlenecks, and I don't know how to
>>>> solve them. Maybe I should study the implementation of goroutines
>>>> before I can write the right code.
>>>>
>>>> On Tuesday, February 2, 2021 at 12:21:44 PM UTC+8, ren...@ix.netcom.com wrote:
>>>>
>>>>> You wrote “I found that cache misses from routines switching is also
>>>>> a headache”.
>>>>>
>>>>> They would not be switching if they are cpu bound and there are fewer
>>>>> of them than the number of cpus. Remember too that you need some % of
>>>>> the cpus to execute the runtime GC code and other housekeeping.
>>>>>
>>>>> > On Feb 1, 2021, at 10:04 PM, 颜文泽 <nnsm...@gmail.com> wrote:
>>>>> >
>>>>> > I found that cache misses from routines switching is also a headache
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "golang-nuts" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to golang-nuts...@googlegroups.com.
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/golang-nuts/35bccad0-64a9-4796-bc3f-a9cdb8c82961n%40googlegroups.com
>>>> .

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/26052e54-4049-4a77-bb19-7b11cb02f7can%40googlegroups.com.
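
On the question at the top of the thread (why a program that does nothing shows more than one routine, and why GOMAXPROCS seems to have no effect), here is a minimal sketch. It assumes the 13 entries in the screenshot are OS threads as reported by vtune; the screenshots do not survive in this archive, so that is only a guess. GOMAXPROCS limits how many OS threads may execute user-level Go code at the same time. It does not cap the number of goroutines, and it does not cap the threads the runtime creates for blocking system calls or its own housekeeping. The names `snapshot`, `takeSnapshot` and `spawnIdle` are made up for illustration; the `runtime` and `runtime/pprof` calls are from the standard library.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/pprof"
	"sync"
)

// snapshot holds three numbers that are easy to confuse.
// (Hypothetical type, for illustration only.)
type snapshot struct {
	gomaxprocs int // max threads running Go code simultaneously
	goroutines int // goroutines that currently exist
	threads    int // OS threads recorded in the threadcreate profile
}

// takeSnapshot reads all three values without changing any of them.
func takeSnapshot() snapshot {
	return snapshot{
		gomaxprocs: runtime.GOMAXPROCS(0), // an argument of 0 only queries
		goroutines: runtime.NumGoroutine(),
		threads:    pprof.Lookup("threadcreate").Count(),
	}
}

// spawnIdle starts n goroutines that block on a channel and returns
// a function that releases them and waits for them to finish.
func spawnIdle(n int) (release func()) {
	stop := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			<-stop // idle: consumes no CPU while parked
		}()
	}
	return func() {
		close(stop)
		wg.Wait()
	}
}

func main() {
	runtime.GOMAXPROCS(1) // restrict parallel execution of Go code to one thread

	fmt.Printf("at start:          %+v\n", takeSnapshot())

	release := spawnIdle(8)
	fmt.Printf("8 idle goroutines: %+v\n", takeSnapshot())
	release()
}
```

The goroutine count follows the number of `go` statements regardless of GOMAXPROCS. The thread count is decided by the runtime and varies with Go version and platform, so no expected output is given here.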