Thank you for writing this informative post. It explains the issue well 
and should be shared with others who have the same question.

Honestly, Go has made me scared of locks. Every time I think of using one, 
I realize how flawed the program's actual design is, and I end up 
refactoring it. The end result is always better. Channels take care of sync 
+ communication; locks take care of sync only. A process that needs to 
synchronize but not communicate doesn't make much sense anymore. 
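
For example (a minimal sketch of my own, not from the post): a worker that 
hands its result back over a channel gets synchronization and communication 
in one step, with no lock in sight.

    package main

    import "fmt"

    // compute runs in its own goroutine and delivers its result over a
    // channel. The receive in main both waits for the goroutine and
    // carries the data, so no separate lock is needed.
    func compute(out chan<- int) {
            out <- 6 * 7
    }

    func main() {
            out := make(chan int)
            go compute(out)
            fmt.Println(<-out) // blocks until the worker is done
    }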

On Saturday, August 12, 2017 at 2:37:45 PM UTC-7, Michael Jones wrote:
>
> snmed, 
>
> My disappointment with that blog post is that it does not address the 
> larger issue, which some people here have clarified. To be clear about that 
> issue and the various points raised about channels:
>
> SMALL BENCHMARKS ARE HARD
>
> Benchmarking is harder than it seems, because computers have had memory 
> caches since the IBM 360/85 (1969), the DEC PDP-11/70 (1975), and others 
> before them. They can also have multiple CPUs, multiple levels of memory 
> cache, and, since 1964's CDC 6000, multiple threads active in a single CPU 
> (the basis of the first NVIDIA GeForce GPU and rediscovered by Intel as 
> Jackson Technology, aka SMT, a few years later).
>
> In a world where computers read, write, and compute multiple things at 
> once, it is quite difficult to measure the cost of any single thing. Why? 
> Because if that one thing can be done in some idle part of the computer 
> while other things are happening, then the effective cost is zero. 
>
> Even if you manage to measure that one thing somehow with sufficient 
> scaffolding and after disabling your modern CPU's power saving speed 
> degrader and myriad other processes and performance modulators such as 
> inside the CPU uOp dispatch priority and the like, the problem is, how to 
> understand it and extrapolate from it.
>
> To be clear, if it takes a perfectly measured average of 140 ns +/- 11ns 
> to do something, and you do 100 of them, it will likely not add 100x that 
> time or 100x that variance to your application's actual performance rates. 
> Maybe all 100 of those fit in "empty spots" so the cost is zero. Maybe they 
> conflict with instruction scheduling, bus activity, or cache/VM activity in 
> such a way that they add 100000x that time. 
>
> This is why microbenchmarks are so hard to understand. They are hard to 
> write properly, they are hard to measure in a meaningful way, and they are 
> hard to extrapolate with confidence. (You should do them, but, you should 
> always keep in mind that the application-level, real-world impact may be 0x 
> or 100x that much cost.)
>
> CONCURRENT PROGRAMMING IS HARD
>
> Doing things concurrently generally implies one or more spreading steps 
> where the code widens one thing to multiple things, and one or more 
> gathering steps where the multiple things narrow to fewer things, 
> ultimately to one thing, like knowing when to exit the program. For decades 
> there have been frequent errors in widening, narrowing, and data and device 
> conflicts in the concurrent phase. Code that works for years may suddenly 
> break. Code that looks simple may never work. Natural intuition in such 
> cases often leads one astray.
>
> One solution is to be afraid of concurrency and avoid it. One is to 
> embrace it, but with special armor as if handling hot lava or a pit of 
> vipers. A middle route is to allow it only in such a way that it is tame 
> (lava in insulated containers, snakes asleep in darkened boxes). One such 
> mild route--Communicating Sequential Processes--was pioneered by C. A. R. 
> Hoare, the inventor of Quicksort. Per Brinch Hansen has an excellent book 
> about OS construction via the method, and Go is one of CSP's direct 
> descendants. Go's channels, along with the select, receive, send, and close 
> operations, are its presence in the language. 
>
> SMALL BENCHMARKS OF CONCURRENCY PRIMITIVES ARE VERY HARD
>
> It is hard to measure directly in much the same way it is hard to directly 
> measure curved space-time. Indirect ways are hard too. As above, even when 
> you can measure them, it is hard to understand what that data says in the 
> context of your program or any program other than the test harness. 
>
> This is where that blog post comes in. To paraphrase, "I think some of 
> Go's mild, safe mechanisms lack a feature that I wish for, and not only 
> that, when I use them to emulate some parts of my low-level lava-juggling 
> armor, they are not as fast. Oh no! Yet, I still love Go." People who see 
> that seem to miss:
>
> a. Why in the heck would you use high-level magic to emulate low-level 
> tools? In the case of channels, they already use lava juggling and snake 
> charming tools hidden safely inside their implementation.
>
> b. How can you compare performance of high level program structuring 
> elements and low-level viper wrangling tools? Whichever is 'faster' or 
> 'simpler' for the same task is likely a misapplication of one or both.
>
> c. What about the whole notion of making concurrency safe and easy? 
> Experienced people from Tony Hoare to Rob Pike have seen the light about 
> hiding the parts of concurrency that are so often tools of self-destruction 
> in the hands of even very good programmers. Why tempt beginners to open that 
> door? Why tempt anyone?
>
> That's what I think when people comment on that post. Sure, new features 
> could be considered. Sure, existing tools can be tweaked toward hardware 
> optimal implementation. Sure, Go provides all the lava and snake tools one 
> could need in the sync package. But unless your well-designed and 
> well-implemented application is too inefficient as it scales from 1 to N 
> CPUs, then why would you think to abandon magic that brings simplicity and 
> correctness to what was formerly a wasteland of failed efforts and 
> inscrutable bugs? ("My plane flies 0.0000001% faster without the weight of 
> my parachute so I leave it behind" is not a well-considered approach.)
>
> How would you know if an application was that inefficient? By benchmarking 
> THE WHOLE APPLICATION rather than an emulation of a low-level concurrency 
> primitive.
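>
> (One concrete way to do that, sketched under my own assumptions: profile 
> the whole running program and see whether the sync primitives even show up. 
> Go's net/http/pprof makes that a few lines; in a real application the 
> listener would run beside the actual work rather than alone:)
>
>     package main
>
>     import (
>             "log"
>             "net/http"
>             _ "net/http/pprof" // registers the /debug/pprof handlers
>     )
>
>     func main() {
>             // Expose whole-application profiles on a side port; then
>             //   go tool pprof http://localhost:6060/debug/pprof/profile
>             // shows whether channel or mutex operations matter at all.
>             log.Println(http.ListenAndServe("localhost:6060", nil))
>     }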
>
> Chris,
>
> Channels are not "expensive"; it is just that they are not free. I can do 
> 3,000,000 channel send/receive pairs per second on my notebook computer. If 
> each send is a single bit, that's 366 KB/second safely and easily sent 
> between communicating processes. If each is a pointer to a 1 MB data 
> structure, then that's 3 TB/sec safely and easily sent between 
> communicating processes. It is not likely that any application can do 3 
> million interesting tasks (build and send web pages, compute market 
> conditions and send buy/sell orders, update databases, etc.) on any 
> computer, much less a four core mobile device on battery power.
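>
> (Those figures are from my notebook; here is a sketch you can run to get 
> your own, with the arithmetic in the final comment:)
>
>     package main
>
>     import (
>             "fmt"
>             "time"
>     )
>
>     func main() {
>             const pairs = 3000000
>             c := make(chan int)
>             start := time.Now()
>             go func() {
>                     for i := 0; i < pairs; i++ {
>                             c <- i // one send per pair
>                     }
>             }()
>             for i := 0; i < pairs; i++ {
>                     <-c // one receive per pair
>             }
>             rate := pairs / time.Since(start).Seconds()
>             fmt.Printf("%.0f send/receive pairs per second\n", rate)
>             // At 3e6 pairs/sec: one bit per send is ~366 KB/s moved;
>             // a pointer to a 1 MB structure per send is ~3 TB/s.
>     }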
>
> Maybe it is possible to do 5x or 20x that many mutex-protected increments 
> of an integer using those viper-handling gloves and body armor. But a 
> computer that is dedicated to updating a single int is questionable, and an 
> application dominated by it is also questionable.
>
> If anything, I'd rather put a sleep in all the sync primitives and force 
> the mental discipline to make the application fast DESPITE artificially 
> slow sync/cond/mutex/wait/... speeds. It is all about high-level design, 
> choice of algorithms and data structures, and similar issues--that's where 
> 100x gains in performance await. 2x on a mutex is just not interesting to 
> me. 
>
> On Sat, Aug 12, 2017 at 5:34 AM, Jesper Louis Andersen <
> jesper.lou...@gmail.com> wrote:
>
>> On Fri, Aug 11, 2017 at 2:22 PM Chris Hopkins <cbeho...@gmail.com> wrote:
>>
>>> .... The microsecond or so of cost you see was, as I understood it, *not* 
>>> due to there being thousands of operations needed to run the channel, but 
>>> the latency added by the stall, and scheduler overhead.
>>>
>>  
>> One particular case, which many benchmarks end up measuring, is running a 
>> single operation through the system, which then pays all the context 
>> switching overhead for that one operation. But channels pipeline. If you 
>> start running a million operations, then the switching overhead amortizes 
>> over the operations if your system is correctly asynchronous and tuned.
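>>
>> (A sketch of that amortization, with names of my own: a buffered channel 
>> lets the producer run ahead, so most sends and receives complete without 
>> a scheduler switch, and the per-operation cost falls accordingly.)
>>
>>     package main
>>
>>     import "fmt"
>>
>>     // pipeline sums items streamed from a producer goroutine. The
>>     // buffer absorbs bursts, so switching overhead is paid rarely
>>     // rather than once per operation.
>>     func pipeline(items int) int {
>>             c := make(chan int, 1024)
>>             go func() {
>>                     for i := 0; i < items; i++ {
>>                             c <- i
>>                     }
>>                     close(c)
>>             }()
>>             sum := 0
>>             for v := range c {
>>                     sum += v
>>             }
>>             return sum
>>     }
>>
>>     func main() {
>>             fmt.Println(pipeline(1000000)) // 499999500000
>>     }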
>>
>> I think most message-passing languages add some kind of atomics in order 
>> to track counters and the like without resorting to sending around 
>> microscopic messages all the time. 
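>>
>> (In Go that is the sync/atomic package; a sketch of a shared counter that 
>> avoids a message per increment:)
>>
>>     package main
>>
>>     import (
>>             "fmt"
>>             "sync"
>>             "sync/atomic"
>>     )
>>
>>     var ops int64 // shared counter; no channel traffic per increment
>>
>>     func main() {
>>             var wg sync.WaitGroup
>>             for i := 0; i < 4; i++ {
>>                     wg.Add(1)
>>                     go func() {
>>                             defer wg.Done()
>>                             for j := 0; j < 1000; j++ {
>>                                     atomic.AddInt64(&ops, 1) // lock-free and race-safe
>>                             }
>>                     }()
>>             }
>>             wg.Wait()
>>             fmt.Println(atomic.LoadInt64(&ops)) // 4000
>>     }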
>>
>
> -- 
> Michael T. Jones
> michae...@gmail.com
>
