John Regehr wrote:
Well, isn't the net effect of volatile simply a more fine-grained
clobbering lock?
Almost but not quite:
- volatile says nothing about the atomicity of any given access
- volatile does not suppress reordering (except with other volatiles)
- volatile has no effect on caches and out-of-order memory subsystems
(not an issue for AVR obviously)
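
To make the first two bullets concrete, here is a minimal sketch, not code from this thread. It assumes GCC or Clang (the empty asm with a "memory" clobber is a compiler extension). The names irq_disable, irq_restore, tick_count, payload and payload_ready are hypothetical; the two irq functions are stand-ins for whatever interrupt save/restore mechanism the target provides.

```c
#include <stdint.h>

/* GCC/Clang-specific compiler barrier: emits no instructions, but the
   compiler may not move memory accesses across it. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-ins for disabling and restoring interrupts. */
static int irq_off_depth;
static void irq_disable(void) { irq_off_depth++; COMPILER_BARRIER(); }
static void irq_restore(void) { COMPILER_BARRIER(); irq_off_depth--; }

static volatile uint16_t tick_count;    /* imagined to be updated by an ISR */
static uint8_t payload[4];              /* ordinary, non-volatile data */
static volatile uint8_t payload_ready;  /* flag polled by another context */

/* volatile alone does not make this 16-bit read atomic: an 8-bit CPU
   needs two byte loads, and an interrupt could land between them. */
uint16_t read_ticks(void)
{
    uint16_t copy;
    irq_disable();
    copy = tick_count;
    irq_restore();
    return copy;
}

/* The flag is volatile but the payload is not, so without the barrier
   the compiler would be free to store the flag before the payload. */
void publish(const uint8_t src[4])
{
    for (int i = 0; i < 4; i++)
        payload[i] = src[i];
    COMPILER_BARRIER();
    payload_ready = 1;
}
```

The barrier only constrains the compiler. On a machine with an out-of-order memory subsystem (the third bullet) a hardware fence would be needed as well.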
Well, true, but lack of cache coherency is a hardware bug. But I agree
it throws in some interesting ordering issues w.r.t. the volatile
keyword. In any case, if the CPU writes its local data cache, that
should invalidate all other copies. (Well, all other copies in data
caches. Instruction caches often require explicit invalidates.)
(I spent quite a few years as a CPU logic designer, starting about 1980.
Pretty much every machine I worked on was out-of-order to some extent,
and had caches of some flavor.)
Also volatile is usually too fine-grained, ensuring consistency-always
instead of what you want (consistency on lock release) and this can
easily lead to inefficiencies.
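
One way to picture "consistency on lock release" is the sketch below. It is only a sketch under assumptions: GCC or Clang for the barrier macro, and hypothetical names throughout (lock_acquire, lock_release, shared, add_samples). The lock is a plain flag for illustration, not a real mutual-exclusion primitive.

```c
#include <stdint.h>

/* GCC/Clang-specific compiler barrier (no instructions emitted). */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

struct stats {
    uint32_t sum;
    uint16_t count;
};

static struct stats shared;   /* deliberately NOT volatile */
static int lock_held;         /* hypothetical lock state, illustration only */

static void lock_acquire(void) { lock_held = 1; COMPILER_BARRIER(); }
static void lock_release(void) { COMPILER_BARRIER(); lock_held = 0; }

/* Inside the locked region the compiler may keep shared.sum and
   shared.count in registers across iterations. The barrier in
   lock_release() forces them out to memory once, at release time.
   Declaring shared volatile would instead force a load and a store
   on every iteration. */
void add_samples(const uint8_t *samples, uint16_t n)
{
    lock_acquire();
    for (uint16_t i = 0; i < n; i++) {
        shared.sum += samples[i];
        shared.count++;
    }
    lock_release();
}
```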
True in the case of trying to make volatile into a critical section.
Not true for its original use as a way to talk to PDP-11 memory-mapped I/O.
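
For the memory-mapped I/O case, per-access semantics are exactly what is wanted. A minimal sketch follows, with a hypothetical register layout: STATUS_READY, wait_ready and send_byte are made-up names, and the register addresses are passed in by the caller rather than taken from any real device.

```c
#include <stdint.h>

#define STATUS_READY 0x80u   /* hypothetical "device ready" bit */

/* Poll a status register until the ready bit is set or the poll budget
   runs out. Because the pointee is volatile, every iteration performs a
   real load; without volatile the compiler could hoist the read out of
   the loop. Returns 1 if the device became ready, 0 otherwise. */
int wait_ready(volatile const uint8_t *status_reg, unsigned max_polls)
{
    while (max_polls--) {
        if (*status_reg & STATUS_READY)
            return 1;
    }
    return 0;
}

/* Each call performs exactly one store to the data register, even when
   the same value is written twice in a row. */
void send_byte(volatile uint8_t *data_reg, uint8_t value)
{
    *data_reg = value;
}
```

On a host machine an ordinary variable can stand in for the register when experimenting, since a plain pointer converts implicitly to a pointer-to-volatile.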
That's a counter-intuitive result. The "No idea why." part makes me a
little squinty-eyed. It certainly *could* be a generalizable result,
but then again it might be an artifact of your code structure.
Yep.
My off-the-wall guess was that the clobbers reduced register pressure.
I could not think of an easy way to test that hypothesis.
Forcing early spills helps? Does that say that the optimizer should be
more aggressive about spilling? I should probably get quiet about now
since I'm wandering outside my expertise talking about register allocators.
Finally I'll just add a random plug for a piece of work that a colleague
and I recently completed where we found that most compilers have
problems implementing the volatile qualifier:
http://www.cs.utah.edu/~regehr/papers/emsoft08_submit.pdf
Interesting. I read over section 2. Will have to go back and read the
whole thing.
-dave
OT shaggy dog story about caches: In the 1980s I remember reading the
weekly highlights from another CPU project in our same design center.
During checkout they had discovered a performance bug in the cache
invalidate equation which required a single OR-gate to fix, and boosted
the performance of an important database benchmark by 9% on 4-CPU
systems. One senior logic designer on our project, who was a bit of a
wag, said: "9% from just one OR-gate! We need to get some of those
OR-gates for *our* project!"
_______________________________________________
AVR-GCC-list mailing list
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list