On Thu, Nov 19, 2020 at 05:42:34PM +, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 19 November 2020 16:35
> > I just meant "valid C language code as defined by the standards". Many
> > people want all UB to just go away, while that is *impossible* to
On Thu, Nov 19, 2020 at 09:59:51AM -0500, Steven Rostedt wrote:
> On Thu, 19 Nov 2020 08:37:35 -0600
> Segher Boessenkool wrote:
> > > Note that we have a fairly extensive tradition of defining away UB with
> > > language extensions, -fno-strict-overflow, -fno-strict-aliasing,
On Thu, Nov 19, 2020 at 09:36:48AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 18, 2020 at 01:11:27PM -0600, Segher Boessenkool wrote:
> > Calling this via a different declared function type is undefined
> > behaviour, but that is independent of how the function is *defined*.
>
On Wed, Nov 18, 2020 at 02:33:43PM -0500, Steven Rostedt wrote:
> On Wed, 18 Nov 2020 13:11:27 -0600
> Segher Boessenkool wrote:
>
> > Calling this via a different declared function type is undefined
> > behaviour, but that is independent of how the function is *defined*.
On Wed, Nov 18, 2020 at 01:58:23PM -0500, Steven Rostedt wrote:
> I wonder if we should define on all architectures a void void_stub(void),
> in assembly, that just does a return, and not worry about gcc messing up the
> creation of the stub function.
>
> On x86_64:
>
> GLOBAL(void_stub)
>	ret
On Wed, Nov 18, 2020 at 07:31:50PM +0100, Florian Weimer wrote:
> * Segher Boessenkool:
>
> > On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
> >> I could change the stub from (void) to () if that would be better.
> >
> > Don't? In a function
On Wed, Nov 18, 2020 at 12:17:30PM -0500, Steven Rostedt wrote:
> I could change the stub from (void) to () if that would be better.
Don't? In a function definition they mean exactly the same thing (and
the kernel uses (void) everywhere else, which many people find clearer).
In a function declaration, however, they differ: an empty () leaves the
parameter types unspecified.
On Wed, Oct 28, 2020 at 10:57:45PM -0400, Arvind Sankar wrote:
> On Wed, Oct 28, 2020 at 04:20:01PM -0700, Alexei Starovoitov wrote:
> > All compilers have bugs. Kernel has bugs. What can go wrong?
Heh.
> +linux-toolchains. GCC updated the documentation in 7.x to discourage
> people from using th
On Fri, Oct 23, 2020 at 09:28:59PM +, David Laight wrote:
> From: Segher Boessenkool
> > Sent: 23 October 2020 19:27
> > On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> > > On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
> > > On
On Fri, Oct 23, 2020 at 06:58:57PM +0100, Al Viro wrote:
> On Fri, Oct 23, 2020 at 03:09:30PM +0200, David Hildenbrand wrote:
>
> > Now, I am not a compiler expert, but as I already cited, at least on
> > x86-64 clang expects that the high bits were cleared by the caller - in
> > contrast to gcc.
On Thu, Aug 13, 2020 at 02:39:10PM +, Christophe Leroy wrote:
> ppc6xx_defconfig fails building sfc.ko module, complaining
> about the lack of _umoddi3 symbol.
>
> This is due to the following test
>
> if (EFX_MIN_DMAQ_SIZE % reader->value) {
>
> Because reader->value is u64.
>
On Thu, May 24, 2018 at 10:18:44AM +, Christophe Leroy wrote:
> On 05/24/2018 06:20 AM, Christophe LEROY wrote:
> >Le 23/05/2018 à 20:34, Segher Boessenkool a écrit :
> >>On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> >>>The generic csum_ipv6_magic() generates a pretty bad result
On Thu, May 24, 2018 at 08:20:16AM +0200, Christophe LEROY wrote:
> Le 23/05/2018 à 20:34, Segher Boessenkool a écrit :
> >On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> >>+_GLOBAL(csum_ipv6_magic)
> >>+ lwz r8, 0(r3)
> >>+ lwz
On Tue, May 22, 2018 at 08:57:01AM +0200, Christophe Leroy wrote:
> The generic csum_ipv6_magic() generates a pretty bad result
Please try with a more recent compiler, what you used is pretty ancient.
It's not like recent compilers do great on this either, but it's not
*that* bad anymore ;-)
>
"volatile" has nothing to do with reordering. atomic_dec() writes
to memory, so it _does_ have "volatile semantics", implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.
Stores can be reordered. Only x86 has (mostly) implicit ordering of stores.
At some point in the future, barrier() will be universally regarded as
a hammer too big for most purposes. Whether or not removing it now
You can't just remove it, it is needed in some places; you want to
replace it in most places with a more fine-grained "compiler barrier",
I presume?
Let me say it more clearly: On ARM, it is impossible to perform atomic
operations on MMIO space.
Actually, no one is suggesting that we try to do that at all.
The discussion about RMW ops on MMIO space started with a comment
attributed to the gcc developers that one reason why gcc on x86
doesn'
And no, RMW on MMIO isn't "problematic" at all, either.
An RMW op is a read op, a modify op, and a write op, all rolled
into one opcode. But three actual operations.
Maybe for some CPUs, but not all. ARM for instance can't use the
load exclusive and store exclusive instructions to MMIO space.
Such code generally doesn't care precisely when it gets the update,
just that the update is atomic, and it doesn't loop forever.
Yes, it _does_ care that it gets the update _at all_, and preferably
as early as possible.
Regardless, I'm convinced we just need to do it all in assembly.
So do y
Right. ROTFL... volatile actually breaks atomic_t instead of making
it safe. x++ becomes a register load, increment and a register store.
Without volatile we can increment the memory directly. It seems that
volatile requires that the variable is loaded into a register first
and then operated upon.
The documentation simply doesn't say "+m" is allowed. The code to
allow it was added for the benefit of people who do not read the
documentation. Documentation for "+m" might get added later if it
is decided this [the code, not the documentation] is a sane thing
to have (which isn't directly obvious).
The "asm volatile" implementation does have exactly the same
reordering guarantees as the "volatile cast" thing,
I don't think so.
"asm volatile" creates a side effect.
Yeah.
Side effects aren't
allowed to be reordered wrt sequence points.
Yeah.
This is exactly
the same reason as why "
No it does not have any volatile semantics. atomic_dec() can be
reordered
at will by the compiler within the current basic unit if you do not
add a
barrier.
"volatile" has nothing to do with reordering.
If you're talking of "volatile" the type-qualifier keyword, then
http://lkml.org/lkml/200
atomic_dec() writes
to memory, so it _does_ have "volatile semantics", implicitly, as
long as the compiler cannot optimise the atomic variable away
completely -- any store counts as a side effect.
I don't think an atomic_dec() implemented as an inline "asm volatile"
or one that uses a "forget" macro
#define forget(a) __asm__ __volatile__ ("" :"=m" (a) :"m" (a))
[ This is exactly equivalent to using "+m" in the constraints, as
recently explained on a GCC list somewhere, in response to the patch
in my bitops series a few weeks back where I thought "+m" was bogus. ]
[It wasn't ex
Here, I should obviously admit that the semantics of *(volatile int *)&
aren't any neater or well-defined in the _language standard_ at all. The
standard does say (verbatim) "precisely what constitutes as access to
object of volatile-qualified type is implementation-defined", but GCC
does help u
Now the second wording *IS* technically correct, but come on, it's
24 words long whereas the original one was 3 -- and hopefully anybody
reading the shorter phrase *would* have known anyway what was meant,
without having to be pedantic about it :-)
Well you were talking pretty formal (and detailed)
In a reasonable world, gcc should just make that be (on x86)
addl $1,i(%rip)
on x86-64, which is indeed what it does without the volatile. But with
the volatile, the compiler gets really nervous, and doesn't dare do it
in one instruction, and thus generates crap like
	movl
(and yes, it is perfectly legitimate to
want a non-volatile read for a data type that you also want to do
atomic RMW operations on)
...which is undefined behaviour in C (and GCC) when that data is
declared volatile, which is a good argument against implementing
atomics that way in itself.
Segher
Of course, since *normal* accesses aren't necessarily limited wrt
re-ordering, the question then becomes one of "with regard to *what*
does it limit re-ordering?".
A C compiler that re-orders two different volatile accesses that have a
sequence point in between them is pretty clearly a buggy compiler.
atomic_dec() already has volatile behavior everywhere, so this is
semantically okay, but this code (and any like it) should be calling
cpu_relax() each iteration through the loop, unless there's a compelling
reason not to. I'll allow that for some hardware drivers (possibly this
one) such a co
Part of the motivation here is to fix heisenbugs. If I knew
where they
By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.
Almost everything is a tradeoff; and so is this. I don't
believe most people would find disabling all compiler
optimisations acceptable.
Note that "volatile"
is a type-qualifier, not a type itself, so a cast of the _object_
itself
to a qualified-type i.e. (volatile int) would not make the access
itself
volatile-qualified.
There is no such thing as "volatile-qualified access" defined
anywhere; there only is the concept of a "volatile-qualified object".
I can't speak for this particular case, but there could be similar code
examples elsewhere, where we do the atomic ops on an atomic_t object
inside a higher-level locking scheme that would take care of the kind of
problem you're referring to here. It would be useful for such or similar
code if
I'd go so far as to say that anywhere where you want a non-"volatile"
atomic_read, either your code is buggy, or else an int would work just
as well.
Even, the only way to implement a "non-volatile" atomic_read() is
essentially as a plain int (you can do some tricks so you cannot
assign to the r
The only thing volatile on an asm does is create a side effect
on the asm statement; in effect, it tells the compiler "do not
remove this asm even if you don't need any of its outputs".
It's not disabling optimisation likely to result in bugs,
heisen- or otherwise; _not_ putting the volatile on a
A volatile default would disable optimizations for atomic_read.
atomic_read without volatile would allow for full optimization by the
compiler. Seems that this is what one wants in many cases.
Name one such case.
An atomic_read should do a load from memory. If the programmer puts
an atomic_read
"compilation unit" is a C standard term. It typically boils down
to "single .c file".
As you mentioned later, "single .c file with all the other files
(headers or other .c files) that it pulls in via #include" is actually
"translation unit", both in the C standard as well as gcc docs.
Yeah
Part of the motivation here is to fix heisenbugs. If I knew where
they
By the same token we should probably disable optimisations
altogether since that too can create heisenbugs.
Precisely the point -- use of volatile (whether in casts or on asms)
in these cases are intended to disable those
No; compilation units have nothing to do with it, GCC can optimise
across compilation unit boundaries just fine, if you tell it to
compile more than one compilation unit at once.
Last I checked, the Linux kernel build system did compile each .c file
as a separate compilation unit.
I have some
Please check the definition of "cache coherence".
Which of the twelve thousand such definitions? :-)
Summary: the CPU is indeed within its rights to execute loads and stores
to a single variable out of order, -but- only if it gets the same result
that it would have obtained by executing them
Possibly these were too trivial to expose any potential problems that
you may have been referring to, so would be helpful if you could write
a more concrete example / sample code.
The trick is to have a sufficiently complicated expression to force
the compiler to run out of registers.
You ca
Of course, if we find there are more callers in the kernel who want the
volatility behaviour than those who don't care, we can re-define the
existing ops to such variants, and re-name the existing definitions to
something else, say "atomic_read_nonvolatile" for all I care.
Do we really need a
I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",
and wouldn't the "compilation unit" itself expand if we ask gcc to
compile more than one unit at once? Or is there some more specific
"definition" for "compilation unit"?
What volatile does are a) never optimise away a read (or write)
to the object, since the data can change in ways the compiler
cannot see; and b) never move stores to the object across a
sequence point. This does not mean other accesses cannot be
reordered wrt the volatile access.
If the abstract
What you probably mean is that the compiler has to assume any code
it cannot currently see can do anything (insofar as allowed by the
relevant standards etc.)
I think this was just terminology confusion here again. Isn't "any code
that it cannot currently see" the same as "another compilation unit",
How does the compiler know that msleep() has got barrier()s?
Because msleep_interruptible() is in a separate compilation unit,
the compiler has to assume that it might modify any arbitrary global.
No; compilation units have nothing to do with it, GCC can optimise
across compilation unit bounda
Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.
rmb() orders *any* two reads; that includes two reads from the same
location.
If the two reads are to the same location, all CPUs I am aware of will
Well if there is only one memory location involved, then smp_rmb() isn't
going to really do anything anyway, so it would be incorrect to use it.
rmb() orders *any* two reads; that includes two reads from the same
location.
Consider that smp_rmb basically will do anything from flushing the
pipeline
"Volatile behaviour" itself isn't consistently defined (at least
definitely not consistently implemented in various gcc versions across
platforms),
It should be consistent across platforms; if not, file a bug please.
but it is /expected/ to mean something like: "ensure that
every such access a
Yeah. Compiler errors are more annoying though I dare say ;-)
Actually, compile-time errors are fine,
Yes, they don't cause data corruption or anything like that,
but I still don't think the 390 people want to ship a kernel
that doesn't build -- and it seems they still need to support
GCC versions
"+m" works. We use it. It's better than the alternatives. Pointing to
stale documentation doesn't change anything.
Well, perhaps on i386. I've seen some older versions of the s390 gcc die
with an ICE because I have used "+m" in some kernel inline assembly. I'm
happy to hear that this issue is
Yes, though I would use "=m" on the output list and "m" on the input
list. The reason is that I've seen gcc fall on its face with an ICE on
s390 due to "+m". The explanation I've got from our compiler people was
quite esoteric, as far as I remember gcc splits "+m" to an input operand
and an output operand
Note that last line.
Segher, how about you just accept that Linux uses gcc as per reality, and
that sometimes the reality is different from your expectations?
"+m" works.
It works _most of the time_. Ask Martin. Oh you don't even have to, he
told you two mails ago. My last mail simply po
You'd have to use "+m".
Yes, though I would use "=m" on the output list and "m" on the input
list. The reason is that I've seen gcc fall on its face with an ICE on
s390 due to "+m". The explanation I've got from our compiler people was
quite esoteric, as far as I remember gcc splits "+m" to an input
operand and an output operand
That means GCC cannot compile Linux; it already optimises
some accesses to scalars to smaller accesses when it knows
it is allowed to. Not often though, since it hardly ever
helps in the cost model it employs.
Please give an example code snippet + gcc version + arch
to back this up.
The compiler is within its rights to read a 32-bit quantity 16 bits at
a time, even on a 32-bit machine. I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard really
does permit this.
Code all over the kernel assumes that 32-bit reads/writes are atomic.
So, why not use the well-defined alternative?
Because we don't need to, and it hurts performance.
It hurts performance by implementing 32-bit atomic reads in assembler?
No, I misunderstood the question. Implementing 32-bit atomic reads in
assembler is redundant, because any sane compiler, *p
Anyway, what's the supposed advantage of *(volatile *) vs. using
a real volatile object? That you can access that same object in
a non-volatile way?
You'll have to take that up with Linus and the minds behind Volatile
Considered Harmful, but the crux of it is that volatile objects are
prone t
Anyway, what's the supposed advantage of *(volatile *) vs. using
a real volatile object? That you can access that same object in
a non-volatile way?
That's my understanding. That way accesses where you don't care about
volatility may be optimised.
But those accesses might be done non-atomically.
+Explicit casting in atomic_read() ensures consistent behavior across
+architectures and compilers.
Even modulo compiler bugs, what makes you believe that?
When you declare a variable volatile, you don't actually tell the
compiler where you want to override its default optimization behavior,
If you need to guarantee that the value is written to memory at a
particular time in your execution sequence, you either have to read it
from memory to force the compiler to store it first
That isn't enough. The CPU will happily read the datum back from
its own store queue before it ever hit memory.
The only safe way to get atomic accesses is to write
assembler code. Are there any downsides to that? I don't
see any.
The assumption that aligned word reads and writes are atomic, and
that words are aligned unless explicitly packed otherwise, is
endemic in the kernel. No sane compiler violates this.
The compiler is within its rights to read a 32-bit quantity 16 bits at
a time, even on a 32-bit machine. I would be glad to help pummel any
compiler writer that pulls such a dirty trick, but the C standard
really
does permit this.
Yes, but we don't write code for these compilers. There are
Historically this has been
+accomplished by declaring the counter itself to be volatile, but the
+ambiguity of the C standard on the semantics of volatile make this
+practice vulnerable to overly creative interpretation by compilers.
It's even worse when accessing through a volatile casted pointer.
We can't have split stores because we don't use atomic64_t on 32-bit
architectures.
That's not true; the compiler is free to split all stores
(and reads) from memory however it wants. It is debatable
whether "volatile" would prevent this as well, certainly
it is unsafe if you want to be portable.
How about separate autoneg to a property "dumb-phy", which indicates the
PHY/switch doesn't provide MII register interface.
Something like that I suppose. But don't call it "dumb phy", nor "fake
phy", nor anything similar -- there simply is _no_ phy. If the Linux
code wants to pretend there
I wish there was a git option to "just make my shit look like the
remote, dammit!" The above is the "easiest" way I know how to do that.
git-fetch -f remote:local ?
Segher
If you're going to be paranoid, shouldn't you do something here to make
sure the value's hit the device?
I thought the whole point of paranoia is that it's inexplicable.
Here's a delusional reply: I didn't see any point to it.
1) a wmb would add overhead
A wmb() doesn't guarantee the write has reached the device
Well, Segher doesn't want me to use iobarrier (because it's not I/O).
Andy doesn't want me to use wmb() (because it's sync). I don't think
something like gfar_wmb() would be appropriate. So the remaining
options are either eieio(),
? Just curious... the original intent of eieio was to order I/O
So what about some thing like this where we do the read only once?
- k
diff --git a/drivers/net/gianfar.c b/drivers/net/gianfar.c
index a06d8d1..9cd7d1e 100644
--- a/drivers/net/gianfar.c
+++ b/drivers/net/gianfar.c
@@ -1438,31 +1438,35 @@ int gfar_clean_rx_ring(struct net_device *dev, int rx_w
And the driver is already ppc-specific; it uses in/out_be32.
True, but its hidden behind the gfar_read/write accessors.
Your change is a bit more blatant.
Well, Segher doesn't want me to use iobarrier (because it's not I/O).
Andy doesn't want me to use wmb() (because it's sync).
You should
AFAICS you need stronger barriers though; {w,r,}mb(),
to prevent _any_ reordering of those memory accesses,
not just the compiler-generated ones.
My impression was that the eieio used by iobarrier would be sufficient
for that, as we're not trying to synchronize between accesses to
different types of memory
The hardware must not see that it is given ownership of a buffer until
it is completely written, and when the driver receives ownership of a
buffer, it must ensure that any other reads to the buffer reflect its
final state. Thus, I/O barriers are added where required.
Without this patch, I have observed
I thought the motivation for div64() was that a 64:32->32 divide could
be done a lot faster on a number of platforms (including the important
x86) than a generic 64:64->64 divide, but gcc doesn't handle the
devolution automatically -- there is no such libgcc function.
That there's no such function
Sure, PCI busses are little-endian. But is readX()/writeX() for
PCI
only?
Yes.
For other buses, use foo_writel(), etc.
Can this please be documented then? Never heard this before...
You have come late to the party.
WHat do you mean here? Could you please explain?
This has been the c
Well, I'm having trouble thinking of other busses that have as strong
a sense of the "address-data" style I/O as PCI. Busses like scsi and
ide are primarily "command-data" or "data-data" in style. Only the
address-data style busses need readl/writel-style routines.
SBUS, JBUS, VMEbus, NuBus, Ra
Sure, PCI busses are little-endian. But is readX()/writeX() for PCI
only?
Yes.
For other buses, use foo_writel(), etc.
Can this please be documented then? Never heard this before...
Segher
Sure, PCI busses are little-endian. But is readX()/writeX() for PCI
only? I sure hope not.
It's defined for PCI and possibly ISA memory. You can use it for other
things if you whish to, but "other things" are arch specific in any
case.
Huh? You're saying that only PCI and ISA are standardised?
Nah. We have the basic rule that readl/writel are little endian.
PowerPC
additionally provides arch specific low level in_{be,le}32 type
accessors with explicit endianness. Or you can also use
cpu_to_le32/le32_to_cpu kind of macros to convert between native and
explicit endianness.
Sure, PCI b
#define tw32_rx_mbox(reg, val)	do { wmb(); tp->write32_rx_mbox(tp, reg, val); } while(0)
#define tw32_tx_mbox(reg, val)	do { wmb(); tp->write32_tx_mbox(tp, reg, val); } while(0)
That should do it.
I think we need those tcpdump after all. Can you send it to me?
Looks like adding a sync
I've been chasing with Segher a data corruption problem lately.
Basically transferring huge amount of data (several Gb) and I get
corrupted data at the rx side. I cannot tell for sure whether what I've
been observing here is the same problem that Segher's been seeing on his
blades, he will confirm
The patch has a couple of places where I reversed 2 assignments, they
are harmless, it was before I figured out that the chip will
(apparently) not access a descriptor before it's been told to do so via
MMIO, and thus the order of the writes to the descriptors is irrelevant
(I was also adding