On 2/5/17 1:59 PM, Jason Harmening wrote:
Actually attaching the patch this time (**** gmail client)

On Sun, Feb 5, 2017 at 10:58 AM, Jason Harmening
<jason.harmen...@gmail.com <mailto:jason.harmen...@gmail.com>> wrote:

    Hmm, it's a good idea to consider the possibility of a barrier
    issue.  It wouldn't be the first time we've had such a problem on a
    weakly-ordered architecture. That said, I don't see a problem in
    this case.  smp_rendezvous_cpus() takes a spinlock and then issues
    atomic_store_rel_int()  to ensure the rendezvous params are visible
    to other cpus.  The latter corresponds to lwsync on powerpc, which
    AFAIK should be sufficient to ensure visibility of prior stores.
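
    For reference, the publish pattern described above can be sketched in
    user-space C11. This is only an analogy, not the kernel code: the names
    rendezvous_arg, rendezvous_ready, publish(), consume() and run_once()
    are made up for illustration, and the acquire load on the receiving
    side is an assumption about how a consumer would pair with the release
    store.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the rendezvous parameters (not kernel names). */
static int rendezvous_arg;           /* plain data, written before the flag */
static atomic_int rendezvous_ready;  /* publication flag */

/* Initiator: write the parameters, then publish with a release store. */
static void publish(int arg)
{
    rendezvous_arg = arg;
    atomic_store_explicit(&rendezvous_ready, 1, memory_order_release);
}

/* Target: acquire-load the flag, then read the parameters. */
static void *consume(void *out)
{
    while (atomic_load_explicit(&rendezvous_ready, memory_order_acquire) == 0)
        ;  /* spin until the initiator has published */
    *(int *)out = rendezvous_arg;
    return NULL;
}

/* Runs one initiator/target pair and returns the value the target saw. */
static int run_once(int arg)
{
    pthread_t t;
    int seen = -1;

    atomic_store(&rendezvous_ready, 0);
    pthread_create(&t, NULL, consume, &seen);
    publish(arg);
    pthread_join(t, NULL);
    return seen;
}
```

    Calling run_once(42) should hand 42 to the consumer thread, because the
    plain store to rendezvous_arg is ordered before the release store to
    the flag.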

    For now I'm going with the simpler explanation that I made a bad
    assumption  in the powerpc get_pcpu() and there is some context in
    which the read of sprg0 doesn't return a consistent pointer value.
    Unfortunately I don't see where that might be right now.

    On the mips side, Kurt/Alexander can you test the attached patch?
    It contains a simple fix to ensure get_pcpu() returns the consistent
    per-cpu pointer.

I applied this patch on top of r313347 (I had already verified that a
kernel built from that revision boots successfully).
The kernel from r313347+(this patch) at least gets to multi-user on my ERL.

So, that's a big improvement.

I'll start a native buildworld/buildkernel on the ERL, and that ought
to give it a reasonable workout.

-Kurt


    On Sat, Feb 4, 2017 at 1:34 PM, Svatopluk Kraus <onw...@gmail.com
    <mailto:onw...@gmail.com>> wrote:

        Probably not related. But when I took a short look at the patch to
        see what could go wrong, I came across the following comment in
        _rm_wlock(): "Assumes rm->rm_writecpus update is visible on other
        CPUs before rm_cleanIPI is called." There is no explicit barrier to
        ensure it. However, there might be some barriers inside of
        smp_rendezvous_cpus(). I have no idea what could happen if this
        assumption is not met. Note that rm_cleanIPI() is affected by the
        patch.



        On Sat, Feb 4, 2017 at 9:39 PM, Jason Harmening
        <jason.harmen...@gmail.com <mailto:jason.harmen...@gmail.com>>
        wrote:
        > Can you post an example of such a panic?  Only 2 MI pieces were
        > changed, netisr and rmlock.  I haven't seen problems in my own
        > amd64/i386/arm testing of this, so a backtrace might help to
        > narrow down the cause.
        >
        > On Sat, Feb 4, 2017 at 12:22 PM, Andreas Tobler
        > <andre...@freebsd.org <mailto:andre...@freebsd.org>> wrote:
        >>
        >> On 04.02.17 20:54, Jason Harmening wrote:
        >>>
        >>> I suspect this broke rmlocks for mips because the rmlock
        >>> implementation takes the address of the per-CPU pc_rm_queue when
        >>> building tracker lists.  That address may be later accessed from
        >>> another CPU and will then translate to the wrong physical region
        >>> if the address was taken relative to the globally-constant pcpup
        >>> VA used on mips.
        >>>
        >>> Regardless, for mips get_pcpup() should be implemented as
        >>> pcpu_find(curcpu) since returning an address that may mean
        >>> something different depending on the CPU seems like a big POLA
        >>> violation if nothing else.
        >>>
        >>> I'm more concerned about the report of powerpc breakage.  For
        >>> powerpc we simply take each pcpu pointer from the pc_allcpu list
        >>> (which is the same value stored in the cpuid_to_pcpu array) and
        >>> pass it through the ap_pcpu global to each AP's startup code,
        >>> which then stores it in sprg0.  It should be globally unique and
        >>> won't have the variable-translation issues seen on mips.
        >>> Andreas, are you certain this change was responsible for the
        >>> breakage you saw, and was it the same sort of hang observed on
        >>> mips?
        >>
        >>
        >> I'm really sure. 313036 booted fine and allowed me to execute heavy
        >> compilation jobs, np. 313037 on the other hand gave me various
        >> patterns of panics. Some occurred during startup, but I also
        >> managed to get into multiuser, and then the panic happened during
        >> port building.
        >>
        >> I have no deeper insight into where pcpu data is used. Justin
        >> mentioned netisr?
        >>
        >> Andreas
        >>
        >
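
The mips failure mode described in the quoted explanation (an address that
resolves to a different CPU's data depending on who dereferences it) can be
modelled with a small user-space sketch. Everything here is hypothetical:
struct pcpu is reduced to one field, and pcpu_init_sketch(),
pcpu_find_sketch(), resolve_alias() and the reached_via_*() helpers are
illustrative names, not the kernel's API. The per-CPU mapping is modelled as
a function of the accessing CPU rather than as real address translation.

```c
#include <stddef.h>

#define NCPU 2

/* Heavily simplified, hypothetical per-CPU structure. */
struct pcpu {
    int cpuid;
};

/* One globally addressable slot per CPU. */
static struct pcpu pcpu_store[NCPU];

static void pcpu_init_sketch(void)
{
    for (int i = 0; i < NCPU; i++)
        pcpu_store[i].cpuid = i;
}

/* Globally unique pointer: refers to the same slot on every CPU. */
static struct pcpu *pcpu_find_sketch(int cpuid)
{
    return &pcpu_store[cpuid];
}

/*
 * Model of a per-CPU-mapped alias: the same address resolves to the slot
 * of whichever CPU performs the access.
 */
static struct pcpu *resolve_alias(int accessing_cpu)
{
    return &pcpu_store[accessing_cpu];
}

/*
 * CPU `owner` records an address; CPU `accessor` later follows it.
 * Each helper returns the cpuid of the slot that is actually reached.
 */
static int reached_via_alias(int owner, int accessor)
{
    (void)owner;    /* the alias carries no information about the owner */
    return resolve_alias(accessor)->cpuid;
}

static int reached_via_global(int owner, int accessor)
{
    (void)accessor; /* a global pointer is independent of the accessor */
    return pcpu_find_sketch(owner)->cpuid;
}
```

After pcpu_init_sketch(), reached_via_alias(0, 1) lands in CPU 1's slot
while reached_via_global(0, 1) still lands in CPU 0's slot, which is the
difference between the aliased mapping and a pcpu_find(curcpu)-style
pointer in this model.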




_______________________________________________
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"
