On 19/08/2016 at 13:14, Holger Brunck wrote:
On 19/08/16 13:03, Christophe Leroy wrote:


On 17/08/2016 at 17:27, Holger Brunck wrote:
On 16/08/16 19:27, christophe leroy wrote:


On 15/08/2016 at 18:19, Dave Hansen wrote:
On 08/15/2016 07:35 AM, Holger Brunck wrote:
I tried this but unfortunately the error only occurs while remote debugging.
Locally with gdb everything works fine. BTW we double-checked with a 85xx ppc
target which is also 32-bit and it ends up with the same behaviour.

I was also investigating where I have to move the line in the struct task_struct
and it turns out to be like this (diff to 4.7 kernel):

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 253538f..4868874 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1655,7 +1655,9 @@ struct task_struct {
        struct signal_struct *signal;
        struct sighand_struct *sighand;

+       // struct thread_struct thread;   // until here everything is fine
        sigset_t blocked, real_blocked;
+       struct thread_struct thread;      // from here it's broken
        sigset_t saved_sigmask; /* restored if set_restore_sigmask() was used */
        struct sigpending pending;

Wow, thanks for all the debugging here!

So, we know it has to do with signals, thread_info, and probably only
affects 32-bit powerpc.  Seems awfully weird.  Have you checked with any
of the 64-bit powerpc guys to see if they have any ideas?

I went grepping around for a bit.

Where is the task_struct stored?  Is it on-stack on ppc32 or something?
 The thread_info is, I assume, but I see some THREAD_INFO vs. THREAD
(thread struct) math happening in here, which confuses me:

        .globl  ret_from_debug_exc
ret_from_debug_exc:
        mfspr   r9,SPRN_SPRG_THREAD
        lwz     r10,SAVED_KSP_LIMIT(r1)
        stw     r10,KSP_LIMIT(r9)
        lwz     r9,THREAD_INFO-THREAD(r9)
        CURRENT_THREAD_INFO(r10, r1)
        lwz     r10,TI_PREEMPT(r10)
        stw     r10,TI_PREEMPT(r9)
        RESTORE_xSRR(SRR0,SRR1);
        RESTORE_xSRR(CSRR0,CSRR1);
        RESTORE_MMU_REGS;
        RET_FROM_EXC_LEVEL(SPRN_DSRR0, SPRN_DSRR1, PPC_RFDI)

But, I'm really at a loss to explain this.  It still seems like a deeply
ppc-specific issue.  We can obviously work around it with an #ifdef for
your platform, but that's awfully hackish and hides the real bug,
whatever it is.

My suspicion is that there's a bug in the 32-bit ppc assembly somewhere.
 I don't see any references to 'blocked' or 'real_blocked' in assembly
though.  You could add a bunch of padding instead of moving the
thread_struct and see if that does anything, but that's really a stab in
the dark.


Just to let you know, I'm not sure it is the same issue, but I also get
my 8xx target stuck when I try to use gdbserver.

If I debug a very small app, it gets stuck quickly after the app has
stopped: indeed, the console seems ok but as soon as I try to execute
something simple, like a ps or top, it gets stuck. The target still
responds to pings, but nothing else.

If I debug a big app, it gets stuck soon after the start of debug: I set
a breakpoint at main(), do a 'continue', stop at main(), do some
steps with 'next' then it gets stuck.

I have tried moving the struct thread_struct thread but it has no impact.


That sounds a bit different from what I see. Is your program also multi-threaded?

Maybe you could try with the program I use to reproduce the error:

--- snip -----
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

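/* Thread body: pthread_create() expects a void *(*)(void *) start routine. */
void *th_1_func(void *arg)
{
   (void)arg;   /* unused */
   while (1) {
     sleep(2);
     printf("Hello from thread function 1\n");
   }
   return NULL; /* not reached */
}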

int main() {
  int err;
  pthread_t th_1, th_2, th_3;

  err = pthread_create(&th_1, NULL, th_1_func, NULL);
  if (err != 0)
    printf("pthread_create\n");
  err = pthread_create(&th_2, NULL, th_1_func, NULL);
  if (err != 0)
    printf("pthread_create\n");
  err = pthread_create(&th_3, NULL, th_1_func, NULL);
  if (err != 0)
    printf("pthread_create\n");
  while(1) {}
  return 0;
}
--- snap ---

Then copy it to your target and start it under gdbserver. If you let it run
from your host with gdb, stop it (e.g. in the sleep call) and then try to
single-step it, you might see the error. But as I said in this thread, the
behaviour might differ depending on your kernel configuration, as I
encountered different behaviour when enabling FTRACE or SCHED_STAT.

Best regards
Holger


Hi

I just tried it on an 885 and on an 8323; it works properly on both targets.

Below are the debug options that are active on my 8323 target.



Thanks for trying it.

Could you completely disable FTRACE? It also works on my side when I have
FTRACE enabled.

Best regards
Holger


I have now completely disabled FTRACE; the behaviour is still OK.

Christophe
