Hi,

As far as I know, those are the only instructions that call suspend() on a
thread context in gem5 for the x86 ISA.  This is all I found from grepping
the src/arch/x86 directories.  But I'm not an expert on the x86 ISA; I just
touched this code because it was breaking regression tests.

The "459500" ticks isn't really all that large in terms of time.  Assuming
you're running a 2GHz core (ticks are in picoseconds, so one cycle is 500
ticks), that's only 919 cycles.  This could easily be explained by having to
walk a page table, branch mispredicts, etc. when entering a new region of
code.  The atomic CPU wouldn't have to pay any of that time cost (since
memory is accessed instantaneously on it and there are no mispredicts).  It
also could be that the last instruction was the MicroHalt, and it only took
about 900 cycles for the core to be woken up.  There's no way to know without
looking at what actually happened.
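
For reference, the arithmetic behind that cycle count can be sketched as
follows.  This is only an illustration, assuming 1 tick = 1 ps and a 2GHz core
clock; the function name is made up for this example and is not part of gem5.

```python
# Convert a tick gap into core cycles.
# Assumptions: 1 tick = 1 picosecond, and the core runs at 2 GHz.
TICKS_PER_SECOND = 10**12


def ticks_to_cycles(ticks, clock_hz):
    """Return how many core cycles fit into the given number of ticks."""
    ticks_per_cycle = TICKS_PER_SECOND // clock_hz  # 500 ticks at 2 GHz
    return ticks / ticks_per_cycle


gap_cycles = ticks_to_cycles(459500, 2 * 10**9)
print(gap_cycles)  # 919.0
```

At a different clock frequency the same gap would correspond to a different
number of cycles, so it is worth checking the clock your config actually uses.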

How are you measuring the divergence to know things are going wrong in
simulation?  The timing and code scheduling done by the OS can be very
different for the atomic and timing-focused CPU models.  Basically, different
sequences of committed instructions from atomic and o3 are fine, as long as
the numbers are not completely out of line.
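
One rough way to quantify this is sketched below.  It is not gem5 code: the
helper names and the toy PC values are made up, and it assumes you have each
run's committed PCs as a plain list.  It finds where two traces first differ
and how far apart the total instruction counts are.  A first mismatch on its
own is expected; a large ratio is the more suspicious signal.

```python
from itertools import zip_longest


def first_divergence(pcs_a, pcs_b):
    """Return (index, pc_a, pc_b) for the first mismatch, or None if equal."""
    for i, (a, b) in enumerate(zip_longest(pcs_a, pcs_b)):
        if a != b:
            return i, a, b
    return None


def instruction_ratio(count_a, count_b):
    """Return how many times more instructions the longer run committed."""
    return max(count_a, count_b) / min(count_a, count_b)


# Toy traces with made-up addresses, not real gem5 output.
atomic_pcs = [0x400000, 0x400004, 0x400008, 0x40000C]
o3_pcs = [0x400000, 0x400004, 0xFFFFFFFF81000000, 0xFFFFFFFF81000004]

print(first_divergence(atomic_pcs, o3_pcs))
# Instruction counts reported earlier in this thread: ~350 million vs >12 billion.
print(instruction_ratio(350_000_000, 12_000_000_000))
```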




On Thu, Aug 28, 2014 at 5:49 AM, Zi Yan via gem5-users <gem5-users@gem5.org>
wrote:

> Hi Andreas,
>
> I already flagged "MicroHalt" as "IsQuiesce" in my last run. And I am not
> using m5ops, so the first part of your patch should not affect my run.
>
> Therefore, as I mentioned in my last email, the behavior of waiting for
> apic_timer_interrupt still happens for certain instructions.
>
> 1) Is every x86 quiesce macro instruction properly decoded with the
> "MicroHalt" microop in gem5?
>
> 2) Does the Intel ISA manual say anything about quiesce instructions? If so,
> I can find a reference and help check all quiesce instructions in x86.
>
> Thanks.
>
> --
> Best Regards
> Yan Zi
>
> On 28 Aug 2014, at 3:39, Andreas Hansson wrote:
>
> > Hi Yan,
> >
> > Check out: http://reviews.gem5.org/r/2369/
> >
> > Perhaps the problem you are struggling with is even more complex, but at
> > least the patches on the review board should fix up a few issues.
> >
> > Andreas
> >
> > On 28/08/2014 03:27, "Zi Yan via gem5-users" <gem5-users@gem5.org> wrote:
> >
> >> Hi Mitch,
> >>
> >> After I applied the two patches and the IsQuiesce modification, the O3
> >> CPU stays on the same track as the atomic CPU longer than before. But the
> >> apic_timer_interrupt function shows up again in the O3 CPU. It used to
> >> show up after about 500,000 instructions; now it shows up after about
> >> 990,000 instructions.
> >>
> >> In addition, I dumped tick numbers as well as PCs, and found that
> >> there is a 459500-tick gap between the last committed user instruction and
> >> the first instruction in the apic_timer_interrupt function. This confirms
> >> that the last user instruction sits in commit until the timer interrupt
> >> happens. Am I right about this?
> >>
> >> As a next step, I think I need to label all x86 quiesce instructions.
> >> Do you have a list of those instructions? Or does the Intel manual
> >> say anything about this?
> >>
> >> Thanks.
> >>
> >> --
> >> Best Regards
> >> Yan Zi
> >>
> >> On 27 Aug 2014, at 15:59, Mitch Hayenga wrote:
> >>
> >>> Yep, that should do it.
> >>>
> >>>
> >>> On Wed, Aug 27, 2014 at 2:57 PM, Zi Yan <birdman...@gmail.com> wrote:
> >>>
> >>>> Thanks.
> >>>>
> >>>> I will apply patches 1 and 2.
> >>>>
> >>>> For 3, I need to change the file
> >>>> src/arch/x86/isa/microops/specop.isa:66
> >>>> from
> >>>> setFlags | (ULL(1) << StaticInst::IsNonSpeculative),
> >>>> to
> >>>> setFlags | (ULL(1) << StaticInst::IsNonSpeculative) |
> >>>> (ULL(1) << StaticInst::IsQuiesce),
> >>>>
> >>>> Am I doing the right thing to tag the "MicroHalt" instruction as
> >>>> "IsQuiesce"?
> >>>>
> >>>> BTW, what I did to boot Linux was to install Gentoo inside QEMU,
> >>>> then use x86KvmCPU to boot up, then take checkpoints and run from
> >>>> checkpoints.
> >>>>
> >>>> I will report whether this works or not.
> >>>>
> >>>> Thanks.
> >>>>
> >>>> --
> >>>> Best Regards
> >>>> Yan Zi
> >>>>
> >>>> On 27 Aug 2014, at 15:44, Mitch Hayenga wrote:
> >>>>
> >>>>> There are probably three main patches that could help.  The fact you
> >>>>> mention the timer interrupt makes me think Andreas is right and these
> >>>>> might solve your issue.
> >>>>>
> >>>>> 1. http://reviews.gem5.org/r/2363/  - o3 is supposed to stop fetching
> >>>>> instructions immediately once a quiesce instruction is encountered, but
> >>>>> some managed to sneak by.  Quiesce is used for things like sleeping
> >>>>> until an interrupt occurs, etc.  Without this patch, we experienced the
> >>>>> case where o3 state would get corrupted and an instruction would sit at
> >>>>> commit until the next timer interrupt happened, at which point taking
> >>>>> the interrupt would clear the state and execution would continue (until
> >>>>> this same bug happened again).
> >>>>>
> >>>>> 2. http://reviews.gem5.org/r/2367/  - If o3 was being drained while an
> >>>>> interrupt occurred on x86, it could deadlock.
> >>>>>
> >>>>> 3. I believe this last patch will be posted in a day or two.  x86
> >>>>> currently does not tag any instruction that suspends() the CPU as a
> >>>>> "quiesce".  This is required by o3 to properly operate, but not by the
> >>>>> Atomic CPU.  This makes the issue in #1 far more likely to occur.  It's
> >>>>> pretty amazing that x86 booted linux at all on o3 without this.  I
> >>>>> believe this patch will be posted shortly, but otherwise you could just
> >>>>> tag the "MicroHalt" instruction as "IsQuiesce" yourself.
> >>>>>
> >>>>> So a combination of those things (mainly the last one) could lead to
> >>>>> what you are seeing.
> >>>>>
> >>>>>
> >>>>> On Wed, Aug 27, 2014 at 12:59 PM, Zi Yan via gem5-users <gem5-users@gem5.org> wrote:
> >>>>>
> >>>>>> OK. Could you please tell me which patches are there? In the
> >>>>>> review board there are quite a lot of new patches waiting
> >>>>>> for review.
> >>>>>>
> >>>>>> I can apply those patches myself and do a quick test.
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> --
> >>>>>> Best Regards
> >>>>>> Yan Zi
> >>>>>>
> >>>>>> On 27 Aug 2014, at 13:56, Andreas Hansson wrote:
> >>>>>>
> >>>>>>> Hi Yan,
> >>>>>>>
> >>>>>>> I would suspect this is due to a bug in the X86 O3 CPU. There have
> >>>>>>> been quite a few fixes posted on the review board for similar issues.
> >>>>>>> I hope to have these committed in the next week or so.
> >>>>>>>
> >>>>>>> Andreas
> >>>>>>>
> >>>>>>>
> >>>>>>> On 27/08/2014 18:02, "Zi Yan via gem5-users" <gem5-users@gem5.org> wrote:
> >>>>>>>
> >>>>>>>> Hi all,
> >>>>>>>>
> >>>>>>>> I am running kmeans via hadoop in gem5 X86 FS mode. I am using
> >>>>>>>> linux kernel 3.2.60 with configuration file linux-2.6.28.4 from
> >>>>>>>> gem5.org.
> >>>>>>>>
> >>>>>>>> I take a checkpoint before a map task and put an "m5 exit" after
> >>>>>>>> the map task.
> >>>>>>>> I am using *X86kvmCPU* to take checkpoints.
> >>>>>>>>
> >>>>>>>> When I restore from the same checkpoint, the atomic CPU and the O3
> >>>>>>>> CPU execute quite different instructions:
> >>>>>>>> 1) The atomic CPU executes about 350 million instructions, reaches
> >>>>>>>> "m5 exit", then stops simulation.
> >>>>>>>> 2) The O3 CPU executes more than 12 billion instructions and still
> >>>>>>>> has not reached "m5 exit" to stop the simulation.
> >>>>>>>>
> >>>>>>>> I dumped the committed PCs from the atomic CPU and the O3 CPU and
> >>>>>>>> found that after about 500,000 instructions the systems behave
> >>>>>>>> differently: the atomic CPU is still executing user code, but the
> >>>>>>>> O3 CPU switches to apic_timer_interrupt (a kernel function; it also
> >>>>>>>> appears in the atomic CPU execution, but somewhere else).
> >>>>>>>>
> >>>>>>>> Could anyone please give some advice about why this happens?
> >>>>>>>>
> >>>>>>>> Thanks.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best Regards
> >>>>>>>> Yan Zi
> >>>>>>>
> >>>>>>>
> >>>>>>> -- IMPORTANT NOTICE: The contents of this email and any attachments
> >>>>>>> are
> >>>>>> confidential and may also be privileged. If you are not the intended
> >>>>>> recipient, please notify the sender immediately and do not disclose
> >>>>>> the
> >>>>>> contents to any other person, use it for any purpose, or store or
> >>>>>> copy
> >>>> the
> >>>>>> information in any medium.  Thank you.
> >>>>>>>
> >>>>>>> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1
> 9NJ,
> >>>>>> Registered in England & Wales, Company No:  2557590
> >>>>>>> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge
> CB1
> >>>>>> 9NJ, Registered in England & Wales, Company No:  2548782
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> gem5-users mailing list
> >>>>>> gem5-users@gem5.org
> >>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
> >>>>>>