Hi Paul,

I've included the gem5-dev mailing list here. If you're not subscribed, I
would suggest sending these kinds of questions there, where the
gem5-gcn/AMD developers will see them.

To answer your question, adding checkpointing should be relatively
straightforward. The main changes are exactly as you described: saving the
architectural state of the GPU threads. There are probably a few other
pieces of GPU state that need to be saved too (e.g., the control
processor). Hopefully one of the devs at AMD can reply with more details.

You'll have to add the serialize/unserialize functions, and you'll probably
have to also implement a drain() function to flush out the current
in-progress instructions. I imagine you'll at least want to finish the
in-progress wavefronts, and you may want to wait until the currently
scheduled workgroups are finished as well.
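
To make the drain-then-serialize ordering concrete, here is a toy Python
sketch. It is not gem5 code: Wavefront, ToyComputeUnit, and every field in
them are hypothetical stand-ins, and in gem5 the real drain(), serialize(),
and unserialize() would be C++ methods on the GPU model's objects. The only
point is the ordering: let in-flight work finish first, then save the
architectural state that remains.

```python
from dataclasses import asdict, dataclass, field
from typing import Iterable, List


@dataclass
class Wavefront:
    """Hypothetical stand-in for one wavefront's architectural state."""
    wf_id: int
    pc: int = 0
    vgprs: List[int] = field(default_factory=lambda: [0] * 4)
    in_flight: int = 0  # instructions issued but not yet retired


class ToyComputeUnit:
    """Toy model of a compute unit; an assumption, not gem5's class."""

    def __init__(self, wavefronts: Iterable[Wavefront]):
        self.wavefronts = list(wavefronts)

    def tick(self) -> None:
        """Retire one in-flight instruction per busy wavefront."""
        for wf in self.wavefronts:
            if wf.in_flight > 0:
                wf.in_flight -= 1
                wf.pc += 4  # toy fixed-width instruction

    def drained(self) -> bool:
        return all(wf.in_flight == 0 for wf in self.wavefronts)

    def drain(self, max_ticks: int = 1000) -> int:
        """Advance until nothing is in flight; return the ticks used."""
        ticks = 0
        while not self.drained():
            if ticks >= max_ticks:
                raise RuntimeError("drain did not converge")
            self.tick()
            ticks += 1
        return ticks

    def serialize(self) -> dict:
        """Save architectural state; only valid once drained."""
        if not self.drained():
            raise RuntimeError("drain before checkpointing")
        return {"wavefronts": [asdict(wf) for wf in self.wavefronts]}

    @classmethod
    def unserialize(cls, cp: dict) -> "ToyComputeUnit":
        """Rebuild a compute unit from a saved checkpoint dict."""
        return cls(Wavefront(**state) for state in cp["wavefronts"])
```

Calling drain() and then serialize() gives a dict that unserialize() turns
back into an equivalent object. Calling serialize() while instructions are
still in flight raises an error, which is the case the drain step is there
to rule out.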

As far as Ruby goes, as long as the protocol you're using supports
checkpointing, it will "just work". To support checkpointing, I believe all
the protocol needs to do is implement the flush RubyRequestMsg. I'm not
sure if VIPER supports this or not.

Finally, the only other detail is whether or not the current implementation
of SE mode fully supports checkpointing. Right now, this isn't something we
regularly test, so it's possible that there are some details that are
broken.

Hopefully the current devs at AMD will correct me where I'm wrong! Let us
know on gem5-dev if there are any questions we can answer. We would
greatly appreciate this contribution!

Cheers,
Jason

On Wed, Apr 21, 2021 at 6:55 PM Tschirhart, Paul K [US] (MS) <
[email protected]> wrote:

> Hello Professor Lowe-Power,
>
>
>
> Thank you for your helpful replies to the emails from some other members
> of my group regarding various aspects of the Gem5 simulator.
>
>
>
> I have been working with Gem5-GCN3 and I was wondering if you knew
> anything about the status of checkpointing support for that model. I saw in
> a post that adding checkpoints was something that was planned but I have
> not seen anything since.
>
>
>
> Is there a significant technical challenge involved with expanding Gem5’s
> checkpointing mechanism to support the GCN3 model or is this just a matter
> of writing the necessary serialize/unserialize functions? In other words,
> is this something that someone with experience making significant
> modifications to Gem5 might be able to tackle by implementing an approach
> that is similar to the one used in the O3 model?
>
>
>
> If the modifications should be mostly straightforward, do you know of
> anything that needs to be done besides adding the functions to
> serialize/unserialize threads in the GCN3 model? It seems like
> modifications might be required to support checkpointing for the VIPER
> protocol in Ruby but I don’t see where I need to make the changes. Am I
> missing something either in Ruby or elsewhere in the simulator?
>
>
>
> Thanks again for all of your help.
>
>
>
> Paul
>
>
>


-- 
Jason Lowe-Power (he/him/his)
Assistant Professor, Computer Science Department
University of California, Davis
3049 Kemper Hall
https://arch.cs.ucdavis.edu/
_______________________________________________
gem5-dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]