It should do the job, at least for EM, where the jump takes 2 cycles; by using delay slots we can make all the cycles count. HS has a branch prediction mechanism, so filling the delay slot doesn't have as big an impact as on EM or earlier CPUs.
//Claudiu

> -----Original Message-----
> From: Joern Wolfgang Rennecke [mailto:g...@amylaar.uk]
> Sent: Friday, April 29, 2016 12:27 PM
> To: Claudiu Zissulescu; gcc-patches@gcc.gnu.org
> Cc: francois.bed...@synopsys.com; jeremy.benn...@embecosm.com
> Subject: Re: [PATCH] [ARC] Handle FPX NaN within optimized floating point
> library.
>
> P.S.: the .d suffix on the branch was there just for scheduling purposes -
> not sure if that actually helped any chip's pipeline, or if it was just
> a bug in the documentation.