oviding an interface to fusing instructions via
scheduling
Hi,
> -Original Message-
> From: Gcc On Behalf
> Of gengqi via Gcc
> Sent: 03 September 2021 11:56
> To: gcc@gcc.gnu.org
> Subject: How about providing an interface to fusing instructions via
> scheduling
>
> When I was adding pipeline to my backend, som
When I was adding pipeline to my backend, some instructions needed to be
fused and I found that there was no suitable interface to implement my
requirements.
My hope is that
1. Do instruction scheduling and combine any two instructions, and sometimes
the two instructions can be treated as 1
Hi!
On 2021-07-13T17:59:43+0200, Jakub Jelinek wrote:
> On Tue, Jul 13, 2021 at 05:48:51PM +0200, Thomas Schwinge wrote:
>> Starting with the Volta family (sm_70+), Nvidia GPUs introduced
>> Independent Thread Scheduling for the 32 threads ("32 SIMD lanes") that
>>
On Tue, Jul 13, 2021 at 05:48:51PM +0200, Thomas Schwinge wrote:
> Starting with the Volta family (sm_70+), Nvidia GPUs introduced
> Independent Thread Scheduling for the 32 threads ("32 SIMD lanes") that
> constitute a warp, which means "execution state per thread, inclu
Hi!
Starting with the Volta family (sm_70+), Nvidia GPUs introduced
Independent Thread Scheduling for the 32 threads ("32 SIMD lanes") that
constitute a warp, which means "execution state per thread, including a
program counter", succeeding the previous "warp-synchronous"
Product: GCC
Component: rtl-optimization
Version: 7.3.0
After we enable DO_PREDICATION in the scheduler, we get a wrong scheduling
result in the sched2 pass.
The key dump is shown below:
...(Unimportant things)
;; | 93 | 15 | a20=sxn([a19])
;; | 94 | 10 | t2=a20==0
ted (not queued) [4, 5,
> > 6].
> > So the taskwait or barrier like constructs only have to check whether all
> > the tasks of interest were computed.
> > This unifies the task queuing system and makes scheduling much simpler.
> I see the per-thread locks td_deque_lock
equeuing
> and executing the tasks [2, 3].
> The taskgroup tasks and children tasks are only counted (not queued) [4, 5, 6].
> So the taskwait or barrier like constructs only have to check whether all the
> tasks of interest were computed.
> This unifies the task queuing system and
g system and makes scheduling much simpler.
What to do on libgomp:
I think we should follow a similar path to libomp.
Instead of using 3 different queues, we could simply use one and only count the
tasks of interest.
This should also reduce the synchronization overhead between the q
into stage1 and after somehow fixing the PR90040 issue
I will post the updated patchset described here:
https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01647.html
(the set has been maintained locally on all branches since 4.9, with a
lot of regression testing).
Regarding the modulo scheduling maintainership
Hi!
On Mon, Jan 14, 2019 at 12:24:43PM +, Matthew Malcomson wrote:
> I've found a testcase where the stack protector code generated through
> `-fstack-protector-all` doesn't actually protect anything.
[ snip ]
> When compiling on aarch64 with
> ~gcc -fstack-protector-all -g -S stack-reorder.
I've found a testcase where the stack protector code generated through
`-fstack-protector-all` doesn't actually protect anything.
#+name stack-reorder.c
#+begin_src c
#include
#include
int foo (int a, int b, int c) {
char buf[64];
buf[a] = 1;
buf[b] = c;
// Just add somethin
imary+secondary
>platforms.
That's basically having primary / secondary / rest passes (or command line
switches) which we would need to document as such. Let's discuss this at the
Cauldron - I'm certainly not going to block the release over a selective
scheduling bug...
Richard.
> Are you suggesting we should not care about regressions with features
> that are not enabled by default or which are only exposed with
> "non-standard" flags? The current scheme on which bugs get P1/P2/P4+
> assigned is quite simple...
I'm only suggesting keeping them at P3, which is not "don't
On 03/29/2018 04:15 AM, Eric Botcazou wrote:
>> I noticed there are quite many selective scheduling PRs:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=846
On Thu, Mar 29, 2018 at 12:15 PM, Eric Botcazou wrote:
>> I noticed there are quite many selective scheduling PRs:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842
>> https://gcc.gnu.org/bugzilla/show_bug.cgi
> I noticed there are quite many selective scheduling PRs:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659
>
> and many others.
Right, and some of them are P1/P2, wh
On 21.03.2018 13:31, Martin Liška wrote:
> On 03/21/2018 11:17 AM, Andrey Belevantsev wrote:
>> Hi Martin,
>>
>> On 21.03.2018 12:48, Martin Liška wrote:
>>> Hello.
>>>
>>> I noticed there are quite many selective scheduling PRs:
>>> htt
On 03/21/2018 11:17 AM, Andrey Belevantsev wrote:
Hi Martin,
On 21.03.2018 12:48, Martin Liška wrote:
Hello.
I noticed there are quite many selective scheduling PRs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842
https://gcc.gnu.org
Hi Martin,
On 21.03.2018 12:48, Martin Liška wrote:
> Hello.
>
> I noticed there are quite many selective scheduling PRs:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=8
Hello.
I noticed there are quite many selective scheduling PRs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84872
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659
and many others.
I want to ask you if you plan to maintain the scheduling?
Is
On 02/05/2016 05:35 AM, Woon yung Liu wrote:
The current (GCC 5.3.0) MIPS divmod4 pattern emits an expand
that allocates a temporary register (hi+lo) and emits other
instructions, depending on whether the target is a 32-bit or 64-bit
MIPS target.
However, it uses gen_rtx_REG to allocate the hi+
On 01/19/2016 09:22 AM, Woon yung Liu wrote:
Right now, I do have an old homebrew GCC v3.2.2 port to study as
well, but I didn't follow everything from it because I didn't want to
risk including obsolete constructs. Thanks for the information on the
old Cygnus port. I'll try to scrape together
On 01/19/2016 05:04 AM, Woon yung Liu wrote:
Hi,
I am trying to complete support for the MIPS R5900 by adding support for its
second integer multiplication/division pipe. GCC currently supports only the
first one. My target at this moment is the public GCC v5.3.0 release.
To get the 2
Hello All:
I was going through the following article:
"Register Allocation with instruction scheduling: a new approach" by Pinter
et al.
The phase ordering of register allocation and instruction scheduling is an
important topic. The scheduling before register allocator
increases th
On 2014-06-06, 10:48 AM, Ajit Kumar Agarwal wrote:
Hello All:
I was looking further into the aspect of reducing register pressure based on
register allocation and instruction scheduling, and the
following observations were made on reducing register pressure based on the
existing papers on reducing
Hello All:
I was looking further into the aspect of reducing register pressure based on
register allocation and instruction scheduling, and the
following observations were made on reducing register pressure based on the
existing papers on reducing register pressure
Based on scheduling approach
well and come back with any specific testcases
that Charles / Richard could also take a look into.
Hi all,
From what I can see the most significant regression from this pre-regalloc
scheduling on SPEC2k is in 171.swim. It seems to suffer from similar symptoms to
Proc_8 (lots of extra spills on th
experiments on x86_64 ?
>
>
Yes, I benchmarked x86 and x86-64. I believe this pass can help when we
don't use the 1st insn scheduler (that is x86/x86-64 case). If the 1st
insn scheduler is profitable, I guess it is better to use
register-pressure insn scheduling than live-range sh
On Thu, May 15, 2014 at 8:36 AM, Maxim Kuvyrkov
wrote:
> On May 15, 2014, at 6:46 PM, Ramana Radhakrishnan
> wrote:
>>
>>>
>>> I'm not claiming it's a great heuristic or anything. There's bound to
>>> be room for improvement. But it was based on "reality" and real results.
>>>
>>> Of course, i
On May 15, 2014, at 6:46 PM, Ramana Radhakrishnan
wrote:
>
>>
>> I'm not claiming it's a great heuristic or anything. There's bound to
>> be room for improvement. But it was based on "reality" and real results.
>>
>> Of course, if it turns out not to be a win for ARM or s390x any more then it
>
beaten.
>> But both versions of -fsched-pressure are off by default on most
>> targets for a reason. (AFAIK the only two targets that enable it by
>> default are the two that use SCHED_PRESSURE_MODEL: arm and s390x.)
>> I think this is still an area that could be improved
ou think bin packing wasn't beaten.
> But both versions of -fsched-pressure are off by default on most
> targets for a reason. (AFAIK the only two targets that enable it by
> default are the two that use SCHED_PRESSURE_MODEL: arm and s390x.)
> I think this is still an area that coul
for implementing register-pressure scheduling
(more accurate register pressure evaluation). The more people use it,
the better it is for me.
That said, like you, I am not satisfied with how GCC resolves the conflict
between the 1st insn scheduler and RA. Ideally, I'd like to see that 1st insn
scheduler (with som
ets that enable it by
default are the two that use SCHED_PRESSURE_MODEL: arm and s390x.)
I think this is still an area that could be improved. I don't mind
whether that's through improving one of the two existing heuristics
or doing something different, but it seems pessimistic to say th
On 2014-05-13, 6:27 AM, Kyrill Tkachov wrote:
Hi all,
In haifa-sched.c (in rank_for_schedule) I notice that live range
shrinkage is not performed when SCHED_PRESSURE_MODEL is used and the
comment mentions that it results in much worse code.
Could anyone elaborate on this? Was it just empiricall
On May 13, 2014, at 10:27 PM, Kyrill Tkachov wrote:
> Hi all,
>
> In haifa-sched.c (in rank_for_schedule) I notice that live range shrinkage is
> not performed when SCHED_PRESSURE_MODEL is used and the comment mentions that
> it results in much worse code.
>
> Could anyone elaborate on this?
Hi all,
In haifa-sched.c (in rank_for_schedule) I notice that live range shrinkage is
not performed when SCHED_PRESSURE_MODEL is used and the comment mentions that it
results in much worse code.
Could anyone elaborate on this? Was it just empirically noticed on x86_64?
Thanks,
Kyrill
>>>> On 11/12/2013, at 5:17 am, Ramana Radhakrishnan
>>>> wrote:
>>>>
>>>>> On Mon, Jul 1, 2013 at 5:31 PM, Paulo Matos wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Near the start of schedule_block, find_modi
>>> On Mon, Jul 1, 2013 at 5:31 PM, Paulo Matos wrote:
>>>>> Hi,
>>>>>
>>>>> Near the start of schedule_block, find_modifiable_mems is called if
>>>>> DONT_BREAK_DEPENDENCIES is not enabled for this scheduling pass. It seems
>>>>
>>>> Near the start of schedule_block, find_modifiable_mems is called if
>>>> DONT_BREAK_DEPENDENCIES is not enabled for this scheduling pass. It seems
>>>> on c6x backend currently uses this.
>>>> However, it's quite strange tha
led if
>>> DONT_BREAK_DEPENDENCIES is not enabled for this scheduling pass. It seems
>>> on c6x backend currently uses this.
>>> However, it's quite strange that this is not a requirement for all backends
>>> since find_modifiable_mems, moves all my d
On 11/12/2013, at 5:17 am, Ramana Radhakrishnan
wrote:
> On Mon, Jul 1, 2013 at 5:31 PM, Paulo Matos wrote:
>> Hi,
>>
>> Near the start of schedule_block, find_modifiable_mems is called if
>> DONT_BREAK_DEPENDENCIES is not enabled for this scheduling pass.
On Mon, Jul 1, 2013 at 5:31 PM, Paulo Matos wrote:
> Hi,
>
> Near the start of schedule_block, find_modifiable_mems is called if
> DONT_BREAK_DEPENDENCIES is not enabled for this scheduling pass. It seems on
> c6x backend currently uses this.
> However, it's quite str
ion I have.
Regards,
Paulo Matos
> -Original Message-
> From: Maxim Kuvyrkov [mailto:ma...@kugelworks.com]
> Sent: 16 July 2013 05:02
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Delay scheduling due to possible future multiple issue in VLIW
>
> Pa
iple issue
> has to contend with other problems.
>
> Any thoughts on this?
>
> Paulo Matos
>
>
>> -Original Message-
>> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Paulo
>> Matos
>> Sent: 26 June 2013 15:08
>> T
Hi,
Near the start of schedule_block, find_modifiable_mems is called if
DONT_BREAK_DEPENDENCIES is not enabled for this scheduling pass. It seems on
c6x backend currently uses this.
However, it's quite strange that this is not a requirement for all backends
since find_modifiable_mems,
w...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Paulo
> Matos
> Sent: 26 June 2013 15:08
> To: gcc@gcc.gnu.org
> Subject: Delay scheduling due to possible future multiple issue in VLIW
>
> Hello,
>
> We have a port for a VLIW machine using gcc head 4.8 with a maximum
Chosen insn : 41
;; 1--> b 0: i 41 r0=r0+0x4 :(p0+no_stl2)|(p1+no_dual)
So, it is scheduling first insn 38 followed by 41.
The insn chain for bb3 before sched2 looks like:
(insn 38 36 40 3 (set (reg:DI 1 r1)
(zero_extend:DI (mem:SI (plu
On Sat, 2013-06-15 at 00:06 +0200, Eric Botcazou wrote:
> > The part of the scheduling change that I am interested in is the change in
> > where the addiu instruction occurs and the related changes from the positive
> > offsets to the negative offsets. Can anyone tell me w
> The part of the scheduling change that I am interested in is the change in
> where the addiu instruction occurs and the related changes from the positive
> offsets to the negative offsets. Can anyone tell me where the code that
> decides to do that is? Extra bonus points for an
I have an instruction scheduling question I was hoping someone could help me
with. Specifically, I am trying to figure out where and how GCC is deciding
to move the add of a constant to a register above the use of that register and
then changing the register usage by changing the offsets associated
guray.ozen wrote:
I applied to GSoC for OpenMP task scheduling and my advice may cover the
taskyield facility. Currently I have some ideas for taskyield; I think
I can add something. Therefore I wonder whether a GCC mentor for
OpenMP has been announced, or should I wait until "student acceptance"?
Hi All,
I applied to GSoC for OpenMP task scheduling and my advice may cover the
taskyield facility. Currently I have some ideas for taskyield; I think
I can add something. Therefore I wonder whether a GCC mentor for
OpenMP has been announced, or should I wait until "student acceptance"?
Regards,
Dear All,
Thank you for your reply Tobias.
By the way, Mr Jakub, I hope my approach makes sense to you.
I changed the GOMP_SPINCOUNT factor and I got speedup more than.
I attached my trace, which was profiled with Extrae and Paraver. Light blue
means idle, dark blue means running, yellow means scheduling, Fork
guray.ozen wrote:
I thought GCC tasks/threads spend much more time waiting idle than the Intel
compiler's threads.
Regarding busy waits, you could try to tune the values of the
GOMP_SPINCOUNT environment variable. Search for "@node GOMP_SPINCOUNT"
in
http://gcc.gnu.org/viewcvs/gcc/branches/gomp-4_0-b
omputer in top500 with Intel Xeon E5649 (6-Core, each core
has 2 threads) at 2.53 GHz.
The following report compares OpenMP scheduling in GCC with the Intel C Compiler:
https://github.com/grypp/gcc-gsoc-taskscheduler/raw/master/report.pdf
And here is my OpenMP code:
https://github.com/gryp
Hi,
I'm an MSc High-Performance Computing student at the Polytechnic University
of Catalonia (BarcelonaTech). I'm interested in OpenMP task scheduling
optimization or the OpenMP 3.1 taskyield facility.
@For Task scheduling
I'm already using the Mercurium compiler at my university because t
> -Original Message-
> From: Bernd Schmidt [mailto:ber...@codesourcery.com]
> Sent: 07 November 2012 11:24
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Defining scheduling resource constraint
>
>
> You can effectively remove it by returning zero if a
On 11/07/2012 12:08 PM, Paulo Matos wrote:
>
>> -Original Message-
>> From: Bernd Schmidt [mailto:ber...@codesourcery.com]
>> Sent: 07 November 2012 10:48
>> To: Paulo Matos
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: Defining scheduling resource constra
> -Original Message-
> From: Bernd Schmidt [mailto:ber...@codesourcery.com]
> Sent: 07 November 2012 10:48
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Defining scheduling resource constraint
>
> Yes... I seem to remember the documentation is just wrong
On 11/07/2012 11:41 AM, Paulo Matos wrote:
> Yes, the reordering works fine. The problem is when I change the
> value of *n_readyp. The c6x port returns n_ready (which for me
> doesn't make sense since the max insns I can schedule in a cycle is 2
> which is my issue_rate), but doesn't change *n_re
> -Original Message-
> From: Bernd Schmidt [mailto:ber...@codesourcery.com]
> Sent: 06 November 2012 17:12
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Defining scheduling resource constraint
>
> On 11/06/2012 05:50 PM, Paulo Matos wrote:
>
> >
On 11/06/2012 05:50 PM, Paulo Matos wrote:
> I am following your advice and using sched.reorg to remove the
> instruction from the ready list. What I am doing is checking the
> register written in ready[n_ready - 1] (if any) and look for the
> remainder of the ready list for insns writing to the s
> -Original Message-
> From: Bernd Schmidt [mailto:ber...@codesourcery.com]
> Sent: 05 November 2012 16:52
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Defining scheduling resource constraint
>
> Depends on why it schedules them in the same cycle. Ei
On 11/05/2012 06:11 PM, Paulo Matos wrote:
>> -Original Message-
>> From: Bernd Schmidt [mailto:ber...@codesourcery.com]
>> Sent: 05 November 2012 16:52
>> To: Paulo Matos
>> Cc: gcc@gcc.gnu.org
>> Subject: Re: Defining scheduling resource constraint
>
> -Original Message-
> From: Bernd Schmidt [mailto:ber...@codesourcery.com]
> Sent: 05 November 2012 16:52
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Defining scheduling resource constraint
>
> Depends on why it schedules them in the same cycle. Ei
> -Original Message-
> From: Joern Rennecke [mailto:joern.renne...@embecosm.com]
> Sent: 05 November 2012 16:32
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Defining scheduling resource constraint
>
> > This cannot happen, but I am unsure about
On 11/05/2012 03:51 PM, Paulo Matos wrote:
> Hello,
>
> I am experiencing a problem in the GCC 4.7 scheduler whereby the scheduler is
> issuing two instructions that write with a cond_exec to the same register. It
> ends up looking like this:
> Cond_exec p1 != 0 : r2 <- r2 and 0xf8
> Cond_exec p0 != 0:
Quoting Paulo Matos :
Hello,
I am experiencing a problem in the GCC 4.7 scheduler whereby the scheduler
is issuing two instructions that write with a cond_exec to the same
register. It ends up looking like this:
Cond_exec p1 != 0 : r2 <- r2 and 0xf8
Cond_exec p0 != 0: r2 <- 0x10
This cannot ha
Hello,
I am experiencing a problem in the GCC 4.7 scheduler whereby the scheduler is issuing
two instructions that write with a cond_exec to the same register. It ends up
looking like this:
Cond_exec p1 != 0 : r2 <- r2 and 0xf8
Cond_exec p0 != 0: r2 <- 0x10
This cannot happen, but I am unsure about wh
[ This should have gone out some time ago... Sorry for the long delay ]
I'm pleased to announce that the GCC steering committee has approved
the nomination of Andrey Belevantsev, Alexander Monakov, and Dmitry
Melnik as selective sched
t i see
>>> some strange behavior in delay slot scheduling. For my target
>>> the instruction in the delay slots gets executed irrespective
>>> of whether the branch is taken or not. I have generated the
>>> following code after commenting out the call to
>>>
On 6 September 2011 20:50, Jeff Law wrote:
>
> On 09/06/11 08:46, Mohamed Shafi wrote:
>> Hi,
>>
>> I am doing a private port in GCC 4.5.1. For my target I see some
>> strange behavior in delay slot scheduling. For my target the
>> instruction in the delay
> I am doing a private port in GCC 4.5.1. For my target I see some
> strange behavior in delay slot scheduling. For my target the
> instruction in the delay slots gets executed irrespective of whether
> the branch is taken or not.
Early 4.5.x releases have known bugs in this
On 09/06/11 08:46, Mohamed Shafi wrote:
> Hi,
>
> I am doing a private port in GCC 4.5.1. For my target I see some
> strange behavior in delay slot scheduling. For my target the
> instruction in the delay slots gets executed
Hi,
I am doing a private port in GCC 4.5.1. For my target I see some
strange behavior in delay slot scheduling. For my target the
instruction in the delay slots gets executed irrespective of whether
the branch is taken or not. I have generated the following code after
commenting out the call
Ayal Zaks writes:
>> (FWIW, libav did show up extra differences when using the patch
>> that I'd originally submitted. They were due to the count_preds
>> and count_succs thing that you picked up in your review.)
>
>
> (These differences had no noticeable consequences performance-wise, right?)
We
are in.
>
> >> I don't have powerpc hardware that I can do meaningful performance
> >> testing on, but I did run it through a Popular* Embedded Benchmark
> >> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
> >> -fmodulo-sched-allow-regmoves. There were no ch
ing on, but I did run it through a Popular* Embedded Benchmark
>> on an ARM Cortex-A8 board with -O3 -fmodulo-sched
>> -fmodulo-sched-allow-regmoves. There were no changes. (And this is
>> a benchmark that does benefit from modulo scheduling, in some cases
>> by a significant
>
> seemed more natural, and would match the existing comment. I'm happy
> to test that instead if you prefer.
>
I wouldn't worry about this tie breaker, unless there's a reason (in
which case the reason should hopefully provide a secondary criterion).
>
> I
cowardice, I ended up keeping
this as:
! if (count_succs && count_succs >= count_preds)
The reason for asking was that:
! if (count_succs > count_preds)
seemed more natural, and would match the existing comment. I'm happy
to test that instead if you prefer.
I don't
(sorry for replicated submissions, had to convert to plain text)
>2011/7/27 Revital1 Eres
>
>Hello Richard,
>
>
>> I ask because in the final range:
>>
>> start = early_start;
>> end = MIN (end, early_start + ii);
>> /* Schedule the node close to it's predecessors. */
>>
Hello Richard,
> I ask because in the final range:
>
> start = early_start;
> end = MIN (end, early_start + ii);
> /* Schedule the node close to it's predecessors. */
> step = 1;
>
> END is an exclusive bound. It seems like we might be double-counting
here,
> and effectiv
to get an upper bound on the scheduling window that is permitted
> by memory dependencies. I think this:
>
> SCHED_TIME (v_node) + ii - 1
>
> is an inclusive bound, in that scheduling the node at that time
> would not break the memory dependence, whereas scheduling at
> S
I've been looking at SMS, and have a question about get_sched_window.
When there are previously-scheduled predecessors, we use:
if (e->data_type == MEM_DEP)
end = MIN (end, SCHED_TIME (v_node) + ii - 1);
to get an upper bound on the scheduling window that is p
Hi,
I would like to experiment with modifications to the instruction flow
during scheduling. One motivation for doing that is the combining of
contiguous loads like was discussed here:
http://gcc.gnu.org/ml/gcc/2010-12/msg00153.html
I've seen that the scheduler itself does some modificatio
On Tue, 15 Feb 2011, DJ Delorie wrote:
>
> pr45055 tests a scheduling fix, but on targets that don't support
> scheduling (like m32c-elf), gcc emits a warning that scheduling is not
> supported. This warning causes the test to fail. How do we bypass
> these types of test c
pr45055 tests a scheduling fix, but on targets that don't support
scheduling (like m32c-elf), gcc emits a warning that scheduling is not
supported. This warning causes the test to fail. How do we bypass
these types of test cases? I don't see a suitable effective_target
for scheduli
On 02/11/2011 07:33 AM, Bernd Schmidt wrote:
Suppose I have two insns, one reserving (A|B|C), and the other reserving
A. I'm observing that when the first one is scheduled in an otherwise
empty state, it reserves the A unit and blocks the second one from being
scheduled in the same cycle. This is
it should
> allow you to schedule both instructions together as this should try all
> functional unit alternatives.
Ah, that seems to be exactly what I was looking for. Thanks!
I'd expect this won't work too well with define_query_cpu_unit, so I'll
need another method to assign units after scheduling.
Bernd
Le vendredi 11 février 2011 à 13:33 +0100, Bernd Schmidt a écrit :
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> schedule
possible to
fit as many other insn as possible in the same cycle.
During this process I also update the ready list with insns that become ready
as a result of scheduling the current insn (like in your example: insns that
are anti-dependent on the current insn and which therefore can be scheduled in
nit (A), the second has to wait.
The CPU I'm working on needs to specify explicitly which unit an insn is
using, but to generate optimal code that assignment must be made _after_
scheduling all the insns in a given cycle.
Bernd
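The reservation problem described in this thread can be illustrated with a toy model. This is only a sketch under stated assumptions, not GCC's DFA scheduler code: units are modelled as bits, and the names UNIT_A, UNIT_B, UNIT_C, greedy_fit and all_fit are hypothetical, introduced here for illustration. greedy_fit commits each insn to the first free unit as soon as it is issued, while all_fit defers the choice and searches over all unit alternatives for the cycle.

```c
/* Toy model: functional units as bits; each insn has a mask of units it may use. */
enum { UNIT_A = 1u, UNIT_B = 2u, UNIT_C = 4u };

/* Greedy assignment: each insn immediately takes the lowest free unit it
   is allowed to use. Returns how many insns could be issued in the cycle. */
static int greedy_fit(const unsigned *allowed, int n)
{
    unsigned busy = 0u;
    int issued = 0;
    for (int i = 0; i < n; i++) {
        unsigned free_units = allowed[i] & ~busy;
        if (free_units == 0u)
            continue;                                  /* insn has to wait */
        busy |= free_units & (~free_units + 1u);       /* lowest set bit */
        issued++;
    }
    return issued;
}

/* Deferred assignment: backtracking search over every unit alternative.
   Returns 1 if insns i..n-1 can all get distinct units, 0 otherwise. */
static int all_fit(const unsigned *allowed, int n, int i, unsigned busy)
{
    if (i == n)
        return 1;
    unsigned free_units = allowed[i] & ~busy;
    while (free_units != 0u) {
        unsigned unit = free_units & (~free_units + 1u);
        if (all_fit(allowed, n, i + 1, busy | unit))
            return 1;
        free_units &= ~unit;                           /* try the next alternative */
    }
    return 0;
}
```

With one insn allowed on (A|B|C) followed by one allowed only on A, greedy_fit issues just the first insn, because it grabs A; all_fit finds the assignment that places the first insn on B and the second on A.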
On Fri, 11 Feb 2011, Bernd Schmidt wrote:
> Suppose I have two insns, one reserving (A|B|C), and the other reserving
> A. I'm observing that when the first one is scheduled in an otherwise
> empty state, it reserves the A unit and blocks the second one from being
> scheduled in the same cycle. T
Suppose I have two insns, one reserving (A|B|C), and the other reserving
A. I'm observing that when the first one is scheduled in an otherwise
empty state, it reserves the A unit and blocks the second one from being
scheduled in the same cycle. This is a problem when there's an
anti-dependence of c
Quoting Tom de Vries :
About the penalty, I don't really know. But since the optimization is
both filling delay slots and removing
duplicate code, it looks like a good idea to me.
It's usually beneficial, but for some microarchitectures, this kind of
code confuses the branch predictor.
So ther
Hi Jeff,
However, that doesn't work for the second example:
...
    beq $3,$0,$L14
    nop
$L7:
    andi $2,$2,0x
...
    bne $3,$0,$L7
    nop
$L14:
    andi $2,$2,0x
...
...
What is different from the first example, is that here the beq owns
neither the
fall-throug
On 11/18/10 10:31, Tom de Vries wrote:
I'm working on improving delay-slot scheduling and would appreciate
advice on a
problem I encountered.
Oh boy
The problem is: how to add support for placing a CODE_LABEL on an
instruction in
a delay slot?
My impression is that this i