Hello,

Because I will soon somewhat criticize Vishal's work, I would start by thanking him for his implementation that enabled me
for the first time to experiment with parrot development.

I have some new tests to contribute that reveal some bugs in Parrot_register_move().
Moreover the current implementation produces suboptimal code, is quadratic in space and
and cubic in execution time in the worst case.

So I propose a new implementation that solve the bugs, and that is linear in space and time, and hopefully produce
an optimal list of moves. It is to be found in the second patch.

I'll try to explain my method in order to help you try to detect any flaw or things I have missed.

DEFINITIONS AND VOCABULARY
----------------------------------------

We have two lists of registers, the src_regs[] and dest_regs[], that form a graph: src_regs[i] ==> dest_regs[i]
A same register may appear several times in the src regs, but only at most one time
in the dest ones, because it makes no sense to assign two values to a same register.

If we have : i ==> j, I call  i the predecessor of j, and j is one successor of i.

I call a cycle (shoud be a circuit though) a path whose last element is the same as the first one, e.g : i ==> j==>k ==> i


I call a cycle without exist, a cycle such as every member of the cycle has exactly one successor, i.e it is not possible to leave the cycle.

An example of a graph with a cycle WITH an exit :

1 == > 2 ==> 1    (   1==>2, 2==>1,  1==>3)
||
3

I call a well a register that has no successor ( 3 in the previous example )


PROPERTIES OF THE REGISTERS GRAPHS
---------------------------------------------------

- There can be several disconnected subgraphs ( e.g 1 ==> 2 and 3 ==> 4), in the following I will say graph for a connected component.
- there is at most ONE cycle in a graph (remember, in the same component)
- a graph has either at least a well, or it is a single cycle without exit

OPTIMALITY
---------------------

To handle a cycle with exit, you do not need  a temp register.
Look at he previous example, the optimal list of moves operation is :
move( 1 -> 3)
move( 2->1)
move( 3->2)

and not
move(1->3)
move(1->temp)
move(2->1)
move(temp->2)

So the rule is : the only time you need a temp register is in a cycle without exit, and in this case you need exactly one.

The optimal number of moves is then the number of registers + the number of cycles without exits.


 ALGORITHM
----------------

A graph has either a well, and in that case we do not need a temp register, or it's a cycle without exit.
So we first locate the wells, and for each one we climb back the graph, emitting the moves, and using the notion of backup both for marking
visited nodes, and to store the new register that now hold the value.

For instance, after a move(i -> j), j is now the backup of i.
We stop when we process a node already marked.

This process will mark all the sub graphs except for the cycles without exits. We then take the list of (destination) nodes that have not been marked, and
process them specifically.

DATA STRUCTURES
-------------------------

The trick is to use the existing data structures src_regs[] and dest_regs[] to explore the graph.
And because each node has only one predecessor, we can locate it in constant time.

We need three data structures, one to hold the mapping between the parrot destination register numbers and their index in the dest_regs[] array,
an other to count the nb of successors in order to locate the wells, and the last to both mark the nodes and store their backup register.

COMPLEXITY
----------------
It should be easy to figure out that the complexity is linear in n_regs (or max_reg).

OPTIMIZATION
------------------

For now, I've only hard-coded the cases n=0 and n=1.
I know that premature optimization is EVIL, but one thing that could save some often slow system calls would be to keep the allocated data structures,
reallocating them when needed instead of allocating them each time we enter the function.
Those arrays could be stored in the Interpreter for instance.

OPEN PROBLEMS
---------------------
It is stated in the documentation that the mov_alt function should be tried before using a temp register
Suppose that you have a cycle 1 ==> 2 ==> 3 ==> 4
If move_alt( 1 -> 2 ) fails, should we try move_alt( 2 -> 3) and move_alt( 3-> 4) too ?

It would slow down a little, but if successful could reduce the nb of moves by one.
Currently it is not implemented.



So I hope that this work could prove useful, I would be very glad if I could help the development of parrot.
Regards,

Karl Forner






Index: t/compilers/imcc/imcpasm/optc.t
===================================================================
--- t/compilers/imcc/imcpasm/optc.t	(revision 14604)
+++ t/compilers/imcc/imcpasm/optc.t	(working copy)
@@ -4,7 +4,7 @@
 
 use strict;
 use lib qw( . lib ../lib ../../lib );
-use Parrot::Test tests => 36;
+use Parrot::Test tests => 43;
 use Test::More;
 
 # these tests are run with -Oc by TestCompiler and show
@@ -13,6 +13,222 @@
 
 ##############################
 
+
+
+pir_output_is(<<'CODE', <<'OUT', "karl trivial test");
+.sub _main
+    $I1 = foo(10)
+    print_item $I1
+    print_newline
+.end
+.sub foo
+    .param int i
+    if i goto recurse
+    .return (0)
+
+recurse:
+    $I1= i - 1
+    .return  foo($I1)
+.end
+CODE
+0
+OUT
+
+
+pir_output_is(<<'CODE', <<'OUT', "karl spot bug 1");
+.sub _main
+    foo(0, 1, 2, 3,4)
+.end
+.sub foo
+    .param int done
+    .param int i
+    .param int j
+    .param int k
+    .param int l
+
+    unless done goto tc
+    print_item "i"
+    print_item i
+    print_item "j"
+    print_item j
+    print_item "k"
+    print_item k
+    print_item "l"
+    print_item l
+
+    print_newline
+    end
+tc:    
+    .return foo(1, 9, i, j,k)
+.end
+CODE
+i 9 j 1 k 2 l 3
+OUT
+
+
+pir_output_is(<<'CODE', <<'OUT', "karl tailcall 3 args");
+.sub _main
+    foo(0, 1, 2, 3)
+.end
+.sub foo
+    .param int done
+    .param int i
+    .param int j
+    .param int k
+    unless done goto tc
+    print_item "i"
+    print_item i
+    print_item "j"
+    print_item j
+    print_item "k"
+    print_item k
+    print_newline
+    end
+tc:    
+    .return foo(1, j, i, i)
+.end
+CODE
+i 2 j 1 k 1
+OUT
+
+
+pir_output_is(<<'CODE', <<'OUT', "cycle no exit 1");
+.sub _main
+    foo(0, 1, 2, 3, 4, 5)
+.end
+.sub foo
+    .param int done
+    .param int i
+    .param int j
+    .param int k
+    .param int l
+    .param int m
+
+
+    unless done goto tc
+    print_item "i"
+    print_item i
+    print_item "j"
+    print_item j
+    print_item "k"
+    print_item k
+    print_item "l"
+    print_item l
+    print_item "m"
+    print_item m
+
+    print_newline
+    end
+tc:    
+    .return foo(1, m,i,j,k,l)
+.end
+CODE
+i 5 j 1 k 2 l 3 m 4
+OUT
+
+pir_output_is(<<'CODE', <<'OUT', "cycle no exit 2");
+.sub _main
+    foo(0, 1, 2, 3, 4, 5)
+.end
+.sub foo
+    .param int done
+    .param int i
+    .param int j
+    .param int k
+    .param int l
+    .param int m
+
+
+    unless done goto tc
+    print_item "i"
+    print_item i
+    print_item "j"
+    print_item j
+    print_item "k"
+    print_item k
+    print_item "l"
+    print_item l
+    print_item "m"
+    print_item m
+
+    print_newline
+    end
+tc:    
+    .return foo(1, m,l,j,i,k)
+.end
+CODE
+i 5 j 4 k 2 l 1 m 3
+OUT
+
+pir_output_is(<<'CODE', <<'OUT', "2 unconnected cycles no exit ");
+.sub _main
+    foo(0, 1, 2, 3, 4, 5)
+.end
+.sub foo
+    .param int done
+    .param int i
+    .param int j
+    .param int k
+    .param int l
+    .param int m
+
+
+    unless done goto tc
+    print_item "i"
+    print_item i
+    print_item "j"
+    print_item j
+    print_item "k"
+    print_item k
+    print_item "l"
+    print_item l
+    print_item "m"
+    print_item m
+
+    print_newline
+    end
+tc:    
+    .return foo(1, k,m,i,j,l)
+.end
+CODE
+i 3 j 5 k 1 l 2 m 4
+OUT
+
+pir_output_is(<<'CODE', <<'OUT', "cycle with exit 1");
+.sub _main
+    foo(0, 1, 2, 3, 4, 5)
+.end
+.sub foo
+    .param int done
+    .param int i
+    .param int j
+    .param int k
+    .param int l
+    .param int m
+
+
+    unless done goto tc
+    print_item "i"
+    print_item i
+    print_item "j"
+    print_item j
+    print_item "k"
+    print_item k
+    print_item "l"
+    print_item l
+    print_item "m"
+    print_item m
+
+    print_newline
+    end
+tc:    
+    .return foo(1, j,i,j,i,j)
+.end
+CODE
+i 2 j 1 k 2 l 1 m 2
+OUT
+
+
 pir_2_pasm_like(<<'CODE', <<'OUT', "in P param");
 .sub _main
     $P0 = new Undef
Index: src/utils.c
===================================================================
--- src/utils.c	(revision 14604)
+++ src/utils.c	(working copy)
@@ -22,11 +22,24 @@
 
 #include "parrot/parrot.h"
 
-#define GRAPH_ELEMENT(PTR,REGS,X,Y)  (PTR)[((Y) * ((REGS) + 3)) + (X)]
-#define ROOT_VERTEX(PTR,REGS,Y)      GRAPH_ELEMENT(PTR,REGS,REGS+0,Y)
-#define REG_MOVE_VAL(PTR,REGS,Y)     GRAPH_ELEMENT(PTR,REGS,REGS+1,Y)
-#define CYCLE_NODE(PTR,REGS,Y)        GRAPH_ELEMENT(PTR,REGS,REGS+2,Y)
+/* Parrot_register_move companion functions i and data */
+typedef struct {
+    unsigned char *dest_regs;
+    unsigned char *src_regs;
+    unsigned char temp_reg;
+    int* nb_succ;
+    int* backup;
+    int* reg_to_index;
+    Interp * interpreter;
+    reg_move_func mov;
+    reg_move_func mov_alt;
+    void *info;
+} parrot_prm_context;
 
+void rec_climb_back_and_mark(int regindex, parrot_prm_context* c);
+void process_cycle_without_exit(int regindex, parrot_prm_context* c);
+void move_reg(int from, int dest, parrot_prm_context* c);
+
 /*
 
 =item C<INTVAL intval_mod(INTVAL i2, INTVAL i3)>
@@ -638,127 +651,101 @@
 }
 
 /*
-  
-=item C<static int
-find_first_indegree(int *graph,int node_count, int dest)>
-  
-Finds the first indegree for the given node.
-   
+
+=item C<void rec_climb_back_and_mark(int node_index, parrot_prm_context* c)>
+
+Recursive function, used by Parrot_register_move to 
+climb back the graph of register moves operations.
+
+The node must have a predecessor: it is implicit because if a node has
+a node_index, it must have a predecessor because the node_index are the
+index of registers in dest_regs[] array, so by definition they have
+a corrsponding src_regs register.
+
+Then it emits the move operation with its predecessor, or its backup
+if already used/visited.
+
+Then continues the climbing if the predecessor was not modified, anf in that
+case marks it, and set node_index as its backup.
+
+  node_index  ... the index of a destination (i.e. with a pred.) register
+  c           ... the graph and all the needed params : the context
+
 =cut
-      
+ 
 */
-static int
-find_first_indegree(int *graph, int node_count, int dest)
-{
-    int i = 0;
-    for (i = 0; i < node_count; i++) {
-        if (GRAPH_ELEMENT(graph, node_count, i, dest) > 0) {
-            return i;
+void
+rec_climb_back_and_mark(int node_index, parrot_prm_context* c) {
+    int pred, pred_index, src, node;
+   
+    node = c->dest_regs[node_index];
+    pred = c->src_regs[node_index];
+    pred_index = c->reg_to_index[pred];
+    
+    if ( pred_index < 0 ) { /* pred has no predecessor */
+        move_reg(pred, node, c);
+    } else { /* pred has a predecessor, so may be processed */
+        src = c->backup[pred_index];
+        if (  src < 0 ) { /* not visited */
+            move_reg(pred, node, c);
+            c->backup[pred_index] = node; /* marks pred*/
+            rec_climb_back_and_mark(pred_index, c);
+        } else { /* already visited, use backup instead */
+            move_reg(src, node, c);
         }
     }
-    return -1;
 }
 
+
 /*
- 
-=item C<static int
-find_root(int *graph ,int node_count, int src, int dest)>
-          
-Finds the root vertex of the graph.
 
+=item C<void process_cycle_without_exit(int node_index, parrot_prm_context* c)>
+
+Recursive function, used by Parrot_register_move to handle the case
+of cycles without exits, that are cycles of move ops between registers
+where each register has exactly one predecessor and one successor
+
+For instance: 1-->2, 2-->3, 3-->1
+
+  node_index  ... the index of a destination (i.e. with a pred.) register
+  c           ... the graph and all the needed params : the context
+
 =cut
+ */
 
-*/
-static int
-find_root(int *graph, int node_count, int src, int dest)
-{
-    int in_degree;
-    if (GRAPH_ELEMENT(graph, node_count, src, dest) == 2) {
-        CYCLE_NODE(graph, node_count, dest) = 1;    
-        GRAPH_ELEMENT(graph, node_count, src, dest) = 1;
-        return dest;
-    }
-    ROOT_VERTEX(graph, node_count, src) = 0;
-    GRAPH_ELEMENT(graph, node_count, src, dest) = 2;
-    in_degree = find_first_indegree(graph, node_count, src);
-    if (in_degree == -1) {
-        ROOT_VERTEX(graph, node_count, dest) = 0;
-        GRAPH_ELEMENT(graph, node_count, src, dest) = 1;
-        return src;
-    }
-    return find_root(graph, node_count, in_degree, src);
-}
+void
+process_cycle_without_exit(int node_index, parrot_prm_context* c) {
+    int pred, pred_index, src_index;
+    int alt = 0;
+ 
+    pred = c->src_regs[node_index];
+    /* pred_index has to be defined cause we are in a cycle so each node has a pred*/
+    pred_index = c->reg_to_index[pred];
 
-/*
+    /* let's try the alternate move function*/
+    if (NULL != c->mov_alt)         
+        alt = c->mov_alt(c->interpreter, c->dest_regs[node_index], pred, c->info);
 
-=item C<static void
-emit_mov(reg_move_func mov, Interp *interpreter, void *info, int emit,
-          int emit_temp, int dest, int src, int temp, int * map)>
-          
-   
-Emit the move instructions
-  
-=cut
-       
-*/
-static void
-emit_mov(reg_move_func mov, Interp * interpreter, void *info, int emit,
-         int emit_temp, int dest, int src, int temp)
-{
-    if (emit > -1) {
-        if (emit_temp) {
-            mov(interpreter, dest, temp, info);
-        }
-        else if (src != dest) {
-            mov(interpreter, dest, src, info);
-        }
-    }
+    if ( 0 == alt ) { /* use temp reg */
+        move_reg(c->dest_regs[node_index],c->temp_reg, c);
+        c->backup[node_index] = c->temp_reg;
+    } else 
+        c->backup[node_index] = c->dest_regs[node_index];
+
+    rec_climb_back_and_mark(node_index, c);
 }
 
 /*
- 
-=item C<static int
-reorder_move(int *graph, INT node_count, int *map, INT * val,
-             int src, int prev, int depth, reg_move_func mov,
-             Interp *interpreter, void *info, int temp)>
-                          
-   
-This method reorders the move operations.OA
-   
-=cut
-       
-*/
-static int
-reorder_move(int *graph, int node_count, int src, int prev, int depth, 
-            reg_move_func mov, Interp * interpreter, void *info, int temp)
-{
-    int i, x;
-    REG_MOVE_VAL(graph, node_count, src) = 1;
-  
-    for (i = 0; i < node_count; i++) {
-        if (GRAPH_ELEMENT(graph, node_count, src, i) > 0) {
-            if (REG_MOVE_VAL(graph, node_count, i) == 1) {
-                emit_mov(mov, interpreter, info, prev, 0, i, src, temp);
-                emit_mov(mov, interpreter, info, prev, depth <= 1, src, prev,
-                         temp);
-                return 1;
-            }
-            if (REG_MOVE_VAL(graph, node_count, i) != 2) {
-                depth++;
-                x = reorder_move(graph, node_count, i, src, depth,
-                                     mov, interpreter, info, temp);
-                depth--;
-                emit_mov(mov, interpreter, info, prev,
-                         x && (depth <= 1), src, prev, temp);
-                return x;
-            }
-        }
-    }
-    REG_MOVE_VAL(graph, node_count, src) = 2;
-    emit_mov(mov, interpreter, info, prev, 0, src, prev, temp);
-    return 0;
+ should be self-speaking
+ */
+
+void 
+move_reg(int from, int dest, parrot_prm_context* c) {
+   /* fprintf(stderr,"move %i ==> %i\n",from,dest);*/
+    c->mov(c->interpreter, dest, from, c->info);
 }
 
+
 /*
 
 =item C<typedef int (*reg_move_func)(Interp*, unsigned char d,
@@ -815,6 +802,7 @@
 2. I1->I2 I3->I2
 
 TODO: Add tests for the above conditions.
+
 =cut
   
 */
@@ -824,53 +812,83 @@
                      unsigned char temp_reg,
                      reg_move_func mov, reg_move_func mov_alt, void *info)
 {
-    int i, uniq_reg_cnt;
-    int *reg_move_graph;
-    uniq_reg_cnt=0;
-
+    int i,index;
+    int max_reg = 0;
+    int* nb_succ = NULL;
+    int* backup = NULL;
+    int* reg_to_index = NULL;
+    parrot_prm_context c;
+      
     if (n_regs == 0)
         return;
 
+    if (n_regs == 1) {
+        if (src_regs[0] != dest_regs[0])
+            mov(interpreter, dest_regs[0], src_regs[0], info);
+        return;
+    }
+    
+    c.interpreter = interpreter;
+    c.info = info;
+    c.mov = mov;
+    c.mov_alt = mov_alt;
+    c.src_regs = src_regs;
+    c.dest_regs = dest_regs;
+    c.temp_reg = temp_reg;
+    
+    /* compute max_reg, the max reg number + 1 */
     for (i = 0; i < n_regs; i++) {
-        if (src_regs[i] > uniq_reg_cnt) {
-            uniq_reg_cnt=src_regs[i];
-        }
-        
-        if (dest_regs[i] > uniq_reg_cnt) {
-            uniq_reg_cnt=dest_regs[i];
-        }   
+        if (src_regs[i] > max_reg) 
+            max_reg = src_regs[i];
+        if (dest_regs[i] > max_reg) 
+            max_reg = dest_regs[i];
     }
-    uniq_reg_cnt++;
+    ++max_reg;
+
     
-    reg_move_graph = (int *)
-        mem_sys_allocate_zeroed(sizeof(int) * uniq_reg_cnt *
-                                (uniq_reg_cnt + 3));
+    /* allocate space for data structures */
+    /* NOTA: data structures could be kept allocated somewhere waiting to get reused...*/
+    c.nb_succ = nb_succ = (int*)mem_sys_allocate_zeroed(sizeof(int) * n_regs);
+    c.backup = backup = (int*)mem_sys_allocate(sizeof(int) * n_regs);
+    c.reg_to_index = reg_to_index = (int*)mem_sys_allocate(sizeof(int) * max_reg);
 
+    /* init backup array */
+    for (i = 0; i < n_regs; i++) 
+        backup[i] = -1;
+
+    /* fill in the conversion array between a register number and its index */
+    for (i = 0; i < max_reg; i++)
+        reg_to_index[i] = -1;
     for (i = 0; i < n_regs; i++) {
-        GRAPH_ELEMENT(reg_move_graph, uniq_reg_cnt, src_regs[i],
-                      dest_regs[i]) = 1;
-        ROOT_VERTEX(reg_move_graph, uniq_reg_cnt,
-                    find_root(reg_move_graph, uniq_reg_cnt, src_regs[i],
-                              dest_regs[i])) = 1;
-        GRAPH_ELEMENT(reg_move_graph, uniq_reg_cnt, src_regs[i],
-                      dest_regs[i]) = 1;
+        index = dest_regs[i];
+        if (index != src_regs[i]) /* get rid of self-assignment */
+            reg_to_index[index] = i;
     }
-    for (i = 0; i < uniq_reg_cnt; i++) {
-        if (ROOT_VERTEX(reg_move_graph, uniq_reg_cnt, i) > 0) {
-            if (GRAPH_ELEMENT(reg_move_graph, uniq_reg_cnt, i, i) == 1) {
-                /* mov(interpreter, i, i, info); */
-            }
-            else {
-                if (CYCLE_NODE(reg_move_graph, uniq_reg_cnt, i)) {
-                    mov(interpreter, temp_reg, i, info);
-                }
-                reorder_move(reg_move_graph, uniq_reg_cnt, i, -1, 0, mov,
-                         interpreter, info, temp_reg);
-            }
+
+    /* count the nb of successors for each reg index */
+    for (i = 0; i < n_regs; i++) {
+        index = reg_to_index[ src_regs[i] ];
+        if ( index >= 0 ) /* not interested in the wells that have no preds */
+            nb_succ[ index ]++;
+    }
+    /* process each well if any */
+    for (i = 0; i < n_regs; i++) {
+        if ( 0 == nb_succ[i] ) { /* a well */
+            rec_climb_back_and_mark(i, &c); 
         }
     }
 
-    mem_sys_free(reg_move_graph);
+    /* process remaining dest registers not processed */
+    /* remaining nodes are members of cycles without exits */
+    for (i = 0; i < n_regs; i++) {
+        if ( 0 < nb_succ[i] && 0 > backup[i] ) { /* not a well nor visited*/
+            process_cycle_without_exit(i, &c);
+        }
+    }
+        
+    mem_sys_free(nb_succ);
+    mem_sys_free(reg_to_index);
+    mem_sys_free(backup);
 }
 
 

Reply via email to