My comments are mostly advisory for optimizers in general ;)

On 2/11/12 3:11 PM, Stefan Manegold wrote:
On Sat, Feb 11, 2012 at 02:06:17PM +0100, Martin Kersten wrote:


On 2/11/12 11:03 AM, Stefan Manegold wrote:
On Wed, Feb 08, 2012 at 10:27:11AM +0100, Martin Kersten wrote:
Changeset: 67c12a700166 for MonetDB
URL: http://dev.monetdb.org/hg/MonetDB?cmd=changeset;node=67c12a700166
Modified Files:
        monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
Branch: default
Log Message:

More advice on the optimizer template.


diffs (140 lines):

diff --git a/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx b/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
--- a/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
+++ b/monetdb5/extras/mal_optimizer_template/opt_sql_append.mx
[...]
@@ -39,6 +39,8 @@ All Rights Reserved.
   * i.e., an sql.append() statement that is eventually followed by some other
   * statement later on in the MAL program that uses the same v0 BAT as
   * argument as the sql.append() statement does,
+ * Do you assume a single re-use of the variable v0?

No. Why?
Use an assign-once, use-many-times policy. It can improve parallel
processing and simplifies scope analysis.

v0 is (as far as I know) created (assigned) once (by Niels, or preceding
optimizers).
true, on purpose
If it is used only once (only by sql_append), my optimizer does not (have
to) do anything.  Otherwise, it replaces one use of v0 (by sql_append) by a
view of v0.
That's the very purpose of this optimizer.
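
The trigger condition described here (rewrite only when v0 has a reader
besides sql.append) can be sketched as a simple use count. This is a minimal
sketch, not the code in opt_sql_append.mx: the Instr record, count_uses() and
needs_view() are hypothetical stand-ins for illustration, and variables are
assumed to be plain integer ids that are assigned once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a MAL instruction record:
 * the first retc entries of arg[] are results, the rest are arguments. */
typedef struct {
    int retc;    /* number of result variables */
    int argc;    /* total number of entries used in arg[] */
    int arg[8];  /* variable ids */
} Instr;

/* Count how often variable v is read (used as an argument, not assigned). */
int count_uses(const Instr *plan, size_t n, int v)
{
    int uses = 0;
    assert(plan != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        for (int k = plan[i].retc; k < plan[i].argc; k++)
            if (plan[i].arg[k] == v)
                uses++;
    return uses;
}

/* A view is only worth creating if v has a reader besides sql.append(). */
int needs_view(const Instr *plan, size_t n, int v)
{
    return count_uses(plan, n, v) > 1;
}
```

With a single reader the function reports that nothing needs to be done,
which matches the "does not have to do anything" case above.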

+ * Do you assume a non-nested  MAL block ?

Not necessarily.

Analysis may become complex if you have something like

V0:= expr
barrier E1:=expr
        V0:= expr2
exit E1
Now the value of V0 depends on whether the barrier block is executed at run time.


The same holds for
barrier E1:= expr
        V0:=expr
exit E1
        z:= f(V0)

This will be flagged as an error, because V0 may be uninitialized.

I must admit that I do not know how the optimizer framework handles nested
MAL blocks, and what an optimizer needs to do to be aware of nested MAL
blocks and to handle them correctly.
Preferably, the MAL blocks are linear programs (until you reach the
dataflow optimizer).

How do I know / see that in my optimizer?
While looping through the plan you check if p->barrier is set.
You can always safely exit an optimizer.
Do I have to check for barrier / exit statements / constructs myself?
in principle, yes
Optimizers in the pipeline preceding yours could introduce them.
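
The advice above (check the barrier flag while looping through the plan, and
leave the optimizer early if in doubt) could look roughly as follows. This is
a sketch under assumptions: Instr is a hypothetical record that models only
the barrier field mentioned in the thread, and plan_is_linear() is an
illustrative helper, not a function of the optimizer framework.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a MAL instruction record; only the field
 * mentioned in the thread is modelled here. */
typedef struct {
    int barrier;  /* non-zero for barrier/exit style control statements */
} Instr;

/* Return 1 if the plan is a straight-line program, 0 if any instruction
 * opens or closes a guarded block. */
int plan_is_linear(const Instr *plan, size_t n)
{
    assert(plan != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (plan[i].barrier)
            return 0;
    return 1;
}
```

An optimizer that returns without changes whenever plan_is_linear() yields 0
may miss some opportunities, but it never rewrites a plan whose control flow
it has not analysed.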


In the sample optimizer, for now, I'd be fine as long as there are no
false positives, i.e., cases where the optimizer triggers although it should
not, or cases it cannot handle correctly.
I can accept false negatives, i.e., not triggering in all cases it could
handle correctly.

   *
   * and transform them into
   *
@@ -52,6 +54,7 @@ All Rights Reserved.
   *
   * i.e., handing a BAT view v2 of BAT v0 as argument to the sql.append()
   * statement, rather than the original BAT v0.
+ * My advice, always use new variable names, it may capture some easy to make errors.

My optimizer does use new variables for all new statements/results.
It re-uses variable names only for identical results.

   *
   * As a refinement, patterns like
   *
[...]
@@ -181,13 +195,17 @@ OPTsql_appendImplementation(Client cntxt
                                        pushInstruction(mb, q);
                                        q1 = q;
                                        i++;
-                                       actions++;
+                                       actions++;      /* to keep track if anything has been done */
                                }
                        }

-                       /* look for
+                       /* look for     
                         *  v5 := ... v0 ...;
                         */
+                       /* an expensive loop, better would be to remember that v0 has a different role.
+                        * A typical method is to keep a map from variable -> instruction where it was
+                        * detected. The you can check each assignment for use of v0
+                       */

This is general support functionality.
Is this already available in the optimizer framework?
I try to use single-pass algorithms in the optimizers.
Even in the case of the commonterms optimizer, we may have to
traverse the history. This can become an n^2 process.

If so, where is it and how can I use it?
Mimic how it is done in other optimizers (e.g. opt_reorder).
Typically, a buffer is maintained per variable to keep
optimization properties around.
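
One way to read the "map from variable -> instruction" suggestion is a
per-variable table filled in a single pass, so that the question "is v0 read
after instruction i?" no longer needs a nested scan. The sketch below is an
illustration only: Instr, compute_last_use() and used_after() are hypothetical
names, not part of the optimizer framework, and variables are assumed to be
integer ids in the range [0, nvars).

```c
#include <assert.h>
#include <stddef.h>

#define MAXARGS 8

/* Hypothetical, simplified stand-in for a MAL instruction record:
 * the first retc entries of arg[] are results, the rest are arguments. */
typedef struct {
    int retc;
    int argc;
    int arg[MAXARGS];
} Instr;

/* Fill last_use[v] with the index of the last instruction that reads
 * variable v, or -1 if v is never read. One pass over the plan. */
void compute_last_use(const Instr *plan, int n, int *last_use, int nvars)
{
    assert(plan != NULL && last_use != NULL);
    for (int v = 0; v < nvars; v++)
        last_use[v] = -1;
    for (int i = 0; i < n; i++)
        for (int k = plan[i].retc; k < plan[i].argc; k++) {
            int v = plan[i].arg[k];
            if (v >= 0 && v < nvars)
                last_use[v] = i;  /* later reads overwrite earlier ones */
        }
}

/* Constant-time replacement for the inner scan: is v read after
 * instruction i? */
int used_after(const int *last_use, int v, int i)
{
    return last_use[v] > i;
}
```

The table costs one extra pass plus one int per variable; each later lookup
is a single comparison instead of a scan over the rest of the plan.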

If not, where/how could we add it?

                        for (j = i+1; !found && j < limit; j++)
                                for (k = old[j]->retc; !found && k < old[j]->argc; k++)
                                        found = (getArg(old[j], k) == getArg(p, 5));
@@ -202,6 +220,8 @@ OPTsql_appendImplementation(Client cntxt

                                /* push new v1 := aggr.count( v0 ); unless already available */
                                if (q1 == NULL) {
+                               /* use mal_buil.mx primitives q1 = newStmt(mb, aggrRef,countRef); setArgType(mb,q1,TYPE_wrd) */
+                               /* it will be added to the block and even my re-use MAL instructions */

Is this (supposed to be) documentation of the existing code below,
or rather advice how to implement the below functionality differently?
Use the mal_builder to simplify your code base.


                                        q1 = newInstruction(mb,ASSIGNsymbol);
                                        getArg(q1,0) = newTmpVariable(mb, TYPE_wrd);
                                        setModuleId(q1, aggrRef);
@@ -211,6 +231,7 @@ OPTsql_appendImplementation(Client cntxt
                                }

                                /* push new v2 := algebra.slice( v0, 0, v1 ); */
+                               /* use mal_buil.mx primitives q1 = newStmt(mb, algebraRef,sliceRef); */

Is this (supposed to be) documentation of the existing code below,
or rather advice how to implement the below functionality differently?

                                q2 = newInstruction(mb,ASSIGNsymbol);
                                getArg(q2,0) = newTmpVariable(mb, TYPE_any);
                                setModuleId(q2, algebraRef);
@@ -240,6 +261,7 @@ OPTsql_appendImplementation(Client cntxt
        for(i++; i<limit; i++)
                if (old[i])
                        pushInstruction(mb, old[i]);
+       /* any remaining MAL instruction records are removed */
        for(; i<slimit; i++)
                if (old[i])
                        freeInstruction(old[i]);
@@ -253,6 +275,9 @@ OPTsql_appendImplementation(Client cntxt
        return actions;
  }

+/* optimizers have to be registered in the optcatalog in opt_support.c.

Why?
SQL needs a place to pick up all known optimizers. You may also have
to extend the optimizer pipeline validity code.

If at all possible, I'd prefer to be able to add a new optimizer without the
need to change existing code ...
yes understood, but you have to patch Makefile.ag, youroptimizer.mx, and
opt_support. Possibly, you may have to extend opt_prelude as well


+ * you have to path the file accordingly.
"path"
                   ^^^^
parse?

What does this mean? What am I supposed to do in detail?

+ */
  @include ../../optimizer/optimizerWrapper.mx
  @c
  #include "opt_statistics.h"
_______________________________________________
Checkin-list mailing list
Checkin-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/checkin-list


Thanks!

Stefan
