Re: [GENERAL] Google Summer of Code: Full Disjunctions

Tzahi Fadida Fri, 12 May 2006 14:02:18 -0700

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> [EMAIL PROTECTED]
> Sent: Tuesday, May 09, 2006 1:51 AM
> To: [email protected]
> Subject: Re: [GENERAL] Google Summer of Code: Full Disjunctions
> 
> > First, i have no knowledge of anyone that have implemented 
> full disjunctions(ever) aside
> > from the theoretical works of my colleagues.
> > With the exception of a corner case of it, that I believe 
> was a simulation in 96.
> > (A. Rajaman and J.D. Ullman Integrating information by 
> outerjoins and full-disjunctions).
> > I'd love to hear about any implementation out there (aside 
> from my colleagues work, which
> > is mine also: cohen,sagiv, kimelfeld,kanza)
> 
> I didn't mean to imply there was. It was the Rajaraman & Ullman paper
> that got me interested in FD's and then I've looked at the "Computing
> Full Disjunctions" paper by Kanza & Sagiv which gives a general
> solution.
> Obviously from the second paper it's clear that implementing full
> disjunction (efficiently) is a non-trivial exercise.


If only all the FD problems would have the same difficulty, Rajaraman & Ullman 
paper
tries to solve. Their paper is actually trying to pinpoint where you can no 
longer use
your standard SQL tools to compute the FD. Commonly, as a rule of thumb, if you
have a cycle in your scheme graph, you can no longer use Rajaraman & Ullman
method (with some delicate exceptions) 
of using a series of natural outer joins to compute the FD which is very fast 
and
simple. However, in the general case where you have cycles you can forget about 
it.
The problem becomes very hard to compute and I wouldn't suggest it for huge 
relations.
My work is to see how we can alleviate the common troubles but the problem is 
still hard.

> 
> > It can never be a binary operation since at the heart of 
> the matter is that you need to take
> > each subset of the relations and join them. i.e.:
> ...
> > Usually binary operations allow for a bottom up computation 
> approach, but FD is a TOP down approach
> > (Galindo-Legaria, C. outerjoins as disjunctions).
> 
> Right, thanks for clarifying.
> 
> >From a data analysis perspective I would like to be able to look at
> various subsets, eg. FD(A,B,C), FD(B,C,D), FD(A,B,C,D) etc and so this
> just means that each subset has too be computed independantly. I can
> live with that but wasn't sure if I had missed something.

Basically you only have to look at FD(A,B,C,D).
It includes all the information of FD(A,B,C) and FD(B,C,D).

> 
> In any case, the difficulty of implementing FD precludes me from
> experimenting with it just yet.

Well, as I said, I already implemented it. What's left is to clean up the code 
from the statistics
of my experiments and improve the coding style and add some improvements I've 
been meaning to add.
The code is pretty big considering. It's about 200kb C code for the one 
algorithm I want to work on
for the gsoc.
And bigger if I will have time to add some of the others.

> 
> Regards
> Lee
> 
> 
> ---------------------------(end of 
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
> 
> 



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Re: [GENERAL] Google Summer of Code: Full Disjunctions

Reply via email to