On Saturday, March 12, 2016, David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Saturday, March 12, 2016, Tom Lane <t...@sss.pgh.pa.us > <javascript:_e(%7B%7D,'cvml','t...@sss.pgh.pa.us');>> wrote: > >> "David G. Johnston" <david.g.johns...@gmail.com> writes: >> > Don't the semantics of a SEMI JOIN also state that the output columns >> only >> > come from the outer relation? i.e., the inner relation doesn't >> contribute >> > either rows or columns to the final result? Or is that simply >> > an implementation artifact of the fact that the only current way to >> perform >> > a semi-join explicitly is via exists/in? >> >> I think it's an artifact. What nodes.h actually says about it is you get >> the values of one randomly-selected matching inner row, which seems like >> a fine definition for the purposes we plan to put it to. >> >> > But is it a definition that actually materializes anywhere presently? > > I'm not sure what we consider an authoritative source but relational > algebra does define the results of semi and anti joins as only containing > rows from main relation. > > https://en.m.wikipedia.org/wiki/Relational_algebra > Pondering it more calling these optimizations "semi" joins further distances us from the meaning of "semi" as used in relational algebra. The half that semi refers to IS that only one half of the tables are returned. That you only get a single row of output regardless of multiple potential matches is simply a consequence of this and general set theory. In short "semi" communicates a semantic meaning as to the intended output of the query irrespective of the data upon which it is executed. We now are hijacking the and calling something "semi" if by some chance the data the query is operating against happens to be accommodating to some particular optimization. This seems wrong on definitional and cleanliness grounds. So while I'm still liking the idea of introducing specializations of outer and inner joins I think calling them "semi" joins adds a definitional inconsistency we are better off avoiding. This came about because calling something "outer semi join" struck me as odd. Something like "outer only join" and "inner only join" comes to mind. Consider the parallel between this and "index only scan". Learning that "only" means "join the outer row to the (at most for outer) one and only row in the inner relation" doesn't seem to much of a challenge. David J.