On Wed, Jun 19, 2013 at 11:48 AM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote:
> On 18.06.2013 23:59, Alexander Korotkov wrote: > >> I would like to illustrate that on example. Imagine you have fulltext >> query >> "rare_term& frequent_term". Frequent term has large posting tree while >> >> rare term has only small posting list containing iptr1, iptr2 and iptr3. >> At >> first we get iptr1 from posting list of rare term, then we would like to >> check whether we have to scan part of frequent term posting tree where >> iptr >> < iptr1. So we call pre_consistent([false, true]), because we know that >> rare term is not present for iptr< iptr2. pre_consistent returns false. >> So >> we can start scanning frequent term posting tree from iptr1. Similarly we >> can skip lags between iptr1 and iptr2, iptr2 and iptr3, from iptr3 to >> maximum possible pointer. >> > > Thanks, now I understand the rare-term & frequent-term problem. Couldn't > you do that with the existing consistent function? I don't see why you need > the new pre-consistent function for this. In the case of two entries I can. But in the case of n entries things becomes more complicated. Imagine you have "term_1 & term_2 & ... & term_n" query. When you get some item pointer from term_1 you can skip all the lesser item pointers from term_2, term_3 ... term_n. But if all you have for it is consistent function you have to call it with following check arguments: 1) [false, false, false, ... , false] 2) [false, true, false, ... , false] 3) [false, false, true, ... , false] 4) [false, true, true, ..., false] ...... i.e. you have to call it 2^(n-1) times. But if you know the query specific (i.e. in opclass) it's typically easy to calculate exactly what we need in single pass. That's why I introduced pre_consistent. ------ With best regards, Alexander Korotkov.