On Thu, Feb 27, 2014 at 1:07 AM, Alexander Korotkov <aekorot...@gmail.com>wrote:
> On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas < > hlinnakan...@vmware.com> wrote: > >> On 02/09/2014 12:11 PM, Alexander Korotkov wrote: >> >>> I've rebased catalog changes with last master. Patch is attached. I've >>> rerun my test suite with both last master ('committed') and attached >>> patch ('ternary-consistent'). >>> >> >> Thanks! >> >> >> method | sum >>> ------------------------+------------------ >>> committed | 143491.715000001 >>> fast-scan-11 | 126916.111999999 >>> fast-scan-light | 137321.211 >>> fast-scan-light-heikki | 138168.028000001 >>> master | 446976.288 >>> ternary-consistent | 125923.514 >>> >>> I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8 >>> to 4. >>> >> >> Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make >> sure we get similar behavior in Tomas' tests that used 6 search terms. But >> I always felt that it was too large for real queries, once we have the >> catalog changes, that's why I lowered to 4 when committing. If an opclass >> benefits greatly from fast scan, it should provide the ternary consistent >> function, and not rely on the shim implementation. >> >> >> I'm not sure about decision to reserve separate procedure number for >>> ternary consistent. Probably, it would be better to add ginConfig method. >>> It would be useful for post 9.4 improvements. >>> >> >> Hmm, it might be useful for an opclass to provide both, a boolean and >> ternary consistent function, if the boolean version is significantly more >> efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a >> quick check through the array to see if there are any MAYBE arguments, >> within the consistent function. But I'm inclined to keep the possibility to >> provide both versions. As long as we support the boolean version at all, >> there's not much difference in terms of the amount of code to support >> having them both for the same opclass. >> >> A ginConfig could be useful for many other things, but I don't think it's >> worth adding it now. >> >> >> What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? >> We discussed that earlier, but didn't reach any conclusion. That needs to >> be clarified in the docs. One possibility is to document that they're >> equivalent. Another is to forbid one of them. Yet another is to assign a >> different meaning to each. >> >> I've been thinking that it might be useful to define them so that a MAYBE >> result from the tri-consistent function means that it cannot decide if you >> have a match or not, because some of the inputs were MAYBE. And >> TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or >> FALSE, the result would be the same, TRUE+recheck. The practical difference >> would be that if the tri-consistent function returns TRUE+recheck, ginget.c >> wouldn't need to bother fetching the other entries, it could just return >> the entry with recheck=true immediately. While with MAYBE result, it would >> fetch the other entries and call tri-consistent again. ginget.c doesn't >> currently use the tri-consistent function that way - it always fetches all >> the entries for a potential match before calling tri-consistent, but it >> could. I had it do that in some of the patch versions, but Tomas' testing >> showed that it was a big loss on some queries, because the consistent >> function was called much more often. Still, something like that might be >> sensible in the future, so it might be good to distinguish those cases in >> the API now. Note that ginarrayproc is already using the return values like >> that: in GinContainedStrategy, it always returns TRUE+recheck regardless of >> the inputs, but in other cases it uses GIN_MAYBE. > > > Next revision of patch is attached. > > In this version opclass should provide at least one consistent function: > binary or ternary. It's expected to achieve best performance when opclass > provide both of them. However, tests shows opposite :( I've to recheck it > carefully. > However, it's not! This revision of patch completes my test case in 123330 ms. This is slightly faster than previous revision. ------ With best regards, Alexander Korotkov.