And FYI, the value of IN_THRESHOLD is configurable since https://issues.apache.org/jira/browse/KYLIN-2193

On Mon, Nov 21, 2016 at 4:35 PM, Alberto Ramón <[email protected]> wrote:

> very very clear,
> thanks ¡¡
>
> 2016-11-18 4:16 GMT+01:00 Li Yang <[email protected]>:
>
>> A filter on a derived column has to be translated into a filter on its PK.
>>
>> E.g. say USER_NAME is a derived column (not on cube) and USER_ID is its PK
>> (on cube). When the filter USER_NAME='liyang' comes in, it needs to be
>> translated into USER_ID in (1,211,382), where IDs 1, 211 and 382 are the
>> three users whose name is 'liyang'.
>>
>> Now consider that 'liyang' is so common a name that there are thousands of
>> 'liyang's. Then the IN clause becomes very long and can cause performance
>> problems during storage scanning. In such a case, the filter can be
>> translated into a range filter instead, like USER_ID between 1 and 382.
>>
>> The threshold is used to decide whether the translation returns an IN
>> condition or a range condition.
>>
>> Cheers
>> Yang
>>
>> On Wed, Nov 16, 2016 at 12:35 AM, Alberto Ramón <[email protected]> wrote:
>>
>>> About Kylin 2193
>>> What is the purpose of
>>> org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD ? :)
>>> (When is it used?)

--
Regards,
*Bin Mahone | 马洪宾*
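
To make the IN-versus-range decision described in the thread concrete, here is a minimal sketch in Java. It is not Kylin's actual DerivedFilterTranslator code: the class name DerivedFilterSketch, the method translate, and the inThreshold parameter are hypothetical names made up for illustration, and the filter is rendered as a plain string instead of Kylin's filter objects. It assumes the PK values matching the derived-column filter have already been looked up.

```java
import java.util.List;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the Kylin implementation.
final class DerivedFilterSketch {

    // Translates "derived column = value" into a filter on the PK column.
    // matchingPks holds the PK values whose derived column matched.
    // inThreshold is an assumed stand-in for IN_THRESHOLD.
    static String translate(String pkColumn, List<Integer> matchingPks, int inThreshold) {
        if (matchingPks.isEmpty()) {
            throw new IllegalArgumentException("no PK matches the derived filter");
        }
        // Sort and de-duplicate so the smallest and largest PK are easy to read off.
        TreeSet<Integer> sorted = new TreeSet<>(matchingPks);
        if (sorted.size() <= inThreshold) {
            // Few matches: keep the exact IN condition.
            String ids = sorted.stream()
                    .map(String::valueOf)
                    .collect(Collectors.joining(","));
            return pkColumn + " in (" + ids + ")";
        }
        // Many matches: fall back to a range covering all of them.
        return pkColumn + " between " + sorted.first() + " and " + sorted.last();
    }

    public static void main(String[] args) {
        List<Integer> liyangIds = List.of(1, 211, 382);
        System.out.println(translate("USER_ID", liyangIds, 5)); // IN condition
        System.out.println(translate("USER_ID", liyangIds, 2)); // range condition
    }
}
```

Note that the range form is looser than the IN form: USER_ID between 1 and 382 also covers IDs that are not 'liyang', so the original condition would presumably still have to be checked on the rows that come back from the scan.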
