Re: IN_THRESHOLD

hongbin ma Sat, 10 Dec 2016 06:05:15 -0800

And FYI, the value of IN_THRESHOLD has been configurable since
https://issues.apache.org/jira/browse/KYLIN-2193


On Mon, Nov 21, 2016 at 4:35 PM, Alberto Ramón <[email protected]>
wrote:

> Very, very clear,
> thanks!!
>
> 2016-11-18 4:16 GMT+01:00 Li Yang <[email protected]>:
>
>> A filter on a derived column has to be translated into a filter on its PK.
>>
>> E.g. say USER_NAME is a derived column (not on the cube) and USER_ID is its
>> PK (on the cube). When the filter USER_NAME='liyang' comes in, it needs to
>> be translated into USER_ID in (1,211,382), where IDs 1, 211, and 382 are the
>> three users whose name is 'liyang'.
>>
>> Now suppose 'liyang' is so common a name that there are thousands of
>> 'liyang's. Then the IN clause becomes very long and can cause performance
>> problems during storage scanning. In such a case, the filter can be
>> translated into a range filter instead, like USER_ID between 1 and 382.
>>
>> The threshold is used to decide whether the translation returns an IN
>> condition or a range condition.
>>
>> Cheers
>> Yang
>>
>> On Wed, Nov 16, 2016 at 12:35 AM, Alberto Ramón <
>> [email protected]> wrote:
>>
>>> About KYLIN-2193:
>>> What is the purpose of
>>> org.apache.kylin.storage.translate.DerivedFilterTranslator#IN_THRESHOLD? :)
>>> (When is it used?)
>>>
>>
>>
>
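
To make Yang's explanation above concrete, here is a minimal Python sketch of
the IN-versus-range decision. It is not Kylin's code: the function name, the
lookup dict, the tuple return format, the threshold value of 5, and the use of
<= for the comparison are all assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical threshold value; Kylin's actual default may differ.
IN_THRESHOLD = 5


def translate_derived_filter(
    lookup: Dict[int, str], value: str, in_threshold: int = IN_THRESHOLD
) -> Tuple:
    """Translate `derived_column = value` into a filter on the PK column.

    `lookup` maps each PK (e.g. USER_ID) to its derived value (e.g. USER_NAME).
    """
    # Collect every PK whose derived value matches the filter.
    pks = sorted(pk for pk, derived in lookup.items() if derived == value)

    if not pks:
        return ("NONE",)  # no PK matches, so nothing to scan

    if len(pks) <= in_threshold:
        # Few matches: an exact IN condition is cheap enough.
        return ("IN", pks)

    # Too many matches: fall back to a single range over the PK.
    return ("BETWEEN", pks[0], pks[-1])


users = {1: "liyang", 2: "bob", 211: "liyang", 382: "liyang"}

# Three matches, threshold 5: USER_ID in (1, 211, 382)
print(translate_derived_filter(users, "liyang"))

# Same data, threshold 2: USER_ID between 1 and 382
print(translate_derived_filter(users, "liyang", in_threshold=2))
```

Note that the range form is looser than the IN form: USER_ID between 1 and 382
also covers IDs whose name is not 'liyang' (ID 2 in the sample data). How those
extra rows are handled is not covered in this thread.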


-- 
Regards,

*Bin Mahone | 马洪宾*
