Perhaps my question was not clear. The original column value is
'1_3_12_15_27_35', which cannot be used directly as a dimension value, so it
must be split into 6 values [1, 3, 12, 15, 27, 35]. These values are then
used to construct the rowkey, so each original record is expanded into 6
rows, which makes the data too big. Is there a way to read
'1_3_12_15_27_35' and automatically split it into 6 values in the distinct
column step and the other steps, then use these values to create the
dimension dictionary and rowkey, without preprocessing the original data?
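
A minimal sketch of the behavior described above, written in Python rather
than with anything Kylin provides. The names split_tags and distinct_tags are
hypothetical and used only for illustration. It assumes the packed value holds
integers separated by "_", and shows the distinct tag values being collected
while reading the original rows, without first writing one record per tag.

```python
from typing import Iterable, List, Set


def split_tags(raw: str, sep: str = "_") -> List[int]:
    """Split a packed value such as '1_3_12_15_27_35' into integers.

    Empty fragments (for example from a trailing separator) are skipped.
    """
    return [int(part) for part in raw.strip().split(sep) if part]


def distinct_tags(rows: Iterable[str], sep: str = "_") -> Set[int]:
    """Collect the distinct tag values across all rows.

    Each row is split in memory as it is read, so the input is never
    expanded into one stored record per tag.
    """
    seen: Set[int] = set()
    for raw in rows:
        seen.update(split_tags(raw, sep))
    return seen


if __name__ == "__main__":
    rows = ["1_3_12_15_27_35", "3_12_40"]
    print(split_tags(rows[0]))
    print(sorted(distinct_tags(rows)))
```

The size of the set returned by distinct_tags is the dimension cardinality in
this sketch, and the set itself is the input a dictionary would be built from.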

On Thu, Aug 18, 2016 at 6:47 PM, Li Yang <[email protected]> wrote:

> The answer depends on how you query/process the multi-value field.
>
> Could you share a sample query?
>
> On Wed, Aug 17, 2016 at 2:35 PM, 张天生 <[email protected]> wrote:
>
>> Can someone help me answer this question? I am still waiting for an
>> answer.
>>
>> On Mon, Aug 15, 2016 at 11:28 AM, 张天生 <[email protected]> wrote:
>>
>>> I have a dimension user_tags that is a multi-value column. For example,
>>> the value is "1_3_12_15_27_35_...", separated by "_". As far as I know,
>>> Kylin cannot process this multi-value column directly; it must be
>>> preprocessed into a single-value column, but that increases the record
>>> count by 50~100 times, and the data becomes too big. So is there a way to
>>> handle a multi-value dimension without splitting the value into many
>>> records? For example, when calculating dimension cardinality, Kylin could
>>> read the original data, automatically split the value into multiple
>>> values, and process them, which would save disk I/O and CPU time.
>>>
>>
>
