I understand what you mean, but it will be very slow during cube building.

Li Yang <[email protected]> wrote on Sat, Aug 27, 2016 at 8:52 PM:

> I think getting the data model right is the first thing. Expanding the
> multi-value column into multiple rows is the right approach. The concern
> that the result will be too big is then a secondary issue. There are
> plenty of ways to handle a big table. E.g. it can be a view that exists
> only temporarily during cube build and is deleted right after the build
> completes.
>
> On Thu, Aug 18, 2016 at 7:50 PM, 张天生 <[email protected]> wrote:
>
>> Perhaps you didn't understand my question. My question is: the original
>> column value is '1_3_12_15_27_35', but it cannot be used directly as a
>> dimension value, so it must be split into 6 values [1, 3, 12, 15, 27,
>> 35]. These values will be used to construct the rowkey, and the original
>> record will be expanded 6 times, which is too big. Is there a way to
>> read '1_3_12_15_27_35' and automatically split it into 6 values in the
>> distinct-column and other steps, use these values to build the dimension
>> dictionary and rowkey, and avoid preprocessing the original data?
>>
>> Li Yang <[email protected]> wrote on Thu, Aug 18, 2016 at 6:47 PM:
>>
>>> The answer depends on how you query/process the multi-value field.
>>>
>>> Could you share a sample query?
>>>
>>> On Wed, Aug 17, 2016 at 2:35 PM, 张天生 <[email protected]> wrote:
>>>
>>>> Can someone help me answer this question? I am still waiting for an
>>>> answer.
>>>>
>>>> 张天生 <[email protected]> wrote on Mon, Aug 15, 2016 at 11:28 AM:
>>>>
>>>>> I have a dimension user_tags. It is a multi-value column; for
>>>>> example, the value is "1_3_12_15_27_35_...", separated by "_". As
>>>>> far as I know, Kylin cannot process this multi-value column
>>>>> directly; it must be preprocessed into a single-value column, but
>>>>> that increases the record count 50~100 times, and the data becomes
>>>>> too big. So is there a way to handle a multi-value dimension without
>>>>> splitting the value into many records? When calculating dimension
>>>>> cardinality, Kylin could read the original data, automatically split
>>>>> the value into multiple values, and process them, which would save
>>>>> disk I/O and CPU overhead.
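The row expansion Li Yang suggests (a temporary view that explodes the multi-value column into one row per value) can be sketched as follows. In a real Kylin deployment this would typically be a Hive view, e.g. something along the lines of `SELECT t.*, tag FROM fact t LATERAL VIEW explode(split(user_tags, '_')) x AS tag;`; the Python below is only a minimal illustration of the same row-expansion semantics, and the table/column names are hypothetical.

```python
def explode_multi_value(rows, column, sep="_"):
    """Yield one copy of each row per value in the multi-value column.

    Mirrors Hive's LATERAL VIEW explode(split(col, sep)): a row whose
    `column` holds N separator-joined values becomes N rows, each with a
    single value in that column.
    """
    for row in rows:
        for value in row[column].split(sep):
            expanded = dict(row)       # copy so sibling rows stay independent
            expanded[column] = value   # replace multi-value with one value
            yield expanded


# Illustrative fact row: one record with a 3-value user_tags column.
rows = [{"user_id": 42, "user_tags": "1_3_12"}]
for r in explode_multi_value(rows, "user_tags"):
    print(r)
```

Note that this is exactly the blow-up the original question worries about: a record with 50~100 tags becomes 50~100 rows. Scoping the view to the cube-build window, as suggested above, limits how long that expanded data has to exist.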
