The max cardinality of the default dictionary is 2 million.
Why encode SALE_ORD_ID as a dictionary? If it is an int, you can use an integer dictionary.

Please check:
http://apache-kylin.74782.x6.nabble.com/create-dictionary-error-td7155.html
http://mail-archives.apache.org/mod_mbox/kylin-user/201702.mbox/%3CCAEcyM17BTkhVpFcZLP6%2Boawx%3D1eap%3DZS_ER1HJbhevJPBE71-g%40mail.gmail.com%3E
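
A rough back-of-envelope check on the numbers in this thread, as a sketch only. It is not how Kylin sizes a dictionary (a trie shares prefixes, so real sizes differ), the helper names are made up for illustration, and reading "2GB" as 2 * 1024**3 bytes is an assumption. It shows how far the reported cardinality is above the 2 million figure, how few bytes per value would fit under 2GB, and how many bytes a fixed-width integer code would need for that many distinct values.

```python
# Figures quoted in this thread.
CARDINALITY = 157_644_463        # SALE_ORD_ID cardinality reported in the quoted mail
DEFAULT_DICT_LIMIT = 2_000_000   # default dictionary limit as stated in the reply
MAX_DICT_BYTES = 2 * 1024**3     # assumption: "2GB" in the error means 2 GiB


def times_over_limit(cardinality: int, limit: int) -> float:
    """How many times the cardinality exceeds the stated limit."""
    return cardinality / limit


def byte_budget_per_value(cardinality: int, max_bytes: int) -> float:
    """Average bytes available per distinct value under the size cap."""
    return max_bytes / cardinality


def fixed_width_bytes(cardinality: int) -> int:
    """Bytes needed to give each distinct value its own fixed-width integer code."""
    bits = max(1, (cardinality - 1).bit_length())
    return (bits + 7) // 8


if __name__ == "__main__":
    print(f"over limit by: {times_over_limit(CARDINALITY, DEFAULT_DICT_LIMIT):.1f}x")
    print(f"budget per value: {byte_budget_per_value(CARDINALITY, MAX_DICT_BYTES):.1f} bytes")
    print(f"fixed-width code: {fixed_width_bytes(CARDINALITY)} bytes")
```

With only a dozen or so bytes of budget per value, a string dictionary over this column has little room, while a fixed-width integer code stays small.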



2017-02-14 10:14 GMT+01:00 仇同心 <[email protected]>:

> Hi all,
>
>   The first step of the cube merge fails with an error:
>
>
>
>    java.lang.RuntimeException: Too big dictionary, dictionary cannot be bigger than 2GB
>        at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:421)
>        at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:408)
>        at org.apache.kylin.dict.DictionaryGenerator$StringDictBuilder.build(DictionaryGenerator.java:165)
>        at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:81)
>        at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:73)
>        at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:102)
>        at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:268)
>        at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.mergeDictionaries(MergeDictionaryStep.java:145)
>        at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.makeDictForNewSegment(MergeDictionaryStep.java:135)
>        at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.doWork(MergeDictionaryStep.java:67)
>        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
>        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
>        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>        at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
>      "SALE_ORD_ID" cardinality: 157644463
>
>      Measure SALE: COUNT_DISTINCT, Value: SALE_ORD_ID, Type: column, bitmap
>
>
>
> I'm wondering: can't high-cardinality columns be used for precise
> count_distinct measures?
>
>
>
>
>
>
>
