Hi Joel,

Can this problem be reproduced consistently? I suspect it is an environment
issue. When building a cube, Kylin dumps the dependent resources
(metadata, dictionaries) from HBase to local disk and then submits them to
the MR nodes via 'tmpfiles'. If the resources weren't submitted correctly,
the mapper may get a broken dictionary.
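
One way to test this is to checksum the resource files dumped on the local
disk and compare them with the copies found in a task's working directory.
The following is only a rough sketch, not something Kylin provides: the
function names are hypothetical, and the two directories are whatever paths
your environment uses for the dump and for the task side.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(local_root: Path, remote_root: Path) -> list[str]:
    """List relative paths that are missing or differ between two trees."""
    local, remote = checksum_tree(local_root), checksum_tree(remote_root)
    return sorted(
        name
        for name in local.keys() | remote.keys()
        if local.get(name) != remote.get(name)
    )
```

An empty list from diff_trees means both sides hold byte-identical files; any
path it returns is a candidate for the broken dictionary.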

For the other two questions:

1) Duplicate values in the input file: this shouldn't normally happen, as
the reducer already removes the redundancy. The only possibility is that
the reducer was executed more than once, so that two reducer attempts wrote
the same file.
2) Even if 1) happens, the dictionary building step will remove the
redundancy again.
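
To illustrate point 2), here is a simplified stand-in for a sorted
dictionary. It is not Kylin's TrieDictionary code, and build_dictionary and
get_id are made-up names; it only shows why a duplicate in the input cannot
by itself make a value go missing when the builder de-duplicates first.

```python
from bisect import bisect_left


def build_dictionary(values):
    """Assign IDs by position in the sorted list of distinct values."""
    return sorted(set(values))


def get_id(dictionary, value):
    """Return the ID of value, or raise if the dictionary lacks it."""
    pos = bisect_left(dictionary, value)
    if pos == len(dictionary) or dictionary[pos] != value:
        raise ValueError(f"Value not exists: {value!r}")
    return pos


# Two of the IDs that appear in the log quoted below.
clean = [
    "20140225082303539-a91c9daed8602d1",
    "20160628082452279-7bdd009d55a794c",
]
with_duplicate = clean + ["20160628082452279-7bdd009d55a794c"]

# The duplicate changes nothing: both inputs yield the same dictionary.
assert build_dictionary(clean) == build_dictionary(with_duplicate)
```

So a "Value not exists" error at lookup time points to the mapper reading a
different (or damaged) dictionary rather than to the duplicate itself.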

Please check disk usage and the rest of the environment, and then retry.
Good luck!

2016-10-11 10:12 GMT+08:00 Joel Victor <[email protected]>:

> I am aware of this JIRA and have gone through it and the patch. I would
> like to point out that I am not facing the problem in step #4. I am
> facing it in step #17, i.e. build cube. Also, the cardinality of my
> column is not high, nor is it hitting the 1GB mark.
>
> I am using Kylin 1.5.2.1, but I don't think that should be a problem
> because I am not hitting any of the limits that would cause the problem.
>
> Thanks,
> -Joel
>
>
>
> On Mon, Oct 10, 2016 at 11:55 PM, Alberto Ramón <[email protected]> wrote:
>
>> Can you check this bug: KYLIN-1834
>> <https://issues.apache.org/jira/browse/KYLIN-1834>
>> Also check whether your current version is 1.5.2; there is a newer
>> version, 1.5.4.2.
>>
>> good luck, Alb
>>
>> 2016-10-10 18:38 GMT+02:00 Joel Victor <[email protected]>:
>>
>>> I have come across an error where I get an exception in the cube build
>>> step (step #17) which says that a particular key does not exist in the
>>> trie dictionary.
>>>
>>> The build dictionary step says that the value is present in the
>>> dictionary. I have deduced this by looking at the Kylin debug logs.
>>> The log follows:
>>> 2016-10-10 05:43:06,956 DEBUG [pool-5-thread-6]
>>> dict.DictionaryGenerator:86 : Dictionary value samples: =>0,
>>> 20160628082452279-7bdd009d55a794c=>17606,
>>> 20140225082303539-a91c9daed8602d1=>1,
>>> 20140225082452582-d55ca8b438418c4=>2,
>>> 20140225082509763-e9c208ceff68ea1=>3
>>>
>>>
>>> Also, I went back and checked the input for the build dimension
>>> dictionary step, in particular the -input parameter.
>>>
>>> I found that the ID for which the error surfaces occurs twice in
>>> the -input file.
>>>
>>> Is there a chance that duplicate values in the input file used to
>>> create the dimension dictionary would cause the error mentioned above
>>> in the build cube (#17) step?
>>>
>>> From what I can tell, there shouldn't be any duplicate values, since
>>> the step before that extracts distinct values.
>>>
>>> Following is the stack trace for the cube build step for reference:
>>>
>>> 2016-10-10 05:45:17,461 ERROR [Thread-11] org.apache.kylin.dict.TrieDictionary: Not a valid value: 20160628082452279-7bdd009d55a794c
>>> 2016-10-10 05:45:18,462 ERROR [pool-5-thread-1] org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder: Dogged Cube Build error
>>> java.io.IOException: java.lang.IllegalArgumentException: Value not exists!
>>>     at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.abort(DoggedCubeBuilder.java:193)
>>>     at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.checkException(DoggedCubeBuilder.java:166)
>>>     at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$BuildOnce.build(DoggedCubeBuilder.java:113)
>>>     at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder.build(DoggedCubeBuilder.java:72)
>>>     at org.apache.kylin.cube.inmemcubing.AbstractInMemCubeBuilder$1.run(AbstractInMemCubeBuilder.java:74)
>>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.lang.IllegalArgumentException: Value not exists!
>>>     at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>>>     at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>>>     at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>>>     at org.apache.kylin.dimension.DictionaryDimEnc$DictionarySerializer.serialize(DictionaryDimEnc.java:120)
>>>     at org.apache.kylin.cube.gridtable.CubeCodeSystem.encodeColumnValue(CubeCodeSystem.java:122)
>>>     at org.apache.kylin.cube.gridtable.CubeCodeSystem.encodeColumnValue(CubeCodeSystem.java:111)
>>>     at org.apache.kylin.gridtable.GTRecord.setValues(GTRecord.java:99)
>>>     at org.apache.kylin.gridtable.GTRecord.setValues(GTRecord.java:87)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilderInputConverter.convert(InMemCubeBuilderInputConverter.java:75)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(InMemCubeBuilder.java:540)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder$InputConverter$1.next(InMemCubeBuilder.java:521)
>>>     at org.apache.kylin.gridtable.GTAggregateScanner.iterator(GTAggregateScanner.java:133)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.createBaseCuboid(InMemCubeBuilder.java:337)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:164)
>>>     at org.apache.kylin.cube.inmemcubing.InMemCubeBuilder.build(InMemCubeBuilder.java:133)
>>>     at org.apache.kylin.cube.inmemcubing.DoggedCubeBuilder$SplitThread.run(DoggedCubeBuilder.java:281)
>>>
>>>
>>> Thanks,
>>>
>>> -Joel
>>>
>>>
>>>
>>
>


-- 
Best regards,

Shaofeng Shi 史少锋
