In 1.6 the property name is still in the old format, which is "kylin.hbase.region.count.min" in this case.
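For reference, a minimal kylin.properties sketch combining the two workarounds discussed below (the values here are illustrative, not tuned recommendations):

```properties
# Force more regions by raising the minimum region count
# (1.x-style property name, as noted above)
kylin.hbase.region.count.min=5

# Alternatively, inflate the estimated cuboid size so the estimate
# exceeds the split size (illustrative values, not recommendations)
kylin.job.cuboid.size.ratio=0.5
kylin.job.cuboid.size.memhungry.ratio=0.1
```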
2017-01-04 14:17 GMT+08:00 Billy Liu <[email protected]>:

> The cuboid size is an estimated result. There may be some improvements
> needed for some special cases. If you could share more about your model,
> such as measures and cardinality, it would be easier to figure out the
> root cause.
>
> In your case, increasing the ratio parameter is one way; another way is
> to increase the 'min-region-count' to force more regions.
>
> 2017-01-04 13:46 GMT+08:00 Da Tong <[email protected]>:
>
>> Hi,
>>
>> Thanks for all the replies.
>>
>> We are running Kylin 1.6.0 on Hadoop 2.7.2. The region.cut (5) and
>> hfile.size.gb (2) were left at their defaults. With about 230 million
>> records of data (about 800MB of raw data), we built a cube of size
>> 7.7GB. But in HBase, we could find only ONE region with ONE HFile in
>> the table.
>>
>> Sorry for not providing every detail. Our Kylin cluster is deployed in
>> a private datacenter with a very strict privacy policy, so I cannot
>> share details. I cannot even take a photo of the screen. That's sad. :(
>>
>> After some more investigation of the source code, we found that
>> HBaseMRSteps.createCreateHTableStep is actually invoked in
>> outputSide.addStepPhase2_BuildDictionary instead of
>> outputSide.addStepPhase3_BuildCube, thanks to Billy's reply.
>>
>> But our problem is still there. What we observed is that in
>> CreateHTableJob, Kylin does some estimation based on cuboid stats. In
>> the log, we found that the estimated cuboid size did not exceed the
>> split size, so there would be only one region. So we suspect there may
>> be some bias between the actual cuboid size and the estimated cuboid
>> size. Maybe we could try setting kylin.job.cuboid.size.ratio and
>> kylin.job.cuboid.size.memhungry.ratio to increase the estimated cuboid
>> size. I hope someone can tell me whether I am on the right track.
>>
>> Thanks.
>>
>> On Wed, Jan 4, 2017 at 9:29 AM ShaoFeng Shi <[email protected]> wrote:
>>
>> Tong, could you please provide some detailed information, like the
>> Kylin/Hadoop version, model/cube description, etc.? That would help us
>> with the analysis.
>>
>> 2017-01-03 19:59 GMT+08:00 Billy Liu <[email protected]>:
>>
>> The default region.cut is 5, and the default hfile.size.gb is 2. What
>> are your settings?
>>
>> 2017-01-03 19:33 GMT+08:00 Billy Liu <[email protected]>:
>>
>> Thanks Da Tong for the careful code check. But actually, both
>> BatchCubingJobBuilder and BatchCubingJobBuilder2 will call
>> HBaseMRSteps.createCreateHTableStep. The CreateHTableJob step will
>> calculate the regions from the split parameters.
>>
>> 2017-01-03 16:25 GMT+08:00 Da Tong <[email protected]>:
>>
>> Hi,
>>
>> We found that on Hadoop using MapReduce 2 with YARN, the number of
>> HFiles created by Kylin is always 1. After some investigation, we
>> suspect that in engine-mr, BatchCubingJobBuilder2 works differently
>> from BatchCubingJobBuilder. BatchCubingJobBuilder invokes
>> HBaseMRSteps.addSaveCuboidToHTableSteps, which includes calculating
>> the region size, but BatchCubingJobBuilder2 invokes
>> HBaseMRSteps.createConvertCuboidToHfileStep directly. I am not sure
>> whether this difference is by design or not. But what we see is that
>> we got a single 16GB HFile in a single region even though we set
>> kylin.hbase.region.cut and kylin.hbase.hfile.size.gb.
>>
>> --
>> TONG, Da / 佟达
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>> --
>> TONG, Da / 佟达

--
Best regards,

Shaofeng Shi 史少锋
