The cuboid size is an estimated result, so some special cases may still need
improvement. If you could share more about your model, such as the measures
and cardinality, it would be easier to figure out the root cause.
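Roughly speaking, the estimation-to-regions logic can be sketched like this (a simplified illustration, not Kylin's actual CreateHTableJob code; the method and variable names are hypothetical):

```java
public class RegionSplitSketch {
    // Simplified sketch: the estimated cube size (GB) is scaled by a
    // size ratio, then divided by the per-region cut size (GB); a minimum
    // region count can force more regions. Names are illustrative only.
    static int estimateRegionCount(double estimatedCubeGB, double sizeRatio,
                                   double regionCutGB, int minRegionCount) {
        double adjustedGB = estimatedCubeGB * sizeRatio;
        int regions = (int) Math.ceil(adjustedGB / regionCutGB);
        return Math.max(regions, minRegionCount);
    }

    public static void main(String[] args) {
        // An estimate below the 5 GB cut yields a single region...
        System.out.println(estimateRegionCount(4.0, 1.0, 5.0, 1)); // prints 1
        // ...raising the ratio pushes the estimate over the cut
        System.out.println(estimateRegionCount(4.0, 2.0, 5.0, 1)); // prints 2
        // ...or a minimum region count forces more regions directly
        System.out.println(estimateRegionCount(4.0, 1.0, 5.0, 5)); // prints 5
    }
}
```

So if the estimated size is biased low, the division never exceeds one region, which matches what you observed.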

In your case, increasing the ratio parameter is one way; another is to
increase the 'min-region-count' to force more regions.
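For example, the relevant kylin.properties entries look roughly like this (property names as used in Kylin 1.x; I'm assuming kylin.hbase.region.count.min is the 'min-region-count' setting, and the values below are illustrative, not recommendations):

```
# Scale up the estimated cuboid size so the split logic cuts more regions
kylin.job.cuboid.size.ratio=0.5
kylin.job.cuboid.size.memhungry.ratio=0.5

# Per-region cut size (GB) and HFile size cap (GB)
kylin.hbase.region.cut=5
kylin.hbase.hfile.size.gb=2

# Force a lower bound on the number of regions
kylin.hbase.region.count.min=5
```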

2017-01-04 13:46 GMT+08:00 Da Tong <[email protected]>:

> Hi,
>
> Thanks for all the reply.
>
> We are running Kylin 1.6.0 on Hadoop 2.7.2. The region.cut (5) and
> hfile.size.gb (2) were left at their defaults. With about 230 million
> records (about 800MB of raw data), we built a cube of size 7.7GB. But in
> HBase, we could only find ONE region with ONE HFile in the table.
>
> Sorry for not providing every detail. Our Kylin cluster is deployed in a
> private datacenter with a very strict privacy policy, so I could not share
> specifics. I could not even take a photo of the screen. That's sad. : (
>
> After some more investigation of the source code, we found that
> HBaseMRSteps.createCreateHTableStep is actually invoked
> in outputSide.addStepPhase2_BuildDictionary instead
> of outputSide.addStepPhase3_BuildCube, thanks to Billy's reply.
>
> But our problem is still there. What we observed is that
> in CreateHTableJob, Kylin does some estimation based on cuboid stats. In
> the log, we found that the estimated cuboid size did not exceed the split
> size, so there would be only one region. We suspect there may be some bias
> between the actual cuboid size and the estimated one. Maybe we could try
> setting kylin.job.cuboid.size.ratio and kylin.job.cuboid.size.memhungry.ratio
> to increase the estimated cuboid size. I hope someone can tell me whether
> I am on the right track.
>
> Thanks.
>
> On Wed, Jan 4, 2017 at 9:29 AM ShaoFeng Shi <[email protected]>
> wrote:
>
> Tong, could you please provide some detailed information, like the
> Kylin/Hadoop version, model/cube description, etc.? That would help us
> analyze.
>
> 2017-01-03 19:59 GMT+08:00 Billy Liu <[email protected]>:
>
> The default region.cut is 5, and the default hfile.size.gb is 2. What are
> your settings?
>
> 2017-01-03 19:33 GMT+08:00 Billy Liu <[email protected]>:
>
> Thanks Da Tong for the careful code check.
> But actually, both BatchCubingJobBuilder and BatchCubingJobBuilder2 will
> call HBaseMRSteps.createCreateHTableStep. The CreateHTableJob step will
> calculate the regions from the split parameters.
>
> 2017-01-03 16:25 GMT+08:00 Da Tong <[email protected]>:
>
> Hi,
>
> We found that on Hadoop using mapred2 with YARN, the number of HFiles
> created by Kylin is always 1. After some investigation, we suspect that in
> engine-mr, BatchCubingJobBuilder2 works differently from
> BatchCubingJobBuilder. BatchCubingJobBuilder will invoke
> HBaseMRSteps.addSaveCuboidToHTableSteps,
> which includes calculating the region size. But BatchCubingJobBuilder2
> invokes HBaseMRSteps.createConvertCuboidToHfileStep directly.
> I am not sure whether this difference is by design. But what we see is
> that we got a single 16GB HFile in a single region even though we set
> kylin.hbase.region.cut and kylin.hbase.hfile.size.gb.
>
> --
> TONG, Da / 佟达
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
>
>
