In 1.6 the property name is still in the old format, which is "kylin.hbase.region.count.min" in this case.
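For reference, a minimal kylin.properties sketch combining the two workarounds discussed below (the values here are illustrative, not tuned recommendations):

```properties
# Force more regions by raising the minimum region count
# (1.x-style property name, as noted above)
kylin.hbase.region.count.min=5

# Alternatively, inflate the estimated cuboid size so the estimate
# exceeds the split size (illustrative values, not recommendations)
kylin.job.cuboid.size.ratio=0.5
kylin.job.cuboid.size.memhungry.ratio=0.1
```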
2017-01-04 14:17 GMT+08:00 Billy Liu <[email protected]>:

> The cuboid size is an estimated result. There may be some improvements
> needed for some special cases. If you could share more about your model,
> such as measures and cardinality, it would be easier to figure out the
> root cause.
>
> In your case, increasing the ratio parameter is one way; another way is
> to increase the 'min-region-count' to force more regions.
>
> 2017-01-04 13:46 GMT+08:00 Da Tong <[email protected]>:
>
>> Hi,
>>
>> Thanks for all the replies.
>>
>> We are running Kylin 1.6.0 on Hadoop 2.7.2. The region.cut (5) and
>> hfile.size.gb (2) were left at their defaults. With about 230 million
>> records of data (about 800MB of raw data), we built a cube of size
>> 7.7GB. But in HBase, we could find only ONE region with ONE HFile in
>> the table.
>>
>> Sorry for not providing every detail. Our Kylin cluster is deployed in
>> a private datacenter with a very strict privacy policy, so I cannot
>> share details. I cannot even take a photo of the screen. That's sad. :(
>>
>> After some more investigation of the source code, we found that
>> HBaseMRSteps.createCreateHTableStep is actually invoked in
>> outputSide.addStepPhase2_BuildDictionary instead of
>> outputSide.addStepPhase3_BuildCube, thanks to Billy's reply.
>>
>> But our problem is still there. What we observed is that in
>> CreateHTableJob, Kylin does some estimation based on cuboid stats. In
>> the log, we found that the estimated cuboid size did not exceed the
>> split size, so there would be only one region. So we suspect there may
>> be some bias between the actual cuboid size and the estimated cuboid
>> size. Maybe we could try setting kylin.job.cuboid.size.ratio and
>> kylin.job.cuboid.size.memhungry.ratio to increase the estimated cuboid
>> size. I hope someone can tell me whether I am on the right track.
>>
>> Thanks.
>>
>> On Wed, Jan 4, 2017 at 9:29 AM ShaoFeng Shi <[email protected]> wrote:
>>
>> Tong, could you please provide some detailed information, like the
>> Kylin/Hadoop version, model/cube description, etc.? That would help us
>> with the analysis.
>>
>> 2017-01-03 19:59 GMT+08:00 Billy Liu <[email protected]>:
>>
>> The default region.cut is 5, and the default hfile.size.gb is 2. What
>> are your settings?
>>
>> 2017-01-03 19:33 GMT+08:00 Billy Liu <[email protected]>:
>>
>> Thanks Da Tong for the careful code check. But actually, both
>> BatchCubingJobBuilder and BatchCubingJobBuilder2 will call
>> HBaseMRSteps.createCreateHTableStep. The CreateHTableJob step will
>> calculate the regions from the split parameters.
>>
>> 2017-01-03 16:25 GMT+08:00 Da Tong <[email protected]>:
>>
>> Hi,
>>
>> We found that on Hadoop using MapReduce 2 with YARN, the number of
>> HFiles created by Kylin is always 1. After some investigation, we
>> suspect that in engine-mr, BatchCubingJobBuilder2 works differently
>> from BatchCubingJobBuilder. BatchCubingJobBuilder invokes
>> HBaseMRSteps.addSaveCuboidToHTableSteps, which includes calculating
>> the region size, but BatchCubingJobBuilder2 invokes
>> HBaseMRSteps.createConvertCuboidToHfileStep directly. I am not sure
>> whether this difference is by design or not. But what we see is that
>> we got a single 16GB HFile in a single region even though we set
>> kylin.hbase.region.cut and kylin.hbase.hfile.size.gb.
>>
>> --
>> TONG, Da / 佟达
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>> --
>> TONG, Da / 佟达

--
Best regards,

Shaofeng Shi 史少锋
