Thanks for the input. Did you enable any compression (e.g., LZO, Snappy) for HBase?

2017-08-08 0:49 GMT+08:00 Alexander Sterligov <[email protected]>:

> All parameters were at their defaults. I've found out that it is really
> related to the size estimation of the count distinct measure. The F2
> family was underestimated by about 4 times.
>
> After I set kylin.cube.size-estimate-countdistinct-ratio=0.2 the
> estimations are good and it works much better.
>
> It looks like the default value of 0.05 is too low for bitmap and global
> dictionary.
>
> Cube description is attached.
>
> On Mon, Aug 7, 2017 at 6:21 AM, ShaoFeng Shi <[email protected]>
> wrote:
>
>> Hi Alexander,
>>
>> Sometimes the size is over-estimated if the Cube has some complex
>> measures like count distinct and topn, but I have seldom heard of
>> under-estimation. Did you change other parameters in kylin.properties
>> which may affect the estimation? Besides, if you can share the Cube
>> definition, that would help (information like dimension/measure and
>> rowkey encoding also affects the region split).
>>
>> 2017-08-07 3:03 GMT+08:00 Alexander Sterligov <[email protected]>:
>>
>>> I've found out that sharding is done manually, so running split in the
>>> hbase shell breaks the data.
>>>
>>> So the main problem is that region-cut doesn't work on hbase with s3. I
>>> see in the log that it plans the shards properly:
>>>
>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.
>>>
>>> But then I get a single 20GB region.
>>>
>>> Has anyone seen the same behaviour?
>>>
>>> On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I noticed a very large hbase region for one segment (more than 20GB,
>>>> with kylin.storage.hbase.region-cut-gb=5). I don't know why it is so
>>>> large, but it degraded performance a lot, so I decided to split it in
>>>> hbase.
>>>>
>>>> As soon as the split started, kylin began to return empty results
>>>> for queries to this segment.
>>>>
>>>> Why can that happen?
>>>>
>>>> PS
>>>> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work
>>>> when an external hbase cluster is used.
>>>>
>>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>
--
Best regards,

Shaofeng Shi 史少锋
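
For reference, the numbers in the CreateHTableJob log lines quoted above can be reproduced with simple arithmetic. The following is only a rough sketch and not Kylin's actual code: the function name and the min/max region bounds are hypothetical, and the rounding rule is an assumption chosen because it matches the logged values (21334 MB estimated, region cut of 5 GB, 4 regions, 5333 MB per region).

```python
def expected_regions(total_size_mb, region_cut_gb, min_regions=1, max_regions=500):
    """Estimate region count and MB per region from an estimated cube size.

    min_regions / max_regions are hypothetical bounds used for illustration.
    """
    cut_mb = region_cut_gb * 1024           # region cut expressed in MB
    n = round(total_size_mb / cut_mb)       # nearest whole number of regions
    n = max(min_regions, min(max_regions, n))
    return n, int(total_size_mb / n)        # (regions, MB per region)


regions, mb_per_region = expected_regions(21334.075368547456, 5)
print(regions, mb_per_region)  # 4 5333
```

Because the region count is derived from the estimated size, an under-estimated count distinct measure feeds directly into this calculation, which is why tuning kylin.cube.size-estimate-countdistinct-ratio changed the outcome reported earlier in the thread.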
