Okay, so the default estimation ratio is too small for bitmap-type measures. Could you please open a JIRA with your findings? We can improve this in a future release. Thanks!
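For anyone following along, here is a rough sketch of the arithmetic behind the CreateHTableJob log lines quoted below. It assumes the region count is the estimated size divided by the region cut, rounded to the nearest integer and clamped to a minimum and maximum. The function name and the `min_regions`/`max_regions` defaults are hypothetical and only for illustration; they are not taken from the Kylin source.

```python
import math


def plan_regions(total_size_mb, region_cut_gb, min_regions=1, max_regions=500):
    """Estimate the region count and size per region from an estimated cube size.

    Assumption: count = round(size / cut), clamped to [min_regions, max_regions].
    The clamp bounds are illustrative placeholders, not confirmed Kylin defaults.
    """
    cut_mb = region_cut_gb * 1024
    # Round half up, so 4.17 becomes 4 and 4.5 becomes 5.
    n_regions = math.floor(total_size_mb / cut_mb + 0.5)
    n_regions = max(min_regions, min(max_regions, n_regions))
    # Truncate to whole megabytes per region.
    return n_regions, int(total_size_mb / n_regions)


# Values from the quoted log: 21334.075368547456 MB estimated, region-cut-gb=5.
regions, mb_per_region = plan_regions(21334.075368547456, 5)
print(regions, mb_per_region)  # 4 5333
```

Under that assumption the numbers reproduce the logged "Expecting 4 regions" and "Expecting 5333 MB per region". The plan is only as good as the size estimate it is fed, so an estimate that is too low for one column family yields too few split points.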
2017-08-08 12:56 GMT+08:00 Alexander Sterligov <[email protected]>:

> Yes, I'm using lz4.
>
> On Tue, Aug 8, 2017 at 4:15 AM, ShaoFeng Shi <[email protected]>
> wrote:
>
>> Thanks for the input. Did you enable any compression (e.g., LZO,
>> Snappy) for HBase?
>>
>> 2017-08-08 0:49 GMT+08:00 Alexander Sterligov <[email protected]>:
>>
>>> All parameters were default. I've found out that it is really related to
>>> the size estimation of the count distinct measure. The F2 family was
>>> underestimated by about 4 times.
>>>
>>> After I set kylin.cube.size-estimate-countdistinct-ratio=0.2 the
>>> estimations are good and it works much better.
>>>
>>> It looks like the default value of 0.05 is too low for bitmap and global
>>> dictionary.
>>>
>>> Cube description is attached.
>>>
>>> On Mon, Aug 7, 2017 at 6:21 AM, ShaoFeng Shi <[email protected]>
>>> wrote:
>>>
>>>> Hi Alexander,
>>>>
>>>> Sometimes the size is over-estimated if the Cube has some complex
>>>> measures like count distinct and topn, but I have seldom heard of
>>>> under-estimation. Did you change other parameters in kylin.properties
>>>> which may impact the estimation? Besides, if you can share the Cube
>>>> definition, that would help (information like dimension/measure and
>>>> rowkey encoding will also impact the region split).
>>>>
>>>> 2017-08-07 3:03 GMT+08:00 Alexander Sterligov <[email protected]>:
>>>>
>>>>> I've found out that sharding is done manually, so running split in
>>>>> hbase shell breaks data.
>>>>>
>>>>> So the main problem is that region-cut doesn't work on hbase with s3.
>>>>> I see that in the log it creates shards properly:
>>>>>
>>>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:192 : Total size 21334.075368547456M (estimated)
>>>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:193 : Expecting 4 regions.
>>>>> 2017-08-05 20:54:48,709 INFO [Job 1175d3ed-504f-4eb0-a973-d57338fdff2c-892] steps.CreateHTableJob:194 : Expecting 5333 MB per region.
>>>>>
>>>>> But then I get a single 20GB region.
>>>>>
>>>>> Has anyone seen the same behaviour?
>>>>>
>>>>> On Sun, Aug 6, 2017 at 8:15 PM, Alexander Sterligov <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> hi,
>>>>>>
>>>>>> I noticed a very large hbase region for one segment (more than 20GB
>>>>>> with kylin.storage.hbase.region-cut-gb=5). I don't know why it is so
>>>>>> large, but anyway it degraded performance a lot, so I decided to
>>>>>> split it in hbase.
>>>>>>
>>>>>> As soon as the split started, kylin began to return empty results
>>>>>> for queries to this segment.
>>>>>>
>>>>>> Why can that happen?
>>>>>>
>>>>>> PS
>>>>>> It seems to me that kylin.storage.hbase.region-cut-gb doesn't work
>>>>>> when an external hbase cluster is used.
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋

--
Best regards,

Shaofeng Shi 史少锋
