Make sure the number of regions is at least the number of physical disks in the cluster; if not, pre-split or salt the table. Do the math based on row size, target performance, disk throughput, number of regions, etc., and if necessary add servers or disks. Also look at HBase cache settings, JVM heap sizes, GC settings, and so on. Depending on the data, compression can improve performance: Snappy typically compresses less than gzip, but at lower CPU cost. Gzip can achieve pretty high ratios, but writes are more costly than reads, so major compactions can get backed up. For typical data, either will likely increase read throughput. Depending on how often rows are updated, removed, or added, change the default HBase major compaction interval, or force a major compaction after large updates. Also, unless counting rows is your actual use case, don't worry about how long it takes to count them; base expectations on your expected use cases. With the overhead of Phoenix query parsing, threading in the client, etc., you probably won't do much better than sub-second on aggregates over 1M+ rows.
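The salting and compression advice above can be sketched in Phoenix DDL. This is a minimal example, not taken from the thread: the table and column names are hypothetical, and the bucket count should match your own cluster (roughly the number of region servers/disks).

```sql
-- Hypothetical table sketch: SALT_BUCKETS pre-splits the table and spreads
-- writes across 16 regions; COMPRESSION is passed through to the underlying
-- HBase column family. Tune both to your cluster, not these example values.
CREATE TABLE IF NOT EXISTS MARKETDATA (
    INSTRUMENT_ID VARCHAR   NOT NULL,
    TRADE_TS      TIMESTAMP NOT NULL,
    PRICE         DECIMAL,
    CONSTRAINT PK PRIMARY KEY (INSTRUMENT_ID, TRADE_TS)
) SALT_BUCKETS = 16, COMPRESSION = 'SNAPPY';
```

Note that SALT_BUCKETS cannot be changed after table creation, so do the sizing math first; Snappy vs. gzip is the CPU-vs-ratio trade-off described above.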
> On Oct 31, 2016, at 5:19 PM, Fawaz Enaya <m.fawaz.en...@gmail.com> wrote:
>
> Thanks for your answer, but why does it give 1-way parallel and cannot be more?
>
>> On Sunday, 30 October 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> If you create a secondary index in Phoenix on the table, on single or
>> selected columns, that index (which will be added to HBase) will be used to
>> return data. For example, below MARKETDATAHBASE_IDX1 is an index on table
>> MARKETDATAHBASE and is used by the query:
>>
>> 0: jdbc:phoenix:rhes564:2181> EXPLAIN select count(1) from MARKETDATAHBASE;
>> +--------------------------------------------------------------------+
>> | PLAN                                                               |
>> +--------------------------------------------------------------------+
>> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER MARKETDATAHBASE_IDX1  |
>> | SERVER FILTER BY FIRST KEY ONLY                                    |
>> | SERVER AGGREGATE INTO SINGLE ROW                                   |
>> +--------------------------------------------------------------------+
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>>> On 30 October 2016 at 11:42, Fawaz Enaya <m.fawaz.en...@gmail.com> wrote:
>>> Hi all in this great project,
>>>
>>> I have an HBase cluster of four nodes. I use Phoenix to access HBase, but I
>>> do not know why it is so slow to execute SELECT count(*) for a table
>>> containing 5 million records: it takes 8 seconds.
>>> Below is the explain plan for my select statement:
>>>
>>> CLIENT 6-CHUNK 9531695 ROWS 629145639 BYTES PARALLEL 1-WAY FULL SCAN OVER TABLE |
>>> | SERVER FILTER BY FIRST KEY ONLY |
>>> | SERVER AGGREGATE INTO SINGLE ROW
>>>
>>> Can anyone help?
>>>
>>> Many thanks
>>> --
>>> Thanks & regards,
>
> --
> Thanks & regards,
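For reference, the secondary-index approach Mich describes above could be set up like this. The index name and table name are the ones from his EXPLAIN output; the indexed column (TICKER) is a hypothetical placeholder, since the thread does not show the table's schema.

```sql
-- Sketch only: create a global secondary index on one column. The index
-- table is typically much narrower than the base table, so a count(1)
-- that scans it reads far fewer bytes than a full scan of the base table.
CREATE INDEX MARKETDATAHBASE_IDX1 ON MARKETDATAHBASE (TICKER);

-- Confirm the optimizer now scans the index instead of the base table:
EXPLAIN SELECT count(1) FROM MARKETDATAHBASE;
```

The plan should then report a FULL SCAN OVER MARKETDATAHBASE_IDX1, as in Mich's example; the degree of parallelism (the N-WAY figure) still depends on how the scanned table is split into regions and guideposts, which is why salting or pre-splitting matters for the earlier question.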