Guys, 

You have two issues. 

1) Physical structure and organization.
2) Logical organization and data usage. 

This goes to the question of your data access pattern and use case. 

The best example of how to use Column Families that I can think of is an order 
entry system. 
Here you would have something like 4-5 CFs (Order, Pick Slips, Shipping, 
Invoice, and maybe metadata).

Note that while there is some overlap of the data between CFs, this layout 
means a query only has to read one CF… maybe 2 if you’re accessing the 
metadata and it’s stored separately. 
(THIS IS NOT NECESSARILY A RELATIONAL MODEL)
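
A minimal sketch of that access pattern, for illustration only. This is a toy 
in-memory model, not the HBase client API: the ToyTable class, the CF names, 
the row key and the column values are all hypothetical. It only shows that 
when each CF carries the (partly duplicated) data its use case needs, a read 
for that use case touches a single CF store.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with column families (NOT the HBase API)."""

    def __init__(self, families):
        # One separate store per column family, loosely mirroring the idea
        # that each CF's data is kept apart from the others.
        self.stores = {cf: defaultdict(dict) for cf in families}
        self.stores_read = 0  # how many CF stores have been touched by reads

    def put(self, row_key, family, qualifier, value):
        self.stores[family][row_key][qualifier] = value

    def get(self, row_key, families):
        # Read only the requested families; every family costs one store read.
        result = {}
        for cf in families:
            self.stores_read += 1
            result[cf] = dict(self.stores[cf].get(row_key, {}))
        return result


# Hypothetical order-entry table: the customer name is duplicated across CFs
# so that each use case can be served from its own CF.
orders = ToyTable(["order", "pick_slip", "shipping", "invoice"])
orders.put("order-1001", "order", "customer", "ACME")
orders.put("order-1001", "order", "total", "250.00")
orders.put("order-1001", "shipping", "customer", "ACME")
orders.put("order-1001", "shipping", "carrier", "UPS")
orders.put("order-1001", "invoice", "customer", "ACME")
orders.put("order-1001", "invoice", "amount_due", "250.00")

# The billing use case reads the invoice CF and nothing else.
invoice_view = orders.get("order-1001", ["invoice"])
print(invoice_view)
print("CF stores read:", orders.stores_read)
```

If the invoice CF did not carry its own copy of the customer name, the same 
read would have to ask for both the order and invoice CFs, and the counter 
would go to 2. That is the trade-off: some duplicated data in exchange for 
single-CF reads.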

I’m sure that there are other models that could be used as an example, but this 
is one that any classically trained database developer would understand. 
(Reservation Systems, Medical Billing, … could also be used.) 

So, apart from the physical issues of HBase managing N CFs per table, you still 
have to deal with the design issue of when to use a CF. 
One of the first and most common mistakes is to think about HBase in terms of a 
relational database. It’s not. Thinking of CFs as analogous to tables in the 
relational model will kill your performance. 


Please understand that Otis’ question raises both issues (physical design and 
logical design). 

The answer to Otis’ question is: it depends… 
There are a couple of factors, and you need to approach this on a case-by-case 
basis. 

Please refrain from blogging about it until you understand the overall issue 
better. 

But hey! What do I know?  ;-) 

-Mike



On Jan 7, 2015, at 10:42 PM, Otis Gospodnetic <[email protected]> 
wrote:

> Thanks Ted!
> 
> So with HBASE-10201 in place, would N sparsely populated CFs with the same
> key structure ever be a better choice than a single densely populated CF
> with the same key structure?
> 
> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
> 
> 
> On Wed, Jan 7, 2015 at 12:31 PM, Ted Yu <[email protected]> wrote:
> 
>> Please see HBASE-10201 which would come in 1.1.0 release.
>> 
>> Cheers
>> 
>> On Wed, Jan 7, 2015 at 9:10 AM, Otis Gospodnetic <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> I recently came across this good thread about 1 vs. N ColumnFamilies, the
>>> max recommended number of CFs, dense vs. sparse structure, etc. --
>>> http://search-hadoop.com/m/TozMw1jqh262
>>> 
>>> This thread is from 2013. Even though people say HBase should handle more
>>> than 3 CFs, the docs still recommend to stick to 2-3 CFs.  Is that still
>>> the case?
>>> 
>>> See http://hbase.apache.org/book.html#number.of.cfs
>>> 
>>> Also, the thread talks about lumpy CFs and the fact that all CFs would have
>>> to be flushed whenever any one of them triggers compaction..... but I
>>> remember something being changed in this space a while back.  No?
>>> 
>>> Thanks,
>>> Otis
>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>> 
>> 
