Hi folks,

We have an HBase table with 4 column families that stores log data. The columns and the content stored in each of these column families are the same. The reason for having multiple families is that we needed 4 retention buckets for messages, and since TTL is set per column family, we use HBase's TTL feature with one family per bucket to achieve this. Each HBase row has a predefined set of meta fields and a large blob message.
I am considering restructuring the table to use 2 column families: one for the metadata (cf1) and the other for the blob message (cf2), which is the meatier chunk. The reasoning is that most of the analytics queries would be directed at the metadata in cf1, and only a few at the blob message in cf2. There will be a few use cases that need to query the data in both cf1 and cf2, but that is not the dominant use case. Since both families would then hold messages from all 4 retention buckets, we could no longer rely on TTL, so we would devise some method to purge the data manually, using a retention bucket + timestamp in the row key.

How does this look so far? Is there a better way?

Thanks,
Nishanth
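
P.S. To make the manual purge idea a bit more concrete, here is a rough sketch of the row key layout I have in mind. It is plain Python with no HBase client involved, and everything in it is an assumption for illustration only: the bucket ids, the retention periods, the 1-byte bucket + 8-byte timestamp layout, and the helper names `make_row_key` and `expired_range`. The idea is that a purge job could scan the returned key range for each bucket and delete whatever it finds.

```python
import struct
from typing import Tuple

# Hypothetical retention periods in days, one per bucket id.
# The real values for our 4 buckets are not fixed here.
RETENTION_DAYS = {0: 7, 1: 30, 2: 90, 3: 365}
MS_PER_DAY = 24 * 60 * 60 * 1000


def make_row_key(bucket: int, ts_ms: int, msg_id: bytes) -> bytes:
    """Build a row key: 1-byte bucket + 8-byte big-endian timestamp + message id.

    Big-endian unsigned integers sort bytewise in numeric order, so rows
    within a bucket are ordered by timestamp.
    """
    return struct.pack(">BQ", bucket, ts_ms) + msg_id


def expired_range(bucket: int, now_ms: int) -> Tuple[bytes, bytes]:
    """Return the [start, stop) row key range of expired rows in one bucket."""
    cutoff_ms = max(now_ms - RETENTION_DAYS[bucket] * MS_PER_DAY, 0)
    start = struct.pack(">BQ", bucket, 0)
    stop = struct.pack(">BQ", bucket, cutoff_ms)
    return start, stop
```

A purge job would call `expired_range(bucket, now_ms)` for each bucket and delete the rows in that range. With this layout a row is treated as expired only when its timestamp is strictly older than the cutoff.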