Hello, Stack,
Thank you for giving me advice.
But your boss seems rather to be criticizing the fact that our system
is made of components. In software engineering, this is usually
considered a strength. As to 'roles', one of the Bigtable authors
argues that a cluster of master and slaves makes for simpler systems
[1].
I definitely agree with you. However, my boss judges simplicity from
the users' viewpoint: more components make the system more complex for
users.
If you have big data, you are running hadoop already? If so, you
already have hdfs in place. If you have big systems and are trying to
do HA, you'll have a zookeeper ensemble in place already or at least,
if you haven't, you are likely considering it? In this case, HBase
complements your existing infrastructure?
Our potential customers will be building new systems for new big data, so
they are not using Hadoop yet.
Out of the box, HBase can manage zookeeper for you. There is an issue
to make it so that when you 'start' hbase, it starts hdfs too by default.
Would that help?
This might help if it spares users, as much as possible, from having to
start, stop, and initialize HDFS and its daemons themselves.
I've tripped over blog posts where zookeeper is grafted to a cassandra
ensemble to add facility that is inherent to hbase; e.g. locks and
counters (though in the first case, zk is not suitable for cluster
locks and yes, I've also read of the patch to do counters using vector
clocks).
Yes, I know that Cassandra currently requires users to run ZooKeeper in
order to implement counters. I'm afraid that if counters based on vector
clocks are implemented, one of HBase's merits (not needing another piece of
software just for counters) will be lost.
Why can't you do the same in hbase? Hash sensor_id + ts and have it
distributed across all of the cluster and have no 'hotspot'?
Yes, hashing the sensor_id may distribute inserts well even right after a
new table is created, provided the table is pre-split with the following
HBaseAdmin method (this method is not documented). I'll consider how to
make use of it.
    public void createTable(HTableDescriptor desc, byte [] startKey,
        byte [] endKey, int numRegions)
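To make the idea concrete, here is a rough sketch of how such a row key might be built. It is only my assumption of one possible layout, not something taken from HBase itself: the class name SensorRowKey, the 4-byte prefix length, and the choice of MD5 are all hypothetical. The key starts with a few bytes of the hash of sensor_id, so different sensors scatter across the key space, and it ends with the timestamp, so readings from one sensor stay adjacent and time-ordered.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SensorRowKey {
    // Hypothetical layout (an assumption for illustration):
    // [4-byte MD5 prefix of sensorId][sensorId bytes][8-byte big-endian timestamp]
    static final int PREFIX_LEN = 4;

    public static byte[] rowKey(String sensorId, long timestampMillis) {
        byte[] id = sensorId.getBytes(StandardCharsets.UTF_8);
        byte[] digest = md5(id);
        ByteBuffer buf = ByteBuffer.allocate(PREFIX_LEN + id.length + Long.BYTES);
        buf.put(digest, 0, PREFIX_LEN); // hashed prefix spreads sensors over the key space
        buf.put(id);                    // full id keeps keys unique if prefixes collide
        buf.putLong(timestampMillis);   // big-endian, so non-negative timestamps sort in time order
        return buf.array();
    }

    private static byte[] md5(byte[] input) {
        try {
            return MessageDigest.getInstance("MD5").digest(input);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The resulting byte array would be used as the row key of each insert. Because the prefix depends only on sensor_id, a scan over one sensor's readings remains a contiguous range; hashing sensor_id together with the timestamp would spread writes even further, but at the cost of losing that range scan.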
At stumbleupon we have a sensor database built on hbase. Purportedly
it's to be made open. Watch this list for announcements.
Sounds interesting. I'll look forward to that announcement.
Regards,
Maumau