On Sun, Sep 5, 2010 at 12:07 AM, Jonathan Gray <[email protected]> wrote:
>> > But your boss seems rather to be criticizing the fact that our system
>> > is made of components. In software engineering, this is usually
>> > considered a strength. As to 'roles', one of the bigtable authors
>> > argues that a cluster of master and slaves makes for simpler systems
>> > [1].
>>
>> I definitely agree with you. However, my boss considers the simplicity
>> from
>> the users' viewpoint. More components make the system more complex for
>> users.
>
> Who are the users? Are they deploying the software and responsible for
> maintaining backend databases?
>
> Or are there backend developers, frontend developers, operations, etc?
>
> In my experience, the "users" are generally writing the applications, not
> maintaining databases. And in the case of HBase, as has been said already
> on this thread, users generally have an easier time with the data and
> consistency models.
>
> Above all, I think the point made by Stack earlier is extremely relevant.
> Are you using HDFS already? Do you have needs for ZK? When you do, HBase is
> an additional piece of this stack and generally fits in nicely. From an
> admin/ops POV, the learning curve is minimal once familiar with these other
> systems. And even if you aren't already using Hadoop, might you in the
> future?
>
> If you don't and never will, then the single-component nature of Cassandra
> may be more appealing.
>
> Also, vector clocks are nice but are still a distributed algorithm. We've
> been doing lots of work benchmarking and optimizing increments recently,
> pushing extremely high throughput on relatively small clusters. I would not
> expect being able to achieve this level of performance or concurrency with
> any kind of per-counter distribution. Certainly not while providing the
> strict atomicity and consistency guarantees that HBase provides.
>
> I've never implemented counters w/ vector clocks so I could be wrong. But I
> do know that I could explain how we implement counters in a performant,
> consistent, atomic way and you wouldn't have to reach for Wikipedia once ;)
>
> JG
>
(I agree with just about everything on this thread except point 1)
If this were a black and white issue, you would be either a user or a
developer. At this point, both Cassandra and HBase are at the stage
where very few people are pure users. I feel that if you are checking
out beta versions, applying patches to a source tree, watching issues,
and upgrading three times a year, you are more of a developer than a
user.
Modular software is great. But if two programs perform roughly the
same function, and one is 7 pieces while the other is 1, it is hard to
make the case that modular is better.
cd /home/edward/hadoop/hadoop-0.20.2/src/
[edw...@ec src]$ find . | wc -l
2683
[edw...@ec apache-cassandra-0.6.3-src]$ find . | wc -l
609
I have been working with Hadoop for a while now. There is a ticket I
wanted to work on: reading the Hadoop configuration from LDAP. I
figured this would be a relatively quick task. After all, the Hadoop
conf is just a simple XML file with name/value pairs...
[edw...@ec core]$ cd org/apache/hadoop/conf/
[edw...@ec conf]$ ls
Configurable.java Configuration.java Configured.java package.html
[edw...@ec conf]$ wc -l Configuration.java
1301 Configuration.java
Holy crud! Now a good portion of this file is comments, but still:
1301 lines to read and write XML files! The Hadoop conf has tons of
stuff to do variable interpolation, XInclude support, the ability to
read configurations as streams, and handling for deprecated config
file names.
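For what it's worth, the variable interpolation feature looks like this in practice: a ${...} reference in one property's value gets expanded from another property when the value is read. (The property names and paths below are made up just to illustrate; they aren't from any real config.)

```xml
<configuration>
  <property>
    <name>data.root</name>
    <value>/data/hadoop</value>
  </property>
  <property>
    <!-- ${data.root} is expanded by Configuration when this value is read -->
    <name>mapred.local.dir</name>
    <value>${data.root}/mapred/local</value>
  </property>
</configuration>
```

Handy, sure, but each of these conveniences is another chunk of code in that 1301-line file.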
There is a method in Configuration with this signature:

  public <U> Class<? extends U> getClass(String name,
                                         Class<? extends U> defaultValue,
                                         Class<U> xface) {
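To show what a generics-bounded lookup like that actually does, here is a minimal, self-contained sketch. This is NOT Hadoop's code; the ConfSketch class, the Codec interface, and the "example.codec.class" property name are all invented for illustration. The idea is the same: resolve a class name from a property, fall back to a default, and verify the result implements the expected interface.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfSketch {
    private final Map<String, String> props = new HashMap<>();

    public void set(String name, String value) {
        props.put(name, value);
    }

    // Mirrors the shape of the signature quoted above: look up a class by
    // property name, fall back to defaultValue when unset, and check that
    // the loaded class implements xface.
    public <U> Class<? extends U> getClass(String name,
                                           Class<? extends U> defaultValue,
                                           Class<U> xface) {
        String className = props.get(name);
        if (className == null) {
            return defaultValue;
        }
        try {
            Class<?> cls = Class.forName(className);
            // asSubclass throws ClassCastException if cls does not
            // implement xface, enforcing the type bound at runtime.
            return cls.asSubclass(xface);
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    // Hypothetical interface/implementations for the demo.
    interface Codec {}
    static class GzipCodec implements Codec {}
    static class DefaultCodec implements Codec {}

    public static void main(String[] args) {
        ConfSketch conf = new ConfSketch();
        conf.set("example.codec.class", "ConfSketch$GzipCodec");

        // Property is set: the named class is loaded and type-checked.
        Class<? extends Codec> c =
            conf.getClass("example.codec.class", DefaultCodec.class, Codec.class);
        System.out.println(c.getSimpleName());

        // Property is unset: falls back to the default.
        Class<? extends Codec> d =
            conf.getClass("missing.key", DefaultCodec.class, Codec.class);
        System.out.println(d.getSimpleName());
    }
}
```

Perfectly reasonable engineering, but a caller who "just wants a config value" now has three type parameters to think about.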
My point is that all the modularity and flexibility does not translate
into much for end users, and for developers who just want to jump in,
I would rather jump into 600 files than 2600 (by the way, that is NOT
including HBase).