Re: cassandra vs. mongodb quick question(good additional info)

Wojciech Meler Wed, 20 Feb 2013 14:31:57 -0800

you have 86400 seconds a day so 42T could take less than 12 hours on 10Gb
link
19 lut 2013 02:01, "Hiller, Dean" <dean.hil...@nrel.gov> napisał(a):


> I thought about this more, and even with a 10Gbit network, it would take
> 40 days to bring up a replacement node if mongodb did truly have a 42T /
> node like I had heard.  I wrote the below email to the person I heard this
> from going back to basics which really puts some perspective on it….(and a
> lot of people don't even have a 10Gbit network like we do)
>
> Nodes are hooked up by a 10G network at most right now where that is
> 10gigabit.  We are talking about 10Terabytes on disk per node recently.
>
> Google "10 gigabit in gigabytes" gives me 1.25 gigabytes/second  (yes I
> could have divided by 8 in my head but eh…course when I saw the number, I
> went duh)
>
> So trying to transfer 10 Terabytes  or 10,000 Gigabytes to a node that we
> are bringing online to replace a dead node would take approximately 5
> days???
>
> This means no one else is using the bandwidth too ;).  10,000Gigabytes * 1
> second/1.25 * 1hr/60secs * 1 day / 24 hrs = 5.555555 days.  This is more
> likely 11 days if we only use 50% of the network.
>
> So bringing a new node up to speed is more like 11 days once it is
> crashed.  I think this is the main reason the 1Terabyte exists to begin
> with, right?
>
> From an ops perspective, this could sound like a nightmare scenario of
> waiting 10 days…..maybe it is livable though.  Either way, I thought it
> would be good to share the numbers.  ALSO, that is assuming the bus with
> it's 10 disk can keep up with 10G????  Can it?  What is the limit of
> throughput on a bus / second on the computers we have as on wikipedia there
> is a huge variance?
>
> What is the rate of the disks too (multiplied by 10 of course)?  Will they
> keep up with a 10G rate for bringing a new node online?
>
> This all comes into play even more so when you want to double the size of
> your cluster of course as all nodes have to transfer half of what they have
> to all the new nodes that come online(cassandra actually has a very data
> center/rack aware topology to transfer data correctly to not use up all
> bandwidth unecessarily…I am not sure mongodb has that).  Anyways, just food
> for thought.
>
> From: aaron morton <aa...@thelastpickle.com<mailto:aa...@thelastpickle.com
> >>
> Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" <
> user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
> Date: Monday, February 18, 2013 1:39 PM
> To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>" <
> user@cassandra.apache.org<mailto:user@cassandra.apache.org>>, Vegard
> Berget <p...@fantasista.no<mailto:p...@fantasista.no>>
> Subject: Re: cassandra vs. mongodb quick question
>
> My experience is repair of 300GB compressed data takes longer than 300GB
> of uncompressed, but I cannot point to an exact number. Calculating the
> differences is mostly CPU bound and works on the non compressed data.
>
> Streaming uses compression (after uncompressing the on disk data).
>
> So if you have 300GB of compressed data, take a look at how long repair
> takes and see if you are comfortable with that. You may also want to test
> replacing a node so you can get the procedure documented and understand how
> long it takes.
>
> The idea of the soft 300GB to 500GB limit cam about because of a number of
> cases where people had 1 TB on a single node and they were surprised it
> took days to repair or replace. If you know how long things may take, and
> that fits in your operations then go with it.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/02/2013, at 10:08 PM, Vegard Berget <p...@fantasista.no<mailto:
> p...@fantasista.no>> wrote:
>
>
>
> Just out of curiosity :
>
> When using compression, does this affect this one way or another?  Is 300G
> (compressed) SSTable size, or total size of data?
>
> .vegard,
>
> ----- Original Message -----
> From:
> user@cassandra.apache.org<mailto:user@cassandra.apache.org>
>
> To:
> <user@cassandra.apache.org<mailto:user@cassandra.apache.org>>
> Cc:
>
> Sent:
> Mon, 18 Feb 2013 08:41:25 +1300
> Subject:
> Re: cassandra vs. mongodb quick question
>
>
> If you have spinning disk and 1G networking and no virtual nodes, I would
> still say 300G to 500G is a soft limit.
>
> If you are using virtual nodes, SSD, JBOD disk configuration or faster
> networking you may go higher.
>
> The limiting factors are the time it take to repair, the time it takes to
> replace a node, the memory considerations for 100's of millions of rows. If
> you the performance of those operations is acceptable to you, then go crazy.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com<http://www.thelastpickle.com/>
>
> On 16/02/2013, at 9:05 AM, "Hiller, Dean" <dean.hil...@nrel.gov<mailto:
> dean.hil...@nrel.gov>> wrote:
>
> So I found out mongodb varies their node size from 1T to 42T per node
> depending on the profile.  So if I was going to be writing a lot but rarely
> changing rows, could I also use cassandra with a per node size of +20T or
> is that not advisable?
>
> Thanks,
> Dean
>
>
>

Re: cassandra vs. mongodb quick question(good additional info)

Reply via email to