I am not sure, but I think the problem might be the order-preserving
partitioner you are using. With an order-preserving partitioner the data can
become skewed, meaning most of it ends up on only a few servers, and those
few servers end up carrying a heavy load.
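
Here is a rough Python sketch of the effect (the key names, token boundaries,
and four-node ring below are made up purely for illustration; this is not real
Cassandra code):

    import hashlib
    from bisect import bisect_left

    keys = ["event42:%05d" % i for i in range(10000)]  # hypothetical keys sharing a prefix
    nodes = ["n1", "n2", "n3", "n4"]

    # Order-preserving: the ring is split alphabetically, so every key that
    # starts with "e" sorts into the same token range and lands on one node.
    opp_tokens = ["d", "l", "r", "z"]  # made-up token boundaries
    opp_counts = {n: 0 for n in nodes}
    for k in keys:
        opp_counts[nodes[bisect_left(opp_tokens, k[0])]] += 1

    # Hash-based (random) partitioning: an md5 of the key spreads the same
    # rows roughly evenly across the nodes.
    rp_counts = {n: 0 for n in nodes}
    for k in keys:
        h = int(hashlib.md5(k.encode()).hexdigest(), 16)
        rp_counts[nodes[h % len(nodes)]] += 1

    print("order-preserving:", opp_counts)  # one node gets all 10000 keys
    print("hash-based:      ", rp_counts)   # roughly 2500 per node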

On Mon, Aug 29, 2011 at 7:24 AM, Ryan Lowe <ryanjl...@gmail.com> wrote:

> Edward,
>
> This information (and the presentation) was very helpful... but I still
> have a few more questions.
>
> During a test run, I brought up 16 servers with RF of 2 and Read Repair
> Chance of 1.0.  However, as I mentioned, the load was only on a few
> servers.  I tried increasing the key and row caching to completely
> cover our entire dataset, but CPU load on those few servers was still
> extremely high.  Does the cache exist on every server in the cluster?  Would
> turning off Read Repair (or turning it down dramatically) help reduce the
> load on the heavily loaded servers?
>
>
> Thanks!
> Ryan
>
>
> On Sun, Aug 28, 2011 at 3:49 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> So my question is this... if I bring in 20+ nodes, should I increase the
>> replication factor as well?
>>
>> Each write is sent to all natural endpoints of the data, so if you set the
>> replication factor equal to the number of nodes, write operations do not
>> scale: every write has to happen on every node. The same is true for read
>> operations; even if you read at CL.ONE, read repair will perform a read on
>> all replicas.
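>>
>> As a rough back-of-the-envelope sketch of why (the numbers are made up, and
>> this is just an approximate model, not Cassandra code):
>>
>>     def per_node_writes(client_writes_per_sec, num_nodes, rf):
>>         # Every write is applied on RF replicas, so the node-level write
>>         # volume is client_writes * RF, spread across num_nodes.
>>         return client_writes_per_sec * rf / num_nodes
>>
>>     print(per_node_writes(10000, 20, 3))   # 1500.0 writes/sec per node
>>     print(per_node_writes(10000, 20, 20))  # 10000.0 -- every node sees every write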
>>
>> However, there is one caveat to this advice, which I covered in this
>> presentation:
>>
>> http://www.slideshare.net/edwardcapriolo/cassandra-as-memcache
>>
>> The read_repair_chance setting controls how often a read repair is
>> triggered. You can increase the replication factor and lower
>> read_repair_chance. This gives you many servers capable of serving the same
>> read, without the burden of repair reads fanning out across the other 19
>> nodes.
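>>
>> A rough sketch of that trade-off (an approximate model with illustrative
>> numbers, not Cassandra code):
>>
>>     def replicas_touched_per_read(rf, read_repair_chance):
>>         # At CL.ONE one replica serves the read; with probability
>>         # read_repair_chance the read repair also touches the other rf - 1.
>>         return 1 + read_repair_chance * (rf - 1)
>>
>>     print(replicas_touched_per_read(20, 1.0))   # 20.0 -- every read hits every replica
>>     print(replicas_touched_per_read(20, 0.05))  # 1.95 on average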
>>
>> However, this is NOT the standard way to scale out. The standard, and
>> probably better, approach in all but a few cases is to leave the
>> replication factor alone and add more nodes.
>>
>> Normally, people set the replication factor to 3. That gives three nodes to
>> serve each read, and as long as the dataset is small, which is true in your
>> case, those reads are heavily cached. You would need a very high volume of
>> reads/writes to bottleneck any node.
>>
>> Raising and lowering the replication factor is not the way to go; changing
>> it involves more steps than just changing the setting, as does growing and
>> shrinking the cluster.
>>
>> What to do about idle servers is another question. We have thought about
>> having our idle web servers join our Hadoop cluster at night and leave
>> again in the morning :) Maybe you can have some fun with your Cassandra
>> gear during its idle time.
>>
>>
>>
>> On Sun, Aug 28, 2011 at 2:47 PM, Ryan Lowe <ryanjl...@gmail.com> wrote:
>>
>>> We are working on a system that sees very heavy traffic during specific
>>> times... think of sporting events.  At other times we get almost no
>>> traffic.  In order to handle the traffic during the events, we are planning
>>> to scale Cassandra out into a very large cluster.  The size of our data is
>>> still quite small: a single event's data might be 100MB at most, but we
>>> will be inserting that data very rapidly and need to read it at the same
>>> time.
>>>
>>> Since we have long slow periods, we currently run a replication factor of 2
>>> and a cluster size of 2, and it handles the traffic perfectly.
>>>
>>> Since dataset size is not really an issue, what is the best way for us to
>>> scale out?  We use an order-preserving partitioner to do range scanning, so
>>> the last time I tried to scale out our cluster we ended up with very uneven
>>> load: the few nodes that held the data were swamped, while the rest were
>>> barely touched.
>>>
>>> One other note: since we have very little data and lots of memory, we have
>>> turned the key and row caches up almost as high as they will go.
>>>
>>> So my question is this... if I bring in 20+ nodes, should I increase the
>>> replication factor as well?  It would seem that a higher replication factor
>>> would help distribute the load?  Or does it just mean that writes take even
>>> longer?  What are some other suggestions for scaling up (and then back
>>> down) a system that gets very high traffic in known, small time windows?
>>>
>>>
>>>
>>> Let me know if you need more info.
>>>
>>> Thanks!
>>> Ryan
>>>
>>
>>
>
