Re: Redundancy inside a cassandra node

2014-11-08 Thread Jabbar Azam
Hello Alexey,

The node count is 20 per site and there will be two sites. RF=3. But since
the software isn't complete and the database code is going through a
rewrite, we aren't sure about space requirements. The node count is only a
guess, based on the number of dev nodes in use. We will have better
information when the rewrite is done and testing resumes.

The data will be time series data. It was binary blobs originally, but we
have found that the new DataStax C# drivers have improved a lot in terms of
read performance.

I'm curious. What is your definition of commodity? My IT people seem to
think that the servers must be super robust. Personally I'm not sure if
that should be the case.

The node

Thanks

Jabbar Azam

On 8 November 2014 02:56, Plotnik, Alexey wrote:

> Cassandra is a cluster in itself, so it's not necessary to make each node
> redundant. Cassandra has replication for that. Cassandra is also designed
> to run across multiple data centers, and I think that kind of redundancy
> policy is applicable to you. The only thing from your list worth deploying
> is RAID 10; the others don't make any sense. As you are at the design
> stage for your cluster, please provide some numbers: how much data will be
> stored on each node, and how many nodes will you have? What type of data
> will be stored in the cluster: binary objects, or something like time
> series?
>
> Cassandra is designed to run on commodity hardware.
>
> Sent from iPad
>
> > On 8 Nov 2014, at 6:26, Jabbar Azam wrote:
> >
> > Hello all,
> >
> > My work will be deploying a Cassandra cluster next year. Due to internal
> > wrangling we can't seem to agree on the hardware. The software hasn't been
> > finished, but management are asking for a ballpark figure for the hardware
> > costs.
> >
> > The problem is the IT team are saying the nodes need to have multiple
> > points of redundancy, e.g. dual power supplies, dual NICs, SSDs configured
> > in RAID 10.
> >
> > The software team is saying that, because of Cassandra's resilient nature,
> > the way data is distributed, and its scalability, lots of cheap boxes
> > should be used. So they have been talking about self-built consumer-grade
> > boxes with single NICs, single PSUs, single SSDs, etc.
> >
> > Obviously the self-built boxes will cost a fraction of the price, but
> > each box is not as resilient as the first option.
> >
> > We don't use any cloud technologies, so that's out of the question.
> >
> > My question is: what do people use in the real world in terms of node
> > resiliency when running a Cassandra cluster?
> >
> > Right now the team is only thinking of hosting Cassandra on the nodes.
> > I'll see if I can twist their arms and get them to see the light with
> > Apache Spark.
> >
> > Obviously there are other tiers of servers, but they won't be running
> > Cassandra.
> >
> > Thanks
> >
> > Jabbar Azam
>


Re[2]: Redundancy inside a cassandra node

2014-11-08 Thread Plotnik, Alexey
Let me speak from my heart. I maintain a 200+ TB Cassandra cluster. The problem
is money. If your IT people have the $$$, they can of course deploy Cassandra on
super robust hardware with triple power supplies. But then why do you need
Cassandra? Only for scalability?

The idea of highly available clusters is to get robustness from availability
(not from hardware reliability). The more availability (more nodes) you want,
the more money you need for hardware. Cassandra is, in my view, the most highly
available system on the planet: it scales horizontally to any number of nodes.
You have time series data, so you can set a replication factor > 3 if needed.

Cassandra has a concept of network topology: you can specify which
*failure domain* (rack or independent power line) each node is installed in,
and replica placement is then computed so that the replicas of a given piece
of data are stored in different failure domains. The same applies to data
centers: the Cassandra topology includes a concept of data center, so it
knows about your data centers.
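
To make the rack-aware placement described above concrete, here is a minimal Python sketch. It is a simplified illustration in the spirit of a rack-aware strategy, not Cassandra's actual placement code: the `Node` tuple, the `place_replicas` function, the rack names and the integer tokens are all hypothetical, and a single data center is assumed.

```python
from collections import namedtuple

# Hypothetical node description: name, rack (failure domain), ring position.
Node = namedtuple("Node", ["name", "rack", "token"])


def place_replicas(nodes, key_token, rf):
    """Walk the ring clockwise from key_token, preferring racks not used yet."""
    ring = sorted(nodes, key=lambda n: n.token)
    # Start at the first node whose token is >= key_token (wrap to 0 if none).
    start = next((i for i, n in enumerate(ring) if n.token >= key_token), 0)
    walk = ring[start:] + ring[:start]

    replicas, seen_racks, skipped = [], set(), []
    for node in walk:
        if len(replicas) == rf:
            break
        if node.rack not in seen_racks:
            replicas.append(node)
            seen_racks.add(node.rack)
        else:
            # Same rack as an existing replica: keep it only as a fallback.
            skipped.append(node)

    # Fewer racks than rf: fill up with the skipped nodes, in ring order.
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


# Illustrative four-node ring: two nodes share rack1.
nodes = [
    Node("n1", "rack1", 0),
    Node("n2", "rack1", 25),
    Node("n3", "rack2", 50),
    Node("n4", "rack3", 75),
]
chosen = place_replicas(nodes, key_token=0, rf=3)
print([(n.name, n.rack) for n in chosen])
```

With this ring, n2 is passed over because it shares rack1 with n1, so the three replicas land in three different racks. Losing one rack (or one power line) then takes out at most one replica of any given piece of data.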

You should think not about hardware but about your data model: is Cassandra
suitable for your domain? Think about the queries you will run against your
data. Cassandra is actually a key-value store (the documentation says it's a
column-based store, but that's just a CQL abstraction over a key and a binary
value, nothing special except counters), so be very careful when designing
your data model.

Anyway, let me answer your original question:
> what do people use in the real world in terms of node resiliency when running
> a Cassandra cluster?

Nothing, because Cassandra is a highly available system. They use SSDs if they
need speed. They do not use RAID 10 on the node, and they don't use dual power
supplies either, because it's not cheap in a cluster of many nodes and makes no
sense when reliability is ensured by replication in large clusters. I'm not
sure about dual NICs; network reliability is ensured by distributing your
cluster across multiple data centers.

We're using a single SSD and a single HDD on each node (we symlink some CF
folders to the other disk): SSD for the CFs where we need low latency, HDD for
binary data. If one of them fails, replication saves us, and we have time to
deploy a new node and load the data from the replicas back onto it with
Cassandra's repair feature. We have no problem with this: nodes fail sometimes,
but it doesn't affect customers. That's it.


Re: Re[2]: Redundancy inside a cassandra node

2014-11-08 Thread Eric Stevens
> They do not use RAID 10 on the node, and they don't use dual power supplies
> either, because it's not cheap in a cluster of many nodes

I think the point here is that money spent on traditional failure avoidance
models is better spent in a Cassandra cluster by instead having more nodes
of less expensive hardware.  Rather than redundant disks, network ports and
power supplies, spend that money on another set of nodes in a different
topological (and probably physical) rack.  The parallel to having redundant
disk arrays is to increase replication factor (RF=3 is already one replica
better than RAID 10, and with fewer SPOFs).
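
A back-of-the-envelope Python sketch of the "one replica better than RAID 10" comparison follows. It rests on loudly stated assumptions: failures are independent, every device has the same failure probability within one repair window, and that probability (`P_FAIL`) is a made-up illustrative value, not a measured one. The function name `loss_probability` is hypothetical.

```python
def loss_probability(copies, p_fail):
    """Chance that every copy is lost, assuming independent failures."""
    return p_fail ** copies


# Hypothetical per-device failure probability within one repair window.
P_FAIL = 0.01

# RAID 10 keeps two mirrored copies of each block inside one box.
raid10_single_node = loss_probability(2, P_FAIL)

# RF=3 keeps three replicas on three separate single-disk boxes.
rf3_cheap_nodes = loss_probability(3, P_FAIL)

print(f"RAID 10 on one node:    {raid10_single_node:.2e}")
print(f"RF=3 on single disks:   {rf3_cheap_nodes:.2e}")
print(f"Ratio (RAID 10 / RF=3): {raid10_single_node / rf3_cheap_nodes:.0f}x")
```

The model ignores correlated failures, and it also ignores that the RAID 10 box still shares one chassis, power feed and network path, which is the "fewer SPOFs" part of the argument.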

The only reason I can think of that you'd want to double down on hardware
failover like the traditional model is if you are constrained in your data
center (e.g., space or cooling) and you'd rather run machines which are
individually physically more resilient in exchange for running a lower RF.


Re: Redundancy inside a cassandra node

2014-11-08 Thread Jack Krupansky
About the only thing you can say comes down to two specific points:

1. A more resilient node is great, but it in no way reduces the total number
of nodes needed. Sometimes nodes become inaccessible due to network outages or
system maintenance (e.g., software upgrades), or the vagaries of the JVM and
OOM issues.
2. Replication redundancy is also for supporting higher load, not just
availability on node outage.
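
The first point can be put in numbers with a small Python sketch. It assumes requests are made at a QUORUM consistency level, which the thread does not state; the function names are hypothetical. A quorum is a majority of the replicas, i.e. floor(RF / 2) + 1.

```python
def quorum(rf):
    """Number of replicas that must respond for a QUORUM request."""
    return rf // 2 + 1


def replicas_that_can_be_down(rf):
    """Replicas that may be unavailable while QUORUM requests still succeed."""
    return rf - quorum(rf)


for rf in (2, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerates {replicas_that_can_be_down(rf)} replica(s) down")
```

With RF=3, one replica can be offline for an upgrade, a network outage or a JVM crash without failing quorum requests, however robust the hardware underneath it is. With RF=2, no replica can be offline.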

-- Jack Krupansky


Re: Redundancy inside a cassandra node

2014-11-08 Thread Jabbar Azam
Hello Jack,

Some really good points. I had never thought about JVM or OOM issues.

Thanks

Jabbar Azam


Re: Re[2]: Redundancy inside a cassandra node

2014-11-08 Thread Jabbar Azam
With regard to money, I think it's always a good idea to find a
cost-effective solution. The problem is that different people have different
interpretations of what cost-effectiveness means. I'm referring to my
organisation ;). I'm sure it happens in other organisations too. Biases,
politics, experience and how things are currently done all dictate how new
solutions are created.

Unfortunately, I think the idea of not using redundancy goes against current
thinking, especially not using RAID 10. I think the problem may be due
to a lack of know-how in DevOps and tools like Cobbler, Ansible, Chef and
Puppet. I'm working on this, but it's hard work doing it in my spare
time.

Do you build your own nodes, or use a well-known brand like Dell or HP?
Dell recommended R720 or R320 servers for the Cassandra nodes.

We have built our own dev nodes from "consumer grade" kit, but because they
have no redundancy they are not taken seriously as production nodes.
They're not rack-mount, which is a big no as far as the IT department is
concerned.




Thanks

Jabbar Azam


Re: Re[2]: Redundancy inside a cassandra node

2014-11-08 Thread Juho Mäkinen
I used Supermicro servers in my previous job and they give excellent
quality for the money. They have been considered a bit "cheap" quality-wise
in the past, but the current models are pretty good. They offer all the
standard stuff like remote control cards (IPMI), dual power supplies (if
you want), etc.


Re: Re[2]: Redundancy inside a cassandra node

2014-11-08 Thread Jabbar Azam
Hello Eric,

You make a good point about resiliency being applied at a higher level in
the stack.

Thanks

Jabbar Azam
