On Mon, Oct 25, 2010 at 10:19 PM, Takayuki Tsunakawa
wrote:
> Hello, Mike,
>
> Thank you for your advice. I'll close this thread with this mail. (I was
> afraid I was interrupting the community developers with vague questions.)
> I'm happy to learn that there is no known limitation that would cap the
> cluster at a couple of hundred nodes. If our project
Hey Takayuki,
I don't think you're going to find anyone willing to promise that Cassandra
will fit your petabyte scale data analysis problem. That's a lot of data,
and there's not a ton of operational experience at that scale within the
community. And the people who do work on that sort of problem
Hello, Edward,
Thank you for giving me insight about large disk nodes.
From: "Edward Capriolo"
> Index sampling on startup. If you have very small rows, your indexes
> become large. These have to be sampled on startup, and sampling our
> indexes for 300 GB of data can take 5 minutes. This is goin
Hello, Jonathan,
From: "Jonathan Ellis"
> There is no reason Cassandra cannot scale to 1000s or more nodes
with
> the current architecture.
Oh, really? I had the impression that the gossip exchanges limit the
number of nodes in a cluster when I read the Dynamo paper and
"Cassandra - A Decentralized Structured Storage System".
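For what it's worth, the intuition behind Jonathan's answer can be shown with a toy simulation: in push-style gossip each node contacts only a constant number of peers per round, so per-node network load stays O(1) as the cluster grows, while the rounds needed for information to reach everyone grow only logarithmically. This is a simplified model for illustration, not Cassandra's actual Gossiper implementation.

```python
import random

def rounds_to_spread(n, fanout=1, seed=7):
    """Toy push-gossip model: each informed node tells `fanout`
    random peers per round. Returns the number of rounds until
    all n nodes are informed."""
    random.seed(seed)
    informed = {0}          # node 0 starts with the rumor
    rounds = 0
    while len(informed) < n:
        rounds += 1
        newly = set()
        for _ in informed:  # every informed node gossips once
            for _ in range(fanout):
                newly.add(random.randrange(n))
        informed |= newly
    return rounds

# Growing the cluster 100x adds only a handful of rounds, while each
# node's per-round work (fanout messages) stays constant.
for n in (100, 1000, 10000):
    print(n, rounds_to_spread(n))
```

The takeaway is that gossip convergence time scales roughly with log(n), which is why the protocol itself is not the thing that caps cluster size at hundreds of nodes.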
On Mon, Oct 25, 2010 at 12:37 PM, Jonathan Ellis wrote:
> On Sun, Oct 24, 2010 at 9:09 PM, Takayuki Tsunakawa
> wrote:
>> From: "Jonathan Ellis"
>>> (b) Cassandra generates input splits from the sampling of keys each
>>> node has in memory. So if a node does end up with no data for a
>>> keyspace (because of bad OOP balancing for instance) it will have no
>>>
Hello, Jonathan,
Thank you for your kind reply. Could you give me some more
opinions/comments?
From: "Jonathan Ellis"
> (b) Cassandra generates input splits from the sampling of keys each
> node has in memory. So if a node does end up with no data for a
> keyspace (because of bad OOP balancing
On Fri, Oct 22, 2010 at 3:30 AM, Takayuki Tsunakawa
wrote:
> Yes, I meant one map task would be sent to each task tracker, resulting in
> 1,000 concurrent map tasks in the cluster. ColumnFamilyInputFormat cannot
> identify the nodes that actually hold some data, so the job tracker will
> send the
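The behavior described above — map tasks being scheduled on every node regardless of whether it actually holds data for the column family — can be sketched as follows. This is a simplified model of split generation over a token ring, not the real ColumnFamilyInputFormat code; the node names and tokens are made up for illustration.

```python
# Simplified model: input splits are derived from each node's token
# range on the ring, not from which nodes actually hold data for the
# column family being scanned.

def make_splits(nodes):
    """nodes: list of (name, token) tuples sorted by token.
    Returns one split per token range, tagged with its owning node."""
    splits = []
    for i, (name, token) in enumerate(nodes):
        prev_token = nodes[i - 1][1]  # the ring wraps around
        splits.append({"node": name, "range": (prev_token, token)})
    return splits

# A 4-node ring; imagine only node-a and node-b actually hold data
# for the column family being scanned.
ring = [("node-a", 25), ("node-b", 50), ("node-c", 75), ("node-d", 100)]
splits = make_splits(ring)

# One split (hence at least one map task) is still produced per node:
print([s["node"] for s in splits])  # ['node-a', 'node-b', 'node-c', 'node-d']
```

So with 1,000 nodes the job tracker ends up scheduling map tasks across all 1,000 task trackers, even if many of them find no rows for that keyspace in their range.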
>
> Regards,
> Takayuki Tsunakawa
>
> - Original Message -
> From: aaron morton
> To: user@cassandra.apache.org
> Sent: Friday, October 22, 2010 4:05 PM
> Subject: Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes
> of data
>
For plain old log analysis, the Cloudera Hadoop distribution may be a better
match. Flume is designed to help with streaming data into HDFS, the LZO
compression extensions would help with the data size, and Pig would make the
analysis easier (IMHO).
http://www.cloudera.com/hadoop/
Hello,
I'm evaluating whether Cassandra is a good fit for a certain customer. The
customer will collect petabytes of logs and analyze them. Could you
tell me whether my understanding is correct and/or give me your opinions?
I'm sorry that the analysis requirements are not clear yet.
1. MapReduce behavior
I read