Having participated in the design of a few of the systems mentioned,
I'll chime in here and point out that the combination of Flume and Hive
makes CDH3 very useful for log processing; that use case is directly in
the wheelhouse of the system, especially for large collections of log files.
Hi,
Please, can someone help us with Munin??
Thanks,
Miriam
On Mon, Jul 26, 2010 at 1:58 PM, osishkin osishkin wrote:
> Hi,
>
> I'm trying to use Munin to monitor Cassandra.
> I've seen other people using Munin here, so I hope someone ran into
> this problem.
> The default plugins are working,
Is your code posted somewhere such that others could try it?
On Thu, Jul 29, 2010 at 5:57 AM, Miriam Allalouf
wrote:
> Hi,
> Please, can someone help us with Munin??
> Thanks,
> Miriam
>
>
> On Mon, Jul 26, 2010 at 1:58 PM, osishkin osishkin
> wrote:
> > Hi,
> >
> > I'm trying to use Munin to m
i see a 0.6.4 tag in SVN, but not on cassandra's download page. is this
ready for use if building from SVN?
The vote is in process.
http://permalink.gmane.org/gmane.comp.db.cassandra.devel/2010
Gary.
On Thu, Jul 29, 2010 at 11:34, B. Todd Burruss wrote:
> i see a 0.6.4 tag in SVN, but not on cassandra's download page. is this
> ready for use if building from SVN?
>
>
Hi All,
We are working with a Cassandra cluster consisting of 3 nodes, each with a
storage capacity of 0.5 terabytes.
We are loading the data into the cluster with the OrderPreservingPartitioner
and with a replication factor of 2.
The Data that has been loaded so far looks as follows:
Address Status
On Wed, Jul 28, 2010 at 9:13 PM, Aaron Morton wrote:
> Have you considered Redis http://code.google.com/p/redis/?
>
> It may be more suited to the master-slave configuration you are after.
>
> - You can have a master to write to, then slave to a slave master, then your
> web heads run a local redi
Thank you, Aaron.
Yes, we're now thinking Hadoop would be one of choices, too.
So far, it doesn't matter whether we use "SQL" or not, as long as Cassandra
can process millions of rows at a time in a practical amount of time.
As a result, for what kinds of patterns should Cassandra be more powerful
than MySQL from t
Just wanted to follow up on this.
We were never able to achieve throughput scaling in the cloud. We were able to
verify that many of our cluster nodes and test servers were collocated on the
same physical hardware (thanks Stu for the tip on the Rackspace REST API), and
that performance on coll
You are both confusing columns with rows. Columns have timestamps,
row keys do not.
On Wed, Jul 28, 2010 at 11:37 PM, Thorvaldsson Justus
wrote:
> You insert 500 rows with key “x”
>
> And 1000 rows with key “y”
>
> You make a query getting all rows.
>
> It will only show two rows, the ones with
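The behaviour described above can be modelled with a toy dict (a sketch, not Cassandra code): a column family maps row keys to columns, so repeated inserts under the same key merge into one row.

```python
# Toy model of a column family: row key -> {column name -> value}.
# Writing more columns under an existing key merges into the SAME row,
# so the row count stays at two no matter how many inserts happen.
cf = {}

def insert(row_key, column, value):
    cf.setdefault(row_key, {})[column] = value

for i in range(500):
    insert("x", "col%d" % i, i)
for i in range(1000):
    insert("y", "col%d" % i, i)

print(len(cf))          # 2 -- only rows "x" and "y"
print(len(cf["y"]))     # 1000 columns in row "y"
```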
Are there any limitations on the number of columns a row can have? Does
all the data for a single key need to reside on a single host? If so,
wouldn't that mean there is an implicit limit on the number of columns
one can have, i.e. the disk size of that machine?
What is the proper way to handle
Just wanted to toss this out there in case this is an issue or the format
really changed and I have to start from a clean slate. I was running from
yesterday's trunk and had some keyspaces with data. Today's trunk failed server
start, giving this exception:
ERROR [main] 2010-07-29 14:05:21,489
Yes, the OPP could give you a distribution like that. Given that only two nodes have data, and they seem to have the same amount of data, I wonder if all your keys are falling into the key range of the last node? So with RF 2 they go to the last and first node only. As an experiment you could try r
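The placement described above can be illustrated with a toy ring model (hypothetical string tokens, not Cassandra's actual replication code): a key's primary replica is the first node whose token is >= the key, and RF 2 places the second copy on the next node clockwise.

```python
from bisect import bisect_left

# Toy model of replica placement on an order-preserving ring.
# Three nodes with hypothetical string tokens.
tokens = ["g", "n", "t"]

def replicas(key, rf=2):
    # Primary = first node whose token >= key, wrapping past the end;
    # further replicas go to the next nodes clockwise.
    i = bisect_left(tokens, key) % len(tokens)
    return [tokens[(i + j) % len(tokens)] for j in range(rf)]

# Keys clustered in the last node's range ("n", "t"] all land on the
# same two nodes: the last node and, wrapping, the first.
for key in ["p", "q", "s"]:
    print(key, replicas(key))   # every key -> ['t', 'g']
```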
One method would be to use a Super Column Family. Have one row; in that row create a super column for each count value you have, and then in the super column create a column for each word. Set the CompareWith for the super column family to be LongType and the CompareSubcolumnsWith to be AsciiType or UTF8Type. Yo
Can you determine approximately what revisions you were running before and
after?
-Original Message-
From: "Arya Goudarzi"
Sent: Thursday, July 29, 2010 4:42pm
To: user@cassandra.apache.org
Subject: Avro Runtime Exception Bad Index
Just wanted to toss this out there in case this is a
> Just wanted to follow up on this.
>
> We were never able to achieve throughput scaling in the cloud. We were able
> to verify that many of our cluster nodes and test servers were collocated on
> the same physical hardware (thanks Stu for the tip on the Rackspace REST
> API), and that performa
Thanks for this, Aaron. It does actually look like Redis may be better
suited to our needs. I had originally discounted Redis because I had
the impression that it had volatile storage only, but now I see that
not to be the case.
Thanks again! Yup, you've got Append Only, foreground Snap Shot and
I noticed this once when accidentally sharing connections around. Could that be the case? What sort of commands are you running? Could you be seeing this problem?
http://www.mail-archive.com/user@cassandra.apache.org/msg04831.html
Aaron
On 29 Jul, 2010, at 12:47 PM, Jianing Hu wrote:
We recently mig
Which type of cache is appropriate to your particular case depends on a
variety of factors including the hotness and other access
characteristics of your data set, the relationship of data set size to
the heap size, row size to key size, and so forth.
=Rob
A little off topic, but I remember re
Ok so basically an "array" of words grouped by their count?
Something like this?
{
    SearchLogs : {
        ALL : {
            999: { word1:word1, word2:word2, word3:word3 }
            998: { word1:word1, word2:word2, word3:word3 }
        }
    }
}
On 7/29/10 2:50 PM, Aaron Morton wrote:
One metho
Yes, but as I said it may not be the optimal design. You may end up with a single very big row.
- You could use multiple rows, each holding a range of counts.
- You could use a standard CF and store the count in the row key, then use get_range_slices. Using the random partitioner you will need to
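The standard-CF alternative above can be sketched as a toy (not the real API): one row per count, with the count zero-padded in the row key so that lexical key order under the order-preserving partitioner matches numeric order, which is what a get_range_slices-style scan relies on.

```python
# Toy sketch: one row per count, zero-padded count as the row key.
rows = {
    "00998": {"word4": "", "word5": ""},
    "00999": {"word1": "", "word2": "", "word3": ""},
}

def range_slice(start_key, end_key):
    # Mimics get_range_slices: return row keys falling in the range,
    # in key order (lexical == numeric thanks to zero-padding).
    return [k for k in sorted(rows) if start_key <= k <= end_key]

print(range_slice("00998", "00999"))   # ['00998', '00999']
```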
Hi Aaron,
Thanks for the reply. Can you explain what you mean by "sharing
connections around"?
I'm just calling a simple "get", and the data returned is for a
completely different key. It's intermittent and hard to produce in my
test environment, but can be observed in our production environment
I was accidentally sharing connections between threads, and getting strange results. Is your client multi-threaded?
Can you provide some more information, such as the client library, how the data is written, and how you're deciding that the returned results are the wrong ones. Is the read inconsiste
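A hypothetical illustration of the usual fix (plain Python objects standing in for real Thrift connections): give each thread its own connection via `threading.local()` instead of sharing one across threads.

```python
import threading

# Each thread gets its OWN connection; threading.local() keeps a
# separate attribute per thread, so threads never share a socket.
_local = threading.local()

def get_conn():
    # Stand-in for opening a real client connection.
    if not hasattr(_local, "conn"):
        _local.conn = object()
    return _local.conn

seen = set()
lock = threading.Lock()

def worker():
    conn = get_conn()
    with lock:
        seen.add(conn)   # keep a reference so objects stay distinct

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(seen))   # 4 -- a distinct connection per thread
```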
Why? What reasons did you choose TCP?
Shen
On Sat, Mar 6, 2010 at 9:15 AM, Jonathan Ellis wrote:
> In 0.6 gossip is over TCP.
>
> On Fri, Mar 5, 2010 at 6:54 PM, Ashwin Jayaprakash
> wrote:
> > Hey guys! I have a simple question. I'm a casual observer, not a real
> > Cassandra user yet. So, ex
That's an interesting thought. My code runs in FCGI and although the
cassandra connection is used to serve multiple requests, those
requests are supposedly processed sequentially, in a while
($request->Accept() >= 0) loop. However, we do call FCGI::finish to
close the request (so the HTTP request w
An asynchronous thrift client in Java would be something that we could
really use; I'm trying to get a sense of whether this async client is usable
with Cassandra at this point -- given that Cassandra typically bundles a
specific older Thrift version, would the technique described here work at
all
When you can't get the number of threads, it usually means you have way too
many running (8,000+).
Try running `ps -eLf | grep cassandra`. How many threads?
-Chris
On Jul 29, 2010, at 8:40 PM, Dathan Pattishall wrote:
>
> To Follow up on this thread. I blew away the data for my entire clust
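The `ps` check above can be turned into a one-liner that prints only the count (a sketch; the bracketed `[c]` in the pattern keeps grep from matching its own process line):

```shell
# -L lists one line per thread (LWP); wc -l counts them.
ps -eLf | grep '[c]assandra' | wc -l
```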