Look into Cassandra's logs to see if JNA is really enabled (it
really should be, by default) and, more importantly, if JNA is loaded
correctly. You might find a surprising message there: if that is the
case, just install JNA with your distro's package manager and, if it
still doesn't work,
Hi Paolo,
Thanks for the hint - JNA indeed wasn't installed. However, now that
Cassandra is actually using it, there doesn't seem to be any change in
terms of speed - still 7 seconds with pycassa.
On Thu, Apr 19, 2012 at 12:14 AM, Paolo Bernardi wrote:
> Look into Cassandra's logs to see i
From: phuduc nguyen [mailto:duc.ngu...@pearson.com]
> How are you passing a blob or binary stream to the CLI? It sounds like
> you're passing in a representation of a binary stream as ascii/UTF8
> which will create the problems you describe.
So this is only a limitation of Cassandra-cli?
--
Marc
Hi,
I am interested in knowing what is the best way to create my Cassandra
client, bypassing the socket communication and directly interacting with the
'Storage Manager'. I checked the Cassandra wiki and some of the Hector
examples; mostly what I see is that Cassandra, when run in embedded mode,
requir
What version are you on?
AFAIK the SimpleAuthenticator, and to some degree authentication (?), has been
essentially deprecated as it was considered incomplete and was not under
development. This is why the SimpleAuthenticator was moved out to the examples
directory in 1.X. I doubt it will be
try this
http://www.datastax.com/docs/1.0/install/upgrading#upgrading-between-minor-releases-of-cassandra-1-0-x
Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 18/04/2012, at 3:02 AM, Tamar Fraenkel wrote:
> Thanks!!!
> Two simple actions
For background:
http://www.datastax.com/docs/1.0/cluster_architecture/index
http://thelastpickle.com/2011/02/07/Introduction-to-Cassandra/
> Which mechanism is used to replicate the changes from one system to another:
> statement distribution or recording the changeset via triggers or storing the
You can get some idea from reading
org.apache.cassandra.thrift.CassandraServer.java, but I wonder what kind of use
case would justify such an effort.
From iPhone
On 2012/04/19, at 18:17, Tarun Gupta wrote:
> Hi,
>
> I am interested in knowing what is the best way to create my Cassandra
> Clie
I would suggest you build one cluster, using all your nodes, and create one
keyspace for all users.
There are lots of reasons; here are a few:
* many nodes in a single clusters spreads the load and gives you fault
tolerance.
* read and write requests can be distributed in a many node cluster.
* ca
At some point the gossip system on the node this log is from decided that
130.199.185.195 was DOWN. This was based on how often the node was gossiping to
the cluster.
The active repair session was informed, and to avoid failing the job
unnecessarily it tested that the errant node's phi value wa
As timestamps are set by clients, a common gotcha is having some or all
clients not synchronised via NTP.
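For example, pycassa lets you set the timestamp explicitly; if one client's
clock runs ahead, its writes silently win over later writes from
well-synchronised clients. A minimal sketch (keyspace and CF names invented):

import time
import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Users')

# Timestamps are microseconds since the epoch. A client whose clock is
# 5 minutes fast "wins" conflicts against correctly-synced clients for
# the next 5 minutes, regardless of the real order of the writes.
cf.insert('user1', {'name': 'first'}, timestamp=int(time.time() * 1e6))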
Hi,
One of the projects I am working on is going to need to store about 200TB
of data - generally in manageable binary chunks. However, after doing some
rough calculations based on rules of thumb I have seen for how much storage
should be on each node, I'm worried.
200TB with RF=3 is 600TB = 600
Thanks Aaron and Romain,
very useful information indeed; and yes there is no alternative to
personally trying out and dirtying our hands.
Regards,
Samba
Here's a test I did a while ago about creating column objects in python
http://www.mail-archive.com/user@cassandra.apache.org/msg06729.html
As Tyler said, the best approach is to limit the size of the slices.
If you are trying to load 125K super columns with 25 columns each you are
asking fo
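pycassa's xget is handy here: it iterates over one wide row lazily, issuing
a bounded slice per round trip instead of one huge get. A sketch (untested;
names invented, and buffer_size will need tuning):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'MySuperCF')

# Walks the whole row, but only fetches 1000 (super) columns per
# underlying slice request rather than asking for everything at once.
for name, value in cf.xget('row_key', buffer_size=1000):
    pass  # process each column as it arrives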
Cassandra supports data compression and depending on your data, you can
gain a reduction in data size up to 4x.
600 TB is a lot, hence requires lots of servers...
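Enabling it is a one-liner in cassandra-cli, along these lines (the column
family name is just a placeholder; see the DataStax 1.0 compression blog
post linked elsewhere in this thread):

update column family Data
  with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};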
Franc Carter wrote on 19/04/2012 13:12:19:
> Hi,
>
> One of the projects I am working on is going to need to store about
> 2
On Thu, Apr 19, 2012 at 9:38 PM, Romain HARDOUIN
wrote:
>
> Cassandra supports data compression and depending on your data, you can
> gain a reduction in data size up to 4x.
>
The data is gzip'd already ;-)
> 600 TB is a lot, hence requires lots of servers...
>
>
> Franc Carter wrote on 19/
Thanks.
This was the one I followed :) Wonder if there is something more detailed...
*Tamar Fraenkel *
Senior Software Engineer, TOK Media
ta...@tok-media.com
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
On Thu, Apr 19, 2012 at 1:06 PM, aaron mor
Franc Carter
> One of the projects I am working on is going to need to store about 200TB of
> data - generally in manageable binary chunks. However, after doing some rough
> calculations based on rules of thumb I have seen for how much storage should
> be on each node I'm worried.
> 200TB wit
600 TB is really a lot, even 200 TB is a lot. In our organization, storage
at such scale is handled by our storage team and they purchase specialized
(and very expensive) equipment from storage hardware vendors because at
this scale, performance and reliability are absolutely critical.
but it soun
On Thu, Apr 19, 2012 at 10:07 PM, John Doe wrote:
> Franc Carter
>
> > One of the projects I am working on is going to need to store about
> 200TB of data - generally in manageable binary chunks. However, after doing
> some rough calculations based on rules of thumb I have seen for how much
> st
On Thu, Apr 19, 2012 at 10:16 PM, Yiming Sun wrote:
> 600 TB is really a lot, even 200 TB is a lot. In our organization,
> storage at such scale is handled by our storage team and they purchase
> specialized (and very expensive) equipment from storage hardware vendors
> because at this scale, pe
Can you say more about how and how often these 200TB get used, queried,
updated? Is a different usage profile needed? What kind of column
families do you have in mind for them?
On Thu, Apr 19, 2012 at 8:24 AM, Franc Carter wrote:
> On Thu, Apr 19, 2012 at 10:16 PM, Yiming Sun wrote:
>
>> 600
I think your math is 'relatively' correct. It would seem to me you
should focus on reducing the amount of storage you are using per item,
if at all possible, if that node count is prohibitive.
On 04/19/2012 07:12 AM, Franc Carter wrote:
Hi,
One of the projects I am working on is go
Take a peek at cassandra-unit; maybe it could help you:
https://github.com/jsevellec/cassandra-unit
Well, I'm not sure exactly how you're passing a blob to the CLI. It would be
helpful if you pasted your commands/code and maybe there is a simple
oversight.
With that said, Cassandra can most definitely save blob/binary values. I
think most people use a high level client; we use Hector. If you're
> The bit I am trying to understand is whether my figure of 400GB/node in
practice for Cassandra is correct, or whether we can push the GB/node higher
and if so how high
Our cluster runs with up to 2TB/node (that's the compressed size) and an
RF=2. The figure of 400GB/node is by no means a maximum
PHPCassa does support binaries, so that should not be the problem.
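Same story with pycassa: raw bytes round-trip untouched over Thrift, which
is exactly what the CLI doesn't do for you. A quick sketch (keyspace/CF
names invented; the CF should validate values as BytesType):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Blobs')

with open('image.png', 'rb') as f:
    data = f.read()

# The bytes are sent as-is; no ascii/UTF8 re-encoding happens.
cf.insert('image-001', {'payload': data})
assert cf.get('image-001')['payload'] == data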
2012/4/19 phuduc nguyen
> Well, I'm not sure exactly how you're passing a blob to the CLI. It would
> be
> helpful if you pasted your commands/code and maybe there is a simple
> oversight.
>
> With that said, Cassandra can most d
Hi,
Is there any documentation on the procedure for migrating from
SimpleStrategy to NetworkTopologyStrategy?
thanks
Simon
Would there be any reason why I can't write more than 875 writes/sec to a
cluster of two Cassandra boxes? They are quad-core machines with 8GB of RAM
running RAID 10, so not huge servers... but certainly enough to handle a much
larger load than that.
We are feeding data into it through a Flume sin
Hi All,
I did a web search of the archives (hope I looked in the right place) and
could not find a request like this.
When Cassandra is running, it seems to create two random TCP listen ports.
For example: "50378 and 58692", "49952, 52792".
What are these for, and is there documentation regar
All the examples of cassandra-topology.properties that I have seen have a
default entry assigning unknown nodes to a specific data center and rack.
Is it possible to have Cassandra ignore unknown nodes for the purpose of
replication?
Bill
We'll try doing multithreaded requests today or tomorrow.
As for tuning down the number of supercolumns per slice, I tried doing
that, but I've noticed that the time was decreasing linearly with the
length of the slice. So, grabbing 1000 per slice would take 1/5 as long as
5000, but I'll have to make 5 times as many requests to the database.
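Roughly what we have in mind for the multithreaded version (just a sketch,
untested; pycassa's ConnectionPool is thread-safe, so the threads can share
it, and all the names and ranges below are invented):

import threading
import pycassa

pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'], pool_size=10)
cf = pycassa.ColumnFamily(pool, 'MySuperCF')

results = {}
lock = threading.Lock()

def fetch(start, finish):
    # Each thread slices a different range of super column names
    # (capped at 1000 columns per request here).
    part = cf.get('row_key', column_start=start,
                  column_finish=finish, column_count=1000)
    with lock:
        results.update(part)

ranges = [('a', 'f'), ('g', 'm'), ('n', 's'), ('t', 'z')]
threads = [threading.Thread(target=fetch, args=r) for r in ranges]
for t in threads:
    t.start()
for t in threads:
    t.join()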
I think it is enough to do an update on the keyspace, for example (cassandra-cli):
update keyspace KEYSPACE with placement_strategy =
'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options =
{datacenter1: 1};
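Note that if the change moves any replicas around, I believe you also need
to repair afterwards so data reaches its new owners (please verify on a
test cluster first):

nodetool repair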
On Thu, 19 Apr 2012 16:18:46 +0100
simojenki wrote:
> Hi,
>
> Is
I have a web application that generates multiple log files in a log file
directory. On a particularly chatty box, up to 2000 entries per second are
written to those log files. We are looking for a solution to tail that
directory and insert new entries into a Cassandra DB.
The fields in the log
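A bare-bones sketch of the tailing side with pycassa (all names invented; a
real version needs log-rotation handling, batching, and error handling):

import time
import uuid
import pycassa

pool = pycassa.ConnectionPool('Logs', ['localhost:9160'])
cf = pycassa.ColumnFamily(pool, 'Entries')

def tail_into_cassandra(path, host):
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)
                continue
            # One row per (host, hour) bucket; one TimeUUID column per
            # entry, assuming the CF's comparator is TimeUUIDType so
            # columns sort by time within the row.
            row_key = '%s:%s' % (host, time.strftime('%Y%m%d%H'))
            cf.insert(row_key, {uuid.uuid1(): line.rstrip('\n')})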
Yes it is possible. Put the following as the last line of your topology file:
default=unknown:unknown
So long as you don't have any DC or rack with this name, your local node will
not be able to address any nodes that aren't explicitly given in its topology
file.
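i.e. a minimal cassandra-topology.properties along these lines (IPs invented):

# Nodes this node should talk to:
192.168.1.10=DC1:RAC1
192.168.1.11=DC1:RAC2
# Everything else lands in a DC/rack that no keyspace replicates to:
default=unknown:unknown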
However bear in mind that, whil
Try writing them through Kafka. It should handle that load.
Bill
Sent from my BlackBerry® wireless handheld
-Original Message-
From: Trevor Francis
Date: Thu, 19 Apr 2012 12:04:19
To:
Reply-To: user@cassandra.apache.org
Subject: High Log Storage
I have a web application that generates multip
I had thought that the topology file is used for replica placement only,
such that for the token range that the unknown node is responsible for,
data is still read and written there. It just won't be replicated, since
the replication factor is not defined.
Bill
On Thu, Apr 19, 2012 at 1:18 PM, Richard
Couple of ideas:
* take a look at compression in 1.X
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
* is there repetition in the binary data? Can you save space by implementing
content-addressable storage?
Cheers
-
Aaron Morton
Freelance Developer
You should be able to get more than that.
Run nodetool cfstats, look at the Write Latency (this is the recent latency,
i.e. it is reset each time you run it). This will give you an idea of how long an
individual node is spending on a write.
Fire up JConsole, go to the StorageProxy MBean and look
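On the client side, it is also worth checking that you are batching
mutations rather than doing one Thrift call per row. With pycassa,
something like this (a sketch; names invented):

import pycassa

pool = pycassa.ConnectionPool('MyKeyspace',
                              ['node1:9160', 'node2:9160'],
                              pool_size=20)
cf = pycassa.ColumnFamily(pool, 'Events')

# Groups up to 100 inserts into each batch_mutate round trip.
b = cf.batch(queue_size=100)
for i in range(10000):
    b.insert('row-%d' % i, {'col': 'value'})
b.send()  # flush whatever is left in the queue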
> but I'll have to make 5 times as many requests to the database
5 times a small number can be less than 1 big number :)
see http://wiki.apache.org/cassandra/HadoopSupport
It's also covered in the O'Reilly Cassandra book; however, that book is
somewhat out of date.
Also search for posts from Jere
There is this; it's old:
http://wiki.apache.org/cassandra/Operations#Replication
There was also a discussion about it in the last month or so.
I *think* it's OK so long as you move to a single DC and single rack. But
please test.
Cheers
-
Aaron Morton
Freelance Developer
@a
Firefox version 3.6.10 on Ubuntu 10.10. Let me update it and try. Thanks
Nick! Will let you know.
-Original Message-
From: Nick Bailey [mailto:n...@datastax.com]
Sent: Wednesday, April 18, 2012 4:56 PM
To: user@cassandra.apache.org
Subject: Re: DataStax Opscenter 2.0 question
What versi
Thanks Nick, that was it. With Firefox 11, it works.
-Original Message-
From: Nick Bailey [mailto:n...@datastax.com]
Sent: Wednesday, April 18, 2012 4:56 PM
To: user@cassandra.apache.org
Subject: Re: DataStax Opscenter 2.0 question
What version of firefox? Someone has reported a similar
We tried this route previously. We did not run repair at all (our use cases
don't need repair), but while adding a secondary data center we were
forced to run repair. It ended up exploding the data.
We finally had to start afresh: we scrapped the cluster and re-imported the
data with NTS. Now, whethe