1.1 not removing commit log files?

2012-05-21 Thread Bryce Godfrey
The commit log drives on my nodes keep slowly filling up.  I don't see any 
errors in my logs indicating issues that I could map to this behavior.

Is this how 1.1 is supposed to work now?  Previous versions seemed to keep 
usage on this drive to a minimum as memtables were flushed.

/dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog



endless hinted handoff with 1.1

2012-05-21 Thread Arend-Jan Wijtzes
Hi,

We are running a small test cluster and recently installed Cassandra 1.1 and
started with a new clean database. We keep seeing these messages in the log 
on just one of our nodes:

INFO [HintedHandoff:1] 2012-05-21 09:49:56,757 HintedHandOffManager.java (line 294) Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.73
INFO [HintedHandoff:1] 2012-05-21 09:49:56,775 HintedHandOffManager.java (line 382) Finished hinted handoff of 0 rows to endpoint /10.0.0.73
INFO [HintedHandoff:1] 2012-05-21 09:59:56,756 HintedHandOffManager.java (line 294) Started hinted handoff for token: 42535295865117307932921825928971026432 with IP: /10.0.0.69
INFO [HintedHandoff:1] 2012-05-21 09:59:56,757 HintedHandOffManager.java (line 382) Finished hinted handoff of 0 rows to endpoint /10.0.0.69
INFO [HintedHandoff:1] 2012-05-21 09:59:56,757 HintedHandOffManager.java (line 294) Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.73
INFO [HintedHandoff:1] 2012-05-21 09:59:56,775 HintedHandOffManager.java (line 382) Finished hinted handoff of 0 rows to endpoint /10.0.0.73
INFO [HintedHandoff:1] 2012-05-21 10:09:56,757 HintedHandOffManager.java (line 294) Started hinted handoff for token: 42535295865117307932921825928971026432 with IP: /10.0.0.69
INFO [HintedHandoff:1] 2012-05-21 10:09:56,758 HintedHandOffManager.java (line 382) Finished hinted handoff of 0 rows to endpoint /10.0.0.69
INFO [HintedHandoff:1] 2012-05-21 10:09:56,758 HintedHandOffManager.java (line 294) Started hinted handoff for token: 85070591730234615865843651857942052864 with IP: /10.0.0.73
INFO [HintedHandoff:1] 2012-05-21 10:09:56,879 HintedHandOffManager.java (line 382) Finished hinted handoff of 0 rows to endpoint /10.0.0.73


All four nodes are up:

-bash-4.1$ nodetool ring -h localhost
Note: Ownership information does not include topology, please specify a 
keyspace.
Address     DC          Rack   Status  State   Load       Owns     Token
                                                                   127605887595351923798765477786913079296
10.0.0.65   datacenter1 rack1  Up      Normal  244.41 MB  25.00%   0
10.0.0.69   datacenter1 rack1  Up      Normal  155.39 MB  25.00%   42535295865117307932921825928971026432
10.0.0.73   datacenter1 rack1  Up      Normal  220.42 MB  25.00%   85070591730234615865843651857942052864
10.0.0.77   datacenter1 rack1  Up      Normal  296.14 MB  25.00%   127605887595351923798765477786913079296


This has been going on for days. Note that it is just these two tokens that 
keep repeating in the log. There are no recent HintedHandOff messages in the 
logs on the other nodes.

Let me know if you need more info.

Arend-Jan

-- 
Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl


RE: 1.1 not removing commit log files?

2012-05-21 Thread Pieter Callewaert
Hi,

In 1.1 the commitlog files are pre-allocated as 128 MB segments 
(https://issues.apache.org/jira/browse/CASSANDRA-3411). The total should, 
however, not exceed the commit log space configured in cassandra.yaml:

commitlog_total_space_in_mb: 4096

Kind regards,
Pieter Callewaert

From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
Sent: Monday, 21 May 2012 9:52
To: user@cassandra.apache.org
Subject: 1.1 not removing commit log files?

The commit log drives on my nodes keep slowly filling up.  I don't see any 
errors in my logs indicating issues that I could map to this behavior.

Is this how 1.1 is supposed to work now?  Previous versions seemed to keep 
usage on this drive to a minimum as memtables were flushed.

/dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog



Re: unable to nodetool to remote EC2

2012-05-21 Thread Tamar Fraenkel
Hi!
I am trying the tunnel and it fails. I would be grateful for some hints:

I defined

   - proxy_host = ubuntu@my_ec2_cassandra_node_public_ip
   - proxy_port = 22

I do:
ssh -N -f -i /c/Users/tamar/.ssh/Amazon/tokey.openssh -D22
ubuntu@my_ec2_cassandra_node_public_ip

I put some debug prints and I can see that the ssh_pid is indeed the
correct one.

I run
jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=22
service:jmx:rmi:///jndi/rmi://my_ec2_cassandra_node_public_ip:7199/jmxrmi

I get errors and it fails:
channel 2: open failed: connect failed: Connection timed out

One note though: I can ssh to that vm using
ssh -i /c/Users/tamar/.ssh/Amazon/tokey.openssh -D22
ubuntu@my_ec2_cassandra_node_public_ip
without being prompted for a password.

Any help appreciated

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Fri, May 18, 2012 at 9:49 PM, ramesh  wrote:

>  On 05/18/2012 01:35 PM, Tyler Hobbs wrote:
>
> Your firewall rules need to allow TCP traffic on any port >= 1024 for JMX
> to work.  It initially connects on port 7199, but then the client is asked
> to reconnect on a randomly chosen port.
>
> You can open the firewall, SSH to the node first, or set up something like
> this: http://simplygenius.com/2010/08/jconsole-via-socks-ssh-tunnel.html
>
> On Fri, May 18, 2012 at 1:31 PM, ramesh  wrote:
>
>  I updated the cassandra-env.sh
> $JMX_HOST="10.20.30.40"
> JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=$JMX_HOST"
>
> netstat -ltn shows port 7199 is listening.
>
> I tried both public and private IP for connecting but neither helps.
>
> However, I am able to connect locally from within server.
>
>  I get this error when I remote:
>
> Error connection to remote JMX agent! java.rmi.ConnectException:
> Connection refused to host: 10.20.30.40; nested exception is:
> java.net.ConnectException: Connection timed out at
> sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601) at
> sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198) at
> sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184) at
> sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110) at
> javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source) at
> javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2329)
> at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:279)
> at
> javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
> at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:144) at
> org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:114) at
> org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:623) Caused by:
> java.net.ConnectException: Connection timed out at
> java.net.PlainSocketImpl.socketConnect(Native Method) at
> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) at
> java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) at
> java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) at
> java.net.Socket.connect(Socket.java:529) at
> java.net.Socket.connect(Socket.java:478) at java.net.Socket.<init>(Socket.java:375)
> at java.net.Socket.<init>(Socket.java:189) at
> sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
> at
> sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
> at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595) ...
> 10 more
>
> Any help appreciated.
> Regards
> Ramesh
>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>
>
> It helped.
> Thanks Tyler for the info and the link to the post.
>
> Regards
> Ramesh
>
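
A quick way to test the tunnel itself, outside of jconsole: the SOCKS 
settings that the -J flags configure can be set from a few lines of Java, 
and JMXConnectorFactory.connect is the same call nodetool makes (see the 
stack trace above). A minimal sketch; the host name and the local SOCKS 
port 8123 are illustrative:

import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxTunnelCheck {
    public static void main(String[] args) throws Exception {
        // Same SOCKS settings the -J-DsocksProxy* jconsole flags set;
        // 8123 is an illustrative local port for the ssh -D tunnel.
        System.setProperty("socksProxyHost", "localhost");
        System.setProperty("socksProxyPort", "8123");

        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://my_ec2_cassandra_node_public_ip:7199/jmxrmi");
        // If this call times out, the tunnel (or the firewall rules for
        // the randomly chosen RMI port) is the problem, not jconsole.
        try (JMXConnector c = JMXConnectorFactory.connect(url)) {
            System.out.println("connected: " + c.getConnectionId());
        }
    }
}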

Re: 1.1 not removing commit log files?

2012-05-21 Thread Alain RODRIGUEZ
commitlog_total_space_in_mb: 4096

By default this line is commented out in 1.0.x if I remember well, and I
guess it is the same in 1.1. You really should uncomment it, or your
commit logs will entirely fill up your disk, as happened to me a
while ago.

Alain

2012/5/21 Pieter Callewaert :
> Hi,
>
>
>
> In 1.1 the commitlog files are pre-allocated as 128MB segments.
> (https://issues.apache.org/jira/browse/CASSANDRA-3411) The total should
> however not exceed the commit log space configured in cassandra.yaml.
>
>
>
> commitlog_total_space_in_mb: 4096
>
>
>
> Kind regards,
>
> Pieter Callewaert
>
>
>
> From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
> Sent: Monday, 21 May 2012 9:52
> To: user@cassandra.apache.org
> Subject: 1.1 not removing commit log files?
>
>
>
> The commit log drives on my nodes keep slowly filling up.  I don't see any
> errors in my logs indicating issues that I could map to this
> behavior.
>
>
>
> Is this how 1.1 is supposed to work now?  Previous versions seemed to keep
> usage on this drive to a minimum as memtables were flushed.
>
>
>
> /dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog
>
>


Re: unsubscribe

2012-05-21 Thread Dave Brosius

On 05/21/2012 02:44 AM, Qingyan(Evan) Liu wrote:



send to user-unsubscr...@cassandra.apache.org


Re: Tuning cassandra (compactions overall)

2012-05-21 Thread Alain RODRIGUEZ
Hi Aaron.

I wanted to try the new config. After doing a rolling restart, all my
counters are wrong, with false values. I stopped my servers with
the following:

nodetool -h localhost disablegossip
nodetool -h localhost disablethrift
nodetool -h localhost drain
kill cassandra with sigterm (15) via htop

And after restarting the second one I have lost all consistency of my
data. All my statistics since September are totally wrong now in
production.

As a reminder, I'm using a 2-node cluster with RF=2 and CL.ONE.

1 - How can I fix it? (I have a backup from this morning, but I will
lose all the data written after that if I restore this backup.)
2 - What happened? How can I avoid it?

Any idea would be greatly appreciated; I'm quite desperate.

Alain

2012/5/17 aaron morton :
> What is the benefit of having more memory ? I mean, I don't
>
> understand why having 1, 2, 4, 8 or 16 GB of memory is so different.
>
> Less frequent and less aggressive garbage collection frees up CPU resources
> to run the database.
>
> Less memory results in frequent and aggressive (i.e. stop the world) GC, and
> increase IO pressure. Which reduces read performance and in the extreme can
> block writes.
>
> The memory used inside
>
> the heap will remain close to the max memory available, therefore
> having more or less memory doesn't matter.
>
> Not an ideal situation. Becomes difficult to find a contiguous region of
> memory to allocate.
>
> Can you enlighten me about this point ?
>
> It's a database server, it is going to work better with more memory. Also
> it's Java and it's designed to run on multiple machines with many GB's of
> ram available. There are better arguments
> here http://wiki.apache.org/cassandra/CassandraHardware
>
>
> I'm interested a lot in learning about some configuration I can use to
> reach better performance/stability as well as in learning about how
> Cassandra works.
>
> Turn off all caches.
>
> In the schema increase the bloom filter false positive rate (see help in the
> cli for Create column family)
>
> In the yaml experiment with these changes:
> * reduce sliced_buffer_size_in_kb
> * reduce column_index_size_in_kb
> * reduce in_memory_compaction_limit_in_mb
> * increase index_interval
> * set concurrent_compactors to 2
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/05/2012, at 12:40 AM, Alain RODRIGUEZ wrote:
>
> Using c1.medium, we are currently able to deliver the service.
>
> What is the benefit of having more memory ? I mean, I don't
> understand why having 1, 2, 4, 8 or 16 GB of memory is so different.
> In my mind, Cassandra will fill the heap and from then, start to flush
> and compact to avoid OOMing and fill it again. The memory used inside
> the heap will remain close to the max memory available, therefore
> having more or less memory doesn't matter.
>
> I'm pretty sure I misunderstand or forget something about how the
> memory is used but not sure about what.
>
> Can you enlighten me about this point ?
>
> If I understand why the memory size is that important I will probably
> be able to argue about the importance of having more memory and my
> boss will probably allow me to spend more money to get better servers.
>
> "There are some changes you can make to mitigate things (let me know
> if you need help), but this is essentially a memory problem."
>
> I'm interested a lot in learning about some configuration I can use to
> reach better performance/stability as well as in learning about how
> Cassandra works.
>
> Thanks for the help you give to people and for sharing your knowledge
> with us. I appreciate a lot the Cassandra community and the most
> active people keeping it alive. It's worth being said :).
>
> Alain
>
>


Wrong data after rolling restart

2012-05-21 Thread Alain RODRIGUEZ
Hi, I re-post this here because it's a new subject, separate from my
initial tuning questions.

I wanted to try a new config. After doing a rolling restart, all my
counters are wrong, with false values. I stopped my servers with the
following:

nodetool -h localhost disablegossip
nodetool -h localhost disablethrift
nodetool -h localhost drain
kill cassandra with sigterm (15) via htop

And after restarting the second one I have lost all consistency of my
data. All my statistics since September are totally wrong now in
production.

As a reminder, I'm using a 2-node cluster with RF=2 and CL.ONE.

1 - How can I fix it? (I have a backup from this morning, but I will
lose all the data written after that if I restore this backup.)
2 - What happened? How can I avoid it?

Any idea would be greatly appreciated; I'm quite desperate.

Alain


Re: Wrong data after rolling restart

2012-05-21 Thread Alain RODRIGUEZ
Here are my 2 nodes' startup logs, I hope they can help...

https://gist.github.com/2762493
https://gist.github.com/2762495

Alain

2012/5/21 Alain RODRIGUEZ :
> Hi, I re-post this here because it's a new subject, separate from my
> initial tuning questions.
>
> I wanted to try a new config. After doing a rolling restart, all my
> counters are wrong, with false values. I stopped my servers with the
> following:
>
> nodetool -h localhost disablegossip
> nodetool -h localhost disablethrift
> nodetool -h localhost drain
> kill cassandra with sigterm (15) via htop
>
> And after restarting the second one I have lost all consistency of my
> data. All my statistics since September are totally wrong now in
> production.
>
> As a reminder, I'm using a 2-node cluster with RF=2 and CL.ONE.
>
> 1 - How can I fix it? (I have a backup from this morning, but I will
> lose all the data written after that if I restore this backup.)
> 2 - What happened? How can I avoid it?
>
> Any idea would be greatly appreciated; I'm quite desperate.
>
> Alain


Ordering counters in Cassandra

2012-05-21 Thread Filippo Diotalevi
Hi, 
I'm trying to understand the best design for a simple "ranking" use case.
I have, in a row, a good number (10k to a few 100K) of counters; each one is 
counting the occurrences of an event. At the end of the day, I want to create 
a ranking of the most frequent events.

What's the best approach to perform this task? 
The brute force approach of retrieving the row and ordering it doesn't work 
well (the call usually times out, especially if Cassandra is also under 
load); I also don't know the full set of event names (column names) in 
advance, so it's difficult to slice the get call.

Is there any trick to solve this problem? Maybe a way to retrieve the row 
ordered by counter values? 

Thanks,
-- 
Filippo Diotalevi




RE Ordering counters in Cassandra

2012-05-21 Thread Romain HARDOUIN
If I understand correctly, you've got a data model which looks like this:

CF Events:
"row1": { "event1": 1050, "event2": 1200, "event3": 830, ... }

You can't query on column values, but you can build a ranking every day in 
a dedicated CF by iterating over the events:

create column family Ranking
with comparator = 'LongType(reversed=true)' 
...

CF Ranking:
"rank": { 1200: "event2", 1050: "event1", 830: "event3", ... }
 
Then you can make a "top ten" or whatever you want, because the counter 
values will be sorted.
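
A minimal client-side sketch of the inversion described above, assuming the 
counters for one row have already been read into a map (the class name and 
sample events are illustrative):

import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class DailyRanking {
    // Invert an event -> count map into a count -> event map in descending
    // order, mirroring the Ranking CF with its reversed LongType comparator:
    // each entry becomes a column (name = count, value = event) under the
    // "rank" row.
    public static TreeMap<Long, String> invert(Map<String, Long> counters) {
        TreeMap<Long, String> ranking = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Long> e : counters.entrySet()) {
            // Note: as in the CF itself, two events with the same count
            // collide on the column name; a composite of (count, event)
            // avoids that.
            ranking.put(e.getValue(), e.getKey());
        }
        return ranking;
    }

    public static void main(String[] args) {
        Map<String, Long> events = Map.of("event1", 1050L, "event2", 1200L, "event3", 830L);
        System.out.println(invert(events)); // {1200=event2, 1050=event1, 830=event3}
    }
}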


Filippo Diotalevi wrote on 21/05/2012 16:59:43:

> Hi, 
> I'm trying to understand the best design for a simple 
> "ranking" use case.
> I have, in a row, a good number (10k - a few 100K) of counters; each
> one is counting the occurrence of an event. At the end of day, I 
> want to create a ranking of the most occurred event.
> 
> What's the best approach to perform this task? 
> The brute force approach of retrieving the row and ordering it 
> doesn't work well (the call usually times out, especially if 
> Cassandra is also under load); I also don't know in advance the full
> set of event names (column names), so it's difficult to slice the get 
call.
> 
> Is there any trick to solve this problem? Maybe a way to retrieve 
> the row ordered by counter values?
> 
> Thanks,
> -- 
> Filippo Diotalevi

Re: Counters and replication factor

2012-05-21 Thread Radim Kolar

On 26.3.2012 19:17, aaron morton wrote:
> Can you describe the situations where counter updates are lost or go 
> backwards?

> Do you ever get TimedOutExceptions when performing counter updates?
We get a few timeouts per day, but not many - fewer than 10. I do not think 
the timeouts are the root cause. I haven't figured out the exact steps to 
reproduce it (I haven't even tried). We are reading at CL.ONE, but the 
cluster is well synchronized and we are reading a long time after writing - 
the new value should already be present on all nodes.
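
For what it's worth, one way to check whether CL.ONE reads are returning a 
stale replica is to read the same counter again at QUORUM and compare. A 
sketch with Hector, assuming its counter query API; the cluster, keyspace, 
CF, and key names are illustrative:

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.CounterQuery;

public class CounterQuorumCheck {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("test", "localhost:9160");

        // Force QUORUM reads, so the value reflects a majority of replicas
        // instead of whichever single replica answers first at CL.ONE.
        ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
        ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace ksp = HFactory.createKeyspace("ks", cluster, ccl);

        CounterQuery<String, String> q = HFactory.createCounterColumnQuery(
                ksp, StringSerializer.get(), StringSerializer.get());
        q.setColumnFamily("Counters");
        q.setKey("some-row");
        q.setName("some-counter");

        HCounterColumn<String> col = q.execute().get();
        System.out.println(col == null ? "no such counter" : col.getValue());
    }
}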


Re: RE Ordering counters in Cassandra

2012-05-21 Thread Filippo Diotalevi
Hi Romain,  
thanks for your suggestion.

When you say "build every day a ranking in a dedicated CF by iterating over 
events", do you mean
- load all the columns for the specified row key
- iterate over each column, and write a new column in the inverted index
?

That's my current approach, but since I have many of these wide rows (1 per 
day), the process is extremely slow, as it involves moving an entire row from 
Cassandra to the client, inverting every column, and sending the data back to 
create the inverted index.  

--  
Filippo Diotalevi



On Monday, 21 May 2012 at 17:19, Romain HARDOUIN wrote:

>  
> If I understand correctly, you've got a data model which looks like this:  
>  
> CF Events:  
> "row1": { "event1": 1050, "event2": 1200, "event3": 830, ... }  
>  
> You can't query on column values, but you can build a ranking every day in a 
> dedicated CF by iterating over the events:  
>  
> create column family Ranking  
> with comparator = 'LongType(reversed=true)'
> ...  
>  
> CF Ranking:  
> "rank": { 1200: "event2", 1050: "event1", 830: "event3", ... }  
>  
> Then you can make a "top ten" or whatever you want, because the counter values 
> will be sorted.  
>  
>  
> Filippo Diotalevi  wrote 
> on 21/05/2012 16:59:43:
>  
> > Hi,  
> > I'm trying to understand the best design for a simple  
> > "ranking" use case.  
> > I have, in a row, a good number (10k - a few 100K) of counters; each
> > one is counting the occurrence of an event. At the end of day, I  
> > want to create a ranking of the most occurred event.  
> >  
> > What's the best approach to perform this task?  
> > The brute force approach of retrieving the row and ordering it  
> > doesn't work well (the call usually times out, especially if  
> > Cassandra is also under load); I also don't know in advance the full
> > set of event names (column names), so it's difficult to slice the get call. 
> >  
> >  
> > Is there any trick to solve this problem? Maybe a way to retrieve  
> > the row ordered by counter values?  
> >  
> > Thanks,  
> > --  
> > Filippo Diotalevi  



Re: RE Ordering counters in Cassandra

2012-05-21 Thread Tamar Fraenkel
I also had a similar problem. I have a temporary solution, which is not
the best, but may be of help.
I have the counter CF to count events, but apart from that I hold a leaders
CF:

leaders = {
  // key is time bucket
  // values are composites(rank, event) ordered by
  // descending order of the rank
  // set relevant TTL on columns
  time_bucket1 : {
composite(1000,event1) : ""
composite(999, event2) : ""
  },
  ...
}

Whenever I increment the counter for a specific event, I add a column in the
time bucket row of the leaders CF, with the new value of the counter and
the event name.
There are two ways to go here: either delete the old column(s) for that
event (with lower counters) from the leaders CF, or let them be.
If you choose to delete, there is the complication of not having
getAndSet for counters, so you may end up not deleting all the old
columns.
If you choose not to delete old columns, and live with duplicate columns
for events (each with a different count), your query to retrieve
leaders will run longer.
Anyway, when you need to retrieve the leaders, you can do a slice query
on the leaders CF and ignore
duplicate events in the client (I use Java). This will happen less if you
do delete old columns.
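
A sketch of the client-side dedup step under the no-delete variant, assuming
the slice has already been fetched in descending rank order (the Entry type
stands in for the composite column name):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LeadersDedup {
    // One (rank, event) column from a time-bucket row of the leaders CF;
    // with a composite comparator the slice arrives sorted by rank descending.
    record Entry(long rank, String event) {}

    // Keep only the first (i.e. highest-rank) entry seen per event, skipping
    // the stale duplicates left behind because old columns were not deleted.
    static List<Entry> topN(List<Entry> slice, int n) {
        Set<String> seen = new HashSet<>();
        List<Entry> top = new ArrayList<>();
        for (Entry e : slice) {
            if (seen.add(e.event())) {
                top.add(e);
                if (top.size() == n) break;
            }
        }
        return top;
    }

    public static void main(String[] args) {
        List<Entry> slice = List.of(
                new Entry(1000, "event1"), new Entry(999, "event2"),
                new Entry(998, "event1"));  // stale duplicate of event1
        // Prints [Entry[rank=1000, event=event1], Entry[rank=999, event=event2]]
        System.out.println(topN(slice, 10));
    }
}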

Another option is not to use Cassandra for that purpose; http://redis.io/ is
a nice tool.

Will be happy to hear your comments.
Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, May 21, 2012 at 8:05 PM, Filippo Diotalevi wrote:

> Hi Romain,
> thanks for your suggestion.
>
> When you say "build every day a ranking in a dedicated CF by iterating
> over events", do you mean
> - load all the columns for the specified row key
> - iterate over each column, and write a new column in the inverted index
> ?
>
> That's my current approach, but since I have many of these wide rows (1
> per day), the process is extremely slow, as it involves moving an entire row
> from Cassandra to the client, inverting every column, and sending the data back
> to create the inverted index.
>
> --
> Filippo Diotalevi
>
>
> On Monday, 21 May 2012 at 17:19, Romain HARDOUIN wrote:
>
>
> If I understand correctly, you've got a data model which looks like this:
>
> CF Events:
> "row1": { "event1": 1050, "event2": 1200, "event3": 830, ... }
>
> You can't query on column values, but you can build a ranking every day in
> a dedicated CF by iterating over the events:
>
> create column family Ranking
> with comparator = 'LongType(reversed=true)'
> ...
>
> CF Ranking:
> "rank": { 1200: "event2", 1050: "event1", 830: "event3", ... }
>
> Then you can make a "top ten" or whatever you want, because the counter
> values will be sorted.
>
>
> Filippo Diotalevi wrote on 21/05/2012 16:59:43:
>
> > Hi,
> > I'm trying to understand the best design for a simple
> > "ranking" use case.
> > I have, in a row, a good number (10k - a few 100K) of counters; each
> > one is counting the occurrence of an event. At the end of day, I
> > want to create a ranking of the most occurred event.
> >
> > What's the best approach to perform this task?
> > The brute force approach of retrieving the row and ordering it
> > doesn't work well (the call usually times out, especially if
> > Cassandra is also under load); I also don't know in advance the full
> > set of event names (column names), so it's difficult to slice the get
> call.
> >
> > Is there any trick to solve this problem? Maybe a way to retrieve
> > the row ordered by counter values?
> >
> > Thanks,
> > --
> > Filippo Diotalevi
>
>
>

Re: restoring from snapshot - missing data

2012-05-21 Thread Tyler Hobbs
On Mon, May 21, 2012 at 12:01 AM, Tamar Fraenkel wrote:

> If I am putting the snapshots on a clean ring, I need to first create the
> data model?


Yes.

-- 
Tyler Hobbs
DataStax 


Re: RE Ordering counters in Cassandra

2012-05-21 Thread Filippo Diotalevi


Hi Tamar,
the solution you propose is indeed a "temporary solution", but it might be 
the best one.

Which approach did you follow?
I'm a bit concerned about the deletion approach, since in case of concurrent 
writes on the same counter you might "lose" the pointer to the column to 
delete.

-- 
Filippo Diotalevi

On Monday, 21 May 2012 at 18:51, Tamar Fraenkel wrote:

> I also had a similar problem. I have a temporary solution, which is not the 
> best, but may be of help.
> I have the counter CF to count events, but apart from that I hold a leaders 
> CF:
>
> leaders = {
>   // key is time bucket
>   // values are composites(rank, event) ordered by
>   // descending order of the rank
>   // set relevant TTL on columns
>   time_bucket1 : {
> composite(1000,event1) : ""
> composite(999, event2) : ""
>   },
>   ...
> }
>
> Whenever I increment the counter for a specific event, I add a column in 
> the time bucket row of the leaders CF, with the new value of the counter 
> and the event name.
> There are two ways to go here: either delete the old column(s) for that 
> event (with lower counters) from the leaders CF, or let them be.
> If you choose to delete, there is the complication of not having getAndSet 
> for counters, so you may end up not deleting all the old columns.
> If you choose not to delete old columns, and live with duplicate columns 
> for events (each with a different count), your query to retrieve leaders 
> will run longer.
>
> Anyway, when you need to retrieve the leaders, you can do a slice query on 
> the leaders CF and ignore duplicate events in the client (I use Java). This 
> will happen less if you do delete old columns.
>
> Another option is not to use Cassandra for that purpose; http://redis.io/ 
> is a nice tool.
>
> Will be happy to hear your comments.
>
> Thanks,
> Tamar Fraenkel
> Senior Software Engineer, TOK Media
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
> On Mon, May 21, 2012 at 8:05 PM, Filippo Diotalevi  wrote:
>
>> Hi Romain,
>> thanks for your suggestion.
>>
>> When you say "build every day a ranking in a dedicated CF by iterating 
>> over events", do you mean
>> - load all the columns for the specified row key
>> - iterate over each column, and write a new column in the inverted index
>> ?
>>
>> That's my current approach, but since I have many of these wide rows (1 
>> per day), the process is extremely slow, as it involves moving an entire 
>> row from Cassandra to the client, inverting every column, and sending the 
>> data back to create the inverted index.
>>
>> --
>> Filippo Diotalevi
>>
>> On Monday, 21 May 2012 at 17:19, Romain HARDOUIN wrote:
>>
>>> If I understand correctly, you've got a data model which looks like this:
>>>
>>> CF Events:
>>> "row1": { "event1": 1050, "event2": 1200, "event3": 830, ... }
>>>
>>> You can't query on column values, but you can build a ranking every day 
>>> in a dedicated CF by iterating over the events:
>>>
>>> create column family Ranking
>>> with comparator = 'LongType(reversed=true)'
>>> ...
>>>
>>> CF Ranking:
>>> "rank": { 1200: "event2", 1050: "event1", 830: "event3", ... }
>>>
>>> Then you can make a "top ten" or whatever you want, because the counter 
>>> values will be sorted.
>>>
>>> Filippo Diotalevi wrote on 21/05/2012 16:59:43:
>>>
>>>> Hi,
>>>> I'm trying to understand the best design for a simple "ranking" use case.
>>>> I have, in a row, a good number (10k to a few 100K) of counters; each
>>>> one is counting the occurrences of an event. At the end of the day, I
>>>> want to create a ranking of the most frequent events.
>>>>
>>>> What's the best approach to perform this task?
>>>> The brute force approach of retrieving the row and ordering it
>>>> doesn't work well (the call usually times out, especially if
>>>> Cassandra is also under load); I also don't know in advance the full
>>>> set of event names (column names), so it's difficult to slice the
>>>> get call.
>>>>
>>>> Is there any trick to solve this problem? Maybe a way to retrieve
>>>> the row ordered by counter values?
>>>>
>>>> Thanks,
>>>> --
>>>> Filippo Diotalevi

Number of keyspaces

2012-05-21 Thread Luís Ferreira
Hi,

Does the number of keyspaces affect the overall cassandra performance?


Regards,
Luís Ferreira





Re: RE Ordering counters in Cassandra

2012-05-21 Thread Tamar Fraenkel
Indeed, I took the no-delete approach. If the time bucket rows are not that 
big, this is a good temporary solution.
I just finished the implementation and am testing now on a small staging 
environment. So far so good.
Tamar

Sent from my iPod

On May 21, 2012, at 9:11 PM, Filippo Diotalevi  wrote:

> Hi Tamar,
> the solution you propose is indeed a "temporary solution", but it might be 
> the best one.
> 
> Which approach did you follow?
> I'm a bit concerned about the deletion approach, since in case of concurrent 
> writes on the same counter you might "lose" the pointer to the column to 
> delete. 
> 
> -- 
> Filippo Diotalevi
> 
> 
> On Monday, 21 May 2012 at 18:51, Tamar Fraenkel wrote:
> 
>> I also had a similar problem. I have a temporary solution, which is not 
>> the best, but may be of help.
>> I have the counter CF to count events, but apart from that I hold a leaders CF:
>> leaders = {
>>   // key is time bucket
>>   // values are composites(rank, event) ordered by
>>   // descending order of the rank
>>   // set relevant TTL on columns
>>   time_bucket1 : {
>> composite(1000,event1) : ""
>> composite(999, event2) : ""
>>   },
>>   ...
>> }
>> Whenever I increment the counter for a specific event, I add a column in the 
>> time bucket row of the leaders CF, with the new value of the counter and the 
>> event name.
>> There are two ways to go here: either delete the old column(s) for that 
>> event (with lower counters) from the leaders CF, or let them be. 
>> If you choose to delete, there is the complication of not having getAndSet 
>> for counters, so you may end up not deleting all the old columns. 
>> If you choose not to delete old columns, and live with duplicate columns for 
>> events (each with a different count), your query to retrieve 
>> leaders will run longer.
>> Anyway, when you need to retrieve the leaders, you can do a slice query on 
>> leaders CF and ignore duplicate events in the client (I use Java). This will 
>> happen less if you do delete old columns.
>> 
>> Another option is not to use Cassandra for that purpose; http://redis.io/ is 
>> a nice tool.
>> 
>> Will be happy to hear your comments.
>> Thanks,
>> 
>> Tamar Fraenkel 
>> Senior Software Engineer, TOK Media 
>> 
>> 
>> 
>> ta...@tok-media.com
>> Tel:   +972 2 6409736 
>> Mob:  +972 54 8356490 
>> Fax:   +972 2 5612956 
>> 
>> 
>> 
>> 
>> 
>> On Mon, May 21, 2012 at 8:05 PM, Filippo Diotalevi  
>> wrote:
>>> Hi Romain,
>>> thanks for your suggestion.
>>> 
>>> When you say " build every day a ranking in a dedicated CF by iterating 
>>> over events:" do you mean
>>> - load all the columns for the specified row key
>>> - iterate over each column, and write a new column in the inverted index
>>> ?
>>> 
>>> That's my current approach, but since I have many of these wide rows (1 per 
>>> day), the process is extremely slow, as it involves moving an entire row 
>>> from Cassandra to the client, inverting every column, and sending the data back 
>>> to create the inverted index.
>>> 
>>> -- 
>>> Filippo Diotalevi
>>> 
>>> 
>>> On Monday, 21 May 2012 at 17:19, Romain HARDOUIN wrote:
>>> 
 
 If I understand correctly, you've got a data model which looks like this: 
 
 CF Events: 
 "row1": { "event1": 1050, "event2": 1200, "event3": 830, ... } 
 
 You can't query on column values, but you can build a ranking every day in 
 a dedicated CF by iterating over the events: 
 
 create column family Ranking 
 with comparator = 'LongType(reversed=true)'   
 ... 
 
 CF Ranking: 
 "rank": { 1200: "event2", 1050: "event1", 830: "event3", ... } 
 
 Then you can make a "top ten" or whatever you want, because the counter values 
 will be sorted. 
 
 
 Filippo Diotalevi wrote on 21/05/2012 16:59:43:
 
 > Hi, 
 > I'm trying to understand the best design for a simple 
 > "ranking" use case. 
 > I have, in a row, a good number (10k - a few 100K) of counters; each
 > one is counting the occurrence of an event. At the end of day, I 
 > want to create a ranking of the most occurred event. 
 > 
 > What's the best approach to perform this task?  
 > The brute force approach of retrieving the row and ordering it 
 > doesn't work well (the call usually times out, especially if 
 > Cassandra is also under load); I also don't know in advance the full
 > set of event names (column names), so it's difficult to slice the get 
 > call. 
 > 
 > Is there any trick to solve this problem? Maybe a way to retrieve 
 > the row ordered by counter values? 
 > 
 > Thanks, 
 > -- 
 > Filippo Diotalevi
>>> 
>> 
> 


Re: restoring from snapshot - missing data

2012-05-21 Thread Tamar Fraenkel
Thanks.
After creating the data model and matching the correct snapshot to the
correct new node (same token), everything worked fine!

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, May 21, 2012 at 9:06 PM, Tyler Hobbs  wrote:

> On Mon, May 21, 2012 at 12:01 AM, Tamar Fraenkel wrote:
>
>> If I am putting the snapshots on a clean ring, I need to first create the
>> data model?
>
>
> Yes.
>
> --
> Tyler Hobbs
> DataStax 
>
>

RE: 1.1 not removing commit log files?

2012-05-21 Thread Bryce Godfrey
Thanks, I'll give it a try.

-Original Message-
From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] 
Sent: Monday, May 21, 2012 2:12 AM
To: user@cassandra.apache.org
Subject: Re: 1.1 not removing commit log files?

commitlog_total_space_in_mb: 4096

By default this line is commented out in 1.0.x if I remember well, and I guess it 
is the same in 1.1. You really should uncomment it, or your commit logs will 
entirely fill up your disk, as happened to me a while ago.

Alain

2012/5/21 Pieter Callewaert :
> Hi,
>
>
>
> In 1.1 the commitlog files are pre-allocated as 128MB segments.
> (https://issues.apache.org/jira/browse/CASSANDRA-3411) The total should 
> however not exceed the commit log space configured in cassandra.yaml.
>
>
>
> commitlog_total_space_in_mb: 4096
>
>
>
> Kind regards,
>
> Pieter Callewaert
>
>
>
> From: Bryce Godfrey [mailto:bryce.godf...@azaleos.com]
> Sent: Monday, 21 May 2012 9:52
> To: user@cassandra.apache.org
> Subject: 1.1 not removing commit log files?
>
>
>
> The commit log drives on my nodes keep slowly filling up.  I don't see 
> any errors in my logs indicating issues that I could map to 
> this behavior.
>
>
>
> Is this how 1.1 is supposed to work now?  Previous versions seemed to 
> keep usage on this drive to a minimum as memtables were flushed.
>
>
>
> /dev/mapper/mpathf 25G   21G  4.2G  83% /opt/cassandra/commitlog
>
>


Re: unable to nodetool to remote EC2

2012-05-21 Thread ramesh

  
  
On 05/21/2012 03:55 AM, Tamar Fraenkel wrote:

> Hi!
> I am trying the tunnel and it fails. I would be grateful for some hints:
>
> I defined
>    - proxy_host = ubuntu@my_ec2_cassandra_node_public_ip
>    - proxy_port = 22
>
> I do:
> ssh -N -f -i /c/Users/tamar/.ssh/Amazon/tokey.openssh -D22
> ubuntu@my_ec2_cassandra_node_public_ip
>
> I put some debug prints and I can see that the ssh_pid is indeed the
> correct one.
>
> I run
> jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=22
> service:jmx:rmi:///jndi/rmi://my_ec2_cassandra_node_public_ip:7199/jmxrmi
>
> I get errors and it fails:
> channel 2: open failed: connect failed: Connection timed out
>
> One note though: I can ssh to that vm using
> ssh -i /c/Users/tamar/.ssh/Amazon/tokey.openssh -D22
> ubuntu@my_ec2_cassandra_node_public_ip
> without being prompted for a password.
>
> Any help appreciated
>
> Tamar Fraenkel
> Senior Software Engineer, TOK Media
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
> On Fri, May 18, 2012 at 9:49 PM, ramesh  wrote:
>
>> On 05/18/2012 01:35 PM, Tyler Hobbs wrote:
>>
>>> Your firewall rules need to allow TCP traffic on any port >= 1024 for
>>> JMX to work.  It initially connects on port 7199, but then the client
>>> is asked to reconnect on a randomly chosen port.
>>>
>>> You can open the firewall, SSH to the node first, or set up something
>>> like this:
>>> http://simplygenius.com/2010/08/jconsole-via-socks-ssh-tunnel.html
>>
>> I updated the cassandra-env.sh
>> $JMX_HOST="10.20.30.40"
>> JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=$JMX_HOST"
>>
>> netstat -ltn shows port 7199 is listening.
>>
>> I tried both public and private IP for connecting but neither helps.
>>
>> However, I am able to connect locally from within the server.
>>
>> I get this error when I remote:
>>
>> Error connection to remote JMX agent! java.rmi.ConnectException:
>> Connection refused to host: 10.20.30.40; [stack trace as quoted
>> earlier in this thread]

Hello Tamar,

In your bash file, where you ssh, pass the .pem as well:

# start up a background ssh tunnel on the desired port
ssh -i mypem.pem -N -f -D$proxy_port $proxy_host

Here is the entire code

Re: how can we get (a lot) more performance from cassandra

2012-05-21 Thread Yiming Sun
Hi Aaron,

Could you elaborate a bit more on each of the points you
suggested?  Thanks.

-- Y.
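
For reference, the parallelization Aaron Turner suggests below might look
roughly like this sketch, where fetchVolume is a hypothetical wrapper around
the Hector slice query described in this thread:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelVolumeReader {
    // Hypothetical: one slice query fetching all requested pages (columns)
    // of one volume (row); wire this up to the Hector client.
    static byte[][] fetchVolume(String volumeId) {
        throw new UnsupportedOperationException("plug in the Hector slice query");
    }

    // Issue one slice query per volume on a fixed-size thread pool instead of
    // fetching volumes serially; aggregate read throughput should scale
    // roughly with the number of client threads until the cluster saturates.
    public static List<byte[][]> fetchAll(List<String> volumeIds, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[][]>> futures = new ArrayList<>();
            for (String id : volumeIds) {
                futures.add(pool.submit(() -> fetchVolume(id)));
            }
            List<byte[][]> volumes = new ArrayList<>();
            for (Future<byte[][]> f : futures) {
                volumes.add(f.get());
            }
            return volumes;
        } finally {
            pool.shutdown();
        }
    }
}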

On Sun, May 20, 2012 at 7:29 PM, aaron morton wrote:

> I would look into the problems you are having with GC...
>
> The server log shows the GC ParNew frequently gets longer than 200ms,
> often in the range of 4-5seconds.  But nowhere near 15 seconds (which is an
> indication that JVM heap is being swapped out).
>
>
> Then check the throughput on the san and the steal on the VM's.
>
> Also try to isolate the issue to "it takes this long for a single thread
> to make this call"
>
> In a low write environment reads should be flying along.
>
> Cheers
>
>   -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/05/2012, at 1:44 PM, Yiming Sun wrote:
>
> Hi Aaron T.,  No, actually we haven't, but this sounds like a good
> suggestion.  I can definitely try THIS before jumping into other things
> such as enabling row cache etc. Thanks!
>
> -- Y.
>
> On Wed, May 16, 2012 at 9:38 PM, Aaron Turner wrote:
>
>> On Wed, May 16, 2012 at 12:59 PM, Yiming Sun 
>> wrote:
>> > Hello,
>> >
>> > I asked the question as a follow-up under a different thread, so I
>> figure I
>> > should ask here instead in case the other one gets buried, and besides,
>> I
>> > have a little more information.
>> >
>> > "We find the lack of performance disturbing" as we are only able to get
>> > about 3-4MB/sec read performance out of Cassandra.
>> >
>> > We are using cassandra as the backend for an IR repository of digital
>> texts.
>> > It is a read-mostly repository with occasional writes.  Each row
>> represents
>> > a book volume, and each column of a row represents a page of the volume.
>> >  Granted the data size is small -- the average size of a column text is
>> > 2-3KB, and each row has about 250 columns (varies quite a bit from one
>> > volume to another).
>> >
>> > Currently we are running a 3-node cluster, and will soon be upgraded to
>> a
>> > 6-node setup.  Each node is a VM with 4 cores and 16GB of memory.  All
>> VMs
>> > use SAN as disk storage.
>> >
>> > To retrieve a volume, a slice query is used via Hector that specifies
>> the
>> > row key (the volume), and a list of column keys (pages), and the
>> consistency
>> > level is set to ONE.  It is typical to retrieve multiple volumes per
>> > request.
>> >
>> > The read rate that I have been seeing is about 3-4 MB/sec, and that is
>> > reading the raw bytes... using string serializer the rate is even lower,
>> > about 2.2MB/sec.
>> >
>> > The server log shows the GC ParNew frequently gets longer than 200ms,
>> often
>> > in the range of 4-5seconds.  But nowhere near 15 seconds (which is an
>> > indication that JVM heap is being swapped out).
>> >
>> > Currently we have not added JNA.  From a blog post, it seems JNA is
>> able to
>> > increase the performance by 13%, and we are hoping to increase the
>> > performance by something more like 1300% (3-4 MB/sec is just
>> disturbingly
>> > low).  And we are hesitant to disable swap entirely since one of the
>> nodes
>> > is running a couple other services
>> >
>> > Do you have any suggestions on how we may boost the performance?
>>  Thanks!
>>
>> Have you tried using more threads on the client side?  Generally
>> speaking, when I need faster read/write performance I look for ways to
>> parallelize my requests and it scales pretty much linearly.
>>
>>
>> --
>> Aaron Turner
>> http://synfin.net/ Twitter: @synfinatic
>> http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix &
>> Windows
>> Those who would give up essential Liberty, to purchase a little temporary
>> Safety, deserve neither Liberty nor Safety.
>> -- Benjamin Franklin
>> "carpe diem quam minimum credula postero"
>>
>
>
>


Re: Couldn't detect any schema definitions in local storage - after handling schema disagreement according to FAQ

2012-05-21 Thread aaron morton
> 1) What did I do wrong? Why was cassandra throwing exceptions on first startup?
In 1.0.X the history of schema changes was replayed to the node when it 
rejoined the cluster. If the node is receiving traffic while this is going on 
it will log those errors until the schema mutation that created 1012 is 
replayed. 

> 2) Why was the keyspace data invalidated? Is it expected?
The data will have remained on the disk. The load is calculated based on the 
CF's in the schema, this can mean that the load will not return to full until 
the schema is fully replayed. 

Did you lose data ?

> 3) If the answer to #2 is "yes it's expected", then what's the point in doing 
> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> when all keyspace data is lost anyway? It makes more sense to just do 
> http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node

The answer is no. 

Checking, did you delete just the Schema-* and Migration-* files or all of the 
files in data/system?

Also, in the first log there are a lot of commit log mutations being skipped 
because the schema is not there. Drain should have removed these, but it can 
take a little time (I think).  

> 4) afaiu I could also stop cassandra again, move the old sstables from the 
> snapshot back to the keyspace data dir, and run repair for all keyspace CFs? 
> So that it finishes faster and creates less load than running a repair which 
> has no previous keyspace data at all?

The approach you followed was the correct one. 

I've updated the wiki to say the errors are expected. 

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/05/2012, at 6:34 AM, Piavlo wrote:

> Hi,
> 
> I had a schema disagreement problem in a cassandra 1.0.9 cluster, where one 
> node had a different schema version.
> So I followed the faq at 
> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> disabled gossip, disabled thrift, drained, and finally stopped the cassandra 
> process. On startup I
> noticed
> INFO [main] 2012-05-18 16:23:11,879 DatabaseDescriptor.java (line 467) 
> Couldn't detect any schema definitions in local storage.
> in the log, and after
> INFO [main] 2012-05-18 16:23:15,463 StorageService.java (line 619) 
> Bootstrap/Replace/Move completed! Now serving reads.
> it started throwing Fatal exceptions for all read/write operations endlessly.
> 
> I had to stop cassandra process again(no draining was done)
> 
> On second start it did came up ok immediately loading the correct cluster 
> schema version
> INFO [main] 2012-05-18 16:54:44,303 DatabaseDescriptor.java (line 499) 
> Loading schema version 9db34ef0-a0be-11e1--f9687e034cf7
> 
> But now this node appears to have started with no data from keyspace which 
> had schema disagreement.
> The original keyspace sstables now appear under snapshots dir.
> 
> # nodetool -h localhost ring
> Address         DC      Rack  Status  State   Load       Owns    Token
>                                                                  141784319550391026443072753096570088106
> 10.49.127.4     eu-west 1a    Up      Normal  8.19 GB    16.67%  0
> 10.241.29.65    eu-west 1b    Up      Normal  8.18 GB    16.67%  28356863910078205288614550619314017621
> 10.59.46.236    eu-west 1c    Up      Normal  8.22 GB    16.67%  56713727820156410577229101238628035242
> 10.50.33.232    eu-west 1a    Up      Normal  8.2 GB     16.67%  85070591730234615865843651857942052864
> 10.234.71.33    eu-west 1b    Up      Normal  8.15 GB    16.67%  113427455640312821154458202477256070485
> 10.58.249.118   eu-west 1c    Up      Normal  660.98 MB  16.67%  141784319550391026443072753096570088106
> #
> 
> The node is the one with 660.98 MB of data (which is the opscenter keyspace 
> data that was not invalidated).
> 
> So i have some questions:
> 
> 1) What did I do wrong? Why was cassandra throwing exceptions on first startup?
> 2) Why was the keyspace data invalidated? Is it expected?
> 3) If the answer to #2 is "yes it's expected", then what's the point in doing 
> http://wiki.apache.org/cassandra/FAQ#schema_disagreement
> when all keyspace data is lost anyway? It makes more sense to just do 
> http://wiki.apache.org/cassandra/Operations#Replacing_a_Dead_Node
> 4) afaiu I could also stop cassandra again, move the old sstables from the 
> snapshot back to the keyspace data dir, and run repair for all keyspace CFs? 
> So that it finishes faster and creates less load than running a repair which 
> has no previous keyspace data at all?
> 
> The first startup log is below:
> 
> INFO [main] 2012-05-18 16:23:07,367 AbstractCassandraDaemon.java (line 105) 
> Logging initialized
> INFO [main] 2012-05-18 16:23:07,382 AbstractCassandraDaemon.java (line 126) 
> JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_24
> INFO [main] 2012-05-18 16:23:07,383 AbstractCassandraDaemon.java (line 127) 
> Heap size

Re: unable to nodetool to remote EC2

2012-05-21 Thread Tamar Fraenkel
Thanks for the response, but it still does not work.
I am running the script from a git bash on my Windows 7 machine.
Adding some debug prints, this is what I am running:
ssh -i key.pem -N -f -D8123 ubuntu@ec2-*.amazonaws.com
ssh pid = 11616
/c/PROGRA~2/Java/jdk1.7.0_02/bin/jconsole.exe -J-DsocksProxyHost=localhost
-J-DsocksProxyPort=8123 service:jmx:rmi:///jndi/rmi://ec2-*.
amazonaws.com:7199/jmxrmi

Still getting "channel 2: open failed: connect failed: Connection timed out".
Any further ideas? Where are you running the script?
Thanks

*Tamar Fraenkel *
Senior Software Engineer, TOK Media


ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956





On Mon, May 21, 2012 at 11:00 PM, ramesh  wrote:

>  On 05/21/2012 03:55 AM, Tamar Fraenkel wrote:
>
> Hi!
> I am trying the tunnel and it fails. I would be grateful for some hints:
>
>  I defined
>
>- proxy_host = ubuntu@my_ec2_cassandra_node_public_ip
>- proxy_port = 22
>
> I do:
>  ssh -N -f -i /c/Users/tamar/.ssh/Amazon/tokey.openssh -D22
> ubuntu@my_ec2_cassandra_node_public_ip
>
>  I put some debug prints and I can see that the ssh_pid is indeed the
> correct one.
>
>  I run
> jconsole -J-DsocksProxyHost=localhost -J-DsocksProxyPort=22
> service:jmx:rmi:///jndi/rmi://my_ec2_cassandra_node_public_ip:7199/jmxrmi
>
>  I get errors and it fails:
> channel 2: open failed: connect failed: Connection timed out
>
>  One note though, I can ssh to that vm using
> ssh -i /c/Users/tamar/.ssh/Amazon/tokey.openssh -D22
> ubuntu@my_ec2_cassandra_node_public_ip
> without being prompted for a password.
>
>  Any help appreciated
>
>   *Tamar Fraenkel *
> Senior Software Engineer, TOK Media
>
>
> ta...@tok-media.com
> Tel:   +972 2 6409736
> Mob:  +972 54 8356490
> Fax:   +972 2 5612956
>
>
>
>
>
> On Fri, May 18, 2012 at 9:49 PM, ramesh  wrote:
>
>  On 05/18/2012 01:35 PM, Tyler Hobbs wrote:
>
>  Your firewall rules need to allow TCP traffic on any port >= 1024 for JMX
> to work.  It initially connects on port 7199, but then the client is asked
> to reconnect on a randomly chosen port.
>
> You can open the firewall, SSH to the node first, or set up something like
> this: http://simplygenius.com/2010/08/jconsole-via-socks-ssh-tunnel.html
>
>  On Fri, May 18, 2012 at 1:31 PM, ramesh  wrote:
>
>  I updated the cassandra-env.sh
> $JMX_HOST="10.20.30.40"
> JVM_OPTS="$JVM_OPTS -Djava.rmi.server.hostname=$JMX_HOST"
>
> netstat -ltn shows port 7199 is listening.
>
> I tried both public and private IP for connecting but neither helps.
>
> However, I am able to connect locally from within server.
>
>  I get this error when I remote:
>
>  Error connection to remote JMX agent! java.rmi.ConnectException:
> Connection refused to host: 10.20.30.40; nested exception is:
> java.net.ConnectException: Connection timed out at
> sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:601) at
> sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:198) at
> sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:184) at
> sun.rmi.server.UnicastRef.invoke(UnicastRef.java:110) at
> javax.management.remote.rmi.RMIServerImpl_Stub.newClient(Unknown Source) at
> javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2329)
> at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:279)
> at
> javax.management.remote.JMXConnectorFactory.connect(JMXConnectorFactory.java:248)
> at org.apache.cassandra.tools.NodeProbe.connect(NodeProbe.java:144) at
> org.apache.cassandra.tools.NodeProbe.<init>(NodeProbe.java:114) at
> org.apache.cassandra.tools.NodeCmd.main(NodeCmd.java:623) Caused by:
> java.net.ConnectException: Connection timed out at
> java.net.PlainSocketImpl.socketConnect(Native Method) at
> java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351) at
> java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213) at
> java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200) at
> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366) at
> java.net.Socket.connect(Socket.java:529) at
> java.net.Socket.connect(Socket.java:478) at java.net.Socket.<init>(Socket.java:375)
> at java.net.Socket.<init>(Socket.java:189) at
> sun.rmi.transport.proxy.RMIDirectSocketFactory.createSocket(RMIDirectSocketFactory.java:22)
> at
> sun.rmi.transport.proxy.RMIMasterSocketFactory.createSocket(RMIMasterSocketFactory.java:128)
> at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:595) ...
> 10 more
>
> Any help appreciated.
> Regards
> Ramesh
>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>
>
> It helped.
> Thanks Tyler for the info and the link to the post.
>
> Regards
> Ramesh
>
>
>   Hello Tamar,
>
> In your bash file, where you ssh, pass the .pem as well:
>
>  # start up a background ssh tunnel on the desired port
> ssh -i mypem.pem -N -f -D$proxy_port $proxy_host
>
> Here is the entire code
>
>

Re: Number of keyspaces

2012-05-21 Thread R. Verlangen
Yes, it does. However, there's no definitive answer as to the limit: it depends
on your hardware and cluster configuration.

You might want to search the archives of this mailing list; I remember
this has been asked before.

Cheers!

2012/5/21 Luís Ferreira 

> Hi,
>
> Does the number of keyspaces affect the overall cassandra performance?
>
>
> Regards,
> Luís Ferreira
>
>
>
>


-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl