Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data
For plain old log analysis the Cloudera Hadoop distribution may be a better match. Flume is designed to help with streaming data into HDFS, the LZO compression extensions would help with the data size and PIG would make the analysis easier (IMHO). http://www.cloudera.com/hadoop/ http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/ http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/ I'll try to answer your questions, others please jump in if I'm wrong. 1. Data in a keyspace will be distributed to all nodes in the cassandra cluster. AFAIK the Job Tracker should only send one task to each task tracker, and normally you would have a task tracker running on each cassandra node. The task tracker can then throttle how many concurrent tasks can run. So you would not have 1,000 tasks sent to each of the 1,000 cassandra nodes. When the task runs on the cassandra node it will iterate through all of the rows in the specified ColumnFamily with keys in the Token range the Node is responsible for. If cassandra is using the RandomPartitioner, data will be spread around the cluster. So, for example, a Map-Reduce job that only wants to read the last week's data may have to read from every node. Obviously this depends on how the data is broken up between rows / columns. 2. Some of the other people from riptano.com or rackspace may be able to help with Cassandra's outer limits. There is a 400-node cluster planned: http://www.riptano.com/blog/riptano-and-digital-reasoning-form-partnership Hope that helps. Aaron On 22 Oct 2010, at 15:45, Takayuki Tsunakawa wrote: > Hello, > > I'm evaluating whether Cassandra fits a certain customer well. The > customer will collect petabytes of logs and analyze them. Could you > tell me if my understanding is correct and/or give me your opinions? > I'm sorry that the analysis requirement is not clear yet. > > 1. MapReduce behavior > I read the source code of Cassandra 0.6.x and understood that > jobtracker submits the map tasks to all Cassandra nodes, regardless of > whether the target keyspace's data reside there. That is, if there are > 1,000 nodes in the Cassandra cluster, jobtracker sends more than 1,000 > map tasks to all of the 1,000 nodes in parallel. If this is correct, > I'm afraid the startup time of a MapReduce job gets longer as more > nodes join the Cassandra cluster. > Is this correct? > With HBase, jobtracker submits map tasks only to the region servers > that hold the target data. This behavior is desirable because no > wasteful task submission is done. Can you suggest the cases where > Cassandra+MapReduce is better than HBase+MapReduce for log/sensor > analysis? (Please excuse me for my not presenting the analysis > requirement). > > 2. Data capacity > The excerpt from the paper about Amazon Dynamo says that the cluster > can scale to hundreds of nodes, not thousands. I understand Cassandra > is similar. Assuming that the recent commodity servers have 2 to 4 TB > of disks, we need about 1,000 nodes or more to store petabytes of > data. > Is the present Cassandra suitable for petabytes of data? If not, is > any development in progress to increase the scalability? > > > -- > Finally, Dynamo adopts a full membership model where each node is > aware of the data hosted by its peers. To do this, each node actively > gossips the full routing table with other nodes in the system. This > model works well for a system that contains couple of hundreds of > nodes. 
However, scaling such a design to run with tens of thousands of > nodes is not trivial because the overhead in maintaining the routing > table increases with the system size. This limitation might be > overcome by introducing hierarchical extensions to Dynamo. Also, note > that this problem is actively addressed by O(1) DHT systems(e.g., > [14]). > -- > > Regards, > Takayuki Tsunakawa > >
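To make the ColumnFamilyInputFormat path discussed in this thread concrete, here is a minimal job-driver sketch. It is hedged: the ConfigHelper method names and the mapper input types follow the 0.6-era contrib/word_count example as I remember it (row keys as String, column names as byte[]); later releases moved to ByteBuffer and renamed some of the helpers, and the keyspace and column family names below are made up. Treat it as an illustration of the shape of such a job, not a drop-in program.

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.SortedMap;

    import org.apache.cassandra.db.IColumn;
    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class LogCount {

        // The 0.6-era record reader hands each map call one row: its key plus the
        // columns selected by the slice predicate. Later versions use ByteBuffer
        // for both, so adjust the generics to match your release.
        public static class RowMapper
                extends Mapper<String, SortedMap<byte[], IColumn>, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);

            @Override
            protected void map(String key, SortedMap<byte[], IColumn> columns, Context ctx)
                    throws IOException, InterruptedException {
                if (!columns.isEmpty()) {
                    ctx.write(new Text("rows-with-message"), ONE); // count rows that carry the column
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "log-count");
            job.setJarByClass(LogCount.class);

            // One input split per token range; the job tracker tries to schedule each
            // map task on a task tracker co-located with a replica for that range.
            job.setInputFormatClass(ColumnFamilyInputFormat.class);
            ConfigHelper.setColumnFamily(job.getConfiguration(), "Logs", "Raw"); // assumed keyspace / CF

            // Pull only the column(s) the mapper needs.
            SlicePredicate predicate = new SlicePredicate();
            predicate.setColumn_names(Arrays.asList("message".getBytes("UTF-8")));
            ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

            job.setMapperClass(RowMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileOutputFormat.setOutputPath(job, new Path("/tmp/log-count"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Each input split corresponds to a token range, so with task trackers co-located on the Cassandra nodes the map work lands next to the data, which is the behaviour Aaron describes above.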
Re: Reading a keyrange when using RP
> > The goal is actually getting the rows in the range of "start","end". The order is not important at all. But what I can see is, this does not seem to be possible at all using RP. Am I wrong? A simpler solution is to just compare the MD5 of both keys and set start to the one with the lesser MD5 and end to the key with the greater MD5. RandomPartitioner orders keys by their MD5, not by value.
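To make the suggestion above concrete: under RandomPartitioner a row's position comes from the MD5 digest of its key, so a client can approximate the partitioner's ordering by hashing the keys itself before building a KeyRange. The sketch below is an approximation (the exact token derivation inside Cassandra, for example how the sign of the digest is handled, may differ in detail):

    import java.math.BigInteger;
    import java.security.MessageDigest;

    public class TokenOrder {
        // Roughly what RandomPartitioner does: token = abs(MD5(key)) as a BigInteger.
        static BigInteger md5Token(String key) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(md.digest(key.getBytes("UTF-8"))).abs();
        }

        public static void main(String[] args) throws Exception {
            String start = "start", end = "end";
            // Order the pair by token before issuing a range query under RP.
            if (md5Token(start).compareTo(md5Token(end)) > 0) {
                String tmp = start; start = end; end = tmp;
            }
            System.out.println("range start=" + start + " end=" + end);
        }
    }

As the next reply in this thread points out, ordering the endpoints this way only gives you the keys whose MD5 tokens fall between the two hashes, which is generally not the same set as the keys lexically between "start" and "end".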
Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data
Hello, Aaron, Thank you for much info (especially pointers that seem interesting). > So you would not have 1,000 tasks sent to each of the 1,000 cassandra nodes. Yes, I meant one map task would be sent to each task tracker, resulting in 1,000 concurrent map tasks in the cluster. ColumnFamilyInputFormat cannot identify the nodes that actually hold some data, so the job tracker will send the map tasks to all of the 1,000 nodes. This is wasteful and time-consuming if only 200 nodes hold some data for a keyspace. > When the task runs on the cassandra node it will iterate through all of the rows in the specified ColumnFamily with keys in the Token range the Node is responsible for. I hope the ColumnFamilyInputFormat will allow us to set KeyRange to select rows passed to map. I'll read the web pages you gave me. Thank you. All, any other advice and comment is appreciated. Regards, Takayuki Tsunakawa - Original Message - From: aaron morton To: user@cassandra.apache.org Sent: Friday, October 22, 2010 4:05 PM Subject: Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data For plain old log analysis the Cloudera Hadoop distribution may be a better match. Flume is designed to help with streaming data into HDFS, the LZo compression extensions would help with the data size and PIG would make the analysis easier (IMHO). http://www.cloudera.com/hadoop/ http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/ http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/ I'll try to answer your questions, others please jump in if I'm wrong. 1. Data in a keyspace will be distributed to all nodes in the cassandra cluster. AFAIK the Job Tracker should only send one task to each task tracker, and normally you would have a task tracker running on each cassandra node. The task tracker can then throttle how may concurrent tasks can run. So you would not have 1,000 tasks sent to each of the 1,000 cassandra nodes. When the task runs on the cassandra node it will iterate through all of the rows in the specified ColumnFamily with keys in the Token range the Node is responsible for. If cassandra is using the RandomPartitioner, data will be spear around the cluster. So, for example, a Map-Reduce job that only wants to read the last weeks data may have to read from every node. Obviously this depends on how the data is broken up between rows / columns. 2. Some of the other people from riptano.com or rackspace may be able to help with Cassandra's outer limits. There is a 400 node cluster planned http://www.riptano.com/blog/riptano-and-digital-reasoning-form-partnership Hope that helps. Aaron
Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data
I may be wrong about which nodes the task is sent to. Others here know more about hadoop integration. Aaron On 22 Oct 2010, at 21:30, Takayuki Tsunakawa wrote: > Hello, Aaron, > > Thank you for much info (especially pointers that seem interesting). > > > So you would not have 1,000 tasks sent to each of the 1,000 cassandra nodes. > > Yes, I meant one map task would be sent to each task tracker, resulting in > 1,000 concurrent map tasks in the cluster. ColumnFamilyInputFormat cannot > identify the nodes that actually hold some data, so the job tracker will send > the map tasks to all of the 1,000 nodes. This is wasteful and time-consuming > if only 200 nodes hold some data for a keyspace. > > > When the task runs on the cassandra node it will iterate through all of the > > rows in the specified ColumnFamily with keys in the Token range the Node is > > responsible for. > > I hope the ColumnFamilyInputFormat will allow us to set KeyRange to select > rows passed to map. > > I'll read the web pages you gave me. Thank you. > All, any other advice and comment is appreciated. > > Regards, > Takayuki Tsunakawa > > - Original Message - > From: aaron morton > To: user@cassandra.apache.org > Sent: Friday, October 22, 2010 4:05 PM > Subject: Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes > of data > > > For plain old log analysis the Cloudera Hadoop distribution may be a better > match. Flume is designed to help with streaming data into HDFS, the LZo > compression extensions would help with the data size and PIG would make the > analysis easier (IMHO). > http://www.cloudera.com/hadoop/ > http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/ > http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/ > > > I'll try to answer your questions, others please jump in if I'm wrong. > > > 1. Data in a keyspace will be distributed to all nodes in the cassandra > cluster. AFAIK the Job Tracker should only send one task to each task > tracker, and normally you would have a task tracker running on each cassandra > node. The task tracker can then throttle how may concurrent tasks can run. So > you would not have 1,000 tasks sent to each of the 1,000 cassandra nodes. > > > When the task runs on the cassandra node it will iterate through all of the > rows in the specified ColumnFamily with keys in the Token range the Node is > responsible for. If cassandra is using the RandomPartitioner, data will be > spear around the cluster. So, for example, a Map-Reduce job that only wants > to read the last weeks data may have to read from every node. Obviously this > depends on how the data is broken up between rows / columns. > > > > > 2. Some of the other people from riptano.com or rackspace may be able to help > with Cassandra's outer limits. There is a 400 node cluster planned > http://www.riptano.com/blog/riptano-and-digital-reasoning-form-partnership > > > Hope that helps. > Aaron
NPE in cassandra0.7 (from trunk) while bootstrap
I tried playing with cassandra 0.7 (I built it from trunk) and it looks better than the 0.6 branch, but when I try to add a new node with auto_bootstrap: true I get an NPE (192.168.0.37 is the initial node with data on it, 192.168.0.220 is the bootstrapped node): DEBUG 14:00:58,931 Checking to see if compaction of Schema would be useful DEBUG 14:00:58,948 Checking to see if compaction of IndexInfo would be useful INFO 14:00:58,929 Upgrading to 0.7. Purging hints if there are any. Old hints will be snapshotted. INFO 14:00:58,954 Cassandra version: 0.7.0-beta2-SNAPSHOT INFO 14:00:58,954 Thrift API version: 19.2.0 INFO 14:00:58,961 Loading persisted ring state INFO 14:00:58,962 Starting up server gossip INFO 14:00:58,968 switching in a fresh Memtable for LocationInfo at CommitLogContext(file='/data/cassandra/0.7/commitlog/CommitLog-1287741658826.log', position=700) INFO 14:00:58,969 Enqueuing flush of memtable-locationi...@14222419(227 bytes, 4 operations) INFO 14:00:58,970 Writing memtable-locationi...@14222419(227 bytes, 4 operations) INFO 14:00:59,089 Completed flushing /data/cassandra/0.7/data/system/LocationInfo-e-1-Data.db DEBUG 14:00:59,093 Checking to see if compaction of LocationInfo would be useful DEBUG 14:00:59,094 discard completed log segments for CommitLogContext(file='/data/cassandra/0.7/commitlog/CommitLog-1287741658826.log', position=700), column family 0. DEBUG 14:00:59,095 Marking replay position 700 on commit log CommitLogSegment(/data/cassandra/0.7/commitlog/CommitLog-1287741658826.log) DEBUG 14:00:59,116 attempting to connect to /192.168.0.37 ERROR 14:00:59,118 Exception encountered during startup. java.lang.NullPointerException at org.apache.cassandra.db.SystemTable.isBootstrapped(SystemTable.java:308) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:437) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:159) at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:215) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) Exception encountered during startup. java.lang.NullPointerException at org.apache.cassandra.db.SystemTable.isBootstrapped(SystemTable.java:308) at org.apache.cassandra.service.StorageService.initServer(StorageService.java:437) at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:159) at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:215) at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) Is it a bug or am I doing something wrong? PS: here is my cassandra yaml # Cassandra storage config YAML cluster_name: 'Test Cluster' initial_token: auto_bootstrap: true hinted_handoff_enabled: true authenticator: org.apache.cassandra.auth.AllowAllAuthenticator authority: org.apache.cassandra.auth.AllowAllAuthority partitioner: org.apache.cassandra.dht.RandomPartitioner # directories where Cassandra should store data on disk. 
data_file_directories: - /data/cassandra/0.7/data # commit log commitlog_directory: /data/cassandra/0.7/commitlog # saved caches saved_caches_directory: /data/cassandra/0.7/saved_caches # Size to allow commitlog to grow to before creating a new segment commitlog_rotation_threshold_in_mb: 128 commitlog_sync: periodic commitlog_sync_period_in_ms: 1 seeds: - 192.168.0.37 disk_access_mode: auto concurrent_reads: 8 concurrent_writes: 32 memtable_flush_writers: 1 # TCP port, for commands and data storage_port: 7000 listen_address: 192.168.0.220 rpc_address: 192.168.0.220 rpc_port: 9160 # enable or disable keepalive on rpc connections rpc_keepalive: true binary_memtable_throughput_in_mb: 256 # Add column indexes to a row after its contents reach this size. # Increase if your column values are large, or if you have a very large # number of columns. The competing causes are, Cassandra has to # deserialize this much of the row to read a single column, so you want # it to be small - at least if you do many partial-row reads - but all # the index data is read for each access, so you don't want to generate # that wastefully either. column_index_size_in_kb: 64 # Size limit for rows being compacted in memory. Larger rows will spill # over to disk and use a slower two-pass compaction process. A message # will be logged specifying the row key. in_memory_compaction_limit_in_mb: 64 # Time to wait for a reply from other nodes before failing the command rpc_timeout_in_ms: 1 # phi value that must be reached for a host to be marked down. # most users should never need to adjust this. # phi_convict_threshold: 8 # endpoint_snitch -- Set this to a class that implements # I
KeyRange over Long keys
Ever since I started implementing my second level caches I've been wondering how to deal with this, and thus far I've not found a good solution. I have a CF acting as a secondary index, and I want to make range queries against it. Since my keys are Long I simply went ahead and wrote them as they were, which resulted in them being stored as UTF8 Strings. Now I'm having the problem that if I want to make a range query on those keys (let's say 1-100) they will be matched as strings against each other, meaning that 55 > 100, which is not what I want. Is there a simple way to make such queries by just adjusting the key? Specifically I'm wondering if I could create a byte representation of the Long that would also be lexicographically ordered. Anyone had a similar problem? Regards, Chris
Re: KeyRange over Long keys
Prepend zeros to every number out to a fixed length determined by the maximum possible value. As an example, 0055 < 0100 in a lexical ordering where the maximum value is 9999. On Fri, Oct 22, 2010 at 5:05 AM, Christian Decker < decker.christ...@gmail.com> wrote: > Ever since I started implementing my second level caches I've been > wondering on how to deal with this, and thus far I've not found a good > solution. > > I have a CF acting as a secondary index, and I want to make range queries > against it. Since my keys are Long I simply went ahead and wrote them as > they were, which resulted them in being stored as UTF8 Strings. Now I'm > having the problem that if I want to make a range query on those keys (lets > say 1-100) they will be matched as string against each other, meaning that > 55 > 100, which is not what I want. > > Is there a simple way to make such queries by just adjusting the key? > Specifically I'm wondering if I could create a byte representation of the > Long that would also be lexicographically ordered. > > Anyone had a similar problem? > > Regards, > Chris >
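A small sketch of the padding idea, assuming non-negative long keys; 19 digits is enough to cover Long.MAX_VALUE, and negative values would still sort incorrectly under this scheme:

    public class PaddedKeys {
        // Pad to a fixed width so lexical (string) order matches numeric order.
        static String pad(long value) {
            return String.format("%019d", value); // assumes value >= 0
        }

        public static void main(String[] args) {
            System.out.println(pad(55));                     // 0000000000000000055
            System.out.println(pad(100));                    // 0000000000000000100
            System.out.println(pad(55).compareTo(pad(100))); // negative: "...055" < "...100"
        }
    }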
Re: Reading a keyrange when using RP
That gets you keys whose MD5s are between the MD5s of start and end, which is not the same as the keys between start and end. On Fri, Oct 22, 2010 at 2:07 AM, Oleg Anastasyev wrote: >> >> The goal is actually getting the rows in the range of "start","end"The order > is not important at all.But what I can see is, this does not seem to be > possible > at all using RP. Am I wrong? > > Simpler solution is just compare MD5 of both keys and set start to one with > lesser md5 and end to key with greater MD5. RandomPartitioner orders keys by > their md5, not by value. > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data
On Fri, Oct 22, 2010 at 3:30 AM, Takayuki Tsunakawa wrote: > Yes, I meant one map task would be sent to each task tracker, resulting in > 1,000 concurrent map tasks in the cluster. ColumnFamilyInputFormat cannot > identify the nodes that actually hold some data, so the job tracker will > send the map tasks to all of the 1,000 nodes. This is wasteful and > time-consuming if only 200 nodes hold some data for a keyspace. (a) Normally all data from each keyspace is spread around each node in the cluster. This is what you want for best parallelism. (b) Cassandra generates input splits from the sampling of keys each node has in memory. So if a node does end up with no data for a keyspace (because of bad OPP balancing for instance) it will have no splits generated or mapped. -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Re: KeyRange over Long keys
> Specifically I'm wondering if I could create a byte representation of the Long > that would also be lexicographically ordered. This is probably what you want to do, combined with the ByteOrderedPartitioner in 0.7 -Original Message- From: "Eric Czech" Sent: Friday, October 22, 2010 7:05am To: user@cassandra.apache.org Subject: Re: KeyRange over Long keys Prepend zeros to every number out to a fixed length determined by the maximum possible value. As an example, 0055 < 0100 in a lexical ordering where the maximum value is . On Fri, Oct 22, 2010 at 5:05 AM, Christian Decker < decker.christ...@gmail.com> wrote: > Ever since I started implementing my second level caches I've been > wondering on how to deal with this, and thus far I've not found a good > solution. > > I have a CF acting as a secondary index, and I want to make range queries > against it. Since my keys are Long I simply went ahead and wrote them as > they were, which resulted them in being stored as UTF8 Strings. Now I'm > having the problem that if I want to make a range query on those keys (lets > say 1-100) they will be matched as string against each other, meaning that > 55 > 100, which is not what I want. > > Is there a simple way to make such queries by just adjusting the key? > Specifically I'm wondering if I could create a byte representation of the > Long that would also be lexicographically ordered. > > Anyone had a similar problem? > > Regards, > Chris >
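If you take the byte-representation route instead of padding, a big-endian encoding with the sign bit flipped keeps numeric order under an unsigned byte-by-byte comparison, which is my understanding of how ByteOrderedPartitioner and a bytes-type comparator order values (worth verifying against your version). A sketch:

    import java.util.Arrays;

    public class OrderedLongBytes {
        // Big-endian bytes of (v XOR sign bit): unsigned lexical order equals
        // numeric order, including negative values.
        static byte[] encode(long v) {
            long flipped = v ^ Long.MIN_VALUE;
            byte[] out = new byte[8];
            for (int i = 7; i >= 0; i--) {
                out[i] = (byte) (flipped & 0xff);
                flipped >>>= 8;
            }
            return out;
        }

        // Unsigned byte-wise comparison, the ordering assumed above.
        static int compareUnsigned(byte[] a, byte[] b) {
            for (int i = 0; i < 8; i++) {
                int d = (a[i] & 0xff) - (b[i] & 0xff);
                if (d != 0) return d;
            }
            return 0;
        }

        public static void main(String[] args) {
            System.out.println(compareUnsigned(encode(55L), encode(100L)) < 0); // true
            System.out.println(compareUnsigned(encode(-1L), encode(1L)) < 0);   // true
            System.out.println(Arrays.toString(encode(100L)));
        }
    }

The sign-bit flip only matters if keys can be negative; for non-negative longs plain big-endian bytes already sort correctly.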
Benchmarking & Testing
I'm coming to the portion of the Cassandra installation where the customer is looking for benchmarking and testing for purposes of "keeping an eye" on the system to see if we need to add capacity or just to see how the system in general is doing. Basically, warm fuzzies that the system is still performing properly and quickly. My question is: what are the points in the system that you guys test? What are the metrics for the test-points? Any flags that you guys use to see if more capacity / nodes are needed? Thanks in advance. Trying to figure this out and figured I'd ask the community with more experience than I have. David Sent from my iPhone
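One common way to get those warm fuzzies is to poll the same JMX counters that nodetool tpstats and cfstats read (pending tasks in the thread-pool stages, recent read and write latency, compaction backlog, plus plain disk usage) and watch for sustained upward trends. The sketch below shows the polling skeleton; the JMX connection code is standard, but the MBean object name, the attribute name and the default port 8080 are assumptions from memory of 0.6-era builds, so confirm the exact names with jconsole against your own nodes.

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class PendingTasksProbe {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "localhost";
            // 8080 was the usual JMX port in 0.6-era packages; adjust to your config.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":8080/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                // Assumed MBean name for the read stage; the same pattern covers the
                // other stages that nodetool tpstats prints.
                ObjectName readStage =
                        new ObjectName("org.apache.cassandra.concurrent:type=ROW-READ-STAGE");
                Object pending = mbs.getAttribute(readStage, "PendingTasks");
                System.out.println(host + " ROW-READ-STAGE PendingTasks=" + pending);
            } finally {
                jmxc.close(); // see the JMX thread-leak thread below for why this matters
            }
        }
    }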
Re: NPE in cassandra0.7 (from trunk) while bootstrap
This was a regression from the Thrift 0.5 upgrade. Should be fixed in r1026415 On Fri, Oct 22, 2010 at 5:11 AM, ruslan usifov wrote: > I try play with cassandra 0.7 (i build it from trunk) and its looks better > then 0.6 brunch, but when i try to add new node with auto_bootstrap: true i > got NPE (192.168.0.37 initial node with data on it, 192.168.0.220 > bootstraped node): > > DEBUG 14:00:58,931 Checking to see if compaction of Schema would be useful > DEBUG 14:00:58,948 Checking to see if compaction of IndexInfo would be > useful > INFO 14:00:58,929 Upgrading to 0.7. Purging hints if there are any. Old > hints will be snapshotted. > INFO 14:00:58,954 Cassandra version: 0.7.0-beta2-SNAPSHOT > INFO 14:00:58,954 Thrift API version: 19.2.0 > INFO 14:00:58,961 Loading persisted ring state > INFO 14:00:58,962 Starting up server gossip > INFO 14:00:58,968 switching in a fresh Memtable for LocationInfo at > CommitLogContext(file='/data/cassandra/0.7/commitlog/CommitLog-1 > 287741658826.log', position=700) > INFO 14:00:58,969 Enqueuing flush of memtable-locationi...@14222419(227 > bytes, 4 operations) > INFO 14:00:58,970 Writing memtable-locationi...@14222419(227 bytes, 4 > operations) > INFO 14:00:59,089 Completed flushing > /data/cassandra/0.7/data/system/LocationInfo-e-1-Data.db > DEBUG 14:00:59,093 Checking to see if compaction of LocationInfo would be > useful > DEBUG 14:00:59,094 discard completed log segments for > CommitLogContext(file='/data/cassandra/0.7/commitlog/CommitLog-1287741658826.lo > g', position=700), column family 0. > DEBUG 14:00:59,095 Marking replay position 700 on commit log > CommitLogSegment(/data/cassandra/0.7/commitlog/CommitLog-1287741658826.l > og) > DEBUG 14:00:59,116 attempting to connect to /192.168.0.37 > ERROR 14:00:59,118 Exception encountered during startup. > java.lang.NullPointerException > at > org.apache.cassandra.db.SystemTable.isBootstrapped(SystemTable.java:308) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:437) > at > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:159) > at > org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) > at > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:215) > at > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) > Exception encountered during startup. > java.lang.NullPointerException > at > org.apache.cassandra.db.SystemTable.isBootstrapped(SystemTable.java:308) > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:437) > at > org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:159) > at > org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55) > at > org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:215) > at > org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134) > > > > Is it bug or i do something wrong? > > > > > > > PS: here is my cassandra yaml > > # Cassandra storage config YAML > > cluster_name: 'Test Cluster' > > initial_token: > > auto_bootstrap: true > > hinted_handoff_enabled: true > > authenticator: org.apache.cassandra.auth.AllowAllAuthenticator > > authority: org.apache.cassandra.auth.AllowAllAuthority > > partitioner: org.apache.cassandra.dht.RandomPartitioner > > # directories where Cassandra should store data on disk. 
> data_file_directories: > - /data/cassandra/0.7/data > > # commit log > commitlog_directory: /data/cassandra/0.7/commitlog > > # saved caches > saved_caches_directory: /data/cassandra/0.7/saved_caches > > # Size to allow commitlog to grow to before creating a new segment > commitlog_rotation_threshold_in_mb: 128 > > commitlog_sync: periodic > > commitlog_sync_period_in_ms: 1 > > seeds: > - 192.168.0.37 > > disk_access_mode: auto > > concurrent_reads: 8 > concurrent_writes: 32 > > memtable_flush_writers: 1 > > # TCP port, for commands and data > storage_port: 7000 > listen_address: 192.168.0.220 > > rpc_address: 192.168.0.220 > rpc_port: 9160 > > # enable or disable keepalive on rpc connections > rpc_keepalive: true > > binary_memtable_throughput_in_mb: 256 > > # Add column indexes to a row after its contents reach this size. > # Increase if your column values are large, or if you have a very large > # number of columns. The competing causes are, Cassandra has to > # deserialize this much of the row to read a single column, so you want > # it to be small - at least if you do many partial-row reads - but all > # the index data is read for each access, so you don't want to generate > # that wastefully either. > column_index_size_in_kb: 64 > > # Size limit for rows being compacted in memory. Larger rows will spill > # over to disk and use a slower two-pass compaction process. A mess
Re: error: identifier ONE is unqualified!
Thanks very much, that did the trick :) On Thu, Oct 21, 2010 at 9:28 PM, Aaron Morton wrote: > Look for lib/thrift-rX.jar in the source. X is the svn revision to > use. > > http://wiki.apache.org/cassandra/InstallThrift > > Not sure if all those steps still apply, but it's what I did last time I > felt like feeling some angst. > > Aaron > > > On 22 Oct 2010, at 08:57 AM, J T wrote: > > What is the latest version of Thrift that cassandra-trunk is supposed to > work with? > > I know Thrift 0.2.0 works, I'm using that on an existing cassandra 0.7 > trunk install. > > I recently tried setting up another cassandra node and just got the latest > version of Thrift, which is now at 0.6.0 but after getting thrift to build, > which was as much of a pain as I remember it being from my previous install, > I am unable to generate the thrift cassandra bindings - in my case I'm > interested in the erlang bindings, but I get the same problem if I try > producing the python or java bindings. > > The error that occurs is below: > > gen-thrift-py: > [echo] Generating Thrift Python code from > /opt/cassandra-trunk-0.7.0/interface/cassandra.thrift > [exec] > [exec] > [FAILURE:/opt/cassandra-trunk-0.7.0/interface/cassandra.thrift:376] error: > identifier ONE is unqualified! > [exec] Result: 1 > > Sure, I could go through each version of thrift backwards, to 0.2.0 but > given how much hassle I have building it each time I thought it worth asking > you guys what the latest version you use is? > > Jason > >
Re: Cassandra crashed - possible JMX threads leak
Not with the nodeprobe or nodetool command because the JVM these two commands spawn has a very short life span. I am using a webapp to monitor my cassandra cluster. It pretty much uses the same code as the NodeCmd class. For each incoming request, it creates a NodeProbe object and uses it to get various status of the cluster. I can reproduce the Cassandra JVM crash by issuing requests to this webapp in a bash while loop. I took a deeper look and here is what I discovered: In the webapp when NodeProbe creates a JMXConnector to connect to the Cassandra JMX port, a thread (com.sun.jmx.remote.internal.ClientCommunicatorAdmin$Checker) is created and run in the webapp's JVM. Meanwhile in the Cassandra JVM there is a com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout thread to time out remote JMX connections. However, since NodeProbe does not call JMXConnector.close(), the JMX client checker threads remain in the webapp's JVM even after the NodeProbe object has been garbage collected. So this JMX connection is still considered open and that keeps the JMX timeout thread running inside the Cassandra JVM. The number of JMX client checker threads in my webapp's JVM matches up with the number of JMX server timeout threads in my Cassandra's JVM. If I stop my webapp's JVM, all the JMX server timeout threads in my Cassandra's JVM disappear after 2 minutes, the default timeout for a JMX connection. This is why the problem cannot be reproduced by nodeprobe or nodetool. Even though JMXConnector.close() is not called, the JVM exits shortly so the JMX client checker threads do not stay around. So their corresponding JMX server timeout threads go away after two minutes. This is not the case with my webapp since its JVM keeps running, so all the JMX client checker threads keep running as well. The threads keep piling up until they crash Cassandra's JVM. In my case I think I can change my webapp to use a static NodeProbe instead of creating a new one for every request. That should get around the leak. However, I have seen the leak occur in another situation. On more than one occasion when I restarted one node in a live multi-node cluster, I saw that the JMX server timeout threads quickly piled up (numbering in the thousands) in Cassandra's JVM. It only happened on a live cluster that is servicing read and write requests. I am guessing the hinted handoff might have something to do with it. I am still trying to understand what is happening there. Bill On Wed, Oct 20, 2010 at 5:16 PM, Jonathan Ellis wrote: > can you reproduce this by, say, running nodeprobe ring in a bash while > loop? > > On Wed, Oct 20, 2010 at 3:09 PM, Bill Au wrote: > > One of my Cassandra server crashed with the following: > > > > ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419 > > CassandraDaemon.java (line 82) Uncaught exception in thread > > Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main] > > java.lang.OutOfMemoryError: unable to create new native thread > > at java.lang.Thread.start0(Native Method) > > at java.lang.Thread.start(Thread.java:597) > > at > > > org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533) > > > > > > I took threads dump in the JVM on all the other Cassandra severs in my > > cluster. 
They all have thousand of threads looking like this: > > > > "JMX server connection timeout 183373" daemon prio=10 > tid=0x2aad230db800 > > nid=0x5cf6 in Object.wait() [0x2aad7a316000] > >java.lang.Thread.State: TIMED_WAITING (on object monitor) > > at java.lang.Object.wait(Native Method) > > at > > > com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150) > > - locked <0x2aab056ccee0> (a [I) > > at java.lang.Thread.run(Thread.java:619) > > > > It seems to me that there is a JMX threads leak in Cassandra. NodeProbe > > creates a JMXConnector but never calls its close() method. I tried > setting > > jmx.remote.x.server.connection.timeout to 0 hoping that would disable the > > JMX server connection timeout threads. But that did not make any > > difference. > > > > Has anyone else seen this? > > > > Bill > > > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of Riptano, the source for professional Cassandra support > http://riptano.com >
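For anyone hitting the same leak from their own monitoring code, the two workarounds described above (close the connector when done, or create it once and reuse it) look roughly like the sketch below. It uses the plain javax.management.remote API directly, since, as noted in the thread, the 0.6-era NodeProbe creates its JMXConnector internally and never closes it; the class and the port here are illustrative, not Cassandra API.

    import javax.management.MBeanServerConnection;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Holds one JMX connection per monitored node for the life of the webapp,
    // instead of opening a fresh (and never-closed) connection per request.
    public class SharedJmxConnection {
        private final JMXConnector connector;

        public SharedJmxConnection(String host, int port) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
            this.connector = JMXConnectorFactory.connect(url, null);
        }

        public MBeanServerConnection mbeanServer() throws Exception {
            return connector.getMBeanServerConnection();
        }

        // Closing tears down the client-side checker thread and, within the
        // connection timeout, the matching server-side timeout thread in the
        // Cassandra JVM.
        public void close() throws Exception {
            connector.close();
        }
    }

Creating a fresh, never-closed connection per request is exactly what piles up the server-side timeout threads described above.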
How can build Bond graph?
Hello Does anybody have a recipe for how to effectively hold a Bond graph in cassandra? For example, relations between users in social networks (friendship). The simplest thing that comes to mind is the following keyspace But this has a downside: if one user has many, many friends, all relations for this one user will be held on one node. What kind of data design should I use to avoid this problem? Thanks
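For what it's worth, the usual way to model this is one row per user in a Friends column family, with each friend's user id as a column name and the column value left empty or used for metadata such as the friendship timestamp; the trade-off is exactly the one raised above, namely that a user's whole friend list lives on the replicas of that single row. A hedged sketch against the 0.6 Thrift API (the keyspace and column family names are made up, and the insert signature is from memory, so check it against your generated client):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    public class FriendEdges {
        // Record "userId is friends with friendId" as a column in userId's row.
        // For a mutual friendship, do the symmetric insert as well.
        static void addFriend(Cassandra.Client client, String userId, String friendId)
                throws Exception {
            ColumnPath path = new ColumnPath("Friends");   // assumed column family
            path.setColumn(friendId.getBytes("UTF-8"));     // friend id as column name
            long timestampMicros = System.currentTimeMillis() * 1000;
            client.insert("Social",                          // assumed keyspace
                          userId,
                          path,
                          new byte[0],                       // empty value, or store metadata
                          timestampMicros,
                          ConsistencyLevel.QUORUM);
        }
    }

Reading a friend list back is then a get_slice over that row, and if a single row ever does become too hot it can be split into buckets by appending a shard number to the key.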
Re: How can build Bond graph?
Unless one user has several hundred million friends, this shouldn't be a problem. - Tyler On Fri, Oct 22, 2010 at 3:00 PM, ruslan usifov wrote: > Hello > > Does anybody have receipt how possible effectively hold Bond graph in > cassandra. For example relations between users in social > networks(friendship). > Simplest that comes to mind is follow keyspace > > > > > > > But this have a minus, if one user have many many friends, and all > relations for this one user will by hold on one node. What kind of data > design should i use to avoid problem? > > Thanks > >
Re: Cassandra crashed - possible JMX threads leak
Is the fix as simple as calling close() then? Can you submit a patch for that? On Fri, Oct 22, 2010 at 2:49 PM, Bill Au wrote: > Not with the nodeprobe or nodetool command because the JVM these two > commands spawn has a very short life span. > > I am using a webapp to monitor my cassandra cluster. It pretty much uses > the same code as NodeCmd class. For each incoming request, it creates an > NodeProbe object and use it to get get various status of the cluster. I can > reproduce the Cassandra JVM crash by issuing requests to this webapp in a > bash while loop. I took a deeper look and here is what I discovered: > > In the webapp when NodeProbe creates a JMXConnector to connect to the > Cassandra JMX port, a thread > (com.sun.jmx.remote.internal.ClientCommunicatorAdmin$Checker) is created and > run in the webapp's JVM. Meanwhile in the Cassamdra JVM there is a > com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout thread to > timeout remote JMX connection. However, since NodeProbe does not call > JMXConnector.close(), the JMX client checker threads remains in the webapp's > JVM even after the NobeProbe object has been garbage collected. So this JMX > connection is still considered open and that keeps the JMX timeout thread > running inside the Cassandra JVM. The number of JMX client checker threads > in my webapp's JVM matches up with the number of JMX server timeout threads > in my Cassandra's JVM. If I stop my webapp's JVM, > all the JMX server timeout threads in my Cassandra's JVM all disappear after > 2 minutes, the default timeout for a JMX connection. This is why the > problem cannot be reproduced by nodeprobe or nodetool. Even though > JMXConnector.close() is not called, the JVM exits shortly so the JMX client > checker thread do not stay around. So their corresponding JMX server > timeout thread goes away after two minutes. This is not the case with my > weabpp since its JVM keeps running, so all the JMX client checker threads > keep running as well. The threads keep piling up until it crashes > Casssandra's JVM. > > In my case I think I can change my webapp to use a static NodeProbe instead > of creating a new one for every request. That should get around the leak. > > However, I have seen the leak occurs in another situation. On more than one > occasions when I restarted one node in a live multi-node clusters, I see > that the JMX server timeout threads quickly piled up (number in the > thousands) in Cassandra's JVM. It only happened on a live cluster that is > servicing read and write requests. I am guessing the hinted hand off might > have something to do with it. I am still trying to understand what is > happening there. > > Bill > > > On Wed, Oct 20, 2010 at 5:16 PM, Jonathan Ellis wrote: >> >> can you reproduce this by, say, running nodeprobe ring in a bash while >> loop? >> >> On Wed, Oct 20, 2010 at 3:09 PM, Bill Au wrote: >> > One of my Cassandra server crashed with the following: >> > >> > ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419 >> > CassandraDaemon.java (line 82) Uncaught exception in thread >> > Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main] >> > java.lang.OutOfMemoryError: unable to create new native thread >> > at java.lang.Thread.start0(Native Method) >> > at java.lang.Thread.start(Thread.java:597) >> > at >> > >> > org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533) >> > >> > >> > I took threads dump in the JVM on all the other Cassandra severs in my >> > cluster. 
They all have thousand of threads looking like this: >> > >> > "JMX server connection timeout 183373" daemon prio=10 >> > tid=0x2aad230db800 >> > nid=0x5cf6 in Object.wait() [0x2aad7a316000] >> > java.lang.Thread.State: TIMED_WAITING (on object monitor) >> > at java.lang.Object.wait(Native Method) >> > at >> > >> > com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150) >> > - locked <0x2aab056ccee0> (a [I) >> > at java.lang.Thread.run(Thread.java:619) >> > >> > It seems to me that there is a JMX threads leak in Cassandra. NodeProbe >> > creates a JMXConnector but never calls its close() method. I tried >> > setting >> > jmx.remote.x.server.connection.timeout to 0 hoping that would disable >> > the >> > JMX server connection timeout threads. But that did not make any >> > difference. >> > >> > Has anyone else seen this? >> > >> > Bill >> > >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of Riptano, the source for professional Cassandra support >> http://riptano.com > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
DC Cassandra training and Atlanta meetup
Riptano is bringing some Cassandra love to the East coast the first week of November. First, on the evening of Nov 3, we're sponsoring a meetup in Atlanta. This is held at the ApacheCon venue but you do _not_ have to be going to ApacheCon to come; it is free to attend! I will be there, along with several other committers and contributors. Register at http://www.eventbrite.com/event/981873811/. Second, on Nov 5, we are holding an all-day intensive Cassandra training in Washington, DC. This will be our first training class covering 0.7: http://www.eventbrite.com/event/900402127 See you there! -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com
Hung Repair
I am currently running a 4-node cluster on Cassandra beta 2. Yesterday, I ran into a number of problems and one of my nodes went down for a few hours. I tried to run a nodetool repair and, at least at a data level, everything seems to be consistent and alright. The problem is that the node is still chewing up 100% of its available CPU, 20 hours after I started the repair. Load averages are 8-9, which is crazy given it is a single-core ec2 m1.small. Besides sitting at 100% cpu, the node on which I ran the repair seems to be fine. The Cassandra logs appear normal. Based on bandwidth patterns between nodes, it does not seem like they are transferring any repair-related data (as they did initially). No pending tasks are being shown in any of the services when inspecting via jmx. I have a reasonable amount of data in the cluster (~6 gb * 2 replication factor) but nothing crazy. The last repair-related entry in the logs is as follows: INFO [Thread-145] 2010-10-22 00:24:10,561 AntiEntropyService.java (line 828) # completed successfully: 14 outstanding. Any idea what is going on? Could the CPU usage STILL be related to the repair? Is there any way to check? I hesitate to simply kill the node given the "14 outstanding" log message and because doing so has caused me problems in the past when using beta versions. Dan Hendry
HintedHandoff and ReplicationFactor with a downed node
Hi, I'm testing Cassandra to ensure it fits my needs. One of the tests I want to perform is writing while a node is down. Here's the scenario: Cassandra 0.6.6 2 nodes replication factor of 2 hinted handoff on I load node A with 50,000 rows while B is shutdown (BTW, I'm using CL.ONE during the inserts, which, according to the HintedHandoff wiki shouldn't be working in this case?). All columns are successfully created. I then start node B and wait a bit. I start doing a get (with CL.ONE) for every key I created in node A. They seem to be trickling in to node B and eventually (after about an hour?) they all get there. Is this expected? Is there any way to tune that? I'm mostly concerned with the amount of time it's taking to fully replicate. Even better, I'd love a way to not allow B to be available until replication is complete, can I detect that somehow? I appreciate any help or suggestions! Cheers, Craig
Re: HintedHandoff and ReplicationFactor with a downed node
On 10/22/10 2:55 PM, Craig Ching wrote: Even better, I'd love a way to not allow B to be available until replication is complete, can I detect that somehow? Proposed and rejected a while back : https://issues.apache.org/jira/browse/CASSANDRA-768 =Rob
Re: HintedHandoff and ReplicationFactor with a downed node
The last time this came up on the list Jonathan Ellis said (something along the lines of) if your application can't tolerate stale data then you should read with a consistency level of QUORUM. It would be nice if there was some sort of middle ground for an application that can tolerate slightly stale data (minutes) but not very stale data (hours or days) could still get the performance gain of consistency level of ONE. Even if a node just made a best effort in the OPs scenario it might be sufficient...? Is there an alternative solution to reading with consistency level of QUORUM? For example, if a node has been down for an extended period of time could you re-add it as a new node (fetching all its data again) and avoid having to read with QUORUM? Just curious... :) Cheers, Dan On Sat, Oct 23, 2010 at 10:01 AM, Rob Coli wrote: > > On 10/22/10 2:55 PM, Craig Ching wrote: >> >> Even better, I'd love a way to not allow B to be available >> until replication is complete, can I detect that somehow? > > Proposed and rejected a while back : > > https://issues.apache.org/jira/browse/CASSANDRA-768 > > =Rob
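For reference, the consistency level is chosen per call in the Thrift API, so an application can mix levels, for example writing at ONE for throughput while reading at QUORUM only on the paths that cannot tolerate stale data. A hedged sketch against the 0.6 interface (keyspace, column family and column names are placeholders, and the get signature is from memory):

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;

    public class ReadAtQuorum {
        static byte[] readValue(Cassandra.Client client, String key) throws Exception {
            ColumnPath path = new ColumnPath("Standard1");  // placeholder column family
            path.setColumn("value".getBytes("UTF-8"));       // placeholder column name
            // QUORUM waits for a majority of replicas, so a freshly restarted node
            // cannot by itself return stale data; ONE is faster but may.
            ColumnOrSuperColumn cosc =
                    client.get("Keyspace1", key, path, ConsistencyLevel.QUORUM);
            return cosc.getColumn().getValue();
        }
    }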
Streaming got stuck for a long time
When using nodetool move command, the streaming between nodes got stuck for a long period like the following: Streaming from: /10.100.10.66 Profile: /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Index.db 0/809960194 Profile: /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Filter.db 0/77858845 Profile: /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Data.db 0/5711978741 Profile: /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Index.db 0/1857117923 Profile: /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Filter.db 0/85398565 Profile: /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Data.db 0/22536290920 What's wrong here? Cassandra version used: 0.6.1. Thanks.
Re: Streaming got stuck for a long time
This is a known bug in early 0.6, fixed in 0.6.5 iirc. But at this point you should upgrade to 0.6.6. On Fri, Oct 22, 2010 at 8:52 PM, Henry Luo wrote: > When using nodetool move command, the streaming between nodes got stuck for > a long period like the following: > > > > Streaming from: /10.100.10.66 > > Profile: > /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Index.db > 0/809960194 > > Profile: > /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Filter.db > 0/77858845 > > Profile: > /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Data.db > 0/5711978741 > > Profile: > /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Index.db > 0/1857117923 > > Profile: > /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Filter.db > 0/85398565 > > Profile: > /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Data.db > 0/22536290920 > > > > What’s wrong here? > > > > Cassandra version used: 0.6.1. > > > > Thanks. > > > The information transmitted is intended only for the person or entity to > which it is addressed and may contain confidential, proprietary, and/or > privileged material. Any review, retransmission, dissemination or other use > of, or taking of any action in reliance upon this information by persons or > entities other than the intended recipient is prohibited. If you received > this in error, please contact the sender and delete the material from all > computers. > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com