Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-22 Thread aaron morton
For plain old log analysis the Cloudera Hadoop distribution may be a better 
match. Flume is designed to help with streaming data into HDFS, the LZO 
compression extensions would help with the data size, and Pig would make the 
analysis easier (IMHO). 
http://www.cloudera.com/hadoop/
http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/

I'll try to answer your questions, others please jump in if I'm wrong.

1. Data in a keyspace will be distributed to all nodes in the cassandra 
cluster. AFAIK the Job Tracker should only send one task to each task tracker, 
and normally you would have a task tracker running on each cassandra node. The 
task tracker can then throttle how many concurrent tasks can run. So you would 
not have 1,000 tasks sent to each of the 1,000 cassandra nodes. 

When the task runs on the cassandra node it will iterate through all of the 
rows in the specified ColumnFamily with keys in the Token range the Node is 
responsible for. If cassandra is using the RandomPartitioner, data will be 
spread around the cluster. So, for example, a Map-Reduce job that only wants to 
read the last week's data may have to read from every node. Obviously this 
depends on how the data is broken up between rows / columns. 


2. Some of the other people from riptano.com or rackspace may be able to help 
with Cassandra's outer limits. There is a 400 node cluster planned 
http://www.riptano.com/blog/riptano-and-digital-reasoning-form-partnership

Hope that helps. 
Aaron

On 22 Oct 2010, at 15:45, Takayuki Tsunakawa wrote:

> Hello,
> 
> I'm evaluating whether Cassandra fits a certain customer well. The
> customer will collect petabytes of logs and analyze them. Could you
> tell me if my understanding is correct and/or give me your opinions?
> I'm sorry that the analysis requirement is not clear yet.
> 
> 1. MapReduce behavior
> I read the source code of Cassandra 0.6.x and understood that
> jobtracker submits the map tasks to all Cassandra nodes, regardless of
> whether the target keyspace's data reside there. That is, if there are
> 1,000 nodes in the Cassandra cluster, jobtracker sends more than 1,000
> map tasks to all of the 1,000 nodes in parallel. If this is correct,
> I'm afraid the startup time of a MapReduce job gets longer as more
> nodes join the Cassandra cluster.
> Is this correct?
> With HBase, jobtracker submits map tasks only to the region servers
> that hold the target data. This behavior is desirable because no
> wasteful task submission is done. Can you suggest the cases where
> Cassandra+MapReduce is better than HBase+MapReduce for log/sensor
> analysis? (Please excuse me for my not presenting the analysis
> requirement).
> 
> 2. Data capacity
> The excerpt from the paper about Amazon Dynamo says that the cluster
> can scale to hundreds of nodes, not thousands. I understand Cassandra
> is similar. Assuming that the recent commodity servers have 2 to 4 TB
> of disks, we need about 1,000 nodes or more to store petabytes of
> data.
> Is the present Cassandra suitable for petabytes of data? If not, is
> any development in progress to increase the scalability?
> 
> 
> --
> Finally, Dynamo adopts a full membership model where each node is
> aware of the data hosted by its peers. To do this, each node actively
> gossips the full routing table with other nodes in the system. This
> model works well for a system that contains couple of hundreds of
> nodes. However, scaling such a design to run with tens of thousands of
> nodes is not trivial because the overhead in maintaining the routing
> table increases with the system size. This limitation might be
> overcome by introducing hierarchical extensions to Dynamo. Also, note
> that this problem is actively addressed by O(1) DHT systems(e.g.,
> [14]).
> --
> 
> Regards,
> Takayuki Tsunakawa
> 
> 



Re: Reading a keyrange when using RP

2010-10-22 Thread Oleg Anastasyev
> 
> The goal is actually getting the rows in the range of "start", "end". The order
> is not important at all. But what I can see is, this does not seem to be possible
> at all using RP. Am I wrong?

A simpler solution is to just compare the MD5 of both keys and set start to the one with
the lesser MD5 and end to the one with the greater MD5. RandomPartitioner orders keys by
their MD5, not by value.
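For illustration, a minimal sketch of that suggestion (not code from the thread): compute the MD5 "token" of each endpoint key the way RandomPartitioner does and order the endpoints by it. Note Jonathan's follow-up later in this digest: the range you then get back is keys whose MD5s fall between these tokens, not the keys lexically between start and end.

import java.math.BigInteger;
import java.security.MessageDigest;

// Order two row keys by their MD5 token, as RandomPartitioner does,
// before using them as the start/end of a range query. Keys are hypothetical.
public class TokenOrder {
    static BigInteger token(String key) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        return new BigInteger(1, md5.digest(key.getBytes("UTF-8")));
    }

    public static void main(String[] args) throws Exception {
        String start = "apple", end = "zebra";
        if (token(start).compareTo(token(end)) > 0) {
            String tmp = start; start = end; end = tmp; // swap so start has the lesser token
        }
        System.out.println("use start=" + start + ", end=" + end);
    }
}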



Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-22 Thread Takayuki Tsunakawa
Hello, Aaron,

Thank you for all the info (especially the pointers, which seem interesting).

> So you would not have 1,000 tasks sent to each of the 1,000 cassandra
nodes.

Yes, I meant one map task would be sent to each task tracker, resulting in
1,000 concurrent map tasks in the cluster. ColumnFamilyInputFormat cannot
identify the nodes that actually hold some data, so the job tracker will
send the map tasks to all of the 1,000 nodes. This is wasteful and
time-consuming if only 200 nodes hold some data for a keyspace.

> When the task runs on the cassandra node it will iterate through all of
the rows in the specified ColumnFamily with keys in the Token range the Node
is responsible for.

I hope the ColumnFamilyInputFormat will allow us to set KeyRange to select
rows passed to map.

I'll read the web pages you gave me. Thank you.
All, any other advice and comment is appreciated.

Regards,
Takayuki Tsunakawa

- Original Message - 
From: aaron morton
To: user@cassandra.apache.org
Sent: Friday, October 22, 2010 4:05 PM
Subject: Re: [Q] MapReduce behavior and Cassandra's scalability for
petabytes of data


For plain old log analysis the Cloudera Hadoop distribution may be a better
match. Flume is designed to help with streaming data into HDFS, the LZO
compression extensions would help with the data size, and Pig would make the
analysis easier (IMHO).
http://www.cloudera.com/hadoop/
http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/


I'll try to answer your questions, others please jump in if I'm wrong.


1. Data in a keyspace will be distributed to all nodes in the cassandra
cluster. AFAIK the Job Tracker should only send one task to each task
tracker, and normally you would have a task tracker running on each
cassandra node. The task tracker can then throttle how many concurrent tasks
can run. So you would not have 1,000 tasks sent to each of the 1,000
cassandra nodes.


When the task runs on the cassandra node it will iterate through all of the
rows in the specified ColumnFamily with keys in the Token range the Node is
responsible for. If cassandra is using the RandomPartitioner, data will be
spread around the cluster. So, for example, a Map-Reduce job that only wants
to read the last week's data may have to read from every node. Obviously this
depends on how the data is broken up between rows / columns.




2. Some of the other people from riptano.com or rackspace may be able to
help with Cassandra's outer limits. There is a 400 node cluster planned
http://www.riptano.com/blog/riptano-and-digital-reasoning-form-partnership


Hope that helps.
Aaron


Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-22 Thread Aaron Morton
I may be wrong about which nodes the task is sent to.  

Others here know more about hadoop integration.

Aaron
  

On 22 Oct 2010, at 21:30, Takayuki Tsunakawa  
wrote:

> Hello, Aaron,
>  
> Thank you for much info (especially pointers that seem interesting).
>  
> > So you would not have 1,000 tasks sent to each of the 1,000 cassandra nodes.
>  
> Yes, I meant one map task would be sent to each task tracker, resulting in 
> 1,000 concurrent map tasks in the cluster. ColumnFamilyInputFormat cannot 
> identify the nodes that actually hold some data, so the job tracker will send 
> the map tasks to all of the 1,000 nodes. This is wasteful and time-consuming 
> if only 200 nodes hold some data for a keyspace.
>  
> > When the task runs on the cassandra node it will iterate through all of the 
> > rows in the specified ColumnFamily with keys in the Token range the Node is 
> > responsible for.
>  
> I hope the ColumnFamilyInputFormat will allow us to set KeyRange to select 
> rows passed to map.
>  
> I'll read the web pages you gave me. Thank you.
> All, any other advice and comment is appreciated.
>  
> Regards,
> Takayuki Tsunakawa
>  
> - Original Message - 
> From: aaron morton 
> To: user@cassandra.apache.org 
> Sent: Friday, October 22, 2010 4:05 PM
> Subject: Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes 
> of data
>  
> 
> For plain old log analysis the Cloudera Hadoop distribution may be a better 
> match. Flume is designed to help with streaming data into HDFS, the LZO 
> compression extensions would help with the data size, and Pig would make the 
> analysis easier (IMHO). 
> http://www.cloudera.com/hadoop/
> http://www.cloudera.com/blog/2010/09/using-flume-to-collect-apache-2-web-server-logs/
> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/
>  
> 
> I'll try to answer your questions, others please jump in if I'm wrong.
>  
> 
> 1. Data in a keyspace will be distributed to all nodes in the cassandra 
> cluster. AFAIK the Job Tracker should only send one task to each task 
> tracker, and normally you would have a task tracker running on each cassandra 
> node. The task tracker can then throttle how many concurrent tasks can run. So 
> you would not have 1,000 tasks sent to each of the 1,000 cassandra nodes.
>  
> 
> When the task runs on the cassandra node it will iterate through all of the 
> rows in the specified ColumnFamily with keys in the Token range the Node is 
> responsible for. If cassandra is using the RandomPartitioner, data will be 
> spread around the cluster. So, for example, a Map-Reduce job that only wants 
> to read the last week's data may have to read from every node. Obviously this 
> depends on how the data is broken up between rows / columns.
>  
>  
>  
> 
> 2. Some of the other people from riptano.com or rackspace may be able to help 
> with Cassandra's outer limits. There is a 400 node cluster planned 
> http://www.riptano.com/blog/riptano-and-digital-reasoning-form-partnership
>  
> 
> Hope that helps. 
> Aaron


NPE in cassandra0.7 (from trunk) while bootstrap

2010-10-22 Thread ruslan usifov
I'm trying out Cassandra 0.7 (built from trunk) and it looks better
than the 0.6 branch, but when I try to add a new node with auto_bootstrap: true I
get an NPE (192.168.0.37 is the initial node with data on it, 192.168.0.220 is the
bootstrapped node):

DEBUG 14:00:58,931 Checking to see if compaction of Schema would be useful
DEBUG 14:00:58,948 Checking to see if compaction of IndexInfo would be
useful
 INFO 14:00:58,929 Upgrading to 0.7. Purging hints if there are any. Old
hints will be snapshotted.
 INFO 14:00:58,954 Cassandra version: 0.7.0-beta2-SNAPSHOT
 INFO 14:00:58,954 Thrift API version: 19.2.0
 INFO 14:00:58,961 Loading persisted ring state
 INFO 14:00:58,962 Starting up server gossip
 INFO 14:00:58,968 switching in a fresh Memtable for LocationInfo at
CommitLogContext(file='/data/cassandra/0.7/commitlog/CommitLog-1
287741658826.log', position=700)
 INFO 14:00:58,969 Enqueuing flush of memtable-locationi...@14222419(227
bytes, 4 operations)
 INFO 14:00:58,970 Writing memtable-locationi...@14222419(227 bytes, 4
operations)
 INFO 14:00:59,089 Completed flushing
/data/cassandra/0.7/data/system/LocationInfo-e-1-Data.db
DEBUG 14:00:59,093 Checking to see if compaction of LocationInfo would be
useful
DEBUG 14:00:59,094 discard completed log segments for
CommitLogContext(file='/data/cassandra/0.7/commitlog/CommitLog-1287741658826.lo
g', position=700), column family 0.
DEBUG 14:00:59,095 Marking replay position 700 on commit log
CommitLogSegment(/data/cassandra/0.7/commitlog/CommitLog-1287741658826.l
og)
DEBUG 14:00:59,116 attempting to connect to /192.168.0.37
ERROR 14:00:59,118 Exception encountered during startup.
java.lang.NullPointerException
at
org.apache.cassandra.db.SystemTable.isBootstrapped(SystemTable.java:308)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:437)
at
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:159)
at
org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
at
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:215)
at
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
Exception encountered during startup.
java.lang.NullPointerException
at
org.apache.cassandra.db.SystemTable.isBootstrapped(SystemTable.java:308)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:437)
at
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:159)
at
org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
at
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:215)
at
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)



Is this a bug or am I doing something wrong?






PS: here is my cassandra yaml

# Cassandra storage config YAML

cluster_name: 'Test Cluster'

initial_token:

auto_bootstrap: true

hinted_handoff_enabled: true

authenticator: org.apache.cassandra.auth.AllowAllAuthenticator

authority: org.apache.cassandra.auth.AllowAllAuthority

partitioner: org.apache.cassandra.dht.RandomPartitioner

# directories where Cassandra should store data on disk.
data_file_directories:
- /data/cassandra/0.7/data

# commit log
commitlog_directory: /data/cassandra/0.7/commitlog

# saved caches
saved_caches_directory: /data/cassandra/0.7/saved_caches

# Size to allow commitlog to grow to before creating a new segment
commitlog_rotation_threshold_in_mb: 128

commitlog_sync: periodic

commitlog_sync_period_in_ms: 1

seeds:
- 192.168.0.37

disk_access_mode: auto

concurrent_reads: 8
concurrent_writes: 32

memtable_flush_writers: 1

# TCP port, for commands and data
storage_port: 7000
listen_address: 192.168.0.220

rpc_address: 192.168.0.220
rpc_port: 9160

# enable or disable keepalive on rpc connections
rpc_keepalive: true

binary_memtable_throughput_in_mb: 256

# Add column indexes to a row after its contents reach this size.
# Increase if your column values are large, or if you have a very large
# number of columns.  The competing causes are, Cassandra has to
# deserialize this much of the row to read a single column, so you want
# it to be small - at least if you do many partial-row reads - but all
# the index data is read for each access, so you don't want to generate
# that wastefully either.
column_index_size_in_kb: 64

# Size limit for rows being compacted in memory.  Larger rows will spill
# over to disk and use a slower two-pass compaction process.  A message
# will be logged specifying the row key.
in_memory_compaction_limit_in_mb: 64

# Time to wait for a reply from other nodes before failing the command
rpc_timeout_in_ms: 1

# phi value that must be reached for a host to be marked down.
# most users should never need to adjust this.
# phi_convict_threshold: 8

# endpoint_snitch -- Set this to a class that implements
# I

KeyRange over Long keys

2010-10-22 Thread Christian Decker
Ever since I started implementing my second-level caches I've been wondering
how to deal with this, and thus far I've not found a good solution.

I have a CF acting as a secondary index, and I want to make range queries
against it. Since my keys are Long I simply went ahead and wrote them as
they were, which resulted them in being stored as UTF8 Strings. Now I'm
having the problem that if I want to make a range query on those keys (lets
say 1-100) they will be matched as string against each other, meaning that
55 > 100, which is not what I want.

Is there a simple way to make such queries by just adjusting the key?
Specifically I'm wondering if I could create a byte representation of the
Long that would also be lexicographically ordered.

Anyone had a similar problem?

Regards,
Chris


Re: KeyRange over Long keys

2010-10-22 Thread Eric Czech
Prepend zeros to every number out to a fixed length determined by the
maximum possible value.  As an example, 0055 < 0100 in a lexical ordering
where values are padded to four digits.
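A tiny sketch of that padding approach (illustrative only; it assumes non-negative ids and a width chosen from your real maximum):

// Left-pad every Long key to a fixed width so lexical order matches numeric order.
public class PaddedKey {
    // Width 19 covers Long.MAX_VALUE; pick the width from your real maximum.
    private static final int WIDTH = 19;

    static String toKey(long id) {
        return String.format("%0" + WIDTH + "d", id); // 55 -> "0000000000000000055"
    }

    public static void main(String[] args) {
        System.out.println(toKey(55).compareTo(toKey(100)) < 0); // true: padded 55 sorts first
    }
}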

On Fri, Oct 22, 2010 at 5:05 AM, Christian Decker <
decker.christ...@gmail.com> wrote:

> Ever since I started implementing my second-level caches I've been
> wondering how to deal with this, and thus far I've not found a good
> solution.
>
> I have a CF acting as a secondary index, and I want to make range queries
> against it. Since my keys are Long I simply went ahead and wrote them as
> they were, which resulted in them being stored as UTF8 Strings. Now I'm
> having the problem that if I want to make a range query on those keys (let's
> say 1-100) they will be matched as strings against each other, meaning that
> 55 > 100, which is not what I want.
>
> Is there a simple way to make such queries by just adjusting the key?
> Specifically I'm wondering if I could create a byte representation of the
> Long that would also be lexicographically ordered.
>
> Anyone had a similar problem?
>
> Regards,
> Chris
>


Re: Reading a keyrange when using RP

2010-10-22 Thread Jonathan Ellis
That gets you keys whose MD5s are between the MD5s of start and end,
which is not the same as the keys between start and end.

On Fri, Oct 22, 2010 at 2:07 AM, Oleg Anastasyev  wrote:
>>
>> The goal is actually getting the rows in the range of "start", "end". The order
>> is not important at all. But what I can see is, this does not seem to be
>> possible at all using RP. Am I wrong?
>
> A simpler solution is to just compare the MD5 of both keys and set start to the one with
> the lesser MD5 and end to the one with the greater MD5. RandomPartitioner orders keys by
> their MD5, not by value.
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: [Q] MapReduce behavior and Cassandra's scalability for petabytes of data

2010-10-22 Thread Jonathan Ellis
On Fri, Oct 22, 2010 at 3:30 AM, Takayuki Tsunakawa
 wrote:
> Yes, I meant one map task would be sent to each task tracker, resulting in
> 1,000 concurrent map tasks in the cluster. ColumnFamilyInputFormat cannot
> identify the nodes that actually hold some data, so the job tracker will
> send the map tasks to all of the 1,000 nodes. This is wasteful and
> time-consuming if only 200 nodes hold some data for a keyspace.

(a) Normally all data from each keyspace is spread around each node in
the cluster.  This is what you want for best parallelism.

(b) Cassandra generates input splits from the sampling of keys each
node has in memory.  So if a node does end up with no data for a
keyspace (because of bad OPP balancing, for instance) it will have no
splits generated or mapped.
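For anyone following along, a rough sketch of how a job is usually pointed at Cassandra, modeled on the contrib/word_count example; the keyspace/column family/column names are made up, and the ConfigHelper method names may differ slightly between 0.6 and 0.7:

import java.nio.ByteBuffer;
import java.util.Arrays;

import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch only: wire a MapReduce job to read a Cassandra column family.
public class LogAnalysisJob {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "log-analysis");
        job.setJarByClass(LogAnalysisJob.class);
        job.setInputFormatClass(ColumnFamilyInputFormat.class);

        // One split per sampled key range, so maps run where the data lives.
        ConfigHelper.setColumnFamily(job.getConfiguration(), "Logs", "RawEvents");

        // Columns each map task receives for every row in its split
        // (0.7-style ByteBuffer bindings; 0.6 used byte[]).
        SlicePredicate predicate = new SlicePredicate().setColumn_names(
                Arrays.asList(ByteBuffer.wrap("message".getBytes("UTF-8"))));
        ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

        // A real job would also set mapper/reducer and output key/value classes here.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}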

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: KeyRange over Long keys

2010-10-22 Thread Stu Hood
> Specifically I'm wondering if I could create a byte representation of the Long
> that would also be lexicographically ordered.
This is probably what you want to do, combined with the ByteOrderedPartitioner 
in 0.7
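A small sketch of one such encoding (an assumption on my part, not something shipped with Cassandra): 8 big-endian bytes with the sign bit flipped, so unsigned byte order matches numeric order even for negative values.

import java.nio.ByteBuffer;

// Encode a Long so unsigned lexicographic byte order equals numeric order.
public class OrderedLongKey {
    static byte[] encode(long value) {
        // Flipping the sign bit maps signed order onto unsigned byte order;
        // for purely non-negative keys the XOR can be dropped.
        return ByteBuffer.allocate(8).putLong(value ^ Long.MIN_VALUE).array();
    }

    static long decode(byte[] key) {
        return ByteBuffer.wrap(key).getLong() ^ Long.MIN_VALUE;
    }

    static int compareUnsigned(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return 0;
    }

    public static void main(String[] args) {
        // 55 now sorts before 100, unlike the "55" > "100" string comparison.
        System.out.println(compareUnsigned(encode(55L), encode(100L)) < 0); // true
    }
}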

-Original Message-
From: "Eric Czech" 
Sent: Friday, October 22, 2010 7:05am
To: user@cassandra.apache.org
Subject: Re: KeyRange over Long keys

Prepend zeros to every number out to a fixed length determined by the
maximum possible value.  As an example, 0055 < 0100 in a lexical ordering
where values are padded to four digits.

On Fri, Oct 22, 2010 at 5:05 AM, Christian Decker <
decker.christ...@gmail.com> wrote:

> Ever since I started implementing my second-level caches I've been
> wondering how to deal with this, and thus far I've not found a good
> solution.
>
> I have a CF acting as a secondary index, and I want to make range queries
> against it. Since my keys are Long I simply went ahead and wrote them as
> they were, which resulted in them being stored as UTF8 Strings. Now I'm
> having the problem that if I want to make a range query on those keys (let's
> say 1-100) they will be matched as strings against each other, meaning that
> 55 > 100, which is not what I want.
>
> Is there a simple way to make such queries by just adjusting the key?
> Specifically I'm wondering if I could create a byte representation of the
> Long that would also be lexicographically ordered.
>
> Anyone had a similar problem?
>
> Regards,
> Chris
>




Benchmarking & Testing

2010-10-22 Thread David Replogle
I'm coming to the portion of the Cassandra installation where the customer is 
looking for benchmarking and testing for purposes of "keeping an eye" on the 
system to see if we need to add capacity or just to see how the system in 
general is doing. Basically, warm fuzzies that the system is still performing 
properly and quickly.

My question is: what are the points in the system that you guys test? What are 
the metrics for the test-points? Any flags that you guys use to see if more 
capacity / nodes are needed?

Thanks in advance. Trying to figure this out and figured I'd ask the community 
with more experience than I have.

David
Sent from my iPhone

Re: NPE in cassandra0.7 (from trunk) while bootstrap

2010-10-22 Thread Jonathan Ellis
This was a regression from the Thrift 0.5 upgrade.  Should be fixed in r1026415

On Fri, Oct 22, 2010 at 5:11 AM, ruslan usifov  wrote:
> I'm trying out Cassandra 0.7 (built from trunk) and it looks better
> than the 0.6 branch, but when I try to add a new node with auto_bootstrap: true I
> get an NPE (192.168.0.37 is the initial node with data on it, 192.168.0.220 is the
> bootstrapped node):
>
> DEBUG 14:00:58,931 Checking to see if compaction of Schema would be useful
> DEBUG 14:00:58,948 Checking to see if compaction of IndexInfo would be
> useful
>  INFO 14:00:58,929 Upgrading to 0.7. Purging hints if there are any. Old
> hints will be snapshotted.
>  INFO 14:00:58,954 Cassandra version: 0.7.0-beta2-SNAPSHOT
>  INFO 14:00:58,954 Thrift API version: 19.2.0
>  INFO 14:00:58,961 Loading persisted ring state
>  INFO 14:00:58,962 Starting up server gossip
>  INFO 14:00:58,968 switching in a fresh Memtable for LocationInfo at
> CommitLogContext(file='/data/cassandra/0.7/commitlog/CommitLog-1
> 287741658826.log', position=700)
>  INFO 14:00:58,969 Enqueuing flush of memtable-locationi...@14222419(227
> bytes, 4 operations)
>  INFO 14:00:58,970 Writing memtable-locationi...@14222419(227 bytes, 4
> operations)
>  INFO 14:00:59,089 Completed flushing
> /data/cassandra/0.7/data/system/LocationInfo-e-1-Data.db
> DEBUG 14:00:59,093 Checking to see if compaction of LocationInfo would be
> useful
> DEBUG 14:00:59,094 discard completed log segments for
> CommitLogContext(file='/data/cassandra/0.7/commitlog/CommitLog-1287741658826.lo
> g', position=700), column family 0.
> DEBUG 14:00:59,095 Marking replay position 700 on commit log
> CommitLogSegment(/data/cassandra/0.7/commitlog/CommitLog-1287741658826.l
> og)
> DEBUG 14:00:59,116 attempting to connect to /192.168.0.37
> ERROR 14:00:59,118 Exception encountered during startup.
> java.lang.NullPointerException
>     at
> org.apache.cassandra.db.SystemTable.isBootstrapped(SystemTable.java:308)
>     at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:437)
>     at
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:159)
>     at
> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
>     at
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:215)
>     at
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
> Exception encountered during startup.
> java.lang.NullPointerException
>     at
> org.apache.cassandra.db.SystemTable.isBootstrapped(SystemTable.java:308)
>     at
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:437)
>     at
> org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:159)
>     at
> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:55)
>     at
> org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:215)
>     at
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:134)
>
>
>
> Is this a bug or am I doing something wrong?
>
>
>
>
>
>
> PS: here is my cassandra yaml
>
> # Cassandra storage config YAML
>
> cluster_name: 'Test Cluster'
>
> initial_token:
>
> auto_bootstrap: true
>
> hinted_handoff_enabled: true
>
> authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
>
> authority: org.apache.cassandra.auth.AllowAllAuthority
>
> partitioner: org.apache.cassandra.dht.RandomPartitioner
>
> # directories where Cassandra should store data on disk.
> data_file_directories:
>     - /data/cassandra/0.7/data
>
> # commit log
> commitlog_directory: /data/cassandra/0.7/commitlog
>
> # saved caches
> saved_caches_directory: /data/cassandra/0.7/saved_caches
>
> # Size to allow commitlog to grow to before creating a new segment
> commitlog_rotation_threshold_in_mb: 128
>
> commitlog_sync: periodic
>
> commitlog_sync_period_in_ms: 1
>
> seeds:
>     - 192.168.0.37
>
> disk_access_mode: auto
>
> concurrent_reads: 8
> concurrent_writes: 32
>
> memtable_flush_writers: 1
>
> # TCP port, for commands and data
> storage_port: 7000
> listen_address: 192.168.0.220
>
> rpc_address: 192.168.0.220
> rpc_port: 9160
>
> # enable or disable keepalive on rpc connections
> rpc_keepalive: true
>
> binary_memtable_throughput_in_mb: 256
>
> # Add column indexes to a row after its contents reach this size.
> # Increase if your column values are large, or if you have a very large
> # number of columns.  The competing causes are, Cassandra has to
> # deserialize this much of the row to read a single column, so you want
> # it to be small - at least if you do many partial-row reads - but all
> # the index data is read for each access, so you don't want to generate
> # that wastefully either.
> column_index_size_in_kb: 64
>
> # Size limit for rows being compacted in memory.  Larger rows will spill
> # over to disk and use a slower two-pass compaction process.  A mess

Re: error: identifier ONE is unqualified!

2010-10-22 Thread J T
Thanks very much, that did the trick :)

On Thu, Oct 21, 2010 at 9:28 PM, Aaron Morton wrote:

> Look for lib/thrift-rX.jar in the source; X is the svn revision to
> use.
>
> http://wiki.apache.org/cassandra/InstallThrift
>
> Not sure if all those steps still apply, but it's what I did last time I
> felt like feeling some angst.
>
> Aaron
>
>
> On 22 Oct, 2010,at 08:57 AM, J T  wrote:
>
> What is the latest version of Thrift that cassandra-trunk is supposed to
> work with?
>
> I know Thrift 0.2.0 works, I'm using that on an existing cassandra 0.7
> trunk install.
>
> I recently tried setting up another cassandra node and just got the latest
> version of Thrift, which is now at 0.6.0, but after getting Thrift to build,
> which was as much of a pain as I remember it being from my previous install,
> I am unable to generate the Thrift cassandra bindings - in my case I'm
> interested in the Erlang bindings, but I get the same problem if I try
> producing the Python or Java bindings.
>
> The error that occurs is below:
>
> gen-thrift-py:
>  [echo] Generating Thrift Python code from
> /opt/cassandra-trunk-0.7.0/interface/cassandra.thrift 
>  [exec]
>  [exec] [FAILURE:/opt/cassandra-trunk-0.7.0/interface/cassandra.thrift:376] error:
> identifier ONE is unqualified!
>  [exec] Result: 1
>
> Sure, I could go through each version of Thrift backwards to 0.2.0, but
> given how much hassle I have building it each time, I thought it worth asking
> you guys what the latest version you use is?
>
> Jason
>
>


Re: Cassandra crashed - possible JMX threads leak

2010-10-22 Thread Bill Au
Not with the nodeprobe or nodetool command because the JVM these two
commands spawn has a very short life span.

I am using a webapp to monitor my cassandra cluster.  It pretty much uses
the same code as the NodeCmd class.  For each incoming request, it creates a
NodeProbe object and uses it to get various status of the cluster.  I can
reproduce the Cassandra JVM crash by issuing requests to this webapp in a
bash while loop.  I took a deeper look and here is what I discovered:

In the webapp when NodeProbe creates a JMXConnector to connect to the
Cassandra JMX port, a thread
(com.sun.jmx.remote.internal.ClientCommunicatorAdmin$Checker) is created and
run in the webapp's JVM.  Meanwhile in the Cassandra JVM there is a
com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout thread to
time out the remote JMX connection.  However, since NodeProbe does not call
JMXConnector.close(), the JMX client checker thread remains in the webapp's
JVM even after the NodeProbe object has been garbage collected.  So this JMX
connection is still considered open and that keeps the JMX timeout thread
running inside the Cassandra JVM.  The number of JMX client checker threads
in my webapp's JVM matches up with the number of JMX server timeout threads
in my Cassandra's JVM.  If I stop my webapp's JVM,
all the JMX server timeout threads in my Cassandra's JVM disappear after
2 minutes, the default timeout for a JMX connection.  This is why the
problem cannot be reproduced by nodeprobe or nodetool.  Even though
JMXConnector.close() is not called, the JVM exits shortly so the JMX client
checker threads do not stay around.  So their corresponding JMX server
timeout threads go away after two minutes.  This is not the case with my
webapp since its JVM keeps running, so all the JMX client checker threads
keep running as well.  The threads keep piling up until it crashes
Cassandra's JVM.

In my case I think I can change my webapp to use a static NodeProbe instead
of creating a new one for every request.  That should get around the leak.
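For reference, a minimal sketch of the explicit-close alternative (host and port here are placeholders; 8080 was the default JMX port in 0.6):

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Open the JMX connection per request, but always close it. Closing the
// JMXConnector lets the client-side checker thread and the matching
// server-side "JMX server connection timeout" thread go away immediately.
public class JmxStatusCheck {
    public static void main(String[] args) throws Exception {
        String host = "localhost"; // placeholder
        int port = 8080;           // Cassandra 0.6 default JMX port
        JMXServiceURL url = new JMXServiceURL(
                String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
        JMXConnector jmxc = JMXConnectorFactory.connect(url, null);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            System.out.println("MBeans registered: " + mbs.getMBeanCount());
        } finally {
            jmxc.close(); // the call NodeProbe never makes
        }
    }
}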

However, I have seen the leak occur in another situation.  On more than one
occasion when I restarted one node in a live multi-node cluster, I saw
that the JMX server timeout threads quickly piled up (numbering in the
thousands) in Cassandra's JVM.  It only happened on a live cluster that is
servicing read and write requests.  I am guessing the hinted handoff might
have something to do with it.  I am still trying to understand what is
happening there.

Bill


On Wed, Oct 20, 2010 at 5:16 PM, Jonathan Ellis  wrote:

> can you reproduce this by, say, running nodeprobe ring in a bash while
> loop?
>
> On Wed, Oct 20, 2010 at 3:09 PM, Bill Au  wrote:
> > One of my Cassandra server crashed with the following:
> >
> > ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419
> > CassandraDaemon.java (line 82) Uncaught exception in thread
> > Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
> > java.lang.OutOfMemoryError: unable to create new native thread
> > at java.lang.Thread.start0(Native Method)
> > at java.lang.Thread.start(Thread.java:597)
> > at
> >
> org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)
> >
> >
> > I took thread dumps in the JVM on all the other Cassandra servers in my
> > cluster.  They all have thousands of threads looking like this:
> >
> > "JMX server connection timeout 183373" daemon prio=10
> tid=0x2aad230db800
> > nid=0x5cf6 in Object.wait() [0x2aad7a316000]
> >java.lang.Thread.State: TIMED_WAITING (on object monitor)
> > at java.lang.Object.wait(Native Method)
> > at
> >
> com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
> > - locked <0x2aab056ccee0> (a [I)
> > at java.lang.Thread.run(Thread.java:619)
> >
> > It seems to me that there is a JMX threads leak in Cassandra.  NodeProbe
> > creates a JMXConnector but never calls its close() method.  I tried
> setting
> > jmx.remote.x.server.connection.timeout to 0 hoping that would disable the
> > JMX server connection timeout threads.  But that did not make any
> > difference.
> >
> > Has anyone else seen this?
> >
> > Bill
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


How can build Bond graph?

2010-10-22 Thread ruslan usifov
Hello

Does anybody have a recipe for how to effectively store a bond graph in
cassandra? For example, relations between users in social
networks (friendship).
The simplest thing that comes to mind is the following keyspace:


  



But this has a downside: if one user has very many friends, all the relations
for this one user will be held on one node. What kind of data design should
I use to avoid this problem?

Thanks
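One illustrative approach (an editorial sketch, not from the thread, and only needed at very large fan-out): split a user's relations across a fixed number of row-key buckets so they are not pinned to a single node; reads become a multiget that is merged client-side.

// Hypothetical key scheme: spread one user's friendships over BUCKETS rows.
public class FriendKeys {
    static final int BUCKETS = 16; // assumption: size this to the expected fan-out

    static String rowKey(String userId, String friendId) {
        int bucket = (friendId.hashCode() & 0x7fffffff) % BUCKETS;
        return userId + ":" + bucket; // e.g. "user42:7"
    }

    public static void main(String[] args) {
        // Reading all of user42's friends means a multiget over user42:0 .. user42:15.
        System.out.println(rowKey("user42", "user99"));
    }
}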


Re: How can build Bond graph?

2010-10-22 Thread Tyler Hobbs
Unless one user has several hundred million friends, this shouldn't be a
problem.

- Tyler

On Fri, Oct 22, 2010 at 3:00 PM, ruslan usifov wrote:

> Hello
>
> Does anybody have a recipe for how to effectively store a bond graph in
> cassandra? For example, relations between users in social
> networks (friendship).
> The simplest thing that comes to mind is the following keyspace:
>
> 
>   
> 
>
>
> But this has a downside: if one user has very many friends, all the
> relations for this one user will be held on one node. What kind of data
> design should I use to avoid this problem?
>
> Thanks
>
>


Re: Cassandra crashed - possible JMX threads leak

2010-10-22 Thread Jonathan Ellis
Is the fix as simple as calling close() then?  Can you submit a patch for that?

On Fri, Oct 22, 2010 at 2:49 PM, Bill Au  wrote:
> Not with the nodeprobe or nodetool command because the JVM these two
> commands spawn has a very short life span.
>
> I am using a webapp to monitor my cassandra cluster.  It pretty much uses
> the same code as NodeCmd class.  For each incoming request, it creates an
> NodeProbe object and use it to get get various status of the cluster.  I can
> reproduce the Cassandra JVM crash by issuing requests to this webapp in a
> bash while loop.  I took a deeper look and here is what I discovered:
>
> In the webapp when NodeProbe creates a JMXConnector to connect to the
> Cassandra JMX port, a thread
> (com.sun.jmx.remote.internal.ClientCommunicatorAdmin$Checker) is created and
> run in the webapp's JVM.  Meanwhile in the Cassandra JVM there is a
> com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout thread to
> timeout remote JMX connection.  However, since NodeProbe does not call
> JMXConnector.close(), the JMX client checker thread remains in the webapp's
> JVM even after the NodeProbe object has been garbage collected.  So this JMX
> connection is still considered open and that keeps the JMX timeout thread
> running inside the Cassandra JVM.  The number of JMX client checker threads
> in my webapp's JVM matches up with the number of JMX server timeout threads
> in my Cassandra's JVM.  If I stop my webapp's JVM,
> all the JMX server timeout threads in my Cassandra's JVM disappear after
> 2 minutes, the default timeout for a JMX connection.  This is why the
> problem cannot be reproduced by nodeprobe or nodetool.  Even though
> JMXConnector.close() is not called, the JVM exits shortly so the JMX client
> checker threads do not stay around.  So their corresponding JMX server
> timeout threads go away after two minutes.  This is not the case with my
> webapp since its JVM keeps running, so all the JMX client checker threads
> keep running as well.  The threads keep piling up until it crashes
> Cassandra's JVM.
>
> In my case I think I can change my webapp to use a static NodeProbe instead
> of creating a new one for every request.  That should get around the leak.
>
> However, I have seen the leak occur in another situation.  On more than one
> occasion when I restarted one node in a live multi-node cluster, I saw
> that the JMX server timeout threads quickly piled up (numbering in the
> thousands) in Cassandra's JVM.  It only happened on a live cluster that is
> servicing read and write requests.  I am guessing the hinted handoff might
> have something to do with it.  I am still trying to understand what is
> happening there.
>
> Bill
>
>
> On Wed, Oct 20, 2010 at 5:16 PM, Jonathan Ellis  wrote:
>>
>> can you reproduce this by, say, running nodeprobe ring in a bash while
>> loop?
>>
>> On Wed, Oct 20, 2010 at 3:09 PM, Bill Au  wrote:
>> > One of my Cassandra server crashed with the following:
>> >
>> > ERROR [ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn] 2010-10-19 00:25:10,419
>> > CassandraDaemon.java (line 82) Uncaught exception in thread
>> > Thread[ACCEPT-xxx.xxx.xxx/nnn.nnn.nnn.nnn,5,main]
>> > java.lang.OutOfMemoryError: unable to create new native thread
>> >     at java.lang.Thread.start0(Native Method)
>> >     at java.lang.Thread.start(Thread.java:597)
>> >     at
>> >
>> > org.apache.cassandra.net.MessagingService$SocketThread.run(MessagingService.java:533)
>> >
>> >
>> > I took thread dumps in the JVM on all the other Cassandra servers in my
>> > cluster.  They all have thousands of threads looking like this:
>> >
>> > "JMX server connection timeout 183373" daemon prio=10
>> > tid=0x2aad230db800
>> > nid=0x5cf6 in Object.wait() [0x2aad7a316000]
>> >    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>> >     at java.lang.Object.wait(Native Method)
>> >     at
>> >
>> > com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150)
>> >     - locked <0x2aab056ccee0> (a [I)
>> >     at java.lang.Thread.run(Thread.java:619)
>> >
>> > It seems to me that there is a JMX threads leak in Cassandra.  NodeProbe
>> > creates a JMXConnector but never calls its close() method.  I tried
>> > setting
>> > jmx.remote.x.server.connection.timeout to 0 hoping that would disable
>> > the
>> > JMX server connection timeout threads.  But that did not make any
>> > difference.
>> >
>> > Has anyone else seen this?
>> >
>> > Bill
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


DC Cassandra training and Atlanta meetup

2010-10-22 Thread Jonathan Ellis
Riptano is bringing some Cassandra love to the East coast the first
week of November.

First, on the evening of Nov 3, we're sponsoring a meetup in Atlanta.
This is held at the ApacheCon venue but you do _not_ have to be going
to ApacheCon to come; it is free to attend!  I will be there and
several other committers and contributors.  Register at
http://www.eventbrite.com/event/981873811/.

Second, on Nov 5, we are holding an all-day intensive Cassandra
training in Washington, DC., on Nov 5.  This will be our first
training class covering 0.7: http://www.eventbrite.com/event/900402127

See you there!

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Hung Repair

2010-10-22 Thread Dan Hendry
I am currently running a 4 node cluster on Cassandra beta 2. Yesterday, I
ran into a number of problems and one of my nodes went down for a few
hours. I tried to run a nodetool repair and at least at a data level,
everything seems to be consistent and alright. The problem is that the node
is still chewing up 100% of its available CPU, 20 hours after I started the
repair. Load averages are 8-9 which is crazy given it is a single core ec2
m1.small.

 

Besides sitting at 100% cpu, the node on which I ran the repair seems to be
fine. The Cassandra logs appear normal. Based on bandwidth patterns between
nodes, it does not seem like they are transferring any repair related data
(as they did initially). No pending tasks are being shown in any of the
services when inspecting via jmx. I have a reasonable amount of data in the
cluster (~6 gb * 2 replication factor) but nothing crazy. The last repair
related entry in the logs is as follows:

 

INFO [Thread-145] 2010-10-22 00:24:10,561 AntiEntropyService.java (line 828)
# completed successfully: 14
outstanding.

 


Any idea what is going on? Could the CPU usage STILL be related to the
repair? Is there any way to check? I hesitate to simply kill the node, given
the "14 outstanding" log message and because doing so has caused me problems in
the past when using beta versions.

 

 

Dan Hendry

 






HintedHandoff and ReplicationFactor with a downed node

2010-10-22 Thread Craig Ching
Hi,

I'm testing Cassandra to ensure it fits my needs.  One of the tests I
want to perform is writing while a node is down.  Here's the scenario:

Cassandra 0.6.6
2 nodes
replication factor of 2
hinted handoff on

I load node A with 50,000 rows while B is shut down (BTW, I'm using
CL.ONE during the inserts, which, according to the HintedHandoff wiki
shouldn't be working in this case?).  All columns are successfully
created.  I then start node B and wait a bit.  I start doing a get
(with CL.ONE) for every key I created in node A.  They seem to be
trickling into node B and eventually (after about an hour?) they all
get there.  Is this expected?  Is there any way to tune that?  I'm
mostly concerned with the amount of time it's taking to fully
replicate.  Even better, I'd love a way to not allow B to be available
until replication is complete; can I detect that somehow?

I appreciate any help or suggestions!

Cheers,
Craig


Re: HintedHandoff and ReplicationFactor with a downed node

2010-10-22 Thread Rob Coli

On 10/22/10 2:55 PM, Craig Ching wrote:

Even better, I'd love a way to not allow B to be available
until replication is complete, can I detect that somehow?


Proposed and rejected a while back :

https://issues.apache.org/jira/browse/CASSANDRA-768

=Rob


Re: HintedHandoff and ReplicationFactor with a downed node

2010-10-22 Thread Dan Washusen
The last time this came up on the list Jonathan Ellis said (something
along the lines of) if your application can't tolerate stale data then
you should read with a consistency level of QUORUM.
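For concreteness, a hedged sketch of such a read against the 0.6 Thrift API (the keyspace, column family, column and host are made up); with RF=2 and two nodes, QUORUM needs both replicas to answer, so the read cannot be served from a stale node alone:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

// Read one column at consistency level QUORUM via the 0.6 Thrift interface.
public class QuorumRead {
    public static void main(String[] args) throws Exception {
        TTransport tr = new TSocket("192.168.1.10", 9160); // placeholder host
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
        tr.open();
        try {
            ColumnPath path = new ColumnPath("Standard1"); // hypothetical CF
            path.setColumn("payload".getBytes("UTF-8"));
            ColumnOrSuperColumn cosc =
                client.get("Keyspace1", "row-00001", path, ConsistencyLevel.QUORUM);
            System.out.println(new String(cosc.getColumn().getValue(), "UTF-8"));
        } finally {
            tr.close();
        }
    }
}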

It would be nice if there were some sort of middle ground where an
application that can tolerate slightly stale data (minutes) but not
very stale data (hours or days) could still get the performance gain
of consistency level ONE.  Even if a node just made a best effort
in the OP's scenario it might be sufficient...?

Is there an alternative solution to reading with consistency level of
QUORUM?  For example, if a node has been down for an extended period
of time could you re-add it as a new node (fetching all its data
again) and avoid having to read with QUORUM?

Just curious... :)

Cheers,
Dan

On Sat, Oct 23, 2010 at 10:01 AM, Rob Coli  wrote:
>
> On 10/22/10 2:55 PM, Craig Ching wrote:
>>
>> Even better, I'd love a way to not allow B to be available
>> until replication is complete, can I detect that somehow?
>
> Proposed and rejected a while back :
>
> https://issues.apache.org/jira/browse/CASSANDRA-768
>
> =Rob


Streaming got stuck for a long time

2010-10-22 Thread Henry Luo
When using nodetool move command, the streaming between nodes got stuck for a 
long period like the following:

Streaming from: /10.100.10.66
   Profile: 
/opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Index.db 
0/809960194
   Profile: 
/opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Filter.db 
0/77858845
   Profile: 
/opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Data.db 
0/5711978741
   Profile: 
/opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Index.db 
0/1857117923
   Profile: 
/opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Filter.db 
0/85398565
   Profile: 
/opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Data.db 
0/22536290920

What's wrong here?

Cassandra version used: 0.6.1.

Thanks.




Re: Streaming got stuck for a long time

2010-10-22 Thread Jonathan Ellis
This is a known bug in early 0.6, fixed in 0.6.5 iirc.  But at this
point you should upgrade to 0.6.6.

On Fri, Oct 22, 2010 at 8:52 PM, Henry Luo  wrote:
> When using nodetool move command, the streaming between nodes got stuck for
> a long period like the following:
>
>
>
> Streaming from: /10.100.10.66
>
>    Profile:
> /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Index.db
> 0/809960194
>
>    Profile:
> /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Filter.db
> 0/77858845
>
>    Profile:
> /opt/choicestream/data/cassandra/data/Profile/U_Profiles-tmp-1137-Data.db
> 0/5711978741
>
>    Profile:
> /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Index.db
> 0/1857117923
>
>    Profile:
> /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Filter.db
> 0/85398565
>
>    Profile:
> /opt/choicestream/data/cassandra/data/Profile/T_Profiles-tmp-3624-Data.db
> 0/22536290920
>
>
>
> What’s wrong here?
>
>
>
> Cassandra version used: 0.6.1.
>
>
>
> Thanks.
>
> 
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

