access a multinode cluster

2010-06-01 Thread huajun qi
If you have a multinode cluster, which node should you connect to in order to fetch data? Is there a master node in the cluster that accepts data requests and dispatches them? Or is every node in the cluster exactly the same? If all nodes in a cluster are the same, should the client connect to a random node to reduce cassand

Re: access a multinode cluster

2010-06-01 Thread Shuai Yuan
On 2010-06-01 15:00 +0800, huajun qi wrote: > If you have a multinode cluster, which node you should connect to > fetch data? Any one. > > Is there a master node in a cluster which accepts data request and > dispatch it? Or every node in the cluster is completely same? No master. All the same.
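Since any node can serve a request and there is no master, a client can simply spread its connections across the nodes it knows about. A minimal sketch, assuming a hypothetical client-side list of host names (nothing here is a Cassandra or Thrift API):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side helper: every Cassandra node can coordinate a
// request, so the client just rotates through the hosts it knows about.
final class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("need at least one host");
        this.hosts = List.copyOf(hosts);
    }

    /** Returns the next host in rotation; safe to call from many threads. */
    String nextHost() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }
}
```

Round-robin is one choice; picking a random host per connection spreads load similarly.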

Re: access a multinode cluster

2010-06-01 Thread huajun qi
Thanks

Re: Administration Memory for Noobs. (GC for ConcurrentMarkSweep ?)

2010-06-01 Thread Oleg Anastasjev
xavier manach <tekio.org> writes: > > Hi. I search information for basic tuning of memory in Cassandra. My situation: I started to test large imports of data in Cassandra 6.1. My first import worked fine: 100 million rows in 2 hours ~ around 1 insert row by seconds > My second is slower

Re: searching keys of the form substring*

2010-06-01 Thread vd
As I told you on the IRC channel, don't go for shortcuts... learn Java first. ___ Vineet Daniel ___ On Tue, Jun 1, 2010 at 11:47 AM, Sagar Agrawal wrote: > Thanks Vineet for replying, but I am
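For keys of the form substring*, the usual approach is a range query from the prefix up to (but not including) the prefix with its last character incremented. In Cassandra that only lines up with a key range when the cluster uses an order-preserving partitioner; with RandomPartitioner rows are ordered by hash, not by key. A minimal in-memory sketch using a `TreeMap` to stand in for sorted row keys (the class and method names are hypothetical):

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of a "substring*" (prefix) lookup over keys kept in sorted order.
final class PrefixScan {
    /** Returns every entry whose key starts with {@code prefix}. */
    static <V> NavigableMap<String, V> withPrefix(TreeMap<String, V> rows, String prefix) {
        if (prefix.isEmpty()) return rows;
        char last = prefix.charAt(prefix.length() - 1);
        if (last == Character.MAX_VALUE) {
            // Incrementing would overflow; not handled in this sketch.
            throw new IllegalArgumentException("prefix ends with \\uffff");
        }
        // Exclusive upper bound: bump the last character (e.g. "app" -> "apq").
        String end = prefix.substring(0, prefix.length() - 1) + (char) (last + 1);
        return rows.subMap(prefix, true, end, false);
    }
}
```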

writing speed test

2010-06-01 Thread Shuai Yuan
Hi all, I'm testing the write speed of Cassandra with 4 servers. I'm confused by Cassandra's behavior. ---env--- load-data app written in C++, using libcassandra (w/ modified batch insert); 20 writing threads in 2 processes running on 2 servers ---optimization--- 1. turn log level to INFO 2. JVM

Re: Administration Memory for Noobs. (GC for ConcurrentMarkSweep ?)

2010-06-01 Thread xavier manach
Perfect :) I'll test it. I hadn't opened this file before; I thought the configuration was only in the conf folder. I'm not a Java specialist. I'll look up the meaning of the JVM parameters. For now, I'm reading this page to understand the other JVM options: http://java.sun.com/performance/r

question about class SlicePredicate

2010-06-01 Thread Shuai Yuan
Hi all, I don't quite understand the usage of 'class SlicePredicate' when trying to retrieve a ranged slice. How should it be initialized? Thanks! -- Kevin Yuan www.yuan-shuai.info

Re: Algorithm for distributing key of Cassandra

2010-06-01 Thread gabriele renzi
On Mon, May 31, 2010 at 8:50 PM, Jonathan Ellis wrote: > Doesn't ring a bell. Maybe if you included the link to which you refer? I guess this is the related post http://spyced.blogspot.com/2009/05/consistent-hashing-vs-order-preserving.html though I believe the original poster misphrased or mi
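The linked post contrasts consistent hashing with order-preserving partitioning. A minimal consistent-hashing ring, loosely modeled on RandomPartitioner's MD5 tokens (the real partitioner's token arithmetic differs in details; `HashRing` and its methods are hypothetical): nodes and keys share one token ring, and a key belongs to the first node whose token is at or after the key's token, wrapping past the top.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Minimal consistent-hashing ring: token -> node name, sorted by token.
final class HashRing {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    /** MD5 of the key, read as a non-negative 128-bit integer. */
    static BigInteger token(String key) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    void addNode(BigInteger token, String node) { ring.put(token, node); }

    /** First node clockwise from the key's token, wrapping to the lowest token. */
    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("empty ring");
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(token(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

Because hashing scatters adjacent keys, load spreads evenly, but a key-range scan no longer returns keys in key order; an order-preserving partitioner makes the opposite trade-off.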

Re: question about class SlicePredicate

2010-06-01 Thread Eric Yu
It needs a SliceRange. For example: SliceRange range = new SliceRange(); range.setStart("".getBytes()); range.setFinish("".getBytes()); range.setReversed(true); range.setCount(20); SlicePredicate sp = new SlicePredicate(); sp.setSlice_range(range); client.get_slice(KEYSPACE, KEY, ColumnParent, sp,
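To show what a SliceRange like the one above selects, here is a hypothetical in-memory model of one row (not the Thrift API): empty start/finish mean unbounded, `reversed` walks from the last column, and `count` caps the result. With start "", finish "", reversed true and count 20, it returns the last 20 column names in descending order. Within a single row, columns are ordered by the column family's comparator rather than the partitioner, which is what the `TreeMap` stands in for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of SliceRange semantics over one row's columns.
final class SliceModel {
    static List<String> slice(TreeMap<String, String> row, String start, String finish,
                              boolean reversed, int count) {
        // When reversed, iterate high-to-low; start is then the upper bound.
        NavigableMap<String, String> view = reversed ? row.descendingMap() : row;
        if (!start.isEmpty()) view = view.tailMap(start, true);
        if (!finish.isEmpty()) view = view.headMap(finish, true);
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() == count) break;
            names.add(name);
        }
        return names;
    }
}
```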

Re: question about class SlicePredicate

2010-06-01 Thread Olivier Mallassi
Does it work regardless of the chosen partitioner, or only with OrderPreservingPartitioner? On Tuesday, June 1, 2010, Eric Yu wrote: > It needs a SliceRange. For example: > SliceRange range = new SliceRange(); > range.setStart("".getBytes()); > range.setFinish("".getBytes()); > range.setReversed(true)

Re: writing speed test

2010-06-01 Thread 史英杰
Hi, it would help to know which Consistency Level you chose and what the schema of the test data is. On June 1, 2010 at 4:48 PM, Shuai Yuan wrote: > Hi all, > > I'm testing writing speed of cassandra with 4 servers. I'm confused by > the behavior of cassandra. > > ---env--- > load-data app written

Skipping corrupted rows when doing compaction

2010-06-01 Thread hive13 Wong
Hi, Is there a way to skip corrupted rows when doing compaction? We are currently deploying 2 nodes with replicationfactor=2 but one node reports lots of exceptions like java.io.UTFDataFormatException: malformed input around byte 72. My guess is that some of the data in the SSTable is corrupted b

Which kind of applications are Cassandra fit for?

2010-06-01 Thread 史英杰
Hi all, I've found that most applications on Cassandra are web applications, such as storing friend information or Digg information, and they get good performance. Many companies and groups want to move their applications to Cassandra, so which kinds of applications is Cassandra a good fit for? Thank

Re: nodetool cleanup isn't cleaning up?

2010-06-01 Thread Jonathan Ellis
I'm saying that .99 is getting a copy of all the data for which .124 is the primary. (If you are using RackUnawarePartitioner. If you are using RackAware it is some other node.) On Tue, Jun 1, 2010 at 1:25 AM, Ran Tavory wrote: > ok, let me try and translate your answer ;) > Are you saying that
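Under rack-unaware placement, a key's replicas are its primary node plus the next nodes along the ring, which is how one node ends up holding a copy of all the data for which its predecessor is primary. A minimal sketch, assuming the ring is given as a hypothetical list of node names in token order:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of rack-unaware replica placement: primary, then the next
// rf - 1 nodes walking the ring in token order (wrapping at the end).
final class RackUnawarePlacement {
    /**
     * @param ring    node names in token order
     * @param primary index of the node that owns the key's token
     * @param rf      replication factor
     */
    static List<String> replicas(List<String> ring, int primary, int rf) {
        if (rf > ring.size()) throw new IllegalArgumentException("rf exceeds node count");
        List<String> out = new ArrayList<>(rf);
        for (int i = 0; i < rf; i++) {
            out.add(ring.get((primary + i) % ring.size()));
        }
        return out;
    }
}
```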

Re: access a multinode cluster

2010-06-01 Thread Jonathan Ellis
http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to On Tue, Jun 1, 2010 at 2:00 AM, huajun qi wrote: > If you have a multinode cluster, which node you should connect to fetch > data? > Is there a master node in a cluster which accepts data request and dispatch > it? Or every node in the

Re: Skipping corrupted rows when doing compaction

2010-06-01 Thread Jonathan Ellis
If you're on a version earlier than 0.6.1, you might be running into https://issues.apache.org/jira/browse/CASSANDRA-866. Upgrading will fix it; you don't need to reload data. It's also worth trying 0.6.2 and DiskAccessMode=standard, in case you've found another similar bug. On Tue, Jun 1, 2010

Re: Which kind of applications are Cassandra fit for?

2010-06-01 Thread sharanabasava raddi
Applications that require large storage and fast retrieval. On Tue, Jun 1, 2010 at 6:13 PM, 史英杰 wrote: > Hi,ALL > I found that most applications on Cassandra are for web applications, > such as store friiend information or digg information, and they get good > performanc

Re: Which kind of applications are Cassandra fit for?

2010-06-01 Thread 史英杰
Thanks, but could you describe it in more detail? Most applications require fast retrieval. 2010/6/1 sharanabasava raddi > The applications which require bigger storage and fast response for > retrieval. > > > On Tue, Jun 1, 2010 at 6:13 PM, 史英杰 wrote: > >> Hi,ALL

Re: Which kind of applications are Cassandra fit for?

2010-06-01 Thread sharanabasava raddi
1. Performance data of network storage elements, which may be required for performance tuning. 2. Data dictionaries. 3. Satellite communications. 4. General search applications, etc. Below are the performance statistics compared to traditional databases. MySQL Comparison *MySQL > 50 GB Data Wr

Re: Skipping corrupted rows when doing compaction

2010-06-01 Thread hive13 Wong
Thanks, Jonathan. I'm using 0.6.1. Another thing: I get lots of zero-sized tmp files in the data directory. When I restart Cassandra those tmp files are deleted, then new empty tmp files are generated gradually, while there are still lots of UTFDataFormatExceptions in system.log. Using

wr0ngway rubber support for cassandra

2010-06-01 Thread Denis Haskin
(cross-posted from http://groups.google.com/group/rubber-ec2, apologies) Cassandra support was added recently to http://github.com/wr0ngway/rubber (yay!). Is anyone else using it? I updated the install script to use the published Apache repo and plan to get this up on GitHub with a pull

Monitoring compaction

2010-06-01 Thread Ian Soboroff
Are stats exposed over JMX for compaction? I'm trying to see when a node is in compaction, and guess when it will complete. tpstats doesn't show anything but the process is using lots of CPU time... I was wondering if there's a better view on compaction besides looking backwards in the system.log

Re: Monitoring compaction

2010-06-01 Thread Dylan Egan / WildfireApp.com
Hi Ian, On Tue, Jun 1, 2010 at 9:27 AM, Ian Soboroff wrote: > Are stats exposed over JMX for compaction? You can view them via the org.apache.cassandra.db:type=CompactionManager MBean. The PendingTasks attribute might suit you best. Cheers, Dylan.
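A minimal JMX client sketch for reading that attribute. The MBean name and `PendingTasks` come from the reply above; the host, the port (0.6-era `cassandra.in.sh` exposed JMX on 8080, but check your own startup script), and the helper names are assumptions:

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Reads CompactionManager.PendingTasks from a node over JMX (sketch).
final class CompactionMonitor {
    static final String COMPACTION_MBEAN = "org.apache.cassandra.db:type=CompactionManager";

    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad host/port: " + host + ":" + port, e);
        }
    }

    static ObjectName compactionManager() {
        try {
            return new ObjectName(COMPACTION_MBEAN);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Connects, reads the attribute, and closes the connection. */
    static long pendingTasks(String host, int port) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            Object value = mbs.getAttribute(compactionManager(), "PendingTasks");
            // Cast via Number so the sketch does not depend on int vs long.
            return ((Number) value).longValue();
        }
    }
}
```

Polling `pendingTasks` per node and alerting when it stays above zero is one way to aggregate this without nodetool.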

Re: Monitoring compaction

2010-06-01 Thread Ian Soboroff
Thanks. Are folks open to exposing this via nodetool? I've been trying to figure out a decent way to aggregate and expose all this information that is easier than nodetool and less noisy than nagios... suggestions appreciated. (My cluster only exposes a master node and everything else is private

Re: Monitoring compaction

2010-06-01 Thread Ian Soboroff
Regarding compaction thresholds... the BMT example says to set the threshold to 0 during an import. Is this advisable during any bulk import (say using batch mutations or just lots and lots of thrift inserts)? Also, when I asked "are folks open to..." I meant that I'm happy to code a patch if any

Re: Monitoring compaction

2010-06-01 Thread Dylan Egan / WildfireApp.com
Hi Ian, On Tue, Jun 1, 2010 at 9:41 AM, Ian Soboroff wrote: > Thanks.  Are folks open to exposing this via nodetool?  I've been trying to > figure out a decent way to aggregate and expose all this information that is > easier than nodetool and less noisy than nagios... suggestions appreciated. Y

Re: [ANN] Cassandra Tutorial @ OSCON

2010-06-01 Thread Eric Evans
On Mon, 2010-05-24 at 17:04 -0500, Eric Evans wrote: > For those interested in Cassandra training, I'll be giving a 3-hour > tutorial[1] at OSCON this year entitled Hands-on Cassandra. > > [1]: http://www.oscon.com/oscon2010/public/schedule/detail/14283 > > The tutorial will cover setup, configur

Re: nodetool cleanup isn't cleaning up?

2010-06-01 Thread Ran Tavory
I'm using RackAwareStrategy. But I think it still doesn't make sense... let's see what I missed... According to http://wiki.apache.org/cassandra/Operations - RackAwareStrategy: replica 2 is placed in the first node along the ring that belongs in *another* data center than the first; th

Re: Can't get data after building cluster

2010-06-01 Thread Jonathan Shook
Depending on the key, the request would have been proxied to the first or second node. The CLI uses a consistency level of "ONE", meaning that only a single node's data would have been considered when you get(). Also, the responsible nodes for a given key are mapped accordingly at request time, and
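The consistency-level point above comes down to simple arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to see the latest write only when the replicas read and the replicas written must overlap. A worked sketch (the class is hypothetical; the formulas are the standard ones):

```java
// Replica-count arithmetic behind consistency levels (sketch).
final class Consistency {
    /** Majority of replicas: floor(rf / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set: R + W > RF. */
    static boolean overlaps(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

With RF = 2, writing and reading at ONE gives 1 + 1 = 2, which is not greater than 2, so a CLI `get()` at ONE can hit the replica that has not yet received the write. With RF = 3, QUORUM reads and writes give 2 + 2 > 3.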

Re: writing speed test

2010-06-01 Thread Jonathan Shook
Also, what specifically do you mean by 'slow'? Which measurements are you looking at? What are the baseline constraints for your test system? 2010/6/1 史英杰: > Hi, It would be better if we know which Consistency Level did you choose, > and what is the schema of test data? > > On June 1, 2010 at 4:

Handling disk-full scenarios

2010-06-01 Thread Ian Soboroff
My nodes have 5 disks and are using them separately as data disks. The usage on the disks is not uniform, and one is nearly full. Is there some way to manually balance the files across the disks? Pretty much anything done via nodetool incurs an anticompaction, which obviously fails. system/ is no

Re: Which kind of applications are Cassandra fit for?

2010-06-01 Thread Jonathan Shook
There is no easy answer to this. The requirements vary widely even within a particular "type" of application. If you have a list of specific requirements for a given application, it is easier to say whether it is a good fit. If you need a schema marshaling system, then you will have to build it in

Re: Which kind of applications are Cassandra fit for?

2010-06-01 Thread Rafał Krupiński
On 01.06.2010 15:32, sharanabasava raddi wrote: 1. Performance data of network storage elements which may be required for performance tuning. 2. Data dictionaries. 3. Satellite communications. 4. General search applications. etc. below is the performance statistics compared to traditional Da

Is there any way to detect when a node is down so I can failover more effectively?

2010-06-01 Thread Patricio Echagüe
Hi all, I'm using the Hector framework to interact with Cassandra, and while trying to handle failover more effectively I found it a bit complicated to fetch all Cassandra nodes that are up and running. My goal is to keep an up-to-date list of active Cassandra servers to provide to Hector every time I nee
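One client-side way to keep such a list is to park a host for a retry interval after a failure and hand out only hosts that are not parked. A minimal sketch; `HostTracker` is hypothetical and not part of Hector's API, and the caller supplies the clock so the logic is easy to test:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical host tracker: failed hosts are skipped until their retry time.
final class HostTracker {
    private final Map<String, Long> downUntil = new LinkedHashMap<>();
    private final long retryMillis;

    HostTracker(List<String> hosts, long retryMillis) {
        for (String h : hosts) downUntil.put(h, 0L);
        this.retryMillis = retryMillis;
    }

    /** Call after a failed request; unknown hosts are ignored. */
    synchronized void markDown(String host, long nowMillis) {
        downUntil.replace(host, nowMillis + retryMillis);
    }

    /** Call after a successful request to clear any pending retry delay. */
    synchronized void markUp(String host) {
        downUntil.replace(host, 0L);
    }

    /** Hosts considered usable at {@code nowMillis}, in configured order. */
    synchronized List<String> liveHosts(long nowMillis) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : downUntil.entrySet()) {
            if (e.getValue() <= nowMillis) live.add(e.getKey());
        }
        return live;
    }
}
```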

Nodes dropping out of cluster due to GC

2010-06-01 Thread Eric Halpern
Hello, We're running a 4 node cluster on beefy EC2 virtual instances (8 core, 32 GB) using EBS storage with 8 GB of heap allocated to the JVM. Every couple of hours, each of the nodes does a concurrent mark/sweep that takes around 30 seconds to complete. During that GC, the node temporarily dro
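To see how often collections run and how much time they take, the standard `java.lang.management` API exposes per-collector counters; the same beans are visible remotely as `java.lang:type=GarbageCollector,*` MBeans on a Cassandra node. A minimal sketch for the local JVM (with CMS enabled the collectors are typically named "ParNew" and "ConcurrentMarkSweep"):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Snapshot of cumulative GC activity for the current JVM (sketch).
final class GcStats {
    /** Collector name -> {collection count, accumulated collection time in ms}. */
    static Map<String, long[]> snapshot() {
        Map<String, long[]> out = new LinkedHashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Either value may be -1 if the JVM does not report it. For a
            // concurrent collector, accumulated time may not equal pause time.
            out.put(gc.getName(), new long[] {gc.getCollectionCount(), gc.getCollectionTime()});
        }
        return out;
    }
}
```

Diffing two snapshots taken a minute apart gives collections and GC milliseconds per interval.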

Re: writing speed test

2010-06-01 Thread Shuai Yuan
On 2010-06-01 15:00 -0500, Jonathan Shook wrote: > Also, what are you meaning specifically by 'slow'? Which measurements > are you looking at. What are your baseline constraints for your test > system? > Actually, the problem is the utilization of resources (for a single machine): CPU: 700% / 160

Re: writing speed test

2010-06-01 Thread lwl
MEM: almost 100% (16GB) - maybe this is the bottleneck. Writing involves the Memtable and SSTable in memory. On June 2, 2010 at 9:48 AM, Shuai Yuan wrote: > On 2010-06-01 15:00 -0500, Jonathan Shook wrote: > > Also, what are you meaning specifically by 'slow'? Which measurements > > are you looking a

Re: Re: writing speed test

2010-06-01 Thread Shuai Yuan
Thanks lwl. Is there any way to tune this, such as flushing to disk faster, or something else? Cheers, Kevin On 2010-06-02 09:57 +0800, lwl wrote: > MEM: almost 100% (16GB) > - > maybe this is the bottleneck. > writing concerns Memtable and SSTable in memory. > > On June 2, 2010 at 9:48 AM, S

Column or SuperColumn

2010-06-01 Thread Peter Hsu
I have a pretty simple data modeling question. I don't know whether to use a CF or an SCF in one instance. Here's my example. I have a Store entry and locations for each store. So I have something like: Using CF: Store { //CF storeId { //row key storeName:str, storeLogo:i
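The difference between the two options is one extra level of sorted nesting. A shape-only sketch with nested sorted maps; the column family and column names (`storeCf`, `locationsScf`, `city`) are hypothetical illustrations, not the poster's full schema:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Shape-only sketch of the two layouts:
//   standard CF: row key -> column name -> value
//   super CF:    row key -> super column name -> column name -> value
final class StoreModels {
    // Standard CF: one row per store; the store's attributes are columns.
    final SortedMap<String, SortedMap<String, String>> storeCf = new TreeMap<>();

    // Super CF: one row per store; each location is a super column whose
    // subcolumns hold that location's attributes.
    final SortedMap<String, SortedMap<String, SortedMap<String, String>>> locationsScf =
            new TreeMap<>();

    void putStore(String storeId, String column, String value) {
        storeCf.computeIfAbsent(storeId, k -> new TreeMap<>()).put(column, value);
    }

    void putLocation(String storeId, String locationId, String column, String value) {
        locationsScf.computeIfAbsent(storeId, k -> new TreeMap<>())
                .computeIfAbsent(locationId, k -> new TreeMap<>())
                .put(column, value);
    }
}
```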

Re: Re: writing speed test

2010-06-01 Thread lwl
Is MEM almost 100% on all 4 servers? On June 2, 2010 at 10:12 AM, Shuai Yuan wrote: > Thanks lwl. > > Then is there anyway of tuning this, faster flush to disk or else? > > Cheers, > > Kevin > > On 2010-06-02 09:57 +0800, lwl wrote: > > MEM: almost 100% (16GB) > > - > > maybe this is the bottl

Re: Re: Re: writing speed test

2010-06-01 Thread Shuai Yuan
On 2010-06-02 10:37 +0800, lwl wrote: > is all the 4 servers' MEM almost 100%? Yes > On June 2, 2010 at 10:12 AM, Shuai Yuan wrote: > > Thanks lwl. > > Then is there anyway of tuning this, faster flush to disk or > else? > > Cheers, >

Read operation with CL.ALL, not yet supported?

2010-06-01 Thread Yuki Morishita
Hi, I'm testing several read operations (get, get_slice, get_count, etc.) with various ConsistencyLevels and noticed that ConsistencyLevel.ALL is "not yet supported" in most read ops (other than get_range_slice). I've looked at the code in StorageProxy#readProtocol and it seems to be able to handle