R: Re: Migration from 0.7 to 1.0

2012-01-05 Thread cbert...@libero.it


Aaron, first of all thanks for your great support.

> I'm paranoid, so I would upgrade 1 node and let it soak in for a few
> hours. Nothing like upgrading an entire cluster and then discovering a
> problem.

Ok, but as far as my application is concerned, is it safe to keep a cluster
with part of 1.0 and part of 0.7? I've read that they can communicate, but
will it lead to "strange" situations? Will my application continue working
(java/pelops)?

> You can take some extra steps when doing a rolling restart see
> http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/

This is what I was looking for! :-)
Thanks for the repair tips ...

Best regards,
Carlo




Original message

From: aa...@thelastpickle.com

Date: 04/01/2012 22.00

To: 

Subject: Re: Migration from 0.7 to 1.0



Sounds good.

You can take some extra steps when doing a rolling restart see
http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/

Also make sure repair *does not* run until all the nodes have been upgraded.

> Do I miss something (I will back up everything before the
> upgrade)?
I'm paranoid, so I would upgrade 1 node and let it soak in for a few
hours. Nothing like upgrading an entire cluster and then discovering a problem.

> As far as
> maintenance is concerned, is it enough to run a repair every x? (x <
> GCGraceSeconds)
Once for each node within that time frame:
http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com


On 5/01/2012, at 2:47 AM, cbert...@libero.it wrote:

Hi,
I'm going to migrate from Cassandra 0.7 to 1.0 in production and I'd like to
know the best way to do it ...

"Upgrading from version 0.7.1+ or 0.8.2+ can be done with a rolling restart, 
one node at a time.  (0.8.0 or 0.8.1 are NOT network-compatible with 1.0: 
upgrade to the most recent 0.8 release first.) You do not need to bring down 
the whole cluster at once.  - After upgrading, run nodetool scrub against each 
node before running repair, moving nodes, or adding new ones."

So what I'd do is for each node to ...

1 - run nodetool drain
2 - stop cassandra process
3 - start the new cassandra 1.0
4 - run nodetool scrub on the node
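
(For illustration only: a minimal Python sketch of driving that per-node
sequence over ssh; the host names and the 'service cassandra' name are
assumptions about the install, and each node should be verified healthy
before moving on to the next.)

    import subprocess

    NODES = ['node1', 'node2', 'node3']  # hypothetical host names

    for node in NODES:
        # 1 - flush memtables and stop accepting writes on this node
        subprocess.check_call(['ssh', node, 'nodetool', 'drain'])
        # 2/3 - stop the 0.7 process, start the new 1.0 binary
        subprocess.check_call(['ssh', node, 'sudo', 'service', 'cassandra', 'stop'])
        subprocess.check_call(['ssh', node, 'sudo', 'service', 'cassandra', 'start'])
        # 4 - rewrite sstables before any repair, move, or bootstrap
        subprocess.check_call(['ssh', node, 'nodetool', 'scrub'])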

Is it right? Do I miss something (I will back up everything before the
upgrade)? Should I worry about some kind of particular/known problems? As far
as maintenance is concerned, is it enough to run a repair every x? (x <
GCGraceSeconds)

Best regards,
Carlo





 

Re: Should I throttle deletes?

2012-01-05 Thread aaron morton
> I use a batch mutator in Pycassa to delete ~1M rows based on
> a longish list of keys I'm extracting from an auxiliary CF (with no
> problem of any sort).
What is the size of the deletion batches ?

> Now, it appears that such heads-on delete puts a temporary
> but large load on the cluster. I have SSD's and they go to 100%
> utilization, and the CPU spikes to significant loads.
Does the load spike during the deletion or after it ? 
Do any of the thread pools back up in nodetool tpstats during the load?

I can think of a few general issues you may want to avoid:

* Each row in a batch mutation is handled by a task in a thread pool on the 
nodes. So if you send a batch to delete 1,000 rows it will put 1,000 tasks in 
the Mutation stage. This will reduce the query throughput.
* Lots of deletes in a row will add overhead to reads on the row. 

You may want to check for excessive memtable flushing, but if you have default
automatic memory management running, lots of deletes should not result in
extra flushing.
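
(For illustration: a minimal pycassa sketch of a throttled mass-delete along
these lines; the keyspace/CF names are hypothetical, and the batch size and
pause are only starting points to tune.)

    import time
    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace')  # hypothetical keyspace
    cf = pycassa.ColumnFamily(pool, 'MyCF')      # hypothetical column family

    BATCH_SIZE = 500  # rows per batch -> tasks queued in the Mutation stage
    PAUSE = 0.1       # seconds to sleep between batches, to throttle load

    def throttled_delete(keys):
        batch = cf.batch(queue_size=BATCH_SIZE)  # auto-sends every BATCH_SIZE rows
        for i, key in enumerate(keys, 1):
            batch.remove(key)                    # whole-row delete (tombstone)
            if i % BATCH_SIZE == 0:
                time.sleep(PAUSE)                # let the Mutation stage drain
        batch.send()                             # flush any remainder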

Hope that helps
Aaron

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 10:13 AM, Maxim Potekhin wrote:

> Now that my cluster appears to run smoothly and after a few successful
> repairs and compacts, I'm back in the business of deletion of portions
> of data based on its date of insertion. For reasons too lengthy to be
> explained here, I don't want to use TTL.
> 
> I use a batch mutator in Pycassa to delete ~1M rows based on
> a longish list of keys I'm extracting from an auxiliary CF (with no
> problem of any sort).
> 
> Now, it appears that such heads-on delete puts a temporary
> but large load on the cluster. I have SSD's and they go to 100%
> utilization, and the CPU spikes to significant loads.
> 
> Does anyone do throttling on such mass-delete procedure?
> 
> Thanks in advance,
> 
> Maxim
> 



Writes slower than reads

2012-01-05 Thread R. Verlangen
Hi there,

I'm running a cassandra 0.8.6 cluster with 2 nodes (in 2 DC's), RF = 2.
Actual data on the nodes is only 1GB. Disk latency < 1ms. Disk throughput ~
0.4MB/s. OS load always below 1 (on an 8 core machine with 16GB ram).

When I'm running my writes against the cluster with cl = ONE, all reads
appear to be faster than the writes.

Average write speed = 1600us/operation
Average read speed = 200us/operation

I'm really wondering why this is the case. Anyone got a clue?

With kind regards,
Robin


Re: Writes slower than reads

2012-01-05 Thread Philippe
What can you see in vmstat/dstat ?
On 5 Jan 2012 11:58, "R. Verlangen" wrote:

> Hi there,
>
> I'm running a cassandra 0.8.6 cluster with 2 nodes (in 2 DC's), RF = 2.
> Actual data on the nodes is only 1GB. Disk latency < 1ms. Disk throughput ~
> 0.4MB/s. OS load always below 1 (on an 8 core machine with 16GB ram).
>
> When I'm running my writes against the cluster with cl = ONE, all reads
> appear to be faster than the writes.
>
> Average write speed = 1600us/operation
> Average read speed = 200us/operation
>
> I'm really wondering why this is the case. Anyone got a clue?
>
> With kind regards,
> Robin
>


Re: Writes slower than reads

2012-01-05 Thread R. Verlangen
CPU is idle (< 10% usage). Disk reads occasionally blocks over 32/64K.
Writes around 0-5MB per second. Network traffic 0.1 / 0.1 MB/s (in / out).
Paging 0. System int ~ 1300, csw ~ 2500.

2012/1/5 Philippe 

> What can you see in vmstat/dstat ?
> On 5 Jan 2012 11:58, "R. Verlangen" wrote:


Re: Writes slower than reads

2012-01-05 Thread R. Verlangen
As I posted this I noticed that the other node's CPU is running high on
some other cronjobs (every couple of minutes to 60% usage). Is the lack of
more CPU cycles a problem in this case?

Robin

2012/1/5 R. Verlangen 

> CPU is idle (< 10% usage). Disk reads occasionally blocks over 32/64K.
> Writes around 0-5MB per second. Network traffic 0.1 / 0.1 MB/s (in / out).
> Paging 0. System int ~ 1300, csw ~ 2500.


Re: Consistency Level

2012-01-05 Thread aaron morton
I missed a ! in the code :) The query will break the token ring into ranges 
based on the node tokens and then find the UP nodes for each range. 

I've taken another walk through the code, the logs helped. 

In short, you do not have enough UP nodes to support an indexed get at CL ONE.
It is working by design and you *should* have gotten an UnavailableException
returned. There must be CL up replicas for each token range. In your test node
200.190 is down and so is the next node; with RF 2 this means there are no
replicas for the range. The log line below is logged just before the
UnavailableException is raised

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) 
> Live nodes  do not satisfy ConsistencyLevel (1 required)


You will need at least every RF'th node UP. Another way to look at it: if you
have RF contiguous nodes DOWN you cannot perform an indexed get.
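
(For illustration: a toy Python model of that rule, not Cassandra's actual
code; it walks a ring of N node tokens with replication factor RF and checks
each token range for a live replica.)

    def ranges_satisfiable(nodes_up, rf):
        """nodes_up[i] is True if node i is up; each token range is owned
        by the next rf nodes clockwise on the ring."""
        n = len(nodes_up)
        for start in range(n):
            replicas = [nodes_up[(start + k) % n] for k in range(rf)]
            if not any(replicas):  # no live replica -> UnavailableException
                return False
        return True

    # 4 nodes, RF 2, two contiguous nodes down: the range they own has no
    # live replicas, so an indexed get cannot be served even at CL ONE.
    print(ranges_satisfiable([True, True, False, False], 2))  # False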

If you are interested this is what the logs are saying…

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) 
> scan ranges are 
> [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
There are 4 token ranges to query, i.e. we have to make 4 reads to query over 
the whole cluster. 

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) 
> Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) 
> reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) 
> reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118
Starting to read for the first token range. A bug in 0.8.6 makes it read from 
202.118 when it does not need to. 


> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 
> 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) 
> collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line 
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx 
> [7a7a32323636373030303438303031:false:0@1325704860925009,])
Scanned the secondary index on 200.130 and found an entry for the row key 
7a7a32323636373030303438303031 matched the index expression. 

> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line 
> 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
Returning ZERO rows for the query result. Because the row key we read above has 
the token 111413491371349413596553235966977111575L which is not in the first 
token range from above (0,42535295865117307932921825928971026432] and this is 
the range we are interested in now. 

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) 
> Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) 
> reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118
Processing the second range now. There is only one node up for this range, 
202.118

> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 
> ResponseVerbHandler.java (line 48) Processing response on a callback from 
> 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 
> ResponseVerbHandler.java (line 48) Processing response on a callback from 
> 173@/172.16.202.118
Got the callback from 202.118 for both the query ranges. 

The logs on 202.118 show the same local query. But I'm a little confused
as to why the row exists on node 2 at all.

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) 
> Blockfor/repair is 1/false; setting up requests to
Moving on, time to process the third token range 
(85070591730234615865843651857942052864,127605887595351923798765477786913079296]

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) 
> Live nodes  do not satisfy ConsistencyLevel (1 required)

Oh noes, there are no nodes available for that token range. Throw
UnavailableException.

Hope that helps. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 10:52 AM, Kamal Bahadur wrote:

> Hi Aaron, 
> 
> Thanks for your response!
> 
> I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1 
> conta

Re: is it bad to have lots of column families?

2012-01-05 Thread aaron morton
Sort of. Depends. 

In Cassandra, automatic memory management means the server can support more
CF's, and it has apparently been tested to 100's or 1000's of CF's. Having
lots of CF's will impact performance by putting memory and IO under pressure,
though.

If you have 10's you should not have to worry too much. Best thing is to test
and post your findings.

Hope that helps. 
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 11:49 AM, Michael Cetrulo wrote:

> in a traditional database it's not a good idea to have hundreds of tables
> but is it also bad to have hundreds of column families in cassandra? thank
> you.



Re: Migration from 0.7 to 1.0

2012-01-05 Thread aaron morton
> Ok, but as far as my application is concerned, is it safe to keep a cluster
> with part of 1.0 and part of 0.7?
I *think* it should be, so long as it's a short time and you do not run any
repairs.

If 1.0 creates any new files, via mutations or compaction, they will not be 
readable by 0.7. So the rollback to 0.7 will require going back to the 
snapshot. 

> I've read that they can communicate, but will it lead to "strange"
> situations? Will my application continue working (java/pelops)?
Again I think so (framed transport is there). But this should be an easy test 
to do against a dev server. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 9:33 PM, cbert...@libero.it wrote:

> 
> Aaron first of all thanks for your great support.
> 
>   I'm paranoid, so I would  upgrade 1 node and let it soak in for a few 
> hours. Nothing like upgrading an entire cluster and then discovering a
> problem. 
> 
> Ok, but as far as my application is concerned, is it safe to keep a cluster
> with part of 1.0 and part of 0.7?
> I've read that they can communicate, but will it lead to "strange"
> situations? Will my application continue working (java/pelops)?
> 
>   You can take some extra steps when doing a rolling restart see 
> http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/
> 
> This is what I was looking for! :-)
> Thanks for the repair tips ...
> 
> Best regards,
> Carlo



Re: emptying my cluster

2012-01-05 Thread Alexandru Sicoe
Hi,

On Wed, Jan 4, 2012 at 9:54 PM, aaron morton wrote:

> Some thoughts on the plan:
>
> * You are monkeying around with things, do not be surprised when
> surprising things happen.
>

I am just trying to explore different solutions for solving my problem.


> * Deliberately unbalancing the cluster may lead to Bad Things happening.
>

I will take your advice on this. I would have liked to have an extra node
to have 2 nodes in each DC.


> * In the design discussed it is perfectly reasonable for data not to be on
> the archive node.
>

You mean when having the 2 DC setup I mentioned and using TTL? In case I
have the 2 DC setup but don't use TTL I don't understand why data wouldn't
be on the archive node?


> * Truncate is a cluster wide operation and all nodes must be online before
> it will start.
>
> * Truncate will snapshot before deleting data, you could use this snapshot.
> * TTL for a column is for a column no matter which node it is on.
>

Thanks for clarifying these!


> * IMHO Cassandra data files (sstables or JSON dumps) are not a good format
> for a historical archive, nothing against Cassandra. You need the lowest
> common format.
>

So what data format should I use for historical archiving?


>
> If you have the resources for a second cluster could you put the two
> together and just have one cluster with a very large retention policy? One
> cluster is easier than two.
>

I am constrained to have limited retention on the Cassandra cluster that is
collecting the data . Once I archive the data for long term storage I
cannot bring it back in the same Cassandra cluster that collected it in the
first place because it's in an enclosed network with strict rules. I have
to load it in another cluster outside the enclosed network. It's not that I
have the resources for a second cluster, I am forced to use a second
cluster.


>
> Assuming there is no business case for this, consider either:
>
> * Dumping the historical data into a Hadoop (with or without HDFS) cluster
> with high compression. If needed you could then run Hive / Pig to fill a
> companion Cassandra cluster with data on demand. Or just query using Hadoop.
> * Dumping the historical data to files with high compression and a roll
> your own solution to fill a cluster.
>
> Ok, thanks for these suggestions, I will have to investigate further.


> Also considering talking to Data Stax about DSE.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/01/2012, at 1:41 AM, Alexandru Sicoe wrote:
>
>
Cheers,
Alex

> Hi,
>
> On Tue, Jan 3, 2012 at 8:19 PM, aaron morton wrote:
>
>>   Running a time based rolling window of data can be done using the TTL.
>> Backing up the nodes for disaster recover can be done using snapshots.
>> Restoring any point in time will be tricky because it may restore columns
>> where the TTL has expired.
>>
>
> Yeah, that's the thing...if I want to use the system as I explain further
> below, I cannot do backing up of data (for later restoration) if I'm using
> TTLs.
>
>
>>
>> Will I get a single copy of the data in the remote storage or will it be
>> twice the data (data + replica)?
>>
>> You will get RF copies of the data. (By the way, there is no original copy)
>>
>
> Well, if I organize the cluster as I mentioned in the first email, I will
> get one copy of each row at a certain point in time on node2 if I take it
> offline, perform a major compaction and GC, won't I? I don't want to send
> duplicated data to the mass storage!
>
>
>>
>> Can you share a bit more about the use case ? How much data and what sort
>> of read patterns ?
>>
>>
> I have several applications that feed into Cassandra about 2 million
> different variables (each representing a different monitoring
> value/channel). The system receives updates for each of these monitoring
> values at different rates. For each new update, the timestamp and value are
> recorded in a Cassandra name-value pair. The schema of Cassandra is built
> using one CF for data and 4 other CFs for metadata (metadata CFs are static
> - don't grow almost at all once they've been loaded). The data CF uses a
> row for each variable. Each row acts as a 4 hour time bin. I achieve this
> by creating the row key as a concatenation of  the first 6 digits of the
> timestamp at which the data is inserted + the unique ID of the variable.
> After the time bin expires, a new row will be created for the same variable
> ID.
>
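
(For illustration: a sketch of the row key scheme described above, assuming a
10-digit Unix timestamp; the 6-digit prefix changes every 10,000 seconds,
which gives the coarse time bins mentioned.)

    import time

    def row_key(variable_id, ts=None):
        # Row key = first 6 digits of the insertion timestamp + variable ID.
        ts = int(ts if ts is not None else time.time())
        return str(ts)[:6] + str(variable_id)

    # e.g. at ts = 1325704860, variable 'var42' -> '132570var42'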
> The system can currently sustain the insertion load. Now I'm looking into 
> organizing
> the flow of data out of the cluster and retrieval performance for random
> queries:
>
> Why do I need to organize the data out? Well, my requirement is to keep
> all the data coming into the system at the highest granularity for long
> term (several years). The 3 node cluster I mentioned is the online cluster
> which is supposed to be able to absorb the input load for a relatively
> short period of time, a few weeks (I a

Composite column docs

2012-01-05 Thread Shimi Kiviti
Is there a doc for using composite columns with thrift? Is
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java
the only doc?
Does the client need to add the length to the get \ get_slice... queries,
or is it taken care of on the server side?

Shimi
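
(For reference: per CompositeType.java, each component on the wire is a
2-byte big-endian length, the raw bytes, then an end-of-component byte, and
with raw Thrift the client does this packing itself. A minimal Python sketch,
to the best of my reading of that file:)

    import struct

    def pack_composite(*components, eoc=b'\x00'):
        # Per component: 2-byte big-endian length, bytes, end-of-component
        # byte (0 for an exact name; slice bounds vary the last one).
        out = b''
        for c in components:
            out += struct.pack('>H', len(c)) + c + eoc
        return out

    # e.g. a ('user', 'age') composite column name for get / get_slice
    name = pack_composite(b'user', b'age')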


Re: is it bad to have lots of column families?

2012-01-05 Thread Philippe
My 0.8 production cluster contains around 150 CFs spread across 5
keyspaces. Haven't found that to be an issue (yet?).
Some of them are huge (dozens of GB), some are tiny (some MB).

Cheers

2012/1/5 aaron morton 

> Sort of. Depends.
>
> In Cassandra automatic memory management means the server can support more
> CF's and it has apparently been tested to 100's or 1000's of CF's. Having
> lots of CF's will impact performance by putting memory and IO under
> pressure though.
>
> If you have 10's you should not have to worry too much. Best thing is to
> test and post your findings.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/01/2012, at 11:49 AM, Michael Cetrulo wrote:
>
> in a traditional database it's not a good idea to have hundreds of
> tables but is it also bad to have hundreds of column families in cassandra?
> thank you.
>
>
>


Re: Writes slower than reads

2012-01-05 Thread Philippe
Depending on the CL you're reading at, yes: if the CL requires that the
"slow" node create a digest of the data and send it to the coordinator,
then it might explain the poor performance on reads. What is your read CL?

2012/1/5 R. Verlangen 

> As I posted this I noticed that the other node's CPU is running high on
> some other cronjobs (every couple of minutes to 60% usage). Is the lack of
> more CPU cycles a problem in this case?
>
> Robin


Re: Writes slower than reads

2012-01-05 Thread R. Verlangen
I'm also reading with CL = ONE

2012/1/5 Philippe 

> Depending on the CL you're reading at, yes: if the CL requires that the
> "slow" node create a digest of the data and send it to the coordinator,
> then it might explain the poor performance on reads. What is your read CL?


Re: Writes slower than reads

2012-01-05 Thread Philippe
What if you shut down the cassandra service on the slow node; does that
improve your read performance?
If it does, then that sole node is responsible for the slowdown because it
can't act as a coordinator fast enough.

2012/1/5 R. Verlangen 

> I'm also reading with CL = ONE
>


Re: Writes slower than reads

2012-01-05 Thread R. Verlangen
It does not appear to affect the response time, certainly not in a positive
way.

2012/1/5 Philippe 

> What if you shut down the cassandra service on the slow node; does that
> improve your read performance?
> If it does, then that sole node is responsible for the slowdown because it
> can't act as a coordinator fast enough.


Re: Writes slower than reads

2012-01-05 Thread Philippe
You may be overloading the cluster though...

My hypothesis is that your traffic is being spread across your nodes and
that one slow node is slowing down the fraction of traffic that goes to
that node (when it's acting as coordinator).
So what I would do is reduce the read load a lot to make sure I don't
overload the cluster, and measure whether I see a 1/RF improvement in
response time, which would validate my hypothesis.


2012/1/5 R. Verlangen 

> It does not appear to affect the response time, certainly not in a
> positive way.
>


libQtCassandra minus Qt

2012-01-05 Thread David Gosselin
Good afternoon,

I am curious whether anyone here has taken the libQtCassandra high-level client
and stripped out the Qt pieces to make it Qt independent?


Thanks,

David Gosselin
Senior Software Engineer
Acme Packet
(781) 328-2604



Re: Writes slower than reads

2012-01-05 Thread R. Verlangen
The write and read load is very minimal at the moment: roughly 10 writes + 10
reads / second, so 20 operations per second. I don't think that overloads my
cluster, does it?

2012/1/5 Philippe 

> You may be overloading the cluster though...
>
> My hypothesis is that your traffic is being spread across your nodes and
> that one slow node is slowing down the fraction of traffic that goes to
> that node (when it's acting as coordinator).
> So what I would do is reduce the read load a lot to make sure I don't
> overload the cluster, and measure whether I see a 1/RF improvement in
> response time, which would validate my hypothesis.
>


Hector and CQL

2012-01-05 Thread dir dir
Hi Folk,

I am a beginner with Cassandra. I have a question about the usage and
integration (or installation) of hector into the Eclipse IDE. I tried to find
the answer by googling, but I did not find proper guidance. Would you be
willing to help me by telling me how to do it, or by pointing me to proper
guidance on the internet?

Thank you.


Re: is it bad to have lots of column families?

2012-01-05 Thread Віталій Тимчишин
2012/1/5 Michael Cetrulo 

> in a traditional database it's not a good idea to have hundreds of
> tables but is it also bad to have hundreds of column families in cassandra?
> thank you.
>

As far as I can see, this may raise memory requirements for you, since you
need to have index/bloom filter for each column family in memory.
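
(For illustration of that memory cost, using the standard Bloom filter sizing
formula; the key count and false-positive rate are made-up inputs.)

    import math

    def bloom_filter_bits(n_keys, fp_rate):
        # Standard Bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits
        return -n_keys * math.log(fp_rate) / math.log(2) ** 2

    # e.g. 10 million keys at a 1% false-positive rate, per column family:
    mb = bloom_filter_bits(10000000, 0.01) / 8 / 1024 ** 2
    print("~%.1f MB" % mb)  # ~11.4 MB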

-- 
Best regards,
 Vitalii Tymchyshyn


Integration Error between Cassandra and Eclipse

2012-01-05 Thread bobby saputra
Hi There,

I am a beginner with Cassandra. I have heard from many people that Cassandra
is powerful database software, used by Facebook, Twitter, Digg, etc. So I am
interested in studying Cassandra further.

When I tried to integrate Cassandra with the Eclipse IDE (in this case using
Java), I ran into trouble and had many problems.
I have already followed all the instructions from
http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the tutorial
was not working properly. I got a lot of errors and warnings while creating
the Java project in Eclipse.

These are the errors and warnings:

Error(X) (1 item):
Description Resource  Location
The method rangeSet(Range...) in the type Range is not applicable for
the arguments (Range[]) RangeTest.java line 178

Warnings(!) (100 of 2916 items):
Description Resource Location
AbstractType is a raw type. References to generic type AbstractType
should be parameterized AbstractColumnContainer.java line 72
(and many same warnings)

This is what I've done:
1. I checked out cassandra-trunk from the given link using SlikSvn as the svn
client.
2. I moved to the cassandra-trunk folder and built with ant using the "ant
build" command.
3. I generated the eclipse files with ant using the "ant
generate-eclipse-files" command.
4. I created a new Java project in Eclipse, set the project name to
"cassandra-trunk", and browsed to the cassandra-trunk folder for the location.

Did I make any mistakes? Or is there something wrong with the tutorial at
http://wiki.apache.org/cassandra/RunningCassandraInEclipse?

I have already googled to find a solution to this problem, but unfortunately
I found no results. Would you be willing to help me with a guide on how to
solve this problem? Please.

Thank you very much for your help.

Best Regards,
Wira Saputra


RE: java.lang.AssertionError

2012-01-05 Thread Michael Vaknine
Thanks Aaron.

Michael

 

From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Wednesday, January 04, 2012 10:06 PM
To: user@cassandra.apache.org
Subject: Re: java.lang.AssertionError

 

Will be fixed in 1.0.7 https://issues.apache.org/jira/browse/CASSANDRA-3656

 

Cheers

 

-

Aaron Morton

Freelance Developer

@aaronmorton

http://www.thelastpickle.com

 

On 4/01/2012, at 11:26 PM, Michael Vaknine wrote:





Hi,

 

I have a 4 cluster version 1.0.3 which was upgraded from 0.7.6 in 2 stages.

Upgrade to 1.0.0 run scrub on all nodes

Upgrade to 1.0.3

 

I keep getting this errors from time to time on all 4 nodes.

 

Is there any maintenance I can do to fix the problem?

I tried to run repair on the cluster a few times but it did not help.

 

Thanks in advance for your help.

 

Michael

 

The kind of errors I get:

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392
AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392
java.lang.AssertionError

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
org.apache.cassandra.service.GCInspector.logGCResults(GCInspector.java:103)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
org.apache.cassandra.service.GCInspector.access$000(GCInspector.java:41)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
org.apache.cassandra.service.GCInspector$1.run(GCInspector.java:85)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

NYC-Cass3 ERROR [ScheduledTasks:1] 2012-01-03 05:54:16,392 at
java.lang.Thread.run(Thread.java:619)

 



Re: Writes slower than reads

2012-01-05 Thread Philippe
Unless you are doing huge batches, no... I don't have any other idea for now...

2012/1/5 R. Verlangen 

> The write and read load is very minimal at the moment: roughly 10 writes + 10
> reads / second, so 20 operations per second. I don't think that overloads my
> cluster, does it?
>


Deciding on CF

2012-01-05 Thread Sunit Randhawa
Hello,

We are working on some new cassandra requirements and I wanted to get your
recommendations on how to go ahead and put a schema in place, in terms of how
many CFs one should have for the scenario below:

1- There are 10 applications, of which 1 or 2 are very active, generating 90%+
of the load.

2- Every application has 10-15 defined transaction types.

3- Transaction Data needs to be stored in cassandra that is categorized
based on application (#1), transaction type (#2) and originating server.
Size of each transaction data is 5KB. There can be max. of 250 million
transactions per day. Transaction Data can be purged after 60 days. (There
are no updates but only inserts)

4- Finally, a Transaction Data Report needs to be generated that can be
rolled up based on a timeline (could be the past 5 mins up to a max of 60
days) by application, transaction type and/or originating server.

I wanted to get the user group's suggestions on how to decide on the number of
CFs and the indexing options.
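
(A quick back-of-the-envelope on the volumes in point 3 above: raw payload
only, before replication factor and storage overhead.)

    tx_per_day = 250000000  # from point 3
    tx_size_kb = 5
    retention_days = 60

    daily_tb = tx_per_day * tx_size_kb / 1024 ** 3  # KB -> TB
    total_tb = daily_tb * retention_days
    print("~%.2f TB/day, ~%.0f TB over %d days"
          % (daily_tb, total_tb, retention_days))
    # -> ~1.16 TB/day, ~70 TB over 60 days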


Re: Hector and CQL

2012-01-05 Thread rektide
Hector is a library. It needs to be added to your Eclipse project's "build 
classpath"
somehow before you can begin using it in Eclipse.

On Thu, Jan 05, 2012 at 11:25:16PM +0700, dir dir wrote:
>    Hi Folk,
>    I am a beginner with Cassandra. I have a question about the usage and
>    integration (or installation) of hector into the Eclipse IDE. I tried
>    to find the answer by googling, but I did not find proper guidance.
>    Would you be willing to help me by telling me how to do it, or by
>    pointing me to proper guidance on the internet?
>    Thank you.


Re: emptying my cluster

2012-01-05 Thread aaron morton
> * In the design discussed it is perfectly reasonable for data not to be on 
> the archive node. 
> 
> You mean when having the 2 DC setup I mentioned and using TTL? In case I have 
> the 2 DC setup but don't use TTL I don't understand why data wouldn't be on 
> the archive node?
Originally you were talking about taking the archive node down and then having
HH write hints back. HH is not considered a reliable mechanism for obtaining
consistency; it's better in 1.0, but repair is AFAIK still considered the way
to achieve consistency. For example, HH only collects hints for a down node
for 1 hour. Also, a read operation will check consistency and may repair it;
snapshots do not do that.

Finally if you write into the DC with 2 nodes at a CL other than QUORUM or 
EACH_QUORUM there is no guarantee that the write will be committed in the other 
DC. 
 
>  So what data format should I use for historical archiving?
Plain text file, with documentation. So that anyone who follows you can work
with the data.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/01/2012, at 12:31 AM, Alexandru Sicoe wrote:

> Hi,
> 
> On Wed, Jan 4, 2012 at 9:54 PM, aaron morton  wrote:
> Some thoughts on the plan:
> 
> * You are monkeying around with things, do not be surprised when surprising 
> things happen. 
> 
> I am just trying to explore different solutions for solving my problem.
>  
> * Deliberately unbalancing the cluster may lead to Bad Things happening. 
> 
> I will take your advice on this. I would have liked to have an extra node to 
> have 2 nodes in each DC.
>  
> * In the design discussed it is perfectly reasonable for data not to be on 
> the archive node. 
> 
> You mean when having the 2 DC setup I mentioned and using TTL? In case I have 
> the 2 DC setup but don't use TTL I don't understand why data wouldn't be on 
> the archive node?
>  
> * Truncate is a cluster wide operation and all nodes must be online before it 
> will start. 
> * Truncate will snapshot before deleting data, you could use this snapshot. 
> * TTL for a column is for a column no matter which node it is on. 
> 
> Thanks for clarifying these!
>  
> * IMHO Cassandra data files (sstables or JSON dumps) are not a good format 
> for a historical archive, nothing against Cassandra. You need the lowest 
> common format. 
> 
> So what data format should I use for historical archiving?
>  
> 
> If you have the resources for a second cluster could you put the two together 
> and just have one cluster with a very large retention policy? One cluster is 
> easier than two.  
> 
> I am constrained to have limited retention on the Cassandra cluster that is 
> collecting the data . Once I archive the data for long term storage I cannot 
> bring it back in the same Cassandra cluster that collected it in the first 
> place because it's in an enclosed network with strict rules. I have to load 
> it in another cluster outside the enclosed network. It's not that I have the 
> resources for a second cluster, I am forced to use a second cluster.
>  
> 
> Assuming there is no business case for this, consider either:
> 
> * Dumping the historical data into a Hadoop (with or without HDFS) cluster 
> with high compression. If needed you could then run Hive / Pig to fill a 
> companion Cassandra cluster with data on demand. Or just query using Hadoop.
> * Dumping the historical data to files with high compression and a roll your 
> own solution to fill a cluster. 
> 
> Ok, thanks for these suggestions, I will have to investigate further.
>  
> Also considering talking to Data Stax about DSE. 
> 
> Cheers 
>   
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 5/01/2012, at 1:41 AM, Alexandru Sicoe wrote:
> 
> 
> Cheers,
> Alex 

Re: Composite column docs

2012-01-05 Thread aaron morton
What client are you using ? 

For example pycassa has some sweet documentation 
http://pycassa.github.com/pycassa/assorted/composite_types.html

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/01/2012, at 12:48 AM, Shimi Kiviti wrote:

> Is there a doc for using composite columns with thrift?
> Is 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/marshal/CompositeType.java
>  the only doc?
> Does the client need to add the length to the get \ get_slice... queries, or
> is it taken care of on the server side?
> 
> Shimi



Re: Writes slower than reads

2012-01-05 Thread aaron morton
What happens when you turn off the cron jobs ? 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/01/2012, at 6:57 AM, Philippe wrote:

> Unless you are doing huge batches, no... I don't have any other idea for now...



Re: Should I throttle deletes?

2012-01-05 Thread Maxim Potekhin

Hello Aaron,

On 1/5/2012 4:25 AM, aaron morton wrote:

>> I use a batch mutator in Pycassa to delete ~1M rows based on
>> a longish list of keys I'm extracting from an auxiliary CF (with no
>> problem of any sort).
>
> What is the size of the deletion batches?

2000 mutations.





>> Now, it appears that such heads-on delete puts a temporary
>> but large load on the cluster. I have SSD's and they go to 100%
>> utilization, and the CPU spikes to significant loads.
>
> Does the load spike during the deletion or after it?

During.



> Do any of the thread pools back up in nodetool tpstats during the load?

Haven't checked, thank you for the lead.


> I can think of a few general issues you may want to avoid:
>
> * Each row in a batch mutation is handled by a task in a thread pool
> on the nodes. So if you send a batch to delete 1,000 rows it will put
> 1,000 tasks in the Mutation stage. This will reduce the query throughput.

Aah. I didn't know that. I was under the impression that batching saves
the communication overhead, and that's it.

Then I do have a question: what do people generally use as the batch size?

Thanks

Maxim




Re: Writes slower than reads

2012-01-05 Thread R. Verlangen
I turned off one large cronjob which was driving the CPU to ~60% usage once
every 10 minutes. Both writes and reads are fast now. I just think I was
overloading the node.

Weird though that shutting down the node did not improve the speed.

Thank you all for your time!

Robin

2012/1/5 aaron morton 

> What happens when you turn off the cron jobs ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 6/01/2012, at 6:57 AM, Philippe wrote:
>
> Unless you are doing huge batches no... don't have any other idea for
> now...
>
> 2012/1/5 R. Verlangen 
>
>> The write and read load is very minimal the moment. Roughly 10 writes +
>> 10 reads / second. So 20 operations per second. Don't think that overloads
>> my cluster, does it?
>>
>>
>> 2012/1/5 Philippe 
>>
>>> You may be overloading the cluster though...
>>>
>>> My hypothesis is that your traffic is being spread across your node and
>>> that one slow node is slowing down the fraction of traffic that goes to
>>> that node (when it's acting as coordinator).
>>> So what I would do is reduce the read load a lot to make sure I don't
>>> overload the cluster and measure if I see a 1/RF improvement in response
>>> time which would validate my hypothesis.
>>>
>>>
>>> 2012/1/5 R. Verlangen 
>>>
>>> It does not appear to affect the response time, certainly not in a
 positive way.


 2012/1/5 Philippe 

> What if you shutdown the cassandra service on the slow node, does that
> improve your read performance ?
> If it does then that sole node is responsible for the slow down
> because it can't act as a coordinator fast enough.
>
> 2012/1/5 R. Verlangen 
>
> I'm also reading with CL = ONE
>>
>>
>> 2012/1/5 Philippe 
>>
>>> Depending on the CL you're reading at it will yes : if the CL
>>> requires that the "slow" node create a digest of the data and send it to
>>> the coordinator then it might explain the poor performance on reads. 
>>> What
>>> is your read CL ?
>>>
>>> 2012/1/5 R. Verlangen 
>>>
>>> As I posted this I noticed that the other node's CPU is running high on
>>> some other cronjobs (up to 60% usage every couple of minutes). Is the
>>> lack of spare CPU cycles a problem in this case?
>>>
>>> Robin

 2012/1/5 R. Verlangen 

> CPU is idle (< 10% usage). Disk reads occasionally block over 32/64K.
> Writes around 0-5MB per second. Network traffic 0.1 / 0.1 MB/s (in / out).
> Paging 0. System int ~ 1300, csw ~ 2500.
>
>
> 2012/1/5 Philippe 
>
>> What can you see in vmstat/dstat?
>> On 5 Jan 2012 11:58, "R. Verlangen" wrote:
>>
>> Hi there,
>>>
>>> I'm running a cassandra 0.8.6 cluster with 2 nodes (in 2 DC's), RF = 2.
>>> Actual data on the nodes is only 1GB. Disk latency < 1ms. Disk throughput
>>> ~ 0.4MB/s. OS load always below 1 (on an 8 core machine with 16GB ram).
>>>
>>> When I'm running my writes against the cluster with cl = ONE, all reads
>>> appear to be faster than the writes.
>>>
>>> Average write speed = 1600us/operation
>>> Average read speed = 200us/operation
>>>
>>> I'm really wondering why this is the case. Anyone got a clue?
>>>
>>> With kind regards,
>>> Robin


Re: Hector and CQL

2012-01-05 Thread Chris Gerken
I hate to admit it, but I use maven to get the classpaths right in Eclipse:

<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-all</artifactId>
    <version>1.0.6</version>
    <type>jar</type>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>org.cassandraunit</groupId>
    <artifactId>cassandra-unit</artifactId>
    <version>1.0.1.1</version>
    <type>jar</type>
    <scope>compile</scope>
</dependency>

Chris Gerken


On Jan 5, 2012, at 12:51 PM, rektide wrote:

> Hector is a library. It needs to be added to your Eclipse project's "build 
> classpath"
> somehow before you can begin using it in Eclipse.
> 
> On Thu, Jan 05, 2012 at 11:25:16PM +0700, dir dir wrote:
>>   Hi Folks,
>>   I am a beginner user of Cassandra. I have a question about the usage and
>>   integration (or installation) of hector in the Eclipse IDE. I have tried
>>   to find the answer by googling, but I have not found proper guidance.
>>   Would you help me by telling me how to do it, or by showing me proper
>>   guidance on the internet?
>>   Thank you.



Re: Hector and CQL

2012-01-05 Thread Brian O'Neill
If you are looking to add hector, you'll need:


<dependency>
    <groupId>me.prettyprint</groupId>
    <artifactId>hector</artifactId>
    <version>1.0-2</version>
</dependency>


-brian


 
Brian O'Neill
Lead Architect, Software Development
Health Market Science | 2700 Horizon Drive | King of Prussia, PA 19406
p: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/







On 1/5/12 3:04 PM, "Chris Gerken"  wrote:

>[quoted message trimmed; see Chris Gerken's message above]




Re: Should I throttle deletes?

2012-01-05 Thread Philippe
>
> Then I do have a question: what do people generally use as the batch size?
>
I used to do batches of 500 to 2000, like you do. After investigating issues
such as the one you've encountered, I've moved to batches of 20 for writes
and 256 for reads. Everything is a lot smoother: no more timeouts.

The downside, though, is that I have to run more client threads in parallel
to maximize throughput.
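In pycassa terms, the split looks roughly like this (the pool, column family,
and key lists are placeholders; the chunk sizes are the ones above):

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    def chunks(seq, n):
        # Yield successive n-item slices of seq.
        for i in range(0, len(seq), n):
            yield seq[i:i + n]

    # Writes/deletes in small batches of 20...
    for chunk in chunks(keys_to_delete, 20):
        b = cf.batch()
        for key in chunk:
            b.remove(key)
        b.send()

    # ...reads in larger batches of 256.
    results = {}
    for chunk in chunks(keys_to_read, 256):
        results.update(cf.multiget(chunk))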

Cheers


Re: is it bad to have lots of column families?

2012-01-05 Thread Carlo Pires
Does the index for each CF have to fit in the node's memory?

2012/1/5 Віталій Тимчишин 

>
>
> 2012/1/5 Michael Cetrulo 
>
>> in a traditional database it's not a good idea to have hundreds of tables,
>> but is it also bad to have hundreds of column families in cassandra?
>> thank you.
>>
>
> As far as I can see, this may raise memory requirements for you, since you
> need to have an index/bloom filter for each column family in memory.
>
> --
> Best regards,
>  Vitalii Tymchyshyn
>



-- 
  Carlo Pires
  62 8209-1444 TIM
  62 3251-1383
  Skype: carlopires


Re: Should I throttle deletes?

2012-01-05 Thread Maxim Potekhin
Thanks, that's quite helpful. I'm wondering, though, if multiplying the
number of clients will end up doing the same thing.

On 1/5/2012 3:29 PM, Philippe wrote:


>> Then I do have a question: what do people generally use as the batch size?
>
> I used to do batches of 500 to 2000, like you do. After investigating issues
> such as the one you've encountered, I've moved to batches of 20 for writes
> and 256 for reads. Everything is a lot smoother: no more timeouts.
>
> The downside, though, is that I have to run more client threads in parallel
> to maximize throughput.


Cheers
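One way to picture Philippe's setup (several client threads, each sending
small batches) is a thread pool over a shared pycassa ConnectionPool, which
is safe to use across threads. A rough sketch; all names are placeholders:

    import pycassa
    from multiprocessing.pool import ThreadPool  # stdlib thread pool
                                                 # (undocumented, Python 2.6+)

    # Placeholder names; size the connection pool to match the thread count.
    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'],
                                  pool_size=8)
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    def delete_chunk(chunk):
        # Each thread sends its own small batches of 20 mutations.
        b = cf.batch(queue_size=20)
        for key in chunk:
            b.remove(key)
        b.send()

    chunks = [keys_to_delete[i:i + 20]
              for i in range(0, len(keys_to_delete), 20)]
    ThreadPool(8).map(delete_chunk, chunks)

Whether this puts the same load on the cluster should depend mostly on the
per-thread batch size rather than on the number of threads.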




Re: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Chris Gerken
I wouldn't worry about the warnings.  Eclipse Java support defaults to fairly 
restrictive warning settings.  You can go into the preferences for 
Java->Compiler and change the 'warning' settings to 'ignore' for any of those 
problems that you don't or shouldn't really care about.

As for the error, is that a Test class or part of the main source body?
 
Chris Gerken



On Jan 5, 2012, at 11:04 AM, bobby saputra wrote:

> Hi There,
> 
> I am a beginner user of Cassandra. I have heard from many people that
> Cassandra is powerful database software, used by Facebook, Twitter, Digg,
> etc., so I am interested in studying Cassandra further.
> 
> When I tried to integrate Cassandra with the Eclipse IDE (in this case
> using Java as the language), I ran into trouble and many problems.
> I have already followed all the instructions from
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse, but the tutorial
> did not work properly. I got a lot of errors and warnings while creating the
> Java project in Eclipse.
> 
> These are the errors and warnings:
> 
> Error(X) (1 item):
> Description: The method rangeSet(Range...) in the type Range is not
> applicable for the arguments (Range[])
> Resource: RangeTest.java    Location: line 178
> 
> Warnings(!) (100 of 2916 items):
> Description: AbstractType is a raw type. References to generic type
> AbstractType should be parameterized
> Resource: AbstractColumnContainer.java    Location: line 72
> (and many similar warnings)
> 
> This is what I've done:
> 1. I checked out cassandra-trunk from the given link, using SlikSvn as the
> svn client.
> 2. I moved to the cassandra-trunk folder and built it with ant, using the
> "ant build" command.
> 3. I generated the Eclipse files with ant, using the "ant
> generate-eclipse-files" command.
> 4. I created a new Java project in Eclipse, set the project name to
> "cassandra-trunk", and browsed to the cassandra-trunk folder as the
> location.
> 
> Did I make any mistakes? Or is there something wrong with the tutorial at
> http://wiki.apache.org/cassandra/RunningCassandraInEclipse ?
> 
> I have already googled for a solution to this problem, but unfortunately
> found no results. Would you help me with a guide on how to solve this
> problem? Please.
> 
> Thank you very much for your help.
> 
> Best Regards,
> Wira Saputra



Re: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Maki Watanabe
How about using "File->Import..." rather than "File->New Java Project"?

After extracting the source, running ant build, and ant generate-eclipse-files:
1. File->Import...
2. Choose "Existing Project into workspace..."
3. Choose your source directory as the root directory, then press "Finish"


2012/1/6 bobby saputra :
> [original message quoted in full; trimmed]



-- 
w3m


Re: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Maki Watanabe
Sorry, ignore my reply.
I had the same result with import (1 error in the unit test code & many
warnings).

2012/1/6 Maki Watanabe :
> [earlier messages quoted in full; trimmed]



-- 
w3m


Re: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Yuki Morishita
Also note that the Cassandra project has switched from svn to git.
See "Source control" section of http://cassandra.apache.org/download/ .

Regards,

Yuki 

-- 
Yuki Morishita


On Thursday, January 5, 2012 at 7:59 PM, Maki Watanabe wrote:

> Sorry, ignore my reply.
> I had the same result with import (1 error in the unit test code & many
> warnings).
> 
> [earlier messages quoted in full; trimmed]




RE: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Kuldeep Sengar
Hi,
Can you post the error (you say there is only one error)? That would make
things clearer.
Thanks

Kuldeep Singh Sengar

Opera Solutions
Tech Boulevard,8th floor, Tower C,
Sector 127, Plot No 6,Noida 201 301
+91 (120) 4642424 facsimile, Ext : 2418
+91 8800595878 (M)  

-----Original Message-----
From: Maki Watanabe [mailto:watanabe.m...@gmail.com] 
Sent: Friday, January 06, 2012 7:30 AM
To: user@cassandra.apache.org
Subject: Re: Integration Error between Cassandra and Eclipse

Sorry, ignore my reply.
I had the same result with import (1 error in the unit test code & many
warnings).

[earlier messages quoted in full; trimmed]


Re: Integration Error between Cassandra and Eclipse

2012-01-05 Thread Dave Brosius

This works for me:

http://wiki.apache.org/cassandra/HowToDebug



On 01/06/2012 01:18 AM, Kuldeep Sengar wrote:

Hi,
Can you post the error (you say there is only one error)? That would make
things clearer.
Thanks

[signature and earlier quoted messages trimmed]







Re: Dealing with "Corrupt (negative) value length encountered"

2012-01-05 Thread Philippe
Thanks Aaron, I was able to complete the repair by scrubbing the column
family on all three replicas.

Cheers

2012/1/4 aaron morton 

> I was able to scrub the node that the failed repair was running on. Are you
> saying the error could be displayed on that node but the bad data could be
> coming from another node?
>
> Yes. The error occurred while the node was receiving a data stream from
> another node; you will need to clean the source of the data. You can either
> crawl through the logs or scrub the entire cluster.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/01/2012, at 9:15 AM, Philippe wrote:
>
> I was able to scrub the node that the failed repair was running on. Are you
> saying the error could be displayed on that node but the bad data could be
> coming from another node?
>
> Log inspection also showed many of these; they seem to happen around the
> time a stream transfer finishes.
> ERROR [Thread-550876] 2012-01-03 16:35:31,922
> AbstractCassandraDaemon.java (line 139) Fatal exception in thread
> Thread[Thread-550876,5,main]
> java.lang.IllegalArgumentException
> at
> sun.nio.ch.FileChannelImpl.transferFrom(FileChannelImpl.java:586)
> at
> org.apache.cassandra.streaming.IncomingStreamReader.readnwrite(IncomingStreamReader.java:110)
> at
> org.apache.cassandra.streaming.IncomingStreamReader.readFile(IncomingStreamReader.java:85)
> at
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:61)
> at
> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>
> Thanks
>
> 2012/1/2 aaron morton 
>
>> I would try to nodetool scrub the data on the node that sent the bad data
>> in the stream. You may be able to work out which node from the logs, or it
>> may be easier to just scrub them all.
>>
>> Hope that helps.
>>
>>   -
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 31/12/2011, at 12:20 AM, Philippe wrote:
>>
>> Hello,
>> Running a combination of 0.8.6 and 0.8.8 with RF=3, I am getting the
>> following while repairing one node (all other nodes completed
>> successfully). Can I just stop the instance, erase the SSTable and restart
>> cleanup?
>> Thanks
>>
>> ERROR [Thread-402484] 2011-12-29 14:51:03,687
>> AbstractCassandraDaemon.java (line 139) Fatal exception in thread
>> Thread[Thread-402484,5,main]
>> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
>> java.io.IOError: java.io.IOException: Corrupt (negative) value length
>> encountered
>> at
>> org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:154)
>> at
>> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
>> at
>> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
>> Caused by: java.util.concurrent.ExecutionException: java.io.IOError:
>> java.io.IOException: Corrupt (negative) value length encountered
>> at
>> java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>> at
>> org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:138)
>> ... 3 more
>>
>>
>>
>
>