RE: Replica info

2013-05-09 Thread Kanwar Sangha
Thanks! Is there also a way to find out the replica nodes?

Say we have 2 DCs, DC1 and DC2, with RF=2 (DC1:1, DC2:1).

Can we find out which node in DC2 is a replica?



From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: 08 May 2013 21:08
To: user@cassandra.apache.org
Subject: Re: Replica info

http://www.datastax.com/docs/1.1/references/nodetool#nodetool-getendpoints
This tells you where a key lives. (you need to hex encode the key)
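As a sketch of the hex-encoding step (the keyspace, column family, and key below are invented; the nodetool call itself needs a live node, so it is left commented out):

```shell
# Hex-encode the row key "user123"; getendpoints in this era expects
# the key as hex. od is POSIX; `xxd -p` does the same where available.
KEY_HEX=$(printf '%s' "user123" | od -An -tx1 | tr -d ' \n')
echo "$KEY_HEX"   # 75736572313233
# Then, against a live node (hypothetical keyspace/CF names):
# nodetool -h 127.0.0.1 getendpoints MyKeyspace MyCF "$KEY_HEX"
```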

On Wed, May 8, 2013 at 5:14 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote:
nodetool describering {keyspace}


From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, May 8, 2013 3:00 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Replica info

Is there a way in Cassandra that we can know which node has the replica for the 
data? If we have 4 nodes and RF = 2, is there a way we can find which 2 nodes 
have the same data?

Thanks,
Kanwar



Re: Replica info

2013-05-09 Thread Michael Morris
Not directly, but you should be able to use the output of the getendpoints
operation, and of nodetool ring to find the IP address that matches the DC
you are looking for.
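To illustrate the matching step with invented output (on a real cluster, substitute the actual `nodetool getendpoints` and `nodetool ring` output):

```shell
# Invented sample data: replica IPs from getendpoints, and ring rows
# (address, DC, rack, status, state) as `nodetool ring` would print.
ENDPOINTS="10.0.1.5 10.0.2.7"
RING="10.0.1.5  DC1  rack1  Up  Normal
10.0.2.7  DC2  rack1  Up  Normal"
# For each replica IP, pull the matching ring row to read off its DC.
for ip in $ENDPOINTS; do
  echo "$RING" | awk -v ip="$ip" '$1 == ip { print ip " is in " $2 }'
done
# Prints: 10.0.1.5 is in DC1 / 10.0.2.7 is in DC2
```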

Thanks,

Mike


On Thu, May 9, 2013 at 11:08 AM, Kanwar Sangha  wrote:

>  Thanks ! Is there also a way to find out the replica nodes ?
>
>
> Say we have 2 DCs, DC1 and DC2 with RF=2 (DC1:1, DC2:1)
>
>
> Can we find out which node in DC2 is a replica ?
>
>
>
> From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
> Sent: 08 May 2013 21:08
> To: user@cassandra.apache.org
> Subject: Re: Replica info
>
>
> http://www.datastax.com/docs/1.1/references/nodetool#nodetool-getendpoints
> 
>
> This tells you where a key lives. (you need to hex encode the key)
>
>
> On Wed, May 8, 2013 at 5:14 PM, Hiller, Dean  wrote:
> 
>
> nodetool describering {keyspace}
>
>
> From: Kanwar Sangha <kan...@mavenir.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Wednesday, May 8, 2013 3:00 PM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Replica info
>
> Is there a way in Cassandra that we can know which node has the replica
> for the data ? if we have 4 nodes and RF = 2, is there a way we can find
> which 2 nodes have the same data ?
>
> Thanks,
> Kanwar
>
>


Re: Sudden increase in diskspace usage

2013-05-09 Thread Alexis Rodríguez
Nicolai,

Perhaps you can check the system.log to see if there are any errors on
compaction. Also, I believe C* 1.2.0 is not a stable version.




On Thu, May 9, 2013 at 2:43 AM, Nicolai Gylling  wrote:

> Hi
>
> I have a 3-node SSD-based cluster, with around 1 TB data, RF:3, C*
> v.1.2.0, vnodes. One large CF, LCS. Everything was running smooth, until
> one of the nodes crashed and was restarted.
>
> At the time of normal operation there was 800 GB of free space on each node.
> After the crash, C* started using a lot more, resulting in an
> out-of-diskspace situation on 2 nodes, e.g. C* used up the 800 GB in just 2
> days, giving us very little time to do anything about it, since
> repairs/joins take a considerable amount of time.
>
> What can make C* suddenly use this amount of disk-space? We did see a lot
> of pending compactions on one node (7k).
>
> Any tips on recovering from an out-of-diskspace-on-multiple-nodes
> situation? I've tried moving some SStables away, but C* seems to use
> whatever space I free up in no time. I'm not sure if all of the nodes are
> fully updated, as 'nodetool status' reports 3 different loads:
>
> --  Address         Load     Tokens  Owns (effective)  Host ID                               Rack
> UN  10.146.145.26   1.4 TB   256     100.0%            1261717d-ddc1-457e-9c93-431b3d3b5c5b  rack1
> UN  10.148.149.141  1.03 TB  256     100.0%            f80bfa31-e19d-4346-9a14-86ae87f06356  rack1
> DN  10.146.146.4    1.11 TB  256     100.0%            85d4cd28-93f4-4b96-8140-3605302e90a9  rack1
>
>
> --
>
> Sincerely,
>
> *Nicolai Gylling*
>
>


Re: Sudden increase in diskspace usage

2013-05-09 Thread Robert Coli
On Wed, May 8, 2013 at 10:43 PM, Nicolai Gylling  wrote:
> At the time of normal operation there was 800 gb free space on each node.
> After the crash, C* started using a lot more, resulting in an
> out-of-diskspace situation on 2 nodes, eg. C* used up the 800 gb in just 2
> days, giving us very little time to do anything about it, since
> repairs/joins takes a considerable amount of time.

Did someone do a repair? Repair very frequently results in (usually
temporary) >2x disk consumption.

> What can make C* suddenly use this amount of disk-space? We did see a lot of
> pending compactions on one node (7k).

Mostly repair.

> Any tips on recovering from an out-of-diskspace on multiple nodes,
> situation? I've tried moving some SStables away, but C* seems to use
> whatever space I free up in no time. I'm not sure if any of the nodes is
> fully updated as 'nodetool status' reports 3 different loads

A relevant note here is that moving sstables out of the full partition
while cassandra is running will not result in any space recovery,
because Cassandra still has an open filehandle to that sstable. In
order to deal with an out-of-disk-space condition you need to stop
Cassandra. Unfortunately the JVM stops responding to clean shutdown
requests when the disk is full, so you will have to kill -KILL the
process.
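A small demonstration of what the forced kill looks like, using a background sleep as a stand-in for the wedged JVM (the pgrep pattern for a real Cassandra process is an assumption):

```shell
# Stand-in for a wedged JVM: a background sleep we kill with SIGKILL.
sleep 300 &
PID=$!
kill -KILL "$PID"
wait "$PID" 2>/dev/null
echo "exit status: $?"   # 137 = 128 + SIGKILL(9)
# On a real node you would find the pid first, e.g.:
# pgrep -f CassandraDaemon
```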

If you have a lot of overwrites/fragmentation, you could attempt to
clear enough space to do a major compaction of the remaining data, do
that major compaction, split your One Huge sstable with the
(experimental) sstablesplit tool, and then copy the temporarily moved
sstables back onto the node. You could also attempt to use user-defined
compaction (via the JMX endpoint) to strategically compact such data.
If you grep for compaction in your logs, do you see compactions
resulting in smaller output file sizes? ("compacted to X% of original"
messages)
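For that log check, a sketch against an invented log line (real 1.2-era compaction log lines differ in detail; the "% of original" phrase is what you are grepping for):

```shell
# Invented sample compaction log line:
LOG='INFO CompactionTask Compacted to big-ib-10-Data.db. 52,428,800 to 12,582,912 (~24% of original)'
# Pull out the reduction figure; a small percentage means the data had
# heavy overwrites/fragmentation worth compacting away.
echo "$LOG" | grep -o '[0-9]*% of original'   # 24% of original
```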

I agree with Alexis Rodriguez that Cassandra 1.2.0 is not a version
anyone should run, it contains significant bugs.

=Rob


Re: Cannot resolve schema disagreement

2013-05-09 Thread Robert Coli
On Wed, May 8, 2013 at 5:40 PM, srmore  wrote:
> After running the commands, I get back to the same issue. Cannot afford to
> lose the data so I guess this is the only option for me. And unfortunately I
> am using 1.0.12 ( cannot upgrade as of now ). Any, ideas on what might be
> happening or any pointers will be greatly appreciated.

If you can afford downtime on the cluster, the solution to this
problem with the highest chance of success is :

1) dump the existing schema from a good node
2) nodetool drain on all nodes
3) stop cluster
4) move schema and migration CF tables out of the way on all nodes
5) start cluster
6) re-load schema, being careful to explicitly check for schema
agreement on all nodes between schema modifying statements

In many/most cases of schema disagreement, people try the FAQ approach
and it doesn't work and they end up being forced to do the above
anyway. In general if you can tolerate the downtime, you should save
yourself the effort and just do the above process.
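Step 4 is the unusual one; here it is sketched against a scratch directory standing in for the system keyspace's data directory (the file names and layout are assumptions for a 1.0.x install, and on a real node you would only do this with Cassandra stopped):

```shell
# Scratch stand-ins for /var/lib/cassandra/data/system and a safe
# holding area (real paths depend on your install):
SYSDIR=$(mktemp -d)
ASIDE=$(mktemp -d)
touch "$SYSDIR/Schema-hc-1-Data.db" "$SYSDIR/Migrations-hc-1-Data.db"
# 4) move the schema and migration CF sstables out of the way:
mv "$SYSDIR"/Schema* "$SYSDIR"/Migrations* "$ASIDE/"
ls "$ASIDE"
```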

=Rob


Cassandra Experts

2013-05-09 Thread Liz Lee
Hello,

My name is Liz and I just subscribed to the mailing list for Cassandra.  I work 
for a consulting company by the name of Red Oak Technologies, and we have a 
world-class client who is in need of Cassandra professional services expertise. 
 If anyone has any tips or leads for me, I surely would be grateful! :)

Liz

Liz Lee
Red Oak Technologies
liz@redoaktech.com


Re: Cassandra Experts

2013-05-09 Thread Steven Siebert
Hi Liz,

Are you looking for a reference to professional cassandra
services/support...or looking to learn cassandra to provide said support?
 If the former, I highly recommend DataStax (
http://www.datastax.com/what-we-offer/products-services/consulting).  I'm a
non-affiliated future customer (adoption delay is on our side), and have
thus far received great support from their sales and technical teams - they
have spent a lot of time out of hide to capture my needs and answer my
questions.

...not to mention their direct ties to apache cassandra (
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra),
they obviously have the technical capabilities current and future.

From one cassandra user -- if you're looking for paid support, that's where
I would go.

Regards,

Steve

On Thu, May 9, 2013 at 4:25 PM, Liz Lee  wrote:

> Hello,
>
> My name is Liz and I just subscribed to the mailing list for Cassandra.  I
> work for a consulting company by the name of Red Oak Technologies, and we
> have a world-class client who is in need of Cassandra professional services
> expertise.  If anyone has any tips or leads for me, I surely would be
> grateful! :)
>
> Liz
>
> Liz Lee
> Red Oak Technologies
> liz@redoaktech.com
>


Re: Cannot resolve schema disagreement

2013-05-09 Thread srmore
Thanks Rob !

Tried the steps; that did not work. However, I was able to resolve the
problem by syncing the clocks. The thing that confuses me is that the FAQ
says "Before 0.7.6, this can also be caused by cluster system clocks being
substantially out of sync with each other", yet the version I am using was
1.0.12.

This raises an important question: where does Cassandra get its time
information from? And is it required (I know it is highly advisable) to
keep clocks in sync? Any suggestions/best practices on how to keep the
clocks in sync?



/srm


On Thu, May 9, 2013 at 1:58 PM, Robert Coli  wrote:

> On Wed, May 8, 2013 at 5:40 PM, srmore  wrote:
> > After running the commands, I get back to the same issue. Cannot afford
> to
> > lose the data so I guess this is the only option for me. And
> unfortunately I
> > am using 1.0.12 ( cannot upgrade as of now ). Any, ideas on what might be
> > happening or any pointers will be greatly appreciated.
>
> If you can afford downtime on the cluster, the solution to this
> problem with the highest chance of success is :
>
> 1) dump the existing schema from a good node
> 2) nodetool drain on all nodes
> 3) stop cluster
> 4) move schema and migration CF tables out of the way on all nodes
> 5) start cluster
> 6) re-load schema, being careful to explicitly check for schema
> agreement on all nodes between schema modifying statements
>
> In many/most cases of schema disagreement, people try the FAQ approach
> and it doesn't work and they end up being forced to do the above
> anyway. In general if you can tolerate the downtime, you should save
> yourself the effort and just do the above process.
>
> =Rob
>


RE: Cassandra Experts

2013-05-09 Thread Viktor Jevdokimov
A consulting company is a body shop looking for job candidates for their 
clients - in short, recruiters - so they are not interested in support or 
learning, just in selling bodies with some brains.


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider
Take a ride with Adform's Rich Media Suite



Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: Steven Siebert [mailto:smsi...@gmail.com]
Sent: Friday, May 10, 2013 00:02
To: user@cassandra.apache.org
Subject: Re: Cassandra Experts

Hi Liz,

Are you looking for a reference to professional cassandra services/support...or 
looking to learn cassandra to provide said support?  If the former, I highly 
recommend DataStax 
(http://www.datastax.com/what-we-offer/products-services/consulting).  I'm a 
non-affiliated future customer (adoption delay is on our side), and have thus 
far received great support from their sales and technical teams - they have 
spent a lot of time out of hide to capture my needs and answer my questions.

...not to mention their direct ties to apache cassandra 
(http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra),
 they obviously have the technical capabilities current and future.

From one cassandra user -- if you're looking for paid support, that's where I 
would go.

Regards,

Steve
On Thu, May 9, 2013 at 4:25 PM, Liz Lee <liz@redoaktech.com> wrote:
Hello,

My name is Liz and I just subscribed to the mailing list for Cassandra.  I work 
for a consulting company by the name of Red Oak Technologies, and we have a 
world-class client who is in need of Cassandra professional services expertise. 
 If anyone has any tips or leads for me, I surely would be grateful! :)

Liz

Liz Lee
Red Oak Technologies
liz@redoaktech.com


Re: how to get column family details dynamically in cassandra bulk load program

2013-05-09 Thread aaron morton
The schema is available over the various interfaces; check with the client you 
are using to see if it exposes the information. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/05/2013, at 1:36 AM, chandana.tumm...@wipro.com wrote:

> Dear All,
> 
> I am using cassandra bulkload program from  
> www.datastax.com/dev/blog/bulk-loading
> In This for CSV entry we are giving column name and validation class .
> Is there any way to get the column names and validation class directly from 
> database by giving 
> just keyspace and column family name ,like using JDBC
> metadata we can get  details of the table dynamically. 
> Please can you let me know if there is any way to get.
> 
> Thanks & Regards,
> Chandana Tummala.
> 
> Please do not print this email unless it is absolutely necessary.
> 
> The information contained in this electronic message and any attachments to 
> this message are intended for the exclusive use of the addressee(s) and may 
> contain proprietary, confidential or privileged information. If you are not 
> the intended recipient, you should not disseminate, distribute or copy this 
> e-mail. Please notify the sender immediately and destroy all copies of this 
> message and any attachments.
> 
> WARNING: Computer viruses can be transmitted via email. The recipient should 
> check this email and any attachments for the presence of viruses. The company 
> accepts no liability for any damage caused by any virus transmitted by this 
> email.
> 
> www.wipro.com
> 



Re: hector or astyanax

2013-05-09 Thread aaron morton
Yup, thats the one. 

A

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/05/2013, at 3:40 AM, Blair Zajac  wrote:

> On 05/07/2013 01:37 AM, aaron morton wrote:
>>> i want to know which cassandra client is better?
>> Go with Astyanax or Native Binary, they are both under active development
>> and supported by a vendor / large implementor.
> 
> Native Binary being which one specifically?  Do you mean the new DataStax 
> java-driver? [1]
> 
> Regards,
> Blair
> 
> [1] https://github.com/datastax/java-driver



Re: mutation stalls and FileNotFoundException

2013-05-09 Thread aaron morton
When mutation stage messages are dropped I start by looking for GC problems in 
the cassandra logs.

You also have some blocked flush writer tasks, which may be due to a large 
number of CF's, a large number of secondary indexes, slow disk IO or excessive 
use of snapshot / flush. Check the comments for the memtable_flush_queue_size 
in the yaml file. 
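The relevant knobs in cassandra.yaml look roughly like this (values shown are my recollection of the 1.2-era defaults; treat them as assumptions and check the comments in your own file):

```yaml
# A larger queue absorbs flush bursts from many CFs / secondary indexes;
# more writers can help when flushes are disk-bound across data dirs.
memtable_flush_queue_size: 4
memtable_flush_writers: 1
```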

Hope that helps. 
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/05/2013, at 4:41 AM, Keith Wright  wrote:

> I am running 1.2.4 with Vnodes and have been writing at low volume.  I have 
> doubled the volume and suddenly 3 of my 6 nodes are showing much higher load 
> than the others (30 vs 3) and tpstats show the mutation stage as completely 
> full (see below).  I did find a FileNotFoundException that I pasted below 
> which appears to be caused by creating, dropping, and creating a keyspace 
> (something I did but 4 or 5 days ago).  Anyone have any idea what's going on 
> here?
> 
> Thanks
> 
> Keiths-MacBook-Pro:bin keith$ ./nodetool tpstats -h lxpcas005.nanigans.com
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> ReadStage 0 0 130990 0
>  0
> RequestResponseStage  0 0 344216 0
>  0
> MutationStage   128   4523464036 0
>  0
> ReadRepairStage   0 0  14131 0
>  0
> ReplicateOnWriteStage 0 0  32872 0
>  0
> GossipStage   1   611   6351 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> MigrationStage0 0  9 0
>  0
> MemtablePostFlusher   0 0 91 0
>  0
> FlushWriter   0 0 60 0
> 27
> MiscStage 0 0  0 0
>  0
> commitlog_archiver0 0  0 0
>  0
> InternalResponseStage 0 0  3 0
>  0
> HintedHandoff 1 1 13 0
>  0
> 
> Message type   Dropped
> RANGE_SLICE  0
> READ_REPAIR 54
> BINARY   0
> READ 0
> MUTATION  8539
> _TRACE   0
> REQUEST_RESPONSE 0
> 
> 
> 
> ERROR [ReplicateOnWriteStage:95404] 2013-05-06 14:55:06,555 
> CassandraDaemon.java (line 174) Exception in thread 
> Thread[ReplicateOnWriteStage:95404,5,main]
> java.lang.RuntimeException: java.lang.RuntimeException: 
> java.io.FileNotFoundException: 
> /data/1/cassandra/data/users/global_user_stats/users-global_user_stats-ib-30716-Data.db
>  (No such file or directory)
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1582)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /data/1/cassandra/data/users/global_user_stats/users-global_user_stats-ib-30716-Data.db
>  (No such file or directory)
> at 
> org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:46)
> at 
> org.apache.cassandra.io.util.CompressedSegmentedFile.createReader(CompressedSegmentedFile.java:57)
> at 
> org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:41)
> at 
> org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:976)
> at 
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.createFileDataInput(SSTableNamesIterator.java:98)
> at 
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:117)
> at 
> org.apache.cassandra.db.columniterator.SSTableNamesIterator.<init>(SSTableNamesIterator.java:64)
> at 
> org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:81)
> at 
> org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:68)
> at 
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:274)
> at 
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
> at 
> org.apache.cassandra.db.ColumnFamilyStore

Re: Cassanrda 1.1.11 compression: how to tell if it works ?

2013-05-09 Thread aaron morton
> At what point does compression start ? 
It starts for new SSTables created after the schema was altered. 

> How can I confirm it is working ?
Compressed SSTables include a -CompressionInfo.db component on disk. 
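A hedged version of that check, shown against an invented directory listing (on a node you would `ls` the CF's data directory; the exact path layout varies by version):

```shell
# Invented listing of a CF data directory after the ALTER + new flushes:
LISTING='users-wide-ib-1-Data.db
users-wide-ib-1-Index.db
users-wide-ib-1-CompressionInfo.db'
# Count compressed sstables: each one carries a -CompressionInfo.db.
echo "$LISTING" | grep -c 'CompressionInfo.db$'   # 1
```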

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/05/2013, at 6:35 AM, Oleg Dulin  wrote:

> I have a column family with really wide rows set to use Snappy like this:
> 
> compression_options = {'sstable_compression' : 
> 'org.apache.cassandra.io.compress.SnappyCompressor'}  
> 
> My understanding is that if a file is compressed I should not be able to use 
> "strings" command to view its contents. But it seems like I can view the 
> contents like this:
> 
> strings *-Data.db 
> 
> At what point does compression start ? How can I confirm it is working ?
> 
> 
> -- 
> Regards,
> Oleg Dulin
> NYC Java Big Data Engineer
> http://www.olegdulin.com/



Re: backup strategy

2013-05-09 Thread aaron morton
Assuming you are using the SimpleStrategy, or the NetworkTopologyStrategy with 
one rack per DC: if you backed up every 2nd node you would get one copy *IF* 
all nodes were consistent on disk. That can be a reasonably large "if" that 
you need to monitor.

It's easier to back up all the nodes it will also make it easier to restore the 
cluster.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/05/2013, at 8:54 AM, Kanwar Sangha  wrote:

> Hi – If we have RF=2 in a 4-node cluster, how do we ensure that the backup 
> taken is only for 1 copy of the data? In other words, is it possible for us 
> to take a backup from only 2 nodes and not all 4 and still have at least 1 
> copy of the data?
>  
> Thanks,
> Kanwar
>  
>  
>  



Re: how to monitor nodetool cleanup?

2013-05-09 Thread aaron morton
nodetool setcompactionthroughput controls the speed of compaction, and cleanup 
runs in the compaction manager. 
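`nodetool compactionstats` reports completed vs total bytes per task; the same progress percentage worked out by hand, with invented figures:

```shell
# Invented figures from a compactionstats row: bytes done vs total.
COMPLETED=1073741824   # 1 GiB compacted so far
TOTAL=4294967296       # 4 GiB to do
echo "$(( COMPLETED * 100 / TOTAL ))% complete"   # 25% complete
# To speed it up on a live node (MB/s; 0 = unthrottled):
# nodetool setcompactionthroughput 0
```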

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/05/2013, at 8:59 AM, Michael Morris  wrote:

> Not sure about making things go faster, but you should be able to monitor it 
> with nodetool compactionstats.
> 
> Thanks,
> 
> Mike
> 
> 
> On Tue, May 7, 2013 at 12:43 PM, Brian Tarbox  
> wrote:
> I'm recovering from a significant failure and so am doing lots of nodetool 
> move, removetoken, repair and cleanup.
> 
> For most of these I can do "nodetool netstats" to monitor progress but it 
> doesn't show anything for cleanup...how can I monitor the progress of 
> cleanup?  On a related note: I'm able to stop all client access to the 
> cluster until things are happy again...is there anything I can do to make 
> move/repair/cleanup go faster?
> 
> FWIW my problems came from trying to move nodes between EC2 availability 
> zones...which led to
> 1) killing a node and recreating it in another availability zone
> 2) new node had different local ip address so cluster thought old node was 
> just down and we had a new node...
> 
> I did the removetoken on the dead node and gave the new node oldToken-1...but 
> things still got weird and I ended up spending a couple of days cleaning up 
> (which seems odd for only about 300 gig total data).
> 
> Anyway, any suggestions for monitoring / speeding up cleanup would be 
> appreciated.
> 
> Brian Tarbox
> 
> 



Re: Backup and restore between different node-sized clusters

2013-05-09 Thread aaron morton
Special case: if you have 3 nodes and RF=3, you can copy the files to each node 
and use nodetool refresh. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/05/2013, at 11:20 AM, Jonathan Ellis  wrote:

> You want to use sstableloader when the cluster sizes are different; it
> will stream things to the right places in the new one.
> 
> On Wed, May 8, 2013 at 6:03 PM, Ron Siemens  wrote:
>> 
>> I have a 3-node cluster in production and a single-node development cluster. 
>>  I tested snapshotting a column family from the 3-node production cluster, 
>> grouping the files together, and restoring onto my single node development 
>> system.  That worked fine.  Can I go the other direction?  It's not easy for 
>> me to test in that direction: I'll get the chance at some point but would 
>> like to hear if you've done this.
>> 
>> If I just put the snapshot from the single node cluster on one of the nodes 
>> from the 3-node cluster, and do a JMX loadNewSSTables on that node, will the 
>> data load correctly into the 3-nodes?  Or is something more complex involved?
>> 
>> FYI, I'm following the instructions below, but only doing per column family 
>> backup and restore.
>> 
>> http://www.datastax.com/docs/1.2/operations/backup_restore
>> 
>> Thanks,
>> Ron
>> 
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced



Re: HintedHandoff

2013-05-09 Thread aaron morton
> - If node 'X' in DC1, which is a 'replica' node, is down and a write 
> comes with CL=1 to DC1, the co-ordinator node will write the hint and also 
> the data will be written to the other 'replica' node in DC2? Is this correct?
Writes always go to all UP replicas. So yes. 

> - If yes, then when we try to do a 'read' of this data with CL = 
> 'local_quorum' from DC1, it will fail (since the data was written as a hint) 
> and we will need to read it from the other DC?
It depends on your definition of fail. 
If node X is still down a read at LOCAL_QUORUM will fail to start and raise an 
UnavailableException. 
If node X is up, but the hints have not been delivered, the read will proceed 
and return the value before the write in step 1 above (including no value)

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/05/2013, at 4:38 PM, Kanwar Sangha  wrote:

> Is this correct guys ?
>  
> From: Kanwar Sangha [mailto:kan...@mavenir.com] 
> Sent: 07 May 2013 14:07
> To: user@cassandra.apache.org
> Subject: HintedHandoff
>  
> Hi -I had a question on  hinted-handoff.  We have 2 DCs configured with 
> overall RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total – 8 nodes across 
> 2  DCs)
>  
> Now we do a write with CL = ONE and Hinted Handoff enabled.
>  
> - If node 'X' in DC1, which is a 'replica' node, is down and a write 
> comes with CL=1 to DC1, the co-ordinator node will write the hint and also 
> the data will be written to the other 'replica' node in DC2? Is this correct?
> - If yes, then when we try to do a 'read' of this data with CL = 
> 'local_quorum' from DC1, it will fail (since the data was written as a hint) 
> and we will need to read it from the other DC?
>  
> Thanks,
> Kanwar



Re: Cannot resolve schema disagreement

2013-05-09 Thread aaron morton
> This raises an important question, where does Cassandra get the time 
> information from ? 
http://docs.oracle.com/javase/6/docs/api/java/lang/System.html
normally milliseconds (System.currentTimeMillis()); not sure if 1.0.12 may use 
nanoTime(), which is less reliable on some VMs. 

> and is it required (I know it is highly highly advisable to) to keep clocks 
> in sync, any suggestions/best practices on how to keep the clocks in sync  ? 
http://en.wikipedia.org/wiki/Network_Time_Protocol
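Column timestamps are epoch-based microseconds taken from whichever node (or client) coordinates the write, which is why skewed clocks can reorder updates. A rough shell equivalent of such a timestamp (GNU date assumed for the %N specifier):

```shell
# Microseconds since the epoch, as a write timestamp would look:
date +%s%6N
```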

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/05/2013, at 9:16 AM, srmore  wrote:

> Thanks Rob !
> 
> Tried the steps, that did not work, however I was able to resolve the problem 
> by syncing the clocks. The thing that confuses me is that, the FAQ says 
> "Before 0.7.6, this can also be caused by cluster system clocks being 
> substantially out of sync with each other". The version I am using was 1.0.12.
> 
> This raises an important question, where does Cassandra get the time 
> information from ? and is it required (I know it is highly highly advisable 
> to) to keep clocks in sync, any suggestions/best practices on how to keep the 
> clocks in sync  ? 
> 
> 
> 
> /srm
> 
> 
> On Thu, May 9, 2013 at 1:58 PM, Robert Coli  wrote:
> On Wed, May 8, 2013 at 5:40 PM, srmore  wrote:
> > After running the commands, I get back to the same issue. Cannot afford to
> > lose the data so I guess this is the only option for me. And unfortunately I
> > am using 1.0.12 ( cannot upgrade as of now ). Any, ideas on what might be
> > happening or any pointers will be greatly appreciated.
> 
> If you can afford downtime on the cluster, the solution to this
> problem with the highest chance of success is :
> 
> 1) dump the existing schema from a good node
> 2) nodetool drain on all nodes
> 3) stop cluster
> 4) move schema and migration CF tables out of the way on all nodes
> 5) start cluster
> 6) re-load schema, being careful to explicitly check for schema
> agreement on all nodes between schema modifying statements
> 
> In many/most cases of schema disagreement, people try the FAQ approach
> and it doesn't work and they end up being forced to do the above
> anyway. In general if you can tolerate the downtime, you should save
> yourself the effort and just do the above process.
> 
> =Rob
> 



Re: Cassandra Experts

2013-05-09 Thread Steven Siebert
Good point, Viktor...I didn't bother checking whether they were a consulting
firm or a recruiter...which, you are correct, they appear to be head hunters.
My mistake was thinking the OP was involved in a more direct consulting
situation and discovered they needed some outside expertise -- I guess
my skepticism filter is lower for a message on an OSS mailing list =)

Well played, OP

S

On Thu, May 9, 2013 at 5:34 PM, Viktor Jevdokimov <viktor.jevdoki...@adform.com> wrote:

>  Consulting company is a body shop that looking for a job candidates for
> their clients, shortly – recruiters, so not interested in support or
> learning, just selling bodies with some brains.
>
>
>
> From: Steven Siebert [mailto:smsi...@gmail.com]
> Sent: Friday, May 10, 2013 00:02
> To: user@cassandra.apache.org
> Subject: Re: Cassandra Experts
>
>
> Hi Liz,
>
>
> Are you looking for a reference to professional cassandra
> services/support...or looking to learn cassandra to provide said support?
>  If the former, I highly recommend DataStax (
> http://www.datastax.com/what-we-offer/products-services/consulting).  I'm
> a non-affiliated future customer (adoption delay is on our side), and have
> thus far received great support from their sales and technical teams - they
> have spent a lot of time out of hide to capture my needs and answer my
> questions.
>
>
> ...not to mention their direct ties to apache cassandra (
> http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra),
> they obviously have the technical capabilities current and future.
>
>
> From one cassandra user -- if you're looking for paid support, that's
> where I would go.
>
>
> Regards,
>
>
> Steve
>
> On Thu, May 9, 2013 at 4:25 PM, Liz Lee  wrote:
>
> Hello,
>
> My name is Liz and I just subscribed to the mailing list for Cassandra.  I
> work for a consulting company by the name of Red Oak Technologies, and we
> have a world-class client who is in need of Cassandra professional services
> expertise.  If anyone has any tips or leads for me, I surely would be
> grateful! :)
>
> Liz
>
> Liz Lee
> Red Oak Technologies
> liz@redoaktech.com
>
>

Re: Cassandra Experts

2013-05-09 Thread Liz Lee-Red Oak
Obviously my inquiry was misdirected, Steven. You will have to pardon me; I was 
merely searching for help. I do appreciate you pointing me in the right 
direction (DataStax), and will be referring them to our customer. Thank you. 

On May 9, 2013, at 4:25 PM, Steven Siebert  wrote:

> Good point, Viktor...I didn't bother checking whether they were a consulting 
> firm or a recruiting one...which, you are correct, they appear to be head 
> hunters.  My mistake in thinking the OP was involved in a more direct 
> consulting situation and discovered they needed some outside expertise -- I 
> guess my skepticism filter is lower for a message on an OSS mailing list =)
> 
> Well played, OP
> 
> S
> 
> On Thu, May 9, 2013 at 5:34 PM, Viktor Jevdokimov 
>  wrote:
>> A consulting company is a body shop looking for job candidates for their 
>> clients, in short, recruiters, so they are not interested in support or 
>> learning, just in selling bodies with some brains.
>> 
>> 
>> Best regards / Pagarbiai
>> Viktor Jevdokimov
>> Senior Developer
>> 
>> Email: viktor.jevdoki...@adform.com
>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>> Follow us on Twitter: @adforminsider
>> Take a ride with Adform's Rich Media Suite
>> 
>> 
> 


Compaction in Cassandra

2013-05-09 Thread Techy Teck
How can I figure out from DataStax OpsCenter whether a compaction is
finished?
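As a complement to the OpsCenter view, `nodetool compactionstats` reports pending compaction tasks from the command line. Below is a minimal sketch of checking that output from Python; the sample text is illustrative of the classic format, not captured from a real cluster, so adjust the parsing for your Cassandra version:

```python
def compactions_done(compactionstats_output: str) -> bool:
    """Return True if `nodetool compactionstats` output reports no pending tasks.

    Assumes the output contains a line like 'pending tasks: 0'; the exact
    format varies by Cassandra version.
    """
    for line in compactionstats_output.splitlines():
        line = line.strip().lower()
        if line.startswith("pending tasks:"):
            return int(line.split(":", 1)[1].strip()) == 0
    return False  # no pending-tasks line found; treat as not done


# Illustrative sample outputs (hypothetical, not from a live node):
busy = "pending tasks: 3\ncompaction type  keyspace  column family  ..."
idle = "pending tasks: 0"
```

Polling this until it reports zero pending tasks is a rough proxy for "compaction is done" on a single node; OpsCenter aggregates the same metric across the cluster.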


pycassa failures in large batch cycling

2013-05-09 Thread John R. Frank

C* users,

We have a process that loads a large batch of rows from Cassandra into 
many separate compute workers.  The rows are one column wide and range in 
size from a couple KB to ~100 MB.  After manipulating the data for a while, 
each compute worker writes the data back with *new* row keys (UUIDs) 
computed by the workers.


After the full batch is written back to new rows, a cleanup worker deletes 
the old rows.


After several cycles, pycassa starts getting connection failures.

Should we use a pycassa listener to catch these failures and just recreate 
the ConnectionPool and keep going as if the connection had not dropped? 
Or is there a better approach?
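One hedged sketch of the recreate-and-retry approach: wrap each batch operation, and on a pool failure rebuild the pool and retry a bounded number of times with backoff. `PoolFailure` below is a stand-in exception so the sketch is self-contained; with real pycassa you would catch its pool exceptions (e.g. `AllServersUnavailable`, depending on your version) and `make_pool` would return a fresh `ConnectionPool`:

```python
import time


class PoolFailure(Exception):
    """Stand-in for pycassa's pool exceptions (e.g. AllServersUnavailable)."""


def run_with_pool_retry(make_pool, operation, retries=3, backoff_s=0.1):
    """Run operation(pool); on PoolFailure, recreate the pool and retry.

    make_pool: zero-arg factory, e.g. lambda: ConnectionPool('ks', servers)
    operation: callable taking the pool and doing one batch of work.
    """
    pool = make_pool()
    for attempt in range(retries + 1):
        try:
            return operation(pool)
        except PoolFailure:
            if attempt == retries:
                raise  # out of retries; surface the failure to the caller
            time.sleep(backoff_s * (2 ** attempt))  # simple exponential backoff
            pool = make_pool()  # drop the wedged pool and start fresh
```

Whether this is better than a pycassa pool listener depends on your workload; the listener can tell you a connection failed, but recreating the pool at the batch boundary keeps the retry logic in one place.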


These failures happen on just a simple single-node setup with a total data 
set less than half the size of Java heap space, e.g. 2GB data (times two 
for the two copies during cycling) versus 8GB heap.  We tried reducing 
memtable_flush_queue_size to 2 so that it would flush the deletes faster, 
and also tried multithreaded_compaction=true, but still pycassa gets 
connection failures.


Is this expected behavior for shedding load?  Or is this unexpected?

Would things be any different if we used multiple nodes and scaled the 
data and worker count to match?  I mean, is there something inherent to 
cassandra's operating model that makes it want to always have multiple 
nodes?


Thanks for pointers,
John


Re: Cannot resolve schema disagreement

2013-05-09 Thread srmore
Thought so.

Thanks Aaron !



On Thu, May 9, 2013 at 6:09 PM, aaron morton wrote:

> This raises an important question, where does Cassandra get the time
> information from ?
>
> http://docs.oracle.com/javase/6/docs/api/java/lang/System.html
> normally milliseconds; not sure whether 1.0.12 may use nanoTime(), which is
> less reliable on some VMs.
>
> and is it required (I know it is highly advisable) to keep
> clocks in sync, any suggestions/best practices on how to keep the clocks in
> sync  ?
>
> http://en.wikipedia.org/wiki/Network_Time_Protocol
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10/05/2013, at 9:16 AM, srmore  wrote:
>
> Thanks Rob !
>
> Tried the steps; that did not work. However, I was able to resolve the
> problem by syncing the clocks. The thing that confuses me is that the FAQ
> says "Before 0.7.6, this can also be caused by cluster system clocks being
> substantially out of sync with each other". The version I am using was
> 1.0.12.
>
> This raises an important question: where does Cassandra get the time
> information from? And is it required (I know it is highly advisable)
> to keep clocks in sync? Any suggestions/best practices on how to keep
> the clocks in sync?
>
>
>
> /srm
>
>
> On Thu, May 9, 2013 at 1:58 PM, Robert Coli  wrote:
>
>> On Wed, May 8, 2013 at 5:40 PM, srmore  wrote:
>> > After running the commands, I get back to the same issue. Cannot afford
>> to
>> > lose the data so I guess this is the only option for me. And
>> unfortunately I
>> > am using 1.0.12 ( cannot upgrade as of now ). Any, ideas on what might
>> be
>> > happening or any pointers will be greatly appreciated.
>>
>> If you can afford downtime on the cluster, the solution to this
>> problem with the highest chance of success is :
>>
>> 1) dump the existing schema from a good node
>> 2) nodetool drain on all nodes
>> 3) stop cluster
>> 4) move schema and migration CF tables out of the way on all nodes
>> 5) start cluster
>> 6) re-load schema, being careful to explicitly check for schema
>> agreement on all nodes between schema modifying statements
>>
>> In many/most cases of schema disagreement, people try the FAQ approach
>> and it doesn't work and they end up being forced to do the above
>> anyway. In general if you can tolerate the downtime, you should save
>> yourself the effort and just do the above process.
>>
>> =Rob
>>
>
>
>
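A footnote on the clock-sync question in the thread above: NTP is the fix, but a quick sanity check is to compare wall-clock samples gathered from each node at roughly the same instant and look at the worst pairwise skew. The sketch below is illustrative only; the sample dict stands in for timestamps you would collect yourself (e.g. over SSH), not a real Cassandra API:

```python
def max_clock_skew_ms(node_times_ms):
    """Given {node: wall-clock milliseconds} sampled at (roughly) the same
    instant, return (max_skew_ms, (earliest_node, latest_node)).

    In practice you would gather the samples over SSH or JMX; the dict is
    hypothetical input for illustration.
    """
    earliest = min(node_times_ms, key=node_times_ms.get)
    latest = max(node_times_ms, key=node_times_ms.get)
    return node_times_ms[latest] - node_times_ms[earliest], (earliest, latest)


# Hypothetical samples from two nodes, 250 ms apart:
samples = {"10.0.0.1": 1368100000000, "10.0.0.2": 1368100000250}
skew, (behind, ahead) = max_clock_skew_ms(samples)
```

Anything much beyond normal NTP jitter (a few ms on a LAN) is worth fixing in ntpd before chasing schema or timestamp oddities, since Cassandra resolves conflicting writes by client-supplied wall-clock timestamps.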