date:20130311

> ·There are some columns set with TTL as X. After X Cassandra will 
> mark them as tombstones. Is there still a probability of running into 
> DistributedDeletes issue ?  I understand that “distributeddeletes” is more 
> applicable to application deletes ?
TTL'd columns are first turned into Tombstones and then purged after gc_grace. 

>Nodetool repair will ask the neighbour node say node 2 to generate the 
> merkle tree. As I understand, currently the repair introduces  2 compactions. 
> Repairs currently require 2 major compactions: one to validate a column 
> family, and then another to send the disagreeing ranges. Will this be done 
> over the complete data set in the node 2 ? Or only for the range as per 
> Vnodes ?
It only does one compaction. A Validation is used to create the merkle tree. 
Streaming differences is not done via compaction and only reads certain 
portions of files.
It's done per token range.
 
> ·How does Cassandra do nodetool repair across Data centres ? Assume 
> RF=1 in DC1 and RF=1 in DC2 with total RF = 2 across the two DCs.
No difference to all in the same DC.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 7/03/2013, at 2:04 PM, Kanwar Sangha  wrote:

> Hi Guys – I have a question on Vnodes and nodetool repair.  If I have 
> configured the nodes as vnodes, say for example 2 nodes with Rf=2.
>  
> Questions –
>  
> ·There are some columns set with TTL as X. After X Cassandra will 
> mark them as tombstones. Is there still a probability of running into 
> DistributedDeletes issue ?  I understand that “distributeddeletes” is more 
> applicable to application deletes ?
> ·Nodetool repair will ask the neighbour node say node 2 to generate 
> the merkle tree. As I understand, currently the repair introduces  2 
> compactions. Repairs currently require 2 major compactions: one to validate a 
> column family, and then another to send the disagreeing ranges. Will this be 
> done over the complete data set in the node 2 ? Or only for the range as per 
> Vnodes ?
> ·How does Cassandra do nodetool repair across Data centres ? Assume 
> RF=1 in DC1 and RF=1 in DC2 with total RF = 2 across the two DCs.
>  
> Thanks,
> Kanwar
>

Re: Using WHERE IN with Wide Rows

What statement are you issuing ? 
What have you tried ? 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 5:49 AM, Adam Venturella  wrote:

> TL;DR:
> Is it possible to use WHERE IN on wide rows but only have it return the 1st 
> column of each of the rows in the IN()?
> 
> First, I am aware that WHERE IN (id1, id2, id3...N) is not the most 
> performant, and should not be used on large sets.
> 
> Assuming there is also little difference from just issuing N SELECTs from the 
> requesting application. I'm guessing Cassandra may try to perform some 
> optimization on it's end, parallelizing the requests to the nodes if 
> applicable? Otherwise probably, generally speaking, it's more or less the 
> same-ish as issuing multiple SELECTs.
> 
> That said, I need to extract some data, and WHERE IN() is looking like the 
> best way to do it given that I have the row keys and just need the data. 
> 
> I have a few thousand id's and figure the best way to grab that info is in 10 
> id blocks so as not to abuse WHERE IN: IN (1...10), IN(11...20). Now maybe 
> issuing 100's of WHERE IN's is itself being abusive; my ignorance shows 
> though. Regardless, I still need to get some data out =)
> 
> The next catch is the rows identified by the keys are wide rows (time 
> series). Assuming each row is a minimum of 100 columns wide issuing the WHERE 
> IN seems to pull back all of the columns for each row key specified, as 
> expected. 
> 
> So my question. Is it possible to use WHERE IN on wide rows but only have it 
> return the 1st column of each of the rows in the IN()?
> 
> I can also just issue SELECTs per row key as well, but I thought I would ask 
> to see if there was something I was missing using WHERE IN.

Re: Nodetool drain automatically shutting down node?

+1, we ran into the same issue.  From docs, nodetool drain shuts down ports 
which then prevent nodetool from working as it needs to talk to cassandra 
through the ports.

I asked this same question 2 weeks ago but didn't get an answer.  I am guessing 
but I think 1.2.2 seems to fix this a little better so it doesn't complete shut 
down(ie. I think in 1.2.2 you can still use nodetool after a drain though I 
need to retest this).

A workaround for this issue though is enable your iptables to isolate all nodes 
and allow now communication in or out on ports 9160 and 7199, do a drain, start 
cassandra back up while it is isolated and then snapshot the cluster.

OR

Allow 1.2.2 to read all the commit logs which is what we did (we did snapshot 
then drain and created hardlinks to the commit log files as well)

Take your pick.  I have a feeling they are trying to make this better in the 
newer releases so we can drain and then snapshot though I would have been fine 
with a snapshot and no need to drain I think as in our testing the drain did 
NOT drain.  It did move everything in the commit log to sstables but the commit 
log files where still all there and where the same size AND they all got 
replayed on startup again.  I highly suggest to QA all this yourself and us "du 
–sh /…/…/commitlog" and "du –sh /…/…/data_files" so you get a feel on what you 
should expect.

Dean

From: Andrew Bialecki 
mailto:andrew.biale...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Friday, March 8, 2013 8:36 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Nodetool drain automatically shutting down node?

Hey all,

We're getting ready to upgrade our cluster to 1.2.2 from 1.1.5 and we're 
testing the upgrade process on our dev cluster. We turned off all client access 
to the cluster and then ran "nodetool drain" on the first instance with the 
intention of running "nodetool snapshot" once it finished. However, after 
running the drain, didn't see any errors, but the Cassandra process was no 
longer running. Is that expected? From everything I've read it doesn't seem 
like it, but maybe I'm mistaken.

Here's the relevant portion of the log from that node (notice it says it's 
shutting down the server thread in there):

INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,288 
StorageService.java (line 774) DRAINING: starting drain process
 INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,288 
CassandraDaemon.java (line 218) Stop listening to thrift clients
 INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,315 
Gossiper.java (line 1133) Announcing shutdown
 INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:49,318 
MessagingService.java (line 534) Waiting for messaging service to quiesce
 INFO 
[ACCEPT-ip-10-116-111-143.ec2.internal/10.116.111.143] 
2013-03-09 03:26:49,319 MessagingService.java (line 690) MessagingService 
shutting down server thread.
 INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:49,338 
ColumnFamilyStore.java (line 659) Enqueuing flush of 
Memtable-Counter1@177255852(14810190/60139556 serialized/live bytes, 243550 ops)
 INFO [FlushWriter:7] 2013-03-09 03:26:49,338 Memtable.java (line 264) Writing 
Memtable-Counter1@177255852(14810190/60139556 serialized/live bytes, 243550 ops)
 INFO [FlushWriter:7] 2013-03-09 03:26:49,899 Memtable.java (line 305) 
Completed flushing 
/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-104-Data.db 
(15204741 bytes) for commitlog position ReplayPosition(segmentId=1362797442799, 
position=27621115)
 INFO [CompactionExecutor:11] 2013-03-09 03:26:49,900 CompactionTask.java (line 
109) Compacting 
[SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-102-Data.db'),
 
SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-103-Data.db'),
 
SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-104-Data.db'),
 
SSTableReader(path='/var/lib/cassandra/data/Keyspace1/Counter1/Keyspace1-Counter1-he-101-Data.db')]
 INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:50,193 
StorageService.java (line 774) DRAINED

Thanks in advanced for any help.

Cheers,
Andrew

Re: Time series data and deletion

> I'm trying to understand what will happen when we start deleting the old data.
Are you going to delete data or use the TTL?

> With size tiered compaction, suppose we have one 160Gb sstable and some 
> smaller tables totalling 40Gb.
Not sure on that, it depends on the work load. 

>  My understanding is that, even if we start deleting, we will have to wait 
> for 3 more 160Gb tables to appear, in order to have the first sstable 
> compacted and the disk space freed. 
v1.2 will run compactions on single SSTables that have a high number of 
tombstones 
https://issues.apache.org/jira/browse/CASSANDRA-3442
https://issues.apache.org/jira/browse/CASSANDRA-4234

>  So although we need to store 200Gb worth of data, we'll need something like 
> 800Gb disk space in order to be on the safe side, right?

You want to keep the disks below 75% capacity, and want to have free space to 
handle node moves etc.  
I do not think you need 800GB because of tombstones deletions. 

> What would happen instead with leveled compaction? 
Levelled compaction is more suited to workloads that have a high insert/delete 
ratio. In your case, write once read many data is will suited to Sized Tiered. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 9:13 AM, Flavio Baronti  wrote:

> Hello,
> 
> we are using Cassandra for storing time series data. We never update, only 
> append; we plan to store 1 year worth of data, occupying something around 
> 200Gb. I'm trying to understand what will happen when we start deleting the 
> old data.
> 
> With size tiered compaction, suppose we have one 160Gb sstable and some 
> smaller tables totalling 40Gb. My understanding is that, even if we start 
> deleting, we will have to wait for 3 more 160Gb tables to appear, in order to 
> have the first sstable compacted and the disk space freed. So although we 
> need to store 200Gb worth of data, we'll need something like 800Gb disk space 
> in order to be on the safe side, right?
> 
> What would happen instead with leveled compaction? And why is the default 
> sstable size so small (5Mb)? If we need to store 200Gb, this means we will 
> have 40k sstables; since each one makes 5 files, we'll have 200k files in a 
> single directory, which we'm afraid will undermine the stability of the file 
> system.
> 
> Thank you for your suggestions!
> 
> Flavio
>

Re: has anyone used dynamic snitch at all

Check that read_repair_chance on the CF's is 0.1, not the old 1.0

Wait at least 10 minutes for the DynamicSnitch to re-calculate. 

Use the org.apache.cassandra.db:type=DynamicEndpointSnitch MBean to see what 
scores it has given the nodes. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 11:40 AM, Edward Capriolo  wrote:

> dynamic_snitch=true is the default. So it is usually on "wrapping" other 
> snitches. I have found several scenarios where it does not work exactly as 
> your would expect.
> 
> On Fri, Mar 8, 2013 at 2:26 PM, Hiller, Dean  wrote:
> Our test setup
> 
> 4 nodes, RF=3, reads at CL=QUOROM and we tried CL=TWO
> Tell the network card to slow down every packet on node 2
> After fixing astyanax to not go to node 2 anymore, we are still seeing 
> cassandra have issues as it seems to be involving node 2 somehow.  If we take 
> node 2 down, it all speeds back up.
> 
> We are trying to get this working such that a slow node in cassandra does not 
> impact our customers.
> 
> We are in 1.2.2 and added the following properties….(our properties show 
> PropertyFileSnitch though I see the keyspace has 
> org.apache.cassandra.locator.SimpleStrategy set probably because it was 
> created through a tool instead of CLI…shucks)….anyways, I still expected 
> dynamic snitch to work….
> 
> # controls how often to perform the more expensive part of host score
> # calculation
> dynamic_snitch: true
> dynamic_snitch_update_interval_in_ms: 100
> # controls how often to reset all host scores, allowing a bad host to
> # possibly recover
> dynamic_snitch_reset_interval_in_ms: 60
> # if set greater than zero and read_repair_chance is < 1.0, this will allow
> # 'pinning' of replicas to hosts in order to increase cache capacity.
> # The badness threshold will control how much worse the pinned host has to be
> # before the dynamic snitch will prefer other replicas over it.  This is
> # expressed as a double which represents a percentage.  Thus, a value of
> # 0.2 means Cassandra would continue to prefer the static snitch values
> # until the pinned host was 20% worse than the fastest.
> dynamic_snitch_badness_threshold: 0.1
> 
> Any help appreciated,
> Thanks,
> Dean
>

Re: Nodetool drain automatically shutting down node?

Drain stops listening for connections from client and other nodes, and flushes 
all the data to disk. The purpose is to get everything into SSTables, so we do 
not want to process any more writes. 

The error is logged at DEBUG as it's not important, just means a thread (the 
processed gossip) was cancelled. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 7:58 PM, Andrew Bialecki  wrote:

> If it's helps, here's the log with debug log statements. Possibly issue with 
> that exception?
> 
> INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,402 
> StorageService.java (line 774) DRAINING: starting drain process
>  INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,403 
> CassandraDaemon.java (line 218) Stop listening to thrift clients
>  INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:32,404 
> Gossiper.java (line 1133) Announcing shutdown
> DEBUG [GossipTasks:1] 2013-03-09 03:54:33,328 
> DebuggableThreadPoolExecutor.java (line 190) Task cancelled
> java.util.concurrent.CancellationException
>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:220)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>   at 
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.extractThrowable(DebuggableThreadPoolExecutor.java:182)
>   at 
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logExceptionsAfterExecute(DebuggableThreadPoolExecutor.java:146)
>   at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableScheduledThreadPoolExecutor.java:50)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,406 
> StorageService.java (line 776) DRAINING: shutting down MessageService
>  INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,406 
> MessagingService.java (line 534) Waiting for messaging service to quiesce
>  INFO [ACCEPT-ip-10-116-111-143.ec2.internal/10.116.111.143] 2013-03-09 
> 03:54:33,407 MessagingService.java (line 690) MessagingService shutting down 
> server thread.
> DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,408 
> StorageService.java (line 776) DRAINING: waiting for streaming
> DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,408 
> StorageService.java (line 776) DRAINING: clearing mutation stage
> DEBUG [Thread-5] 2013-03-09 03:54:33,408 Gossiper.java (line 221) Reseting 
> version for /10.83.55.44
> DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,409 
> StorageService.java (line 776) DRAINING: flushing column families
> DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,409 
> ColumnFamilyStore.java (line 713) forceFlush requested but everything is 
> clean in Counter1
> DEBUG [Thread-6] 2013-03-09 03:54:33,410 Gossiper.java (line 221) Reseting 
> version for /10.80.187.124
> DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410 
> ColumnFamilyStore.java (line 713) forceFlush requested but everything is 
> clean in Super1
> DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410 
> ColumnFamilyStore.java (line 713) forceFlush requested but everything is 
> clean in SuperCounter1
> DEBUG [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,410 
> ColumnFamilyStore.java (line 713) forceFlush requested but everything is 
> clean in Standard1
>  INFO [RMI TCP Connection(2)-10.116.111.143] 2013-03-09 03:54:33,510 
> StorageService.java (line 774) DRAINED
> 
> On Fri, Mar 8, 2013 at 10:36 PM, Andrew Bialecki  
> wrote:
> Hey all,
> 
> We're getting ready to upgrade our cluster to 1.2.2 from 1.1.5 and we're 
> testing the upgrade process on our dev cluster. We turned off all client 
> access to the cluster and then ran "nodetool drain" on the first instance 
> with the intention of running "nodetool snapshot" once it finished. However, 
> after running the drain, didn't see any errors, but the Cassandra process was 
> no longer running. Is that expected? From everything I've read it doesn't 
> seem like it, but maybe I'm mistaken.
> 
> Here's the relevant portion of the log from that node (notice it says it's 
> shutting down the server thread in there):
> 
> INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,288 
> StorageService.java (line 774) DRAINING: starting drain process
>  INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,288 
> CassandraDaemon.java (line 218) Stop listening to thrift clients
>  INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 03:26:48,315 
> Gossiper.java (line 1133) Announcing shutdown
>  INFO [RMI TCP Connection(38)-10.116.111.143] 2013-03-09 0

dynamic switch works with NetworkSTrategy NOT SimpleStrategy..solution followsŠ.

Well, we finally have dynamic switch working.  It seems to switch nodes to a 
Remote node and SimpleSTrategy can not deal with that well.  Also, we had to 
move to CL=QUOROM_LOCAL instead of QUOROM.

So for those of you that want cassandra to keep performing well when one node 
starts to get really slow, make sure you have

 1.  NetworkTopologyStrategy set
 2.  Have CL=QUOROM_LOCAL

We had to dive into the code to figure all that out.  I hope it helps someone 
else.

Later,
Dean

From: aaron morton mailto:aa...@thelastpickle.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, March 11, 2013 6:48 AM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: has anyone used dynamic snitch at all

Check that read_repair_chance on the CF's is 0.1, not the old 1.0

Wait at least 10 minutes for the DynamicSnitch to re-calculate.

Use the org.apache.cassandra.db:type=DynamicEndpointSnitch MBean to see what 
scores it has given the nodes.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 11:40 AM, Edward Capriolo 
mailto:edlinuxg...@gmail.com>> wrote:

dynamic_snitch=true is the default. So it is usually on "wrapping" other 
snitches. I have found several scenarios where it does not work exactly as your 
would expect.

On Fri, Mar 8, 2013 at 2:26 PM, Hiller, Dean 
mailto:dean.hil...@nrel.gov>> wrote:
Our test setup

4 nodes, RF=3, reads at CL=QUOROM and we tried CL=TWO
Tell the network card to slow down every packet on node 2
After fixing astyanax to not go to node 2 anymore, we are still seeing 
cassandra have issues as it seems to be involving node 2 somehow.  If we take 
node 2 down, it all speeds back up.

We are trying to get this working such that a slow node in cassandra does not 
impact our customers.

We are in 1.2.2 and added the following properties….(our properties show 
PropertyFileSnitch though I see the keyspace has 
org.apache.cassandra.locator.SimpleStrategy set probably because it was created 
through a tool instead of CLI…shucks)….anyways, I still expected dynamic snitch 
to work….

# controls how often to perform the more expensive part of host score
# calculation
dynamic_snitch: true
dynamic_snitch_update_interval_in_ms: 100
# controls how often to reset all host scores, allowing a bad host to
# possibly recover
dynamic_snitch_reset_interval_in_ms: 60
# if set greater than zero and read_repair_chance is < 1.0, this will allow
# 'pinning' of replicas to hosts in order to increase cache capacity.
# The badness threshold will control how much worse the pinned host has to be
# before the dynamic snitch will prefer other replicas over it.  This is
# expressed as a double which represents a percentage.  Thus, a value of
# 0.2 means Cassandra would continue to prefer the static snitch values
# until the pinned host was 20% worse than the fastest.
dynamic_snitch_badness_threshold: 0.1

Any help appreciated,
Thanks,
Dean

HintedHandoff IOError?

2013-03-11 Thread Janne Jalkanen

I keep seeing these in my log.  Three-node cluster, one node is working fine, 
but two other nodes have increased latencies and these in the error logs (might 
of course be unrelated). No obvious GC pressure, no disk errors that I can see. 
 Ubuntu 12.04 on EC2, Java 7. Repair is run regularly.

My two questions: 1) should I worry, and 2) what might be going on, and 3) is 
there any way to get rid of these? Can I just blow my HintedHandoff table to 
smithereens?

The only relevant issue I might see is CASSANDRA-5158, but it's not about HH.

Any more info I could dig?

Node A:

ERROR [OptionalTasks:1] 2013-03-11 13:34:19,153 AbstractCassandraDaemon.java 
(line 135) Exception in thread Thread[OptionalTasks:1,5,main]
java.io.IOError: java.io.EOFException: bloom filter claims to be 0 bytes, 
longer than entire row size 2147483647
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNamesIterator.java:101)
at 
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:66)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:86)
at 
org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator$1.create(SSTableScanner.java:198)
at 
org.apache.cassandra.db.columniterator.LazyColumnIterator.getSubIterator(LazyColumnIterator.java:54)
at 
org.apache.cassandra.db.columniterator.LazyColumnIterator.getColumnFamily(LazyColumnIterator.java:66)
at 
org.apache.cassandra.db.RowIteratorFactory$2.reduce(RowIteratorFactory.java:95)
at 
org.apache.cassandra.db.RowIteratorFactory$2.reduce(RowIteratorFactory.java:79)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:101)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at 
org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1403)
at 
org.apache.cassandra.db.ColumnFamilyStore$2.computeNext(ColumnFamilyStore.java:1399)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at 
org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:1476)
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1455)
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1450)
at 
org.apache.cassandra.db.HintedHandOffManager.scheduleAllDeliveries(HintedHandOffManager.java:406)
at 
org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:85)
at 
org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:120)
at 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:79)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException: bloom filter claims to be 0 bytes, longer than 
entire row size 2147483647
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:129)
at 
org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter(IndexHelper.java:110)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.read(SSTableNamesIterator.java:113)
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNamesIterator.java:96)
... 30 more

Node B:

ERROR [OptionalTasks:1] 2013-03-11 13:51:02,177 AbstractCassandraDaemon.java 
(line 135) Exception in thread Thread[OptionalTasks:1,5,main]
java.io.IOError: java.io.EOFException
at 
org.apache.cassandra.db.columniterator.SSTableNamesIterator.(SSTableNamesIterator.java:101)
at 
org.apache.cassandra.db.filter.NamesQueryFilter.getSSTableColumnIterator(NamesQueryFilter.java:66)
at 
org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:86)
at 
org.apache.cassandra.io.sstabl

Hadoop+Cassandra

2013-03-11 Thread oualid ait wafli

Hi

I need a tutorial for deployong Hadoop+Cassandra on single-nodes

Thanks

Importing data from SQL Server

I have seen numerous posts on transferring data from MySql to Cassandra but 
have yet to find a good way to transfer directly from a Microsoft SQL Server 
table to a Cassandra CF. Even better would be a method to take as input the 
output of an arbitrary SQL query. Ideas?

Re: Hadoop+Cassandra

2013-03-11 Thread Renato Marroquín Mogrovejo

Hi there,

Check this out [1]. It´s kinda old but I think it will help you get started.


Renato M.

[1] http://www.datastax.com/docs/0.7/map_reduce/hadoop_mr

2013/3/11 oualid ait wafli :
> Hi
>
> I need a tutorial for deployong Hadoop+Cassandra on single-nodes
>
> Thanks

RE: Importing data from SQL Server

2013-03-11 Thread Lohith Samaga M

Hi,
You may try Talend data integration suite.

Lohith

 Original Message 
Subject: Importing data from SQL Server
From: Kevin Burton 
To: "user@cassandra.apache.org" 
CC: 

I have seen numerous posts on transferring data from MySql to Cassandra but 
have yet to find a good way to transfer directly from a Microsoft SQL Server 
table to a Cassandra CF. Even better would be a method to take as input the 
output of an arbitrary SQL query. Ideas?

Information transmitted by this e-mail is proprietary to MphasiS, its 
associated companies and/ or its customers and is intended 
for use only by the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential or 
exempt from disclosure under applicable law. If you are not the intended 
recipient or it appears that this mail has been forwarded 
to you without proper authority, you are notified that any use or dissemination 
of this information in any manner is strictly 
prohibited. In such cases, please notify us immediately at 
mailmas...@mphasis.com and delete this mail from your records.

Re: Hadoop+Cassandra

2013-03-11 Thread oualid ait wafli

I use Cassandra 1.2.2  and Hadoop 1.0.4


2013/3/11 Renato Marroquín Mogrovejo 

> Hi there,
>
> Check this out [1]. It´s kinda old but I think it will help you get
> started.
>
>
> Renato M.
>
> [1] http://www.datastax.com/docs/0.7/map_reduce/hadoop_mr
>
> 2013/3/11 oualid ait wafli :
> > Hi
> >
> > I need a tutorial for deployong Hadoop+Cassandra on single-nodes
> >
> > Thanks
>

RE: Importing data from SQL Server

2013-03-11 Thread Pierre Chalamet

You can quickly create a query using dapper and then transfer all your rows  
using cassandra-sharp actually. The only thing you have to do is to create a 
class to materialize a row and obviously the SQL and CQL statements.

I also have a pending change for cassandra-sharp-contrib that I'm planning to 
submit for performing such task. If you are patient enough, you will be able to 
get it in a few days. This requires cql 3 btw.

- Pierre

-Original Message-
From: "Kevin Burton" 
Sent: ‎11/‎03/‎2013 15:25
To: "user@cassandra.apache.org" 
Subject: Importing data from SQL Server

I have seen numerous posts on transferring data from MySql to Cassandra but 
have yet to find a good way to transfer directly from a Microsoft SQL Server 
table to a Cassandra CF. Even better would be a method to take as input the 
output of an arbitrary SQL query. Ideas?

Re: Importing data from SQL Server

Not familiar with 'dapper' or Cassandra-sharp is there a step by step guide to 
this process including the install? Thanks for the tip.

On Mar 11, 2013, at 9:41 AM, Pierre Chalamet  wrote:

> You can quickly create a query using dapper and then transfer all your rows  
> using cassandra-sharp actually. The only thing you have to do is to create a 
> class to materialize a row and obviously the SQL and CQL statements.
> 
> I also have a pending change for cassandra-sharp-contrib that I'm planning to 
> submit for performing such task. If you are patient enough, you will be able 
> to get it in a few days. This requires cql 3 btw.
> 
> - Pierre
> From: Kevin Burton
> Sent: ‎11/‎03/‎2013 15:25
> To: user@cassandra.apache.org
> Subject: Importing data from SQL Server
> 
> I have seen numerous posts on transferring data from MySql to Cassandra but 
> have yet to find a good way to transfer directly from a Microsoft SQL Server 
> table to a Cassandra CF. Even better would be a method to take as input the 
> output of an arbitrary SQL query. Ideas?

Re: Importing data from SQL Server

They mention Hadoop, HBase, and Hive. I am assuming that Cassandra comes under 
the umbrella of 'NoSql databases'.

On Mar 11, 2013, at 9:33 AM, "Lohith Samaga M"  
wrote:

> Hi,
> You may try Talend data integration suite.
> 
> Lohith
> 
>  Original Message 
> Subject: Importing data from SQL Server
> From: Kevin Burton 
> To: "user@cassandra.apache.org" 
> CC: 
> 
> I have seen numerous posts on transferring data from MySql to Cassandra but 
> have yet to find a good way to transfer directly from a Microsoft SQL Server 
> table to a Cassandra CF. Even better would be a method to take as input the 
> output of an arbitrary SQL query. Ideas?
> 
> Information transmitted by this e-mail is proprietary to MphasiS, its 
> associated companies and/ or its customers and is intended 
> for use only by the individual or entity to which it is addressed, and may 
> contain information that is privileged, confidential or 
> exempt from disclosure under applicable law. If you are not the intended 
> recipient or it appears that this mail has been forwarded 
> to you without proper authority, you are notified that any use or 
> dissemination of this information in any manner is strictly 
> prohibited. In such cases, please notify us immediately at 
> mailmas...@mphasis.com and delete this mail from your records.

Re: Importing data from SQL Server

2013-03-11 Thread Lohith Samaga M

Please check the bigdata package.

Lohith.

Sent from my Xperia™ smartphone smartphone

 Original Message 
Subject: Re: Importing data from SQL Server
From: Kevin Burton 
To: "user@cassandra.apache.org" 
CC: 

They mention Hadoop, HBase, and Hive. I am assuming that Cassandra comes under 
the umbrella of 'NoSql databases'.

On Mar 11, 2013, at 9:33 AM, "Lohith Samaga M"  
wrote:

> Hi,
> You may try Talend data integration suite.
> 
> Lohith
> 
>  Original Message 
> Subject: Importing data from SQL Server
> From: Kevin Burton 
> To: "user@cassandra.apache.org" 
> CC: 
> 
> I have seen numerous posts on transferring data from MySql to Cassandra but 
> have yet to find a good way to transfer directly from a Microsoft SQL Server 
> table to a Cassandra CF. Even better would be a method to take as input the 
> output of an arbitrary SQL query. Ideas?
> 
> Information transmitted by this e-mail is proprietary to MphasiS, its 
> associated companies and/ or its customers and is intended 
> for use only by the individual or entity to which it is addressed, and may 
> contain information that is privileged, confidential or 
> exempt from disclosure under applicable law. If you are not the intended 
> recipient or it appears that this mail has been forwarded 
> to you without proper authority, you are notified that any use or 
> dissemination of this information in any manner is strictly 
> prohibited. In such cases, please notify us immediately at 
> mailmas...@mphasis.com and delete this mail from your records.

Information transmitted by this e-mail is proprietary to MphasiS, its 
associated companies and/ or its customers and is intended 
for use only by the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential or 
exempt from disclosure under applicable law. If you are not the intended 
recipient or it appears that this mail has been forwarded 
to you without proper authority, you are notified that any use or dissemination 
of this information in any manner is strictly 
prohibited. In such cases, please notify us immediately at 
mailmas...@mphasis.com and delete this mail from your records.

Re: Importing data from SQL Server

Where can I get the 'bigdata package"?

On Mar 11, 2013, at 10:01 AM, "Lohith Samaga M"  
wrote:

> Please check the bigdata package.
> 
> Lohith.
> 
> Sent from my Xperia™ smartphone smartphone
> 
>  Original Message 
> Subject: Re: Importing data from SQL Server
> From: Kevin Burton 
> To: "user@cassandra.apache.org" 
> CC: 
> 
> They mention Hadoop, HBase, and Hive. I am assuming that Cassandra comes 
> under the umbrella of 'NoSql databases'.
> 
> On Mar 11, 2013, at 9:33 AM, "Lohith Samaga M"  
> wrote:
> 
>> Hi,
>> You may try Talend data integration suite.
>> 
>> Lohith
>> 
>>  Original Message 
>> Subject: Importing data from SQL Server
>> From: Kevin Burton 
>> To: "user@cassandra.apache.org" 
>> CC: 
>> 
>> I have seen numerous posts on transferring data from MySql to Cassandra but 
>> have yet to find a good way to transfer directly from a Microsoft SQL Server 
>> table to a Cassandra CF. Even better would be a method to take as input the 
>> output of an arbitrary SQL query. Ideas?
>> 
>> Information transmitted by this e-mail is proprietary to MphasiS, its 
>> associated companies and/ or its customers and is intended 
>> for use only by the individual or entity to which it is addressed, and may 
>> contain information that is privileged, confidential or 
>> exempt from disclosure under applicable law. If you are not the intended 
>> recipient or it appears that this mail has been forwarded 
>> to you without proper authority, you are notified that any use or 
>> dissemination of this information in any manner is strictly 
>> prohibited. In such cases, please notify us immediately at 
>> mailmas...@mphasis.com and delete this mail from your records.
> 
> Information transmitted by this e-mail is proprietary to MphasiS, its 
> associated companies and/ or its customers and is intended 
> for use only by the individual or entity to which it is addressed, and may 
> contain information that is privileged, confidential or 
> exempt from disclosure under applicable law. If you are not the intended 
> recipient or it appears that this mail has been forwarded 
> to you without proper authority, you are notified that any use or 
> dissemination of this information in any manner is strictly 
> prohibited. In such cases, please notify us immediately at 
> mailmas...@mphasis.com and delete this mail from your records.

C* 1.2.2 , startup error

2013-03-11 Thread Marco Matarazzo

I'm seeing this error on cassandra 1.2.2 on startup: 

ERROR [COMMIT-LOG-ALLOCATOR] 2013-03-11 16:51:22,076 CassandraDaemon.java (line 
132) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
FSWriteError in /var/lib/cassandra/commitlog/CommitLog-2-1363017061553.log
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.(CommitLogSegment.java:132)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator$3.run(CommitLogAllocator.java:196)
at 
org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:94)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:679)
Caused by: java.io.FileNotFoundException: 
/var/lib/cassandra/commitlog/CommitLog-2-1363017061553.log (Permission denied)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.(RandomAccessFile.java:233)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.(CommitLogSegment.java:116)
... 4 more

Clues about what can cause it ?

--
Marco Matarazzo
== Hex Keep ==

W: http://www.hexkeep.com
M: +39 347 8798528
E: marco.matara...@hexkeep.com

"You can learn more about a man
  in one hour of play
  than in one year of conversation.” - Plato

Re: C* 1.2.2 , startup error

2013-03-11 Thread Edward Capriolo

Caused by: java.io.FileNotFoundException: /var/lib/cassandra/commitlog/
CommitLog-2-1363017061553.log (Permission denied)

^ It seems like your running cassandra as a user thatdoes not have access
to this directory. Possibly you can something as "root" and now the files
are root owned at one point in time. Now you are trying to run as
"cassandra" and it is not working.


On Mon, Mar 11, 2013 at 11:57 AM, Marco Matarazzo <
marco.matara...@hexkeep.com> wrote:

> I'm seeing this error on cassandra 1.2.2 on startup:
>
> ERROR [COMMIT-LOG-ALLOCATOR] 2013-03-11 16:51:22,076 CassandraDaemon.java
> (line 132) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
> FSWriteError in /var/lib/cassandra/commitlog/CommitLog-2-1363017061553.log
> at
> org.apache.cassandra.db.commitlog.CommitLogSegment.(CommitLogSegment.java:132)
> at
> org.apache.cassandra.db.commitlog.CommitLogAllocator$3.run(CommitLogAllocator.java:196)
> at
> org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:94)
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at java.lang.Thread.run(Thread.java:679)
> Caused by: java.io.FileNotFoundException:
> /var/lib/cassandra/commitlog/CommitLog-2-1363017061553.log (Permission
> denied)
> at java.io.RandomAccessFile.open(Native Method)
> at java.io.RandomAccessFile.(RandomAccessFile.java:233)
> at
> org.apache.cassandra.db.commitlog.CommitLogSegment.(CommitLogSegment.java:116)
> ... 4 more
>
> Clues about what can cause it ?
>
> --
> Marco Matarazzo
> == Hex Keep ==
>
> W: http://www.hexkeep.com
> M: +39 347 8798528
> E: marco.matara...@hexkeep.com
>
> "You can learn more about a man
>   in one hour of play
>   than in one year of conversation.” - Plato
>
>
>
>
>

Re: C* 1.2.2 , startup error

Always make sure root does not have "java" in his path so you don't make
that mistake.  At least that is how we handle it so root never runs
cassandra as we end up with java not found.

Later,
Dean

On 3/11/13 9:57 AM, "Marco Matarazzo"  wrote:

>I'm seeing this error on cassandra 1.2.2 on startup:
>
>ERROR [COMMIT-LOG-ALLOCATOR] 2013-03-11 16:51:22,076 CassandraDaemon.java
>(line 132) Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
>FSWriteError in /var/lib/cassandra/commitlog/CommitLog-2-1363017061553.log
>   at 
>org.apache.cassandra.db.commitlog.CommitLogSegment.(CommitLogSegment
>.java:132)
>   at 
>org.apache.cassandra.db.commitlog.CommitLogAllocator$3.run(CommitLogAlloca
>tor.java:196)
>   at 
>org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitL
>ogAllocator.java:94)
>   at 
>org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at java.lang.Thread.run(Thread.java:679)
>Caused by: java.io.FileNotFoundException:
>/var/lib/cassandra/commitlog/CommitLog-2-1363017061553.log (Permission
>denied)
>   at java.io.RandomAccessFile.open(Native Method)
>   at java.io.RandomAccessFile.(RandomAccessFile.java:233)
>   at 
>org.apache.cassandra.db.commitlog.CommitLogSegment.(CommitLogSegment
>.java:116)
>   ... 4 more
>
>Clues about what can cause it ?
>
>--
>Marco Matarazzo
>== Hex Keep ==
>
>W: http://www.hexkeep.com
>M: +39 347 8798528
>E: marco.matara...@hexkeep.com
>
>"You can learn more about a man
>  in one hour of play
>  than in one year of conversation.² - Plato
>
>
>
>

Re: dynamic switch works with NetworkSTrategy NOT SimpleStrategy..solution followsŠ.

2013-03-11 Thread Edward Capriolo

You should file a JIRA if dsnitch only works with LOCAL_QUORUM something is
very wrong.

On Mon, Mar 11, 2013 at 9:58 AM, Hiller, Dean  wrote:

> Well, we finally have dynamic switch working.  It seems to switch nodes to
> a Remote node and SimpleSTrategy can not deal with that well.  Also, we had
> to move to CL=QUOROM_LOCAL instead of QUOROM.
>
> So for those of you that want cassandra to keep performing well when one
> node starts to get really slow, make sure you have
>
>  1.  NetworkTopologyStrategy set
>  2.  Have CL=QUOROM_LOCAL
>
> We had to dive into the code to figure all that out.  I hope it helps
> someone else.
>
> Later,
> Dean
>
> From: aaron morton mailto:aa...@thelastpickle.com
> >>
> Reply-To: "user@cassandra.apache.org" <
> user@cassandra.apache.org>
> Date: Monday, March 11, 2013 6:48 AM
> To: "user@cassandra.apache.org" <
> user@cassandra.apache.org>
> Subject: Re: has anyone used dynamic snitch at all
>
> Check that read_repair_chance on the CF's is 0.1, not the old 1.0
>
> Wait at least 10 minutes for the DynamicSnitch to re-calculate.
>
> Use the org.apache.cassandra.db:type=DynamicEndpointSnitch MBean to see
> what scores it has given the nodes.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 8/03/2013, at 11:40 AM, Edward Capriolo  edlinuxg...@gmail.com>> wrote:
>
> dynamic_snitch=true is the default. So it is usually on "wrapping" other
> snitches. I have found several scenarios where it does not work exactly as
> your would expect.
>
> On Fri, Mar 8, 2013 at 2:26 PM, Hiller, Dean  dean.hil...@nrel.gov>> wrote:
> Our test setup
>
> 4 nodes, RF=3, reads at CL=QUOROM and we tried CL=TWO
> Tell the network card to slow down every packet on node 2
> After fixing astyanax to not go to node 2 anymore, we are still seeing
> cassandra have issues as it seems to be involving node 2 somehow.  If we
> take node 2 down, it all speeds back up.
>
> We are trying to get this working such that a slow node in cassandra does
> not impact our customers.
>
> We are in 1.2.2 and added the following properties….(our properties show
> PropertyFileSnitch though I see the keyspace has
> org.apache.cassandra.locator.SimpleStrategy set probably because it was
> created through a tool instead of CLI…shucks)….anyways, I still expected
> dynamic snitch to work….
>
> # controls how often to perform the more expensive part of host score
> # calculation
> dynamic_snitch: true
> dynamic_snitch_update_interval_in_ms: 100
> # controls how often to reset all host scores, allowing a bad host to
> # possibly recover
> dynamic_snitch_reset_interval_in_ms: 60
> # if set greater than zero and read_repair_chance is < 1.0, this will allow
> # 'pinning' of replicas to hosts in order to increase cache capacity.
> # The badness threshold will control how much worse the pinned host has to
> be
> # before the dynamic snitch will prefer other replicas over it.  This is
> # expressed as a double which represents a percentage.  Thus, a value of
> # 0.2 means Cassandra would continue to prefer the static snitch values
> # until the pinned host was 20% worse than the fastest.
> dynamic_snitch_badness_threshold: 0.1
>
> Any help appreciated,
> Thanks,
> Dean
>
>
>

Re: Time series data and deletion

2013-03-11 Thread Flavio Baronti

Il 2013/03/11 14:42 PM, aaron morton ha scritto:

I'm trying to understand what will happen when we start deleting the old data.

Are you going to delete data or use the TTL?

We delete the data explicitly, since we might change idea on the data TTL after
it has been written.

With size tiered compaction, suppose we have one 160Gb sstable and some smaller
tables totalling 40Gb.

Not sure on that, it depends on the work load.

NVM, was just a hypothesis.

My understanding is that, even if we start deleting, we will have to wait for 3 more 160Gb tables to appear, in
order to have the first sstable compacted and the disk space freed.

v1.2 will run compactions on single SSTables that have a high number of
tombstones
https://issues.apache.org/jira/browse/CASSANDRA-3442
https://issues.apache.org/jira/browse/CASSANDRA-4234

I did not know about these improvements in 1.2! We're still on 1.0.12, I'll
push for an upgrade.

One more question. I read and reread your description of deletes [1], but I still am confused on tombstones and
GCGraceSeconds, specifically when you say "If the deletion is before gcBefore it is totally ignored".
Suppose I delete something, but compaction between the tombstone and the deleted data does not happen within
GCGraceSeconds. From what I understood, it looks like the tombstone will be ignored, and the data will "resurrect"...
where am I wrong?

Cheers
Flavio

[1]
http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/#local_reads_for_local_queries

running Cassandra in dual stack (ipv4 + ipv6)

2013-03-11 Thread Илья Шипицин

Hello!

is it possible to use both ipv4 and ipv6 for Cassandra cluster ?

Cheers,
Ilya Shipitsin

Re: Pig / Map Reduce on Cassandra

2013-03-11 Thread cscetbon.ext

You said all versions. However, when I try to access 
cassandra://twissandra/users based on 
http://www.datastax.com/docs/1.0/dml/using_cql I get :

2013-03-11 17:35:48,444 [main] INFO  org.apache.pig.Main - Apache Pig version 
0.11.0 (r1446324) compiled Feb 14 2013, 16:40:57
2013-03-11 17:35:48,445 [main] INFO  org.apache.pig.Main - Logging error 
messages to: /Users/cyril/pig_1363019748442.log
2013-03-11 17:35:48.583 java[13809:1203] Unable to load realm info from 
SCDynamicStore
2013-03-11 17:35:48,750 [main] INFO  org.apache.pig.impl.util.Utils - Default 
bootup file /Users/cyril/.pigbootup not found
2013-03-11 17:35:48,831 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: file:///
2013-03-11 17:35:49,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2245: Cannot get schema from loadFunc 
org.apache.cassandra.hadoop.pig.CassandraStorage

with pig 0.11.0

any idea why the function loadFunc does not work correctly ?

thanks
--
Cyril SCETBON

On Jan 18, 2013, at 7:00 PM, aaron morton 
mailto:aa...@thelastpickle.com>> wrote:

Silly question -- but does hive/pig hadoop etc work with cassandra
1.1.8?  Or only with 1.2?
all versions.

We are using astyanax library, which seems
to fail horribly on 1.2,
How does it fail ?
If you think you have a bug post it at https://github.com/Netflix/astyanax

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/01/2013, at 7:48 AM, James Lyons 
mailto:james.ly...@gmail.com>> wrote:

Silly question -- but does hive/pig hadoop etc work with cassandra
1.1.8?  Or only with 1.2?  We are using astyanax library, which seems
to fail horribly on 1.2, so we're still on 1.1.8.  But we're just
starting out with this and i'm still debating between cassandra and
hbase.  So I just want to know if there is a limitation here or not,
as I have no idea when 1.2 support will exist in astyanax.

That said, are there other java (scala) libraries that people use to
connect to cassandra that support 1.2?

-James-

On Thu, Jan 17, 2013 at 8:30 AM,  
mailto:cscetbon@orange.com>> wrote:
Ok, I understand that I need to manage both cassandra and hadoop components
and that pig will use hadoop components to launch its tasks which will use
Cassandra as the Storage engine.

Thanks
--
Cyril SCETBON

On Jan 17, 2013, at 4:03 PM, James Schappet 
mailto:jschap...@gmail.com>> wrote:

This really depends on how you design your Hadoop Cluster.  The testing I
have done, had Hadoop and Cassandra Nodes collocated on the same hosts.
Remember that Pig code runs inside of your hadoop cluster, and connects to
Cassandra as the Database engine.


I have not done any testing with Hive, so someone else will have to answer
that question.


From: mailto:cscetbon@orange.com>>
Reply-To: mailto:user@cassandra.apache.org>>
Date: Thursday, January 17, 2013 8:58 AM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Pig / Map Reduce on Cassandra

Jimmy,

I understand that CFS can replace HDFS for those who use Hadoop. I just want
to use pig and hive on cassandra. I know that pig samples are provided and
work now with cassandra natively (they are part of the core). However, does
it mean that the process will be spread over nodes with
number_of_mapper=number_of_nodes or something like that ?
Can Hive connect to Cassandra 1.2 easily too ?

--
Cyril Scetbon

On Jan 17, 2013, at 2:42 PM, James Schappet 
mailto:jschap...@gmail.com>> wrote:

CFS is Cassandra File System:
http://www.datastax.com/dev/blog/cassandra-file-system-design


But you don't need CFS to connect from PIG to Cassandra.  The latest
versions of Cassandra Source ship with examples of connecting from pig to
cassandra.


apache-cassandra-1.2.0-src/examples/pig   --
http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-src.tar.gz

--Jimmy


From: 
Reply-To: 
Date: Thursday, January 17, 2013 6:35 AM
To: "user@cassandra.apache.org" 
Subject: Re: Pig / Map Reduce on Cassandra

what do you mean ? it's not needed by Pig or Hive to access Cassandra data.

Regards

On Jan 16, 2013, at 11:14 PM, Brandon Williams  wrote:

You won't get CFS,
but it's not a hard requirement, either.


_

Ce message et ses pieces jointes peuvent contenir des informations
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu
ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete
altere, deforme ou falsifie. Merci.

This message and its attachments may conta

Re: Importing data from SQL Server

2013-03-11 Thread Lohith Samaga M

Sorry, I was referring to the Talend Open Studio for BigData.

Lohith.

Sent from my Xperia™ smartphone smartphone

 Original Message 
Subject: Re: Importing data from SQL Server
From: Kevin Burton 
To: "user@cassandra.apache.org" 
CC: 

Where can I get the 'bigdata package"?

On Mar 11, 2013, at 10:01 AM, "Lohith Samaga M"  
wrote:

> Please check the bigdata package.
> 
> Lohith.
> 
> Sent from my Xperia™ smartphone smartphone
> 
>  Original Message 
> Subject: Re: Importing data from SQL Server
> From: Kevin Burton 
> To: "user@cassandra.apache.org" 
> CC: 
> 
> They mention Hadoop, HBase, and Hive. I am assuming that Cassandra comes 
> under the umbrella of 'NoSql databases'.
> 
> On Mar 11, 2013, at 9:33 AM, "Lohith Samaga M"  
> wrote:
> 
>> Hi,
>> You may try Talend data integration suite.
>> 
>> Lohith
>> 
>>  Original Message 
>> Subject: Importing data from SQL Server
>> From: Kevin Burton 
>> To: "user@cassandra.apache.org" 
>> CC: 
>> 
>> I have seen numerous posts on transferring data from MySql to Cassandra but 
>> have yet to find a good way to transfer directly from a Microsoft SQL Server 
>> table to a Cassandra CF. Even better would be a method to take as input the 
>> output of an arbitrary SQL query. Ideas?
>> 
>> Information transmitted by this e-mail is proprietary to MphasiS, its 
>> associated companies and/ or its customers and is intended 
>> for use only by the individual or entity to which it is addressed, and may 
>> contain information that is privileged, confidential or 
>> exempt from disclosure under applicable law. If you are not the intended 
>> recipient or it appears that this mail has been forwarded 
>> to you without proper authority, you are notified that any use or 
>> dissemination of this information in any manner is strictly 
>> prohibited. In such cases, please notify us immediately at 
>> mailmas...@mphasis.com and delete this mail from your records.
> 
> Information transmitted by this e-mail is proprietary to MphasiS, its 
> associated companies and/ or its customers and is intended 
> for use only by the individual or entity to which it is addressed, and may 
> contain information that is privileged, confidential or 
> exempt from disclosure under applicable law. If you are not the intended 
> recipient or it appears that this mail has been forwarded 
> to you without proper authority, you are notified that any use or 
> dissemination of this information in any manner is strictly 
> prohibited. In such cases, please notify us immediately at 
> mailmas...@mphasis.com and delete this mail from your records.

Information transmitted by this e-mail is proprietary to MphasiS, its 
associated companies and/ or its customers and is intended 
for use only by the individual or entity to which it is addressed, and may 
contain information that is privileged, confidential or 
exempt from disclosure under applicable law. If you are not the intended 
recipient or it appears that this mail has been forwarded 
to you without proper authority, you are notified that any use or dissemination 
of this information in any manner is strictly 
prohibited. In such cases, please notify us immediately at 
mailmas...@mphasis.com and delete this mail from your records.

Re: Quorum read after quorum write guarantee

2013-03-11 Thread Tyler Hobbs

What kind of inserts and multiget queries are you running?


On Sun, Mar 10, 2013 at 1:16 PM, André Cruz  wrote:

> On 10/03/2013, at 16:49, Chuan-Heng Hsiao 
> wrote:
> > However, my guess is that cassandra only guarantee that
> > if you successfully write and you successfully read, then quorum will
> > give you the latest data.
>
> That's what I thought, but that's not what I'm seeing all the time. I have
> no errors reading or writing.
>
> André




-- 
Tyler Hobbs
DataStax

Re: Time series data and deletion

2013-03-11 Thread Tyler Hobbs

On Mon, Mar 11, 2013 at 11:25 AM, Flavio Baronti
wrote:

>
>
> One more question. I read and reread your description of deletes [1],  but
> I still am confused on tombstones and GCGraceSeconds, specifically when you
> say "If the deletion is before gcBefore it is totally ignored".
> Suppose I delete something, but compaction between the tombstone and the
> deleted data does not happen within GCGraceSeconds. From what I understood,
> it looks like the tombstone will be ignored, and the data will
> "resurrect"... where am I wrong?


This may answer your questions:
http://wiki.apache.org/cassandra/DistributedDeletes


-- 
Tyler Hobbs
DataStax

Re: dynamic switch works with NetworkSTrategy NOT SimpleStrategy..solution followsŠ.

Ticket filed.

https://issues.apache.org/jira/browse/CASSANDRA-5333

Thanks,
Dean

From: Edward Capriolo mailto:edlinuxg...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Monday, March 11, 2013 9:16 AM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: dynamic switch works with NetworkSTrategy NOT 
SimpleStrategy..solution followsŠ.

You should file a JIRA if dsnitch only works with LOCAL_QUORUM something is 
very wrong.

On Mon, Mar 11, 2013 at 9:58 AM, Hiller, Dean 
mailto:dean.hil...@nrel.gov>> wrote:
Well, we finally have dynamic switch working.  It seems to switch nodes to a 
Remote node and SimpleSTrategy can not deal with that well.  Also, we had to 
move to CL=QUOROM_LOCAL instead of QUOROM.

So for those of you that want cassandra to keep performing well when one node 
starts to get really slow, make sure you have

 1.  NetworkTopologyStrategy set
 2.  Have CL=QUOROM_LOCAL

We had to dive into the code to figure all that out.  I hope it helps someone 
else.

Later,
Dean

From: aaron morton 
mailto:aa...@thelastpickle.com>>>
Reply-To: 
"user@cassandra.apache.org>"
 
mailto:user@cassandra.apache.org>>>
Date: Monday, March 11, 2013 6:48 AM
To: 
"user@cassandra.apache.org>"
 
mailto:user@cassandra.apache.org>>>
Subject: Re: has anyone used dynamic snitch at all

Check that read_repair_chance on the CF's is 0.1, not the old 1.0

Wait at least 10 minutes for the DynamicSnitch to re-calculate.

Use the org.apache.cassandra.db:type=DynamicEndpointSnitch MBean to see what 
scores it has given the nodes.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 11:40 AM, Edward Capriolo 
mailto:edlinuxg...@gmail.com>>>
 wrote:

dynamic_snitch=true is the default. So it is usually on "wrapping" other 
snitches. I have found several scenarios where it does not work exactly as your 
would expect.

On Fri, Mar 8, 2013 at 2:26 PM, Hiller, Dean 
mailto:dean.hil...@nrel.gov>>>
 wrote:
Our test setup

4 nodes, RF=3, reads at CL=QUOROM and we tried CL=TWO
Tell the network card to slow down every packet on node 2
After fixing astyanax to not go to node 2 anymore, we are still seeing 
cassandra have issues as it seems to be involving node 2 somehow.  If we take 
node 2 down, it all speeds back up.

We are trying to get this working such that a slow node in cassandra does not 
impact our customers.

We are in 1.2.2 and added the following properties….(our properties show 
PropertyFileSnitch though I see the keyspace has 
org.apache.cassandra.locator.SimpleStrategy set probably because it was created 
through a tool instead of CLI…shucks)….anyways, I still expected dynamic snitch 
to work….

# controls how often to perform the more expensive part of host score
# calculation
dynamic_snitch: true
dynamic_snitch_update_interval_in_ms: 100
# controls how often to reset all host scores, allowing a bad host to
# possibly recover
dynamic_snitch_reset_interval_in_ms: 60
# if set greater than zero and read_repair_chance is < 1.0, this will allow
# 'pinning' of replicas to hosts in order to increase cache capacity.
# The badness threshold will control how much worse the pinned host has to be
# before the dynamic snitch will prefer other replicas over it.  This is
# expressed as a double which represents a percentage.  Thus, a value of
# 0.2 means Cassandra would continue to prefer the static snitch values
# until the pinned host was 20% worse than the fastest.
dynamic_snitch_badness_threshold: 0.1

Any help appreciated,
Thanks,
Dean

Re: Running cassandra across nat?

2013-03-11 Thread Ben Chobot

Can you not set up VPN between your data centers?

On Mar 10, 2013, at 7:05 AM, Илья Шипицин wrote:

> Hello!
> 
> Is it possible to run cluster in 2 datacenters which are not routable?
> Each datacenter is running its own lan prefixes, however lan are not routable 
> across datacenters.
> 
> Cheers,
> Ilya Shipitsin

Re:Hadoop+Cassandra

2013-03-11 Thread Шамим

http://frommyworkshop.blogspot.ru/2012/07/single-node-hadoop-cassandra-pig-setup.html

> I use Cassandra 1.2.2 and Hadoop 1.0.4
> 
> 2013/3/11 Renato Marroquín Mogrovejo 
> 
>> Hi there,
>>
>> Check this out [1]. It´s kinda old but I think it will help you get started.
>>
>> Renato M.
>>
>> [1] http://www.datastax.com/docs/0.7/map_reduce/hadoop_mr
>>
>> 2013/3/11 oualid ait wafli :
>>
>>> Hi
>>
>>>
>>
>>> I need a tutorial for deployong Hadoop+Cassandra on single-nodes
>>
>>>
>>
>>> Thanks


 --

Re: Quorum read after quorum write guarantee

2013-03-11 Thread André Cruz

On Mar 11, 2013, at 5:02 PM, Tyler Hobbs  wrote:

> What kind of inserts and multiget queries are you running?

I use the ColumnFamily objects. The pool is initialised with 
"write_consistency_level=ConsistencyLevel.QUORUM".

The insert is a regular insert, so the QUORUM is used. When fetching I use:

CF.multiget(blocks, columns=['size'], 
read_consistency_level=ConsistencyLevel.QUORUM)

I have 6 nodes and a RF 3.

André

migrating from SimpleStrategy to NetworkTopologyStrategy

2013-03-11 Thread Dane Miller

Hi,

I'd like to resurrect this thread from April 2012 -
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/migrating-from-SimpleStrategy-to-NetworkTopologyStrategy-td7481090.html
- "migrating from SimpleStrategy to NetworkTopologyStrategy"

We're in a similar situation, and I'd like to understand this more
thoroughly.  In preparation for adding another datacenter to our
cassandra cluster, I'd like to migrate from EC2Snitch + SimpleStrategy
to GosspingPropertyFileSnitch + NetworkTopologyStrategy.  Here are the
steps I'm planning:

Change the Snitch
1. set endpoint_snitch: GosspingPropertyFileSnitch in cassandra.yaml
2. configure cassandra-rackdc.properties to use a single rack and
datacenter,: rack=RAC1, dc=dc1
3. do a rolling restart of all nodes

Change replication strategy
4. for all keyspaces (except system*), alter keyspace ... with
replication = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3}
5. run "nodetool repair -pr" on all nodes

Does this look right?  Also, I'm curious whether/why step 5 is
necessary, given the single rack configuration.

Versions:
cassandra 1.2.2
dsc12 1.2.2-1
Ubuntu 12.04, x86_64
Datastax AMI

Thanks!

Dane

Re: HintedHandoff IOError?

2013-03-11 Thread Robert Coli

On Mon, Mar 11, 2013 at 7:05 AM, Janne Jalkanen
 wrote:
> I keep seeing these in my log.  Three-node cluster, one node is working fine, 
> but two other nodes have increased latencies and these in the error logs 
> (might of course be unrelated). No obvious GC pressure, no disk errors that I 
> can see.  Ubuntu 12.04 on EC2, Java 7. Repair is run regularly.
>
> My two questions: 1) should I worry, and 2) what might be going on, and 3) is 
> there any way to get rid of these? Can I just blow my HintedHandoff table to 
> smithereens?

http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/io/sstable/IndexHelper.java
"
public static Filter defreezeBloomFilter(FileDataInput file, long
maxSize, boolean useOldBuffer) throws IOException
{
int size = file.readInt();
if (size > maxSize || size <= 0)
throw new EOFException("bloom filter claims to be " + size
+ " bytes, longer than entire row size " + maxSize);
ByteBuffer bytes = file.readBytes(size);
"

Based on the above, I would suspect either a zero byte -Filter.db file
or a corrupt one. Probably worry a little bit, but only a little bit
unless your cluster is RF=1.

=Rob

Re: Can't replace dead node

2013-03-11 Thread Arya Goudarzi

You may have bumped to this issue:
https://github.com/Netflix/Priam/issues/161
make sure is_replace_token Priam API call is working for you.

On Fri, Mar 8, 2013 at 8:22 AM, aaron morton wrote:

> If it does not have the schema check the logs for errors and ensure it is
> actually part of the cluster.
>
> You may have better luck with Priam specific questions on
> https://github.com/Netflix/Priam
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 7/03/2013, at 11:11 AM, Andrey Ilinykh  wrote:
>
> Hello everybody!
>
> I used to run cassandra 1.1.5 with Priam. To replace dead node priam
> launches cassandra with cassandra.replace_token property. It works smoothly
> with 1.1.5. Couple days ago I moved to 1.1.10 and have a problem now. New
> cassandra successfully starts, joins the ring but it doesn't see my
> keyspaces. It doesn't try to stream data from other nodes. I see only
> system keyspace. Any idea what is the difference between 1.1.5 and 1.1.10?
> How am I supposed to replace dead node?
>
> Thank you,
>Andrey
>
>
>

Re: migrating from SimpleStrategy to NetworkTopologyStrategy

2013-03-11 Thread Jason Wee

Probably also ensure port 7000 for the nodes to be reachable between nodes.

Jason


On Tue, Mar 12, 2013 at 4:11 AM, Dane Miller  wrote:

> Hi,
>
> I'd like to resurrect this thread from April 2012 -
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/migrating-from-SimpleStrategy-to-NetworkTopologyStrategy-td7481090.html
> - "migrating from SimpleStrategy to NetworkTopologyStrategy"
>
> We're in a similar situation, and I'd like to understand this more
> thoroughly.  In preparation for adding another datacenter to our
> cassandra cluster, I'd like to migrate from EC2Snitch + SimpleStrategy
> to GosspingPropertyFileSnitch + NetworkTopologyStrategy.  Here are the
> steps I'm planning:
>
> Change the Snitch
> 1. set endpoint_snitch: GosspingPropertyFileSnitch in cassandra.yaml
> 2. configure cassandra-rackdc.properties to use a single rack and
> datacenter,: rack=RAC1, dc=dc1
> 3. do a rolling restart of all nodes
>
> Change replication strategy
> 4. for all keyspaces (except system*), alter keyspace ... with
> replication = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3}
> 5. run "nodetool repair -pr" on all nodes
>
> Does this look right?  Also, I'm curious whether/why step 5 is
> necessary, given the single rack configuration.
>
> Versions:
> cassandra 1.2.2
> dsc12 1.2.2-1
> Ubuntu 12.04, x86_64
> Datastax AMI
>
> Thanks!
>
> Dane
>

Re: Cassandra cluster setup issue

> I can see this problem resurfacing in Cassandra 1.1.9 on my system. I am 
> using RHEL 6.0 and 7199 can be seen bound by Cassandra upon netstat. When I 
> do "telnet x.x.x.x 7199", that works too. 
If the process was started with a JMX port property then you are not seeing 
that error. 

Can you show the output from node ring? 
Can you show the command you are using for nodetool ? 

Cheers


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 11:34 PM, amulya rattan  wrote:

> I just saw that the "failed to connect to 'x.x.x.x:7199' : connection 
> refused" error for nodetool that I am facing used to exist earlier also, case 
> in point https://issues.apache.org/jira/browse/CASSANDRA-4494
> I can see this problem resurfacing in Cassandra 1.1.9 on my system. I am 
> using RHEL 6.0 and 7199 can be seen bound by Cassandra upon netstat. When I 
> do "telnet x.x.x.x 7199", that works too. 
> 
> Any pointers please. 
> 
> 
> 2013/3/8 amulya rattan 
> I have started cassandra on three nodes, however opscenter only shows two of 
> them. The missing one is starting properly, with gossip and everything else 
> showing no errors. It also has the other nodes as seeds in cassandra.yaml, so 
> I am amazed why it doesn't show as part of the ring. I have tried restarting 
> it a bunch of times but neither does it throw any error, nor it shows as part 
> of the ring 
> 
> Another strange thing is that nodetool won't start on any of the nodes. It 
> keeps throwing connection refused error. It's not the same usual error one 
> faces that starts with "Error connection to remote JMX agent!". It simply 
> says connection refused on x.x.x.x:7199. However, I can see the JMX port is 
> bound by cassandra on 0.0.0.0. I can use cassandra-cli on 9160 and opscenter 
> on , but not jmx port. As last resort, I turned off firewall completely, 
> but to no avail.
> 
> I never faced any such errors setting cassandra cluster up in the past, so 
> this is completely flabbergasting to me given that the port is bound and 
> firewall is off. 
> 
> Anybody faced anything similar in the past? Or anybody knows how to solve it? 
> All responses appreciated.
> 
> ~Amulya 
>

Re: Incompatible Gossip 1.1.6 to 1.2.1 Upgrade?

> Is this just a display bug in nodetool or this upgraded node really sees the 
> other ones as dead?
Is the 1.2.2 node which is see all the others as down processing requests ? 
Is it showing the others as down in the log ? 

I'm not really sure what's happening. But you can try starting the 1.2.2 node 
with the 

-Dcassandra.load_ring_state=false  

parameter, append it at the bottom of the cassandra-env.sh file. It will force 
the node to get the ring state from the others. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/03/2013, at 10:24 PM, Arya Goudarzi  wrote:

> OK. I upgraded one node from 1.1.6 to 1.2.2 today. Despite some new problems 
> that I had and I posted them in a separate email, this issue still exists but 
> now it is only on 1.2.2 node. This means that the nodes running 1.1.6 see all 
> other nodes including 1.2.2 as Up. Here is the ring and gossip from nodes 
> with 1.1.6 for example. Bold denotes upgraded node:
> 
> Address DC  RackStatus State   Load
> Effective-Ownership Token
>   
>  141784319550391026443072753098378663700
> XX.180.36us-east 1b  Up Normal  49.47 GB25.00%
>   1808575600
> XX.231.121  us-east 1c  Up Normal  47.08 GB25.00% 
>  7089215977519551322153637656637080005
> XX.177.177  us-east 1d  Up Normal  33.64 GB25.00% 
>  14178431955039102644307275311465584410
> XX.7.148us-east 1b  Up Normal  41.27 GB25.00% 
>  42535295865117307932921825930779602030
> XX.20.9 us-east 1c  Up Normal  38.51 GB25.00% 
>  49624511842636859255075463585608106435
> XX.86.255us-east 1d  Up Normal  34.78 GB25.00%
>   56713727820156410577229101240436610840
> XX.63.230us-east 1b  Up Normal  38.11 GB25.00%
>   85070591730234615865843651859750628460
> XX.163.36   us-east 1c  Up Normal  44.25 GB25.00% 
>  92159807707754167187997289514579132865
> XX.31.234us-east 1d  Up Normal  44.66 GB25.00%
>   99249023685273718510150927169407637270
> XX.132.169   us-east 1b  Up Normal  44.2 GB 25.00%
>   127605887595351923798765477788721654890
> XX.71.63 us-east 1c  Up Normal  38.74 GB25.00%
>   134695103572871475120919115443550159295
> XX.197.209  us-east 1d  Up Normal  41.5 GB 25.00% 
>  141784319550391026443072753098378663700
> 
> /XX.71.63
>   RACK:1c
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.1598705272E10
>   DC:us-east
>   INTERNAL_IP:XX.194.92
>   STATUS:NORMAL,134695103572871475120919115443550159295
>   RPC_ADDRESS:XX.194.92
>   RELEASE_VERSION:1.1.6
> /XX.86.255
>   RACK:1d
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:3.734334162E10
>   DC:us-east
>   INTERNAL_IP:XX.6.195
>   STATUS:NORMAL,56713727820156410577229101240436610840
>   RPC_ADDRESS:XX.6.195
>   RELEASE_VERSION:1.1.6
> /XX.7.148
>   RACK:1b
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.4316975808E10
>   DC:us-east
>   INTERNAL_IP:XX.47.250
>   STATUS:NORMAL,42535295865117307932921825930779602030
>   RPC_ADDRESS:XX.47.250
>   RELEASE_VERSION:1.1.6
> /XX.63.230
>   RACK:1b
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.0918593305E10
>   DC:us-east
>   INTERNAL_IP:XX.89.127
>   STATUS:NORMAL,85070591730234615865843651859750628460
>   RPC_ADDRESS:XX.89.127
>   RELEASE_VERSION:1.1.6
> /XX.132.169
>   RACK:1b
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.745883458E10
>   DC:us-east
>   INTERNAL_IP:XX.94.161
>   STATUS:NORMAL,127605887595351923798765477788721654890
>   RPC_ADDRESS:XX.94.161
>   RELEASE_VERSION:1.1.6
> /XX.180.36
>   RACK:1b
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:5.311963027E10
>   DC:us-east
>   INTERNAL_IP:XX.123.112
>   STATUS:NORMAL,1808575600
>   RPC_ADDRESS:XX.123.112
>   RELEASE_VERSION:1.1.6
> /XX.163.36
>   RACK:1c
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.7516755022E10
>   DC:us-east
>   INTERNAL_IP:XX.163.180
>   STATUS:NORMAL,92159807707754167187997289514579132865
>   RPC_ADDRESS:XX.163.180
>   RELEASE_VERSION:1.1.6
> /XX.31.234
>   RACK:1d
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.7954372912E10
>   DC:us-east
>   INTERNAL_IP:XX.192.159
>   STATUS:NORMAL,99249023685273718510150927169407637270
>   RPC_ADDRESS:XX.192.159
>   RELEASE_VERSION:1.1.6
> /XX.197.209
>   RACK:1d
>   SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>   LOAD:4.4558968005E10
>   DC:us-east
>   INTERNAL_IP:XX.66.205
>   STATUS:NORMAL,141784319550391026443072753098378663700
>   RPC_ADDRE

Re: Deploy Cassandra with Hadoop

It's a lot easier for people to help you if you state what the problem is and 
what you have tried. 

There is some information on the wiki 
http://wiki.apache.org/cassandra/HadoopSupport
and some documentation on the data stax site 
http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/03/2013, at 4:08 AM, oualid ait wafli  wrote:

> Hi,
> 
> I am trying to deploy Cassandra (1.2.2) with Hadoop (1.0.4) to implement a 
> MapReduce Job, on Single-node cluster (simple machine)
> 
> So can anybody help me !
> 
> Cordialy

Re: Pig / Map Reduce on Cassandra

> any idea why the function loadFunc does not work correctly ?
No sorry. 
Not sure why you are linking to the CQL info or what Pig script / config you 
are running. 
Did you follow the example in the examples/pig in the source distribution ? 

Also please use at least cassandra 1.1. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/03/2013, at 9:39 AM, cscetbon@orange.com wrote:

> You said all versions. However, when I try to access 
> cassandra://twissandra/users based on 
> http://www.datastax.com/docs/1.0/dml/using_cql I get :
> 
> 2013-03-11 17:35:48,444 [main] INFO  org.apache.pig.Main - Apache Pig version 
> 0.11.0 (r1446324) compiled Feb 14 2013, 16:40:57
> 2013-03-11 17:35:48,445 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /Users/cyril/pig_1363019748442.log
> 2013-03-11 17:35:48.583 java[13809:1203] Unable to load realm info from 
> SCDynamicStore
> 2013-03-11 17:35:48,750 [main] INFO  org.apache.pig.impl.util.Utils - Default 
> bootup file /Users/cyril/.pigbootup not found
> 2013-03-11 17:35:48,831 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: file:///
> 2013-03-11 17:35:49,235 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 2245: Cannot get schema from loadFunc 
> org.apache.cassandra.hadoop.pig.CassandraStorage
> 
> with pig 0.11.0
> 
> any idea why the function loadFunc does not work correctly ?
> 
> thanks
> -- 
> Cyril SCETBON
> 
> On Jan 18, 2013, at 7:00 PM, aaron morton  wrote:
> 
>>> Silly question -- but does hive/pig hadoop etc work with cassandra
>>> 1.1.8?  Or only with 1.2?  
>> all versions. 
>> 
>>> We are using astyanax library, which seems
>>> to fail horribly on 1.2, 
>> How does it fail ? 
>> If you think you have a bug post it at https://github.com/Netflix/astyanax
>> 
>> Cheers
>> 
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 18/01/2013, at 7:48 AM, James Lyons  wrote:
>> 
>>> Silly question -- but does hive/pig hadoop etc work with cassandra
>>> 1.1.8?  Or only with 1.2?  We are using astyanax library, which seems
>>> to fail horribly on 1.2, so we're still on 1.1.8.  But we're just
>>> starting out with this and i'm still debating between cassandra and
>>> hbase.  So I just want to know if there is a limitation here or not,
>>> as I have no idea when 1.2 support will exist in astyanax.
>>> 
>>> That said, are there other java (scala) libraries that people use to
>>> connect to cassandra that support 1.2?
>>> 
>>> -James-
>>> 
>>> On Thu, Jan 17, 2013 at 8:30 AM,   wrote:
 Ok, I understand that I need to manage both cassandra and hadoop components
 and that pig will use hadoop components to launch its tasks which will use
 Cassandra as the Storage engine.
 
 Thanks
 --
 Cyril SCETBON
 
 On Jan 17, 2013, at 4:03 PM, James Schappet  wrote:
 
 This really depends on how you design your Hadoop Cluster.  The testing I
 have done, had Hadoop and Cassandra Nodes collocated on the same hosts.
 Remember that Pig code runs inside of your hadoop cluster, and connects to
 Cassandra as the Database engine.
 
 
 I have not done any testing with Hive, so someone else will have to answer
 that question.
 
 
 From: 
 Reply-To: 
 Date: Thursday, January 17, 2013 8:58 AM
 To: "user@cassandra.apache.org" 
 Subject: Re: Pig / Map Reduce on Cassandra
 
 Jimmy,
 
 I understand that CFS can replace HDFS for those who use Hadoop. I just 
 want
 to use pig and hive on cassandra. I know that pig samples are provided and
 work now with cassandra natively (they are part of the core). However, does
 it mean that the process will be spread over nodes with
 number_of_mapper=number_of_nodes or something like that ?
 Can Hive connect to Cassandra 1.2 easily too ?
 
 --
 Cyril Scetbon
 
 On Jan 17, 2013, at 2:42 PM, James Schappet  wrote:
 
 CFS is Cassandra File System:
 http://www.datastax.com/dev/blog/cassandra-file-system-design
 
 
 But you don't need CFS to connect from PIG to Cassandra.  The latest
 versions of Cassandra Source ship with examples of connecting from pig to
 cassandra.
 
 
 apache-cassandra-1.2.0-src/examples/pig   --
 http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-src.tar.gz
 
 --Jimmy
 
 
 From: 
 Reply-To: 
 Date: Thursday, January 17, 2013 6:35 AM
 To: "user@cassandra.apache.org" 
 Subject: Re: Pig / Map Reduce on Cassandra
 
 what do you mean ? it's not needed by Pig or Hive to access Cassandra data.
 
 Regards
 
 On Jan 16, 2013, at 11:14 PM, Brandon Williams  wrote:
 
 You won't get CF

Re: Quorum read after quorum write guarantee

> by a multiget will not find the just inserted data.
Can you explain how the data is not found. 
Does it not find new columns or does it return stale columns ? 
If the read is run again does it return the expected value? 

if you are getting stale data double check the the nodes / clients have their 
clocks synchronised. 
If you are doing reads and writes using QUOURM double check that your code is 
correct. If it is provide some more info on what you are seeing. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/03/2013, at 12:50 PM, André Cruz  wrote:

> On Mar 11, 2013, at 5:02 PM, Tyler Hobbs  wrote:
> 
>> What kind of inserts and multiget queries are you running?
> 
> I use the ColumnFamily objects. The pool is initialised with 
> "write_consistency_level=ConsistencyLevel.QUORUM".
> 
> The insert is a regular insert, so the QUORUM is used. When fetching I use:
> 
> CF.multiget(blocks, columns=['size'], 
> read_consistency_level=ConsistencyLevel.QUORUM)
> 
> I have 6 nodes and a RF 3.
> 
> André
> 
>

Re: HintedHandoff IOError?

What version of cassandra are you using?
I would stop each node and delete the hints. If it happens again I could either 
indicate a failing disk or a bug. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/03/2013, at 2:13 PM, Robert Coli  wrote:

> On Mon, Mar 11, 2013 at 7:05 AM, Janne Jalkanen
>  wrote:
>> I keep seeing these in my log.  Three-node cluster, one node is working 
>> fine, but two other nodes have increased latencies and these in the error 
>> logs (might of course be unrelated). No obvious GC pressure, no disk errors 
>> that I can see.  Ubuntu 12.04 on EC2, Java 7. Repair is run regularly.
>> 
>> My two questions: 1) should I worry, and 2) what might be going on, and 3) 
>> is there any way to get rid of these? Can I just blow my HintedHandoff table 
>> to smithereens?
> 
> http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/io/sstable/IndexHelper.java
> "
> public static Filter defreezeBloomFilter(FileDataInput file, long
> maxSize, boolean useOldBuffer) throws IOException
>{
>int size = file.readInt();
>if (size > maxSize || size <= 0)
>throw new EOFException("bloom filter claims to be " + size
> + " bytes, longer than entire row size " + maxSize);
>ByteBuffer bytes = file.readBytes(size);
> "
> 
> Based on the above, I would suspect either a zero byte -Filter.db file
> or a corrupt one. Probably worry a little bit, but only a little bit
> unless your cluster is RF=1.
> 
> =Rob

Re: Row cache off-heap ?