Cassandra as storage for cache data

2013-06-25 Thread Dmitry Olshansky

Hello,

we are using Cassandra as the data store for our caching system. Our 
application generates about 20 put and get requests per second, and the 
average size of one cache item is about 500 KB.


Cache items are placed into one column family with a TTL of 20-60 minutes. 
Keys and values are bytes (not UTF-8 strings). The compaction strategy is 
SizeTieredCompactionStrategy.
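
In CQL3 terms the column family looks roughly like this (a simplified 
sketch; the real names are different):

  -- simplified cache table: blob key/value, size-tiered compaction
  CREATE TABLE cache_items (
    key blob PRIMARY KEY,
    value blob
  ) WITH compaction = {'class': 'SizeTieredCompactionStrategy'};

  -- every put sets its own TTL (20-60 minutes), e.g. one hour here
  INSERT INTO cache_items (key, value) VALUES (0x0102, 0x0a0b0c) USING TTL 3600;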


We set up a Cassandra 1.2.6 cluster of 4 nodes. The replication factor is 2. 
Each node has 10 GB of RAM and enough space on HDD.


Now, when we put this cluster under load, it quickly fills with our runtime 
data (about 5 GB on every node) and we start observing performance 
degradation with frequent timeouts on the client side.


We see that on each node compaction starts very frequently and takes several 
minutes to complete. It seems that each node is usually busy with the 
compaction process.


Here are the questions:

What is the recommended configuration for our use case?

Does it make sense to somehow tell Cassandra to keep all data in memory 
(memtables) to avoid flushing it to disk (sstables), thus decreasing the 
number of compactions? How can we achieve this behavior?


Cassandra is started with the default shell script, which produces the 
following command line:


jsvc.exec -user cassandra -home 
/usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ -pidfile 
/var/run/cassandra.pid -errfile &1 -outfile 
/var/log/cassandra/output.log -cp  
-Dlog4j.configuration=log4j-server.properties 
-Dlog4j.defaultInitOverride=true 
-XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof 
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea 
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M 
-Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+UseTLAB -Djava.net.preferIPv4Stack=true 
-Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
org.apache.cassandra.service.CassandraDaemon


--
Best regards,
Dmitry Olshansky



Re: Cassandra as storage for cache data

2013-06-25 Thread Jeremy Hanna
If you have rapidly expiring data, then tombstones are probably filling your 
disk and your heap (depending on how you order the data on disk).  To check 
whether your queries are affected by tombstones, you might try using the query 
tracing that's built into 1.2.
See:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
  -- has an example of tracing where you can see tombstones affecting the query
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

You'll want to consider reducing the gc_grace period from the default of 10 
days for those column families - with an understanding of why gc_grace exists 
in the first place; see http://wiki.apache.org/cassandra/DistributedDeletes .  
Even once the gc_grace period has passed, tombstones stay around until they are 
compacted away, so there are currently two options to compact them away more 
quickly:
1) use leveled compaction - see 
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction  Leveled 
compaction only requires 10% headroom (as opposed to 50% for size-tiered 
compaction) for the amount of disk that needs to be kept free.
2) if 1 doesn't work and you're still seeing performance degrade and the 
tombstones aren't getting cleared out fast enough, you might consider using 
size-tiered compaction but performing regular major compactions to get rid of 
expired data.

Keep in mind though that if you use a gc_grace of 0 and do any kind of manual 
deletes outside of TTLs, you probably want to do those deletes at 
ConsistencyLevel.ALL; otherwise, if a node goes down and then comes back up, 
there's a chance that deleted data may be resurrected.  That only applies to 
non-TTL data that you delete manually.  See the explanation of distributed 
deletes for more information.
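
For example, in cqlsh that could look something like this (a rough sketch; 
the table name is made up, adjust to your schema):

  -- check whether tombstones dominate a typical read (1.2 query tracing)
  TRACING ON;
  SELECT * FROM cache_items WHERE key = 0x0102;
  TRACING OFF;

  -- shrink gc_grace and switch the cache CF to leveled compaction
  ALTER TABLE cache_items
    WITH gc_grace_seconds = 3600
    AND compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};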

 
On 25 Jun 2013, at 13:31, Dmitry Olshansky  
wrote:

> Hello,
> 
> we are using Cassandra as a data storage for our caching system. Our 
> application generates about 20 put and get requests per second. An average 
> size of one cache item is about 500 Kb.
> 
> Cache items are placed into one column family with TTL set to 20 - 60 
> minutes. Keys and values are bytes (not utf8 strings). Compaction strategy is 
> SizeTieredCompactionStrategy.
> 
> We setup Cassandra 1.2.6 cluster of 4 nodes. Replication factor is 2. Each 
> node has 10GB of RAM and enough space on HDD.
> 
> Now when we're putting this cluster into the load it's quickly fills with our 
> runtime data (about 5 GB on every node) and we start observing performance 
> degradation with often timeouts on client side.
> 
> We see that on each node compaction starts very frequently and lasts for 
> several minutes to complete. It seems that each node usually busy with 
> compaction process.
> 
> Here the questions:
> 
> What are the recommended setup configuration for our use case?
> 
> Is it makes sense to somehow tell Cassandra to keep all data in memory 
> (memtables) to eliminate flushing it to disk (sstables) thus decreasing 
> number of compactions? How to achieve this behavior?
> 
> Cassandra is starting with default shell script that gives the following 
> command line:
> 
> jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ 
> -pidfile /var/run/cassandra.pid -errfile &1 -outfile 
> /var/log/cassandra/output.log -cp  
> -Dlog4j.configuration=log4j-server.properties 
> -Dlog4j.defaultInitOverride=true 
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof 
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea 
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
> -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M -Xmn400M 
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB 
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.authenticate=false 
> org.apache.cassandra.service.CassandraDaemon
> 
> -- 
> Best regards,
> Dmitry Olshansky
> 



Re: NREL has released open source Databus on github for time series data

2013-06-25 Thread Hiller, Dean
When you say aggregates, do you mean converting 1-minute data to 15-minute data, 
or do you mean summing different streams so that you have the total energy 
from energy streams A, B, C, etc.?

P.S. We are working on supporting both. There is a clusterable cron-job 
mechanism in place right now that does some aggregation already, but there is 
another in the works for moving higher-rate data to lower rates.

Dean

From: aaron morton <aa...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, June 24, 2013 9:51 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: NREL has released open source Databus on github for time series 
data

Hi Dean,
Does this handle rollup aggregates along with the time series data ?
I had a quick look at the links and could not see anything.

Cheers
Aaron

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 22/06/2013, at 2:51 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote:

NREL has released their open source databus.  They spin it as energy data (and 
a system for campus energy/building energy) but it is very general right now 
and probably will stay pretty general.  More information can be found here

http://www.nrel.gov/analysis/databus/

The source code can be found here
https://github.com/deanhiller/databus

Star the project if you like the idea.  NREL just did a big press release and 
is developing a community around the project.  It is in its early stages but 
there are users using it and I am helping HP set an instance up this month.  If 
you want to become a committer on the project, let me know as well.

Later,
Dean




Is nested select supported by Cassandra JDBC??

2013-06-25 Thread Tony Anecito
Hi All,

Is nested select supported by Cassandra JDBC driver?

So for a simple example to get a list of user details from a users column 
family:

Select * from user_details where user_id in (Select user_id from users)

Thanks!
-Tony


Re: Is nested select supported by Cassandra JDBC??

2013-06-25 Thread Sylvain Lebresne
No. CQL3 doesn't support nested selects.

--
Sylvain


On Tue, Jun 25, 2013 at 5:02 PM, Tony Anecito  wrote:

> Hi All,
>
> Is nested select supported by Cassandra JDBC driver?
>
> So for a simple example to get a list of user details from a users column
> family:
>
> Select * from user_details where user_id in (Select user_id from users)
>
> Thanks!
> -Tony
>


cassandra-unit 1.2.0.1 is released : CQL3 and Spring

2013-06-25 Thread Jérémy SEVELLEC
Hi all,

Just to let you know that a new release of cassandra-unit is available with
CQL3 dataset support and Spring integration.

More here :
http://www.unchticafe.fr/2013/06/cassandra-unit-1201-is-out-cql3-script.html

Regards,

-- 
Jérémy


Re: Is nested select supported by Cassandra JDBC??

2013-06-25 Thread Tony Anecito
OK. So if I have a composite-key table, then instead of a nested select I will 
have to run 2 queries or else denormalize? Unless there is something provided 
by CQL 3 to do the same thing?

Thanks,
-Tony





 From: Sylvain Lebresne 
To: "user@cassandra.apache.org" ; Tony Anecito 
 
Sent: Tuesday, June 25, 2013 9:06 AM
Subject: Re: Is nested select supported by Cassandra JDBC??
 


No. CQL3 doesn't support nested selects.

--
Sylvain



On Tue, Jun 25, 2013 at 5:02 PM, Tony Anecito  wrote:

> Hi All,
>
> Is nested select supported by Cassandra JDBC driver?
>
> So for a simple example to get a list of user details from a users column
> family:
>
> Select * from user_details where user_id in (Select user_id from users)
>
> Thanks!
> -Tony

Re: Is nested select supported by Cassandra JDBC??

2013-06-25 Thread Sylvain Lebresne
Yes, denormalization is usually the answer to the absence of sub-queries
(and joins, for that matter) in Cassandra, though sometimes simply doing 2
queries is fine; it depends on your use case and performance requirements.
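
For instance, the two-query version of your example would be something like 
this (sketch only, reusing the table names from your mail; the literal ids 
are placeholders):

  -- query 1: fetch the user ids
  SELECT user_id FROM users;

  -- query 2: fetch the details for the ids returned by query 1
  SELECT * FROM user_details WHERE user_id IN (101, 102, 103);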


On Tue, Jun 25, 2013 at 6:46 PM, Tony Anecito  wrote:

> Ok. So if I have a composite key table instead of a nested select I will
> have to run 2 queries else denormalize? Unless there is something provided
> by CQL 3 to do the same thing?
>
> Thanks,
> -Tony
>
>
>   --
>  *From:* Sylvain Lebresne 
> *To:* "user@cassandra.apache.org" ; Tony
> Anecito 
> *Sent:* Tuesday, June 25, 2013 9:06 AM
> *Subject:* Re: Is nested select supported by Cassandra JDBC??
>
> No. CQL3 doesn't support nested selects.
>
> --
> Sylvain
>
>
> On Tue, Jun 25, 2013 at 5:02 PM, Tony Anecito  wrote:
>
> Hi All,
>
> Is nested select supported by Cassandra JDBC driver?
>
> So for a simple example to get a list of user details from a users column
> family:
>
> Select * from user_details where user_id in (Select user_id from users)
>
> Thanks!
> -Tony
>
>
>
>
>


Re: [Cassandra] Replacing a cassandra node with one of the same IP

2013-06-25 Thread Robert Coli
On Mon, Jun 24, 2013 at 8:53 PM, aaron morton  wrote:
>> so I am just wondering if this means the hinted handoffs are also updated to 
>> reflect the new Cassandra node uuid.
> Without checking the code I would guess not.
> Because it would involve a potentially large read / write / delete to create 
> a new row with the same data. And Hinted Handoff is an optimisation.

So are hints to a given UUID discarded after some period of time with
that UUID not present in the cluster? Or might they need to be
manually purged?

=Rob


Re: Problems with node rejoining cluster

2013-06-25 Thread Robert Coli
On Mon, Jun 24, 2013 at 11:19 PM, Arindam Barua  wrote:
> -  We do not specify any tokens in cassandra.yaml relying on
> bootstrap assigning the tokens automatically.

As cassandra.yaml comments state, you should never ever do this in a
real cluster.

I don't know what is causing your underlying issue, but not-specifying
tokens is a strong contender.

=Rob


Re: Counter value becomes incorrect after several dozen reads & writes

2013-06-25 Thread Robert Coli
On Mon, Jun 24, 2013 at 6:42 PM, Josh Dzielak  wrote:
> There is only 1 thread running this sequence, and consistency levels are set
> to ALL. The behavior is fairly repeatable - the unexpectation mutation will
> happen at least 10% of the time I run this program, but at different points.
> When it does not go awry, I can run this loop many thousands of times and
> keep the counter exact. But if it starts happening to a specific counter,
> the counter will never "recover" and will continue to maintain it's
> incorrect value even after successful subsequent writes.

Sounds like a corrupt counter shard. Hard to understand how it can
happen at ALL. If I were you I would file a JIRA including your repro
path...

=Rob


Re: copy data between clusters

2013-06-25 Thread Robert Coli
On Mon, Jun 24, 2013 at 8:35 PM, S C  wrote:
> I have a scenario here. I have a cluster A and cluster B running on
> cassandra 1.1. I need to copy data from Cluster A to Cluster B. Cluster A
> has few keyspaces that I need to copy over to Cluster B. What are my
> options?

http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob


Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-25 Thread sankalp kohli
Your young gen is 1/4 of 1.8G which is 450MB. Also in slice queries, the
co-ordinator will get the results from replicas as per consistency level
used and merge the results before returning to the client.
What is the replication factor of your keyspace, and what consistency level
are you reading with?
Also 55MB on disk will not mean 55MB in memory. The data is compressed on
disk and also there are other overheads.



On Mon, Jun 24, 2013 at 8:38 PM, Mohammed Guller wrote:

>  No deletes. In my test, I am just writing and reading data.
>
>  There is a lot of GC, but only on the younger generation. Cassandra
> terminates before the GC for old generation kicks in.
>
>  I know that our queries are reading an unusual amount of data. However,
> I expected it to throw a timeout exception instead of crashing. Also, don't
> understand why 1.8 Gb heap is getting full when the total data stored in
> the entire Cassandra cluster is less than 55 MB.
>
> Mohammed
>
> On Jun 21, 2013, at 7:30 PM, "sankalp kohli" 
> wrote:
>
>   Looks like you are putting lot of pressure on the heap by doing a slice
> query on a large row.
> Do you have lot of deletes/tombstone on the rows? That might be causing a
> problem.
> Also why are you returning so many columns as once, you can use auto
> paginate feature in Astyanax.
>
>  Also do you see lot of GC happening?
>
>
> On Fri, Jun 21, 2013 at 1:13 PM, Jabbar Azam  wrote:
>
>> Hello Mohammed,
>>
>>  You should increase the heap space. You should also tune the garbage
>> collection so young generation objects are collected faster, relieving
>> pressure on heap We have been using jdk 7 and it uses G1 as the default
>> collector. It does a better job than me trying to optimise the JDK 6 GC
>> collectors.
>>
>>  Bear in mind though that the OS will need memory, so will the row cache
>> and the filing system. Although memory usage will depend on the workload of
>> your system.
>>
>>  I'm sure you'll also get good advice from other members of the mailing
>> list.
>>
>>  Thanks
>>
>> Jabbar Azam
>>
>>
>> On 21 June 2013 18:49, Mohammed Guller  wrote:
>>
>>>  We have a 3-node cassandra cluster on AWS. These nodes are running
>>> cassandra 1.2.2 and have 8GB memory. We didn't change any of the default
>>> heap or GC settings. So each node is allocating 1.8GB of heap space. The
>>> rows are wide; each row stores around 260,000 columns. We are reading the
>>> data using Astyanax. If our application tries to read 80,000 columns each
>>> from 10 or more rows at the same time, some of the nodes run out of heap
>>> space and terminate with OOM error. Here is the error message:
>>>
>>> java.lang.OutOfMemoryError: Java heap space
>>> at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
>>> at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
>>> at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
>>> at org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
>>> at org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
>>> at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
>>> at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
>>> at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
>>> at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
>>> at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
>>> at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
>>> at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
>>> at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
>>> at org.apache.cassandra.db.Table.getRow(Table.java:355)
>>> at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
>>> at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
>>> at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>>> at java.lang.Thread.run(Thread.java:722)
>>>
>>> ERROR 02:14:

Re: Cassandra as storage for cache data

2013-06-25 Thread sankalp kohli
Apart from what Jeremy said, you can try these:
1) Use replication = 1. It is cache data and you don't need persistence.
2) Try playing with the memtable size.
3) Use the Netflix client library, as it will reduce one hop: it will choose
the node with the data as the coordinator.
4) Work on your schema. You might want to have fewer columns in each row.
With fatter rows, the bloom filter will mark more sstables as eligible for a
read.
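
For point 1, that is a one-liner in CQL (the keyspace name here is just a 
placeholder):

  ALTER KEYSPACE cache_ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};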

-Sankalp


On Tue, Jun 25, 2013 at 9:04 AM, Jeremy Hanna wrote:

> If you have rapidly expiring data, then tombstones are probably filling
> your disk and your heap (depending on how you order the data on disk).  To
> check to see if your queries are affected by tombstones, you might try
> using the query tracing that's built-in to 1.2.
> See:
>
> http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
>  -- has an example of tracing where you can see tombstones affecting the
> query
> http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2
>
> You'll want to consider reducing the gc_grace period from the default of
> 10 days for those column families - with the understanding why gc_grace
> exists in the first place, see
> http://wiki.apache.org/cassandra/DistributedDeletes .  Then once the
> gc_grace period has passed, the tombstones will stay around until they are
> compacted away.  So there are two options currently to compact them away
> more quickly:
> 1) use leveled compaction - see
> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction  Leveled
> compaction only requires 10% headroom (as opposed to 50% for size tiered
> compaction) for amount of disk that needs to be kept free.
> 2) if 1 doesn't work and you're still seeing performance degrading and the
> tombstones aren't getting cleared out fast enough, you might consider using
> size tiered compaction but performing regular major compactions to get rid
> of expired data.
>
> Keep in mind though that if you use gc_grace of 0 and do any kind of
> manual deletes outside of TTLs, you probably want to do the deletes at
> ConsistencyLevel.ALL or else if a node goes down, then comes back up,
> there's a chance that deleted data may be resurrected.  That only applies
> to non-ttl data where you manually delete it.  See the explanation of
> distributed deletes for more information.
>
>
> On 25 Jun 2013, at 13:31, Dmitry Olshansky 
> wrote:
>
> > Hello,
> >
> > we are using Cassandra as a data storage for our caching system. Our
> application generates about 20 put and get requests per second. An average
> size of one cache item is about 500 Kb.
> >
> > Cache items are placed into one column family with TTL set to 20 - 60
> minutes. Keys and values are bytes (not utf8 strings). Compaction strategy
> is SizeTieredCompactionStrategy.
> >
> > We setup Cassandra 1.2.6 cluster of 4 nodes. Replication factor is 2.
> Each node has 10GB of RAM and enough space on HDD.
> >
> > Now when we're putting this cluster into the load it's quickly fills
> with our runtime data (about 5 GB on every node) and we start observing
> performance degradation with often timeouts on client side.
> >
> > We see that on each node compaction starts very frequently and lasts for
> several minutes to complete. It seems that each node usually busy with
> compaction process.
> >
> > Here the questions:
> >
> > What are the recommended setup configuration for our use case?
> >
> > Is it makes sense to somehow tell Cassandra to keep all data in memory
> (memtables) to eliminate flushing it to disk (sstables) thus decreasing
> number of compactions? How to achieve this behavior?
> >
> > Cassandra is starting with default shell script that gives the following
> command line:
> >
> > jsvc.exec -user cassandra -home
> /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp  -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M -Xmn400M
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> org.apache.cassandra.service.CassandraDaemon
> >
> > --
> > Best regards,
> > Dmitry Olshansky
> >
>
>


Re: Counter value becomes incorrect after several dozen reads & writes

2013-06-25 Thread Andrew Bialecki
If you can reproduce the invalid behavior 10+% of the time with steps to
repro that take 5-10s/iteration, that sounds extremely interesting for
getting to the bottom of the invalid shard issue (if that's what the root
cause ends up being). Would be very interested in the set up to see if the
behavior can be duplicated.
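
For reference, the schema and loop involved are tiny - roughly this in CQL3 
(names invented; the increments run single-threaded at CL.ALL from the client):

  -- counter table used by the repro loop
  CREATE TABLE page_counts (
    id text PRIMARY KEY,
    hits counter
  );

  -- repeated a few thousand times, reading the value back after each write
  UPDATE page_counts SET hits = hits + 1 WHERE id = 'k1';
  SELECT hits FROM page_counts WHERE id = 'k1';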

Andrew


On Tue, Jun 25, 2013 at 2:18 PM, Robert Coli  wrote:

> On Mon, Jun 24, 2013 at 6:42 PM, Josh Dzielak  wrote:
> > There is only 1 thread running this sequence, and consistency levels are
> set
> > to ALL. The behavior is fairly repeatable - the unexpectation mutation
> will
> > happen at least 10% of the time I run this program, but at different
> points.
> > When it does not go awry, I can run this loop many thousands of times and
> > keep the counter exact. But if it starts happening to a specific counter,
> > the counter will never "recover" and will continue to maintain it's
> > incorrect value even after successful subsequent writes.
>
> Sounds like a corrupt counter shard. Hard to understand how it can
> happen at ALL. If I were you I would file a JIRA including your repro
> path...
>
> =Rob
>


Re: Custom 1.2 Authentication plugin will not work unless user is in system_auth.users column family

2013-06-25 Thread Bao Le
Sorry for not following up on this one in time. I filed a JIRA (5651) and it 
seems user lookup is here to stay.

https://issues.apache.org/jira/browse/CASSANDRA-5651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

On a related note, that column family is, by default, set up to have key cached 
only. It might be a good idea to have its row cached turned on if row cache is 
enabled.


Bao


RE: copy data between clusters

2013-06-25 Thread S C
Bob and Arthur - thanks for your inputs.
I tried sstableloader but ran into the issue below. Could it be anything to do 
with the configuration needed to run sstableloader?
sstableloader -d 10.225.64.2,10.225.64.3 service/context
 INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for service/context/service-context-hf-50: 128 ms.
 INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for service/context/service-context-hf-49: 13 ms.
 INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for service/context/service-context-hf-51: 2 ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db service/context/service-context-hf-49-Data.db service/context/service-context-hf-51-Data.db to [/10.225.64.2, /10.225.64.3]
 INFO 14:43:50,124 Stream context metadata [service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%], 3 sstables.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-51-Data.db to be streamed.
 INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%
 INFO 14:43:50,159 Stream context metadata [service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%], 3 sstables.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,160 Adding file service/context/service-context-hf-51-Data.db to be streamed.
 INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 0%

progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
 WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
 WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
 WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
 WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
 WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
 WARN 14:44:02,309 Failed attempt 3 to connect to /10.225.64.2 to stream service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 0MB/s)]
DEBUG 14:44:18,231 closing with status false
Streaming session to /10.225.64.3 failed
ERROR 14:44:18,236 Error in ThreadPoolExecutor
java.lang.RuntimeException: java.net.SocketException: Invalid argument or cannot assign requested address
    at org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:636)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)

what happens if coordinator node fails during write

2013-06-25 Thread Jiaan Zeng
Hi there,

I am writing data to Cassandra with a Thrift client (not Hector) and
wonder what happens if the coordinator node fails. The same question
applies to the bulk loader, which uses the gossip protocol instead of
the Thrift protocol. In my understanding, hinted handoff only covers the
case where a replica node fails.

Thanks.

--
Regards,
Jiaan


Re: copy data between clusters

2013-06-25 Thread Arthur Zubarev
Hello SC,

whilst most of the sstableloader errors stem from incorrect setups, I suspect 
this time you merely have a connectivity issue, e.g. a firewall blocking traffic.

From: S C 
Sent: Tuesday, June 25, 2013 5:28 PM
To: user@cassandra.apache.org 
Subject: RE: copy data between clusters

Bob and Arthur - thanks for your inputs. 

I tried sstableloader but ran into below issue. Anything to do with the 
configuration to run sstableloader?

sstableloader -d 10.225.64.2,10.225.64.3 service/context
INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for service/context/service-context-hf-50: 
128 ms.
INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for service/context/service-context-hf-49: 
13 ms.
INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for service/context/service-context-hf-51: 2 
ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db 
service/context/service-context-hf-49-Data.db 
service/context/service-context-hf-51-Data.db to [/10.225.64.2, /10.225.64.3]
INFO 14:43:50,124 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 0%], 3 sstables.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-50-Data.db to 
be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-49-Data.db to 
be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-51-Data.db to 
be streamed.
INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 
0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%
INFO 14:43:50,159 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 0%], 3 sstables.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-50-Data.db to 
be streamed.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-49-Data.db to 
be streamed.
DEBUG 14:43:50,160 Adding file service/context/service-context-hf-51-Data.db to 
be streamed.
INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 
0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%

progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:44:02,309 Failed attempt 3 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)]DEBUG 14:44:18,231 closing with status false
Streaming session to /10.225.64.3 failed
ERROR 14:44:1

Re: what happens if coordinator node fails during write

2013-06-25 Thread Andrey Ilinykh
It depends on the Cassandra version. As far as I know, in 1.2 the coordinator
logs the request before it updates replicas; if it fails, it will replay the
log on startup.
In 1.1 you may end up in an inconsistent state, because only part of your
request is propagated to the replicas.

Thank you,
  Andrey


On Tue, Jun 25, 2013 at 5:11 PM, Jiaan Zeng  wrote:

> Hi there,
>
> I am writing data to Cassandra by thrift client (not hector) and
> wonder what happen if the coordinator node fails. The same question
> applies for bulk loader which uses gossip protocol instead of thrift
> protocol. In my understanding, the HintedHandoff only takes care of
> the replica node fails.
>
> Thanks.
>
> --
> Regards,
> Jiaan
>


Re: Date range queries

2013-06-25 Thread Colin Blower
You could just separate the history data from the current data. Then
when the user's result is updated, just write into two tables.

CREATE TABLE all_answers (
  user_id uuid,
  created timeuuid,
  result text,
  question_id varint,
  PRIMARY KEY (user_id, created)
)

CREATE TABLE current_answers (
  user_id uuid,
  question_id varint,
  created timeuuid,
  result text,
  PRIMARY KEY (user_id, question_id)
)


> select * FROM current_answers ;
 user_id                              | question_id | result | created
--------------------------------------+-------------+--------+--------------------------------------
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 |           1 |     no | f9893ee0-ddfa-11e2-b74c-35d7be46b354
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 |           2 |   blah | f7af75d0-ddfa-11e2-b74c-35d7be46b354

> select * FROM all_answers ;
 user_id                              | created                              | question_id | result
--------------------------------------+--------------------------------------+-------------+--------
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f0141234-ddfa-11e2-b74c-35d7be46b354 |           1 |    yes
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f7af75d0-ddfa-11e2-b74c-35d7be46b354 |           2 |   blah
 11b1e59c-ddfa-11e2-a28f-0800200c9a66 | f9893ee0-ddfa-11e2-b74c-35d7be46b354 |           1 |     no

This way you can get the history of answers if you want and there is a
simple way to get the most current answers.
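
The dual write can go out in one round trip, e.g. (sketch, with made-up
values):

  BEGIN BATCH
    INSERT INTO all_answers (user_id, created, result, question_id)
    VALUES (11b1e59c-ddfa-11e2-a28f-0800200c9a66, now(), 'maybe', 1);
    INSERT INTO current_answers (user_id, question_id, created, result)
    VALUES (11b1e59c-ddfa-11e2-a28f-0800200c9a66, 1, now(), 'maybe');
  APPLY BATCH;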

Just a thought.
-Colin B.


On 06/24/2013 03:28 PM, Christopher J. Bottaro wrote:
> Yes, that makes sense and that article helped a lot, but I still have
> a few questions...
>
> The created_at in our answers table is basically used as a version id.
>  When a user updates his answer, we don't overwrite the old answer,
> but rather insert a new answer with a more recent timestamp (the version).
>
> answers
> ---
> user_id | created_at | question_id | result
> ---
>   1 | 2013-01-01 | 1   | yes
>   1 | 2013-01-01 | 2   | blah
>   1 | 2013-01-02 | 1   | no
>
> So the queries we really want to run are "find me all the answers for
> a given user at a given time."  So given the date of 2013-01-02 and
> user_id 1, we would want rows 2 and 3 returned (since rows 3 obsoletes
> row 1).  Is it possible to do this with CQL given the current schema?
>
> As an aside, we can do this in Postgresql using window functions, not
> standard SQL, but pretty neat.
>
> We can alter our schema like so...
>
> answers
> ---
> user_id | start_at | end_at | question_id | result
>
> Where the start_at and end_at denote when an answer is active.  So the
> example above would become:
>
> answers
> ---
> user_id | start_at   | end_at | question_id | result
> 
>   1 | 2013-01-01 | 2013-01-02 | 1   | yes
>   1 | 2013-01-01 | null   | 2   | blah
>   1 | 2013-01-02 | null   | 1   | no
>
> Now we can query "SELECT * FROM answers WHERE user_id = 1 AND start_at
> >= '2013-01-02' AND (end_at < '2013-01-02' OR end_at IS NULL)".
>
> How would one define the partitioning key and cluster columns in CQL
> to accomplish this?  Is it as simple as PRIMARY KEY (user_id,
> start_at, end_at, question_id) (remembering that we sometimes want to
> limit by question_id)?
>
> Also, we are a bit worried about race conditions.  Consider two
> separate processes updating an answer for a given user_id /
> question_id.  There will be a race condition between the two to update
> the correct row's end_at field.  Does that make sense?  I can draw it
> out with ASCII tables, but I feel like this email is already too
> long... :P
>
> Thanks for the help.
>
>
>
> On Wed, Jun 19, 2013 at 2:28 PM, David McNelis  > wrote:
>
> So, if you want to grab by the created_at and occasionally limit
> by question id, that is why you'd use created_at.
>
> The way the primary keys work is the first part of the primary key
> is the Partioner key, that field is what essentially is the single
> cassandra row.  The second key is the order preserving key, so you
> can sort by that key.  If you have a third piece, then that is the
> secondary order preserving key.
>
> The reason you'd want to do (user_id, created_at, question_id) is
> because when you do a query on the keys, if you MUST use the
> preceding pieces of the primary key.  So in your case, you could
> not do a query with just user_id and question_id with the
> user-created-question key.  Alternatively if you went with
> (user_id, question_id, created_at), you would not be able to
> include a range of created_at unless you were also filtering on
> the question_id.
>
> Does that make sense?
>
> As for the large rows, 10k is unlikely to cause you too many
> issues (unless the answer is potentially a big blob o

Re: what happens if coordinator node fails during write

2013-06-25 Thread sankalp kohli
Read this
http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2
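
In short: with a logged batch the coordinator first writes the batch to a
batchlog on other nodes before applying it, so if the coordinator dies
mid-write the batch gets replayed instead of being half-applied. A rough
sketch (table names are placeholders):

  BEGIN BATCH
    INSERT INTO t1 (k, v) VALUES ('a', 'x');
    INSERT INTO t2 (k, v) VALUES ('a', 'x');
  APPLY BATCH;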


On Tue, Jun 25, 2013 at 8:45 PM, Andrey Ilinykh  wrote:

> It depends on cassandra version. As far as I know in 1.2 coordinator logs
> request before it updates replicas. If it fails it will replay log on
> startup.
> In 1.1 you may have inconsistant state, because only part of your request
> is propagated to replicas.
>
> Thank you,
>   Andrey
>
>
> On Tue, Jun 25, 2013 at 5:11 PM, Jiaan Zeng  wrote:
>
>> Hi there,
>>
>> I am writing data to Cassandra by thrift client (not hector) and
>> wonder what happen if the coordinator node fails. The same question
>> applies for bulk loader which uses gossip protocol instead of thrift
>> protocol. In my understanding, the HintedHandoff only takes care of
>> the replica node fails.
>>
>> Thanks.
>>
>> --
>> Regards,
>> Jiaan
>>
>
>


RE: copy data between clusters

2013-06-25 Thread S C
Is there any configuration reference that can help me?

Thanks,
SC

From: arthur.zuba...@aol.com
To: user@cassandra.apache.org
Subject: Re: copy data between clusters
Date: Tue, 25 Jun 2013 20:30:23 -0400

Hello SC,
 
whilst most of the sstableloader errors stem from incorrect setups I 
suspect this time you merely have a connectivity issue e.g. a firewall blocking 
traffic.


 

From: S C 
Sent: Tuesday, June 25, 2013 5:28 PM
To: user@cassandra.apache.org 
Subject: RE: copy data between clusters
 

Bob and Arthur - thanks for your inputs. 
 
I tried sstableloader but ran into below issue. Anything to do with the 
configuration to run sstableloader?
 

sstableloader -d 10.225.64.2,10.225.64.3 service/context
INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 
bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for 
service/context/service-context-hf-50: 128 ms.
INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 
bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for 
service/context/service-context-hf-49: 13 ms.
INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 
bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for 
service/context/service-context-hf-51: 2 ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db 
service/context/service-context-hf-49-Data.db 
service/context/service-context-hf-51-Data.db to [/10.225.64.2, 
/10.225.64.3]
INFO 14:43:50,124 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 
0%], 3 sstables.
DEBUG 14:43:50,124 Adding file 
service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file 
service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,124 Adding file 
service/context/service-context-hf-51-Data.db to be streamed.
INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db 
sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db 
sections=1 progress=0/164863 - 0%
INFO 14:43:50,159 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 
0%], 3 sstables.
DEBUG 14:43:50,159 Adding file 
service/context/service-context-hf-50-Data.db to be streamed.
DEBUG 14:43:50,159 Adding file 
service/context/service-context-hf-49-Data.db to be streamed.
DEBUG 14:43:50,160 Adding file 
service/context/service-context-hf-51-Data.db to be streamed.
INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db 
sections=1 progress=0/6703 - 0%,service/context/service-context-hf-50-Data.db 
sections=1 progress=0/164863 - 0%
 
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s 
(avg: 0MB/s)] WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to 
stream service/context/service-context-hf-49-Data.db sections=1 
progress=0/7688939 - 0%. Retrying in 4000 ms. (java.net.SocketException: 
Invalid 
argument or cannot assign requested address)
WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s 
(avg: 0MB/s)] WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to 
stream service/context/service-context-hf-49-Data.db sections=1 
progress=0/7688939 - 0%. Retrying in 8000 ms. (java.net.SocketException: 
Invalid 
argument or cannot assign requested address)
WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s 
(avg: 0MB/s)] WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to 
stream service/context/service-context-hf-49-Data.db sections=1 
progress=0/7688939 - 0%. Retrying in 16000 ms. (java.net.SocketException: 
Invalid argument or cannot assign requested address)
WARN 14:44:02,309 Failed attempt 3 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 16000 ms. (java.net.SocketException: Invalid argume

Re: copy data between clusters

2013-06-25 Thread Arthur Zubarev
This is the best reference I have seen so far: 
http://www.datastax.com/dev/blog/bulk-loading
But I must say it is not updated to match the most recent changes in C*. I 
suggest you read through the comments, too.

From: S C 
Sent: Tuesday, June 25, 2013 10:23 PM
To: user@cassandra.apache.org 
Subject: RE: copy data between clusters

Is there any configuration reference that help me?

Thanks, 
SC





From: arthur.zuba...@aol.com
To: user@cassandra.apache.org
Subject: Re: copy data between clusters
Date: Tue, 25 Jun 2013 20:30:23 -0400


Hello SC,

whilst most of the sstableloader errors stem from incorrect setups I suspect 
this time you merely have a connectivity issue e.g. a firewall blocking traffic.

From: S C 
Sent: Tuesday, June 25, 2013 5:28 PM
To: user@cassandra.apache.org 
Subject: RE: copy data between clusters

Bob and Arthur - thanks for your inputs. 

I tried sstableloader but ran into below issue. Anything to do with the 
configuration to run sstableloader?

sstableloader -d 10.225.64.2,10.225.64.3 service/context
INFO 14:43:49,937 Opening service/context/service-context-hf-50 (164863 bytes)
DEBUG 14:43:50,063 INDEX LOAD TIME for service/context/service-context-hf-50: 
128 ms.
INFO 14:43:50,063 Opening service/context/service-context-hf-49 (7688939 bytes)
DEBUG 14:43:50,076 INDEX LOAD TIME for service/context/service-context-hf-49: 
13 ms.
INFO 14:43:50,076 Opening service/context/service-context-hf-51 (6703 bytes)
DEBUG 14:43:50,078 INDEX LOAD TIME for service/context/service-context-hf-51: 2 
ms.
Streaming revelant part of service/context/service-context-hf-50-Data.db 
service/context/service-context-hf-49-Data.db 
service/context/service-context-hf-51-Data.db to [/10.225.64.2, /10.225.64.3]
INFO 14:43:50,124 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 0%], 3 sstables.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-50-Data.db to 
be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-49-Data.db to 
be streamed.
DEBUG 14:43:50,124 Adding file service/context/service-context-hf-51-Data.db to 
be streamed.
INFO 14:43:50,136 Streaming to /10.225.64.2
DEBUG 14:43:50,144 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 
0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%
INFO 14:43:50,159 Stream context metadata 
[service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%, service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 
- 0%, service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 
- 0%], 3 sstables.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-50-Data.db to 
be streamed.
DEBUG 14:43:50,159 Adding file service/context/service-context-hf-49-Data.db to 
be streamed.
DEBUG 14:43:50,160 Adding file service/context/service-context-hf-51-Data.db to 
be streamed.
INFO 14:43:50,160 Streaming to /10.225.64.3
DEBUG 14:43:50,160 Files are service/context/service-context-hf-49-Data.db 
sections=1 progress=0/7688939 - 
0%,service/context/service-context-hf-51-Data.db sections=1 progress=0/6703 - 
0%,service/context/service-context-hf-50-Data.db sections=1 progress=0/164863 - 
0%

progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:43:50,225 Failed attempt 1 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:43:50,241 Failed attempt 1 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 4000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:43:54,227 Failed attempt 2 to connect to /10.225.64.3 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
WARN 14:43:54,244 Failed attempt 2 to connect to /10.225.64.2 to stream 
service/context/service-context-hf-49-Data.db sections=1 progress=0/7688939 - 
0%. Retrying in 8000 ms. (java.net.SocketException: Invalid argument or cannot 
assign requested address)
progress: [/10.225.64.2 0/3 (0)] [/10.225.64.3 0/3 (0)] [total: 0 - 0MB/s (avg: 
0MB/s)] WARN 14:44:02,229 Failed attempt 3 to connect to /10.225.64.3 to stream 
service/context

Re: Cassandra terminates with OutOfMemory (OOM) error

2013-06-25 Thread Mohammed Guller
Replication is 3 and the read consistency level is ONE. One of the 
non-coordinator nodes is crashing, so the OOM is happening before aggregation 
of the data to be returned.

Thanks for the info about the space allocated to young generation heap. That is 
helpful.

Mohammed

On Jun 25, 2013, at 1:28 PM, "sankalp kohli" <kohlisank...@gmail.com> wrote:

Your young gen is 1/4 of 1.8G which is 450MB. Also in slice queries, the 
co-ordinator will get the results from replicas as per consistency level used 
and merge the results before returning to the client.
What is the replication in your keyspace and what consistency you are reading 
with.
Also 55MB on disk will not mean 55MB in memory. The data is compressed on disk 
and also there are other overheads.



On Mon, Jun 24, 2013 at 8:38 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
No deletes. In my test, I am just writing and reading data.

There is a lot of GC, but only on the younger generation. Cassandra terminates 
before the GC for old generation kicks in.

I know that our queries are reading an unusual amount of data. However, I 
expected it to throw a timeout exception instead of crashing. Also, don't 
understand why 1.8 Gb heap is getting full when the total data stored in the 
entire Cassandra cluster is less than 55 MB.

Mohammed

On Jun 21, 2013, at 7:30 PM, "sankalp kohli" <kohlisank...@gmail.com> wrote:

Looks like you are putting lot of pressure on the heap by doing a slice query 
on a large row.
Do you have lot of deletes/tombstone on the rows? That might be causing a 
problem.
Also why are you returning so many columns as once, you can use auto paginate 
feature in Astyanax.

Also do you see lot of GC happening?


On Fri, Jun 21, 2013 at 1:13 PM, Jabbar Azam <aja...@gmail.com> wrote:
Hello Mohammed,

You should increase the heap space. You should also tune the garbage collection 
so young generation objects are collected faster, relieving pressure on heap We 
have been using jdk 7 and it uses G1 as the default collector. It does a better 
job than me trying to optimise the JDK 6 GC collectors.

Bear in mind though that the OS will need memory, so will the row cache and the 
filing system. Although memory usage will depend on the workload of your system.

I'm sure you'll also get good advice from other members of the mailing list.

Thanks

Jabbar Azam


On 21 June 2013 18:49, Mohammed Guller <moham...@glassbeam.com> wrote:
We have a 3-node cassandra cluster on AWS. These nodes are running cassandra 
1.2.2 and have 8GB memory. We didn't change any of the default heap or GC 
settings. So each node is allocating 1.8GB of heap space. The rows are wide; 
each row stores around 260,000 columns. We are reading the data using Astyanax. 
If our application tries to read 80,000 columns each from 10 or more rows at 
the same time, some of the nodes run out of heap space and terminate with OOM 
error. Here is the error message:

java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.duplicate(HeapByteBuffer.java:107)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:50)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:60)
at 
org.apache.cassandra.db.marshal.AbstractCompositeType.split(AbstractCompositeType.java:126)
at 
org.apache.cassandra.db.filter.ColumnCounter$GroupByPrefix.count(ColumnCounter.java:96)
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:164)
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:136)
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:84)
at 
org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:294)
at 
org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65)
at 
org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1363)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1220)
at 
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1132)
at org.apache.cassandra.db.Table.getRow(Table.java:355)
at 
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:70)
   at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1052)
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1578)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

ERROR 02:14:05,351 Exception in thread Thread[Thrift:6,5,main]
java.lang.OutOfMemoryErr

Re: Heap is not released and streaming hangs at 0%

2013-06-25 Thread aaron morton
> bloom_filter_fp_chance value that was changed from default to 0.1, looked at 
> the filters and they are about 2.5G on disk and I have around 8G of heap.
> I will try increasing the value to 0.7 and report my results. 
You need to re-write the sstables on disk using nodetool upgradesstables. 
Otherwise only the new sstables will have the 0.1 setting. 
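e.g. (keyspace and column family names are placeholders; depending on your version 
you may need the -a / --include-all-sstables flag, or nodetool scrub, to force 
rewriting sstables that are already on the current format):

nodetool -h localhost upgradesstables MyKeyspace MyColumnFamily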

> I will try increasing the value to 0.7 and report my results. 
No need to, it will probably be something like "Oh no, really, what, how, 
please make it stop" :)
0.7 will mean reads will hit most / all of the SSTables for the CF. 

I covered a high row-count situation in one of my talks at the summit this month; 
the slide deck is here 
http://www.slideshare.net/aaronmorton/cassandra-sf-2013-in-case-of-emergency-break-glass
 and the videos will soon be up at Planet Cassandra. 

Rebuild the sstables, then reduce the index_interval if you still need to 
reduce mem pressure. 
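
For reference, index_interval lives in cassandra.yaml in 1.2; raising it shrinks the 
index samples held on heap at the cost of slightly slower reads (the value below is 
only an example, not a recommendation):

# cassandra.yaml
index_interval: 256    # default is 128; larger = less heap used for index samples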
 
Cheers


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com


On 22/06/2013, at 1:17 PM, sankalp kohli  wrote:

> I will take a heap dump and see whats in there rather than guessing. 
> 
> 
> On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot  wrote:
> bloom_filter_fp_chance = 0.7 is probably way too large to be effective and 
> you'll probably have issues compacting deleted rows and get poor read 
> performance with a value that high.  I'd guess that anything larger than 0.1 
> might as well be 1.0.
> 
> -Bryan
> 
> 
> 
> On Fri, Jun 21, 2013 at 5:58 AM, srmore  wrote:
> 
> On Fri, Jun 21, 2013 at 2:53 AM, aaron morton  wrote:
>> > nodetool -h localhost flush didn't do much good.
> Do you have 100's of millions of rows ?
> If so see recent discussions about reducing the bloom_filter_fp_chance and 
> index_sampling. 
> Yes, I have 100's of millions of rows. 
>  
> 
> If this is an old schema you may be using the very old setting of 0.000744 
> which creates a lot of bloom filters. 
> 
> bloom_filter_fp_chance value that was changed from default to 0.1, looked at 
> the filters and they are about 2.5G on disk and I have around 8G of heap.
> I will try increasing the value to 0.7 and report my results. 
> 
> It also appears to be a case of hard GC failure (as Rob mentioned): the 
> heap is never released, even after 24+ hours of idle time, and the JVM needs to 
> be restarted to reclaim the heap.
> 
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 20/06/2013, at 6:36 AM, Wei Zhu  wrote:
> 
>> If you want, you can try to force the GC through Jconsole. Memory->Perform 
>> GC.
>> 
>> It theoretically triggers a full GC; exactly when it happens depends on the 
>> JVM.
>> 
>> -Wei
>> 
>> From: "Robert Coli" 
>> To: user@cassandra.apache.org
>> Sent: Tuesday, June 18, 2013 10:43:13 AM
>> Subject: Re: Heap is not released and streaming hangs at 0%
>> 
>> On Tue, Jun 18, 2013 at 10:33 AM, srmore  wrote:
>> > But then shouldn't the JVM GC it eventually? I can still see Cassandra alive
>> > and kicking, but it looks like the heap is locked up even after the traffic is
>> > long stopped.
>> 
>> No, when GC system fails this hard it is often a permanent failure
>> which requires a restart of the JVM.
>> 
>> > nodetool -h localhost flush didn't do much good.
>> 
>> This adds support to the idea that your heap is too full, and not full
>> of memtables.
>> 
>> You could try nodetool -h localhost invalidatekeycache, but that
>> probably will not free enough memory to help you.
>> 
>> =Rob
> 
> 
> 
> 



Re: Cassandra 1.0.9 Performance

2013-06-25 Thread aaron morton
> serving a load of approximately 600GB
is that 600GB in the cluster or 600GB per node? 
In pre-1.2 days we recommended around 300GB to 500GB per node with spinning disks 
and 1GbE networking. It's a soft rule of thumb, not a hard rule. Above that size, 
repair and replacing a failed node can take a long time.
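
(One thing that helps keep repairs bounded at that size is running them per node 
against the node's primary range only, e.g. nodetool -h <node> repair -pr <keyspace>, 
so each node's repair covers its own slice of the ring rather than every replica 
range it holds.)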
 
> Does anyone have CPU/memory/network graphs (e.g. Cacti) over the last 1-2 
> months they are willing to share of their Cassandra database nodes?
If you can share yours and any specific concerns you may have, we may be able to 
help. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/06/2013, at 1:14 PM, G man  wrote:

> Hi All,
> 
> We are running a 1.0.9 cluster with 3 nodes (RF=3) serving a load of 
> approximately 600GB, and since I am fairly new to Cassandra, I'd like to 
> compare notes with other people running a cluster of similar size (perhaps 
> not in the amount of data, but the number of nodes).
> 
> Does anyone have CPU/memory/network graphs (e.g. Cacti) over the last 1-2 
> months they are willing to share of their Cassandra database nodes?
> 
> Just trying to compare our patterns with others to see if they are "normal".
> 
> Thanks in advance.
> G



Re: about FlushWriter "All time blocked"

2013-06-25 Thread aaron morton
> FlushWriter                   0         0         191         0                12

This means there were 12 times the code wanted to put a memtable in the queue 
to be flushed to disk but the queue was full. 

The length of this queue is controlled by the memtable_flush_queue_size 
https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L299 
and memtable_flush_writers .

When this happens an internal lock around the commit log is held which prevents 
writes from being processed. 

In general it means the IO system cannot keep up. It can sometimes happen when 
snapshot is used, as all the CFs are flushed to disk at once. I also suspect it 
happens sometimes when a commit log segment is flushed and there are a lot of 
dirty CFs, but I've never proved it. 

Increase memtable_flush_queue_size following the help in the yaml file. If you 
do not use secondary indexes, are you using snapshot?
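
(For reference, both settings live in cassandra.yaml; the values below are only 
illustrative, see the comments in the file for the constraints around secondary 
indexes:

memtable_flush_writers: 2      # defaults to the number of data directories
memtable_flush_queue_size: 8   # default is 4
)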

Hope that helps. 
A
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 24/06/2013, at 3:41 PM, yue.zhang  wrote:

> 3 nodes
> CentOS
> 8-core CPU, 32GB memory
> Cassandra 1.2.5
> My scenario: many counter increments; every node has one client program, and 
> performance is 400 writes per second per client (it's very slow).
>  
> My question:
> nodetool tpstats
> -
> Pool Name                Active   Pending   Completed   Blocked   All time blocked
> ReadStage                     0         0        8453         0                  0
> RequestResponseStage          0         0   138303982         0                  0
> MutationStage                 0         0   172002988         0                  0
> ReadRepairStage               0         0           0         0                  0
> ReplicateOnWriteStage         0         0    82246354         0                  0
> GossipStage                   0         0     1052389         0                  0
> AntiEntropyStage              0         0           0         0                  0
> MigrationStage                0         0           0         0                  0
> MemtablePostFlusher           0         0         670         0                  0
> FlushWriter                   0         0         191         0                 12
> MiscStage                     0         0           0         0                  0
> commitlog_archiver            0         0           0         0                  0
> InternalResponseStage         0         0           0         0                  0
> HintedHandoff                 0         0          56         0                  0
> ---
> FlushWriter "All time blocked" = 12. I restarted the node, but it did not help. Is 
> this normal?
>  
> thx
>  
> -heipark
>  
>  



Re: Heap is not released and streaming hangs at 0%

2013-06-25 Thread sulong
I also encountered a similar problem. I dumped the JVM heap and analysed it with
Eclipse MAT. The Eclipse plugin told me there are 10334 instances of
SSTableReader, consuming 6.6G of memory. I found the CompactionExecutor thread
held 8000+ SSTableReader objects. I wonder why there are so many
SSTableReader instances in memory.
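
(For anyone wanting to repeat that analysis: a heap dump suitable for Eclipse MAT can
be taken with e.g. jmap -dump:live,format=b,file=/tmp/cassandra.hprof <cassandra-pid>,
or captured automatically on OOM via -XX:+HeapDumpOnOutOfMemoryError, which the
default startup flags already include.)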

