Re: Hbase vs Cassandra

2015-06-08 Thread Ajay
Hi All,

Thanks for all the input. I posted the same question in the HBase forum and got
more responses.

Posting the consolidated list here.

Our case is that a central team builds and maintains the platform (Cassandra
as a service). We have a couple of use cases which fit Cassandra well, such as
time-series data. But as a platform team, we need to know more about the
features and use cases that fit, or are best handled by, Cassandra, and also to
understand the use cases where HBase performs better (we might need to offer it
as a service too).

*Cassandra:*

1) From 2013, but it can still be relevant:
http://www.pythian.com/blog/watch-hbase-vs-cassandra/

2) Here are some use cases from PlanetCassandra.org of companies who chose
Cassandra over HBase after evaluation, or migrated to Cassandra from HBase.
The eComNext interview cited on the page touches on time-series data:
http://planetcassandra.org/hbase-to-cassandra-migration/

3) From googling, the most commonly cited advantages of Cassandra over HBase
are that it is easier to deploy, maintain, and monitor, and that it has no
single point of failure.

4) From our six months of research and POC experience with Cassandra, CQL is
pretty limited. Although CQL is targeted at real-time reads and writes, there
are cases where we need to pull data out differently and are OK with a little
more latency, but Cassandra doesn't support that directly; we need MapReduce or
Spark for those queries (a minimal sketch follows). Then the debate starts: why
Cassandra and not HBase, if we need Hadoop/Spark for MapReduce anyway?
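
To make the limitation in #4 concrete, here is a minimal sketch using the
DataStax Java driver; the keyspace, table, and column names are hypothetical:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CqlQuerySketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("metrics");   // hypothetical keyspace

        // Fast path: the query is restricted by the partition key, so CQL serves it
        // directly from the right replicas.
        ResultSet byKey = session.execute(
                "SELECT ts, value FROM readings WHERE sensor_id = ? AND day = ?",
                "sensor-42", "2015-06-08");
        for (Row row : byKey) {
            System.out.println(row.getDate("ts") + " " + row.getDouble("value"));
        }

        // Ad-hoc path: filtering on a non-key column is rejected by CQL unless you add
        // ALLOW FILTERING, and even then it means a full scan -- this is the kind of
        // query the thread suggests pushing to MapReduce or Spark instead.
        // session.execute("SELECT * FROM readings WHERE value > 100 ALLOW FILTERING");

        cluster.close();
    }
}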

I expected a few more technical features/use cases that are best handled by
Cassandra (and how they work).

*HBase:*

1) As for #4, you might be interested in reading
https://aphyr.com/posts/294-call-me-maybe-cassandra
I'm not sure if there is a comparable article about HBase (does anybody know of
one?), but it can give you another perspective on what else to keep an eye on
regarding these systems.

2) See http://hbase.apache.org/book.html#perf.network.call_me_maybe

3) http://blog.parsely.com/post/1928/cass/
*Anyone have any comments on this?*

4) 1. No killer features compared to HBase.
2. Management tooling is terrible; Ambari/Cloudera Manager rule. Netflix has
its own tool for Cassandra, but it doesn't support vnodes.
3. Rumor says it's fast when it works ;) The reason: it can silently drop data
you try to write.
4. Time series is a nightmare. The easiest approach is to just replicate the
data to HDFS, partition it by hour/day, and run Spark/Scalding/Pig/Hive/Impala.

5) Migrated from Cassandra to HBase.
Reasons:
Scan is fast with HBase. It fits the time-series data model better; please
look at OpenTSDB. Cassandra models it with large rows.
Server-side filtering: you can use it to filter some of your time-series data
on the server side.
HBase has better integration with Hadoop in general. We had to write our own
bulk loader using MapReduce for Cassandra; HBase already has a tool for that.
There is also nice integration with Flume and Kite.
High availability didn't matter for us; 10 seconds down is fine for our use
cases. HBase has started to support eventually consistent reads.

6) Coprocessor framework (custom code inside the RegionServers and Master
servers), which Cassandra is missing, afaik.
   Coprocessors have been widely used by HBase users (Phoenix SQL, for
example) since their inception (in 0.92).
* The HBase security model is more mature and aligns well with Hadoop/HDFS
security. Cassandra provides just basic authentication/authorization/SSL
encryption: no Kerberos, no end-to-end data encryption,
no cell-level security.

7) Another point to add is the new "HBase read high-availability using
timeline-consistent region replicas" feature from HBase 1.0 onward, which
brings HBase closer to Cassandra in terms of read availability during
node failures. You have a choice for read availability now.
https://issues.apache.org/jira/browse/HBASE-10070

8) HBase can do range scans, and one can attack many problems with range
scans. Cassandra can't do range scans.

9) HBase is a distributed, consistent, sorted key-value store. The "sorted"
bit allows for range scans in addition to the point gets that all K/V
stores support. Nothing more, nothing less.
It happens to store its data in HDFS by default, and we provide convenient
input and output formats for MapReduce.
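
As an illustration of the "sorted" point in #9, a minimal HBase client sketch
of a bounded range scan; the table name and rowkey layout are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeScanSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("readings"))) {
            // Rows are kept sorted by rowkey, so a scan can be bounded by a
            // [startRow, stopRow) range: point gets plus range scans.
            Scan scan = new Scan(Bytes.toBytes("sensor-42#20150601"),
                                 Bytes.toBytes("sensor-42#20150609"));
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}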

*Neutral:*
1)
http://khangaonkar.blogspot.com/2013/09/cassandra-vs-hbase-which-nosql-store-do.html

2) The fundamental differences that come to mind are:
* HBase is always consistent. Machine outages lead to an inability to read or
write the data on that machine. With Cassandra you can always write.

* Cassandra defaults to a random partitioner, so range scans are not
possible (by default).
* HBase has a range partitioner (if you don't want that, the client has to
prefix the rowkey with a prefix of a hash of the rowkey; see the sketch after
this list). The main feature that sets HBase apart is range scans.

* HBase is much more tightly integrated with Hadoop/MapReduce/HDFS, etc.
You can MapReduce directly into HFiles and map those into HBase instantly.

* Cassandra has a dedicated company supporting (and promoting) it.
* Getting s
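
A minimal sketch of the rowkey hash-prefixing mentioned in the partitioner
point above; the prefix scheme (two hex characters of an MD5 digest) and the
key format are illustrative assumptions:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SaltedRowKey {
    // Prepend a short hash prefix so writes with naturally ordered keys spread
    // across regions instead of hammering one "hot" region; range scans then
    // have to be issued once per prefix bucket.
    static String salted(String rowKey) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(rowKey.getBytes(StandardCharsets.UTF_8));
        String prefix = String.format("%02x", digest[0] & 0xff);  // 256 buckets
        return prefix + "#" + rowKey;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(salted("sensor-42#20150608T120000"));
    }
}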

Re: Hbase vs Cassandra

2015-06-08 Thread Jens Rantil
Hi,

Some minor comments:

> 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
for Cassandra but it doesn't support vnodes.

Not entirely sure what you mean here, but we ran Cloudera for a while and
Cloudera Manager was buggy and hard to debug. Overall, our experience
wasn't very good. This was definitely also due to us not knowing how all
the Cloudera packages were configured.

> HBase is always consistent. Machine outages lead to inability to read or
write data on that machine. With Cassandra you can always write.

Sort of true. You can choose a write consistency level and have an exception
thrown if the write didn't go through at that consistency. However, do note
that Cassandra will never roll back failed writes, which means writes aren't
atomic (as in ACID); a sketch of this follows.
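
To illustrate this, a minimal sketch with the DataStax Java driver, assuming a
hypothetical users table: a failed QUORUM write raises an exception, but the
replicas that did receive the write keep it, and nothing is rolled back:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class QuorumWriteSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("app");   // hypothetical keyspace

        SimpleStatement insert = new SimpleStatement(
                "INSERT INTO users (id, email) VALUES (42, 'a@example.com')");
        insert.setConsistencyLevel(ConsistencyLevel.QUORUM);

        try {
            session.execute(insert);
        } catch (WriteTimeoutException e) {
            // The coordinator did not hear from a quorum in time, but the replicas
            // that did receive the write keep it: nothing is rolled back, so the
            // cluster can stay inconsistent until read repair or anti-entropy
            // repair reconciles it.
            System.err.println("QUORUM not satisfied: " + e.getMessage());
        } finally {
            cluster.close();
        }
    }
}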

We chose Cassandra over HBase mostly due to ease of manageability. We are a
small team, and my feeling is that you will want dedicated people taking
care of a Hadoop cluster if you are going down the HBase path. A Cassandra
cluster can be handled by a single engineer and is, in my opinion, easier
to maintain.

Cheers,
Jens


Ghost compaction process

2015-06-08 Thread Arturas Raizys
Hello,

I'm having a problem where, on one node, I have a continuous compaction
process running and consuming CPU. nodetool tpstats shows 1 compaction in
progress, but if I query the system.compactions_in_progress table, I
see 0 records. This never-ending compaction slows the node down and it
becomes laggy.
I'm willing to hire a contractor to solve this problem if anyone is
interested.


Cheers,
Arturas


Re: Hbase vs Cassandra

2015-06-08 Thread Ajay
Hi Jens,

All the points listed weren't from me. I posted the HBase vs Cassandra
question in both forums and consolidated the responses here for discussion.


On Mon, Jun 8, 2015 at 2:27 PM, Jens Rantil  wrote:

> Hi,
>
> Some minor comments:
>
> > 2.terrible!!! Ambari/cloudera manager rulezzz. Netflix has its own tool
> for Cassandra but it doesn't support vnodes.
>
> Not entirely sure what you mean here, but we ran Cloudera for a while and
> Cloudera Manager was buggy and hard to debug. Overall, our experience
> wasn't very good. This was definitely also due to us not knowing how all
> the Cloudera packages were configured.
>

*>>> This is one of the responses I got from the HBase forum. DataStax
OpsCenter is there, but it seems it doesn't support the latest Cassandra
versions (we tried it a couple of times and there were bugs too)*

>
> > HBase is always consistent. Machine outages lead to inability to read
> or write data on that machine. With Cassandra you can always write.
>
> Sort of true. You can decide write consistency and throw an exception if
> write didn't go through consistently. However, do note that Cassandra will
> never rollback failed writes which means writes aren't atomic (as in ACID).
>
*>>> If I understand correctly, you mean that when we write with QUORUM,
Cassandra writes to some machines, fails to write to others, and throws an
exception if it doesn't satisfy QUORUM, leaving the data inconsistent without
rolling back?*


> We chose Cassandra over HBase mostly due to ease of managability. We are a
> small team, and my feeling is that you will want dedicated people taking
> care of a Hadoop cluster if you are going down the HBase path. A Cassandra
> cluster can be handled by a single engineer and is, in my opinion, easier
> to maintain.
>

*>>> This is the most popular reason for choosing Cassandra over HBase, but
this alone is not a sufficient driver.*



Re: Ghost compaction process

2015-06-08 Thread Tim Heckman
Does `nodetool compactionstats` show nothing running as well? Also, for
posterity, what are some details of the setup (C* version, etc.)?

-Tim

--
Tim Heckman
Operations Engineer
PagerDuty, Inc.




Re: Ghost compaction process

2015-06-08 Thread Arturas Raizys
Hi,

> Does `nodetool compactionstats` show nothing running as well? Also, for
> posterity, what are some details of the setup (C* version, etc.)?

`nodetool compactionstats` does not return anything; it just waits.
If I enable DEBUG logging, I see this line popping up while executing
`nodetool compactionstats`:
DEBUG [RMI TCP Connection(1856)-127.0.0.1] 2015-06-08 09:29:46,043
LeveledManifest.java:680 - Estimating [0, 0, 0, 0, 0, 0, 0, 0, 0]
compactions to do for system.paxos

I'm running Cassandra 2.1.14 with a 7-node cluster. We're using small VMs
with 8GB of RAM and SSDs. Our data size per node with RF=2 is ~40GB. Load is
~1000 writes/second. Most of the data has a TTL of 2 weeks.


Cheers,
Arturas


Re: Ghost compaction process

2015-06-08 Thread Carlos Rolo
HI,

Is it 2.0.14 or 2.1.4? If you are on 2.1.4 I would recommend an upgrade to
2.1.5 regardless of that issue.

From the data you provide it is difficult to assess what the issue is. If
you are running with RF=2 you can always add another node and kill that one,
if that is the only node that shows the problem. With 40GB of load that is not
a big issue.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant

Pythian - Love your data

rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649
www.pythian.com

Re: Hbase vs Cassandra

2015-06-08 Thread Jens Rantil
On Mon, Jun 8, 2015 at 11:16 AM, Ajay  wrote:

> >>> If I understand correctly, you mean that when we write with QUORUM,
> Cassandra writes to some machines, fails to write to others, and throws an
> exception if it doesn't satisfy QUORUM, leaving the data inconsistent without
> rolling back?


Yes.

/Jens


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

Facebook | LinkedIn | Twitter


Re: Avoiding Data Duplication

2015-06-08 Thread Paulo Motta
Some options I can think of:

1 - depending on your data size and stime query frequency, you may use
spark to peform queries filtering by server time in the log table, maybe
within an device time window to reduce the dataset your spark job will need
to go through. more info on the spark connector:
https://github.com/datastax/spark-cassandra-connector

2 - if dtime and stime are almost always in the same date bucket
(day/hour/minute/second), you may create an additional table stime_log
with the same structure, but where the date bucket refers to the sdate field
(the date taken from stime). So, when you have an entry whose stime and dtime
are not in the same bucket, you insert that entry in both the log and stime_log
tables. When you want to query entries by stime, you take the distinct union of
the query results from both tables in your client application. This way, you
only duplicate delayed data.

3 - if your "data" field is big and you can't afford duplicating it, create
an additional table stime_log, but do not store the data field, only the
metadata (imei, date, dtime, stime). So when you want to query by stime, first
query stime_log, and then query the original log table to fetch the data field
(see the sketch below).
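
A minimal sketch of option 3's two-step lookup with the 2.x DataStax Java
driver, assuming stime_log is partitioned by (imei, sdate) and clustered by
stime; all identifiers and values are illustrative:

import java.util.Date;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class StimeLookupSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("tracking");   // hypothetical keyspace

        Date to = new Date();
        Date from = new Date(to.getTime() - 3600 * 1000L);  // last hour of server time
        String imei = "356938035643809";                    // hypothetical device
        String sdate = "2015-06-08";                         // bucket containing the window

        // Step 1: find the device-time coordinates of entries whose server time
        // falls in the window, using the metadata-only stime_log table.
        ResultSet meta = session.execute(
                "SELECT date, dtime FROM stime_log"
                + " WHERE imei = ? AND sdate = ? AND stime >= ? AND stime < ?",
                imei, sdate, from, to);

        // Step 2: fetch the full payload from the original log table by primary key.
        for (Row entry : meta) {
            Row full = session.execute(
                    "SELECT data FROM log WHERE imei = ? AND date = ? AND dtime = ?",
                    imei, entry.getString("date"), entry.getDate("dtime")).one();
            if (full != null) {
                System.out.println(full.getString("data"));
            }
        }
        cluster.close();
    }
}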

2015-06-05 18:10 GMT-03:00 Abhishek Singh Bailoo <
abhishek.singh.bai...@gmail.com>:

> Hello!
>
> I have a column family to log in data coming from my GPS devices.
>
> CREATE TABLE log(
>   imei ascii,
>   date ascii,
>   dtime timestamp,
>   data ascii,
>   stime timestamp,
>   PRIMARY KEY ((imei, date), dtime))
>   WITH CLUSTERING ORDER BY (dtime DESC)
> ;
>
> It is the standard schema for modeling time series data where
> imei is the unique ID associated with each GPS device
> date is the date taken from dtime
> dtime is the date-time coming from the device
> data is all the latitude, longitude etc that the device is sending us
> stime is the date-time stamp of the server
>
> The reason why I put dtime in the primary key as the clustering column is
> because most of our queries are done on device time. There can be a delay
> of a few minutes to a few hours (or a few days! in rare cases) between
> dtime and stime if the network is not available.
>
> However, now we want to query on server time as well for the purpose of
> debugging. These queries will be not as common as queries on  device time.
> Say for every 100 queries on dtime there will be just 1 query on stime.
>
> What options do I have?
>
> 1. Secondary index - not possible because stime is a timestamp and CQL does
> not allow me to put < or > in the query for a secondary index
>
> 2. Data duplication - I can build another column family where I will index
> by stime but that means I am storing twice as much data. I know everyone
> says that write operations are cheap and storage is cheap but how? If I
> have to buy twice as many machines on AWS EC2 each with their own ephemeral
> storage, then my bill doubles up!
>
> Any other ideas I can try?
>
> Many Thanks,
> Abhishek
>


Restoring all cluster from snapshots

2015-06-08 Thread Anton Koshevoy
Hello all.

I need to transfer a copy of the production cluster to a test environment and
start it there. My steps:

- nodetool snapshot -t `hostname`-#{cluster_name}-#{timestamp} -p #{jmx_port}
- nodetool ring -p #{jmx_port} | grep `/sbin/ifconfig eth0 | grep 'inet addr' | 
awk -F: '{print $2}' | awk '{print $1}'` | awk '{ print $NF }' | tr '\\n' ',' | 
sudo tee /etc/cassandra/#{cluster_name}.conf/tokens.txt
- rsync snapshots to the backup machine
- copy files to the 2 test servers in the same folders as on production.
- sudo rm -rf /db/cassandra/cr/data0*/system/*
- paste list of initial_token from step 2 to the cassandra.yaml file on each 
server
- start both test servers.

And instead of gigabytes of my keyspaces I see only:

Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.40.231.3   151.06 KB  256     100.0%            c505db2f-d14a-4044-949f-cb952ec022f6  RACK01
UN  10.40.231.31  134.59 KB  256     100.0%            12879849-ade0-4dcb-84c0-abb3db996ba7  RACK01

And there is no mention of my keyspaces here:

[cqlsh 5.0.1 | Cassandra 2.1.3 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh>
cqlsh> describe keyspaces

system_traces  system
cqlsh>

What am I missing in this process?

Re: sstableloader usage doubts

2015-06-08 Thread ZeroUno

Il 05/06/15 22:40, Robert Coli ha scritto:


On Fri, Jun 5, 2015 at 7:53 AM, Sebastian Estevez
mailto:sebastian.este...@datastax.com>>
wrote:

Since you only restored one dc's sstables, you should be able to
rebuild them on the second DC.

Refresh means pick up new SSTables that have been directly added to
the data directory.

Rebuild means stream data from other replicas to re create SSTables
from scratch.

Sebastian's response is correct; use rebuild. Sorry that I missed that
specific aspect of your question!


Thank you both.

So you mean that "refresh" needs to be used if the cluster is running, 
but if I stopped cassandra while copying the sstables then refresh is 
useless? So the error "No new SSTables were found" during my refresh 
attempt is due to the fact that the sstables in my data dir were not 
"new" because already loaded, and not to the files not being found?


So... if I stop the two nodes on the first DC, restore their sstables' 
files, and then restart the nodes, nothing else needs to be done on the 
first DC?


And on the second DC instead I just need to do "nodetool rebuild -- 
FirstDC" on _both_ nodes?


--
01



[RELEASE] Apache Cassandra 2.1.6 released

2015-06-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.1.6.  We are now calling 2.1 series stable and suitable for
production.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.1 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/8aR9L2 (CHANGES.txt)
[2]: http://goo.gl/dstU4D (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


[RELEASE] Apache Cassandra 2.2.0-rc1 released

2015-06-08 Thread Jake Luciani
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.2.0-rc1.

Apache Cassandra is a fully distributed database. It is the right choice
when you need scalability and high availability without compromising
performance.

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a release candidate[1] on the 2.2 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/pBjybx (CHANGES.txt)
[2]: http://goo.gl/E1RiHd (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


Re: Cassandra crashes daily; nothing on the log

2015-06-08 Thread Paulo Motta
Try checking your system logs (generally /var/log/syslog) to see if the
Cassandra process was killed by the OS OOM killer.

2015-06-06 15:39 GMT-03:00 Brian Sam-Bodden :

> Berk,
>1 GB is not enough to run C*, the minimum memory we use on Digital
> Ocean is 4GB.
>
> Cheers,
> Brian
> http://integrallis.com
>
> On Sat, Jun 6, 2015 at 10:50 AM,  wrote:
>
>> Hi all,
>>
>> I've installed Cassandra on a test server hosted on Digital Ocean. The
>> server has 1GB RAM, and is running a single docker container alongside C*.
>> Somehow, every night, the Cassandra instance crashes. The annoying part is
>> that I cannot see anything wrong with the log files, so I can't tell what's
>> going on.
>>
>> The log files are here:
>> http://pastebin.com/Zquu5wvd
>>
>> Do you have any idea what's going on? Can you suggest some ways I can try
>> to troubleshoot this?
>>
>> Thanks!
>>  Berk
>>
>
>
>
> --
> Cheers,
> Brian
> http://www.integrallis.com
>


Re: Cassandra crashes daily; nothing on the log

2015-06-08 Thread Bryan Holladay
It could be the Linux kernel killing Cassandra because of memory usage. When
this happens, nothing is logged in Cassandra. Check the system
logs (/var/log/messages) and look for a message saying "Out of Memory" ...
"kill process" ...



Re: sstableloader usage doubts

2015-06-08 Thread Robert Coli
On Mon, Jun 8, 2015 at 6:58 AM, ZeroUno  wrote:

> So you mean that "refresh" needs to be used if the cluster is running, but
> if I stopped cassandra while copying the sstables then refresh is useless?
> So the error "No new SSTables were found" during my refresh attempt is due
> to the fact that the sstables in my data dir were not "new" because already
> loaded, and not to the files not being found?
>

Yes. You should be able to see logs of it opening the files it finds in the
data dir.


> So... if I stop the two nodes on the first DC, restore their sstables'
> files, and then restart the nodes, nothing else needs to be done on the
> first DC?
>

Be careful to avoid bootstrapping, but yes.


> And on the second DC instead I just need to do "nodetool rebuild --
> FirstDC" on _both_ nodes?


Yes.

=Rob


Re: Restoring all cluster from snapshots

2015-06-08 Thread Robert Coli
On Mon, Jun 8, 2015 at 6:22 AM, Anton Koshevoy  wrote:

> - sudo rm -rf /db/cassandra/cr/data0*/system/*
>

This removes the schema. You can't load SSTables for column families which
don't exist.

=Rob


Deserialize the collection type data from the SSTable file

2015-06-08 Thread java8964
Hi, Cassandra users:

I have a question related to how to deserialize the new collection type data
in Cassandra 2.x (the exact version is C* 2.0.10).

I created the following example table in cqlsh:

CREATE TABLE coupon (
  account_id bigint,
  campaign_id uuid,
  ...
  discount_info map<text, text>,
  ...
  PRIMARY KEY (account_id, campaign_id)
)

The other columns can be ignored in this case. Then I inserted one row of test
data like this:

insert into coupon (account_id, campaign_id, discount_info) values (111, uuid(),
{'test_key':'test_value'});

After this, I got the SSTable files. I used sstable2json to check the output:

$ ./resources/cassandra/bin/sstable2json /xxx/test-coupon-jb-1-Data.db
[
{"key": "006f","columns":
[["0336e50d-21aa-4b3a-9f01-989a8c540e54:","",1433792922055000],
["0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info","0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:!",1433792922054999,"t",1433792922],
["0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579","746573745f76616c7565",1433792922055000]]}
]

What I want is to get the {"test_key" : "test_value"} key/value pair that I
put into the "discount_info" column. I followed the sstable2json code and tried
to deserialize the data myself, but to my surprise I cannot make it work; I
tried several ways but kept getting exceptions.

From what I researched, I know that Cassandra stores "campaign_id" +
"discount_info" + another ByteBuffer as a composite column in this case. When
I deserialize this column name, I get the following dumped out as a String:

"0336e50d-21aa-4b3a-9f01-989a8c540e54:discount_info:746573745f6b6579"

It includes 3 parts: the first part is the uuid for the campaign_id. The 2nd
part is "discount_info", the static name I defined in the table. The 3rd part
is a byte array of length 46, which I am not sure what it is.

The corresponding value part of this composite column is another byte array,
of length 10, hex "746573745f76616c7565" if I dump it out.

Now, here is what I did, and I am not sure why it doesn't work. First, I
assumed the value part stores the real value I put in the map, so I did the
following:

ByteBuffer value = ByteBufferUtil.clone(column.value());
MapType<String, String> result = MapType.getInstance(UTF8Type.instance, UTF8Type.instance);
Map<String, String> output = result.compose(value);
// it gave me the following exception:
// org.apache.cassandra.serializers.MarshalException: Not enough bytes to read a map

Then I thought that the real value must be stored as part of the column name
(the 3rd part of 46 bytes), so I did this:

MapType<String, String> result = MapType.getInstance(UTF8Type.instance, UTF8Type.instance);
Map<String, String> output = result.compose(third_part.value);
// I got the following exception:
java.lang.IllegalArgumentException
at java.nio.Buffer.limit(Buffer.java:267)
at org.apache.cassandra.utils.ByteBufferUtil.readBytes(ByteBufferUtil.java:587)
at org.apache.cassandra.utils.ByteBufferUtil.readBytesWithShortLength(ByteBufferUtil.java:596)
at org.apache.cassandra.serializers.MapSerializer.deserialize(MapSerializer.java:63)
at org.apache.cassandra.serializers.MapSerializer.deserialize(MapSerializer.java:28)
at org.apache.cassandra.db.marshal.AbstractType.compose(AbstractType.java:142)

I can get all other non-collection type data, but I cannot get the data from
the map. My questions are:

1) How does Cassandra store the collection data in the SSTable files? From the
length of the bytes, it is most likely part of the composite column. If so,
why do I get the exceptions above?

2) sstable2json doesn't deserialize the real data out of the collection type,
so I don't have an example to follow. Am I using the wrong way to compose the
map type data?

Thanks
Yong
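
For illustration, a minimal sketch of the decoding the sstable2json output
above suggests, assuming each map entry is stored as its own cell whose
composite name ends with the serialized map key and whose value is the
serialized map value; in that case each part would be composed with the
element type (UTF8Type) rather than with MapType over the whole cell:

import java.nio.ByteBuffer;
import org.apache.cassandra.db.marshal.UTF8Type;
import org.apache.cassandra.utils.ByteBufferUtil;

public class MapCellDecodeSketch {
    public static void main(String[] args) {
        // The last component of the composite column name and the cell value,
        // taken verbatim from the sstable2json output above.
        ByteBuffer mapKeyBytes = ByteBufferUtil.hexToBytes("746573745f6b6579");
        ByteBuffer mapValueBytes = ByteBufferUtil.hexToBytes("746573745f76616c7565");

        // Compose each part with the map's element type, not with MapType.
        String mapKey = UTF8Type.instance.compose(mapKeyBytes);     // "test_key"
        String mapValue = UTF8Type.instance.compose(mapValueBytes); // "test_value"
        System.out.println(mapKey + " -> " + mapValue);
    }
}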


Re: Restoring all cluster from snapshots

2015-06-08 Thread Anton Koshevoy
Rob, thanks for the answer.

I just follow instruction from 
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_snapshot_restore_new_cluster.html

If I don't remove the system table data, the test cluster starts interfering
with the production cluster. How can I avoid this situation?



On June 8, 2015 at 9:48:30 PM, Robert Coli (rc...@eventbrite.com) wrote:

On Mon, Jun 8, 2015 at 6:22 AM, Anton Koshevoy  wrote:
- sudo rm -rf /db/cassandra/cr/data0*/system/*

This removes the schema. You can't load SSTables for column families which 
don't exist.
 
=Rob



Re: Restoring all cluster from snapshots

2015-06-08 Thread Alain RODRIGUEZ
I think you just have to do a "DESC KEYSPACE mykeyspace;" from one node of
the production cluster, then copy the output and import it into your dev
cluster using "cqlsh -f output.cql".

Take care: at the start of the output you might want to change DC names, RF,
or the replication strategy.

Also, if you don't want to restart nodes you can load data by using
"nodetool refresh mykeyspace mycf"

C*heers

Alain



RE: Restoring all cluster from snapshots

2015-06-08 Thread Sanjay Baronia
Yes, you shouldn't delete the system directory. The next steps are: reconfigure
the test cluster with new IP addresses, clear the gossip information, and then
boot the test cluster.

If you are running Cassandra on VMware, then you may also want to look at this
solution from Trilio Data, where you can create a Cassandra backup and restore
it to a test cluster.

Regards,

Sanjay

_
Sanjay Baronia
VP of Product & Solutions Management
TrilioData
(c) 508-335-2306
sanjay.baro...@triliodata.com

Experience Trilio in action: please click here to request a demo today!




Re: Deserialize the collection type data from the SSTable file

2015-06-08 Thread Daniel Chia
I'm not sure why sstable2json doesn't work for collections, but if you're
into reading raw sstables we use the following code with good success:

https://github.com/coursera/aegisthus/blob/77c73f6259f2a30d3d8ca64578be5c13ecc4e6f4/aegisthus-hadoop/src/main/java/org/coursera/mapreducer/CQLMapper.java#L85

Thanks,
Daniel



Re: Restoring all cluster from snapshots

2015-06-08 Thread Robert Coli
On Mon, Jun 8, 2015 at 2:52 PM, Sanjay Baronia <
sanjay.baro...@triliodata.com> wrote:

>  Yes, you shouldn’t delete the system directory. Next steps are
> …reconfigure the test cluster with new IP addresses, clear the gossiping
> information and then boot the test cluster.
>

If you don't delete the system directory, you run the risk of the test
cluster nodes joining the source cluster.

Just start a single node on the new cluster, empty, and create the schema
on it.

Then do the rest of the process.

=Rob


Re: DSE 4.7 security

2015-06-08 Thread Jack Krupansky
Cassandra authorization is at the keyspace and table level. Click on the
GRANT link on the doc page, to get more info:
http://docs.datastax.com/en/cql/3.1/cql/cql_reference/grant_r.html

Which says "*Permissions to access all keyspaces, a named keyspace, or a
table can be granted to a user.*"

There is no finer-grain authorization at the row, column, or cell level.
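
For illustration, a minimal sketch of keyspace- and table-level grants as
described above, issued through the DataStax Java driver; the user and object
names are hypothetical:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class GrantSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Keyspace-level: applies to every table in the keyspace.
        session.execute("GRANT SELECT ON KEYSPACE app TO analyst");

        // Table-level: the finest granularity CQL authorization offers;
        // there is no row-, column-, or cell-level equivalent.
        session.execute("GRANT MODIFY ON TABLE app.events TO ingest_user");

        cluster.close();
    }
}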

You might want to open a Jira for this valuable feature.

-- Jack Krupansky

On Sun, Jun 7, 2015 at 5:19 PM, Moshe Kranc  wrote:

> The DSE 4.7 documentation says: You use the familiar relational database
> GRANT/REVOKE paradigm to grant or revoke permissions to
>
> access Cassandra data.
>
> Does this mean authorization is per table?
>
> What if I need finer grain authorization, e.g., per row or even per cell
> (e.g., a specific column in a specific row may not be seen by users in a
> group)?
>
> Do I need to implement this in my application, because Cassandra does not
> support it?
>


auto clear data with ttl

2015-06-08 Thread 曹志富
I have C* 2.1.5 and store some data with a TTL. I reduced gc_grace_seconds to
zero.

But it seems to have no effect.

Did I miss something?
--
Ranger Tsao


Re: auto clear data with ttl

2015-06-08 Thread Aiman Parvaiz
With gc_grace set to zero, tombstones are removed without any delay once a
compaction runs. So it's possible that the SSTables containing the tombstones
simply haven't been compacted yet. You can either wait for compaction to happen
or run a manual compaction, depending on your compaction strategy. Manual
compaction does have some drawbacks, so please read about it first (a sketch of
the relevant table settings follows).

Sent from my iPhone

> On Jun 8, 2015, at 7:26 PM, 曹志富  wrote:
> 
> I have C* 2.1.5,store some data with ttl.Reduce the gc_grace_seconds to zero.
> 
> But it seems has no effect.
> 
> Did I miss something?
> --
> Ranger Tsao


Re: auto clear data with ttl

2015-06-08 Thread 曹志富
Thank you. I have changed unchecked_tombstone_compaction to true. A major
compaction would produce one big SSTable, so I think this is a better choice.

--
Ranger Tsao



C* 2.0.15 - java.lang.NegativeArraySizeException

2015-06-08 Thread Aiman Parvaiz
Hi everyone
I am running C* 2.0.9 and decided to do a rolling upgrade. I added a C* 2.0.15
node to the existing cluster and saw this twice:

Jun  9 02:27:20 prod-cass23.localdomain cassandra: 2015-06-09 02:27:20,658
INFO CompactionExecutor:4 CompactionTask.runMayThrow - Compacting
[SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-37-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-40-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-42-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-38-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-39-Data.db'),
SSTableReader(path='/var/lib/cassandra/data/system/schema_columns/system-schema_columns-jb-44-Data.db')]



Jun  9 02:27:20 prod-cass23.localdomain cassandra: 2015-06-09 02:27:20,669
ERROR CompactionExecutor:4 CassandraDaemon.uncaughtException - Exception in
thread Thread[CompactionExecutor:4,1,main]
Jun  9 02:27:20 prod-cass23.localdomain
*java.lang.NegativeArraySizeException*
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:335)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:462)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:448)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:432)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableReader.getAncestors(SSTableReader.java:1366)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata.createCollector(SSTableMetadata.java:134)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.CompactionTask.createCompactionWriter(CompactionTask.java:316)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:162)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
Jun  9 02:27:20 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:198)
Jun  9 02:27:20 prod-cass23.localdomain at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
Jun  9 02:27:20 prod-cass23.localdomain at
java.util.concurrent.FutureTask.run(FutureTask.java:262)
Jun  9 02:27:20 prod-cass23.localdomain at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
Jun  9 02:27:20 prod-cass23.localdomain at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Jun  9 02:27:20 prod-cass23.localdomain at
java.lang.Thread.run(Thread.java:745)
Jun  9 02:27:47 prod-cass23.localdomain cassandra: 2015-06-09 02:27:47,725
INFO main StorageService.setMode - JOINING: Starting to bootstrap...

As you can see, this happened the first time even before joining. The stack
trace from the second occurrence:

Jun  9 02:32:15 prod-cass23.localdomain cassandra: 2015-06-09 02:32:15,097
ERROR CompactionExecutor:6 CassandraDaemon.uncaughtException - Exception in
thread Thread[CompactionExecutor:6,1,main]
Jun  9 02:32:15 prod-cass23.localdomain java.lang.NegativeArraySizeException
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.utils.EstimatedHistogram$EstimatedHistogramSerializer.deserialize(EstimatedHistogram.java:335)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:462)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:448)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata$SSTableMetadataSerializer.deserialize(SSTableMetadata.java:432)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableReader.getAncestors(SSTableReader.java:1366)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.io.sstable.SSTableMetadata.createCollector(SSTableMetadata.java:134)
Jun  9 02:32:15 prod-cass23.localdomain at
org.apache.cassandra.db.compaction.Compa

Support for ad-hoc query

2015-06-08 Thread Srinivasa T N
Hi All,
   I have a web application with my backend data stored in Cassandra. Now I
want to do some analysis on the stored data, which requires some ad-hoc
queries against Cassandra. How can I do that?

Regards,
Seenu.