Re: paging through an entire table in chunks?

2014-09-29 Thread Brice Dutheil
You might use the async feature of the Java driver. To manage the complexity
of issuing several queries I used RxJava; it improves readability and handles
asynchronicity in a very elegant way (much more so than Futures). You may need
to write some code to bridge Rx and the Java driver, but it's worth it.
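
For what it's worth, a minimal sketch of such a bridge, assuming the DataStax
Java driver 2.x and RxJava 1.x (the class and method names here are
hypothetical):

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import rx.Observable;
import rx.Subscriber;

public final class RxCassandra {
    // Adapt the driver's ListenableFuture-based async API to an Observable.
    public static Observable<ResultSet> execute(final Session session, final Statement stmt) {
        return Observable.create(new Observable.OnSubscribe<ResultSet>() {
            public void call(final Subscriber<? super ResultSet> subscriber) {
                Futures.addCallback(session.executeAsync(stmt),
                        new FutureCallback<ResultSet>() {
                            public void onSuccess(ResultSet rs) {
                                subscriber.onNext(rs);
                                subscriber.onCompleted();
                            }
                            public void onFailure(Throwable t) {
                                subscriber.onError(t);
                            }
                        });
            }
        });
    }
}

Each query then becomes an Observable<ResultSet> that composes with flatMap,
retry, and the rest of the Rx operators.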

— Brice

On Sun, Sep 28, 2014 at 12:57 AM, Kevin Burton  wrote:

Agreed… but I’d like to parallelize it… Eventually I’ll just have too much
> data to do it on one server… plus, I need suspend/resume and this way if
> I’m doing like 10MB at a time I’ll be able to suspend / resume as well as
> track progress.
>
> On Sat, Sep 27, 2014 at 2:52 PM, DuyHai Doan  wrote:
>
>> Use the java driver and paging feature:
>> http://www.datastax.com/drivers/java/2.1/com/datastax/driver/core/Statement.html#setFetchSize(int)
>>
>> 1) Do your "SELECT * FROM" without any selection
>> 2) Set fetchSize to a sensible value
>> 3) Execute the query and get an iterator from the ResultSet
>> 4) Iterate
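
In code, those four steps might look like this (a sketch assuming the Java
driver 2.x and an open Session; the keyspace and table names are hypothetical):

import com.datastax.driver.core.*;  // Statement, SimpleStatement, ResultSet, Row

Statement stmt = new SimpleStatement("SELECT * FROM my_keyspace.my_table");
stmt.setFetchSize(1000);              // rows fetched per page, tune to taste
ResultSet rs = session.execute(stmt);
for (Row row : rs) {                  // the iterator fetches further pages transparently
    process(row);                     // hypothetical per-row handler
}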
>>
>>
>>
>> On Sat, Sep 27, 2014 at 11:42 PM, Kevin Burton 
>> wrote:
>>
>>> I need a way to do a full table scan across all of our data.
>>>
>>> Can’t I just use token() for this?
>>>
>>> This way I could split up our entire keyspace into say 1024 chunks, and
>>> then have one activemq task work with range 0, then range 1, etc… that way
>>> I can easily just map() my whole table.
>>>
>>> and since it’s token() I should (generally) read a contiguous range from
>>> a given table.
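
Sketched against the Java driver (a hypothetical sketch assuming an open
Session; table and column names are made up), that chunked token() scan could
look like:

import java.math.BigInteger;
import com.datastax.driver.core.*;  // Session, ResultSet, Row

// Split the full Murmur3 token range into 1024 contiguous chunks; each chunk
// could become one activemq task, and the last token seen per chunk can be
// persisted to support suspend/resume and progress tracking.
BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
BigInteger max = BigInteger.valueOf(Long.MAX_VALUE);
int chunks = 1024;
BigInteger width = max.subtract(min).divide(BigInteger.valueOf(chunks));

for (int i = 0; i < chunks; i++) {
    long start = min.add(width.multiply(BigInteger.valueOf(i))).longValue();
    long end = (i == chunks - 1)
            ? Long.MAX_VALUE
            : min.add(width.multiply(BigInteger.valueOf(i + 1))).longValue();
    ResultSet rs = session.execute(
            "SELECT * FROM my_keyspace.my_table WHERE token(id) > ? AND token(id) <= ?",
            start, end);
    for (Row row : rs) {
        // map() over the row here
    }
}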
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile


DSE install interfering with apache Cassandra 2.1.0

2014-09-29 Thread Andrew Cobley
Hi All,

Just come across this one, I’m at a bit of a loss on how to fix it.

A user here did the following steps

On a MAC
Install Datastax Enterprise (DSE) using the dmg file
test he can connect using the DSE cqlsh window
Uninstall DSE (full uninstall, which stops the services)

download apache cassandra 2.1.0
unzip
change to the bin directory and run sudo ./cassandra

Now when he tries to connect using cqlsh from the apache cassandra 2.1.0 bin,
he gets

Connection error: ('Unable to connect to any servers', {'127.0.0.1': 
ConnectionShutdown('Connection  is already closed',)})

This is probably related to
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E

but I can’t see why the uninstall of DSE is leaving the apache cassandra 
release cqlsh unable to attach to the apache cassandra runtime.

Ta
Andy



The University of Dundee is a registered Scottish Charity, No: SC015096


Re: DSE install interfering with apache Cassandra 2.1.0

2014-09-29 Thread Sumod Pawgi
Please run jps to check which Java services are still running and to make sure
whether C* is running. Then please check whether port 9160 is in use:
netstat -nltp | grep 9160

This will confirm what is happening in your case.

Sent from my iPhone

> On 29-Sep-2014, at 7:15 pm, Andrew Cobley  wrote:
> 
> Hi All,
> 
> Just come across this one, I’m at a bit of a loss on how to fix it.
> 
> A user here did the following steps
> 
> On a MAC
> Install Datastax Enterprise (DSE) using the dmg file
> test he can connect using the DSE cqlsh window
> Uninstall DSE (full uninstall, which stops the services)
> 
> download apache cassandra 2.1.0
> unzip
> change to the bin directory and run sudo ./cassandra
> 
> Now when he tries to connect using cqlsh from the apache cassandra 2.1.0 bin,
> he gets
> 
> Connection error: ('Unable to connect to any servers', {'127.0.0.1':
> ConnectionShutdown('Connection <... (closed)> is already closed',)})
> 
> This is probably related to 
> http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E
> 
> but I can’t see why the uninstall of DSE is leaving the apache cassandra 
> release cqlsh unable to attach to the apache cassandra runtime.
> 
> Ta
> Andy
> 
> 
> 
> The University of Dundee is a registered Scottish Charity, No: SC015096


Re: DSE install interfering with apache Cassandra 2.1.0

2014-09-29 Thread Andrew Cobley
Without Apache Cassandra running I ran jps -l on this machine; the only
result was

338 sun.tools.jps.Jps

The Mac didn’t like the netstat command so I ran

netstat -atp tcp |  grep 9160

no result

Also  for the native port:

netstat -atp tcp | grep 9042

gave no result (command may be wrong)

So I ran a port scan using the Network Utility (between 0 and 1). Results as
shown:


Port Scan has started…

Port Scanning host: 127.0.0.1

 Open TCP Port: 631 ipp
Port Scan has completed…


Hope this helps.

Andy


On 29 Sep 2014, at 15:09, Sumod Pawgi <spa...@gmail.com> wrote:

Please run jps to check which Java services are still running and to make sure
whether C* is running. Then please check whether port 9160 is in use:
netstat -nltp | grep 9160

This will confirm what is happening in your case.

Sent from my iPhone

On 29-Sep-2014, at 7:15 pm, Andrew Cobley <a.e.cob...@dundee.ac.uk> wrote:

Hi All,

Just come across this one, I’m at a bit of a loss on how to fix it.

A user here did the following steps

On a MAC
Install Datastax Enterprise (DSE) using the dmg file
test he can connect using the DSE cqlsh window
Uninstall DSE (full uninstall, which stops the services)

download apache cassandra 2.1.0
unzip
change to the bin directory and run sudo ./cassandra

Now when he tries to connect using cqlsh from the apache cassandra 2.1.0 bin,
he gets

Connection error: ('Unable to connect to any servers', {'127.0.0.1': 
ConnectionShutdown('Connection  is already closed',)})

This is probably related to
http://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201409.mbox/%3CCALHCZd7RGSahJUbK32WoTr9JRoA+4K=mrfocmxuk0nbzoqq...@mail.gmail.com%3E

but I can’t see why the uninstall of DSE is leaving the apache cassandra 
release cqlsh unable to attach to the apache cassandra runtime.

Ta
Andy



The University of Dundee is a registered Scottish Charity, No: SC015096


The University of Dundee is a registered Scottish Charity, No: SC015096


Cassandra throwing java exceptions for nodetool repair on indexed tables

2014-09-29 Thread Jeronimo de A. Barros
Hi All,

We're running 2 Cassandra 2.1 clusters (development and production) and
whenever I run a nodetool repair on indexed tables I get a Java exception
about creating snapshots:

Command line:

[2014-09-29 11:25:24,945] Repair session
73c0d390-47e4-11e4-ba0f-c7788dc924ec for range
(-7298689860784559350,-7297558156602685286] failed with error
java.io.IOException: Failed during snapshot creation.
[2014-09-29 11:25:24,945] Repair command #5 finished

Cassandra log:

ERROR [Thread-49681] 2014-09-29 11:25:24,945 StorageService.java:2689 -
Repair session 73c0d390-47e4-11e4-ba0f-c7788dc924ec for range
(-7298689860784559350,-7297558156602685286] failed with error
java.io.IOException: Failed during snapshot creation.
java.util.concurrent.ExecutionException: java.lang.RuntimeException:
java.io.IOException: Failed during snapshot creation.
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
[na:1.7.0_67]
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
[na:1.7.0_67]
at
org.apache.cassandra.service.StorageService$4.runMayThrow(StorageService.java:2680)
~[apache-cassandra-2.1.0.jar:2.1.0]
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
[apache-cassandra-2.1.0.jar:2.1.0]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
[na:1.7.0_67]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[na:1.7.0_67]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_67]
Caused by: java.lang.RuntimeException: java.io.IOException: Failed during
snapshot creation.
at com.google.common.base.Throwables.propagate(Throwables.java:160)
~[guava-16.0.jar:na]
at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
[apache-cassandra-2.1.0.jar:2.1.0]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
[na:1.7.0_67]
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
[na:1.7.0_67]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
~[na:1.7.0_67]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
~[na:1.7.0_67]
... 1 common frames omitted
Caused by: java.io.IOException: Failed during snapshot creation.
at
org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:344)
~[apache-cassandra-2.1.0.jar:2.1.0]
at
org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:128)
~[apache-cassandra-2.1.0.jar:2.1.0]
at
com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
~[guava-16.0.jar:na]
... 3 common frames omitted

If I drop the index, the repair returns no error:

cqlsh:test> drop INDEX user_pass_idx ;

root@test:~# nodetool repair test user
[2014-09-29 11:27:29,668] Starting repair command #6, repairing 743 ranges
for keyspace test (seq=true, full=true)
.
.
[2014-09-29 11:28:38,030] Repair session
e6d40e10-47e4-11e4-ba0f-c7788dc924ec for range
(-7298689860784559350,-7297558156602685286] finished
[2014-09-29 11:28:38,030] Repair command #6 finished

The test table:

CREATE TABLE test.user (
login text PRIMARY KEY,
password text
)
create INDEX user_pass_idx on test.user (password) ;

Am I doing anything wrong? Or is this a bug? I searched but I couldn't
find any reference to this error.

Thanks in advance for any help.

Jero


Re: Cassandra throwing java exceptions for nodetool repair on indexed tables

2014-09-29 Thread Robert Coli
On Mon, Sep 29, 2014 at 8:35 AM, Jeronimo de A. Barros <
jeronimo.bar...@gmail.com> wrote:

> We're running 2 Cassandra 2.1 clusters (development and production) and
> whenever I run a nodetool repair on indexed tables I get a Java exception
> about creating snapshots:
>

Don't run 2.1 in production yet if you don't want to deal with bugs like
this in production.

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

Am I doing anything wrong? Or is this a bug? I searched but I couldn't
> find any reference to this error.
>

I would file this as a bug in the Cassandra JIRA, especially as it relates
to a just-released version of the software and seems reproducible.

If you do file a JIRA, please let the list know what the URL is.

=Rob


Re: Repair taking long time

2014-09-29 Thread Robert Coli
On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux 
wrote:

>  I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and
> 4 in another.
>
>
>
> Running a repair on a large column family seems to be moving much slower
> than I expect.
>

Unfortunately, as others have mentioned, the slowness/broken-ness of repair
is a long running (groan!) issue and therefore currently expected.

At this time, I do not recommend upgrading to 2.1 in production to attempt
to fix it. I am also broadly skeptical that it is as fixed in 2.1 as all that.

One can increase gc_grace_seconds to 34 days [1] and repair once a month,
which should help make repair slightly more tractable.
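
For example, via the driver (a sketch; the keyspace and table names are
hypothetical, and 34 days is 34 * 86400 = 2937600 seconds):

session.execute(
    "ALTER TABLE my_keyspace.my_table WITH gc_grace_seconds = 2937600");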

For now you should probably evaluate which of your column families you
*absolutely must* repair (because you do DELETE like operations in them,
etc.) and only repair those.

As an aside, you "just lose" with vnodes and clusters of this size. I
presume you plan to grow beyond approximately 9 nodes per DC, in which case
you probably do want vnodes enabled.

One note :

>  Looking at nodetool compaction stats it indicates the Validation phase
> is running and that the total bytes is 4.5T (4505336278756).


This is the uncompressed size, I'm betting your actual on disk size is
closer to 2T? Even though 2.0 has improved performance for nodes with lots
of data, 2T per node is still relatively "fat" for a Cassandra node.


=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-5850


Re: simple map / table scans without hadoop?

2014-09-29 Thread Robert Coli
On Fri, Sep 26, 2014 at 9:08 PM, Kevin Burton  wrote:

> I have the requirements to periodically run full tables scans on our
> data.  It’s mostly for repair tasks or making bulk UPDATEs… but I’d prefer
> to do it in Java because I need something mildly trivial.
>

http://wiki.apache.org/cassandra/FAQ#iter_world

?

=Rob


Re: Apache Cassandra 2.1.0 : cassandra-stress performance discrepancy between SSD and SATA drive

2014-09-29 Thread Shing Hing Man
I have run a sysbench file IO test on my home PC and office PC. The results are
given below. They show my office PC (with an SSD) is about 3 times more
performant than my home PC (with a SATA hard disk).

Home PC:

gauss:~> sysbench --test=fileio --file-total-size=50G prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

128 files, 409600Kb each, 51200Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
.
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
53687091200 bytes written in 626.30 seconds (81.75 MB/sec).
matmsh@gauss:~> sysbench --test=fileio --file-total-size=50G 
--file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored


Extra file open flags: 0
128 files, 400Mb each
50Gb total file size
Block size 16Kb
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!

Operations performed:  14521 reads, 9680 writes, 30976 Other = 55177 Total
Read 226.89Mb  Written 151.25Mb  Total transferred 378.14Mb  (1.2605Mb/sec)
   80.67 Requests/sec executed

General statistics:
total time:  300.0030s
total number of events:  24201
total time taken by event execution: 186.0749s
response time:
 min:  0.00ms
 avg:  7.69ms
 max:132.43ms
 approx.  95 percentile:  19.57ms

Threads fairness:
events (avg/stddev):   24201.0000/0.00
execution time (avg/stddev):   186.0749/0.00

gauss:~> 
===
Office PC:
shing@cauchy:~> sysbench --test=fileio --file-total-size=50G prepare
sysbench 0.5:  multi-threaded system evaluation benchmark

128 files, 409600Kb each, 51200Mb total
Creating files for the test...
Extra file open flags: 0
Creating file test_file.0
Creating file test_file.1
Creating file test_file.2
Creating file test_file.3
...Creating file test_file.122
Creating file test_file.123
Creating file test_file.124
Creating file test_file.125
Creating file test_file.126
Creating file test_file.127
53687091200 bytes written in 175.55 seconds (291.66 MB/sec).
cauchy:~> sysbench --test=fileio --file-total-size=50G --file-test-mode=rndrw 
--init-rng=on --max-time=300 --max-requests=0 run
sysbench 0.5:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1
Random number generator seed is 0 and will be ignored

Extra file open flags: 0
128 files, 400Mb each
50Gb total file size
Block size 16Kb
Number of IO requests: 0
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
Threads started!

Operations performed:  43020 reads, 28680 writes, 91723 Other = 163423 Total
Read 672.19Mb  Written 448.12Mb  Total transferred 1.0941Gb  (3.7344Mb/sec)
  239.00 Requests/sec executed

General statistics:
total time:  300.0007s
total number of events:  71700
total time taken by event execution: 7.5550s
response time:
 min:  0.00ms
 avg:  0.11ms
 max: 12.89ms
 approx.  95 percentile:   0.22ms

Threads fairness:
events (avg/stddev):   71700.0000/0.00
execution time (avg/stddev):   7.5550/0.00
===



Shing



On Saturday, 27 September 2014, 10:24, Shing Hing Man  wrote:
 


Hi Kevin,
   Thanks for the reply !
I do not know the exact brand of SSD in my office PC. But the SSD is only 1
year old, and it is far from full.

On both the office PC and the home PC, I untarred Apache Cassandra 2.1.0, then
ran "cassandra -f" with the default config, then ran cassandra-stress.

Both PCs  have Oracle Java 1.7.0_40.

I have noticed there are some parameters for SSDs in cassandra.yaml, which I
have adjusted, but with no improvement.


It puzzles me that Cassandra on my office PC, with far better hardware, could
be 100% slower than on my home PC.



Shing







On Saturday, 27 September 2014, 5:12, Kevin Burton  wrote:
 


What SSD was it? There is a lot of variability in terms of SSD performance.

1. Is it a new or an old SSD? Old SSDs can become slower if they're really worn
out.

2. Was the office SSD near capacity, holding other data?

3. What models were they?

S

Re: Repair taking long time

2014-09-29 Thread Rahul Neelakantan
What is the recommendation for the number-of-tokens value? I am asking because
of the issue with sequential repairs running token range after token range.

Rahul Neelakantan

> On Sep 29, 2014, at 2:29 PM, Robert Coli  wrote:
> 
>> On Fri, Sep 26, 2014 at 9:52 AM, Gene Robichaux  
>> wrote:
>> I am fairly new to Cassandra. We have a 9 node cluster, 5 in one DC and 4 in 
>> another.
>> 
>>  
>> 
>> Running a repair on a large column family seems to be moving much slower 
>> than I expect.
>> 
> 
> Unfortunately, as others have mentioned, the slowness/broken-ness of repair 
> is a long running (groan!) issue and therefore currently expected. 
> 
> At this time, I do not recommend upgrading to 2.1 in production to attempt to
> fix it. I am also broadly skeptical that it is as fixed in 2.1 as all that.
> 
> One can increase gc_grace_seconds to 34 days [1] and repair once a month,
> which should help make repair slightly more tractable.
> 
> For now you should probably evaluate which of your column families you 
> *absolutely must* repair (because you do DELETE like operations in them, 
> etc.) and only repair those.
> 
> As an aside, you "just lose" with vnodes and clusters of this size. I presume
> you plan to grow beyond approximately 9 nodes per DC, in which case you
> probably do want vnodes enabled.
> 
> One note :
>>  Looking at nodetool compaction stats it indicates the Validation phase is
>> running and that the total bytes is 4.5T (4505336278756).
> 
> This is the uncompressed size, I'm betting your actual on disk size is closer 
> to 2T? Even though 2.0 has improved performance for nodes with lots of data, 
> 2T per node is still relatively "fat" for a Cassandra node.
> 
> 
> =Rob
> [1] https://issues.apache.org/jira/browse/CASSANDRA-5850


Re: Repair taking long time

2014-09-29 Thread Ken Hancock
On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli  wrote:

>
> As an aside, you "just lose" with vnodes and clusters of the size. I
> presume you plan to grow over appx 9 nodes per DC, in which case you
> probably do want vnodes enabled.
>

I typically only see discussion of vnodes vs. non-vnodes, but it seems to
me that it might be more important to discuss the number of vnodes per node.
A small cluster having 256 vnodes/node is unwise given some of the
sequential operations that are still done.  Even if operations were done in
parallel, having a 256x increase in parallelization seems an equally bad
choice.

I've never seen any discussion of how many vnodes per node might be an
appropriate answer based on a planned cluster size -- does such a thing exist?

Ken


Re: Cassandra throwing java exceptions for nodetool repair on indexed tables

2014-09-29 Thread Jeronimo de A. Barros
Hi again,

On Mon, Sep 29, 2014 at 3:16 PM, Robert Coli  wrote:

> Don't run 2.1 in production yet if you don't want to deal with bugs like
> this in production.
>

Well, I got the latest "stable" Cassandra... going back to 2.0 then.


> If you do file a JIRA, please let the list know what the URL is.
>

JIRA filed: https://issues.apache.org/jira/browse/CASSANDRA-8020

Jero


Re: using dynamic cell names in CQL 3

2014-09-29 Thread Robert Coli
On Thu, Sep 25, 2014 at 6:13 AM, shahab  wrote:

> It seems that I was not clear in my question. I would like to store values
> in the column name: for example, the column name would be the event name
> ("temperature") and the column content would be the respective value (e.g.
> 40.5). And I need to know how the schema should look in CQL 3.
>

You cannot have dynamic column names, in the exact storage way you are
thinking of them, in CQL3.

You can have a simple E-A-V scheme which works more or less the same way.
It is less storage efficient, but you get the CQL interface. In the opinion
of the developers, this was an acceptable tradeoff. In most cases, it
probably is.
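
As a sketch of such an E-A-V layout (all names here are hypothetical), via the
Java driver:

// One clustered row per (entity, attribute); the attribute name is stored as
// data, which is what makes the scheme "dynamic" from the client's viewpoint.
session.execute(
    "CREATE TABLE sensors.readings (" +
    "  sensor_id  text," +     // e.g. 'sensor-1'
    "  attr_name  text," +     // e.g. 'temperature'
    "  attr_value text," +     // e.g. '40.5'
    "  PRIMARY KEY (sensor_id, attr_name))");
session.execute(
    "INSERT INTO sensors.readings (sensor_id, attr_name, attr_value) " +
    "VALUES ('sensor-1', 'temperature', '40.5')");

Under the hood, each (attr_name, attr_value) pair still lands in cells under
the sensor_id partition, which is why it behaves much like the old
dynamic-column layout, just with some extra cell-name overhead.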

In other cases, I would recommend using Thrift and actual dynamic
columns... except that I logically presume Thrift will eventually be
deprecated. I am unable to recommend the use of a feature which I believe
will eventually be removed from the product.

=Rob


Re: A trigger that modifies the current Mutation

2014-09-29 Thread Robert Coli
On Sat, Sep 27, 2014 at 11:08 PM, Pinak Pani <
nishant.has.a.quest...@gmail.com> wrote:

> I wanted to create a trigger that alters the current mutation.
>

(ObMetaAside : Dear God... why?)

Triggers will probably not survive in their current form. If I was planning
to use them for anything, I would comprehensively avail myself of the state
of their development...

=Rob


Re: using dynamic cell names in CQL 3

2014-09-29 Thread Andrew Cobley
Isn’t the correct way to do this in CQL3 to use sets and user-defined types
(in C* 2.1)?:

create type sensorreading( date timestamp, name text, value int);
CREATE TABLE sensordata (
name text,
data set<frozen<sensorreading>>,
PRIMARY KEY (name)
);

insert into keyspace2.sensordata (name, data) values ('1234', {{date:'2012-10-2 
12:10',name:'temp',value:4}});
update sensordata set data = data+{{date:'2012-10-2 
12:10',name:'humidity',value:30}} where name='1234';
update sensordata set data = data+{{date:'2012-10-2 
12:12:30',name:'temp',value:5}} where name='1234';
update sensordata set data = data+{{date:'2012-10-2 
12:12:30',name:'humidity',value:31}} where name='1234';

select * from sensordata;


Perhaps not what you are after, but may be a start ?

Andy



On 29 Sep 2014, at 20:56, Robert Coli <rc...@eventbrite.com> wrote:

On Thu, Sep 25, 2014 at 6:13 AM, shahab <shahab.mok...@gmail.com> wrote:
It seems that I was not clear in my question. I would like to store values in
the column name: for example, the column name would be the event name
("temperature") and the column content would be the respective value
(e.g. 40.5). And I need to know how the schema should look in CQL 3.

You cannot have dynamic column names, in the exact storage way you are thinking 
of them, in CQL3.

You can have a simple E-A-V scheme which works more or less the same way. It is 
less storage efficient, but you get the CQL interface. In the opinion of the 
developers, this was an acceptable tradeoff. In most cases, it probably is.

In other cases, I would recommend using Thrift and actual dynamic columns...
except that I logically presume Thrift will eventually be deprecated. I am
unable to recommend the use of a feature which I believe will eventually be
removed from the product.

=Rob


The University of Dundee is a registered Scottish Charity, No: SC015096


Not-Equals (!=) in Where Clause

2014-09-29 Thread Timmy Turner
Looking through the CQL 3.1 grammar for Cassandra 2.1, I noticed that the
not-equals operator (!=) is in the grammar definition, but I can't seem to
find any legal way to use it.

Is != supported as part of the WHERE clause in Cassandra? Or is it in the
grammar for some other purpose?


Re: Running out of disk at bootstrap in low-disk situation

2014-09-29 Thread Robert Coli
On Sat, Sep 20, 2014 at 12:11 AM, Erik Forsberg  wrote:

> I've added all the 15 nodes, with some time inbetween - definitely more
> than the 2-minute rule. But it seems like compaction is not keeping up with
> the incoming data. Or at least that's my theory.
>

I personally would not combine vnodes and trying to add more than one node
at a time, at this time. I understand that you have a lot of nodes to add,
but this is potentially confounding the situation.

I conjecture that you are using level compaction. There is in your version
a pathological behavior during bootstrap where one ends up doing a lot of
compaction. I *think*, but am not sure, that the workaround is to use size
tiered compaction during bootstrap. I *believe* that is what the patch
upstream effectively does.

Probably unthrottling compaction will help, assuming you are not CPU or i/o
bound there.

#cassandra on freenode is probably a slightly better forum for interactive
discusson of detailed operational questions about production environments.

=Rob


Re: unreadable partitions

2014-09-29 Thread Robert Coli
On Sun, Sep 28, 2014 at 3:45 AM, tommaso barbugli 
wrote:

> I see some data stored in Cassandra (2.0.7) being not readable from CQL;
> this affects entire partitions, querying this partitions raise a Java
> exception:
>

If the SSTable is not corrupt but is not readable via CQL and generates an
exception, that sounds like a bug to me.

Were I you, I would :

0) look for an existing JIRA
1) file a JIRA on http://issues.apache.org
2) reply to this thread with the URL of that JIRA for future googlers

=Rob


Re: Node Joining, Not Streaming

2014-09-29 Thread Robert Coli
On Wed, Sep 24, 2014 at 11:01 AM, Gene Robichaux 
wrote:

>  I just added two nodes, one in DC-A and one in DC-B.
>
>
>
> The node in DC-A started and immediately started to stream files from its
> peers. The node in DC-B has been in the JOINING state for nearly 24 hours
> and I have not seen any streams started.
>

Adding more than one node at a time is not really supported, and you can
end up in bad cases.

Future versions of Cassandra will Strongly Discourage you from doing this.

https://issues.apache.org/jira/browse/CASSANDRA-7069

If I were you, I would :

1) stop the DC-B node's bootstrap by stopping it and wiping its partially
bootstrapped state
2) wait for DC-A to finish bootstrapping
3) re-bootstrap DC-B node.

=Rob
http://twitter.com/rcolidba


Re: Is there harm from having all the nodes in the seed list?

2014-09-29 Thread Robert Coli
On Tue, Sep 23, 2014 at 10:31 AM, Donald Smith <
donald.sm...@audiencescience.com> wrote:

>  Is there any harm from having all the nodes listed in the seeds list in
> cassandra.yaml?
>

Yes, seed nodes cannot bootstrap.

https://issues.apache.org/jira/browse/CASSANDRA-5836

See comments there for details on how this actually doesn't make any sense.

The "correct" solution is almost certainly to have a dynamic seed provider,
which is why DSE and Priam both do that. But in practice it mostly doesn't
matter except in the annoying yet common CASSANDRA-5836 case.

=Rob


Re: Reading SSTables Potential File Descriptor Leak 1.2.18

2014-09-29 Thread Robert Coli
On Tue, Sep 23, 2014 at 5:47 PM, Tim Heckman  wrote:

> As best I could tell, the majority of the file descriptors open were for a
> single SSTable '.db' file. Looking in the error logs I found quite a few
> exceptions that looked to have been identical:
>
...

> Before opening a JIRA ticket I thought I'd reach out to the list to see if
> anyone has seen any similar behavior as well as do a bit of source-diving
> to try and verify that the descriptor is actually leaking.
>

I would (search for, and failing to find one..) open a JIRA, and let the
list know its URL.

=Rob


timeout for port 7000 on stateful firewall? streaming_socket_timeout_in_ms?

2014-09-29 Thread Donald Smith
We have a stateful firewall 
between data centers for port 7000 (inter-cluster). How long should the idle 
timeout be for the connections on the firewall?

Similarly what's appropriate for streaming_socket_timeout_in_ms in 
cassandra.yaml?  The default is 0 (no timeout).  I presume that 
streaming_socket_timeout_in_ms refers to streams such as for bootstrapping and 
rebuilding.

Thanks

Donald A. Smith | Senior Software Engineer
P: 425.201.3900 x 3866
C: (206) 819-5965
F: (646) 443-2333
dona...@audiencescience.com




Re: Indexes Fragmentation

2014-09-29 Thread Robert Coli
On Sun, Sep 28, 2014 at 9:49 AM, Arthur Zubarev 
wrote:
>
> There are 200+ times more updates and 50x more inserts than analytical loads.
> In Cassandra, to be able to query (in CQL) on a column I have to have
> an index; the question is what toll the fragmentation coming from the
> frequent updates and inserts takes on a CF? Do I also need to manually
> defrag?
>

You appear to have just asked whether maintaining indexes which have a high
rate of change in a log-structured database with immutable data files is
likely to be more performant than maintaining them in a database with
modify-in-place semantics.

"No."

=Rob


best practice for waiting for schema changes to propagate

2014-09-29 Thread Clint Kelly
Hi all,

I often have problems with code that I write that uses the DataStax Java
driver to create / modify a keyspace or table and then soon after reads the
metadata for the keyspace to verify that whatever changes I made to the
keyspace or table are complete.

As an example, I may create a table called `myTableName` and then very soon
after do something like:

assert(session
  .getCluster()
  .getMetadata()
  .getKeyspace(myKeyspaceName)
  .getTable(myTableName) != null)

I assume this fails sometimes because the default round-robin load
balancing policy for the Java driver will send my create-table request to
one node and the metadata read to another, and because it takes some time
for the table creation to propagate across all of the nodes in my cluster.

What is the best way to deal with this problem?  Is there a standard way to
wait for schema changes to propagate?

Best regards,
Clint


Re: Casssandra cluster setup.

2014-09-29 Thread Robert Coli
On Mon, Sep 22, 2014 at 6:32 AM, Muthu Kumar  wrote:

> > I am trying to configure a Cassandra cluster with two nodes. I am new
> to Cassandra.
> > I am using the datastax distribution of Cassandra (windows). I have
> installed it on two nodes and configured it; each works as a separate
> instance but not as a cluster.
>
As a general statement, help with first-time installations of Cassandra is
probably best handled interactively on #cassandra on freenode.

Posting such a debugging issue to a mailing list carries meaningful risk of
Warnocking. [1]

=Rob
 [1] http://en.wikipedia.org/wiki/Warnock's_dilemma


Re: Saving file content to ByteBuffer and to column does not retrieve the same size of data

2014-09-29 Thread Robert Coli
On Mon, Sep 22, 2014 at 3:50 AM, Carlos Scheidecker 
wrote:

> I can successfully read a file into a ByteBuffer and then write it to a
> Cassandra blob column. However, when I retrieve the value of the column,
> the size of the ByteBuffer retrieved is bigger than the original ByteBuffer
> the file was read into. Writing it to disk corrupts the image.
>

Probably don't write binary blobs like images into a database, use a
distributed filesystem?

https://github.com/mogilefs/
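
One common non-bug cause worth ruling out first: reading the blob's backing
array directly. ByteBuffer.array() returns the entire backing array, which may
be larger than the readable region the driver hands back. A minimal sketch of
the safe copy, assuming the Java driver's Row API (the column name is
hypothetical):

import java.nio.ByteBuffer;

ByteBuffer buf = row.getBytes("image_data");   // driver Row accessor for a blob
byte[] exact = new byte[buf.remaining()];      // only the readable region
buf.duplicate().get(exact);                    // duplicate() keeps buf's position intact
// write 'exact' to disk, not buf.array()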

But I agree that this behavior sounds like a bug, I would probably file it
as a JIRA on http://issues.apache.org and then tell the list the URL of the
JIRA you filed.

=Rob


Re: Repair taking long time

2014-09-29 Thread Ben Bromhead
use https://github.com/BrianGallew/cassandra_range_repair



On 30 September 2014 05:24, Ken Hancock  wrote:

>
> On Mon, Sep 29, 2014 at 2:29 PM, Robert Coli  wrote:
>
>>
>> As an aside, you "just lose" with vnodes and clusters of the size. I
>> presume you plan to grow over appx 9 nodes per DC, in which case you
>> probably do want vnodes enabled.
>>
>
> I typically only see discussion on vnodes vs. non-vnodes, but it seems to
> me that might be more important to discuss the number of vnodes per node.
> A small cluster having 256 vnodes/node is unwise given some of the
> sequential operations that are still done.  Even if operations were done in
> parallel, having a 256x increase in parallelization seems an equally bad
> choice.
>
> I've never seen any discussion on how many vnodes per node might be an
> appropriate answer based a planned cluster size -- does such a thing exist?
>
> Ken
>
>
>
>
>


-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359


Re: best practice for waiting for schema changes to propagate

2014-09-29 Thread Ben Bromhead
The system.peers table is a copy of some gossip info the node has stored,
including the schema version. You should query it and wait until all schema
versions have converged.

http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_sys_tab_cluster_t.html

http://www.datastax.com/dev/blog/the-data-dictionary-in-cassandra-1-2
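
A minimal sketch of that convergence check, assuming the Java driver 2.x (the
method name is hypothetical):

import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

// Collect the schema_version reported by this node and by every peer; the
// schema has converged when exactly one distinct version remains.
public static boolean schemaAgreed(Session session) {
    Set<UUID> versions = new HashSet<UUID>();
    versions.add(session.execute("SELECT schema_version FROM system.local")
                        .one().getUUID("schema_version"));
    for (Row peer : session.execute("SELECT schema_version FROM system.peers")) {
        versions.add(peer.getUUID("schema_version"));
    }
    return versions.size() == 1;
}

Callers would poll this in a loop with a short sleep and an overall timeout
before trusting cluster metadata.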

As for ensuring that the driver keeps talking to the node you made the schema
change on, I would ask the driver-specific mailing list / IRC:


   - MAILING LIST:
   https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user
   - IRC: #datastax-drivers on irc.freenode.net 



On 30 September 2014 10:16, Clint Kelly  wrote:

> Hi all,
>
> I often have problems with code that I write that uses the DataStax Java
> driver to create / modify a keyspace or table and then soon after reads the
> metadata for the keyspace to verify that whatever changes I made to the
> keyspace or table are complete.
>
> As an example, I may create a table called `myTableName` and then very
> soon after do something like:
>
> assert(session
>   .getCluster()
>   .getMetadata()
>   .getKeyspace(myKeyspaceName)
>   .getTable(myTableName) != null)
>
> I assume this fails sometimes because the default round-robin load
> balancing policy for the Java driver will send my create-table request to
> one node and the metadata read to another, and because it takes some time
> for the table creation to propagate across all of the nodes in my cluster.
>
> What is the best way to deal with this problem?  Is there a standard way
> to wait for schema changes to propagate?
>
> Best regards,
> Clint
>



-- 

Ben Bromhead

Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359