Re: DataModelling to query date range

2016-03-24 Thread Chris Martin
Hi Vidur, I had a go at your solution but the problem is that it doesn't match routes which are valid throughout the range queried. For example, if I have a route that is valid for all of Jan 2016, I will have a table that looks something like this: start | end| vali

Re: DataModelling to query date range

2016-03-24 Thread Chris Martin
Ah- that looks interesting! I'm actually still on cassandra 2.x but I was planning on upgrading anyway. Once I do so I'll check this one out. Chris On Thu, Mar 24, 2016 at 2:57 AM, Henry M wrote: > I haven't tried the new SASI indexer but it may help: > https://github.com/apache/cassandra/

Re: Large number of tombstones without delete or update

2016-03-24 Thread Ralf Steppacher
I can confirm that if I send JSON messages that always cover all schema fields the tombstone issue is not reported by Cassandra. So, is there a way to work around this issue other than to always populate every column of the schema with every insert? That would be a pain in the backside, really.

RE: DataModelling to query date range

2016-03-24 Thread Peer, Oded
You can change the table to support Multi-column slice restrictions CREATE TABLE routes ( start text, end text, year int, month int, day int, PRIMARY KEY (start, end, year, month, day) ); Then using Multi-column slice restrictions you can query: SELECT * from routes where start = 'New York' and
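A minimal sketch of the ordering that a slice over (year, month, day) relies on, assuming the restriction compares the three clustering columns lexicographically as a tuple. The helper name `in_slice` and the sample days are illustrative and not from the thread; this runs in plain Python and does not talk to Cassandra.

```python
# Illustration only: Python tuples compare lexicographically, which mirrors
# how a slice over (year, month, day) orders the clustering columns.
def in_slice(day, lower, upper):
    """Return True if the (year, month, day) tuple lies in [lower, upper]."""
    return lower <= day <= upper

# Hypothetical stored days for one (start, end) pair.
stored_days = [(2015, 12, 31), (2016, 1, 1), (2016, 1, 15), (2016, 2, 1)]

# Keep only the days that fall inside January 2016 (bounds inclusive).
january = [d for d in stored_days if in_slice(d, (2016, 1, 1), (2016, 1, 31))]
print(january)
```

Because the comparison is on the whole tuple, (2016, 2, 1) is excluded even though its day component is 1.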

Re: Large number of tombstones without delete or update

2016-03-24 Thread Ralf Steppacher
I did some more tests with my particular schema/message structure: A null text field inside a UDT instance does NOT yield tombstones. A null map does NOT yield tombstones. A null text field does yield tombstones. Ralf > On 24.03.2016, at 09:42, Ralf Steppacher wrote: > > I can confirm that if

RE: Large number of tombstones without delete or update

2016-03-24 Thread Peer, Oded
http://www.datastax.com/dev/blog/datastax-java-driver-3-0-0-released#unset-values "For Protocol V3 or below, all variables in a statement must be bound. With Protocol V4, variables can be left "unset", in which case they will be ignored server-side (no tombstones will be generated)." From: Ral

Re: Large number of tombstones without delete or update

2016-03-24 Thread Ralf Steppacher
How does this improvement apply to inserting JSON? The prepared statement has exactly one parameter and it is always bound to the JSON message: INSERT INTO event_by_patient_timestamp JSON ? How would I “unset” a field inside the JSON message written to the event_by_patient_timestamp table? Ra

Re: Large number of tombstones without delete or update

2016-03-24 Thread Jean Tremblay
Ralf, Are you using protocol V4? How do you measure if a tombstone was generated? Thanks Jean On 24 Mar 2016, at 10:35, Ralf Steppacher <ralf.viva...@gmail.com> wrote: How does this improvement apply to inserting JSON? The prepared statement has exactly one parameter and it is always

Re: Large number of tombstones without delete or update

2016-03-24 Thread Ralf Steppacher
Jean, yes, I am using the native protocol v4 (auto-negotiated between java driver 3.0.0 and C* 2.2.4, verified by logging cluster.getConfiguration().getProtocolOptions().getProtocolVersion() ). My first approach for testing for tombstones was “brute force”. Add records and soon enough (after a

RE: Large number of tombstones without delete or update

2016-03-24 Thread Peer, Oded
You are right, I missed the JSON part. According to the docs “Columns which are omitted from the JSON value map are treated as a null insert (which results in an existing value being deleted, if one is present).” So “unset
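A rough sketch of the behaviour quoted from the docs, assuming a flat JSON message and a known list of column names: any column missing from the message, or explicitly null, is a candidate for a null insert. The function name and the column names below are hypothetical, and the sketch ignores column-name case handling and the per-type differences Ralf reports elsewhere in the thread.

```python
import json

def columns_written_as_null(schema_columns, json_message):
    """List schema columns that are absent from, or null in, the JSON message."""
    payload = json.loads(json_message)
    return sorted(c for c in schema_columns
                  if c not in payload or payload[c] is None)

# Hypothetical schema and message, for illustration only.
schema = ["patient_id", "event_time", "note", "severity"]
message = '{"patient_id": "p1", "event_time": "2016-03-24", "note": null}'

null_columns = columns_written_as_null(schema, message)
print(null_columns)
```

A check like this could run on the client before the insert to see which fields a message leaves unpopulated.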

Re: Large number of tombstones without delete or update

2016-03-24 Thread Jean Tremblay
Ralf, Thank YOU very much Ralf. You are the first one who could finally shed some light on something I observed but could not put my finger on: what exactly is causing my tombstones. I cannot judge your method for evaluating the number of tombstones. It seems valid to me. Jean On 24 Mar 2016,

Re: Large number of tombstones without delete or update

2016-03-24 Thread Ralf Steppacher
Done: https://issues.apache.org/jira/browse/CASSANDRA-11424 Thanks! Ralf > On 24.03.2016, at 11:17, Peer, Oded wrote: > > You are right, I missed the JSON part. > According to the docs >

StatusLogger output

2016-03-24 Thread Vasileios Vlachos
Hello, Environment: - Cassandra 2.0.17, 8 nodes, 4 per DC - Ubuntu 12.04, 6-Cores, 16GB of RAM (we use VMWare) Every node seems to be dropping messages (anywhere from 10 to 300) twice a day. I don't know if this has always been the case, but it has definitely been going on for the past month or so. Whe

Re: StatusLogger output

2016-03-24 Thread Vasileios Vlachos
Just to clarify, I can see line 29 which seems to explain the format (first number is ops, second is data); however, I don't know what they actually mean. Thanks, Vasilis On Thu, Mar 24, 2016 at 11:45 AM, Vasileios Vlachos < vasileiosvlac...@gmail.com> wrote: > Hello, > > Environment: > - Cassandra 2.0.1

Re: DataModelling to query date range

2016-03-24 Thread Vidur Malik
Hi Chris, I had something slightly different in mind. You would treat it as time series data, and have one record for each of the days the route was valid. In your case: start | end| valid New York Washington 2016-01-01 New York Washington 2016-01-02 New York
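The one-row-per-day idea can be sketched as follows, assuming a route is described by a start city, an end city, and an inclusive validity range. The function name `expand_route` and the tuple layout are illustrative and not part of the thread.

```python
from datetime import date, timedelta

def expand_route(start, end, valid_from, valid_to):
    """Return one (start, end, valid_day) row per day in the inclusive range."""
    span = (valid_to - valid_from).days
    return [(start, end, valid_from + timedelta(days=i)) for i in range(span + 1)]

# A route valid for all of January 2016 becomes 31 rows, one per day.
rows = expand_route("New York", "Washington", date(2016, 1, 1), date(2016, 1, 31))
print(len(rows), rows[0], rows[-1])
```

With this layout, a query for a date range matches the route only if a row exists for every day in that range, at the cost of storing one row per valid day.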

Is this type of counter table definition valid?

2016-03-24 Thread K. Lawson
I want to create a table with wide partitions (or, put another way, a table which has no value columns (non primary key columns)) that enables the number of rows in any of its partitions to be efficiently procured. Here is a simple definition of such a table CREATE TABLE IF NOT EXISTS test_table

Re: Counter values become under-counted when running repair.

2016-03-24 Thread Jack Krupansky
What CL do you read and write with? Normally, RF=2 is not recommended since it doesn't give you HA within a data center - there is no way to achieve quorum in the data center if a node goes down. I suppose you can achieve a quorum if your request is spread across all three data centers, but norma
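The arithmetic behind the RF=2 remark can be sketched like this, using the usual quorum size of floor(RF / 2) + 1. The helper names are illustrative.

```python
def quorum(replication_factor):
    """Number of replicas that must respond for a quorum."""
    return replication_factor // 2 + 1

def can_reach_quorum(replication_factor, nodes_down):
    """True if enough replicas remain up to satisfy a quorum."""
    return replication_factor - nodes_down >= quorum(replication_factor)

for rf in (2, 3):
    print(rf, quorum(rf), can_reach_quorum(rf, nodes_down=1))
```

With RF=2 the quorum is 2, so a single replica outage blocks quorum requests in that data center; with RF=3 the quorum is still 2, so one replica can be down.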

Re: Query regarding CassandraJavaRDD while running spark job on cassandra

2016-03-24 Thread Kai Wang
I suggest you post this to the spark-cassandra-connector list. On Sat, Mar 12, 2016 at 12:52 AM, Siddharth Verma < verma.siddha...@snapdeal.com> wrote: > In cassandra I have a table with the following schema. > > CREATE TABLE my_keyspace.my_table1 ( > col_1 text, > col_2 text, > col_3 tex

Re: Is this type of counter table definition valid?

2016-03-24 Thread DuyHai Doan
Just tested against C* 3.4 CREATE TABLE IF NOT EXISTS test_table ( part timestamp, clust timestamp, count counter static, PRIMARY KEY(part, clust)); and it just works. "However, I'm not sure how that is possible, given that the updates to partitionRowCountCol would require use

Client drivers

2016-03-24 Thread Rakesh Kumar
Is it possible to install multiple versions of language drivers on the client machines? This would typically be useful during an upgrade process, so that falling back to the old version is easy. Thanks.

RE: StatusLogger output

2016-03-24 Thread SEAN_R_DURITY
I am not sure the status logger output helps determine the problem. However, the dropped mutations and the status logger output are what I see when there is too high a load on one or more Cassandra nodes. It could be long GC pauses, something reading too much data (a large row or a multi-parti

Re: Client drivers

2016-03-24 Thread Jonathan Haddad
Every language has a different means of working with dependencies. Some are compiled in (java, c), some are pulled in via libraries (python). You'll have to be more specific. On Thu, Mar 24, 2016 at 8:14 AM Rakesh Kumar wrote: > Is it possible to install multiple versions of language drivers on

Re: Client drivers

2016-03-24 Thread Rakesh Kumar
> Every language has a different means of working with dependencies. Some are > compiled in (java, c), some are pulled in via libraries (python). You'll > have to be more specific. I am interested mainly in C++ and Java. Thanks.

Re: StatusLogger output

2016-03-24 Thread Vasileios Vlachos
Thanks for your help Sean, The reason StatusLogger messages appear in the logs is usually, as you said, a GC pause (ParNew or CMS, I have seen both), or dropped messages. In our case dropped messages are always (so far) due to internal timeouts, not due to cross node timeouts (like the sample outp

Re: Counter values become under-counted when running repair.

2016-03-24 Thread Dikang Gu
@Jack, we write to 2 and read from 1. I do not understand why RF=2 matters here. Will it have an impact on the repair? Can you please explain more? I selected RF=2 in each region, because: 1. both writes will be sent to the local region, so we do not need to wait for the response across regions. 2. if one

Re: Counter values become under-counted when running repair.

2016-03-24 Thread Robert Coli
On Thu, Mar 24, 2016 at 7:17 AM, Jack Krupansky wrote: > Can you advise us on your thinking when you selected RF=2? > I figure he was probably thinking "I want to operate in a bunch of different regions and don't need to use QUORUM for my use cases, and want to save money by not storing 3 copies

Re: Counter values become under-counted when running repair.

2016-03-24 Thread Aleksey Yeschenko
After repair is over, does the value settle? What CLs do you write to your counters with? What CLs are you reading with? -- AY On 24 March 2016 at 06:17:27, Dikang Gu (dikan...@gmail.com) wrote: Hello there, We are experimenting with Counters in Cassandra 2.2.5. Our setup is that we have 6 nod

Re: Counter values become under-counted when running repair.

2016-03-24 Thread Dikang Gu
@Aleksey, we are writing to the cluster with CL = 2, and reading with CL = 1. And overall we have 6 copies across 3 different regions. Do you have comments about our setup? During the repair, the counter value becomes inaccurate. We are still playing with the repair and will keep you updated with more expe

Re: Counter values become under-counted when running repair.

2016-03-24 Thread Aleksey Yeschenko
Best open a JIRA ticket and I’ll have a look at what could be the reason. -- AY On 24 March 2016 at 23:20:55, Dikang Gu (dikan...@gmail.com) wrote: @Aleksey, we are writing to the cluster with CL = 2, and reading with CL = 1. And overall we have 6 copies across 3 different regions. Do you have

Data export with consistency problem

2016-03-24 Thread xutom
Hi all, I have a C* cluster with five nodes, my Cassandra version is 2.1.1, and we also enable "Hinted Handoff". Everything is fine while we use the C* cluster to store up to 10 billion rows of data. But now we have a problem. During our test, after we import up to 40 billion rows of data int

datastax java driver Batch vs BatchStatement

2016-03-24 Thread Jimmy Lin
Hi all, What is the difference between the datastax driver Batch and BatchStatement? In particular, BatchStatement calls out that it needs native protocol version 2 or above. What is the advantage of using native protocol 2.0 for batch execution? Is either of these two APIs smart enough to split a big b

Re: Counter values become under-counted when running repair.

2016-03-24 Thread Dikang Gu
@Aleksey, sure, here is the jira: https://issues.apache.org/jira/browse/CASSANDRA-11432 Thanks! On Thu, Mar 24, 2016 at 5:32 PM, Aleksey Yeschenko wrote: > Best open a JIRA ticket and I’ll have a look at what could be the reason. > > -- > AY > > On 24 March 2016 at 23:20:55, Dikang Gu (dikan...