read huge data from CSV and write into Cassandra

2014-07-25 Thread Akshay Ballarpure
How can I read data from a large CSV file, which has 100+ columns and
millions of rows, and insert it into Cassandra every 1 minute?

Thanks & Regards
Akshay Ghanshyam Ballarpure
Tata Consultancy Services
Cell:- 9985084075
Mailto: akshay.ballarp...@tcs.com
Website: http://www.tcs.com

Experience certainty.   IT Services
Business Solutions
Consulting

=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you




How to get rid of stale info in gossip

2014-07-25 Thread Rahul Neelakantan
Is there a way to get rid of stale information that shows up for removed/dead 
nodes in gossip, without a complete cluster bounce?

Rahul Neelakantan



Re: How to get rid of stale info in gossip

2014-07-25 Thread Mark Reddy
After removing a node, its information can persist in the Gossiper for up
to 3 days, after which time it should be removed.

Are you having issues with a removed node's state persisting for longer?


Mark


On Fri, Jul 25, 2014 at 11:33 AM, Rahul Neelakantan  wrote:

> Is there a way to get rid of stale information that shows up for
> removed/dead nodes in gossip, without a complete cluster bounce?
>
> Rahul Neelakantan
>
>


Re: Hot, large row

2014-07-25 Thread Keith Wright
Answers to your questions below, but in the end I believe the root issue here is 
that LCS is clearly not compacting away as it should, resulting in reads across 
many SSTables, which as you noted is “fishy”.   I’m considering filing a JIRA 
for this; sound reasonable?

We are running OOTB JVM tuning (see below) and using the DataStax client.  When 
we read from the table in question, we put a limit of 5000 to help reduce the 
read volume, but yes, the standard scenario is:  “select * from 
global_user_event_skus_v2 where user_id = ? limit 5000”

If you recall, there are 2 issues occurring.  Turning on client side paging 
would just mask these IMO:

 1.  Periodically seeing one node stuck in CMS GC causing high read latency.  
Seems to recover on its own after an hour or so
 2.  Generally bad read/write times for some LCS tables that have multiple 
updates over time

Thanks!


-XX:+CMSClassUnloadingEnabled -XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 -Xms8018M -Xmx8018M -Xmn2004M 
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+UseTLAB -XX:+UseCondCardMark -XX:+PrintGCDetails -XX:+PrintGCDateStamps 
-XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime 
-XX:+PrintGCApplicationConcurrentTime -Xloggc:/var/log/cassandra/gc.log 
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M

From: DuyHai Doan <doanduy...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, July 25, 2014 at 2:53 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Hot, large row


What are your JVM settings?

Your read pattern implies that you may fetch lots of data into memory (reading 
all SKUs for a given user); maybe it stresses the JVM too much.

Did you use the native paging of the Java Driver to avoid loading all columns at once?

And is loading all SKUs for one user a rare scenario, or is it the main 
use case for this column family?

On 25 Jul 2014 at 04:11, "Keith Wright" <kwri...@nanigans.com> wrote:
One last item to add to this thread:  we have consistently experienced this 
behavior where performance degrades over time (previously we were unable to 
bootstrap nodes due to long GC pauses from existing nodes).  I believe it’s due 
to tombstone build-up (as I mentioned previously, one of the tables mentioned is 
showing a droppable tombstone ratio of > 30%).   The sku table is used to hold 
SKUs that users recently viewed/purchased.  When we write a new SKU we set the 
TTL to 30 days, where the row key is the user id; our read case is to fetch ALL 
SKUs the user has seen within the TTL.  Since the user sees SKUs consistently 
over time, this can result in a row with many columns, much of which are likely 
tombstoned (see CASSANDRA-6654, which I filed for this and which shows that C* 
does not handle this case well).

I guess I’m just surprised that others aren’t using C* for similar use cases 
and thus having the same issue?

 I am hoping to upgrade to 2.0.9 which has improvements to remove tombstones.
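A toy model of the access pattern Keith describes (hypothetical numbers: one write per "day" into a single user's row, each with a 30-day TTL) shows why the row's scan cost keeps growing even though the amount of live data does not — this is plain Java with no driver involved:

```java
// Each cell in a user's row records a SKU view at some write time. A cell is
// live while (now - writeTime) < ttl; expired cells become tombstones that a
// read must still skip over until compaction purges them.
public class TtlRow {
    // Count cells still live at `now`, given each cell's write time and a TTL.
    public static int liveCells(long[] writeTimes, long ttl, long now) {
        int live = 0;
        for (long w : writeTimes) if (now - w < ttl) live++;
        return live;
    }

    public static void main(String[] args) {
        long[] writes = new long[90];
        for (int d = 0; d < 90; d++) writes[d] = d + 1; // one SKU write per "day"
        // At day 90 only 30 cells are live, but a full-row read still walks
        // all 90 -- the other 60 are effectively tombstones.
        System.out.println(liveCells(writes, 30, 90)); // 30
    }
}
```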

From: Keith Wright <kwri...@nanigans.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, July 24, 2014 at 4:50 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Don Jackson <djack...@nanigans.com>
Subject: Re: Hot, large row

When a node is showing the high CMS issue, IO is actually low, likely due to the 
fact that none is going on during CMS GC.  On a node not showing the issue, 
iostat shows disk usage around 50% (these are SSDs) and load hovers around 10, 
which is fine for a dual octo-core machine.

In addition, nodetool compactionstats does not show that we are falling behind 
in compactions.

So I’m not sure what is going on here.  We are running CentOS 6.5 with Java 
1.7.0_51.  It does seem like things are getting worse, and I’m considering 
dropping and rebuilding all the tables (as I have the data in Hadoop).  This 
seems to be a repeated problem for us with Cassandra, and now that Aerospike has 
an open source version, we are very much considering switching.

Thanks again for the help and any insight you might have!


avg-cpu:  %user   %nice %system %iowait  %steal   %idle

  23.43   12.40   11.406.200.00   46.57


Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz 
avgqu-sz   await  svctm  %util

sda   0.00 0.00  720.500.00 5.51 0.0015.66 
1.051.54   0.57  41.35

sdc   0.00 0.00 5930.50 2100.5055.42 8.2116.23 
2.300.29   0.06  51.25

sdb   0.00 0.00 6951.50

Re: Hot, large row

2014-07-25 Thread Duncan Sands

Hi Keith,

On 25/07/14 14:43, Keith Wright wrote:

> Answers to your questions below but in the end I believe the root issue here is
> that LCS is clearly not compacting away as it should resulting in reads across
> many SSTables which as you noted is “fishy”.   I’m considering filing a JIRA for
> this, sound reasonable?
>
> We are running OOTB JVM tuning (see below) and using the DataStax client.  When
> we read from the table in question, we put a limit of 5000 to help reduce the
> read volume but yes the standard scenario is:  “select * from
> global_user_event_skus_v2 where user_id = ? limit 5000”


does reducing the limit, eg to 500, help?  I've had similar sounding problems 
when many clients were doing wide row reads in parallel, with the reads 
returning thousands of rows.


Ciao, Duncan.


Re: How to get rid of stale info in gossip

2014-07-25 Thread Rahul Neelakantan
Yes, and this is a really old version of Cassandra: 1.0.8.


Rahul Neelakantan
678-451-4545

> On Jul 25, 2014, at 7:29 AM, Mark Reddy  wrote:
> 
> After removing a node, it's information can persist in the Gossiper for up to 
> 3 days, after which time it should be removed. 
> 
> Are you having issues with a removed node state persisting for longer?
> 
> 
> Mark
> 
> 
>> On Fri, Jul 25, 2014 at 11:33 AM, Rahul Neelakantan  wrote:
>> Is there a way to get rid of stale information that shows up for 
>> removed/dead nodes in gossip, without a complete cluster bounce?
>> 
>> Rahul Neelakantan
> 


Re: Hot, large row

2014-07-25 Thread DuyHai Doan
Hello Keith


   1. Periodically seeing one node stuck in CMS GC causing high read
   latency.  Seems to recover on its own after an hour or so

How many nodes do you have? And roughly how many distinct user_ids are
there?

Looking at your JVM settings, it seems that you have the GC log enabled. It's
worth having a look into it. And also grep for the pattern "GC for" in the
Cassandra system.log file

The symptom you mention looks like there are activity bursts on one
particular node. The rows are not so wide, since the largest has only 61k
cells and C* can deal with rows larger than that. It all depends now on
your data access pattern.

Also, Jack Krupansky's question is interesting. Even though you limit a
request to 5000, if each cell is a big blob or block of text, it may add
up to a lot in the JVM heap ...

Did you try to do a select without limit and use the paging feature of the Java
Driver? Or lower the limit in the select to 500, as Duncan said, and paginate
manually?

Hope that helps

Duy Hai



On Fri, Jul 25, 2014 at 3:10 PM, Duncan Sands 
wrote:

> Hi Keith,
>
>
> On 25/07/14 14:43, Keith Wright wrote:
>
>> Answers to your questions below but in the end I believe the root issue
>> here is
>> that LCS is clearly not compacting away as it should resulting in reads
>> across
>> many SSTables which as you noted is “fishy”.   I’m considering filing a
>> JIRA for
>> this, sound reasonable?
>>
>> We are running OOTB JMV tuning (see below) and using the datastax client.
>>  When
>> we read from the table in question, we put a limit of 5000 to help reduce
>> the
>> read volume but yes the standard scenario is:  “select * from
>> global_user_event_skus_v2 where user_id = ? limit 5000”
>>
>
> does reducing the limit, eg to 500, help?  I've had similar sounding
> problems when many clients were doing wide row reads in parallel, with the
> reads returning thousands of rows.
>
> Ciao, Duncan.
>


Re: Hot, large row

2014-07-25 Thread Ken Hancock
Keith,

If I'm understanding your schema, it sounds like you have rows/partitions
with TTLs that continuously grow over time.  This sounds a lot like
https://issues.apache.org/jira/browse/CASSANDRA-6654 which was marked as
working as designed, but totally unintuitive.  (Jonathan Ellis went off
and spent the time to reproduce it, which indicates to me it was equally
unintuitive to someone who is pretty knowledgeable about Cassandra.)

Ken





On Thu, Jul 24, 2014 at 10:10 PM, Keith Wright  wrote:

> One last item to add to this thread:  we have consistently experienced
> this behavior where over time performance degrades (previously we were
> unable to bootstrap nodes to due long GC pauses from existing nodes).  I
> believe its due to tombstone build up (as I mentioned previously one of the
> tables mentioned is showing a droppable tombstone ratio of > 30%).   The
> sku table is used to hold SKUs that recent users viewed/purchased.  When we
> write a new SKU we set the TTL to 30 days where the row key is the user id;
> our read case is to fetch ALL skus the user has seen within the TTL.  Since
> the user sees SKUs consistently over time, this can result in a row with
> many columns much of which are likely tombstoned (see CASSANDRA-6654 which
> I filed for this which shows that C* does not handle this case well).
>
> I guess I’m just surprised that others aren’t using C* for similar usage
> cases and thus having the same issue?
>
>  I am hoping to upgrade to 2.0.9 which has improvements to remove
> tombstones.
>
> From: Keith Wright 
> Reply-To: "user@cassandra.apache.org" 
> Date: Thursday, July 24, 2014 at 4:50 PM
> To: "user@cassandra.apache.org" 
> Cc: Don Jackson 
>
> Subject: Re: Hot, large row
>
> When a node is showing the high CMS issue, io is actually low likely due
> to the fact that none is going on during CMS GC.  On a node not showing the
> issue, iostat shows disk usage around 50% (these are SSD) and load hovers
> around 10 for a dual octo core machine this is fine.
>
> In addition, nodetool compactionstats does not show that we are falling
> behind in compactions.
>
> So not sure what is going on here.  We are running CentOS 6.5 with java
> 1.7.0_51.  It does seem like things are getting worse and I’m considering
> dropping and rebuilding all the tables (as I have the data in Hadoop).
>  This seems to be a repeated problem for us with Cassandra and now that
> Aerospike has an open source version, we are very much considering
> switching.
>
> Thanks again for the help and any insight you might have!
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>
>   23.43   12.40   11.406.200.00   46.57
>
>
> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
>
> sda   0.00 0.00  720.500.00 5.51 0.0015.66
> 1.051.54   0.57  41.35
>
> sdc   0.00 0.00 5930.50 2100.5055.42 8.2116.23
> 2.300.29   0.06  51.25
>
> sdb   0.00 0.00 6951.50 2052.5065.82 8.0216.80
> 4.310.48   0.07  59.60
>
>
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>
>9.48   14.725.603.670.00   66.52
>
>
> Device: rrqm/s   wrqm/s r/s w/srMB/swMB/s avgrq-sz
> avgqu-sz   await  svctm  %util
>
> sda   0.00 0.00  327.00   19.50 2.55 0.0815.53
> 0.040.13   0.12   4.25
>
> sdc   0.00 0.00 3974.50 1403.5036.37 5.4815.94
> 0.990.18   0.08  45.10
>
> sdb   0.00 0.00 4357.50 1535.0040.83 6.0016.28
> 1.100.19   0.08  47.45
>
> From: DuyHai Doan 
> Reply-To: "user@cassandra.apache.org" 
> Date: Thursday, July 24, 2014 at 4:44 PM
> To: "user@cassandra.apache.org" 
> Subject: Re: Hot, large row
>
> For global_user_event_skus_v2
>
> 1. number of SSTables per read is quite huge. Considering you're using
> LCS, it means that LCS cannot keep up with write rate and is left behind.
> AFAIK LCS is using SizeTieredCompaction at L0 to cope with extreme write
> burst. Your high number of SSTables per read is quite fishy here.
>
> 2. Write latency is widespread, up to 73.457 millisecs, meaning that your
> node is getting behind on writes in some cases. Most writes are still
> below 1 millisec, but we don't care about those. What we care about here is
> the large tail of write latency climbing up to 73 millisecs
>
> 3. Same remark for read latency, which is worse because the distribution
> is even "flatter", worst cases going up to 100 ms.
>
> If I were you, I'll check for disk I/O first and maybe CPU usage
>
>
> On Thu, Jul 24, 2014 at 10:32 PM, Keith Wright 
> wrote:
>
>> Cfhistograms for the tables I believe are most likely the issue are below
>> on the node that most recently presented the issue.  Any ideas?  Note that
>> these tables are LCS and have droppable tombstone ratios of 27% for
>> global_user_event_skus_v2 and 2.7% for 

Re: Caffinitas Mapper - Java object mapper for Apache Cassandra

2014-07-25 Thread Vivek Mishra
How is it different than kundera?
On 20/07/2014 9:03 pm, "Robert Stupp"  wrote:

> Hi all,
>
> I've just released the first beta version of Caffinitas Mapper.
>
> Caffinitas Mapper is an advanced Java object mapper for Apache Cassandra
> NoSQL database. It offers an annotation based declaration model with a wide
> range of built-in features like JPA style inheritance with table-per-class
> and single-table model. Composites can be mapped using either Apache
> Cassandra’s new UserType or as distinct columns in a table. Cassandra
> collections, user type and tuple type are directly supported - collections
> can be loaded lazily. Entity instances can be automatically denormalized in
> other entity instances. CREATE TABLE/TYPE and ALTER TABLE/TYPE CQL DDL
> statements can be generated programmatically. Custom types can be
> integrated using a Converter API. All Cassandra consistency levels, serial
> consistency and batch statements are supported.
>
> All Apache Cassandra versions 1.2, 2.0 and 2.1 as well as all DataStax
> Community and Enterprise editions based on these Cassandra versions are
> supported. Java 6 is required during runtime.
>
> Support for legacy, Thrift style models, is possible with Caffinitas
> Mapper since it supports CompositeType and DynamicCompositeType out of the
> box. A special map-style-entity type has been especially designed to access
> schema-less data models.
>
> Caffinitas Mapper is open source and licensed using the Apache License,
> Version 2.0.
>
>
>
> Website & Documentation: http://caffinitas.org/
> API-Docs: http://caffinitas.org/mapper/apidocs/
> Source Repository: https://bitbucket.org/caffinitas/mapper/
> Issues: https://caffinitas.atlassian.net/
> Mailing List: https://groups.google.com/d/forum/caffinitas-mapper
>
>


Re: Hot, large row

2014-07-25 Thread Keith Wright
Ha, check out who filed that ticket!   Yes I’m aware of it.  My hope is that it 
was mostly addressed in CASSANDRA-6563, so I may upgrade from 2.0.6 to 2.0.9.  
I’m really just surprised that others are not doing similar things as I am and 
thus experiencing similar issues.

To answer DuyHai’s questions:

How many nodes do you have? And roughly how many distinct user_ids are there?
- 14 nodes with approximately 250 million distinct user_ids

For GC activity, in general we see low GC pressure in both ParNew and CMS (we 
see the occasional CMS spike, but it's usually under 100 ms).  When we see a node 
locked up in CMS GC, it's not that any one GC takes a long time; it's just that 
the consistent nature of them causes the read latency to spike from the usual 
3-5 ms up to 35 ms, which causes issues for our application.

Also, Jack Krupansky's question is interesting. Even though you limit a request to 
5000, if each cell is a big blob or block of text, it may add up to a lot in the 
JVM heap …
- The column values are actually timestamps and thus not variable in length, 
and we cap the length of other columns used in the primary key, so I find it 
VERY unlikely that this is a cause.

I will look into the paging option with the native client, but from the docs it 
appears that it's enabled by default, right?

I greatly appreciate all the help!

From: Ken Hancock <ken.hanc...@schange.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Friday, July 25, 2014 at 10:06 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: Don Jackson <djack...@nanigans.com>
Subject: Re: Hot, large row
Subject: Re: Hot, large row

https://issues.apache.org/jira/browse/CASSANDRA-6654
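On Keith's paging question above: in Java Driver 2.x, automatic result paging is indeed on by default when the cluster speaks native protocol v2 (Cassandra 2.0+), and `Statement.setFetchSize()` tunes the page size. Conceptually, the client consumes rows a page at a time instead of materializing the whole result; a self-contained sketch of that idea (plain Java, no driver, in-memory "rows") is:

```java
import java.util.ArrayList;
import java.util.List;

// What the driver's automatic paging does conceptually: rather than holding
// all rows of a large result in memory at once, rows are pulled in pages of
// `fetchSize` as the client iterates.
public class PagingSketch {
    // Consume rows page by page and return the total number seen.
    public static int consume(List<String> rows, int fetchSize) {
        int seen = 0;
        for (int start = 0; start < rows.size(); start += fetchSize) {
            int end = Math.min(start + fetchSize, rows.size());
            List<String> page = rows.subList(start, end); // one "page"
            seen += page.size(); // process the page; only a page is held at a time
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 12; i++) rows.add("sku" + i);
        System.out.println(consume(rows, 5)); // 12 rows, fetched in pages of 5, 5, 2
    }
}
```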


Re: read huge data from CSV and write into Cassandra

2014-07-25 Thread Jack Krupansky
Read the CSV file using a Java app and then index the rows using the Cassandra 
Java driver with multiple, parallel input streams.

Oh, and make sure to provision your cluster with enough nodes to handle your 
desired ingestion and query rates. Do a proof of concept with a six node 
cluster with RF=2 to see what ingestion and query rates you can get for a 
fraction of your data and then scale from there. Although a 12-node cluster 
with RF=3 would be more realistic. RF=2 is not for production – doesn’t permit 
any failures, while RF=3 permits quorum operations with a single node failure. 
But RF=2 at least lets you test with a more realistic scenario of coordinator 
nodes and inter-node traffic.

And if your total row count does manage to fit on one machine (or three nodes 
with RF=3), at least make sure you have enough CPU cores and I/O bandwidth to 
handle your desired ingestion and query rate.

-- Jack Krupansky

From: Akshay Ballarpure 
Sent: Friday, July 25, 2014 5:26 AM
To: user@cassandra.apache.org 
Subject: read huge data from CSV and write into Cassandra

How to read data from large CSV file which is having 100+ columns and millions 
of rows and inserting into Cassandra every 1 minute. 

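Jack's outline — one reader feeding multiple, parallel insert streams — might look roughly like the following sketch. The insert itself is stubbed out; a real loader would bind a prepared statement and call the DataStax Java driver's `executeAsync()`, and would use a proper CSV parser rather than a naive split:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Single-threaded CSV read fanned out to parallel insert workers.
public class CsvLoader {

    // Stub standing in for a driver insert of one parsed row.
    static void insert(String[] columns) {
        // e.g. session.executeAsync(preparedInsert.bind((Object[]) columns));
    }

    // Split csv text into rows, dispatch each row to a worker pool, wait for
    // completion, and return the number of rows loaded.
    public static int load(String csv, int parallelism) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        List<Future<?>> pending = new ArrayList<>();
        for (String line : csv.split("\n")) {
            if (line.isEmpty()) continue;             // skip blank lines
            final String[] columns = line.split(","); // naive split: no quoting support
            pending.add(pool.submit(() -> insert(columns)));
        }
        try {
            for (Future<?> f : pending) f.get();      // wait for all inserts
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return pending.size();
    }

    public static void main(String[] args) {
        System.out.println(load("u1,sku1\nu2,sku2\nu3,sku3", 4)); // 3
    }
}
```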


Re: Hot, large row

2014-07-25 Thread Jack Krupansky
Is it the accumulated tombstones on a row that make it act as if “wide”? Does 
cfhistograms count the tombstones or subtract them when reporting on cell-count 
for rows? (I don’t know.)

-- Jack Krupansky

From: Keith Wright 
Sent: Friday, July 25, 2014 10:24 AM
To: user@cassandra.apache.org 
Cc: Don Jackson 
Subject: Re: Hot, large row

Ha, check out who filed that ticket!   Yes I’m aware of it.  My hope is that it 
was mostly addressed in CASSANDRA-6563 so I may upgrade from 2.0.6 to 2.0.9.  
I’m really just surprised that others are not doing similar actions as I and 
thus experiencing similar issues.

To answer DuyHai’s questions:

How many nodes do you have ? And how many distinct user_id roughtly is there ?
- 14 nodes with approximately 250 million distinct user_ids

For GC activity, in general we see low GC pressure in both Par New and CMS (we 
see the occasional CMS spike but its usually under 100 ms).  When we see a node 
locked up in CMS GC, its not that anyone GC takes a long time, its just that 
the consistent nature of them causes the read latency to spike from the usual 
3-5 ms up to 35 ms which causes issues for our application.

Also Jack Krupansky question is interesting. Even though you limit a request to 
5000, if each cell is a big blob or block of text, it mays add up a lot into 
JVM heap … 
- The columns values are actually timestamps and thus not variable in length 
and we cap the length of other columns used in the primary key so I find if 
VERY unlikely that this is a cause.

I will look into the paging option with that native client but from the docs it 
appears that its enabled by default, right?  

I greatly appreciate all the help!

From: Ken Hancock 
Reply-To: "user@cassandra.apache.org" 
Date: Friday, July 25, 2014 at 10:06 AM
To: "user@cassandra.apache.org" 
Cc: Don Jackson 
Subject: Re: Hot, large row


https://issues.apache.org/jira/browse/CASSANDRA-6654 

Re: Caffinitas Mapper - Java object mapper for Apache Cassandra

2014-07-25 Thread Robert Stupp
I don't know Kundera in detail.
The goal of Caffinitas Mapper is to provide a convenient object mapper for Apache 
Cassandra using an annotation-based declarative approach that does not have the 
limitations that JPA annotations have.

Am 25.07.2014 um 16:21 schrieb Vivek Mishra :

> How is it different than kundera?
> 
> On 20/07/2014 9:03 pm, "Robert Stupp"  wrote:
> Hi all,
> 
> I've just released the first beta version of Caffinitas Mapper.
> 
> Caffinitas Mapper is an advanced Java object mapper for Apache Cassandra 
> NoSQL database. It offers an annotation based declaration model with a wide 
> range of built-in features like JPA style inheritance with table-per-class 
> and single-table model. Composites can be mapped using either Apache 
> Cassandra’s new UserType or as distinct columns in a table. Cassandra 
> collections, user type and tuple type are directly supported - collections 
> can be loaded lazily. Entity instances can be automatically denormalized in 
> other entity instances. CREATE TABLE/TYPE and ALTER TABLE/TYPE CQL DDL 
> statements can be generated programmatically. Custom types can be integrated 
> using a Converter API. All Cassandra consistency levels, serial consistency 
> and batch statements are supported.
> 
> All Apache Cassandra versions 1.2, 2.0 and 2.1 as well as all DataStax 
> Community and Enterprise editions based on these Cassandra versions are 
> supported. Java 6 is required during runtime.
> 
> Support for legacy, Thrift style models, is possible with Caffinitas Mapper 
> since it supports CompositeType and DynamicCompositeType out of the box. A 
> special map-style-entity type has been especially designed to access 
> schema-less data models.
> 
> Caffinitas Mapper is open source and licensed using the Apache License, 
> Version 2.0.
> 
> 
> 
> Website & Documentation: http://caffinitas.org/
> API-Docs: http://caffinitas.org/mapper/apidocs/
> Source Repository: https://bitbucket.org/caffinitas/mapper/
> Issues: https://caffinitas.atlassian.net/
> Mailing List: https://groups.google.com/d/forum/caffinitas-mapper
> 





Replication factor 2 with immutable data

2014-07-25 Thread Jon Travis
I have a couple questions regarding the availability of my data in a RF=2
scenario.

- The setup -
I am currently storing immutable data in a CF with RF=2 and
read_repair_chance = 0.0.  There is a lot of data, so bumping up to RF=3
would increase my storage costs quite dramatically.  For the most part, I
am only adding data to this CF (and nightly, do some deleting).  Writes and
Reads are both being done with CL = ONE.

- The questions -
When I write a value, it is written to replicas A and B.  If B is down,
then A will still acknowledge the write and the write will succeed.  Great.
Now then, if B comes back up, and before B gets the handoff of the data
from A, a client attempts to read the recently-written data.  If the client
attempts to read the data and it gets routed to replica B, the data will
not exist there, and the read will fail, correct?

But what I really want is for the read to hit both A and B, and whichever
one returns the data then great -- I only need 1 of them to actually
acknowledge having it.

My questions are:
  - Is it possible to achieve consistency in this approach?  Even if I try
at CL=TWO and backoff to CL=ONE in a failure condition, there still seems
to be a race where I could hit the replica without the data.
  - Does a replica 'not having the data' count towards the CL requirements?
 I.e. replica B responds, "Nope, don't have it" -- I don't want the CL to
be satisfied, because the data is either there or it is not.  I have not
done updates to the data.

This feels a bit quorum-ish, where a quorum under RF=3 will ask 3 nodes for
the data and return success when 2 have consistent results.

It feels strange to be able to write data at RF=2, then with only 1 node
being down, not be able to read it ...

Thanks,

-- Jon


Re: Replication factor 2 with immutable data

2014-07-25 Thread Robert Coli
On Fri, Jul 25, 2014 at 10:46 AM, Jon Travis  wrote:

> I have a couple questions regarding the availability of my data in a RF=2
> scenario.
>

You have just explained why consistency does not work with Replication
Factor of fewer than 3 and Consistency Level of less than QUORUM.

Basically, with RF=2, QUORUM is ALL, and you can't be available at ALL
because that's impossible.

=Rob
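Rob's point in numbers: a quorum is floor(RF/2) + 1 replicas, so with RF=2 a "quorum" is both replicas, i.e. the same as ALL, and one node being down makes quorum operations fail; RF=3 is the smallest RF where quorum survives a single node failure:

```java
// Quorum size for a given replication factor.
public class Quorum {
    public static int quorum(int rf) {
        return rf / 2 + 1; // floor(RF/2) + 1
    }

    public static void main(String[] args) {
        System.out.println(quorum(2)); // 2 -- same as ALL: no failures tolerated
        System.out.println(quorum(3)); // 2 -- one replica may be down
    }
}
```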


Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Kevin Burton
Say I have about 50 primary keys I need to fetch.

I'd like to use parallel dispatch.  So that if I have 50 hosts, and each
has record, I can read from all 50 at once.

I assume Cassandra does the right thing here?  I believe it does… at least
from reading the docs, but it's still a bit unclear.

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread DuyHai Doan
Nope. SELECT ... IN() sends one request to a coordinator. This coordinator
dispatches the request to 50 nodes, as in your example, and waits for 50
responses before sending back the final result. As you can guess, this
approach is not optimal, since the global request latency is bound to the
slowest latency among the 50 nodes.

 On the other hand, if you use the async feature of the native protocol, your
client will issue 50 requests in parallel and the answers arrive as soon as
they are fetched from the different nodes.

 Clearly the only advantage of using the IN() clause is ease of query. I would
advise using IN() only when you have a "few" values, not 50.


On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton  wrote:

> Say I have about 50 primary keys I need to fetch.
>
> I'd like to use parallel dispatch.  So that if I have 50 hosts, and each
> has record, I can read from all 50 at once.
>
> I assume cassandra does the right thing here ?  I believe it does… at
> least from reading the docs but it's still a bit unclear.
>
> Kevin
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
>  … or check out my Google+ profile
> 
> 
>
>
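The client-side alternative DuyHai describes can be sketched without a cluster as follows. Here `fetch()` is a stub standing in for an async driver query; the 2.0-era Java driver returns Guava `ListenableFuture`s from `executeAsync()`, but this sketch uses Java 8 `CompletableFuture` to stay self-contained:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// One async request per key instead of a single IN(...) query: each lookup
// proceeds independently, so one slow node delays only its own key's result.
public class ParallelDispatch {
    static String fetch(int key) { return "row-" + key; } // stub per-key lookup

    public static List<String> fetchAll(List<Integer> keys) {
        // Issue every request before waiting on any of them.
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (int k : keys) {
            futures.add(CompletableFuture.supplyAsync(() -> fetch(k)));
        }
        // Collect results (here in submit order; a real client could act on
        // each future as it completes instead).
        List<String> out = new ArrayList<>();
        for (CompletableFuture<String> f : futures) out.add(f.join());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList(1, 2, 3))); // [row-1, row-2, row-3]
    }
}
```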


Re: Index creation sometimes fails

2014-07-25 Thread Clint Kelly
Hi Tyler,

FWIW I was not able to reproduce this problem with a smaller example.  I'll
go ahead and file the JIRA anyway.  Thanks for your help!

Best regards,
Clint


On Thu, Jul 17, 2014 at 3:05 PM, Tyler Hobbs  wrote:

>
> On Thu, Jul 17, 2014 at 4:59 PM, Clint Kelly 
> wrote:
>
>>
>> I will post a JIRA, along with directions on how to get this to
>> happen.  The tricky thing, though, is that this doesn't always happen,
>> and I cannot reproduce it on my laptop or in a VM.
>>
>
> Even if you can't reproduce, just include as many details as you can.  C*
> and driver versions, schemas, logs, etc.
>
>
>>
>> BTW you mean the datastax JIRA, correct?
>
>
> Oops! Yes, I meant https://datastax-oss.atlassian.net/browse/JAVA, not
> the batch statement docs :)
>
>
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Graham Sanderson
Of course the driver in question is allowed to be smarter and can do so if you 
use a ? parameter for a list, or even individual elements.

I'm not sure which, if any, drivers currently do this, but we plan to combine this 
with token-aware routing in our Scala driver in the future.

Sent from my iPhone

> On Jul 25, 2014, at 1:14 PM, DuyHai Doan  wrote:
> 
> Nope. Select ... IN() sends one request to a coordinator. This coordinator 
> dispatch the request to 50 nodes as in your example and waits for 50 
> responses before sending back the final result. As you can guess this 
> approach is not optimal since the global request latency is bound to the 
> slowest latency among 50 nodes.
> 
>  On the other hand if you use async feature from the native protocol, you 
> client will issue 50 requests in parallel and the answers arrive as soon as 
> they are fetched from different nodes.
> 
>  Clearly the only advantage of using IN() clause is ease of query. I would 
> advise to use IN() only when you have a "few" values, not 50.
> 
> 
>> On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton  wrote:
>> Say I have about 50 primary keys I need to fetch.
>> 
>> I'd like to use parallel dispatch.  So that if I have 50 hosts, and each has 
>> record, I can read from all 50 at once.
>> 
>> I assume cassandra does the right thing here ?  I believe it does… at least 
>> from reading the docs but it's still a bit unclear.
>> 
>> Kevin
>> 
>> -- 
>> Founder/CEO Spinn3r.com
>> Location: San Francisco, CA
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
> 


do all nodes actually send the data to the coordinator when doing a read?

2014-07-25 Thread Brian Tarbox
We're considering a C* setup with very large columns and I have a question
about the details of reads.

I understand that a read request gets handled by the coordinator, which
sends read requests to CL of the nodes holding replicas of the data,
and once CL nodes have replied with consistent data it is returned to
the client.

My understanding is that each of those nodes actually sends the full data
being requested to the coordinator (which in the case of very large columns
would involve lots of network traffic).  Is that right?

The alternative (which I don't think is the case, but I've been asked to
verify) is that the replicas first send metadata to the coordinator, which
then asks one replica to send the actual data.  Again, I don't think this
is the case, but I was asked to confirm.

Thanks.

-- 
http://about.me/BrianTarbox


Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Kevin Burton
On Fri, Jul 25, 2014 at 11:14 AM, DuyHai Doan  wrote:

> Nope. Select ... IN() sends one request to a coordinator. This coordinator
> dispatches the request to 50 nodes as in your example and waits for 50
> responses before sending back the final result. As you can guess this
> approach is not optimal since the global request latency is bound to the
> slowest latency among 50 nodes.
>
>
Maybe it's the wording but it sounds like the coordinator is doing parallel
dispatch?

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Kevin Burton
Perhaps the best strategy is to have the datastax java-driver do this and I
just wait for each result individually.  This will give me parallel dispatch.
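
The client-side pattern described here can be sketched like this (a toy
stand-in: `fetch` simulates an async single-key read via a thread pool; the
names are illustrative, not the real driver API, which returns its own
future objects from an executeAsync-style call):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(key):
    # Stand-in for an async single-partition read; a real client would
    # call something like session.executeAsync(...) here instead.
    return {"pk": key, "value": "row-%d" % key}

keys = list(range(50))
with ThreadPoolExecutor(max_workers=50) as pool:
    # Dispatch all 50 reads in parallel, then wait on each result
    # individually -- client-side parallel dispatch.
    futures = [pool.submit(fetch, k) for k in keys]
    rows = [f.result() for f in futures]

print(len(rows))
```

Each future resolves as soon as its own node answers, so one slow replica
delays only its own row rather than the whole batch.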


On Fri, Jul 25, 2014 at 11:40 AM, Graham Sanderson  wrote:

> Of course the driver in question is allowed to be smarter and can do so if
> you use a ? parameter for a list or even individual elements
>
> I'm not sure which if any drivers currently do this but we plan to combine
> this with token aware routing in our scala driver in the future
>
> Sent from my iPhone
>
> On Jul 25, 2014, at 1:14 PM, DuyHai Doan  wrote:
>
> Nope. Select ... IN() sends one request to a coordinator. This coordinator
> dispatches the request to 50 nodes as in your example and waits for 50
> responses before sending back the final result. As you can guess this
> approach is not optimal since the global request latency is bound to the
> slowest latency among 50 nodes.
>
>  On the other hand, if you use the async feature from the native protocol, your
> client will issue 50 requests in parallel and the answers arrive as soon as
> they are fetched from different nodes.
>
>  Clearly the only advantage of using IN() clause is ease of query. I would
> advise to use IN() only when you have a "few" values, not 50.
>
>
> On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton  wrote:
>
>> Say I have about 50 primary keys I need to fetch.
>>
>> I'd like to use parallel dispatch.  So that if I have 50 hosts, and each
>> has record, I can read from all 50 at once.
>>
>> I assume cassandra does the right thing here ?  I believe it does… at
>> least from reading the docs but it's still a bit unclear.
>>
>> Kevin
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>>  … or check out my Google+ profile
>> 
>> 
>>
>>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Laing, Michael
We use IN (keeping the number down). The coordinator does parallel dispatch
AND applies ORDER BY to the aggregate results, which we would otherwise
have to do ourselves. Anyway, worth it for us.

ml


On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton  wrote:

> Perhaps the best strategy is to have the datastax java-driver do this and
> I just wait for each result individually.  This will give me parallel
> dispatch.
>
>
> On Fri, Jul 25, 2014 at 11:40 AM, Graham Sanderson 
> wrote:
>
>> Of course the driver in question is allowed to be smarter and can do so
>> if you use a ? parameter for a list or even individual elements
>>
>> I'm not sure which if any drivers currently do this but we plan to
>> combine this with token aware routing in our scala driver in the future
>>
>> Sent from my iPhone
>>
>> On Jul 25, 2014, at 1:14 PM, DuyHai Doan  wrote:
>>
>> Nope. Select ... IN() sends one request to a coordinator. This
>> coordinator dispatches the request to 50 nodes as in your example and waits
>> for 50 responses before sending back the final result. As you can guess
>> this approach is not optimal since the global request latency is bound to
>> the slowest latency among 50 nodes.
>>
>>  On the other hand, if you use the async feature from the native protocol, your
>> client will issue 50 requests in parallel and the answers arrive as soon as
>> they are fetched from different nodes.
>>
>>  Clearly the only advantage of using IN() clause is ease of query. I
>> would advise to use IN() only when you have a "few" values, not 50.
>>
>>
>> On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton  wrote:
>>
>>> Say I have about 50 primary keys I need to fetch.
>>>
>>> I'd like to use parallel dispatch.  So that if I have 50 hosts, and each
>>> has record, I can read from all 50 at once.
>>>
>>> I assume cassandra does the right thing here ?  I believe it does… at
>>> least from reading the docs but it's still a bit unclear.
>>>
>>> Kevin
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>>  … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Kevin Burton
Ah.. ok. Nice.  That should work.  Parallel dispatch on the client would
work too.. using async.


On Fri, Jul 25, 2014 at 1:37 PM, Laing, Michael 
wrote:

> We use IN (keeping the number down). The coordinator does parallel
> dispatch AND applies ORDER BY to the aggregate results, which we would
> otherwise have to do ourselves. Anyway, worth it for us.
>
> ml
>
>
> On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton  wrote:
>
>> Perhaps the best strategy is to have the datastax java-driver do this and
>> I just wait for each result individually.  This will give me parallel
>> dispatch.
>>
>>
>> On Fri, Jul 25, 2014 at 11:40 AM, Graham Sanderson 
>> wrote:
>>
>>> Of course the driver in question is allowed to be smarter and can do so
>>> if you use a ? parameter for a list or even individual elements
>>>
>>> I'm not sure which if any drivers currently do this but we plan to
>>> combine this with token aware routing in our scala driver in the future
>>>
>>> Sent from my iPhone
>>>
>>> On Jul 25, 2014, at 1:14 PM, DuyHai Doan  wrote:
>>>
>>> Nope. Select ... IN() sends one request to a coordinator. This
>>> coordinator dispatches the request to 50 nodes as in your example and waits
>>> for 50 responses before sending back the final result. As you can guess
>>> this approach is not optimal since the global request latency is bound to
>>> the slowest latency among 50 nodes.
>>>
>>>  On the other hand, if you use the async feature from the native protocol,
>>> your client will issue 50 requests in parallel and the answers arrive as
>>> soon as they are fetched from different nodes.
>>>
>>>  Clearly the only advantage of using IN() clause is ease of query. I
>>> would advise to use IN() only when you have a "few" values, not 50.
>>>
>>>
>>> On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton 
>>> wrote:
>>>
 Say I have about 50 primary keys I need to fetch.

 I'd like to use parallel dispatch.  So that if I have 50 hosts, and
 each has record, I can read from all 50 at once.

 I assume cassandra does the right thing here ?  I believe it does… at
 least from reading the docs but it's still a bit unclear.

 Kevin

 --

 Founder/CEO Spinn3r.com
 Location: *San Francisco, CA*
 blog: http://burtonator.wordpress.com
  … or check out my Google+ profile
 
 


>>>
>>
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




IN clause with composite primary key?

2014-07-25 Thread Kevin Burton
How the heck would you build an IN clause with a primary key which is
composite?

so say columns foo and bar are the primary key.

if you just had foo as your column name, you can do

where foo in ()

… but with two keys I don't see how it's possible.

specifying both actually builds a cartesian product.  which is kind of cool
but not what I want :)

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




Re: do all nodes actually send the data to the coordinator when doing a read?

2014-07-25 Thread Mark Reddy
Hi Brian,

A read request will be handled in the following manner:

Once the coordinator receives a read request it will first determine the
replicas responsible for the data. From there those replicas are sorted by
"proximity" to the coordinator. The closest node, as determined by proximity
sorting, will be sent a command to perform an actual data read, i.e. return
the data to the coordinator.

If you have a Replication Factor (RF) of 3 and are reading at CL.QUORUM,
one additional node will be sent a digest query. A digest query is like a
read query except that instead of the receiving node actually returning the
data, it only returns a digest (hash) of the would-be data. The reason for
this is to discover whether the two nodes contacted agree on what the
current data is, without sending the data over the network. Obviously for
large data sets this is an effective bandwidth saver.

Back on the coordinator node if the data and the digest match the data is
returned to the client. If the data and digest do not match, a full data
read is performed against the contacted replicas in order to guarantee that
the most recent data is returned.

Asynchronously in the background, the third replica is checked for
consistency with the first two, and if needed, a read repair is initiated
for that node.


Mark



On Fri, Jul 25, 2014 at 9:12 PM, Brian Tarbox  wrote:

> We're considering a C* setup with very large columns and I have a question
> about the details of read.
>
> I understand that a read request gets handled by the coordinator which
> sends read requests to RF of the nodes holding replicas of the data,
> and once CL nodes have replied with consistent data it is returned to
> the client.
>
> My understanding is that each of the nodes actually sends the full data
> being requested to the coordinator (which in the case of very large columns
> would involve lots of network traffic).  Is that right?
>
> The alternative (which I don't think is the case but I've been asked to
> verify) is that the replicas first send meta-data to the coordinator which
> then asks one replica to send the actual data.  Again, I don't think this
> is the case but was asked to confirm.
>
> Thanks.
>
> --
> http://about.me/BrianTarbox
>


Re: IN clause with composite primary key?

2014-07-25 Thread DuyHai Doan
Below are the rules for IN clause

a. composite partition keys: the IN clause only applies to the last
composite component
b. clustering keys: the IN clause only applies to the last clustering key

Contrived example:

CREATE TABLE test(
   pk1 int,
   pk2 int,
   clust1 int,
   clust2 int,
   clust3 int,
   PRIMARY KEY ((pk1,pk2), clust1, clust2, clust3));

Possible queries

SELECT * FROM test WHERE pk1=1 AND pk2 IN (1,2,3);
SELECT * FROM test WHERE pk1=1 AND pk2 IN (1,2,3) AND clust1=1 AND clust2=2
AND clust3 IN (3,4,5);

Theoretically it should be possible to do SELECT * FROM test WHERE pk1
IN (1,2) AND pk2=3; or SELECT * FROM test WHERE pk1 IN (1,2) AND pk2 IN
(3,4), because the values in the IN() clause are just expanded to all linear
combinations with the other composites of the partition key. But for some
reason it's not allowed.

However the restriction of the IN clause for the clustering keys somehow
makes sense. With multiple clustering keys, if you allowed an IN clause on
the first (or any clustering key that is not the last one), C* would have to
do a very large slice to pick out the discrete values matching the IN()
clause ...
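
If one of the disallowed forms is needed, a workaround is to expand the
combinations on the client and issue one single-partition query per pair --
a sketch against the contrived test table above (query strings only; a real
client would bind parameters and dispatch them asynchronously):

```python
from itertools import product

# Client-side expansion of the disallowed "pk1 IN (1,2) AND pk2 IN (3,4)":
# enumerate the linear combinations and build one single-partition query
# per (pk1, pk2) pair.
pk1_values, pk2_values = [1, 2], [3, 4]
queries = [
    "SELECT * FROM test WHERE pk1=%d AND pk2=%d" % (p1, p2)
    for p1, p2 in product(pk1_values, pk2_values)
]
print(len(queries))
```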







On Fri, Jul 25, 2014 at 11:17 PM, Kevin Burton  wrote:

> How the heck would you build an IN clause with a primary key which is
> composite?
>
> so say columns foo and bar are the primary key.
>
> if you just had foo as your column name, you can do
>
> where foo in ()
>
> … but with two keys I don't see how it's possible.
>
> specifying both actually builds a cartesian product.  which is kind of
> cool but not what I want :)
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


Re: Does SELECT … IN () use parallel dispatch?

2014-07-25 Thread Laing, Michael
Except then you have to merge results if you want them ordered.
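
If each per-key result set comes back already ordered on the same clustering
column, that client-side merge is a standard k-way heap merge -- a sketch
with toy rows:

```python
import heapq

# Three per-query result sets, each already sorted by clustering column
# (as each single-partition query returns them), merged into one globally
# ordered stream without re-sorting everything.
results = [
    [(1, "a"), (4, "d")],
    [(2, "b"), (5, "e")],
    [(3, "c")],
]
merged = list(heapq.merge(*results))
print(merged)
```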


On Fri, Jul 25, 2014 at 2:15 PM, Kevin Burton  wrote:

> Ah.. ok. Nice.  That should work.  Parallel dispatch on the client would
> work too.. using async.
>
>
> On Fri, Jul 25, 2014 at 1:37 PM, Laing, Michael  > wrote:
>
>> We use IN (keeping the number down). The coordinator does parallel
>> dispatch AND applies ORDERED BY to the aggregate results, which we would
>> otherwise have to do ourselves. Anyway, worth it for us.
>>
>> ml
>>
>>
>> On Fri, Jul 25, 2014 at 1:24 PM, Kevin Burton  wrote:
>>
>>> Perhaps the best strategy is to have the datastax java-driver do this
>>> and I just wait or each result individually.  This will give me parallel
>>> dispatch.
>>>
>>>
>>> On Fri, Jul 25, 2014 at 11:40 AM, Graham Sanderson 
>>> wrote:
>>>
 Of course the driver in question is allowed to be smarter and can do so
 if you use a ? parameter for a list or even individual elements

 I'm not sure which if any drivers currently do this but we plan to
 combine this with token aware routing in our scala driver in the future

 Sent from my iPhone

 On Jul 25, 2014, at 1:14 PM, DuyHai Doan  wrote:

 Nope. Select ... IN() sends one request to a coordinator. This
 coordinator dispatches the request to 50 nodes as in your example and waits
 for 50 responses before sending back the final result. As you can guess
 this approach is not optimal since the global request latency is bound to
 the slowest latency among 50 nodes.

  On the other hand, if you use the async feature from the native protocol,
 your client will issue 50 requests in parallel and the answers arrive as
 soon as they are fetched from different nodes.

  Clearly the only advantage of using IN() clause is ease of query. I
 would advise to use IN() only when you have a "few" values, not 50.


 On Fri, Jul 25, 2014 at 8:08 PM, Kevin Burton 
 wrote:

> Say I have about 50 primary keys I need to fetch.
>
> I'd like to use parallel dispatch.  So that if I have 50 hosts, and
> each has record, I can read from all 50 at once.
>
> I assume cassandra does the right thing here ?  I believe it does… at
> least from reading the docs but it's still a bit unclear.
>
> Kevin
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
>  … or check out my Google+ profile
> 
> 
>
>

>>>
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>>
>
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> 
>
>


Re: do all nodes actually send the data to the coordinator when doing a read?

2014-07-25 Thread DuyHai Doan
Thanks Mark for the very detailed explanation.

 However, what about timestamp checking? You're saying that the
coordinator checks the digest of the data (cell value) from both nodes, but
if the cells have different timestamps, would it still request a full
data read from the node having the most recent time?


On Fri, Jul 25, 2014 at 11:25 PM, Mark Reddy  wrote:

> Hi Brian,
>
> A read request will be handled in the following manner:
>
> Once the coordinator receives a read request it will firstly determine the
> replicas responsible for the data. From there those replicas are sorted by
> "proximity" to the coordinator. The closest node as determined by proximity
> sorting will be sent a command to perform an actual data read i.e. return
> the data to the coordinator
>
> If you have a Replication Factor (RF) of 3 and are reading at CL.QUORUM,
> one additional node will be sent a digest query. A digest query is like a
> read query except that instead of the receiving node actually returning the
> data, it only returns a digest (hash) of the would-be data. The reason for
> this is to discover whether the two nodes contacted agree on what the
> current data is, without sending the data over the network. Obviously for
> large data sets this is an effective bandwidth saver.
>
> Back on the coordinator node if the data and the digest match the data is
> returned to the client. If the data and digest do not match, a full data
> read is performed against the contacted replicas in order to guarantee that
> the most recent data is returned.
>
> Asynchronously in the background, the third replica is checked for
> consistency with the first two, and if needed, a read repair is initiated
> for that node.
>
>
> Mark
>
>
>
> On Fri, Jul 25, 2014 at 9:12 PM, Brian Tarbox 
> wrote:
>
>> We're considering a C* setup with very large columns and I have a
>> question about the details of read.
>>
>> I understand that a read request gets handled by the coordinator which
>> sends read requests to RF of the nodes holding replicas of the data,
>> and once CL nodes have replied with consistent data it is returned to
>> the client.
>>
>> My understanding is that each of the nodes actually sends the full data
>> being requested to the coordinator (which in the case of very large columns
>> would involve lots of network traffic).  Is that right?
>>
>> The alternative (which I don't think is the case but I've been asked to
>> verify) is that the replicas first send meta-data to the coordinator which
>> then asks one replica to send the actual data.  Again, I don't think this
>> is the case but was asked to confirm.
>>
>> Thanks.
>>
>> --
>> http://about.me/BrianTarbox
>>
>
>


Re: do all nodes actually send the data to the coordinator when doing a read?

2014-07-25 Thread Jaydeep Chovatia
Yes. The digest includes the following: {name, value, timestamp, flags (deleted,
expired, etc.)}


On Fri, Jul 25, 2014 at 2:33 PM, DuyHai Doan  wrote:

> Thanks Mark for the very detailed explanation.
>
>  However what's about timestamp checking ? You're saying that the
> coordinator checks for the digest of data (cell value) from both nodes but
> if the cell name have different timestamp would it still request a full
> data read to the node having the most recent time ?
>
>
> On Fri, Jul 25, 2014 at 11:25 PM, Mark Reddy 
> wrote:
>
>> Hi Brian,
>>
>> A read request will be handled in the following manner:
>>
>> Once the coordinator receives a read request it will firstly determine
>> the replicas responsible for the data. From there those replicas are sorted
>> by "proximity" to the coordinator. The closest node as determined by
>> proximity sorting will be sent a command to perform an actual data read
>> i.e. return the data to the coordinator
>>
>> If you have a Replication Factor (RF) of 3 and are reading at CL.QUORUM,
>> one additional node will be sent a digest query. A digest query is like a
>> read query except that instead of the receiving node actually returning the
>> data, it only returns a digest (hash) of the would-be data. The reason for
>> this is to discover whether the two nodes contacted agree on what the
>> current data is, without sending the data over the network. Obviously for
>> large data sets this is an effective bandwidth saver.
>>
>> Back on the coordinator node if the data and the digest match the data is
>> returned to the client. If the data and digest do not match, a full data
>> read is performed against the contacted replicas in order to guarantee that
>> the most recent data is returned.
>>
>> Asynchronously in the background, the third replica is checked for
>> consistency with the first two, and if needed, a read repair is initiated
>> for that node.
>>
>>
>> Mark
>>
>>
>>
>> On Fri, Jul 25, 2014 at 9:12 PM, Brian Tarbox 
>> wrote:
>>
>>> We're considering a C* setup with very large columns and I have a
>>> question about the details of read.
>>>
>>> I understand that a read request gets handled by the coordinator which
>>> sends read requests to RF of the nodes holding replicas of the data,
>>> and once CL nodes have replied with consistent data it is returned to
>>> the client.
>>>
>>> My understanding is that each of the nodes actually sends the full data
>>> being requested to the coordinator (which in the case of very large columns
>>> would involve lots of network traffic).  Is that right?
>>>
>>> The alternative (which I don't think is the case but I've been asked to
>>> verify) is that the replicas first send meta-data to the coordinator which
>>> then asks one replica to send the actual data.  Again, I don't think this
>>> is the case but was asked to confirm.
>>>
>>> Thanks.
>>>
>>> --
>>> http://about.me/BrianTarbox
>>>
>>
>>
>


Re: IN clause with composite primary key?

2014-07-25 Thread Laing, Michael
You may also want to use tuples for the clustering columns:

The tuple notation may also be used for IN clauses on CLUSTERING COLUMNS:
>
> SELECT * FROM posts WHERE userid='john doe' AND (blog_title, posted_at) IN 
> (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01'))
>
>
> from https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt


On Fri, Jul 25, 2014 at 2:29 PM, DuyHai Doan  wrote:

> Below are the rules for IN clause
>
> a. composite partition keys: the IN clause only applies to the last
> composite component
> b. clustering keys: the IN clause only applies to the last clustering key
>
> Contrived example:
>
> CREATE TABLE test(
>pk1 int,
>pk2 int,
>clust1 int,
>clust2 int,
>clust3 int,
>PRIMARY KEY ((pk1,pk2), clust1, clust2, clust3));
>
> Possible queries
>
> SELECT * FROM test WHERE pk1=1 AND pk2 IN (1,2,3);
> SELECT * FROM test WHERE pk1=1 AND pk2 IN (1,2,3) AND clust1=1 AND clust2=2
> AND clust3 IN (3,4,5);
>
> Theoretically it should be possible to do SELECT * FROM test WHERE
> pk1 IN (1,2) AND pk2=3; or SELECT * FROM test WHERE pk1 IN (1,2) AND pk2
> IN (3,4), because the values in the IN() clause are just expanded to all
> linear combinations with the other composites of the partition key. But for
> some reason it's not allowed.
>
> However the restriction of the IN clause for the clustering keys somehow
> makes sense. Having multiple clustering keys, if you allow using an IN clause
> for the first or any clustering key that is not the last one, C* would have
> to do a very large slice to pick some discrete values matching the IN()
> clause ...
>
>
>
>
>
>
>
> On Fri, Jul 25, 2014 at 11:17 PM, Kevin Burton  wrote:
>
>> How the heck would you build an IN clause with a primary key which is
>> composite?
>>
>> so say columns foo and bar are the primary key.
>>
>> if you just had foo as your column name, you can do
>>
>> where foo in ()
>>
>> … but with two keys I don't see how it's possible.
>>
>> specifying both actually builds a cartesian product.  which is kind of
>> cool but not what I want :)
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> 
>> 
>>
>>
>


Changing IPs of all nodes in a ring

2014-07-25 Thread Rahul Neelakantan
All,
I need to change the IPs of all nodes in my ring in a flash cut, at the same 
time. Any recommendations on how to do this?

Rahul Neelakantan

Re: Changing IPs of all nodes in a ring

2014-07-25 Thread Robert Coli
On Fri, Jul 25, 2014 at 3:54 PM, Rahul Neelakantan  wrote:

> I need to change the IPs of all nodes in my ring in a flash cut, at the
> same time. Any recommendations on how to do this?
>

What are your uptime requirements as you do this?

Because no, there's no way to change the ip address on all Cassandra nodes
in a cluster simultaneously and have it stay available.

https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/

Is how you do it with a rolling restart.

=Rob


Re: Changing IPs of all nodes in a ring

2014-07-25 Thread Rahul Neelakantan
I am OK with taking up to 2 hours of planned downtime. The problem is all the
IPs will change at the same time and the previous IPs will no longer be
available. So it's either all old IPs or all new IPs.

Rahul Neelakantan
678-451-4545

> On Jul 25, 2014, at 7:23 PM, Robert Coli  wrote:
> 
>> On Fri, Jul 25, 2014 at 3:54 PM, Rahul Neelakantan  wrote:
>> I need to change the IPs of all nodes in my ring in a flash cut, at the same 
>> time. Any recommendations on how to do this?
> 
> What are your uptime requirements as you do this?
> 
> Because no, there's no way to change the ip address on all Cassandra nodes in 
> a cluster simultaneously and have it stay available.
> 
> https://engineering.eventbrite.com/changing-the-ip-address-of-a-cassandra-node-with-auto_bootstrapfalse/
> 
> Is how you do it with a rolling restart.
> 
> =Rob
> 


Re: do all nodes actually send the data to the coordinator when doing a read?

2014-07-25 Thread Mark Reddy
>
> However what's about timestamp checking ? You're saying that the
> coordinator checks for the digest of data (cell value) from both nodes but
> if the cell name have different timestamp would it still request a full
> data read to the node having the most recent time ?


When generating the hash to be returned to the coordinator, the cell values
used are name, value, timestamp, serialisationFlag and, depending on the cell
type, possibly other values. From there the hashes are compared and, if there
is a mismatch, the data is requested from the replicas. At this stage the
RowDataResolver will compute the most recent version of each column and send
diffs to out-of-date replicas.
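
The resolution step amounts to last-write-wins selection per cell -- a toy
model of what the resolver does, not its actual implementation:

```python
# Two replica versions of the same cell after a digest mismatch: the highest
# timestamp wins, and replicas holding older versions get a repair diff.
versions = [
    {"replica": "A", "value": "old", "timestamp": 100},
    {"replica": "B", "value": "new", "timestamp": 200},
]
winner = max(versions, key=lambda v: v["timestamp"])
stale = [v["replica"] for v in versions if v["timestamp"] < winner["timestamp"]]
print(winner["value"], stale)
```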




On Fri, Jul 25, 2014 at 11:32 PM, Jaydeep Chovatia <
chovatia.jayd...@gmail.com> wrote:

> Yes. Digest includes following: {name, value, timestamp, flags(deleted,
> expired, etc.)}
>
>
> On Fri, Jul 25, 2014 at 2:33 PM, DuyHai Doan  wrote:
>
>> Thanks Mark for the very detailed explanation.
>>
>>  However what's about timestamp checking ? You're saying that the
>> coordinator checks for the digest of data (cell value) from both nodes but
>> if the cell name have different timestamp would it still request a full
>> data read to the node having the most recent time ?
>>
>>
>> On Fri, Jul 25, 2014 at 11:25 PM, Mark Reddy 
>> wrote:
>>
>>> Hi Brian,
>>>
>>> A read request will be handled in the following manner:
>>>
>>> Once the coordinator receives a read request it will firstly determine
>>> the replicas responsible for the data. From there those replicas are sorted
>>> by "proximity" to the coordinator. The closest node as determined by
>>> proximity sorting will be sent a command to perform an actual data read
>>> i.e. return the data to the coordinator
>>>
>>> If you have a Replication Factor (RF) of 3 and are reading at CL.QUORUM,
>>> one additional node will be sent a digest query. A digest query is like a
>>> read query except that instead of the receiving node actually returning the
>>> data, it only returns a digest (hash) of the would-be data. The reason for
>>> this is to discover whether the two nodes contacted agree on what the
>>> current data is, without sending the data over the network. Obviously for
>>> large data sets this is an effective bandwidth saver.
>>>
>>> Back on the coordinator node if the data and the digest match the data
>>> is returned to the client. If the data and digest do not match, a full data
>>> read is performed against the contacted replicas in order to guarantee that
>>> the most recent data is returned.
>>>
>>> Asynchronously in the background, the third replica is checked for
>>> consistency with the first two, and if needed, a read repair is initiated
>>> for that node.
>>>
>>>
>>> Mark
>>>
>>>
>>>
>>> On Fri, Jul 25, 2014 at 9:12 PM, Brian Tarbox 
>>> wrote:
>>>
 We're considering a C* setup with very large columns and I have a
 question about the details of read.

 I understand that a read request gets handled by the coordinator which
 sends read requests to RF of the nodes holding replicas of the data,
 and once CL nodes have replied with consistent data it is returned to
 the client.

 My understanding is that each of the nodes actually sends the full data
 being requested to the coordinator (which in the case of very large columns
 would involve lots of network traffic).  Is that right?

 The alternative (which I don't think is the case but I've been asked to
 verify) is that the replicas first send meta-data to the coordinator which
 then asks one replica to send the actual data.  Again, I don't think this
 is the case but was asked to confirm.

 Thanks.

 --
 http://about.me/BrianTarbox

>>>
>>>
>>
>


Re: Changing IPs of all nodes in a ring

2014-07-25 Thread Robert Coli
On Fri, Jul 25, 2014 at 4:42 PM, Rahul Neelakantan  wrote:

> I am ok with taking upto 2 hours of planned downtime. The problem is all
> the IPs will change at the same time and the previous IPs will no longer be
> available. So it's either all old IPs or all new IPs.
>

Are the new IPs available before switchover time? If so, you can switch to
them with a rolling restart. If not :

1) down entire cluster
2) change ips in cassandra.yaml, including in seed list
3) use auto_bootstrap:false technique, as in blog post, to bring them all
back
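
A sketch of what steps 2 and 3 touch in each node's cassandra.yaml before
bringing the cluster back up (addresses are placeholders for each node's new
IP; auto_bootstrap defaults to true when absent, so it is added explicitly):

```yaml
# cassandra.yaml edits, per node, before restart
listen_address: 10.0.1.11        # this node's NEW ip (placeholder)
rpc_address: 10.0.1.11
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.1.11,10.0.1.12"   # new ips of the seed nodes
auto_bootstrap: false            # nodes already hold their data; skip streaming
```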

=Rob


Re: Changing IPs of all nodes in a ring

2014-07-25 Thread Rahul Neelakantan
The new IPs are not available before switch time, so I will try the all-down
method you mentioned.

Do I need to move the tokens of the new IPs to the old token -1, and then
removeToken the old tokens? I ask because the old IPs will continue to
show in gossip info with a status of Normal.



Rahul Neelakantan
678-451-4545

> On Jul 25, 2014, at 9:06 PM, Robert Coli  wrote:
> 
>> On Fri, Jul 25, 2014 at 4:42 PM, Rahul Neelakantan  wrote:
>> I am ok with taking upto 2 hours of planned downtime. The problem is all the 
>> IPs will change at the same time and the previous IPs will no longer be 
>> available. So it's either all old IPs or all new IPs. 
> 
> Are the new IPs available before switchover time? If so, you can switch to 
> them with a rolling restart. If not :
> 
> 1) down entire cluster
> 2) change ips in cassandra.yaml, including in seed list
> 3) use auto_bootstrap:false technique, as in blog post, to bring them all back
>  
> =Rob
> 


Re: IN clause with composite primary key?

2014-07-25 Thread Kevin Burton
ah.. this only works in clustering columns… hm.. won't work in our
situation though :-(


On Fri, Jul 25, 2014 at 3:45 PM, Laing, Michael 
wrote:

> You may also want to use tuples for the clustering columns:
>
> The tuple notation may also be used for IN clauses on CLUSTERING COLUMNS:
>>
>> SELECT * FROM posts WHERE userid='john doe' AND (blog_title, posted_at) IN 
>> (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01'))
>>
>>
>> from https://cassandra.apache.org/doc/cql3/CQL.html#selectStmt
>
>
> On Fri, Jul 25, 2014 at 2:29 PM, DuyHai Doan  wrote:
>
>> Below are the rules for IN clause
>>
>> a. composite partition keys: the IN clause only applies to the last
>> composite component
>> b. clustering keys: the IN clause only applies to the last clustering key
>>
>> Contrived example:
>>
>> CREATE TABLE test(
>>pk1 int,
>>pk2 int,
>>clust1 int,
>>clust2 int,
>>clust3 int,
>>PRIMARY KEY ((pk1,pk2), clust1, clust2, clust3));
>>
>> Possible queries
>>
>> SELECT * FROM test WHERE pk1=1 AND pk2 IN (1,2,3);
>> SELECT * FROM test WHERE pk1=1 AND pk2 IN (1,2,3) AND clust1=1 AND clust2=2
>> AND clust3 IN (3,4,5);
>>
>> Theoretically it should be possible to do SELECT * FROM test WHERE
>> pk1 IN (1,2) AND pk2=3; or SELECT * FROM test WHERE pk1 IN (1,2) AND pk2
>> IN (3,4), because the values in the IN() clause are just expanded to all
>> linear combinations with the other composites of the partition key. But for
>> some reason it's not allowed.
>>
>> However the restriction of the IN clause for the clustering keys somehow
>> makes sense. Having multiple clustering keys, if you allow using an IN clause
>> for the first or any clustering key that is not the last one, C* would have
>> to do a very large slice to pick some discrete values matching the IN()
>> clause ...
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Jul 25, 2014 at 11:17 PM, Kevin Burton 
>> wrote:
>>
>>> How the heck would you build an IN clause with a primary key which is
>>> composite?
>>>
>>> so say columns foo and bar are the primary key.
>>>
>>> if you just had foo as your column name, you can do
>>>
>>> where foo in ()
>>>
>>> … but with two keys I don't see how it's possible.
>>>
>>> specifying both actually builds a cartesian product.  which is kind of
>>> cool but not what I want :)
>>>
>>> --
>>>
>>> Founder/CEO Spinn3r.com
>>> Location: *San Francisco, CA*
>>> blog: http://burtonator.wordpress.com
>>> … or check out my Google+ profile
>>> 
>>> 
>>>
>>>
>>
>


-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile




any plans for coprocessors?

2014-07-25 Thread Kevin Burton
Are there any plans to add coprocessors to cassandra?

Embedding logic directly in a cassandra daemon would be nice.

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile