Re: Modeling nested collection with C* 2.0

2016-01-28 Thread Ryan Svihla
Ahmed,

Just using text and serializing as JSON is the easy way and a common approach.
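
For illustration, a minimal sketch of that approach in CQL (the table and
column names here are hypothetical, not from the original question); the
application serializes and parses the JSON itself:

CREATE TABLE users (
    userid text PRIMARY KEY,
    emails_json text,    -- e.g. '{"pro": ["a@...", "b@..."], "private": [...]}'
    addresses_json text  -- e.g. '{"pro": {"street": "aaa", "number": 123}, ...}'
);

-- The whole blob is read and written as a single value:
SELECT emails_json, addresses_json FROM users WHERE userid = '123';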

However, this list is for Cassandra committer discussion; please be so kind
as to use the regular user list for data modeling questions and for any
future responses to this email thread.


Regards,
Ryan Svihla

> On Jan 28, 2016, at 7:28 AM, Ahmed Eljami  wrote:
> 
> ​Hi,
> 
> I need your help modeling a nested collection with Cassandra 2.0 (no UDTs,
> no frozen).
> 
> My users table contains emails by type, each type of email contains multiple
> emails.
> 
> Example:
> Type: pro. emails: {a...@mail.com, b...@mail.com ...}
> 
> Type: private. emails: {c...@mail.com, d...@mail.com}
> .
> 
> The user table also contains addresses, address type with fields.
> 
> Example:
> 
> Type: Pro. address {Street= aaa, number = 123, apartment = bbb}
> 
> Type: Private. address {Street = bbb, number = 123, apartment = kkk }
> 
> I am looking for a solution to store all these columns in one table.
> 
> Thank you.


Re: Modeling nested collection with C* 2.0

2016-01-28 Thread Carlos Alonso
Hi Ahmed,

I think modelling them as a map where you can 'label' your emails or
addresses sounds like a good option.

More info here:
https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_map_t.html
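
As a rough sketch of that shape (hypothetical names; Cassandra 2.0 supports
map columns of native types, but collections cannot be nested, so each map
value below is a single text blob encoded by the application):

CREATE TABLE users (
    userid text PRIMARY KEY,
    emails map<text, text>,    -- label -> encoded list of email addresses
    addresses map<text, text>  -- label -> encoded address fields (e.g. JSON)
);

UPDATE users SET emails['pro'] = 'a@example.com,b@example.com' WHERE userid = '123';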

Regards

Carlos Alonso | Software Engineer | @calonso 

On 28 January 2016 at 13:36, Ryan Svihla  wrote:

> Ahmed,
>
> Just using text and serializing as Json is the easy way and a common
> approach.
>
> However, this list is for Cassandra commiter discussion, please be so kind
> as to use the regular user list for data modeling questions or for any
> future responses to this email thread.
>
>
> Regards,
> Ryan Svihla
>
> On Jan 28, 2016, at 7:28 AM, Ahmed Eljami  wrote:
>
> ​Hi,
>
> I need your help for modeling a nested collection with cassanrda2.0 (UDT
> no,
> no fozen)
>
> My users table contains emails by type, each type of email contains
> multiple
> emails.
>
> Example:
> Type: pro. emails: {a...@mail.com, b...@mail.com ...}
>
> Type: private. emails: {c...@mail.com, d...@mail.com}
> .
>
> The user table also contains addresses, address type with fields.
>
> Example:
>
> Type: Pro. address {Street= aaa, number = 123, apartment = bbb}
>
> Type: Private. address {Street = bbb, number = 123, apartment = kkk }
>
> I am looking for a solution to store all these columns in one table.
>
> Thank you.
>
>


Re: Modeling nested collection with C* 2.0

2016-01-28 Thread Jack Krupansky
Generally, you should use clustering columns to model nested structures,
unless they really are simply list/map structures.

But, first, as with all data modeling in Cassandra, start by looking at how
you intend to query the data. Do you need to query individual addresses,
email addresses, streets, etc.? If so, separate rows for each address are a
better way to go. On the flip side, if the full address is just a blob that
you will interpret in the app, a frozen UDT or JSON blob is a reasonable
way to go. In any case, tell us about your query needs first.
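
To make the clustering-column option concrete, one possible shape (table and
column names are illustrative only, not a prescription):

CREATE TABLE user_emails (
    userid text,
    email_type text,   -- 'pro', 'private', ...
    email text,
    PRIMARY KEY (userid, email_type, email)
);

CREATE TABLE user_addresses (
    userid text,
    address_type text,
    street text,
    number int,
    apartment text,
    PRIMARY KEY (userid, address_type)
);

-- individual entries become addressable:
SELECT * FROM user_addresses WHERE userid = '123' AND address_type = 'pro';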

-- Jack Krupansky

On Thu, Jan 28, 2016 at 10:29 AM, Carlos Alonso  wrote:

> Hi Ahmed,
>
> I think modelling them as a map where you can 'label' your emails or
> addresses sounds like a good option.
>
> More info here:
> https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_map_t.html
>
> Regards
>
> Carlos Alonso | Software Engineer | @calonso 
>
> On 28 January 2016 at 13:36, Ryan Svihla  wrote:
>
>> Ahmed,
>>
>> Just using text and serializing as Json is the easy way and a common
>> approach.
>>
>> However, this list is for Cassandra commiter discussion, please be so
>> kind as to use the regular user list for data modeling questions or for any
>> future responses to this email thread.
>>
>>
>> Regards,
>> Ryan Svihla
>>
>> On Jan 28, 2016, at 7:28 AM, Ahmed Eljami  wrote:
>>
>> ​Hi,
>>
>> I need your help for modeling a nested collection with cassanrda2.0 (UDT
>> no,
>> no fozen)
>>
>> My users table contains emails by type, each type of email contains
>> multiple
>> emails.
>>
>> Example:
>> Type: pro. emails: {a...@mail.com, b...@mail.com ...}
>>
>> Type: private. emails: {c...@mail.com, d...@mail.com}
>> .
>>
>> The user table also contains addresses, address type with fields.
>>
>> Example:
>>
>> Type: Pro. address {Street= aaa, number = 123, apartment = bbb}
>>
>> Type: Private. address {Street = bbb, number = 123, apartment = kkk }
>>
>> I am looking for a solution to store all these columns in one table.
>>
>> Thank you.
>>
>>
>


RE: Modeling nested collection with C* 2.0

2016-01-28 Thread aeljami.ext
I need to query all columns by the userid.

For example: SELECT * FROM users WHERE userid = 123;

Frozen UDTs don't exist in Cassandra 2.0 ☹
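
Given those constraints, one single-table shape that works on 2.0 and answers
a plain SELECT by userid (column names are only an illustration) is a row per
contact type, with the emails in a set and the address fields as regular
columns:

CREATE TABLE users (
    userid int,
    contact_type text,   -- 'pro', 'private', ...
    emails set<text>,
    street text,
    number int,
    apartment text,
    PRIMARY KEY (userid, contact_type)
);

SELECT * FROM users WHERE userid = 123;   -- returns every type for the user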

From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
Sent: Thursday, 28 January 2016 16:38
To: user@cassandra.apache.org
Subject: Re: Modeling nested collection with C* 2.0

Generally, you should use clustering columns to model nested structures, unless 
they really are simply list/map structures.

But, first, as with all data modeling in Cassandra, start by looking at how you 
intend to query the data. Do you need to query individual addresses, email 
addresses, streets, etc.? If so, separate rows for each address is a better way 
to go. On the flip side, if the full address is just a blob that you will 
interpret in the app, a frozen UDT or JSON blob is a reasonable way to go. In 
any case, tell us about your query needs first.

-- Jack Krupansky

On Thu, Jan 28, 2016 at 10:29 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
Hi Ahmed,

I think modelling them as a map where you can 'label' your emails or addresses 
sounds like a good option.

More info here: 
https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_map_t.html

Regards

Carlos Alonso | Software Engineer | @calonso

On 28 January 2016 at 13:36, Ryan Svihla <r...@foundev.pro> wrote:
Ahmed,

Just using text and serializing as Json is the easy way and a common approach.

However, this list is for Cassandra commiter discussion, please be so kind as 
to use the regular user list for data modeling questions or for any future 
responses to this email thread.


Regards,
Ryan Svihla

On Jan 28, 2016, at 7:28 AM, Ahmed Eljami <ahmed.elj...@gmail.com> wrote:
​Hi,

I need your help for modeling a nested collection with cassanrda2.0 (UDT no,
no fozen)

My users table contains emails by type, each type of email contains multiple
emails.

Example:
Type: pro. emails: {a...@mail.com, 
b...@mail.com ...}

Type: private. emails: {c...@mail.com, 
d...@mail.com}
.

The user table also contains addresses, address type with fields.

Example:

Type: Pro. address {Street= aaa, number = 123, apartment = bbb}

Type: Private. address {Street = bbb, number = 123, apartment = kkk }

I am looking for a solution to store all these columns in one table.

Thank you.






Read operations freeze for a few second while adding a new node

2016-01-28 Thread Lorand Kasler
Hi,

We are struggling with a problem: when adding nodes, around 5% of read
operations freeze (i.e. time out after 1 second) for a few seconds (10-20
seconds). It might not seem like much, but at the order of 200k requests per
second that's quite a big disruption. It is well documented and known that
adding nodes *has* an impact on the latency or the completion of requests,
but is there a way to lessen that?
It is completely okay for write operations to fail or get blocked while
adding nodes, but having the read path also impacted this much (going from a
30 millisecond 99th-percentile latency to above 1 second) is what puzzles us.

We have a 36 node cluster, every node owning ~120 GB of data. We are using
Cassandra version 2.0.14 with vnodes, and we are in the process of increasing
the capacity of the cluster by roughly doubling the number of nodes. They
have SSDs and have peak IO usage of ~30%.

Apart from the latency metrics, only FlushWrites are blocked 18% of the time
(based on the tpstats counters), but shouldn't that only block writes and
not reads?

Thank you


Re: Modeling nested collection with C* 2.0

2016-01-28 Thread Lorand Kasler
Maps and sets have a hard limit of 65,536 elements, and you always need to
fetch the full collection even if you are only interested in a few elements.
They are well suited to denormalizing small datasets, but beyond that it is
better to use clustering columns to model this kind of data.
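
A small contrast of the two access patterns (illustrative schema; the point
generalizes):

CREATE TABLE labels_as_map (
    userid text PRIMARY KEY,
    emails map<text, text>
);

CREATE TABLE labels_as_rows (
    userid text,
    label text,
    email text,
    PRIMARY KEY (userid, label)
);

-- the map column always comes back whole:
SELECT emails FROM labels_as_map WHERE userid = '123';

-- clustering columns let you read just the entry you need:
SELECT email FROM labels_as_rows WHERE userid = '123' AND label = 'pro';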

Best,
Lorand

On Thu, Jan 28, 2016 at 4:50 PM,  wrote:

> I need to query all columns by the userid.
>
>
>
> For example: Select * from users where userid  = 123;
>
>
>
> frozen UDT don't exist in Cassandra 2.0 ☹
>
>
>
> *From:* Jack Krupansky [mailto:jack.krupan...@gmail.com]
> *Sent:* Thursday, 28 January 2016 16:38
> *To:* user@cassandra.apache.org
> *Subject:* Re: Modeling nested collection with C* 2.0
>
>
>
> Generally, you should use clustering columns to model nested structures,
> unless they really are simply list/map structures.
>
>
>
> But, first, as with all data modeling in Cassandra, start by looking at
> how you intend to query the data. Do you need to query individual
> addresses, email addresses, streets, etc.? If so, separate rows for each
> address is a better way to go. On the flip side, if the full address is
> just a blob that you will interpret in the app, a frozen UDT or JSON blob
> is a reasonable way to go. In any case, tell us about your query needs
> first.
>
>
> -- Jack Krupansky
>
>
>
> On Thu, Jan 28, 2016 at 10:29 AM, Carlos Alonso 
> wrote:
>
> Hi Ahmed,
>
>
>
> I think modelling them as a map where you can 'label' your emails or
> addresses sounds like a good option.
>
>
>
> More info here:
> https://docs.datastax.com/en/cql/3.1/cql/cql_using/use_map_t.html
>
>
>
> Regards
>
>
> Carlos Alonso | Software Engineer | @calonso 
>
>
>
> On 28 January 2016 at 13:36, Ryan Svihla  wrote:
>
> Ahmed,
>
>
>
> Just using text and serializing as Json is the easy way and a common
> approach.
>
>
>
> However, this list is for Cassandra commiter discussion, please be so kind
> as to use the regular user list for data modeling questions or for any
> future responses to this email thread.
>
>
>
> Regards,
>
> Ryan Svihla
>
>
> On Jan 28, 2016, at 7:28 AM, Ahmed Eljami  wrote:
>
> ​Hi,
>
> I need your help for modeling a nested collection with cassanrda2.0 (UDT
> no,
> no fozen)
>
> My users table contains emails by type, each type of email contains
> multiple
> emails.
>
> Example:
> Type: pro. emails: {a...@mail.com, b...@mail.com ...}
>
> Type: private. emails: {c...@mail.com, d...@mail.com}
> .
>
> The user table also contains addresses, address type with fields.
>
> Example:
>
> Type: Pro. address {Street= aaa, number = 123, apartment = bbb}
>
> Type: Private. address {Street = bbb, number = 123, apartment = kkk }
>
> I am looking for a solution to store all these columns in one table.
>
> Thank you.
>
>
>
>
>
>
>


Re: Read operations freeze for a few second while adding a new node

2016-01-28 Thread Jonathan Haddad
If you've got a read heavy workload you should check out
http://blakeeggleston.com/cassandra-tuning-the-jvm-for-read-heavy-workloads.html



On Thu, Jan 28, 2016 at 8:11 AM Lorand Kasler 
wrote:

> Hi,
>
> We are struggling with a problem that when adding nodes around 5% read
> operations freeze (aka time out after 1 second) for a few seconds (10-20
> seconds). It might not seems much, but at the order of 200k requests per
> second that's quite big of disruption.  It is well documented and known
> that adding nodes *has* impact on the latency or the completion of the
> requests but is there a way to lessen that?
> It is completely okay for write operations to fail or get blocked while
> adding nodes, but having the read path also impacted by this much (going
> from 30 millisecond 99 percentile latency to above 1 second) is what
> puzzles us.
>
> We have a 36 node cluster, every node owning ~120 GB of data. We are using
> Cassandra version 2.0.14 with vnodes and we are in the process of
> increasing capacity of the cluster, by roughly doubling the nodes.  They
> have SSDs and have peak IO usage of ~30%.
>
> Apart from the latency metrics only FlushWrites are blocked 18% of the
> time (based on the tpstats counters), but that can only lead to blocking
> writes and not reads?
>
> Thank you
>


Are aggregate functions done in parallel?

2016-01-28 Thread Francisco Reyes

Does Cassandra parallelize aggregate functions?

Have a new project with potentially 200 to 300 million rows per month 
that I need to do aggregates on. Wondering if Cassandra would be a good 
match.


Re: Read operations freeze for a few second while adding a new node

2016-01-28 Thread Jeff Jirsa
Is this during streaming plan setup (is your 10-20 second time of impact 
approximately 30 seconds from the time you start the node that’s joining the 
ring), or does it happen for the entire time you’re joining the node to the 
ring?

If so, there's a chance it's GC related – the streaming plan code used to
instantiate ALL of the compression metadata chunks in order to calculate the
plan, which creates a fair amount of garbage, and thus some GC activity. 
https://issues.apache.org/jira/browse/CASSANDRA-10680 was created due to some 
edge cases (very small compression chunk size + 3T of data per node = hundreds 
of millions of objects), but it’s possible that you’re seeing a less-extreme 
version of that same behavior.



From:  Lorand Kasler
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, January 28, 2016 at 8:11 AM
To:  "user@cassandra.apache.org"
Subject:  Read operations freeze for a few second while adding a new node

Hi, 

We are struggling with a problem that when adding nodes around 5% read 
operations freeze (aka time out after 1 second) for a few seconds (10-20 
seconds). It might not seems much, but at the order of 200k requests per second 
that's quite big of disruption.  It is well documented and known that adding 
nodes *has* impact on the latency or the completion of the requests but is 
there a way to lessen that? 
It is completely okay for write operations to fail or get blocked while adding 
nodes, but having the read path also impacted by this much (going from 30 
millisecond 99 percentile latency to above 1 second) is what puzzles us.

We have a 36 node cluster, every node owning ~120 GB of data. We are using 
Cassandra version 2.0.14 with vnodes and we are in the process of increasing 
capacity of the cluster, by roughly doubling the nodes.  They have SSDs and 
have peak IO usage of ~30%. 

Apart from the latency metrics only FlushWrites are blocked 18% of the time 
(based on the tpstats counters), but that can only lead to blocking writes and 
not reads? 

Thank you 





Re: Any excellent tutorials or automated scripts for cluster setup on EC2?

2016-01-28 Thread Branton Davis
If you use Chef, there's this cookbook:
https://github.com/michaelklishin/cassandra-chef-cookbook

It's not perfect, but you can make a wrapper cookbook pretty easily to
fix/extend it to do anything you need.

On Wed, Jan 27, 2016 at 11:25 PM, Richard L. Burton III 
wrote:

> I'm curious to see if there's automated scripts or tutorials on setting up
> Cassandra on EC2 with security taken care of etc.
>
> Thanks,
> --
> -Richard L. Burton III
> @rburton
>


Re: Read operations freeze for a few second while adding a new node

2016-01-28 Thread Anuj Wadehra
Hi Lorand,

Do you see any different GC pattern during these 20 seconds?
In 2.0.x, memtables create a lot of heap pressure, so in a way reads are not
isolated from writes.
Frankly speaking, I would have accepted 20 seconds of slowness, as scaling is
a one-time activity. But maybe your business case doesn't make that
acceptable. Such tough requirements often drive improvements.

Thanks,
Anuj

Sent from Yahoo Mail on Android

On Thu, 28 Jan, 2016 at 9:41 pm, Lorand Kasler wrote:

Hi,

We are struggling with a problem: when adding nodes, around 5% of read
operations freeze (i.e. time out after 1 second) for a few seconds (10-20
seconds). It might not seem like much, but at the order of 200k requests per
second that's quite a big disruption. It is well documented and known that
adding nodes *has* an impact on the latency or the completion of requests,
but is there a way to lessen that? It is completely okay for write operations
to fail or get blocked while adding nodes, but having the read path also
impacted this much (going from a 30 millisecond 99th-percentile latency to
above 1 second) is what puzzles us.

We have a 36 node cluster, every node owning ~120 GB of data. We are using
Cassandra version 2.0.14 with vnodes, and we are in the process of increasing
the capacity of the cluster by roughly doubling the number of nodes. They
have SSDs and have peak IO usage of ~30%.

Apart from the latency metrics, only FlushWrites are blocked 18% of the time
(based on the tpstats counters), but that can only lead to blocking writes
and not reads?

Thank you


Re: Are aggregate functions done in parallel?

2016-01-28 Thread DuyHai Doan
You can read this: http://www.doanduyhai.com/blog/?p=1876 and this:
http://www.doanduyhai.com/blog/?p=2015

Long story short, UDF and UDA computation in Cassandra is not distributed.
All the values are retrieved first on the coordinator node (to apply the
last-write-wins reconciliation logic) before applying any UDF/UDA.

The sweet spot for Cassandra UDAs is single-partition operations. If you
need to aggregate across multiple partitions, consider using Apache Spark.
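
A small sketch of that sweet spot (hypothetical table; the built-in
aggregates shown require Cassandra 2.2+, the same versions that introduced
UDF/UDA):

CREATE TABLE sensor_data (
    sensor_id int,
    ts timestamp,
    reading double,
    PRIMARY KEY (sensor_id, ts)
);

-- single partition: all the rows live on the same replicas, cheap to aggregate
SELECT avg(reading) FROM sensor_data WHERE sensor_id = 42;

-- across partitions: the coordinator has to pull every row first (full scan)
SELECT avg(reading) FROM sensor_data;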


On Thu, Jan 28, 2016 at 6:06 PM, Francisco Reyes  wrote:

> Does Cassandra paralelizes aggregate functions?
>
> Have a new project with potentially 200 to 300 million rows per month that
> I need to do aggregates on. Wondering if Cassandra would be a good match.
>


Re: Rename Keyspace offline

2016-01-28 Thread Jean Tremblay
Thank you all for your replies.
My main objective was not to change my client.
After your answers, it makes a lot of sense to modify my client so that it
can accept a different keyspace name. This way I will no longer need to
rename a keyspace; I simply need a way to tell my client that there is a new
keyspace.

Thanks again for your feedback,
Jean

On 27 Jan 2016, at 19:58, Robert Coli <rc...@eventbrite.com> wrote:

On Wed, Jan 27, 2016 at 6:49 AM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:
Since it takes me 2 days to load my data, I was planning to load the new set on 
a new keyspace (KS-Y), and when loaded drop KS-X and rename KS-Y to KS-X.

Why bother with the rename? Just have two keyspaces, foo and foo_, and 
alternate your bulk loads between truncating them?

Would this procedure work to destroy an old keyspace KS-X and rename a new 
keyspace KS-Y to KS-X:

Yes, if you include :

0) Load schema for KS-Y into KS-X

1) nodetool drain each node.
2) stop cassandra on each node.
3) on each node:
3.1) rm -r data/KS-X
3.2) mv data/KS-Y data/KS-X
4) restart each node.

Note also that in step 3.2, the uuid component of file and/or directory names 
will have to be changed.

=Rob



Session timeout

2016-01-28 Thread oleg yusim
Greetings,

Does Cassandra support session timeout? If so, where can I find this
configuration switch? If not, what kind of hook can I use to write my own
code to terminate a session after so many seconds of inactivity?

Thanks,

Oleg


Security labels

2016-01-28 Thread oleg yusim
Greetings,

Does Cassandra support security label concept? If so, where can I read on
how it should be applied?

Thanks,

Oleg


Cassandra Connection Pooling

2016-01-28 Thread KAMM, BILL
Hi, I'm looking for some good info on connection pooling, using JBoss.  Is this 
something that needs to be configured within JBoss, or is it handled directly 
by the Cassandra classes themselves?  Thanks.

Bill



Re: Cassandra Connection Pooling

2016-01-28 Thread Jim Ancona
It's typically handled by your client (e.g.
https://docs.datastax.com/en/latest-java-driver/index.html) along with
retries, timeouts and all the other things you would put in your datasource
config for a SQL database in JBoss.


On Thu, Jan 28, 2016 at 5:31 PM, KAMM, BILL  wrote:

> Hi, I’m looking for some good info on connection pooling, using JBoss.  Is
> this something that needs to be configured within JBoss, or is it handled
> directly by the Cassandra classes themselves?  Thanks.
>
>
>
> Bill
>
>
>


Re: Cassandra Connection Pooling

2016-01-28 Thread Nate McCall
On Thu, Jan 28, 2016 at 4:31 PM, KAMM, BILL  wrote:

> Hi, I’m looking for some good info on connection pooling, using JBoss.  Is
> this something that needs to be configured within JBoss, or is it handled
> directly by the Cassandra classes themselves?  Thanks.
>
>
>
>
>


This thread was on the Java-Driver list recently - it may answer some of
your questions:
https://groups.google.com/a/lists.datastax.com/forum/m/#!topic/java-driver-user/-im4eN_yZbA




-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Wide row in Cassandra

2016-01-28 Thread Qi Li
Hi all,

I've found something on the Internet, but I still want to consult your
expertise.

I'm designing a table, the object model will be like,
class Data{
 String uuid;//partition key
 String value1;
 String value2;
 ...
 String valueN;
 Map<String, Double> mapValues;
}


For one Data object, I would like it to be saved as one wide row in C*;
that means the mapValues will be expanded into dynamic columns. AFAIK, I can
put the mapValues' key into a clustering column, and then C* will use a wide
row to save the data. So I would use 'uuid' as the partition key and mapKey
as a clustering key. My question is about the other columns, value1 to
valueN: shall I put them into the clustering key too? Like below,

create table Data (
   uuid text,
   value1 text,
   value2 text,
   ...
   valueN text,
   mapKey text,
   mapValue double,
   primary key (uuid, mapKey, value1, value2, ..., valueN)
);

The reason I put them into clustering keys is that I don't want value1 to
valueN to be saved in duplicate each time a mapKey entry is created. For
example, the mapValues can have 100 entries; I don't want value1 to valueN
saved 100 times, I only want them saved once together with the partition
key. Is this correct?

Thanks for your help.

-- 
Ken Li


Re: Wide row in Cassandra

2016-01-28 Thread Jack Krupansky
As usual, the first step should be to examine your queries and use them as
the guide to data modeling. So... how do you need to access the data? What
columns do you need to be able to query on vs. merely return? What data
needs to be accessed at the same time? What data does not need to be
accessed at the same time?

-- Jack Krupansky

On Thu, Jan 28, 2016 at 5:51 PM, Qi Li  wrote:

> Hi all,
>
> I've found something in Internet, but still want to consult with your
> expertise.
>
> I'm designing a table, the object model will be like,
> class Data{
>  String uuid;//partition key
>  String value1;
>  String value2;
>  ...
>  String valueN;
>  Map mapValues;
> }
>
>
> For one Data object, I would like it to be saved into one wide row in C*,
> that means the mapValues will be extended as dynamic columns. AFAIK, I can
> put the mapValues' key into Cluster column, then C* will use the wide row
> to save the data. Then I would use 'uuid' as partition key, and mapKey into
> cluster key. My question is for the other columns : value1 to valueN, shall
> I put them into ClusterKey too? like below,
>
> create table Data (
>text uuid;
>text value1;
>text value2;
>...
>text valueN;
>text mapKey;
>Double mapValue;
>primary key(key, mapKey, value1, value2, ..., valueN);
> );
>
> The reason I put them into cluster keys is I don't want value1 to valueN
> are saved duplicated each time when the mapKey is created. For example, the
> mapValues can have 100 entries, I don't want value1 to valueN are saved 100
> times, I only want them saved 1 time together with the partition key. Is
> this correct?
>
> Thanks for your help.
>
> --
> Ken Li
>


Re: Security labels

2016-01-28 Thread Patrick McFadin
Cassandra has support for authentication security, but I'm not familiar
with a security label. Can you describe what you want to do?

Patrick

On Thu, Jan 28, 2016 at 2:26 PM, oleg yusim  wrote:

> Greetings,
>
> Does Cassandra support security label concept? If so, where can I read on
> how it should be applied?
>
> Thanks,
>
> Oleg
>


Detailed info on how inter dc rep works

2016-01-28 Thread John Lonergan
If I have a single client publishing to a cluster with replication to a
second cluster in another dc, then do the changes become visible in the
second dc in the same order that they became visible in the first dc?


Re: Detailed info on how inter dc rep works

2016-01-28 Thread Kai Wang
John,

There was a thread last month about this topic.

https://mail-archives.apache.org/mod_mbox/incubator-cassandra-user/201512.mbox/%3CCABWW=xw9obk+w-4efpymnpo_fy8dbilbgv2fk-9xre7ydy2...@mail.gmail.com%3E



On Thu, Jan 28, 2016 at 7:51 PM, John Lonergan 
wrote:

> If I have a single client publishing to a cluster with replication to a
> second cluster in another dc, then do the changes become visible in the
> second dc in the same order that they became visible in the first dc?
>
>


Re: Security labels

2016-01-28 Thread oleg yusim
Patrick,

Absolutely. A security label is a mechanism of access control used by the
MAC (mandatory access control) model, and not by the DAC (discretionary
access control) model we are all used to. In a database context it is
illustrated, for instance, here:
http://www.postgresql.org/docs/current/static/sql-security-label.html

Now, as for my goals: I'm performing a security assessment of Cassandra with
the goal of producing a STIG for this product. That is one of the parameters
in the database SRG I have to assess against.

Thanks,

Oleg


On Thu, Jan 28, 2016 at 6:32 PM, Patrick McFadin  wrote:

> Cassandra has support for authentication security, but I'm not familiar
> with a security label. Can you describe what you want to do?
>
> Patrick
>
> On Thu, Jan 28, 2016 at 2:26 PM, oleg yusim  wrote:
>
>> Greetings,
>>
>> Does Cassandra support security label concept? If so, where can I read on
>> how it should be applied?
>>
>> Thanks,
>>
>> Oleg
>>
>
>


Logging connect/disconnect

2016-01-28 Thread oleg yusim
Greetings,

What is the right way to configure Cassandra logging, so it would log all
the connects and disconnects?

Thanks,

Oleg


Re: Wide row in Cassandra

2016-01-28 Thread Qi Li
Thanks Jack.

The columns to be used for querying will be 'uuid' and the 'key' in
mapValues. value1 to valueN and the Double in mapValues will merely be
returned.

There are 2 query scenarios:
1. Query for a value; it can be any one of value1 to valueN. The query
criterion will be 'uuid'.
2. Query for the Double in mapValues. The query criteria will be 'uuid' +
the 'key' in mapValues.

Thanks for your help.

Ken

On Thu, Jan 28, 2016 at 11:22 PM, Jack Krupansky 
wrote:

> As usual, the first step should be to example your queries and use them as
> the guide to data modeling. So... how do you need to access the data? What
> columns do you need to be able to query on vs. merely return? What data
> needs to be accessed at the same time? What data does not need to be
> accessed at the same time?
>
> -- Jack Krupansky
>
> On Thu, Jan 28, 2016 at 5:51 PM, Qi Li  wrote:
>
>> Hi all,
>>
>> I've found something in Internet, but still want to consult with your
>> expertise.
>>
>> I'm designing a table, the object model will be like,
>> class Data{
>>  String uuid;//partition key
>>  String value1;
>>  String value2;
>>  ...
>>  String valueN;
>>  Map mapValues;
>> }
>>
>>
>> For one Data object, I would like it to be saved into one wide row in C*,
>> that means the mapValues will be extended as dynamic columns. AFAIK, I can
>> put the mapValues' key into Cluster column, then C* will use the wide row
>> to save the data. Then I would use 'uuid' as partition key, and mapKey into
>> cluster key. My question is for the other columns : value1 to valueN, shall
>> I put them into ClusterKey too? like below,
>>
>> create table Data (
>>text uuid;
>>text value1;
>>text value2;
>>...
>>text valueN;
>>text mapKey;
>>Double mapValue;
>>primary key(key, mapKey, value1, value2, ..., valueN);
>> );
>>
>> The reason I put them into cluster keys is I don't want value1 to valueN
>> are saved duplicated each time when the mapKey is created. For example, the
>> mapValues can have 100 entries, I don't want value1 to valueN are saved 100
>> times, I only want them saved 1 time together with the partition key. Is
>> this correct?
>>
>> Thanks for your help.
>>
>> --
>> Ken Li
>>
>
>


-- 
Ken Li


Call for Book Chapter

2016-01-28 Thread Ganesh Deka
Respected Sir/Madam,

Book chapter proposals are invited for the edited book titled "NoSQL:
Database for Storage and Retrieval of Data in Cloud", to be published by CRC
Press, Taylor & Francis Group, Florida 33487, USA, on the following topics:

1. Multi-model databases, NewSQL, time series databases, Database as a
Service, non-traditional database engines.
2. Technology and algorithms for web-based data mining/machine learning and
OLTP.
3. Recent technology and algorithms for NoSQL.
4. NoSQL and the cloud paradigm (characteristics and classification, data
storage technology, algorithms).
5. Comparison and classification of in-memory, in-database and hybrid NoSQL
databases for Bigdata storage.
6. Comparative study of the most used NoSQL databases [BASE, ACID & W(rite),
R(ead), N(ode) analysis].
7. Hadoop ecosystem tools & algorithms.
8. Various NoSQL Bigdata analytics tools/technology/applications.
9. Challenges and security issues of NoSQL databases.
10. Data migration techniques/algorithms/platforms/tools for SQL to NoSQL
databases and vice versa.
11. NoSQL middleware, load balancing in distributed databases.
12. Distributed transaction processing.
13. NoSQL and distributed database benchmarking.
14. Research trends in NoSQL/cloud databases, polyglot persistence.
15. Case study on databases having SQL-NoSQL dual features.
16. Case study/use cases/demonstration/hands-on of open source/proprietary
NoSQL databases, Hadoop Distributed File System (Flume, Sqoop etc.).
17. Hands-on/demonstration of NoSQL (non-SQL) queries for data manipulation,
NoSQL interfacing with applications.

Important dates:
Last date for submission of chapter proposal: 29th February 2016
Acceptance/rejection notification to authors: 10th March 2016
Full chapter submission by accepted authors: 10th June 2016
Final acceptance/rejection notification to authors: 30th June 2016
Submission of revised/final chapters to publisher: 31st July 2016
Details about the book
https://sites.google.com/site/callforchapter/home


Editor
GC Deka
e-mail: ganeshdeka2...@gmail.com, g.c.d...@ieee.org
https://scholar.google.co.in/citations?user=Qw5HblgJ&hl=en


Re: Wide row in Cassandra

2016-01-28 Thread DuyHai Doan
This data model should do the job

Create table Data (
   uuid text,
   value1 text static,
   value2 text static,
   ...
   valueN text static,
   mapKey text,
   mapValue double,
   primary key (uuid, mapKey)
);

Warning, value1... valueN being static, there will be a 1:1 relationship
between them and the partition key uuid.

1. Query for a value; it can be any one of value1 to valueN. The query
criterion will be 'uuid':

SELECT value1, ..., valueN FROM data
WHERE uuid = ?

2. Query for the Double in mapValue. The query criteria will be 'uuid' +
the 'key' in mapValue:

SELECT mapValue FROM data WHERE uuid = ? AND mapKey = ?

On 29 Jan 2016 at 07:51, "Qi Li" wrote:
>
> Thanks Jack.
>
> the columns to be used for query will be 'uuid' and 'key' in mapValues.
For value1 to valueN, and Double in mapValues will be merely return.
>
> there are 2 scenarios to query.
> 1. Query for value, it can be any one from value1 to valueN. The query
criteria will be 'uuid'.
> 2. Query for the Double in mapValue. The query criteria will be 'uuid' +
'key' in mapValue.
>
> Thanks for your help.
>
> Ken
>
> On Thu, Jan 28, 2016 at 11:22 PM, Jack Krupansky 
wrote:
>>
>> As usual, the first step should be to example your queries and use them
as the guide to data modeling. So... how do you need to access the data?
What columns do you need to be able to query on vs. merely return? What
data needs to be accessed at the same time? What data does not need to be
accessed at the same time?
>>
>> -- Jack Krupansky
>>
>> On Thu, Jan 28, 2016 at 5:51 PM, Qi Li  wrote:
>>>
>>> Hi all,
>>>
>>> I've found something in Internet, but still want to consult with your
expertise.
>>>
>>> I'm designing a table, the object model will be like,
>>> class Data{
>>>  String uuid;//partition key
>>>  String value1;
>>>  String value2;
>>>  ...
>>>  String valueN;
>>>  Map mapValues;
>>> }
>>>
>>>
>>> For one Data object, I would like it to be saved into one wide row in
C*, that means the mapValues will be extended as dynamic columns. AFAIK, I
can put the mapValues' key into Cluster column, then C* will use the wide
row to save the data. Then I would use 'uuid' as partition key, and mapKey
into cluster key. My question is for the other columns : value1 to valueN,
shall I put them into ClusterKey too? like below,
>>>
>>> create table Data (
>>>text uuid;
>>>text value1;
>>>text value2;
>>>...
>>>text valueN;
>>>text mapKey;
>>>Double mapValue;
>>>primary key(key, mapKey, value1, value2, ..., valueN);
>>> );
>>>
>>> The reason I put them into cluster keys is I don't want value1 to
valueN are saved duplicated each time when the mapKey is created. For
example, the mapValues can have 100 entries, I don't want value1 to valueN
are saved 100 times, I only want them saved 1 time together with the
partition key. Is this correct?
>>>
>>> Thanks for your help.
>>>
>>> --
>>> Ken Li
>>
>>
>
>
>
> --
> Ken Li


Re: Wide row in Cassandra

2016-01-28 Thread Qi Li
static column is exactly what I want!

Thank you Duyhai!

On Fri, 29 Jan 2016 07:22 DuyHai Doan  wrote:

> This data model should do the job
>
> Create table Data (
>text uuid;
>text value1 static;
>text value2 static;
>...
>text valueN static;
>text mapKey;
>Double mapValue;
>primary key(key, mapKey);
> );
>
> Warning, value1... valueN being static, there will be a 1:1 relationship
> between them and the partition key uuid.
>
> 1.Query for value, it can be any one from value1 to valueN. The query
> criteria will be 'uuid'.
>
> SELECT value1,..., valueN FROM data
> WHERE partition = uuid
>
> 2. Query for the Double in mapValue. The query criteria will be 'uuid' +
> 'key' in mapValue
>
> SELECT mapValue FROM data WHERE partition = uuid AND mapKey = double
>
>
> On 29 Jan 2016 at 07:51, "Qi Li" wrote:
> >
> > Thanks Jack.
> >
> > the columns to be used for query will be 'uuid' and 'key' in mapValues.
> For value1 to valueN, and Double in mapValues will be merely return.
> >
> > there are 2 scenarios to query.
> > 1. Query for value, it can be any one from value1 to valueN. The query
> criteria will be 'uuid'.
> > 2. Query for the Double in mapValue. The query criteria will be 'uuid' +
> 'key' in mapValue.
> >
> > Thanks for your help.
> >
> > Ken
> >
> > On Thu, Jan 28, 2016 at 11:22 PM, Jack Krupansky <
> jack.krupan...@gmail.com> wrote:
> >>
> >> As usual, the first step should be to example your queries and use them
> as the guide to data modeling. So... how do you need to access the data?
> What columns do you need to be able to query on vs. merely return? What
> data needs to be accessed at the same time? What data does not need to be
> accessed at the same time?
> >>
> >> -- Jack Krupansky
> >>
> >> On Thu, Jan 28, 2016 at 5:51 PM, Qi Li  wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I've found something in Internet, but still want to consult with your
> expertise.
> >>>
> >>> I'm designing a table, the object model will be like,
> >>> class Data{
> >>>  String uuid;//partition key
> >>>  String value1;
> >>>  String value2;
> >>>  ...
> >>>  String valueN;
> >>>  Map mapValues;
> >>> }
> >>>
> >>>
> >>> For one Data object, I would like it to be saved into one wide row in
> C*, that means the mapValues will be extended as dynamic columns. AFAIK, I
> can put the mapValues' key into Cluster column, then C* will use the wide
> row to save the data. Then I would use 'uuid' as partition key, and mapKey
> into cluster key. My question is for the other columns : value1 to valueN,
> shall I put them into ClusterKey too? like below,
> >>>
> >>> create table Data (
> >>>text uuid;
> >>>text value1;
> >>>text value2;
> >>>...
> >>>text valueN;
> >>>text mapKey;
> >>>Double mapValue;
> >>>primary key(key, mapKey, value1, value2, ..., valueN);
> >>> );
> >>>
> >>> The reason I put them into cluster keys is I don't want value1 to
> valueN are saved duplicated each time when the mapKey is created. For
> example, the mapValues can have 100 entries, I don't want value1 to valueN
> are saved 100 times, I only want them saved 1 time together with the
> partition key. Is this correct?
> >>>
> >>> Thanks for your help.
> >>>
> >>> --
> >>> Ken Li
> >>
> >>
> >
> >
> >
> > --
> > Ken Li
>