Data Model - Consistency question

2012-09-19 Thread Roshni Rajagopal
Hi Folks, In the relational world, if I needed to model a students-courses relationship, I might have done: a students master table, a courses master table, and a bridge table students-course which gives me the ids of the students and the courses they are taking. This can answer both 'which students take cour

Solr Use Cases

2012-09-19 Thread Roshni Rajagopal
Hi, I'm new to Solr, and I hear that Solr is a great tool for improving search performance. I'm unsure whether Solr or DSE Search is a must for all Cassandra deployments. 1. For performance - I thought Cassandra had great read & write performance. When should Solr be used? Taking the following use c

Re: Data Model - Consistency question

2012-09-19 Thread Hiller, Dean
Yes, this scenario can occur (even with quorum writes/reads, as you are dealing with different rows) as one write may be complete and the other not while someone else is reading from the cluster. Generally though, you can do read repair when you read it in ;). I.e. see if things are inconsistent
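
The application-level read repair suggested above could look roughly like the following sketch. It is an in-memory illustration only: the two dictionaries stand in for two separate Cassandra rows (a student's course list and a course's student list), and the names `courses_by_student`, `students_by_course`, and `read_courses_with_repair` are hypothetical, not part of any Cassandra client. It also assumes the student row is the source of truth and that entries are only ever added, which the thread does not specify.

```python
def read_courses_with_repair(student_id, courses_by_student, students_by_course):
    """Return a student's courses, fixing the reverse index while reading.

    Assumption: the student row is authoritative. If a course listed there
    is missing the student in the reverse index (the second write had not
    landed yet), the missing entry is written during the read.
    """
    courses = courses_by_student.get(student_id, set())
    repaired = []
    for course_id in sorted(courses):
        enrolled = students_by_course.setdefault(course_id, set())
        if student_id not in enrolled:
            enrolled.add(student_id)      # complete the half-finished write
            repaired.append(course_id)
    return set(courses), repaired


# Simulate a reader arriving between the two writes: the student row already
# lists "physics", but the course row for "physics" does not exist yet.
courses_by_student = {"s1": {"math", "physics"}}
students_by_course = {"math": {"s1"}}

courses, repaired = read_courses_with_repair(
    "s1", courses_by_student, students_by_course
)
```

After the call, `repaired` names the courses whose reverse entry had to be filled in, so a caller could log how often readers observe the intermediate state.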

Re: Data Model - Consistency question

2012-09-19 Thread Hiller, Dean
Thinking a little more on your issue, you can also do that in playOrm, as OneToMany is represented with a few columns in the owning table/entity, unlike JPA and RDBMS. I.e. { List - These course primary keys are saved one per column in the student's row } { List - These

Re: Data Modeling - JSON vs Composite columns

2012-09-19 Thread Brian O'Neill
Roshni, We're going through the same debate right now. I believe native support for JSON (or collections) is on the docket for Cassandra. Here is a discussion we had a few months ago on the topic: We presently store JSON, but we're con

higher layer library makes things faster?

2012-09-19 Thread Hiller, Dean
So there is this interesting case where a higher layer library makes things faster. This is counter-intuitive, as every abstraction usually makes things slower with an increase in productivity. It would be cool if more and more libraries supported something to help with this scenario, I think.

Re: Solr Use Cases

2012-09-19 Thread Brian O'Neill
Roshni, We're using SOLR to support ad hoc queries and fuzzy searches against unstructured data stored in Cassandra. Cassandra is great for storage and you can create data models and indexes that support your queries, provided you can anticipate those queries. When you can't anticipate the queri

Losing keyspace on cassandra upgrade

2012-09-19 Thread Thomas Stets
I consistently keep losing my keyspace on upgrading from Cassandra 1.1.1 to 1.1.5. I have the same Cassandra keyspace on all our staging systems: development: a 3-node cluster; integration: a 3-node cluster; QS: a 2-node cluster (production will be a 4-node cluster, which is as yet not active). All

Re: Solr Use Cases

2012-09-19 Thread Michael Kjellman
If I were you I would look into ElasticSearch unless you are okay updating the search cache very infrequently. I tried Solandra vs ElasticSearch in our use case and there was no contest. Also, Cassandra is great for writes but not as optimized for reads. Honestly, it all depends on your use cas

Re: Losing keyspace on cassandra upgrade

2012-09-19 Thread Michael Kjellman
Sounds like you are losing your system keyspace. When you say nothing important changed between yaml files, do you mean with or without your changes? Did your data directories change in the migration? Permissions okay? I've done a 1.1.1 to 1.1.5 upgrade on many of my nodes without issue. On Se

Re: Setting the default replication factor for Solandra cores

2012-09-19 Thread Michael Kjellman
If I recall correctly you should make those changes in the schema through the CLI. I never ended up running Solandra in production though so I'm not sure if anyone else has better options. Why is the CLI not enough? On Sep 19, 2012, at 5:56 AM, "Safdar Kureishy"

Re: Data Modeling - JSON vs Composite columns

2012-09-19 Thread Michael Kjellman
Client code. CQL will only deserialize composites as you mention in A. On Sep 19, 2012, at 5:01 AM, "Roshni Rajagopal" wrote: Hi, There was a conversation on this some time earlier, and to continue it Suppose I want to associate a user to an item, and I w

Re: Losing keyspace on cassandra upgrade

2012-09-19 Thread Edward Sargisson
We've seen that before too - supposedly it was fixed in 1.1.5. Your experience casts some doubt on that. Our workaround, thus far, is to shut down the entire ring and then bring each node back up starting with known good. Then you do nodetool resetlocalschema on the node that's confused and ma

Re: Losing keyspace on cassandra upgrade

2012-09-19 Thread Michael Kjellman
@Edward Do you have a bug number for that by chance? On Sep 19, 2012, at 8:25 AM, "Edward Sargisson" wrote: We've seen that before too - supposedly it was fixed in 1.1.5. Your experience casts some doubt on that. Our workaround, thus far, is to shut d

Re: Losing keyspace on cassandra upgrade

2012-09-19 Thread Edward Sargisson
On 12-09-19 08:30 AM, Michael Kjellman wrote: @Edward Do you have a bug number for that by chance? On Sep 19, 2012, at 8:25 AM, "Edward Sargisson" wrote: We've seen that before too - supposedly it w

Re: higher layer library makes things faster?

2012-09-19 Thread jeffpk
Actually it's not uncommon at all. Any caching implemented on a higher level will generally improve speed at a cost in memory. Beware common wisdom, it's seldom very wise. Sent from my Verizon Wireless BlackBerry -Original Message- From: "Hiller, Dean" Date: Wed, 19 Sep 2012 07:35:07 To:
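
The point about higher-level caching trading memory for speed can be illustrated with a small sketch. `slow_lookup` is a hypothetical stand-in for a read from the storage layer (it is not a Cassandra or playOrm call), and `backend_reads` exists only to make the saved round trips visible.

```python
from functools import lru_cache

backend_reads = 0  # counts how often the "storage layer" is actually hit


def slow_lookup(key):
    """Hypothetical stand-in for an expensive read from the lower layer."""
    global backend_reads
    backend_reads += 1
    return key.upper()


@lru_cache(maxsize=1024)  # memory spent here is the price of the speedup
def cached_lookup(key):
    """Higher-layer wrapper: repeated keys are served from memory."""
    return slow_lookup(key)


# Three reads of the same key reach the lower layer only once.
results = [cached_lookup("row1") for _ in range(3)]
```

The `maxsize` bound is the knob for the memory cost; an unbounded cache would be faster still on a large key set but could grow without limit.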

Re: higher layer library makes things faster?

2012-09-19 Thread Hiller, Dean
I guess you could look at that as a form of caching… didn't think of it at the time… I usually think of it as caching in RAM, but this I guess is caching on disk (though hopefully if the row cache is used for the 3 index tables playOrm uses, it should be blazingly fast). Dean On 9/19/12 10:59 AM, "

Data stax community

2012-09-19 Thread Marcelo Elias Del Valle
Not sure if this question should be asked on this list; if this is the wrong place to ask, please tell me. Does anyone know if Data Stax community edition allows us to run in production? I plan to use the enterprise edition later, but for now even for production I am thinking of using communit

Re: Data stax community

2012-09-19 Thread Abhijit Chanda
You better ask this question there. Anyway, as far as I am concerned it should not be a problematic thing. Regards, Abhijit On Thu, Sep 20, 2012 at 12:07 AM, Marcelo Elias Del Valle wrote: > Not sure if this ques

Re: Data stax community

2012-09-19 Thread Tyler Hobbs
DataStax Community is free for any type of use, including production. On Wed, Sep 19, 2012 at 1:42 PM, Abhijit Chanda wrote: > You better ask this question > there. > Anyway, as far as I am concerned it should not be a problematic thin

Re: Data stax community

2012-09-19 Thread Marcelo Elias Del Valle
Thanks! 2012/9/19 Tyler Hobbs > DataStax Community is free for any type of use, including production. > > > On Wed, Sep 19, 2012 at 1:42 PM, Abhijit Chanda > wrote: > >> You better ask this question >> there. >> Anyway, as far as

Correct model

2012-09-19 Thread Marcelo Elias Del Valle
I am new to Cassandra and to NoSQL in general. I built my first model and any comments would be of great help. I am describing my thoughts below. It's a very simple model. I will need to store several users and, for each user, I will need to store several requests. Each request has its insertion time. As

Re: Correct model

2012-09-19 Thread Hiller, Dean
Thinking out loud and I think a bit towards playOrm's model though you don't need to use playOrm for this. 1. I would probably have a User with the requests either embedded in or the Foreign keys to the requests…either is fine as long as you get the user get ALL FK's and make one request to g

Re: Correct model

2012-09-19 Thread Marcelo Elias Del Valle
2012/9/19 Hiller, Dean > Thinking out loud and I think a bit towards playOrm's model though you > don't need to use playOrm for this. > > 1. I would probably have a User with the requests either embedded in or > the Foreign keys to the requests…either is fine as long as you get the user > get A

Re: Correct model

2012-09-19 Thread Hiller, Dean
Uhm, unless I am mistaken, a NEW request implies a new UUID so you can just write it to both the index to the request row and to the user that request was for all in one shot with no need to read, right? (Also, read before write is not necessarily bad…it really depends on your situation but in

Re: Correct model

2012-09-19 Thread Hiller, Dean
Oh, quick correction, I was thinking your user row key was in the request coming in from your first email. In your first email, you get a request and seem to shove it and a user in generating the ids which means that user never generates a request ever again??? If a user sends multiple requests i

Re: downgrade from 1.1.4 to 1.0.X

2012-09-19 Thread aaron morton
No. They use different minor file versions which are not backwards compatible. Cheers - Aaron Morton Freelance Developer @aaronmorton On 18/09/2012, at 11:18 PM, Arend-Jan Wijtzes wrote: > Hi, > > We are running Cassandra 1.1.4 and like to experi

Re: Query advice to prevent node overload

2012-09-19 Thread aaron morton
> Wouldn't that return files from directories '/tmp1', '/tmp2', for example? I believe so. > I thought the goal was to return files and subdirectories recursively inside > '/tmp'. I'm not sure what the purpose of the query was. The query will return inodes where the file path starts with

Re: HTimedOutException and cluster not working

2012-09-19 Thread aaron morton
> No, all keyspaces that we created do not have secondary indexes. So probably > the settings 'memtable_flush_queue_size' is not relevant? It may be, if you had a lot of CFs and Cassandra tried to flush more than memtable_flush_queue_size at once. > one would think that the compaction would

Re: updating CF from a mapper-only Hadoop job

2012-09-19 Thread aaron morton
> That job would consistently fail with a flurry of exceptions What were the exceptions ? Cheers - Aaron Morton Freelance Developer @aaronmorton On 19/09/2012, at 2:16 AM, Brian Jeltema wrote: > I wrote a Hadoop mapper-only job that uses BulkOutput

Re: Correct model

2012-09-19 Thread Marcelo Elias Del Valle
> > In your first email, you get a request and seem to shove it and a user in > generating the ids which means that user never generates a request ever > again??? If a user sends multiple requests in, how are you looking up his > TimeUUID row key from your first email(I would do the same in my > i


2012-09-19 Thread Michael Kjellman
A few questions: What version of 1.1 are you running? What version of Hadoop? What is your job config? What is the buffer size you've chosen? How much data are you dealing with? On Sep 19, 2012, at 7:23 PM, "Manu Zhang" wrote: > I've been bulk loading data into Cassandra and seen the following


2012-09-19 Thread Manu Zhang
cassandra-trunk (so it's 1.2); no Hadoop, bulk load example here; buffer size is 64 MB as in the example; I'm dealing with about 1GB data. job config, you mean? On Thu, Sep 20, 2012 at 10:32 AM, Michael Kjellman wrote: > A few questions


2012-09-19 Thread Michael Kjellman
I assumed you were talking about BulkLoader. I haven't played with trunk yet so I'm afraid I won't be much help here... On Sep 19, 2012, at 7:56 PM, "Manu Zhang" wrote: cassandra-trunk (so it's 1.2); no Hadoop, bulk load example here


2012-09-19 Thread Manu Zhang
Yeah, BulkLoader. You did help me to elaborate my question. Thanks! On Thu, Sep 20, 2012 at 10:58 AM, Michael Kjellman wrote: > I assumed you were talking about BulkLoader. I haven't played with trunk > yet so I'm afraid I won't be much help here... > > On Sep 19, 2012, at 7:56 PM, "Manu Zhang"

Re: HTimedOutException and cluster not working

2012-09-19 Thread Jason Wee
Hi Aaron, thank you for the comment. >It may be. >If you had a lot of CFs and Cassandra tried to flush more than memtable_flush_queue_size at once. We created 6 keyspaces where the maximum number of CFs in a keyspace is 5. The setting 'memtable_flush_queue_size' is using the default value which is 4. >

Re: persistent compaction issue (1.1.4 and 1.1.5)

2012-09-19 Thread Віталій Тимчишин
I did see problems with schema agreement on 1.1.4, but they did go away after rolling restart (BTW: it would be still good to check describe schema for unreachable). Same rolling restart helped to force compactions after moving to Leveled compaction. If your compactions still don't go, you can try


2012-09-19 Thread Manu Zhang
the problem seems to have gone away with changing Murmur3Partitioner back to RandomPartitioner On Thu, Sep 20, 2012 at 11:14 AM, Manu Zhang wrote: > Yeah, BulkLoader. You did help me to elaborate my question. Thanks! > > > On Thu, Sep 20, 2012 at 10:58 AM, Michael Kjellman < > mkjell...@barracuda

Re: persistent compaction issue (1.1.4 and 1.1.5)

2012-09-19 Thread Michael Kjellman
After changing my ss_table_size as recommended my pending compactions across the cluster have leveled off at 34808 but it isn't progressing after 24 hours at that level. As I've already changed the most offending column families I think the only option I have left is to remove the .json files f

Re: Invalid Counter Shard errors?

2012-09-19 Thread Peter Schuller
> I don't understand what the three in parentheses values are exactly. I guess > the last number is the count and the middle one is the number of increments, > is that true ? What is the first string (identical in all the errors) ? It's (UUID, clock, increment). Very briefly, counter columns in C
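
The (UUID, clock, increment) triple can be made concrete with a small model of how shards might be merged. This is a sketch based on a general understanding of Cassandra 1.x counters, not on the truncated message above: it assumes that, per shard id, the shard with the higher clock wins, that the counter value is the sum over shards, and that two shards with the same id and clock but different counts are the "invalid" case, resolved by keeping the higher count. The names `Shard`, `merge_shards`, and `counter_value` are hypothetical.

```python
from collections import namedtuple

# One shard per writing node: (id, clock, count). "count" corresponds to the
# third value in the logged triple.
Shard = namedtuple("Shard", ["shard_id", "clock", "count"])


def merge_shards(left, right):
    """Merge two shard lists; return (merged_by_id, conflicts).

    Assumed rules: higher clock wins per shard id; equal clocks with
    different counts are recorded as a conflict and the higher count is kept.
    """
    merged = {}
    conflicts = []
    for shard in list(left) + list(right):
        current = merged.get(shard.shard_id)
        if current is None or shard.clock > current.clock:
            merged[shard.shard_id] = shard
        elif shard.clock == current.clock and shard.count != current.count:
            conflicts.append((current, shard))   # the "invalid shard" case
            if shard.count > current.count:
                merged[shard.shard_id] = shard
    return merged, conflicts


def counter_value(merged):
    """The counter's value is the sum of the surviving shard counts."""
    return sum(shard.count for shard in merged.values())


left = [Shard("a", 3, 10), Shard("b", 1, 4)]
right = [Shard("a", 3, 12), Shard("b", 2, 7)]
merged, conflicts = merge_shards(left, right)
total = counter_value(merged)
```

Here shard "a" appears twice with the same clock but different counts, so it lands in `conflicts`, while shard "b" is resolved normally by its clock.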

Re: Invalid Counter Shard errors?

2012-09-19 Thread Peter Schuller
The significance I think is: If it is indeed the case that the higher value is always *in fact* correct, I think that's inconsistent with the hypothesis that unclean shutdown is the sole cause of these problems - as long as the client is truly submitting non-idempotent counter increments without a