Re: [Pig] ERROR 2118: Could not get input splits

2013-09-20 Thread Janne Jalkanen
I just started moving our scripts to Pig 0.11.1 from 0.9.2 and I see the same issue - about 75-80% of the time it fails. So I'm not moving :-/. I am using OSX + Oracle Java7 and CassandraStorage, but I did not see any difference between CassandraStorage and CqlStorage. Cassandra 1.2.9, though 1.1.10

CQL consistency level using astyanax

2013-09-20 Thread Jimmy Lin
Hi, I am using astyanax to access a multi-node Cassandra cluster. In my connection configuration setup, I already declared a global read/write consistency level by setting: AstanaxConfiguration.setDefaultWriteConsistencyLevel() AstanaxConfiguration.setDefaultReadConsistencyLevel() however, fro

Re: Is it possible to control the sstable file size in incremental backup or snapshot

2013-09-20 Thread sankalp kohli
Snapshot just creates a hard link to all your sstables. There is no control over the size. You can control that if you are on leveled compaction. Don't know about size-tiered. On Fri, Sep 20, 2013 at 6:56 PM, java8964 java8964 wrote: > Hi, > > The current our production is using Cassandra 1.0, and w
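To illustrate the point about hard links, here is a minimal Python sketch, assuming a flat directory of data files. It is not Cassandra's snapshot code; the function name, directory names, and file name are hypothetical. It shows why a snapshot made this way cannot change file sizes: each snapshot entry is just another name for the same file on disk.

```python
import os
import tempfile


def snapshot_by_hard_link(data_dir, snapshot_dir):
    """Hard-link every regular file in data_dir into snapshot_dir (hypothetical helper)."""
    os.makedirs(snapshot_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(data_dir)):
        src = os.path.join(data_dir, name)
        if os.path.isfile(src):
            dst = os.path.join(snapshot_dir, name)
            os.link(src, dst)  # new directory entry, same underlying file
            linked.append(dst)
    return linked


# Demo with made-up file names in a temporary directory.
base = tempfile.mkdtemp()
data_dir = os.path.join(base, "data")
snapshot_dir = os.path.join(base, "snapshot")
os.makedirs(data_dir)
original = os.path.join(data_dir, "example-Data.db")
with open(original, "wb") as handle:
    handle.write(b"x" * 1024)

links = snapshot_by_hard_link(data_dir, snapshot_dir)
same_file = os.path.samefile(original, links[0])
same_size = os.path.getsize(original) == os.path.getsize(links[0])
```

Because the link and the original are the same file, the snapshot file is exactly as large as the sstable it points to.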

Is it possible to control the sstable file size in incremental backup or snapshot

2013-09-20 Thread java8964 java8964
Hi, Our production is currently using Cassandra 1.0, and will upgrade to 1.1 next week. I noticed that the sizes of the snapshot and incremental backup sstable files generated from our production environment vary dramatically. Some files can be hundreds of MB, or even close to a GB, but a lot of files are even

RE: Ad-hoc queries question

2013-09-20 Thread Hartzman, Leslie
Yeah, I know it was vague, but that is due to the fact that I'm still coming up to speed on the project and have yet to hear some of the details. Since I had heard that there has always been a requirement for ad-hoc queries against the Oracle DB for data-mining purposes, that was the best I coul

Re: Ad-hoc queries question

2013-09-20 Thread Robert Coli
On Fri, Sep 20, 2013 at 4:20 PM, Hartzman, Leslie < leslie.d.hartz...@medtronic.com> wrote: > Thanks Rob. I thought that might have been the situation but wasn’t > sure. So does this negate the use of cqlsh to do this then? I’d hate to > have to provide custom code to support ad-hoc queries. > T

RE: Ad-hoc queries question

2013-09-20 Thread Hartzman, Leslie
Cool! Thanks for the suggestions. From: Peter Lin [mailto:wool...@gmail.com] Sent: Friday, September 20, 2013 4:52 PM To: user@cassandra.apache.org Subject: Re: Ad-hoc queries question there are several ways of handling these types of use cases. Some people take a soft real-time approach by cal

Re: Ad-hoc queries question

2013-09-20 Thread Peter Lin
There are several ways of handling these types of use cases. Some people take a soft real-time approach by calculating aggregates in-memory and saving them to tables periodically. One example of this is twitter and storm. Other techniques include using batch processes to extract summaries and storing
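The soft real-time approach mentioned above can be sketched roughly as follows. This is a minimal Python illustration, not how Storm or any particular system does it. The class name and the `save` callback are hypothetical, and it flushes after a fixed number of events rather than on a timer so that the example stays deterministic.

```python
from collections import Counter


class PeriodicAggregator:
    """Count events in memory and hand the totals to `save` every N events."""

    def __init__(self, save, flush_every):
        self.save = save                # hypothetical callback, e.g. a table write
        self.flush_every = flush_every  # assumption: flush by event count, not time
        self.counts = Counter()
        self.pending = 0

    def record(self, key):
        self.counts[key] += 1
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        # Persist the current totals, then start a fresh window.
        if self.pending:
            self.save(dict(self.counts))
            self.counts.clear()
            self.pending = 0


saved = []  # stands in for the summary table
aggregator = PeriodicAggregator(saved.append, flush_every=3)
for event in ["login", "click", "click", "login"]:
    aggregator.record(event)
aggregator.flush()  # write out whatever is left in the last window
```

Ad-hoc questions can then be answered from the small summary rows instead of scanning the raw data.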

RE: Ad-hoc queries question

2013-09-20 Thread Hartzman, Leslie
By ad-hoc queries I mean exactly what you've described. The need to access data from multiple column families, typically addressed in RDBs with JOINs. I haven't really become familiar enough with MapReduce yet, so I'll have to delve deeper into that. I'm hoping that the de-normalized nature of t

Re: Ad-hoc queries question

2013-09-20 Thread Peter Lin
What do you mean by ad-hoc queries? Most NoSQL databases do not support cross-table joins, due to the distributed nature of NoSQL databases. If we compare this to partitioned databases in the RDB world, cross-partition joins are also more expensive than joins in non-partitioned databases. you can do ad-hoc

RE: Ad-hoc queries question

2013-09-20 Thread Hartzman, Leslie
Thanks Rob. I thought that might have been the situation but wasn't sure. So does this negate the use of cqlsh to do this then? I'd hate to have to provide custom code to support ad-hoc queries. Les From: Robert Coli [mailto:rc...@eventbrite.com] Sent: Friday, September 20, 2013 4:06 PM To: use

Re: Ad-hoc queries question

2013-09-20 Thread Robert Coli
On Fri, Sep 20, 2013 at 3:25 PM, Hartzman, Leslie < leslie.d.hartz...@medtronic.com> wrote: > So are ad-hoc queries more awkward or not feasible? > Yes. To expand slightly, you will probably end up querying multiple columnfamilies and doing the ad-hoc JOIN-esque aspect in application code. =Ro
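As a rough illustration of doing the JOIN-esque step in application code, here is a minimal Python sketch, assuming the rows from two column families have already been fetched as lists of dicts. The function name, field names, and sample rows are all hypothetical, and no Cassandra client calls are shown.

```python
from collections import defaultdict


def join_in_application(users, orders):
    """Hash join: index one result set by key, then probe it with the other."""
    orders_by_user = defaultdict(list)
    for order in orders:
        orders_by_user[order["user_id"]].append(order)

    joined = []
    for user in users:
        # Users without matching orders are dropped, like an inner join.
        for order in orders_by_user.get(user["user_id"], []):
            joined.append(
                {"user_id": user["user_id"], "name": user["name"], "item": order["item"]}
            )
    return joined


# Made-up rows standing in for the results of two separate queries.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"user_id": 1, "item": "disk"},
    {"user_id": 1, "item": "ram"},
    {"user_id": 3, "item": "cpu"},
]
result = join_in_application(users, orders)
```

The cost of this approach is that both result sets travel to the client, which is part of why such queries are more awkward than a server-side JOIN.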

Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-20 Thread Robert Coli
On Fri, Sep 20, 2013 at 3:42 PM, Suruchi Deodhar < suruchi.deod...@generalsentiment.com> wrote: > Using the nodes in the same availability zone(us-east-1b), we still get a > highly imbalanced cluster. The nodetool status and ring output is attached. > Even after running repairs, the cluster does n

Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-20 Thread Mohit Anchlia
Did you start out your cluster after wiping all the sstables and commit logs? On Fri, Sep 20, 2013 at 3:42 PM, Suruchi Deodhar < suruchi.deod...@generalsentiment.com> wrote: > We have been trying to resolve this issue to find a stable configuration > that can give us a balanced cluster with equal

Ad-hoc queries question

2013-09-20 Thread Hartzman, Leslie
I know that for NoSQL the idea is to figure out your queries beforehand and then plan your data architecture to support them. And this typically is accomplished with a denormalized database. So are ad-hoc queries more awkward or not feasible? Thanks. Les [CONFIDENTIALITY AND PRIVACY NOTICE]

Re: Nodes separating from the ring

2013-09-20 Thread Robert Coli
On Fri, Sep 13, 2013 at 7:48 AM, Dave Cowen wrote: > We've been running Cassandra 1.1.12 in production since February, and have > experienced a vexing problem with an arbitrary node "falling out of" or > separating from the ring on occasion. > > Has anyone else seen similar behavior? For obviou

Re: JIRA 5867 Fix causes Pig troubles

2013-09-20 Thread Robert Coli
On Fri, Sep 20, 2013 at 1:22 PM, Chad Johnston wrote: > I've checked out and built the 1.2.10-tentative branch, and I've noticed > that all of my CQL prepared statements are now broken. > > Looking into the code, it looks like the "#" -> "=" and "@" -> "?" > translations were removed. I tried to r

Composite Values in Cassandra Column Family?

2013-09-20 Thread Raihan Jamal
Can we have composite values in each column of a Cassandra column family? user-id column1-name 123 (Column1-Value Column1-SchemaName Column1-LastModifiedDate) userId is the rowKey here. And the same thing will apply to other columns as well. Each column value will contain below three

JIRA 5867 Fix causes Pig troubles

2013-09-20 Thread Chad Johnston
I've checked out and built the 1.2.10-tentative branch, and I've noticed that all of my CQL prepared statements are now broken. Looking into the code, it looks like the "#" -> "=" and "@" -> "?" translations were removed. I tried to replace these in one of my scripts with "=" and "?", but there's

sstableloader hangs at progress. never finishes

2013-09-20 Thread cfavero
Hello, I have tried unsuccessfully to stream snapshots from a 4 node cluster to a 2 node cluster. I have set up a different machine to run sstableloader on and I can see in the logs that it starts the stream and that the tmp files are created in the correct columnfamily folders. we are on 1.2.5 o

Re: BigTable-like Versioned Cells, Importing PostgreSQL Data

2013-09-20 Thread Robert Coli
On Thu, Sep 19, 2013 at 9:13 PM, Keith Bogs wrote: > I've been playing with Cassandra and have a few questions that I've been > stuck on for awhile, and Googling around didn't seem to help much: > > 1. What's the quickest way to import a bunch of data from PostgreSQL? I > have ~20M rows with most

Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-20 Thread Robert Coli
On Fri, Sep 20, 2013 at 9:24 AM, Jayadev Jayaraman wrote: > As a follow-up, is operating a Cassandra cluster with machines on multiple > racks and vnodes bound to cause load imbalance ? Shouldn't token-ranges > assigned to individual machines via their vnodes be approximately balanced > ? We're ot

Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-20 Thread Jayadev Jayaraman
As a follow-up, is operating a Cassandra cluster with machines on multiple racks and vnodes bound to cause load imbalance ? Shouldn't token-ranges assigned to individual machines via their vnodes be approximately balanced ? We're otherwise unable to explain why this imbalance occurs. ( it shouldn't

Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-20 Thread Mohit Anchlia
Like I said in my previous reply, I am not sure if that is the problem, and that's why I thought it would be a good test to run your test with the cluster in one rack only. I'll take a look at your ring output today. Did you also post cfstats output? On Fri, Sep 20, 2013 at 9:24 AM, Jayadev Jayaram

Re: Row size in cfstats vs cfhistograms

2013-09-20 Thread Rene Kochen
Nice! That explains it. 2013/9/19 Robert Coli > On Thu, Sep 19, 2013 at 3:08 AM, Rene Kochen wrote: > >> And how does cfstats track the maximum size? What does "Compacted" mean >> in "Compacted row maximum size". >> > > That maximum size is "the largest row that I have encountered in the > cou

[Pig] ERROR 2118: Could not get input splits

2013-09-20 Thread Cyril Scetbon
Hi, I get a lot of exceptions when using Pig scripts over Cassandra. I have to launch them again and again until they work. You can find a sample of the stacks when it works (twice) and when it fails (3 times) at http://pastebin.com/yWsTHbix. I use the following sample script (there are only a