Re: Cassandra read optimization

2012-04-18 Thread Dan Feldman
Hi Tyler and Aaron, Thanks for your replies. Tyler, fetching scs using your pycassa script on our server takes ~7 s - consistent with the times we've been seeing. Now, we aren't really experts in Cassandra, but it seems that JNA is enabled by default for Cassandra > 1.0 according to Jeremy ( http

Re: Lexer error at char '\u201C'

2012-04-18 Thread Trevor Francis
it pukes: ERROR com.cloudera.flume.conf.SinkFactoryImpl: Could not find class org.apache.cassandra.plugins.flume.sink.LogsandraSyslogSink for plugin loading followed the read me to a "t" flume.plugin.classes org.apache.cassandra.plugins.flume.sink.SimpleCassandraSink,org.apache.cassan

Re: Cassandra read optimization

2012-04-18 Thread Tyler Hobbs
I tested this out with a small pycassa script: https://gist.github.com/2418598 On my not-very-impressive laptop, I can read 5000 of the super columns in 3 seconds (cold) or 1.5 (warm). Reading in batches of 1000 super columns at a time gives much better performance; I definitely recommend going w
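The batching advice above can be illustrated with a minimal sketch. This is not the script from the gist. `fetch_slice` is a hypothetical stand-in for a client call that returns up to `count` columns of one row, starting at a given column name (inclusive), and the in-memory `ROW` replaces a real Cassandra row so that the example runs on its own.

```python
from bisect import bisect_left

# Hypothetical stand-in for one wide row: sorted super column names -> values.
ROW = {"sc%06d" % i: i for i in range(5000)}
NAMES = sorted(ROW)


def fetch_slice(start, count):
    """Return up to `count` (name, value) pairs with name >= start, in order.

    Stand-in for a client-side slice query; an assumption, not a real API.
    """
    i = bisect_left(NAMES, start)
    return [(n, ROW[n]) for n in NAMES[i:i + count]]


def iter_columns(batch_size=1000):
    """Yield every column of the row, fetching `batch_size` at a time."""
    start = ""
    first_page = True
    while True:
        # After the first page, the start column is the last one already
        # seen, so ask for one extra column and skip it.
        want = batch_size if first_page else batch_size + 1
        page = fetch_slice(start, want)
        if not first_page:
            page = page[1:]
        if not page:
            return
        for name, value in page:
            yield name, value
        if len(page) < batch_size:
            return  # a short page means the row is exhausted
        start = page[-1][0]
        first_page = False
```

Iterating with `for name, value in iter_columns(1000)` keeps at most one batch in memory at a time; the batch size of 1000 is taken from the message above, everything else is illustrative.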

Re: Lexer error at char '\u201C'

2012-04-18 Thread Tyler Hobbs
Yup, you beat me to the punch by a minute. On Wed, Apr 18, 2012 at 11:39 PM, Nick Bailey wrote: > https://github.com/thobbs/flume-cassandra-plugin > > I think that is fairly up to date, right Tyler? > > > On Wed, Apr 18, 2012 at 11:18 PM, Trevor Francis < > trevor.fran...@tgrahamcapital.com> wro

Re: Lexer error at char '\u201C'

2012-04-18 Thread Nick Bailey
https://github.com/thobbs/flume-cassandra-plugin I think that is fairly up to date, right Tyler? On Wed, Apr 18, 2012 at 11:18 PM, Trevor Francis < trevor.fran...@tgrahamcapital.com> wrote: > …..slaps himself. > > Oh you guys at Datastax are great. I have deployed a small Cassandra > cluster usi

Re: Lexer error at char '\u201C'

2012-04-18 Thread Trevor Francis
…..slaps himself. Oh you guys at Datastax are great. I have deployed a small Cassandra cluster using your community edition. Actually currently working on making Flume use cassandra as a sink…..unsuccessfully. However, I did just get this Flume error fixed. Are you aware of any cassandra sinks

Re: Lexer error at char '\u201C'

2012-04-18 Thread Tyler Hobbs
This... looks like Flume. Are you sure you've got the right mailing list? On Wed, Apr 18, 2012 at 11:04 PM, Trevor Francis < trevor.fran...@tgrahamcapital.com> wrote: > Trying to add an agent config through the master web server to point to a > collector node, getting: > > FAILED config [10.38.20

Lexer error at char '\u201C'

2012-04-18 Thread Trevor Francis
Trying to add an agent config through the master web server to point to a collector node, getting: FAILED config [10.38.20.197, tailDir("/var/log/acc/", ".*", true, 0), agentDFOSink(“10.38.20.203”,35853)] Attempted to write an invalid sink/source: Lexer error at char '\u201C' at line 1 ch

Re: RMI/JMX errors, weird

2012-04-18 Thread Maxim Potekhin
Server log below. Mind you that all the nodes are still up -- even though reported as "dead" in this log. What's going on here? Thanks! INFO [GossipTasks:1] 2012-04-18 22:18:26,487 Gossiper.java (line 719) InetAddress /130.199.185.193 is now dead. INFO [ScheduledTasks:1] 2012-04-18 22:18:26,

Re: Cassandra read optimization

2012-04-18 Thread Aaron Turner
On Wed, Apr 18, 2012 at 5:00 PM, Dan Feldman wrote: > Hi all, > > I'm trying to optimize moving data from Cassandra to HDFS using either Ruby > or Python client. Right now, I'm playing around on my staging server, an 8 > GB single node machine. My data in Cassandra (1.0.8) consist of 2 rows (for >

Cassandra read optimization

2012-04-18 Thread Dan Feldman
Hi all, I'm trying to optimize moving data from Cassandra to HDFS using either Ruby or Python client. Right now, I'm playing around on my staging server, an 8 GB single node machine. My data in Cassandra (1.0.8) consist of 2 rows (for now) with ~150k super columns each (I know, I know - super colu

Re: DataStax Opscenter 2.0 question

2012-04-18 Thread Nick Bailey
What version of Firefox? Someone has reported a similar issue with Firefox 3.6. Can you try with Chrome or perhaps a more recent version of Firefox (assuming you are also on an older version)? On Wed, Apr 18, 2012 at 4:51 PM, Jay Parashar wrote: > I am having trouble in running the OpsCenter. It

DataStax Opscenter 2.0 question

2012-04-18 Thread Jay Parashar
I am having trouble running OpsCenter. It starts without any error, but the GUI stays on the index page and just shows "Loading OpsCenter.". Firebug shows an error "this._onClusterSave.bind is not a function". I have the log turned on at DEBUG and it shows no error (pasted below). This is the only

Re: Column Family per User

2012-04-18 Thread Dave Brosius
It seems to me you are on the right track. Finding the right balance of # rows vs row width is the part that will take the most experimentation. - Original Message - From: "Trevor Francis" <trevor.fran...@tgrahamcapital.com

Re: Column Family per User

2012-04-18 Thread Trevor Francis
Regarding Rotating, I was thinking about the concept of log rotate, where you write to a file for a specific period of time, then you create a new file and write to it after a specific set of time. So yes, it closes a row and opens another row. Since I will be generating analytics every 15 minu

Re: Column Family per User

2012-04-18 Thread Dave Brosius
Yes in this cassandra model, time wouldn't be a column value, it would be part of the column name. Depending on how you want to access your data (give me all data points for time X) and how many separate datapoints you have for time X, you might consider packing all the data for a time in one co
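A minimal sketch of the idea above: the time becomes the column name, and the data points for that time are packed into a single column value. The sample record comes from the MySQL-style example elsewhere in this thread; packing with JSON and the helper name `to_column` are assumptions for illustration, not something the message prescribes.

```python
import json


def to_column(time_str, readings):
    """Return (column_name, column_value) for one timestamped record."""
    # The timestamp is the column name, so columns sort by time within a row
    # (ISO-8601 strings in one fixed format sort chronologically as text).
    # All data points for that time are packed into one value.
    return time_str, json.dumps(readings, sort_keys=True)


name, value = to_column("2012-04-12T12:22:23.293",
                        {"wind": 55, "rain": 45, "sunshine": 10})
```

Packing everything into one column suits reads of the form "give me all data points for time X"; reading a single data point across many times would instead favour one column per data point.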

Re: Column Family per User

2012-04-18 Thread Trevor Francis
I am trying to grasp this concept….so let me try a scenario. Let's say I have 5 data points being captured in the log file. Here would be a typical table schema in MySQL. Id, Username, Time, Wind, Rain, Sunshine Select * from table; would reveal: 1, george, 2012-04-12T12:22:23.293, 55, 45, 10 2

Re: Column Family per User

2012-04-18 Thread Dave Brosius
Your design should be around how you want to query. If you are only querying by user, then having a user as part of the row key makes sense. To manage row size, you should think of a row as being a bucket of time. Cassandra supports a large (but not without bounds) row size. To manage row size
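The "row as a bucket of time" idea can be sketched as a row-key function. The `user:bucket-start` key layout and the 15-minute default are assumptions chosen for illustration; the message above only says that the user can be part of the row key and that a row should cover a bounded span of time.

```python
from datetime import datetime


def bucket_row_key(user, ts, bucket_minutes=15):
    """Build a row key that groups one user's records into time buckets.

    Assumes bucket_minutes divides 60 evenly (e.g. 5, 15, 30, 60).
    """
    # Round the timestamp down to the start of its bucket within the hour.
    minute = ts.minute - ts.minute % bucket_minutes
    start = ts.replace(minute=minute, second=0, microsecond=0)
    return "%s:%s" % (user, start.strftime("%Y-%m-%dT%H:%M"))
```

With these assumptions, every record george writes between 12:15 and 12:30 lands in the same row, so the row width is bounded by the write rate multiplied by the bucket length.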

Re: Column Family per User

2012-04-18 Thread Trevor Francis
Janne, Of course, I am new to the Cassandra world, so it is taking some getting used to as I work out how everything translates into my MySQL head. We are building an enterprise application that will ingest log information and provide metrics and trending based upon the data contained in the logs

Re: Column Family per User

2012-04-18 Thread Janne Jalkanen
Each CF takes a fair chunk of memory regardless of how much data it has, so this is probably not a good idea, if you have lots of users. Also using a single CF means that compression is likely to work better (more redundant data). However, Cassandra distributes the load across different nodes b

Column Family per User

2012-04-18 Thread Trevor Francis
Our application has users that can write upwards of 50 million records per day. However, they all write the same format of records (20 fields…columns). Should I put each user in their own column family, even though the column family schema will be the same per user? Would this help with dime

Re: Resident size growth

2012-04-18 Thread Jonathan Ellis
On Wed, Apr 18, 2012 at 12:44 PM, Rob Coli wrote: > On Tue, Apr 10, 2012 at 8:40 AM, ruslan usifov > wrote: >> mmap doesn't depend on jna > > FWIW, this confusion is as a result of the use of *mlockall*, which is > used to prevent mmapped files from being swapped, which does depend on > JNA. ml

RE: size tiered compaction - improvement

2012-04-18 Thread Bryce Godfrey
Per CF or per Row TTL would be very useful for me also with our timeseries data. -Original Message- From: Igor [mailto:i...@4friends.od.ua] Sent: Wednesday, April 18, 2012 6:06 AM To: user@cassandra.apache.org Subject: Re: size tiered compaction - improvement For my use case it would b

Re: Resident size growth

2012-04-18 Thread Rob Coli
On Tue, Apr 10, 2012 at 8:40 AM, ruslan usifov wrote: > mmap doesn't depend on jna FWIW, this confusion is as a result of the use of *mlockall*, which is used to prevent mmapped files from being swapped, which does depend on JNA. =Rob -- =Robert Coli AIM&GTALK - rc...@palominodb.com YAHOO - rcol

Single Vs. Multiple Keyspaces

2012-04-18 Thread Trevor Francis
We are launching a data-intensive application that will store upwards of 50 million 150-byte records per day per user. We have identified Cassandra as our database technology and Flume as what we will use to seed the data from log files into the database. Each user is given their own server

Re: blob fields, bynary or hexa?

2012-04-18 Thread phuduc nguyen
How are you passing a blob or binary stream to the CLI? It sounds like you're passing in a representation of a binary stream as ASCII/UTF-8, which will create the problems you describe. Regards, Duc On 4/18/12 6:08 AM, "mdione@orange.com" wrote: > From: Erik Forkalsud [mailto:eforkals...@cj.

Re: size tiered compaction - improvement

2012-04-18 Thread Jonathan Ellis
It's not that simple, unless you have an append-only workload. (See discussion on https://issues.apache.org/jira/browse/CASSANDRA-3974.) On Wed, Apr 18, 2012 at 4:57 AM, Radim Kolar wrote: > >> Any compaction pass over A will first convert the TTL data into >> tombstones. >> >> Then, any subsequ

Re: Counter column family

2012-04-18 Thread Tamar Fraenkel
My problem was the result of a Hector bug, see http://groups.google.com/group/hector-users/browse_thread/thread/8359538ed387564e So please ignore my question, Thanks, *Tamar Fraenkel * Senior Software Engineer, TOK Media [image: Inline image 1] ta...@tok-media.com Tel: +972 2 6409736 Mob: +972 54

Re: size tiered compaction - improvement

2012-04-18 Thread Igor
For my use case it would be nice to have a per-CF TTL (to protect myself from application bugs and from storage leaks due to a missed TTL), but it seems you can't avoid tombstones even in this case, or if you change the CF TTL at runtime. On 04/18/2012 03:06 PM, Viktor Jevdokimov wrote: Our use case r

Re: swap grows

2012-04-18 Thread ruslan usifov
Thanks for the link. But I still have a question about free memory. In our cluster we have 200 IOPS at peak, but still have about 3 GB of free memory on each server (the cluster has 6 nodes, so there are 3*6=18 GB of unused memory). I think that the OS should fill all memory with page cache (we do backups

RE: blob fields, bynary or hexa?

2012-04-18 Thread mdione.ext
From: Erik Forkalsud [mailto:eforkals...@cj.com] > Which client are you using? With Hector or straight thrift, you > should > be able to store byte[] directly. So far, cassandra-cli only, but we're also testing phpcassa with CQL support[1]. -- [1] https://github.com/thobbs/phpcassa -- Marcos

RE: size tiered compaction - improvement

2012-04-18 Thread Viktor Jevdokimov
Our use case requires Column TTL, not CF TTL, because it is variable, not constant. Best regards/ Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania Disclaimer: The infor

Re: blob fields, bynary or hexa?

2012-04-18 Thread Erik Forkalsud
On 04/18/2012 03:02 AM, mdione@orange.com wrote: We're building a database to store the avatars for our users in three sizes. Thing is, we planned to use the blob field with a ByteType validator, but if we try to inject the binary data as read from the image file, we get an error. Whi

RE: blob fields, bynary or hexa?

2012-04-18 Thread mdione.ext
From: mdione@orange.com [mailto:mdione@orange.com] > We're building a database to store the avatars for our users in three > sizes. Thing is, > we planned to use the blob field with a ByteType validator Before anyone starts pointing out that storing files in Cassandra is a bad idea, the images

blob fields, bynary or hexa?

2012-04-18 Thread mdione.ext
We're building a database to store the avatars for our users in three sizes. Thing is, we planned to use the blob field with a ByteType validator, but if we try to inject the binary data as read from the image file, we get an error. The same happens if we convert the binary data to its base
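The subject line asks whether blob values should be sent as raw binary or as hexadecimal. A minimal sketch of the hexadecimal option follows, assuming the tool in use accepts hex text for byte values (the truncated message does not confirm this). The helper names `to_hex` and `from_hex` are illustrative.

```python
import binascii


def to_hex(data):
    """Encode raw bytes as a lowercase hexadecimal string."""
    return binascii.hexlify(data).decode("ascii")


def from_hex(text):
    """Decode a hexadecimal string back into the original bytes."""
    return binascii.unhexlify(text)
```

Hex text doubles the size of the payload on the wire, but it contains only ASCII characters, so it survives any layer that expects text; the stored value is still the decoded bytes.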

Re: size tiered compaction - improvement

2012-04-18 Thread Radim Kolar
Any compaction pass over A will first convert the TTL data into tombstones. Then, any subsequent pass that includes A *and all other sstables containing rows with the same key* will drop the tombstones. That's why I proposed to attach TTL to the entire CF. Tombstones would not be needed