Re: Network traffic patterns

2011-11-17 Thread Philippe
Hi Todd, Yes, all equal hardware. Nearly no CPU usage and no memory issues. Repairs are running in tens of minutes, so I don't understand why replication would be backed up. Any other ideas? On 17 Nov 2011 02:33, "Todd Burruss" wrote: > Are all of your machines equal hardware? Since those mach

split large sstable

2011-11-17 Thread Radim Kolar
Is there a simple way to split a large sstable into several smaller ones? I increased min_compaction_threshold (smaller tables seem to get better file offset caching from the OS) and now I need to reshuffle data into smaller sstables; running several cluster-wide repairs worked well, just large

Re: Second Cassandra users survey

2011-11-17 Thread Boris Yen
I was wondering if it is possible to provide a function like "delete from cf where column='value' ". I think this should be useful for people who use secondary indexes a lot. On Nov 15, 2011 11:05 AM, "Edward Ribeiro" wrote: > > +1 on co-processors. > > > Edward

Quick DataStax OpsCenter question

2011-11-17 Thread Alexandru Dan Sicoe
Hi, I'm using the community version of OpsCenter to monitor my cluster. At the moment I'm interested in storage space. In the performance metrics page, if I choose to see the graph of the metric "CF: SSTable Size" for a certain CF of interest, two things are plotted on the graph: Total disk u

building a new email-like inbox service with cassandra

2011-11-17 Thread Dotan N.
Hi all, New to Cassandra, I'm about to embark on building a scalable user inbox service on top of Cassandra. I've done the preliminary googling and got some more info on bluerunner (IBM's project on the subject), and am now looking for more information on this specific topic. If anyone can point me t

Re: building a new email-like inbox service with cassandra

2011-11-17 Thread Edward Capriolo
On Thu, Nov 17, 2011 at 9:17 AM, Dotan N. wrote: > Hi all, > New to Cassandra, I'm about to embark on building a scalable user inbox > service on top of Cassandra. > I've done the preliminary googling and got some more info on bluerunner > (IBM's project on the subject), > and now looking for mor

Re: building a new email-like inbox service with cassandra

2011-11-17 Thread Dotan N.
I'm sorry if I misrepresented the domain. It is not inbox search, but an implementation of an inbox. That is: write messages for users, get messages for a user, get unread messages for a user, mark messages as read for a user. Basically, this is it. Although (and due to) the use cases are simple we're aim
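
The four operations listed in this message can be sketched in plain Python as an in-memory stand-in. This is only an illustration under stated assumptions: the class name `Inbox`, its method names, and the choice of time-based UUIDs as message ids are hypothetical and do not come from the thread or from any Cassandra client API. The layout loosely imitates one wide row of messages per user plus a separate set of unread ids.

```python
import uuid
from collections import defaultdict


class Inbox:
    """In-memory stand-in for a per-user inbox (hypothetical, not a Cassandra API)."""

    def __init__(self):
        # user -> {message_id: body}; imitates one wide row of messages per user
        self._messages = defaultdict(dict)
        # user -> set of unread message ids; imitates a separate "unread" row
        self._unread = defaultdict(set)

    def write_message(self, user, body):
        """Store a message for a user and flag it as unread."""
        message_id = uuid.uuid1()  # time-based id (an assumption for this sketch)
        self._messages[user][message_id] = body
        self._unread[user].add(message_id)
        return message_id

    def get_messages(self, user):
        """Return all (id, body) pairs for a user in insertion order."""
        return list(self._messages[user].items())

    def get_unread(self, user):
        """Return only the (id, body) pairs still flagged as unread."""
        unread = self._unread[user]
        return [(mid, body) for mid, body in self._messages[user].items() if mid in unread]

    def mark_read(self, user, message_id):
        """Clear the unread flag; marking an unknown id is a no-op."""
        self._unread[user].discard(message_id)
```

Keeping the unread ids apart from the message bodies means that marking a message as read touches only a small structure, which is one possible reason to separate the two in a real model.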

Re: building a new email-like inbox service with cassandra

2011-11-17 Thread Rustam Aliyev
Hi Dotan, We have already built something similar and were planning to open source it. It will be available under http://www.elasticinbox.com/. We haven't followed IBM's paper exactly; we believe our Cassandra model design is more robust. It's written in Java and provides LMTP and REST inter

Re: building a new email-like inbox service with cassandra

2011-11-17 Thread Norman Maurer
I would be very interested in this. I wrote a prototype for JAMES which uses Cassandra to store emails and provide them via IMAP and POP3, so it would be nice to see your impl. Thanks, Norman. On Thursday, 17 November 2011, Rustam Aliyev wrote: > Hi Dotan, > > We have already built something si

java lib used in cli to provide auto-completion

2011-11-17 Thread S Ahmed
Hi folks, I'm curious what java lib is used to provide auto-completion in the cli? Or is it all custom code?

Re: Quick DataStax OpsCenter question

2011-11-17 Thread Nick Bailey
Live Disk Space indicates all sstables that are currently valid. Total Disk Space will include sstables that have been compacted but not yet deleted (because a full GC hasn't run yet). Also the 1.3 release of OpsCenter includes an inline help section. You can find more specific information about p
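
As a rough illustration of the distinction described in this reply, the gap between the two metrics is the space still held by sstables that have been compacted but not yet deleted. The function name and the sample byte counts below are made up for the example; they are not OpsCenter output or part of any OpsCenter API.

```python
def obsolete_sstable_bytes(total_disk_bytes, live_disk_bytes):
    """Space held by sstables that were compacted but not yet deleted.

    Both arguments are assumed to be byte counts for the same column family.
    """
    if live_disk_bytes > total_disk_bytes:
        raise ValueError("live space cannot exceed total space")
    return total_disk_bytes - live_disk_bytes


# Hypothetical readings: 120 GB total, 90 GB live
GB = 1024 ** 3
reclaimable = obsolete_sstable_bytes(120 * GB, 90 * GB)
```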

RE: split large sstable

2011-11-17 Thread Dan Hendry
What do you mean by 'better file offset caching'? Presumably you mean 'better page cache hit rate'? Out of curiosity, why do you think this? What data are you seeing which makes you think it's better? I am certainly not even close to a virtual memory or page caching expert but I am pretty sure fil

Help with Pig Script

2011-11-17 Thread Aaron Griffith
I am trying to do the following with a Pig script and am having trouble finding the correct syntax. - I want to use the LOAD function to load a single key/value "row" into a pig object. - The contents of that row are then flattened into a list of keys. - I then want to use that list of keys for a

A Cassandra CLI question: null vs 0 rows

2011-11-17 Thread Maxim Potekhin
Hello everyone, I run a query on a secondary index. For some queries, I get 0 rows returned. In other cases, I just get a string that reads "null". What's going on? TIA Maxim

Re: Dropped request...

2011-11-17 Thread K.Tomita
Hello, Jeesoo. I was investigating exactly this. It means that when the RPC service does not return a status within a defined period of time, the value is incremented. (cassandra.yaml:rpc_timeout_in_ms) (default:1) Therefore, a drop suggests there is a possibility that the distribution node is compl

Re: A Cassandra CLI question: null vs 0 rows

2011-11-17 Thread Jonathan Ellis
If the CLI returns null it means there was an error -- run with --debug to check the exception. On Thu, Nov 17, 2011 at 11:20 AM, Maxim Potekhin wrote: > Hello everyone, > > I run a query on a secondary index. For some queries, I get 0 rows returned. > In other cases, > I just get a string that reads

Re: A Cassandra CLI question: null vs 0 rows

2011-11-17 Thread Maxim Potekhin
Thanks Jonathan. I get the error below. I don't have a clue as to what it means. null java.lang.RuntimeException at org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:310) at org.apache.cassandra.cli.CliMain.processStatement(CliMain.java:217) at org.

What sort of load do the tombstones create on the cluster?

2011-11-17 Thread Maxim Potekhin
In view of my unpleasant discovery last week that deletions in Cassandra lead to a very real and serious performance loss, I'm working on a strategy for moving forward. If the tombstones do cause such a problem, where should I be looking for performance bottlenecks? Is it disk, CPU or something el

Austin Hacker Dojo - Big Data Machine Learning

2011-11-17 Thread David Boney
I am interested in starting a hacker dojo in Austin for big data machine learning. We would meet one evening a week to work on coding up Hadoop-based machine learning and statistical analysis problems for big data systems. This would be a hacker dojo where the focus is on coding. I can teach and

Re: Help with Pig Script

2011-11-17 Thread Jeremy Hanna
If you are only interested in loading one row, why do you need to use Pig? Is it an extremely wide row? Unless you are using an ordered partitioner, you can't currently limit the rows you mapreduce over - you have to mapreduce over the whole column family. That will probably change in 1.1. H

Re: Help with Pig Script

2011-11-17 Thread Aaron Griffith
Jeremy Hanna writes: > > If you are only interested in loading one row, why do you need to use Pig? > Is it an extremely wide row? > > Unless you are using an ordered partitioner, you can't limit the rows you mapreduce over currently - you > have to mapreduce over the whole colum

Datastructure time tracking

2011-11-17 Thread RobinUs2
We're currently developing a system with a time tracking part. We need to store the following details: - user - time (in minutes) - description - billable - project - task ID What would be a proper data structure for this in Cassandra?

Data Model Design for Login Servie

2011-11-17 Thread Maciej Miklas
Hello all, I need your help to design a structure for a simple login service. It contains about 100,000,000 customers and each one can have about 10 different logins - this results in 1,000,000,000 different logins. Each customer contains the following data: - one to many login names as string, max 20 UTF-8

Re: Help with Pig Script

2011-11-17 Thread Jeremy Hanna
On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote: > Jeremy Hanna writes: > >> >> If you are only interested in loading one row, why do you need to use Pig? Is it an extremely wide row? >> >> Unless you are using an ordered partitioner, you can't limit the rows you mapre

RE: Data Model Design for Login Servie

2011-11-17 Thread Dan Hendry
Your first approach, skinny rows, will almost certainly be a better solution, although it never hurts to experiment for yourself. Even for low-end hardware (for the sake of argument, EC2 m1.smalls), a few million rows is basically nothing (again though, I encourage you to verify for yourself). For re

Varying number of rows coming from same query on same database

2011-11-17 Thread Maxim Potekhin
Hello, I'm running the same query repeatedly. It's a secondary index query, done from a Pycassa client. I see that when I iterate the "result" object, I get a slightly different number of entries when running the test serially. There are no deletions in the database, and no writes, it's static for n

ParNew and caching

2011-11-17 Thread Todd Burruss
I'm using Cassandra 1.0. Been doing some testing on using cass's cache. When I turn it on (using the CLI) I see ParNew jump from 3-4ms to 200-300ms. This really screws with response times, which jump from ~25-30ms to 1300+ms. I've increased new gen and that helps, but still this is surprising

Re: Data Model Design for Login Servie

2011-11-17 Thread Maxim Potekhin
1122: { gender: MALE birthdate: 1987.11.09 name: Alfred Tester pwd: e72c504dc16c8fcd2fe8c74bb492affa alias1: alfred.tes...@xyz.de alias2: alf...@aad.de alias3: a...@dd.de
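
The row layout shown in this message can be mirrored with plain Python dictionaries. Looking a customer up by alias then needs some mapping from alias to customer id; the sketch below builds one by hand. Everything beyond the column names visible in the post is an assumption made for illustration: the helper names are hypothetical, the alias values are placeholders, and a real deployment would keep such a mapping in Cassandra rather than in memory.

```python
# Customer rows keyed by customer id, following the layout in the post.
# Alias values are placeholders, not the addresses from the thread.
customers = {
    "1122": {
        "gender": "MALE",
        "birthdate": "1987.11.09",
        "name": "Alfred Tester",
        "alias1": "alias-one@example.org",
        "alias2": "alias-two@example.org",
    },
}


def build_alias_index(rows):
    """Map every alias value back to its customer id (a manual reverse index)."""
    index = {}
    for customer_id, columns in rows.items():
        for column_name, value in columns.items():
            if column_name.startswith("alias"):
                index[value] = customer_id
    return index


def find_customer(rows, index, alias):
    """Return the customer row for an alias, or None if the alias is unknown."""
    customer_id = index.get(alias)
    return None if customer_id is None else rows[customer_id]


alias_index = build_alias_index(customers)
```

With this shape, a login by alias is one lookup in the index followed by one read of the customer row.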

Re: building a new email-like inbox service with cassandra

2011-11-17 Thread Andrey V. Panov
I'm also interested in your project and will be glad to follow you on twitter if I can. On 18 November 2011 00:37, Rustam Aliyev wrote: > Hi Dotan, > > We have already built something similar and were planning to open source > it. It will be available under http://www.elasticinbox.com/. > > We

Re: A Cassandra CLI question: null vs 0 rows

2011-11-17 Thread Maxim Potekhin
Should I file a ticket? I consistently see this behavior after a mass delete. On 11/17/2011 12:46 PM, Maxim Potekhin wrote: Thanks Jonathan. I get the error below. I don't have a clue as to what it means. null java.lang.RuntimeException at org.apache.cassandra.cli.CliClient.executeCL

Re: Dropped request...

2011-11-17 Thread Jeesoo Shin
Thank you for your reply, Tomita. So the node may have failed to respond in time, but other nodes might have completed its request. (in QUORUM) This reassures me that it's not a definite failure. ;-) However, I do not understand what you meant by checking data. Checking data of dropped request...

Re: mmap I/O and shared memory

2011-11-17 Thread Tyler Hobbs
This FAQ entry and the linked document provide a pretty good explanation: http://wiki.apache.org/cassandra/FAQ#mmap By the way, you should almost always turn off swap. On Thu, Nov 17, 2011 at 1:16 AM, Jaesung Lee wrote: > I am running a 7-node Cassandra (v1.0.2) cluster. > I am putting 20K rows

Re: java lib used in cli to provide auto-completion

2011-11-17 Thread Tyler Hobbs
JLine. You can see the usage here: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cli/CliCompleter.java On Thu, Nov 17, 2011 at 10:21 AM, S Ahmed wrote: > Hi folks, > > I'm curious what java lib is used to provide auto-completion in the cli? > Or is it all custom

Re: Datastructure time tracking

2011-11-17 Thread Tyler Hobbs
On Thu, Nov 17, 2011 at 2:36 PM, RobinUs2 wrote: > We're currently developing a system with a time tracking part. We need to > store following details: > - user > - time (in minutes) > - description > - billable > - project > - task ID > > What would be a proper data structure for this in Cassand

Re: Data Model Design for Login Servie

2011-11-17 Thread Maciej Miklas
But a secondary index is limited to repeating values like enums. In my case I would have a performance issue, right? On 18.11.2011, at 02:08, Maxim Potekhin wrote: 1122: { gender: MALE birthdate: 1987.11.09 name: Alfred Tester pwd: e72c504dc16c8fcd2fe8c7