BloomFilterFalsePositives equals 1.0

2011-06-21 Thread Preston Chang
Hi all: I have a problem with the bloom filter. When I ran a test that tried to get some nonexistent keys, it seemed that the bloom filter did not work. The 'BloomFilterFalseRatio' was 1.0, the 'BloomFilterFalsePositives' count kept rising, and disk I/O utilization reached 100% according to 'iostat'.
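One possible reading of these numbers, sketched in Python. It assumes the false ratio is computed as false positives divided by all bloom filter positives; that definition is an assumption and is not confirmed by the message. The function name `bloom_false_ratio` is hypothetical.

```python
def bloom_false_ratio(false_positives: int, true_positives: int) -> float:
    """Share of bloom filter positives that turned out to be false.

    Assumed definition (not confirmed by the thread):
    false_positives / (false_positives + true_positives).
    """
    total = false_positives + true_positives
    if total == 0:
        return 0.0  # no positives recorded yet
    return false_positives / total


# A test that requests only nonexistent keys can never record a true
# positive, so a single false positive already gives a ratio of 1.0.
only_missing_keys = bloom_false_ratio(false_positives=37, true_positives=0)

# With mostly existing keys the same metric stays small.
mixed_workload = bloom_false_ratio(false_positives=1, true_positives=99)
```

Under that assumption, a ratio of 1.0 would be expected for a test made up entirely of missing keys, and the rising false-positive count would be the more informative figure.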

Re: Storing Accounting Data

2011-06-21 Thread AJ
On 6/21/2011 3:36 PM, Stephen Connolly wrote: writes are not atomic. the first side can succeed at quorum, and the second side can fail completely... you'll know it failed, but now what... you retry, still failed... erh I'll store it somewhere and retry it later... where do I store it? the

Re: Storing Accounting Data

2011-06-21 Thread AJ
On 6/21/2011 3:14 PM, Anand Somani wrote: Not sure if it is that simple, a quorum can fail with writes happening on some nodes (there is no rollback). Also there is no concept of atomic compare-and-swap. Good points. I suppose what I need is for the client to implement the part of ACID tha

Re: Storing Accounting Data

2011-06-21 Thread AJ
And I was thinking of using JTA for transaction processing. I have no experience with it but on the surface it looks like it should work. On 6/21/2011 3:31 PM, AJ wrote: What's the best accepted way to handle that 100% in the client? Retries? On 6/21/2011 3:14 PM, Anand Somani wrote: Not sur

Re: Storing Accounting Data

2011-06-21 Thread Stephen Connolly
writes are not atomic. the first side can succeed at quorum, and the second side can fail completely... you'll know it failed, but now what... you retry, still failed... erh I'll store it somewhere and retry it later... where do I store it? the consistency level is about tuning whether reads and

Re: Storing Accounting Data

2011-06-21 Thread AJ
What's the best accepted way to handle that 100% in the client? Retries? On 6/21/2011 3:14 PM, Anand Somani wrote: Not sure if it is that simple, a quorum can fail with writes happening on some nodes (there is no rollback). Also there is no concept of atomic compare-and-swap. On Tue, Jun 21,

Re: Storing Accounting Data

2011-06-21 Thread Anand Somani
Not sure if it is that simple, a quorum can fail with writes happening on some nodes (there is no rollback). Also there is no concept of atomic compare-and-swap. On Tue, Jun 21, 2011 at 2:03 PM, AJ wrote: > ** > On 6/21/2011 2:50 PM, Stephen Connolly wrote: > > how important are things like tran

Re: Storing Accounting Data

2011-06-21 Thread AJ
On 6/21/2011 2:50 PM, Stephen Connolly wrote: how important are things like transactional consistency for you? would you have issues if only one side of a transfer was recorded? Right. Both of those questions are about consistency. Isn't the simple solution to use QUORUM reads/writes?
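For reference, a small Python sketch of the quorum arithmetic behind that question. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest successful write when the read and write replica counts together exceed the replication factor. The function names are illustrative only. As the replies in this thread point out, this covers read/write consistency; it does not make a two-sided transfer atomic.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int,
                           write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must intersect every successful write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)

# QUORUM reads plus QUORUM writes always overlap on at least one replica.
quorum_both = reads_see_latest_write(q, q, rf)

# Reading and writing at a single replica gives no such guarantee.
one_and_one = reads_see_latest_write(1, 1, rf)
```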

Re: Storing Accounting Data

2011-06-21 Thread Stephen Connolly
how important are things like transactional consistency for you? would you have issues if only one side of a transfer was recorded? cassandra, out of the box, on its own, would not be ideal if the above two things are important for you. you can add components to a system to help address these t

Re: solandra or pig or....?

2011-06-21 Thread Jake Luciani
Right, Solr will not do anything other than basic aggregations (facets) and range queries. On Tue, Jun 21, 2011 at 3:16 PM, Dan Kuebrich wrote: > Solandra is indeed distributed search, not distributed number-crunching. > As a previous poster said, you could imagine structuring the data in a > s

Re: Compressing data types

2011-06-21 Thread aaron morton
Also https://issues.apache.org/jira/browse/HADOOP-7206 Now part of brisk http://www.datastax.com/dev/blog/brisk-1-0-beta-2-released Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22 Jun 2011, at 04:04, Vijay wrote: > You might w

Re: solandra or pig or....?

2011-06-21 Thread Dan Kuebrich
Solandra is indeed distributed search, not distributed number-crunching. As a previous poster said, you could imagine structuring the data in a series of documents with fields containing playername, teamname, position, location, day, time, inning, at bat, outcome, etc. Then you could query to get

Re: CommitLog replay

2011-06-21 Thread aaron morton
use nodetool cfstats or show keyspaces; in cassandra-cli to see the flush settings, default is (I think) 60 minutes, 0.1 million "ops" or 1/16th of the heap size when the CF was created. But under 0.8 there is an automagical global memory manager, see https://github.com/apache/cassandra/blob/cas

Re: solandra or pig or....?

2011-06-21 Thread Victor K.
If I may ask, Sasha, what exactly are you trying to achieve using Solr (or Solandra, I guess it's about the same)? Because from what I understood of your problem you need to do statistics on your matches, players etc... Or do you just want to retrieve information that has already been computed?

Re: solandra or pig or....?

2011-06-21 Thread Jake Luciani
Your application isn't aware of Cassandra only Solr. The idea of Solandra is to use Cassandra as a backend for Solr. Solr has a distributed search mechanism already so by making Solr Cassandra aware it can auto-shard and manage distributed queries for you, with replication and failover etc As for

Re: solandra or pig or....?

2011-06-21 Thread Sasha Dolgy
Without getting overly complicated and long winded ... are there practical references / examples I can review that demonstrate the cassandra/solandra benefits? I had a quick look at https://github.com/tjake/Solandra/wiki/Solandra-Wiki and it wasn't dead obvious to me On Tue, Jun 21, 2011 at

Re: solandra or pig or....?

2011-06-21 Thread Jeremy Hanna
Just wanted to mention that there is also a #solandra irc channel on freenode in case people are interested. On Jun 21, 2011, at 1:26 PM, Mark Kerzner wrote: > Me too! > > I would be interested to know how such queries are done in Solandra. I would > understand it if it creates a complete Luce

Re: solandra or pig or....?

2011-06-21 Thread Mark Kerzner
Me too! I would be interested to know how such queries are done in Solandra. I would understand it if it creates a complete Lucene index of everything that's in Cassandra, and adds the text search. Then your query goes against Lucene. But if some data is found in column families in Cassandra, and

Re: solandra or pig or....?

2011-06-21 Thread Jake Luciani
Solandra can answer the question you used as an example and it's more of a fit for low-latency ad-hoc reporting than Pig. Pig queries will take minutes not seconds. On Tue, Jun 21, 2011 at 12:12 PM, Sasha Dolgy wrote: > Folks, > > Simple question ... Assuming my current use case is the ability

Storing Accounting Data

2011-06-21 Thread AJ
Is C* suitable for storing customer account (financial) data, as well as billing, payroll, etc? This is a new company so migration is not an issue... starting from scratch. Thanks!

Re: OOM during restart

2011-06-21 Thread Jonathan Ellis
If you're OOMing on restart you WILL OOM during normal usage given heavy enough write load. Definitely adjust memtable thresholds down or, as Dominic suggests, upgrade to 0.8. On Tue, Jun 21, 2011 at 12:02 PM, Dominic Williams wrote: > Hi gabe, > What you need to do is the following: > 1. Adjust

Re: solandra or pig or....?

2011-06-21 Thread Victor Kabdebon
I can speak for what I know: I have taken only a quick look at Pig, and maybe some guys from Twitter can answer better than me on that particular program. Pig is not for "on demand" queries: they are quite slow and as you said you extract relevant information and append it to another CF where you can

Re: OOM during restart

2011-06-21 Thread Dominic Williams
Hi gabe, What you need to do is the following: 1. Adjust cassandra.yaml so when this node is starting up it is not contacted by other nodes e.g. set thrift to 9061 and storage to 7001 2. Copy your commit logs to tmp sub-folder e.g. commitLog/tmp 3. Copy a small number of commit logs back into m

Re: Keys-only query

2011-06-21 Thread Jeremy Hanna
Also - there is an open ticket to create a .NET CQL driver - may be worth watching or if you'd like to help out with it somehow: https://issues.apache.org/jira/browse/CASSANDRA-2634 On Jun 21, 2011, at 9:31 AM, Stephen Pope wrote: > We just recently switched to 0.8 (from 0.7.4), and it looks lik

Re: Keys-only query

2011-06-21 Thread Nate McCall
This is a known issue and is being tracked on the following: https://issues.apache.org/jira/browse/CASSANDRA-2653 On Tue, Jun 21, 2011 at 9:31 AM, Stephen Pope wrote: > We just recently switched to 0.8 (from 0.7.4), and it looks like key-only > queries are broken (number of columns = 0). The same

Re: Problem with PropertyFileSnitch in Amazon EC2

2011-06-21 Thread Joaquin Casares
Could you verify any security settings that may come into play with Elastic IPs? You should make sure the appropriate ports are open. See: http://www.datastax.com/docs/0.8/brisk/install_brisk_ami for a list of ports in the first chart. Joaquin Casares DataStax Software Engineer/Support On Mon,

solandra or pig or....?

2011-06-21 Thread Sasha Dolgy
Folks, Simple question ... Assuming my current use case is the ability to log lots of trivial and seemingly useless sports statistics ... I want a user to be able to query / compare For example: --> Show me all baseball players in cheektowaga and ontario, california who have hit a grandslam

Re: Compressing data types

2011-06-21 Thread Vijay
You might want to watch https://issues.apache.org/jira/browse/CASSANDRA-47 Regards, On Tue, Jun 21, 2011 at 5:14 AM, Timo Nentwig wrote: > Hi! > > Just wondering why this doesn't already exist: wouldn't it make sense to > have > decorating data types that compress (gzip, snappy) other data ty

Re: pig integration & NoClassDefFoundError TypeParser

2011-06-21 Thread Sasha Dolgy
bang on ... no idea why ... a new day a fresh login ... environment variables gone. working now with cassandra 0.8.0 and pig 0.8.1 went through all my steps and all is working ... except line 45 in the bin/pig_cassandra is not proper when there are multiple pig*.jar files. On Mon, Jun 20, 2011 a

Keys-only query

2011-06-21 Thread Stephen Pope
We just recently switched to 0.8 (from 0.7.4), and it looks like key-only queries are broken (number of columns = 0). The same query works if we switch the number of columns to 1. Is there a new mechanism for getting key-only? We can't use CQL yet since we're using .NET for our development. Che

RE: CommitLog replay

2011-06-21 Thread Stephen Pope
I've only got one cf, and haven't changed the default flush expiry period. I'm not sure the node had fully started or not. I had to restart my data insertion (for other reasons), so I can check the system log upon restart when the data is finished inserting. Do you know off-hand how long the

Re: CommitLog replay

2011-06-21 Thread Peter Schuller
> I’ve got a single node deployment of 0.8 set up on my windows box. When I > insert a bunch of data into it, the commitlogs directory doesn’t clear upon > completion (should it?). It is expected that commit logs are retained for a while, and that there is reply going on when restarting a node. Th

CommitLog replay

2011-06-21 Thread Stephen Pope
Hi there. This is my first message to the mailing list, so let me know if I'm doing it wrong. :) I've got a single node deployment of 0.8 set up on my windows box. When I insert a bunch of data into it, the commitlogs directory doesn't clear upon completion (should it?). As a result, when I sto

RE: Cassandra Clients for Java

2011-06-21 Thread Vivek Mishra
Hi Daniel, Just saw your email regarding kundera download. Kundera snapshot jar is available at: http://kundera.googlecode.com/svn/maven2/maven-missing-resources/com/impetus/kundera/1.1.1-SNAPSHOT/ In addition, If you want to download source code then it is at: https://github.com/impetus-openso

Compressing data types

2011-06-21 Thread Timo Nentwig
Hi! Just wondering why this doesn't already exist: wouldn't it make sense to have decorating data types that compress (gzip, snappy) other data types (esp. UTF8Type, AsciiType) transparently? -tcn

Re: Flushing behavior in Cassandra 0.8

2011-06-21 Thread aaron morton
The new memtable_total_space_in_mb option is kicking in https://github.com/apache/cassandra/blob/cassandra-0.8.0/NEWS.txt#L34 http://thelastpickle.com/2011/05/04/How-are-Memtables-measured/ Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle

Re: Create columnFamily

2011-06-21 Thread aaron morton
You've set a comparator for the super column names, but not the sub columns. e.g. [default@dev] set data['31']['address']['city']='noida'; org.apache.cassandra.db.marshal.MarshalException: cannot parse 'city' as hex bytes [default@dev] set data['31']['address'][utf8('city')]='noida'; Value

RE: Cassandra.yaml

2011-06-21 Thread Vivek Mishra
Thanks Aaron. It is really a great pointer to a solution. -Vivek From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Monday, June 20, 2011 12:51 AM To: user@cassandra.apache.org Subject: Re: Cassandra.yaml The change to remove the calls to DatabaseDescriptor were in this commit on the 0

Re: OOM during restart

2011-06-21 Thread aaron morton
AFAIK the node will not announce itself in the ring until the log replay is complete, so it will not get the schema update until after log replay. If possible I'd avoid making the schema change until you have solved this problem. My theory on OOM during log replay is that the high speed inserts

Re: port 8080

2011-06-21 Thread Sasha Dolgy
Personally speaking, I do not run JMX on 8080, and never have. The tools, like cassandra-cli and nodetool expect it to be on the default port, but you can override with -p or -jmxport -sd On Tue, Jun 21, 2011 at 1:33 PM, osishkin osishkin wrote: > I did, and everything seemed to work fine. > Bu

Re: port 8080

2011-06-21 Thread osishkin osishkin
I did, and everything seemed to work fine. But I saw a reference here http://www.onemanclapping.org/2010/03/running-multiple-cassandra-nodes-on.html That said "make sure you have at least one node listening on 8080 since all the Cassandra tools assume JMX is listening there", and then remembered th

Re: port 8080

2011-06-21 Thread Sasha Dolgy
it's defined in $CASSANDRA_HOME/conf/cassandra-env.sh JMX_PORT= Have it different for each instance ... On Tue, Jun 21, 2011 at 1:24 PM, osishkin osishkin wrote: > I want to have several daemons running on a machine, each belonging to > a multi-node cluster. > Is that a problem in concern to po

Re: Secondary indexes performance

2011-06-21 Thread aaron morton
Can you provide some more information on the query you are running? How many terms are you selecting with? How long does it take to return 1024 rows? IMHO that's a reasonably big slice to get. The server will pick the most selective equality predicate, and then filter the results from that

port 8080

2011-06-21 Thread osishkin osishkin
I want to have several daemons running on a machine, each belonging to a multi-node cluster. Is that a problem with respect to port 8080, for JMX monitoring? Is it hardcoded somewhere, so that changing it in the configuration files is not enough? Thank you osi

Flushing behavior in Cassandra 0.8

2011-06-21 Thread Rene Kochen
I am trying to understand the flushing behavior in Cassandra 0.8. When I create rows, after a few seconds, I see the following line in the log: INFO 11:18:46,470 flushing high-traffic column family ColumnFamilyStore(table='Traxis', columnFamily='Customers') INFO 11:18:46,471 Enqueuing flush of Memtabl

RE: issue with querying SuperColumn

2011-06-21 Thread Vivek Mishra
Thanks Richard. You are right. I missed that in key validation class. -Original Message- From: Richard Low [mailto:r...@acunu.com] Sent: Tuesday, June 21, 2011 12:44 PM To: user@cassandra.apache.org Subject: Re: issue with querying SuperColumn You have key validation class UTF8Type for th

Re: issue with querying SuperColumn

2011-06-21 Thread Richard Low
You have key validation class UTF8Type for the standard CF, but BytesType for the super. This is why the key is "1" for standard, but printed as "31" for super, which is the hex ascii code for 1. In your java code, use "1".getBytes() as your key and it should work. Richard. -- Richard Low Acun
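The relationship between the key "1" and its printed form "31" can be checked with a short Python sketch. It is an illustration of the byte encoding only, not of any Cassandra client API. The UTF-8 encoding here stands in for Java's `"1".getBytes()`, which uses the platform default charset, so treating the two as equivalent is an assumption.

```python
# The key as the application sees it.
key = "1"

# Raw bytes of the key; roughly what "1".getBytes() yields in Java
# when the default charset is ASCII-compatible (an assumption).
key_bytes = key.encode("utf-8")

# A BytesType key is displayed as hex, so the single byte 0x31 prints as "31".
hex_form = key_bytes.hex()

# Decoding the hex form gives the original key back.
round_trip = bytes.fromhex(hex_form).decode("utf-8")
```

So the same row key appears as "1" under a UTF8Type key validation class and as "31" under BytesType.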

Create columnFamily

2011-06-21 Thread Vivek Mishra
I understand that I might be missing something on my end. But somehow I cannot get this working using Cassandra-cli: [default@key1] create column family supusers with comparator=UTF8Type and default_validation_class=UTF8Type and key_validation_class=UTF8Type and column_type=Super; 59e2e950-9bd