Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Mick Semb Wever
On Fri, 2011-09-02 at 08:20 +0200, Patrik Modesto wrote: > As Jonathan > already explained himself: "ignoring unavailable ranges is a > misfeature, imo" Generally it's not what one would want, I think. But I can see the case when data is to be treated as volatile and ignoring unavailable ranges may b

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Patrik Modesto
Hi, On Thu, Sep 1, 2011 at 12:36, Mck wrote: >> It's available here: http://pastebin.com/hhrr8m9P (for version 0.7.8) > > I'm interested in this patch and see its usefulness but no one will act > until you attach it to an issue. (I think a new issue is appropriate > here). I'm glad someone is i

Limiting ColumnSlice range in second composite value

2011-09-01 Thread Anthony Ikeda
My Column name is of Composite(TimeUUIDType, UTF8Type) and I can query across the TimeUUIDs correctly, but now I want to also range across the UTF8 component. Is this possible? UUID start = uuidForDate(new Date(1979, 1, 1)); UUID end = uuidForDate(new Date(Long.MAX_VALUE)); String startState = "

Re: Fun with Heap Dump ...

2011-09-01 Thread Ian Danforth
Awesome, thanks for the quick response! Ian On Thu, Sep 1, 2011 at 5:27 PM, Jonathan Ellis wrote: > On Thu, Sep 1, 2011 at 6:54 PM, Ian Danforth wrote: >> 1. What operation is C* performing during lines like these: >> >>  INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643 >>

Re: Fun with Heap Dump ...

2011-09-01 Thread Jonathan Ellis
On Thu, Sep 1, 2011 at 6:54 PM, Ian Danforth wrote: > 1. What operation is C* performing during lines like these: > >  INFO 22:38:34,710 Opening /cassandra/data/Keyspace1/TwitterTest-g-5643 > > (I think this is an SSTable it's extracting an index for this column > family from) Right. > 2. Has my

Fun with Heap Dump ...

2011-09-01 Thread Ian Danforth
All, I need help interpreting the results of my investigation. I'm encountering this error: "Unable to reduce heap usage since there are no dirty column families". My heap sits near max and occasionally OOMs. (4GB heap) Following Mr. Ellis's instructions here: http://cassandra-user-incubator-apa

Re: Updates lost

2011-09-01 Thread Paul Loy
Well, on Windows Vista and below (haven't checked on 7), System.currentTimeMillis only has around 10ms granularity. That is, for any 10ms period, you get the same value. I develop on Windows and I'd get sporadic integration test failures due to this. On Thu, Sep 1, 2011 at 8:31 PM, Jeremiah Jordan

RE: Removal of old data files

2011-09-01 Thread hiroyuki.watanabe
Yes, I see files with names like Orders-g-6517-Compacted. However, all of those files have a size of 0. From Monday to Thursday we have 5642 files for -Data.db, -Filter.db and Statistics.db and only 128 -Compacted files, and all of the -Compacted files have a size of 0. Is this normal, or we

Re: Bulk loader: Got an unknow host from describe_ring

2011-09-01 Thread Jonathan Ellis
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044, fixed for 0.8.5 On Thu, Sep 1, 2011 at 4:27 PM, Christopher Bottaro wrote: > Hello, > I'm trying to import data from one Cassandra cluster to another.  The old > cluster is using ports 7000 and 9160 and the new cluster is using 700

Bulk loader: Got an unknow host from describe_ring

2011-09-01 Thread Christopher Bottaro
Hello, I'm trying to import data from one Cassandra cluster to another. The old cluster is using ports 7000 and 9160 and the new cluster is using 7001 and 9161. I ran "nodetool -h localhost snapshot" on a node on the old cluster. I then downloaded apache-cassandra-0.8.4-bin.tar.gz, edited conf/

Re: Replicate On Write behavior

2011-09-01 Thread Yang
Sorry, I meant cf * row. If you look in the code, db.cf is basically just a set of columns. On Sep 1, 2011 1:36 PM, "Ian Danforth" wrote: > I'm not sure I understand the scalability of this approach. A given > column family can be HUGE with millions of rows and columns. In my > cluster I have a sin

Re: Replicate On Write behavior

2011-09-01 Thread Konstantin Naryshkin
Yeah, I believe that Yang has a typo in his post. A CF is not read in one go, a row is. As for the scalability of having all the columns being read at once, I do not believe that it was ever meant to be. All the columns in a row are stored together, on the same set of machines. This means that if

Re: Replicate On Write behavior

2011-09-01 Thread Ian Danforth
I'm not sure I understand the scalability of this approach. A given column family can be HUGE with millions of rows and columns. In my cluster I have a single column family that accounts for 90GB of load on each node. Not only that, but the column family is distributed over the entire ring. Clearly I'm

Re: Replicate On Write behavior

2011-09-01 Thread Yang
When Cassandra reads, the entire CF is always read together; only at the hand-over to the client does the pruning happen. On Thu, Sep 1, 2011 at 11:52 AM, David Hawthorne wrote: > I'm curious... digging through the source, it looks like replicate on write > triggers a read of the entire row, and not

Re: Updates lost

2011-09-01 Thread Jeremiah Jordan
Are you running on Windows? If the default timestamp is just using time.time()*1e6 you will get the same timestamp twice if the calls are close together. time.time() on Windows is only millisecond resolution. I don't use pycassa, but in the Thrift API wrapper I created for our Python code I im
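
One way to avoid handing out the same timestamp twice when the clock is coarse can be sketched in Python as below. This is only an illustration under stated assumptions: the class name MonotonicTimestamp and its next() method are hypothetical and are not part of pycassa or the Thrift API. The idea is to remember the last value returned and step forward by one microsecond whenever the clock has not advanced.

```python
import threading
import time


class MonotonicTimestamp:
    """Hands out strictly increasing microsecond timestamps (hypothetical helper)."""

    def __init__(self, clock=time.time):
        self._clock = clock            # injectable, so a coarse clock can be simulated
        self._last = 0                 # last timestamp returned
        self._lock = threading.Lock()  # keeps next() safe across threads

    def next(self):
        with self._lock:
            now = int(self._clock() * 1e6)
            # If the clock has not advanced since the previous call,
            # step forward by one microsecond instead of reusing the value.
            if now <= self._last:
                now = self._last + 1
            self._last = now
            return now
```

Two writes issued within the same clock tick then still get distinct, ordered timestamps, at the cost of drifting slightly ahead of wall-clock time under a burst of calls.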

Replicate On Write behavior

2011-09-01 Thread David Hawthorne
I'm curious... digging through the source, it looks like replicate on write triggers a read of the entire row, and not just the columns/supercolumns that are affected by the counter update. Is this the case? It would certainly explain why my inserts/sec decay over time and why the average inse

Re: hw requirements

2011-09-01 Thread Maxim Potekhin
Sorry about the unclear naming scheme. I meant that if I want to index on a few columns simultaneously, I create a new column with the concatenated values of these. On 8/31/2011 3:10 PM, Anthony Ikeda wrote: Sorry to fork this topic, but in "composite indexes" do you mean as strings or as "Composite()". I
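
The concatenation approach described above might look roughly like this in Python. It is a sketch only: the function name composite_index_value, the ":" separator, and the column names in the usage note are assumptions made for illustration, not anything from the original thread.

```python
def composite_index_value(row, columns, sep=":"):
    """Join the values of the chosen columns into one indexable string.

    `row` is a plain dict of column name -> value (hypothetical layout).
    The separator must not occur inside the values themselves, otherwise
    two different combinations could produce the same index value.
    """
    return sep.join(str(row[name]) for name in columns)
```

For example, with a row {"state": "TX", "city": "Austin"}, calling composite_index_value(row, ["state", "city"]) yields the single string "TX:Austin", which can be stored in an extra column and indexed like any other value.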

Re: cassandra-cli describe / dump command

2011-09-01 Thread Jonathan Ellis
yes, cli "show schema" in 0.8.4+ On Thu, Sep 1, 2011 at 12:52 PM, J T wrote: > Hi, > > I'm probably being blind .. but I can't see any way to dump the schema > definition (and the data in it for that matter)  of a cluster in order to > capture the current schema in a script file for subsequent re

cassandra-cli describe / dump command

2011-09-01 Thread J T
Hi, I'm probably being blind .. but I can't see any way to dump the schema definition (and the data in it for that matter) of a cluster in order to capture the current schema in a script file for subsequent replaying into a different environment. For example, say I have a DEV env and wanted to

Re: Trying to understand QUORUM and Strategies

2011-09-01 Thread Anthony Ikeda
Thanks Evneniy, We encountered this exception with the following settings: Caused by: InvalidRequestException(why:consistency level LOCAL_QUORUM not compatible with replication strategy (org.apache.cassandra.locator.SimpleStrategy)) at org.apache.cassandra.t
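
As background for the exception above, here is a small Python sketch of how quorum sizes are usually computed. The function names and the data center names are hypothetical. QUORUM needs a majority of all replicas, while LOCAL_QUORUM needs a majority of the replicas in the coordinator's own data center, so it requires a replication strategy that defines per-data-center replica counts (such as NetworkTopologyStrategy); SimpleStrategy has no such notion.

```python
def quorum(replication_factor):
    """Replicas that must respond for QUORUM: a strict majority."""
    return replication_factor // 2 + 1


def local_quorum(replicas_per_dc, local_dc):
    """Majority of the replicas in the local data center only.

    `replicas_per_dc` maps data center name -> replica count, which is
    the kind of information a topology-aware strategy provides.
    """
    return quorum(replicas_per_dc[local_dc])
```

With RF=3, quorum(3) is 2, so one replica can be down without failing QUORUM reads or writes; with per-DC counts {"DC1": 3, "DC2": 2}, local_quorum for "DC2" is also 2, meaning both local replicas must respond.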

Re: 15 seconds to increment 17k keys?

2011-09-01 Thread Ian Danforth
Does this scale with multiples of the replication factor or directly with number of nodes? Or more succinctly, to double the writes per second into the cluster how many more nodes would I need? (Thanks for the note on pycassa, I've checked and it's not the limiting factor) Ian On Thu, Sep 1, 2011

Re: java.io.IOException: Could not get input splits

2011-09-01 Thread Jian Fang
Thanks. How soon will 0.8.5 be out? Is there any 0.8.5 snapshot version available? On Thu, Sep 1, 2011 at 11:57 AM, Jonathan Ellis wrote: > Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044, > fixed for 0.8.5 > > On Thu, Sep 1, 2011 at 10:54 AM, Jian Fang > wrote: > > Hi, > > > >

Re: java.io.IOException: Could not get input splits

2011-09-01 Thread Jonathan Ellis
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3044, fixed for 0.8.5 On Thu, Sep 1, 2011 at 10:54 AM, Jian Fang wrote: > Hi, > > I upgraded Cassandra from 0.8.2 to 0.8.4 and run a hadoop job to read data > from Cassandra, but > got the following errors: > > 11/09/01 11:42:46 INFO had

Re: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Eric Evans
On Thu, Sep 1, 2011 at 10:13 AM, Eric Czech wrote: > I got it here : https://nodeload.github.com/twall/jna/tarball/master > Is there some other version or distribution of jna that I should be using? >  The version I have is 3.3.0. As Dan mentions in another email, if you can install it from an RP

RE: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Dan Hendry
Have you installed 'jna'? On RHEL (6 at least) it should be possible using the default yum repos. You need the native code and the JAR in Cassandra's classpath from what I understand. Dan From: eczec...@gmail.com [mailto:eczec...@gmail.com] On Behalf Of Eric Czech Sent: September-01-11 11:13

Re: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Eric Czech
I got it here : https://nodeload.github.com/twall/jna/tarball/master Is there some other version or distribution of jna that I should be using? The version I have is 3.3.0. On Thu, Sep 1, 2011 at 8:49 AM, Eric Evans wrote: > On Wed, Aug 31, 2011 at 11:38 PM, Eric Czech > wrote: > > I'm runnin

Re: Unable to link C library (for jna.jar) on 0.7.5

2011-09-01 Thread Eric Evans
On Wed, Aug 31, 2011 at 11:38 PM, Eric Czech wrote: > I'm running cassandra 0.7.5 on about 20 RHEL 5 (24 GB RAM) machines and I'm > having issues with snapshots, json sstable conversions, and various nodetool > commands due to memory errors and the lack of the native access C libraries. >  I tried

Fwd: Column index limitations: total number of indexes per row? OOPS :/

2011-09-01 Thread Renato Bacelar da Silveira
HAHA, finger trouble on the line below --- So I will be looking at 6000 x 2.5 million indexes. Yep, that's 6,250,000 indexes. * The product is actually meant to be 15,000,000,000, so that's 15 billion indexes - sho!* Apologies :) Original Message Subject: Column index lim

Column index limitations: total number of indexes per row?

2011-09-01 Thread Renato Bacelar da Silveira
Hi All, I have indexed a number of columns in a row, i.e. 25 columns, to perform Indexed_slice queries. If I am not mistaken, there is some limit to the number of indexes one may create per row/keyspace? I am trying to get up to 6000 columns indexed, per row, in 2.5 million rows. So I will be

Using Cassandra as backend for an event queue?

2011-09-01 Thread Martin Lansler
Hi, I'm considering using Cassandra as backend for implementing a distributed event queue, probably via an established framework such as ActiveMQ, RabbitMQ, Spring Integration etc. I need a solution which can handle both high throughput, short-lived events such as an outgoing email box, as well a

Re: RF=1 w/ hadoop jobs

2011-09-01 Thread Mck
On Thu, 2011-08-18 at 08:54 +0200, Patrik Modesto wrote: > But there is another problem with Hadoop-Cassandra: if there is no > node available for a range of keys, it fails on RuntimeError. For > example having a keyspace with RF=1 and a node is down all MapReduce > tasks fail. CASSANDRA-2388

Re: 15 seconds to increment 17k keys?

2011-09-01 Thread Richard Low
Assuming you have replicate_on_write enabled (which you almost certainly do for counters), you have to do a read on a write for each increment. This means counter increments, even if all your data set fits in cache, are significantly slower than normal column inserts. I would say ~1k increments p