Re: Load balancing

2010-06-18 Thread Oleg Anastasjev
Mubarak Seyed  apple.com> writes:

> 
> - How does a client (application) connect to the Cassandra cluster? Does it
> always connect to one node (and Thrift can get ring info) and send requests
> to the connected node?

This depends on the client library you use. Any Cassandra node can accept client
connections and will forward requests to the node owning the requested data.

> - If we send 300k records from each node, is it overkill for the node which
> accepts the client connections? Does that node get choked?

Of course in your situation no single node can handle all the load, so you have
to connect to several nodes.
The best way, I believe, is to connect directly to the node owning the data you
need. Take a look at org/apache/cassandra/client/RingCache.java for an example
of how to read the ring state and forward requests to the right node.
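
A minimal sketch of that owner-aware routing (my illustration, not RingCache
itself), assuming the 0.6 Thrift describe_ring call and RandomPartitioner-style
md5 tokens; opening a connection to the returned endpoint is left out:

import java.math.BigInteger;
import java.security.MessageDigest;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.TokenRange;

public class RingRouter {
    // returns the address of the first replica owning the given key
    public static String ownerOf(Cassandra.Client client, String keyspace, String key)
            throws Exception {
        // md5-based token, sign-stripped, roughly as RandomPartitioner computes it
        BigInteger token = new BigInteger(
                MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8"))).abs();
        for (TokenRange r : client.describe_ring(keyspace)) {
            BigInteger start = new BigInteger(r.start_token);
            BigInteger end = new BigInteger(r.end_token);
            boolean wraps = start.compareTo(end) >= 0; // the (start, end] range wraps the ring
            boolean owns = wraps
                    ? token.compareTo(start) > 0 || token.compareTo(end) <= 0
                    : token.compareTo(start) > 0 && token.compareTo(end) <= 0;
            if (owns)
                return r.endpoints.get(0);
        }
        throw new IllegalStateException("no range found for token " + token);
    }
}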

> - How do we design a Cassandra cluster to make sure that inserts get
> distributed to more than one node?
> - If I prefer OrderPreservingPartitioner as the partitioner, how does a single
> node handle all the 200k records?

If you prefer OPP, you have 2 ways (manual and automatic):
1. If you know the distribution of keys in your data, you distribute token
values between your nodes in a way which ensures uniform key distribution.
Imagine you have single-byte keys ranging from 0 to 255 and 64 nodes (I assume
data is distributed uniformly across all keys for simplicity). For this you'll
have to manually configure <InitialToken> in the storage-conf of the 1st node
to 0, the 2nd to 4, the 3rd to 8, the 4th to 12 and so on (a small sketch of
this calculation follows below).
2. The automatic way is to start the cassandra cluster with a small node count,
import the data into it, and bootstrap the rest of the nodes, specifying
bootstrap=true and an empty value for the token in storage-conf. This way
cassandra will try to balance the data by itself.
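
A minimal sketch of the manual calculation from way 1 (an illustration, not
code from this thread); each printed value would go into that node's
<InitialToken> in storage-conf.xml:

public class TokenCalc {
    public static void main(String[] args) {
        int keySpace = 256; // single-byte keys, 0..255
        int nodes = 64;
        for (int i = 0; i < nodes; i++) {
            // evenly spaced tokens: 0, 4, 8, 12, ...
            System.out.println("node " + (i + 1) + ": InitialToken = "
                    + (i * keySpace / nodes));
        }
    }
}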


200k records are not a big deal for cassandra, IMHO, but of course this depends
on your hardware and the size of the records.

Anyway, a good idea is to test your configuration with real data first.





Cassandra Multiple DataCenter Suitability - why?

2010-06-18 Thread altanis
Hello,

I keep reading everywhere that Cassandra has supported multiple
datacenters from the beginning. I would like to know what Cassandra
does to achieve that. Is it just that the developers have written some code
that supports that scenario, or is there something inherent in Cassandra's
design that is suitable for a multi-DC environment, like minimizing
inter-DC traffic?

I have read about RackAwareStrategy on the wiki, and have also browsed
through some code (DataCenterShardStrategy), but I would like to see what
people have to say about this.

I also read about an implementation of Rack Awareness employing
Zookeeper, but I gather that wasn't released by Facebook and it was more
geared towards single-DC rack awareness because Zookeeper is a bit heavy
on the bandwidth.

Anyway, just to sum it up, my question is this: please explain in brief
the reasons why Cassandra is well suited for multi-DC environments.

Alexander Altanis





Java-Client returns empty data sets after Cassandra sits for a while

2010-06-18 Thread Manfred Muench

Hi,

I have noticed the following behaviour (bug?) which I don't completely
understand:
1. start Cassandra (I'm using 0.6.2, but it also appears in 0.6.1)
2. work with it (I'm using Java thrift API)
3. let it sit for a long time (in my case: a day or more) without
issuing any command
4. go back to (2) -- but now Cassandra always returns empty data sets to
queries in Java. The command line interface works, no matter if left
open or started newly.

Here's how I connect to Cassandra (leaving exception handling out for 
better readability):


-
...
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
...

TTransport transport = new TSocket(cassandraHost, cassandraPort);
TProtocol protocol = new TBinaryProtocol(transport);
Cassandra.Client client = new Cassandra.Client(protocol);
transport.open();
...
List<KeySlice> keySlices = client.get_range_slices(...);
...
transport.flush();
transport.close();
...
-

This code usually works, but after leaving Cassandra running unused for
a couple of hours (or days), it still connects fine to Cassandra, but
client.get_range_slices returns an empty result set.


I am not very sure, but I believe it happens after compacting. Need to 
do more tests on this one.


Does anybody know what I'm doing wrong here? Is there any kind of 
"initialisation step" that I should have taken before running queries?


If you need more (debug) information on this matter, please let me know 
how I can provide you with it. The log files didn't show anything while 
running the query. The last log message was:


 INFO [COMPACTION-POOL:1] 2010-06-18 14:07:45,882 
CompactionManager.java (line 246) Compacting []


I ran the query at around 14:20, no other message after this one.

Thanks for your help in advance!

Cheers,
Manfred

--
Dr. Manfred Muench
Nanjing Imperiosus Technology Co. Ltd.
Wu Xing Nian Hua Da Sha, Room 1004
134 Hanzhong Lu, Nanjing, P.R. China




Re: AVRO client API

2010-06-18 Thread Eric Evans
On Fri, 2010-06-18 at 12:27 +0530, Atul Gosain wrote:
> Is the client API for Cassandra available in Avro?

Significant parts of it, but it is not yet finished.

> If so, any links to examples or some documentation?

There are no samples or documentation yet, sorry.

> And if so, is there any comparison between the Thrift and Avro APIs to
> determine the better of them?

The Plan is to develop enough critical mass around the Avro API that
Thrift can be deprecated. We don't want to maintain more than one of
these long-term.

-- 
Eric Evans
eev...@rackspace.com



Re: ec2 tests

2010-06-18 Thread Olivier Mallassi
Hi all,

@Chris, did you get any benchmarks you could share with us?

I am running the same kind of test on EC2 (m1.large instances):
- one VM for stress.py (can be launched several times)
- another VM for a unique cassandra node

I use the default conf settings (Xmx 1G, concurrentwrite 32...) except for
the commitlog and DataFileDirectory: I have a raid0 EBS for the commit log and
another raid0 EBS for data.

I can't get past 7500 writes/sec (when launching 4 stress.py at the same
time).
Moreover I can see some pending tasks in the
org.cassandra.db.ColumnFamilyStores.Keyspace1.Standard1 MBean

Any ideas on the bottleneck?

Thanks a lot.

oliv/

On Fri, May 28, 2010 at 5:14 PM, gabriele renzi  wrote:

> On Fri, May 28, 2010 at 3:48 PM, Mark Greene  wrote:
> > First thing I would do is stripe your EBS volumes. I've seen blogs that
> say
> > this helps and blogs that say it's fairly marginal.
>
>
> just to point out: another option is to stripe the ephemeral drives
> (if using instances > small)
>



-- 

Olivier Mallassi
OCTO Technology

50, Avenue des Champs-Elysées
75008 Paris

Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01

http://www.octo.com
Octo Talks! http://blog.octo.com


Failover and slow nodes

2010-06-18 Thread James Golick
Our cassandra client fails over if a node times out. Aside from actual
failure, repair and major compactions can make a node so slow that it
affects application performance.

One problem we've run into is that a node in the midst of repair will still
have requests routed to it internally, even if all clients have failed over.
With a small number of nodes, this has a major impact on the performance of
the overall system.

I'm wondering whether people have any recommendations on tuning this
behaviour. It would be really nice not to route requests to an insanely slow
node.


Re: read operation is slow

2010-06-18 Thread Simon Reavely
Would it perhaps be worth denormalising your data so that you can
retrieve all rows as a single row, using a key encoded with the query
predicate?

Until we get a stored proc feature (dunno if planned) it's hard to
avoid round trips without denormalizing/replicating data to fit
your query paths.



Simon Reavely


On Jun 11, 2010, at 9:49 PM, "caribbean410" wrote:


Thanks for the suggestion. For the test case, it is 1 key and 1
column. I once changed 10 to 1; as I remember there is not much
difference.

I have 200k keys and each key is randomly generated. I will try the
optimized query next week. But maybe you still have to face the case
where each time a client just wants to query one key from the db.




From: Dop Sun [mailto:su...@dopsun.com]
Sent: Friday, June 11, 2010 6:05 PM
To: user@cassandra.apache.org
Subject: RE: read operation is slow



And also, you are only selecting 1 key and 10 columns?



criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);




Then, if you have 200k keys, you have 200k Thrift calls.  If this is
the case, you may need to optimize the way you do the query (to
combine multiple keys into a single query) and reduce the number
of calls.
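
A sketch of that batching with the raw 0.6 Thrift API (not Jassandra; the
keyspace, column family and consistency level are placeholders to adapt):

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;

public class BatchRead {
    // one Thrift round trip for the whole key batch instead of keys.size() calls
    public static Map<String, List<ColumnOrSuperColumn>> readBatch(
            Cassandra.Client client, List<String> keys, byte[] columnName)
            throws Exception {
        SlicePredicate predicate = new SlicePredicate();
        predicate.setColumn_names(Arrays.asList(columnName)); // just the one column
        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Standard2");
        return client.multiget_slice("Keyspace1", keys, parent, predicate,
                ConsistencyLevel.ONE);
    }
}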




From: Dop Sun [mailto:su...@dopsun.com]
Sent: Saturday, June 12, 2010 8:57 AM
To: user@cassandra.apache.org
Subject: RE: read operation is slow



You mean after you “I remove some unnecessary column family and change the
size of rowcache and keycache, now the latency changes from 0.25ms to 0.09ms.
In essence 0.09ms*200k=18s.”, it still takes 400 seconds to return?




From: Caribbean410 [mailto:caribbean...@gmail.com]
Sent: Saturday, June 12, 2010 8:48 AM
To: user@cassandra.apache.org
Subject: Re: read operation is slow



Hi, do you mean this one should not introduce much extra delay? To
read a record, I need the select here; not sure where the extra delay
comes from.


On Fri, Jun 11, 2010 at 5:29 PM, Dop Sun  wrote:

Jassandra is used here:



Map<String, List<IColumn>> map = criteria.select();



The select here is basically a call to the Thrift API: get_range_slices





From: Caribbean410 [mailto:caribbean...@gmail.com]
Sent: Saturday, June 12, 2010 8:00 AM


To: user@cassandra.apache.org
Subject: Re: read operation is slow



I remove some unnecessary column family and change the size of
rowcache and keycache; now the latency changes from 0.25ms to
0.09ms. In essence 0.09ms*200k=18s. I don't know why it takes more
than 400s total. Here is the client code and cfstats. There are not
many operations here, why is the extra time so large?




long start = System.currentTimeMillis();
for (int j = 0; j < 1; j++) {
    for (int i = 0; i < numOfRecords; i++) {
        int n = random.nextInt(numOfRecords);
        ICriteria criteria = cf.createCriteria();
        userName = keySet[n];
        criteria.keyList(Lists.newArrayList(userName)).columnRange(nameFirst, nameFirst, 10);
        Map<String, List<IColumn>> map = criteria.select();

        List<IColumn> list = map.get(userName);
//      ByteArray bloc = list.get(0).getValue();
//      byte[] byteArrayloc = bloc.toByteArray();
//      loc = new String(byteArrayloc);
//      readBytes = readBytes + loc.length();
        readBytes = readBytes + blobSize;
    }
}

long finish = System.currentTimeMillis();

// divide by 1000f: integer division here would truncate to whole seconds
float totalTime = (finish - start) / 1000f;


Keyspace: Keyspace1
    Read Count: 60
    Read Latency: 0.090530067 ms.
    Write Count: 20
    Write Latency: 0.01504989 ms.
    Pending Tasks: 0
        Column Family: Standard2
        SSTable count: 3
        Space used (live): 265990358
        Space used (total): 265990358
        Memtable Columns Count: 2615
        Memtable Data Size: 2667300
        Memtable Switch Count: 3
        Read Count: 60
        Read Latency: 0.091 ms.
        Write Count: 20
        Write Latency: 0.015 ms.
        Pending Tasks: 0
        Key cache capacity: 1000
        Key cache size: 187465
        Key cache hit rate: 0.0
        Row cache capacity: 1000
        Row cache size: 189990
        Row cache hit rate: 0.68335
        Compacted row minimum size: 0
        Compacted row maximum size: 0
        Compacted row mean size: 0

Keyspace: system
    Read Count: 1
    Read Latency: 10.954 ms.
    Write Count: 4
    Write Latency: 0.28075 ms.
    Pending Tasks: 0
        Column Family: HintsColumnFamily
        SSTable count: 0
        Space used (live): 0
        Space used (total): 0
        Memtable Columns Count: 0
        Memtable Data Size: 0
        Memtable Switch Count: 0
        Read Count: 0
        Read Latency: NaN ms.
        Write Count: 0
        Write Latency: NaN ms.
        Pending Tasks: 0
        Key cache capacity: 1

Re: extending multiget

2010-06-18 Thread Sonny Heer
Assume a map/reduce program which needs to update some values during
ingest, and needs to perform read operations on 100 keys, each of which
has say 50 different columns.  This happens many times for a given
reduce task in the cluster.  Shouldn't that be handled by the server
as a single call?


On Thu, Jun 17, 2010 at 5:54 PM, Jonathan Ellis  wrote:
> No.  At that point you basically have no overhead advantage vs just
> doing multiple single-row requests.
>
> On Thu, Jun 17, 2010 at 2:39 PM, Sonny Heer  wrote:
>> Any plans for this sort of call?
>>
>>
>> Instead of:
>>
>>    public Map<String, List<ColumnOrSuperColumn>> multiget_slice(String
>> keyspace, List<String> keys, ColumnParent column_parent,
>> SlicePredicate predicate, ConsistencyLevel consistency_level) throws
>> InvalidRequestException, UnavailableException, TimedOutException,
>> TException;
>>
>> ---
>>
>>    public Map<String, List<ColumnOrSuperColumn>> multiget_slice(String
>> keyspace, Map<String, List<byte[]>> keyColNames, ColumnParent
>> column_parent, ConsistencyLevel consistency_level) throws
>> InvalidRequestException, UnavailableException, TimedOutException,
>> TException;
>>
>> ---
>>
>> where the keyColNames explicitly maps which column names to retrieve
>> for a given key, instead of a column slice on all keys.
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: what is the best way to truncate a column family

2010-06-18 Thread Benjamin Black
In 0.6 your only option with those constraints is to iterate over the
entire CF and delete row by row.  This requires that you either use
OPP or have an index that covers all keys in the CF.  0.7 adds the
ability to truncate a CF (deleting all its rows) through the API.

On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang wrote:
> programmatically w/o bringing the servers down.
>
> thanks,
> claire
>


Re: Failover and slow nodes

2010-06-18 Thread Benjamin Black
Would be interesting to have a snitch that manipulated responses for
read nodes based on historical response times.

On Fri, Jun 18, 2010 at 8:21 AM, James Golick  wrote:
> Our cassandra client fails over if a node times out. Aside from actual
> failure, repair and major compactions can make a node so slow that it
> affects application performance.
> One problem we've run in to is that a node in the midst of repair will still
> have requests routed to it internally, even if all clients have failed over.
> With a small number of nodes, this has a major impact on the performance of
> the overall system.
> I'm wondering whether people have any recommendations on tuning this
> behaviour. It would be really nice not to route requests to an insanely slow
> node.


Re: Failover and slow nodes

2010-06-18 Thread Stu Hood
See https://issues.apache.org/jira/browse/CASSANDRA-981

-Original Message-
From: "Benjamin Black" 
Sent: Friday, June 18, 2010 12:32pm
To: user@cassandra.apache.org
Subject: Re: Failover and slow nodes

Would be interesting to have a snitch that manipulated responses for
read nodes based on historical response times.

On Fri, Jun 18, 2010 at 8:21 AM, James Golick  wrote:
> Our cassandra client fails over if a node times out. Aside from actual
> failure, repair and major compactions can make a node so slow that it
> affects application performance.
> One problem we've run in to is that a node in the midst of repair will still
> have requests routed to it internally, even if all clients have failed over.
> With a small number of nodes, this has a major impact on the performance of
> the overall system.
> I'm wondering whether people have any recommendations on tuning this
> behaviour. It would be really nice not to route requests to an insanely slow
> node.




Re: Occasional 10s Timeouts on Read

2010-06-18 Thread AJ Slater
To summarize:

If a request for a column comes in *after a period of several hours
with no requests*, then the node servicing the request hangs while
looking for its peer rather than servicing the request like it should.
It then throws either a TimedOutException or a (wrong)
NotFoundException.

And it doesn't appear to actually send the message it says it does to
its peer. Or at least its peer doesn't report the request being
received.

And then the situation magically clears up after approximately 2 minutes.

However, if the idle period never occurs, then the problem does not
manifest. If I run a cron job with wget against my server every
minute, I do not see the problem.

I'll be looking at some tcpdump logs to see if I can suss out what's
really happening, and perhaps file this as a bug. The several hours
between reproducible events makes this whole thing aggravating for
detection, debugging and I'll assume, fixing, if it is indeed a
cassandra problem.

It was suggested on IRC that it may be my network. But gossip is
continually sending heartbeats and nodetool and the logs show the
nodes as up and available. If my network was flaking out I'd think it
would be dropping heartbeats and I'd see that.

AJ

On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater  wrote:
> These are physical machines.
>
> storage-conf.xml.fs03 is here:
>
> http://pastebin.com/weL41NB1
>
> Diffs from that for the other two storage-confs are inline here:
>
> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03
> storage-conf.xml.fs01
> 185c185
>
>>   71603818521973537678586548668074777838
> 229c229
> <   10.33.2.70
> ---
>>   10.33.3.10
> 241c241
> <   10.33.2.70
> ---
>>   10.33.3.10
> 341c341
> <   16
> ---
>>   4
>
>
> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03
> storage-conf.xml.fs02
> 185c185
> <   0
> ---
>>   120215585224964746744782921158327379306
> 206d205
> <       10.33.3.20
> 229c228
> <   10.33.2.70
> ---
>>   10.33.3.20
> 241c240
> <   10.33.2.70
> ---
>>   10.33.3.20
> 341c340
> <   16
> ---
>>   4
>
>
> Thank you for your attention,
>
> AJ
>
>
> On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black  wrote:
>> Are these physical machines or virtuals?  Did you post your
>> cassandra.in.sh and storage-conf.xml someplace?
>>
>> On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater  wrote:
>>> Total data size in the entire cluster is about twenty 12k images. With
>>> no other load on the system. I just ask for one column and I get these
>>> timeouts. Performing multiple gets on the columns leads to multiple
>>> timeouts for a period of a few seconds or minutes and then the
>>> situation magically resolves itself and response times are down to
>>> single digit milliseconds for a column get.
>>>
>>> On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater  wrote:
 Cassandra 0.6.2 from the apache debian source.
 Ubuntu Jaunty. Sun Java6 jvm.

 All nodes in separate racks at 365 main.

 On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater  wrote:
> I'm seeing 10s timeouts on reads a few times a day. It's hard to reproduce
> consistently, but it seems to happen most often after it's been a long time
> between reads. After presenting itself for a couple of minutes the
> problem then goes away.
>
> I've got a three node cluster with replication factor 2, reading at
> consistency level ONE. The columns being read are around 12k each. The
> nodes are 8GB multicore boxes with the JVM limits between 4GB and 6GB.
>
> Here's an application log from early this morning when a developer in
> Belgrade accessed the system:
>
> Jun 17 03:54:17 lpc03 pinhole[5736]: MainThread:pinhole.py:61 |
> Requested image_id: 5827067133c3d670071c17d9144f0b49
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:76 |
> TimedOutException for Image 5827067133c3d670071c17d9144f0b49
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image
> Get took 10005.388975 ms
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:61 |
> Requested image_id: af8caf3b76ce97d13812ddf795104a5c
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image
> Get took 3.658056 ms
> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image
> Transform took 0.978947 ms
>
> That's a Timeout and then a successful get of another column.
>
> Here's the cassandra log for 10.33.2.70:
>
> DEBUG 03:54:17,070 get_slice
> DEBUG 03:54:17,071 weakreadremote reading
> SliceFromReadCommand(table='jolitics.com',
> key='5827067133c3d670071c17d9144f0b49',
> column_parent='QueryPath(columnFamilyName='Images',
> superColumnName='null', columnName='null')', start='', finish='
> ', reversed=false, count=100)
> DEBUG 03:54:17,071 weakreadremote reading
> SliceFromReadCommand(table='jolitics.com',
> key='5827067133c3d670071c17d9144f0b49',
> column_parent='QueryPath(columnFamilyName='Images',
> superCol

Re: Cassandra Multiple DataCenter Suitability - why?

2010-06-18 Thread Patrick Hunt


On 06/18/2010 01:20 AM, alta...@ceid.upatras.gr wrote:

I also read about an implementation of Rack Awareness employing
Zookeeper, but I gather that wasn't released by Facebook and it was more
geared towards single-DC rack awareness because Zookeeper is a bit heavy
on the bandwidth.


Bandwidth is not the issue with a cross-colo ZooKeeper ensemble -- 
latency is the issue.


ZK is a quorum-based service: a majority of the servers need to agree to
every change (writes; reads are serviced locally by the server and don't
face this issue). If the latency between servers is high then write
operations will take longer. Generally this is "4L", so if you have 10ms
latency btw colos it will take 40ms for a write to complete; if you have
100ms latency btw colos it will take 400ms, etc. This is not an issue
for "in colo" deployments since latency is typically very low. If you
are using ZK for high-level coordination then 100ms latency might not be
bad; if you are using ZK for fine-grained sharding it might be...


Patrick


Re: ec2 tests

2010-06-18 Thread Benjamin Black
On Fri, Jun 18, 2010 at 8:00 AM, Olivier Mallassi  wrote:
> I use the default conf settings (Xmx 1G, concurrentwrite 32...) except for
> commitlog and DataFileDirectory : I have a raid0 EBS for commit log and
> another raid0 EBS for data.
> I can't get through 7500 write/sec (when launching 4 stress.py in the same
> time).
> Moreover I can see some pending tasks in the
> org.cassandra.db.ColumnFamilyStores.Keyspace1.Standard1 MBean
> Any ideas on the bottleneck?

Your instance has 7.5G of RAM, but you are limiting Cassandra to 1G.
Increase -Xmx to 4G for a start.  You are likely to get significantly
better performance with the ephemeral drive, as well.  I suggest
testing with commitlog on the ephemeral drive for comparison.


b


Re: Failover and slow nodes

2010-06-18 Thread Benjamin Black
Perfect, ship it.

On Fri, Jun 18, 2010 at 10:37 AM, Stu Hood  wrote:
> See https://issues.apache.org/jira/browse/CASSANDRA-981
>
> -Original Message-
> From: "Benjamin Black" 
> Sent: Friday, June 18, 2010 12:32pm
> To: user@cassandra.apache.org
> Subject: Re: Failover and slow nodes
>
> Would be interesting to have a snitch that manipulated responses for
> read nodes based on historical response times.
>
> On Fri, Jun 18, 2010 at 8:21 AM, James Golick  wrote:
>> Our cassandra client fails over if a node times out. Aside from actual
>> failure, repair and major compactions can make a node so slow that it
>> affects application performance.
>> One problem we've run in to is that a node in the midst of repair will still
>> have requests routed to it internally, even if all clients have failed over.
>> With a small number of nodes, this has a major impact on the performance of
>> the overall system.
>> I'm wondering whether people have any recommendations on tuning this
>> behaviour. It would be really nice not to route requests to an insanely slow
>> node.
>
>
>


Re: AVRO client API

2010-06-18 Thread Paul Brown

On Jun 18, 2010, at 8:01 AM, Eric Evans wrote:
> On Fri, 2010-06-18 at 12:27 +0530, Atul Gosain wrote:
>> Is the client API for Cassandra available in Avro?
> Significant parts of it, but it is not yet finished.
>> If so, any links to examples or some documentation?
> There are no samples or documentation yet, sorry.
>> And if so, is there any comparison between the Thrift and Avro APIs to
>> determine the better of them?
> The Plan is to develop enough critical mass around the Avro API that
> Thrift can be deprecated. We don't want to maintain more than one of
> these long-term.

At the risk of asking about religion (but with no interest in hearing about 
it), why Avro instead of something like plain-old-JSON over HTTP?

-- Paul

Re: what is the best way to truncate a column family

2010-06-18 Thread Benjamin Black
I have been reminded that you can do a range query+pagination with RP
in 0.6 to perform this operation.
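
A sketch of that range-query + pagination approach with the 0.6 Thrift API (an
illustration; keyspace/CF names and the page size are placeholders, and deleted
rows keep showing up as empty "range ghost" slices until GCGraceSeconds passes):

import java.util.ArrayList;
import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.KeyRange;
import org.apache.cassandra.thrift.KeySlice;
import org.apache.cassandra.thrift.SlicePredicate;

public class TruncateByHand {
    public static void deleteAllRows(Cassandra.Client client) throws Exception {
        ColumnPath rowPath = new ColumnPath();
        rowPath.setColumn_family("Standard1"); // no column name => remove the whole row
        ColumnParent parent = new ColumnParent();
        parent.setColumn_family("Standard1");
        SlicePredicate keysOnly = new SlicePredicate();
        keysOnly.setColumn_names(new ArrayList<byte[]>()); // we only need the keys
        String start = "";
        while (true) {
            KeyRange page = new KeyRange();
            page.setStart_key(start);
            page.setEnd_key("");
            page.setCount(100);
            List<KeySlice> slices = client.get_range_slices(
                    "Keyspace1", parent, keysOnly, page, ConsistencyLevel.QUORUM);
            for (KeySlice slice : slices)
                client.remove("Keyspace1", slice.getKey(), rowPath,
                        System.currentTimeMillis() * 1000, ConsistencyLevel.QUORUM);
            if (slices.size() < 100)
                break; // past the last page
            start = slices.get(slices.size() - 1).getKey(); // resume from the last key seen
        }
    }
}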

On Fri, Jun 18, 2010 at 10:29 AM, Benjamin Black  wrote:
> In 0.6 your only option with those constraints is to iterate over the
> entire CF and deleting row by row.  This requires you are either using
> OPP or have an index that covers all keys in the CF.  0.7 adds the
> ability to truncate a CF (deleting all its rows) through the API.
>
> On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang wrote:
>> programmatically w/o bringing the servers down.
>>
>> thanks,
>> claire
>>
>


Re: Failover and slow nodes

2010-06-18 Thread James Golick
What's the current timeframe on 0.7?

On Fri, Jun 18, 2010 at 1:45 PM, Benjamin Black  wrote:

> Perfect, ship it.
>
> On Fri, Jun 18, 2010 at 10:37 AM, Stu Hood  wrote:
> > See https://issues.apache.org/jira/browse/CASSANDRA-981
> >
> > -Original Message-
> > From: "Benjamin Black" 
> > Sent: Friday, June 18, 2010 12:32pm
> > To: user@cassandra.apache.org
> > Subject: Re: Failover and slow nodes
> >
> > Would be interesting to have a snitch that manipulated responses for
> > read nodes based on historical response times.
> >
> > On Fri, Jun 18, 2010 at 8:21 AM, James Golick 
> wrote:
> >> Our cassandra client fails over if a node times out. Aside from actual
> >> failure, repair and major compactions can make a node so slow that it
> >> affects application performance.
> >> One problem we've run in to is that a node in the midst of repair will
> still
> >> have requests routed to it internally, even if all clients have failed
> over.
> >> With a small number of nodes, this has a major impact on the performance
> of
> >> the overall system.
> >> I'm wondering whether people have any recommendations on tuning this
> >> behaviour. It would be really nice not to route requests to an insanely
> slow
> >> node.
> >
> >
> >
>


Re: java.lang.RuntimeException: java.io.IOException: Value too large for defined data type

2010-06-18 Thread Gary Dusbabek
*Hopefully* fixed.  I was never able to duplicate the problem on my
workstation, but I had a pretty good idea what was causing the
problem.  Julie, if you're in a position to apply and test the fix, it
would help us make sure we've got this one nailed down.

Gary.

On Thu, Jun 17, 2010 at 00:33, Jonathan Ellis  wrote:
> That is consistent with the
> https://issues.apache.org/jira/browse/CASSANDRA-1169 bug I mentioned,
> that is fixed in the 0.6 svn branch.
>
> On Wed, Jun 16, 2010 at 10:51 PM, Julie  wrote:
>> The loop is in IncomingStreamReader.java, line 62, a 3-line while loop.
>> bytesRead is not changing.  pendingFile.getExpectedBytes() returns
>> 7,161,538,639 but bytesRead is stuck at 2,147,483,647.
>>


Re: what is the best way to truncate a column family

2010-06-18 Thread Phil Stanhope
In 0.6.x the iterating approach works ... but you need to flush and compact 
(after GCGraceSeconds) in order to NOT see the keys in the CF.

Will the behavior of the truncate method in 0.7 require flush/compact as well? 
Or will it be immediate?

-phil

On Jun 18, 2010, at 1:29 PM, Benjamin Black wrote:

> In 0.6 your only option with those constraints is to iterate over the
> entire CF and deleting row by row.  This requires you are either using
> OPP or have an index that covers all keys in the CF.  0.7 adds the
> ability to truncate a CF (deleting all its rows) through the API.
> 
> On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang wrote:
>> programmatically w/o bringing the servers down.
>> 
>> thanks,
>> claire
>> 



Re: what is the best way to truncate a column family

2010-06-18 Thread Ran Tavory
It will be immediate.
But it will fail if not all hosts in the cluster are up; this is the
tradeoff. We regard the truncate operation as an admin API, so I think it's a
fair tradeoff.

On Fri, Jun 18, 2010 at 11:50 PM, Phil Stanhope  wrote:

> In 0.6.x the iterating approach works ... but you need to flush and compact
> (after GCGraceSeconds) in order to NOT see the keys in the CF.
>
> Will the behavior of the truncate method in 0.7 require flush/compact as
> well? Or will it be immediate?
>
> -phil
>
> On Jun 18, 2010, at 1:29 PM, Benjamin Black wrote:
>
> > In 0.6 your only option with those constraints is to iterate over the
> > entire CF and deleting row by row.  This requires you are either using
> > OPP or have an index that covers all keys in the CF.  0.7 adds the
> > ability to truncate a CF (deleting all its rows) through the API.
> >
> > On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang wrote:
> >> programmatically w/o bringing the servers down.
> >>
> >> thanks,
> >> claire
> >>
>
>


Re: what is the best way to truncate a column family

2010-06-18 Thread Phil Stanhope
I am happy with this restriction on the truncate operation for 0.7. Thanks for
the quick response.

-phil

On Jun 18, 2010, at 4:57 PM, Ran Tavory wrote:

> it will be immediate. 
> But it will fail if not all hosts in the cluster are up, this is the 
> tradeoff. We regard the truncate operation an admin api so I think it's a 
> fair tradeoff. 
> 
> On Fri, Jun 18, 2010 at 11:50 PM, Phil Stanhope  wrote:
> In 0.6.x the iterating approach works ... but you need to flush and compact 
> (after GCGraceSeconds) in order to NOT see the keys in the CF.
> 
> Will the behavior of the truncate method in 0.7 require flush/compact as 
> well? Or will it be immediate?
> 
> -phil
> 
> On Jun 18, 2010, at 1:29 PM, Benjamin Black wrote:
> 
> > In 0.6 your only option with those constraints is to iterate over the
> > entire CF and deleting row by row.  This requires you are either using
> > OPP or have an index that covers all keys in the CF.  0.7 adds the
> > ability to truncate a CF (deleting all its rows) through the API.
> >
> > On Fri, Jun 18, 2010 at 10:24 AM, Claire Chang wrote:
> >> programmatically w/o bringing the servers down.
> >>
> >> thanks,
> >> claire
> >>
> 
> 



Re: AVRO client API

2010-06-18 Thread Eric Evans
On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote:
> At the risk of asking about religion (but with no interest in hearing
> about it), why Avro instead of something like plain-old-JSON over
> HTTP?

At the risk of having this thread veer off on a very long tangent...

In a nutshell, we need a way of processing requests and responses over
the network with typed data. You could of course put something together
to do this using JSON and HTTP, but not without reimplementing another
framework like Avro or Thrift (both of which can do JSON encoding, and
both of which have an HTTP transport).

-- 
Eric Evans
eev...@rackspace.com



Cassandra dinner in Austin

2010-06-18 Thread Jeremy Hanna
As mentioned in the #cassandra IRC channel - there's going to be a dinner in 
Austin on July 15th for people interested in Cassandra.

For those interested: http://cassandradinneraustin.eventbrite.com/

(Sorry if this doesn't apply to everyone, but everyone is welcome :)

Possible bug in Cassandra MapReduce

2010-06-18 Thread Corey Hulen
We are using MapReduce to periodically verify and rebuild our secondary
indexes, along with counting total records.  We started to notice double
counting of unique keys in single-machine standalone tests. We were finally
able to reproduce the problem using
the apache-cassandra-0.6.2-src/contrib/word_count example and just
re-running it multiple times.  We are hoping someone can verify the bug.

Re-run the tests and the word count for /tmp/word_count3/part-r-0 will
be 1000 +~200 and will change if you blow the data away and re-run.  Notice
the setup script loops and only inserts 1000 records, so we expect the count to
be 1000.  Once the data is generated, re-running the setup script and/or
mapreduce doesn't change the number (still off).  The key is to blow all the
data away and start over, which will cause it to change.

Can someone please verify this behavior?

-Corey


Re: Possible bug in Cassandra MapReduce

2010-06-18 Thread Phil Stanhope
"blow all the data away" ... how do you do that? What is the timestamp 
precision that you are using when creating key/col or key/supercol/col items?

I have seen a fail to write a key when the timestamp is identical to the 
previous timestamp of a deleted key/col. While I didn't examine the source 
code, I'm certain that this is do to delete tombstones.

I view this as a application error because I was attempting to do this within 
the GCGraceSeconds time period. If I, however, stopped cassandra, blew away 
data & commitlogs and restarted the write always succeeds (no surprise there).

I turned this behavior into a feature (of sorts). When this happens I increment 
a formally non-zero portion of the timestamp (the last digit of precision which 
was always zero) and use this as a counter to track how many times a key/col 
was updated (max 9 for my purposes).

-phil
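
A minimal sketch of that timestamp trick (my reading of it, assuming
microsecond timestamps derived from milliseconds, so the final digit starts
out zero):

public final class Timestamps {
    // updateCount in 0..9; the reused last digit doubles as an update counter
    public static long versionedTimestamp(int updateCount) {
        if (updateCount < 0 || updateCount > 9)
            throw new IllegalArgumentException("counter must fit in one digit");
        // millis * 1000 yields microseconds whose last digit is always 0
        return System.currentTimeMillis() * 1000 + updateCount;
    }
}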

On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote:

> 
> We are using MapReduce to periodical verify and rebuild our secondary indexes 
> along with counting total records.  We started to noticed double counting of 
> unique keys on single machine standalone tests. We were finally able to 
> reproduce the problem using the apache-cassandra-0.6.2-src/contrib/word_count 
> example and just re-running it multiple times.  We are hoping someone can 
> verify the bug.
> 
> re-run the tests and the word count for /tmp/word_count3/part-r-0 will be 
> 1000 +~200  and will change if you blow the data away and re-run.  Notice the 
> setup script loops and only inserts 1000 records so we expect count to be 
> 1000.  Once the data is generated then re-running the setup script and/or 
> mapreduce doesn't change the number (still off).  The key is to blow all the 
> data away and start over which will cause it to change.
> 
> Can someone please verify this behavior?
> 
> -Corey



Re: Possible bug in Cassandra MapReduce

2010-06-18 Thread Corey Hulen
I thought the same thing, but using the supplied contrib example I just
delete the /var/lib/data dirs and commit log.

-Corey



On Fri, Jun 18, 2010 at 3:11 PM, Phil Stanhope  wrote:

> "blow all the data away" ... how do you do that? What is the timestamp
> precision that you are using when creating key/col or key/supercol/col
> items?
>
> I have seen a fail to write a key when the timestamp is identical to the
> previous timestamp of a deleted key/col. While I didn't examine the source
> code, I'm certain that this is do to delete tombstones.
>
> I view this as a application error because I was attempting to do this
> within the GCGraceSeconds time period. If I, however, stopped cassandra,
> blew away data & commitlogs and restarted the write always succeeds (no
> surprise there).
>
> I turned this behavior into a feature (of sorts). When this happens I
> increment a formally non-zero portion of the timestamp (the last digit of
> precision which was always zero) and use this as a counter to track how many
> times a key/col was updated (max 9 for my purposes).
>
> -phil
>
> On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote:
>
> >
> > We are using MapReduce to periodical verify and rebuild our secondary
> indexes along with counting total records.  We started to noticed double
> counting of unique keys on single machine standalone tests. We were finally
> able to reproduce the problem using the
> apache-cassandra-0.6.2-src/contrib/word_count example and just re-running it
> multiple times.  We are hoping someone can verify the bug.
> >
> > re-run the tests and the word count for /tmp/word_count3/part-r-0
> will be 1000 +~200  and will change if you blow the data away and re-run.
>  Notice the setup script loops and only inserts 1000 records so we expect
> count to be 1000.  Once the data is generated then re-running the setup
> script and/or mapreduce doesn't change the number (still off).  The key is
> to blow all the data away and start over which will cause it to change.
> >
> > Can someone please verify this behavior?
> >
> > -Corey
>
>


Re: AVRO client API

2010-06-18 Thread Paul Brown

On Jun 18, 2010, at 2:12 PM, Eric Evans wrote:

> On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote:
>> At the risk of asking about religion (but with no interest in hearing
>> about it), why Avro instead of something like plain-old-JSON over
>> HTTP?
> At the risk of having this thread veer off on a very long tangent...
> In a nutshell, we need a way of processing requests and responses over
> the network with typed data. You could of course put something together
> to do this using JSON and HTTP, but not without reimplementing another
> framework like Avro or Thrift (both of which can do JSON encoding, and
> both of which have an HTTP transport).

"Rich, natively-provided types" is a fair answer; I was more interested in 
motivation that making a value judgement.

Cheers.

-- Paul

Re: Possible bug in Cassandra MapReduce

2010-06-18 Thread Corey Hulen
OK... I just verified on a clean EC2 small single-instance box using
apache-cassandra-0.6.2-src.
I'm pretty sure the Cassandra MapReduce functionality is broken.

If your MapReduce jobs are idempotent then you are OK, but if you are doing
things like word count (as in the supplied example) or key count you will
get double counts.

-Corey


On Fri, Jun 18, 2010 at 3:15 PM, Corey Hulen  wrote:

>
> I thought the same thing, but using the supplied contrib example I just
> delete the /var/lib/data dirs and commit log.
>
> -Corey
>
>
>
>
> On Fri, Jun 18, 2010 at 3:11 PM, Phil Stanhope wrote:
>
>> "blow all the data away" ... how do you do that? What is the timestamp
>> precision that you are using when creating key/col or key/supercol/col
>> items?
>>
>> I have seen a fail to write a key when the timestamp is identical to the
>> previous timestamp of a deleted key/col. While I didn't examine the source
>> code, I'm certain that this is do to delete tombstones.
>>
>> I view this as a application error because I was attempting to do this
>> within the GCGraceSeconds time period. If I, however, stopped cassandra,
>> blew away data & commitlogs and restarted the write always succeeds (no
>> surprise there).
>>
>> I turned this behavior into a feature (of sorts). When this happens I
>> increment a formally non-zero portion of the timestamp (the last digit of
>> precision which was always zero) and use this as a counter to track how many
>> times a key/col was updated (max 9 for my purposes).
>>
>> -phil
>>
>> On Jun 18, 2010, at 5:49 PM, Corey Hulen wrote:
>>
>> >
>> > We are using MapReduce to periodical verify and rebuild our secondary
>> indexes along with counting total records.  We started to noticed double
>> counting of unique keys on single machine standalone tests. We were finally
>> able to reproduce the problem using the
>> apache-cassandra-0.6.2-src/contrib/word_count example and just re-running it
>> multiple times.  We are hoping someone can verify the bug.
>> >
>> > re-run the tests and the word count for /tmp/word_count3/part-r-0
>> will be 1000 +~200  and will change if you blow the data away and re-run.
>>  Notice the setup script loops and only inserts 1000 records so we expect
>> count to be 1000.  Once the data is generated then re-running the setup
>> script and/or mapreduce doesn't change the number (still off).  The key is
>> to blow all the data away and start over which will cause it to change.
>> >
>> > Can someone please verify this behavior?
>> >
>> > -Corey
>>
>>
>


Re: AVRO client API

2010-06-18 Thread B. Todd Burruss
I'll jump in ... why Avro over Thrift?  Can you guys point me at a
comparison?  (I know next to nothing about either of them.)


On 06/18/2010 03:41 PM, Paul Brown wrote:
> On Jun 18, 2010, at 2:12 PM, Eric Evans wrote:
>> On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote:
>>> At the risk of asking about religion (but with no interest in hearing
>>> about it), why Avro instead of something like plain-old-JSON over
>>> HTTP?
>> At the risk of having this thread veer off on a very long tangent...
>> In a nutshell, we need a way of processing requests and responses over
>> the network with typed data. You could of course put something together
>> to do this using JSON and HTTP, but not without reimplementing another
>> framework like Avro or Thrift (both of which can do JSON encoding, and
>> both of which have an HTTP transport).
>
> "Rich, natively-provided types" is a fair answer; I was more interested in
> motivation than making a value judgement.
>
> Cheers.
>
> -- Paul


Re: AVRO client API

2010-06-18 Thread Tatu Saloranta
On Fri, Jun 18, 2010 at 2:12 PM, Eric Evans  wrote:
> On Fri, 2010-06-18 at 11:00 -0700, Paul Brown wrote:
>> At the risk of asking about religion (but with no interest in hearing
>> about it), why Avro instead of something like plain-old-JSON over
>> HTTP?
>
> At the risk of having this thread veer off on a very long tangent...
>
> In a nutshell, we need a way of processing requests and responses over
> the network with typed data. You could of course put something together
> to do this using JSON and HTTP, but not without reimplementing another
> framework like Avro or Thrift (both of which can do JSON encoding, and
> both of which have an HTTP transport).

Not that I wanted to criticize choices, but do they actually allow use
of JSON as an encoding?
Avro does use JSON for specifying schemas, but I wasn't aware of being
able to use it for encoding data.
Likewise with Thrift.

I think there's also the important question of whether the schema/formatting
choice for the payload should follow that of the framing.
Avro/Thrift/PB seem reasonable for framing, for use by the protocol itself,
but for an open payload it might make sense to allow different pluggable
formats, mostly because Avro/Thrift/PB are schema-bound formats, which is
not an optimal choice for many use cases (but is fine for many others).
It is of course possible to just use byte[]/String as the payload, handle
encoding and decoding on the client end, and maybe that's how it should
be, for cases where a strict schema doesn't work.

-+ Tatu +-


Re: AVRO client API

2010-06-18 Thread Miguel Verde
On Fri, Jun 18, 2010 at 6:23 PM, Tatu Saloranta wrote:

>  Not that I wanted to criticize choices, but do they actually allow use
> of JSON as encoding?
> Avro does use JSON for specifying schemas, but I wasn't aware of being
> able to use it for encoding data.
> Likewise with Thrift.
>

 Yes, each supports a JSON data encoding.  See
http://avro.apache.org/docs/1.3.3/spec.html#json_encoding for Avro and the
JSONProtocol in Thrift.  One clear advantage of these two is that they
support either stringified JSON or a compact binary encoding, and that they
each support (or intend to support) a more efficient TCP-based protocol
instead of only allowing HTTP.

Re: Avro vs Thrift, Cassandra has historically had difficulty getting Thrift
bugs fixed and Avro is more malleable at this point.  Additionally, Avro has
the potential for a more compact encoding and easier integration with
dynamic languages.


Re: AVRO client API

2010-06-18 Thread Tatu Saloranta
On Fri, Jun 18, 2010 at 4:57 PM, Miguel Verde  wrote:
> On Fri, Jun 18, 2010 at 6:23 PM, Tatu Saloranta 
> wrote:
>>
>> Not that I wanted to criticize choices, but do they actually allow use
>> of JSON as encoding?
>> Avro does use JSON for specifying schemas, but I wasn't aware of being
>> able to use it for encoding data.
>> Likewise with Thrift.
>
> Yes, each supports a JSON data encoding.  See
> http://avro.apache.org/docs/1.3.3/spec.html#json_encoding for Avro and the
> JSONProtocol in Thrift.  One clear advantage of these two is that they

Ok thanks. I learnt something new today. :-)

> support either stringified JSON or a compact binary encoding, and that they
> each support (or intend to support) a more efficient TCP-based protocol
> instead of only allowing HTTP.

Right. The latter is actually useful, then, as it suggests the
possibility of using alternative binary encodings with the other pieces
(schema definition, protocol handling), i.e. any encoding that supports
their respective data sets.

> Re: Avro vs Thrift, Cassandra has historically had difficulty getting Thrift
> bugs fixed and Avro is more malleable at this point.  Additionally, Avro has
> the potential for a more compact encoding and easier integration with
> dynamic languages.

Yes, that has been my impression as well, so I was not surprised to
see plans for this change.
Although I have been interested in learning more about progress, to
know when new versions would be available.

-+ Tatu +-


Re: ec2 tests

2010-06-18 Thread Olivier Mallassi
I tried the following:
- still one cassandra node on one EC2 m1.large instance; on two other m1.large
instances, I run 4 stress.py (50 threads each, 2 stress.py on each instance)
- RAID0 EBS for data and the ephemeral drive (/dev/sda1 partition) for the
commit log
- -Xmx4G

and I did not see any improvement (Cassandra stays around 7000 W/sec).

CPU runs up to 130% (spikes), but I have two 2.5GHz CPUs.
The avgqu-sz goes up to 20 (sometimes more) for the device /dev/sda1 that
stores the commitlog.

Do you think the ConcurrentWrites or MemtableThroughputInMB parameters must be
increased (I'm using the default values right now)?
Any suggestions are welcome. ;o)

On Fri, Jun 18, 2010 at 7:42 PM, Benjamin Black  wrote:

> On Fri, Jun 18, 2010 at 8:00 AM, Olivier Mallassi 
> wrote:
> > I use the default conf settings (Xmx 1G, concurrentwrite 32...) except
> for
> > commitlog and DataFileDirectory : I have a raid0 EBS for commit log and
> > another raid0 EBS for data.
> > I can't get through 7500 write/sec (when launching 4 stress.py in the
> same
> > time).
> > Moreover I can see some pending tasks in the
> > org.cassandra.db.ColumnFamilyStores.Keyspace1.Standard1 MBean
> > Any ideas on the bottleneck?
>
> Your instance has 7.5G of RAM, but you are limiting Cassandra to 1G.
> Increase -Xmx to 4G for a start.  You are likely to get significantly
> better performance with the ephemeral drive, as well.  I suggest
> testing with commitlog on the ephemeral drive for comparison.
>
>
> b
>



-- 

Olivier Mallassi
OCTO Technology

50, Avenue des Champs-Elysées
75008 Paris

Mobile: (33) 6 28 70 26 61
Tél: (33) 1 58 56 10 00
Fax: (33) 1 58 56 10 01

http://www.octo.com
Octo Talks! http://blog.octo.com


Re: ec2 tests

2010-06-18 Thread Joe Stump

On Jun 18, 2010, at 6:39 PM, Olivier Mallassi wrote:

> and I did not see any improvements (Cassandra stays around 7000 W/sec). 

It's a brave new world where N+1 scaling with 7,000 writes per second per node 
is considered suboptimal performance.

--Joe



Learning-by-doing (also announcing a new Ruby Client Codename: "Greek Architect")

2010-06-18 Thread Thomas Heller
Howdy!

So, last week I finally got around to playing with Cassandra. After a
while I understood the basics. To test this assumption I started
working on my own Client implementation since "Learning-by-doing" is
what I do and existing Ruby Clients (which are awesome) already
abstracted too much for me to really grasp what was going on. Java is
not really my thing (anymore) so I began with the Thrift API and Ruby.

Anyways back to Topic.

This library is now available at:
http://github.com/thheller/greek_architect

Since I have virtually no experience with Cassandra (but plenty with
SQL) I started with the first use-case which I have programmed a bunch
of times before. User Management. I build websites which are used by
other people, so I need to store them somewhere.

Step #1: Creating Users and persisting them in Cassandra

Example here:
http://github.com/thheller/greek_architect/blob/master/spec/examples/user_create_spec.rb

I hope my rspec-style documentation doesn't confuse too many people
since I already have a gazillion questions for this simple, but also
VERY common use-case. Since a question is best asked with a concrete
example to refer to, here goes my first one:

Would any of you veterans build what I built the way I did? (referring
to the cassandra design, not the ruby client)

I insert Users with UUID keys into one ColumnFamily. I then index them
by creating a row in another ColumnFamily using the Name as Key and
then adding one column holding a reference to the User UUID. I also
insert a reference into another ColumnFamily holding a List of Users
partitioned by Date.

I'm really unsure about the index design, since the indexes don't get updated
when a User row is removed. I could hook into the remove call (like I
did into mutations) and cascade the deletes where needed, but 10+
years of SQL keep telling me I'm crazy for doing this stuff!

I'd really appreciate some feedback.

Cheers,
Thomas


Re: ec2 tests

2010-06-18 Thread Chris Dean
> @Chris, did you get any benchmarks you could share with us?

We're still working on it.  It's a lower priority task so it will take a
while to finish.  So far we've run on all the AWS data centers in the US
and used several different setups.  We also did a test on Rackspace with
one setup and some whitebox servers we had in the office.  (The whitebox
servers are still running I believe.)

I don't have the numbers here, but the fastest by far were the
non-virtualized whitebox servers.  No real surprise.  Rackspace was
faster than AWS US-West; US-West faster than US-East.

We always use 3 Cassandra servers and one or two machines to run
stress.py.  I don't think we're seeing the 7500 writes/sec so maybe our
config is wrong.  You'll have to be patient until my colleague writes
this all up.

Cheers,
Chris Dean


Re: Occasional 10s Timeouts on Read

2010-06-18 Thread Jonathan Ellis
set log level to TRACE and see if the OutboundTcpConnection is going
bad.  that would explain the message never arriving.

On Fri, Jun 18, 2010 at 10:39 AM, AJ Slater  wrote:
> To summarize:
>
> If a request for a column comes in *after a period of several hours
> with no requests*, then the node servicing the request hangs while
> looking for its peer rather than servicing the request like it should.
> It then throws either a TimedOutException or a (wrong)
> NotFoundExeption.
>
> And it doen't appear to actually send the message it says it does to
> its peer. Or at least its peer doesn't report the request being
> received.
>
> And then the situation magically clears up after approximately 2 minutes.
>
> However, if the idle period never occurs, then the problem does not
> manifest. If I run a cron job with wget against my server every
> minute, I do not see the problem.
>
> I'll be looking at some tcpdump logs to see if i can suss out what's
> really happening, and perhaps file this as a bug. The several hours
> between reproducible events makes this whole thing aggravating for
> detection, debugging and I'll assume, fixing, if it is indeed a
> cassandra problem.
>
> It was suggested on IRC that it may be my network. But gossip is
> continually sending heartbeats and nodetool and the logs show the
> nodes as up and available. If my network was flaking out I'd think it
> would be dropping heartbeats and I'd see that.
>
> AJ
>
> On Thu, Jun 17, 2010 at 2:26 PM, AJ Slater  wrote:
>> These are physical machines.
>>
>> storage-conf.xml.fs03 is here:
>>
>> http://pastebin.com/weL41NB1
>>
>> Diffs from that for the other two storage-confs are inline here:
>>
>> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03
>> storage-conf.xml.fs01
>> 185c185
>>
>>>   71603818521973537678586548668074777838
>> 229c229
>> <   10.33.2.70
>> ---
>>>   10.33.3.10
>> 241c241
>> <   10.33.2.70
>> ---
>>>   10.33.3.10
>> 341c341
>> <   16
>> ---
>>>   4
>>
>>
>> a...@worm:../Z3/cassandra/conf/dev$ diff storage-conf.xml.lpc03
>> storage-conf.xml.fs02
>> 185c185
>> <   0
>> ---
>>>   120215585224964746744782921158327379306
>> 206d205
>> <       10.33.3.20
>> 229c228
>> <   10.33.2.70
>> ---
>>>   10.33.3.20
>> 241c240
>> <   10.33.2.70
>> ---
>>>   10.33.3.20
>> 341c340
>> <   16
>> ---
>>>   4
>>
>>
>> Thank you for your attention,
>>
>> AJ
>>
>>
>> On Thu, Jun 17, 2010 at 2:09 PM, Benjamin Black  wrote:
>>> Are these physical machines or virtuals?  Did you post your
>>> cassandra.in.sh and storage-conf.xml someplace?
>>>
>>> On Thu, Jun 17, 2010 at 10:31 AM, AJ Slater  wrote:
 Total data size in the entire cluster is about twenty 12k images. With
 no other load on the system. I just ask for one column and I get these
 timeouts. Performing multiple gets on the columns leads to multiple
 timeouts for a period of a few seconds or minutes and then the
 situation magically resolves itself and response times are down to
 single digit milliseconds for a column get.

 On Thu, Jun 17, 2010 at 10:24 AM, AJ Slater  wrote:
> Cassandra 0.6.2 from the apache debian source.
 Ubuntu Jaunty. Sun Java6 jvm.
>
> All nodes in separate racks at 365 main.
>
> On Thu, Jun 17, 2010 at 10:12 AM, AJ Slater  wrote:
>> I'm seeing 10s timeouts on reads a few times a day. It's hard to reproduce
>> consistently, but it seems to happen most often after it's been a long time
>> between reads. After presenting itself for a couple of minutes the
>> problem then goes away.
>>
>> I've got a three node cluster with replication factor 2, reading at
>> consistency level ONE. The columns being read are around 12k each. The
>> nodes are 8GB multicore boxes with the JVM limits between 4GB and 6GB.
>>
>> Here's an application log from early this morning when a developer in
>> Belgrade accessed the system:
>>
>> Jun 17 03:54:17 lpc03 pinhole[5736]: MainThread:pinhole.py:61 |
>> Requested image_id: 5827067133c3d670071c17d9144f0b49
>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:76 |
>> TimedOutException for Image 5827067133c3d670071c17d9144f0b49
>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image
>> Get took 10005.388975 ms
>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:pinhole.py:61 |
>> Requested image_id: af8caf3b76ce97d13812ddf795104a5c
>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image
>> Get took 3.658056 ms
>> Jun 17 03:54:27 lpc03 pinhole[5736]: MainThread:zlog.py:105 | Image
>> Transform took 0.978947 ms
>>
>> That's a Timeout and then a successful get of another column.
>>
>> Here's the cassandra log for 10.33.2.70:
>>
>> DEBUG 03:54:17,070 get_slice
>> DEBUG 03:54:17,071 weakreadremote reading
>> SliceFromReadCommand(table='jolitics.com',
>> key='5827067133c3d670071c17d9144f0b49',
>> column_parent='QueryPath(columnFami

Re: Failover and slow nodes

2010-06-18 Thread Jonathan Ellis
My guess?  8-10 weeks.

On Fri, Jun 18, 2010 at 1:31 PM, James Golick  wrote:
> What's the current timeframe on 0.7?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Possible bug in Cassandra MapReduce

2010-06-18 Thread Jonathan Ellis
Fixed for 0.6.3: https://issues.apache.org/jira/browse/CASSANDRA-1042

On Fri, Jun 18, 2010 at 2:49 PM, Corey Hulen  wrote:
>
> We are using MapReduce to periodical verify and rebuild our secondary
> indexes along with counting total records.  We started to noticed double
> counting of unique keys on single machine standalone tests. We were finally
> able to reproduce the problem using
> the apache-cassandra-0.6.2-src/contrib/word_count example and just
> re-running it multiple times.  We are hoping someone can verify the bug.
> re-run the tests and the word count for /tmp/word_count3/part-r-0 will
> be 1000 +~200  and will change if you blow the data away and re-run.  Notice
> the setup script loops and only inserts 1000 records so we expect count to
> be 1000.  Once the data is generated then re-running the setup script and/or
> mapreduce doesn't change the number (still off).  The key is to blow all the
> data away and start over which will cause it to change.
> Can someone please verify this behavior?
> -Corey



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Java-Client returns empty data sets after Cassandra sits for a while

2010-06-18 Thread Jonathan Ellis
I can't think of any scenario where leaving Cassandra idle would
affect the results returned.  I think something else is going on here.

On Fri, Jun 18, 2010 at 2:05 AM, Manfred Muench wrote:
> Hi,
>
> I have noticed the following behaviour (bug?) which I don't completely
> understand:
> 1. start Cassandra (I'm using 0.6.2, but it also appears in 0.6.1)
> 2. work with it (I'm using Java thrift API)
> 3. let it sit for a long time (in my case: a day or more) without
> issuing any command
> 4. go back to (2) -- but now Cassandra always returns empty data sets to
> queries in Java. The command line interface works, no matter if left
> open or started newly.
>
> Here's how I connect to Cassandra (leaving exception handling out for better
> readability):
>
> -
> ...
> import org.apache.cassandra.thrift.Cassandra;
> import org.apache.thrift.protocol.TBinaryProtocol;
> import org.apache.thrift.protocol.TProtocol;
> import org.apache.thrift.transport.TSocket;
> ...
>
> TTransport transport = new TSocket(cassandraHost, cassandraPort);
> TProtocol protocol = new TBinaryProtocol(transport);
> Cassandra.Client client = new Cassandra.Client(protocol);
> transport.open();
> ...
> List<KeySlice> keySlices = client.get_range_slices(...);
> ...
> transport.flush();
> transport.close();
> ...
> -
>
> This code usually works, but after leaving Cassandra running unused for a
> couple of hours (days), this code connects fine to Cassandra, but the
> client.get_range_slices returns an empty result set.
>
> I am not very sure, but I believe it happens after compacting. Need to do
> more tests on this one.
>
> Does anybody know what I'm doing wrong here? Is there any kind of
> "initialisation step" that I should have taken before running queries?
>
> If you need more (debug) information on this matter, please let me know how
> I can provide you with it. The log files didn't show anything while running
> the query. The last log message was:
>
>  INFO [COMPACTION-POOL:1] 2010-06-18 14:07:45,882 CompactionManager.java
> (line 246) Compacting []
>
> I ran the query at around 14:20, no other message after this one.
>
> Thanks for your help in advance!
>
> Cheers,
> Manfred
>
> --
> Dr. Manfred Muench
> Nanjing Imperiosus Technology Co. Ltd.
> Wu Xing Nian Hua Da Sha, Room 1004
> 134 Hanzhong Lu, Nanjing, P.R. China
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Possible bug in Cassandra MapReduce

2010-06-18 Thread Corey Hulen
Awesome...thanks.

I just downloaded the patch and applied it and verified it fixes our
problems.

What's the ETA on 0.6.3?  (Debating whether to tolerate it or maintain
our own 0.6.2+patch.)

-Corey

On Fri, Jun 18, 2010 at 8:21 PM, Jonathan Ellis  wrote:

> Fixed for 0.6.3: https://issues.apache.org/jira/browse/CASSANDRA-1042
>
> On Fri, Jun 18, 2010 at 2:49 PM, Corey Hulen  wrote:
> >
> > We are using MapReduce to periodical verify and rebuild our secondary
> > indexes along with counting total records.  We started to noticed double
> > counting of unique keys on single machine standalone tests. We were
> finally
> > able to reproduce the problem using
> > the apache-cassandra-0.6.2-src/contrib/word_count example and just
> > re-running it multiple times.  We are hoping someone can verify the bug.
> > re-run the tests and the word count for /tmp/word_count3/part-r-0
> will
> > be 1000 +~200  and will change if you blow the data away and re-run.
>  Notice
> > the setup script loops and only inserts 1000 records so we expect count
> to
> > be 1000.  Once the data is generated then re-running the setup script
> and/or
> > mapreduce doesn't change the number (still off).  The key is to blow all
> the
> > data away and start over which will cause it to change.
> > Can someone please verify this behavior?
> > -Corey
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Possible bug in Cassandra MapReduce

2010-06-18 Thread Jonathan Ellis
Looks like the end of June.

On Fri, Jun 18, 2010 at 8:38 PM, Corey Hulen  wrote:
> Awesome...thanks.
> I just downloaded the patch and applied it and verified it fixes our
> problems.
> What's the ETA on 0.6.3?  (Debating whether to tolerate it or maintain
> our own 0.6.2+patch.)
> -Corey
>
> On Fri, Jun 18, 2010 at 8:21 PM, Jonathan Ellis  wrote:
>>
>> Fixed for 0.6.3: https://issues.apache.org/jira/browse/CASSANDRA-1042
>>
>> On Fri, Jun 18, 2010 at 2:49 PM, Corey Hulen  wrote:
>> >
>> > We are using MapReduce to periodical verify and rebuild our secondary
>> > indexes along with counting total records.  We started to noticed double
>> > counting of unique keys on single machine standalone tests. We were
>> > finally
>> > able to reproduce the problem using
>> > the apache-cassandra-0.6.2-src/contrib/word_count example and just
>> > re-running it multiple times.  We are hoping someone can verify the bug.
>> > re-run the tests and the word count for /tmp/word_count3/part-r-0
>> > will
>> > be 1000 +~200  and will change if you blow the data away and re-run.
>> >  Notice
>> > the setup script loops and only inserts 1000 records so we expect count
>> > to
>> > be 1000.  Once the data is generated then re-running the setup script
>> > and/or
>> > mapreduce doesn't change the number (still off).  The key is to blow all
>> > the
>> > data away and start over which will cause it to change.
>> > Can someone please verify this behavior?
>> > -Corey
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Lucandra issues

2010-06-18 Thread Jake Luciani

Hi Maxim,

Lucandra doesn't support numeric queries quite yet. A workaround would
be to load your numbers and convert them to strings.

I'll eventually add support for this. Please feel free to help out if
you can :)


Jake
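
A sketch of the strings workaround (an assumption of mine, not Lucandra code):
zero-pad the numbers at index time so lexicographic order matches numeric order
(for non-negative values), then query with a plain TermRangeQuery (Lucene
2.9/3.x) instead of NumericRangeQuery:

import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermRangeQuery;

public final class PaddedLongRange {
    // the field must be indexed with the same padding, e.g. String.format("%019d", value)
    public static Query longRange(String field, long from, long to) {
        String lower = String.format("%019d", from);
        String upper = String.format("%019d", to);
        return new TermRangeQuery(field, lower, upper, true, true);
    }
}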



On Jun 17, 2010, at 1:16 PM, Maxim Kramarenko wrote:



Hello!

I am trying to rework our current lucene-based application to use
Lucandra. Note the following problem: when I try to use a
NumericRangeQuery like this one:

query.add(NumericRangeQuery.newLongRange("deliveryTimestampMinute",
    6, fromDate, toDate, true, true), BooleanClause.Occur.MUST);


I got the following exception:

java.lang.NullPointerException
org.apache.lucene.search.NumericRangeQuery$NumericRangeTermEnum.next(NumericRangeQuery.java:536)
org.apache.lucene.search.MultiTermQuery$ConstantScoreAutoRewrite.rewrite(MultiTermQuery.java:248)
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:371)
org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:386)
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:267)
org.apache.lucene.search.Query.weight(Query.java:100)
org.apache.lucene.search.Searcher.createWeight(Searcher.java:147)
org.apache.lucene.search.Searcher.search(Searcher.java:98)
org.apache.lucene.search.Searcher.search(Searcher.java:108)
===

Any workaround for this issue?

--
Best regards,
Maxim    mailto:maxi...@trackstudio.com