Re: Issues getting JNA to work correctly under centos 5.5 using cassandra 0.7.0-rc1 and JNA 2.7.3

2010-11-29 Thread jasonmpell
I checked and /etc/security/limits.conf on redhat supports zero (0) to
mean unlimited.  Here is the sample from the man page.  Notice the
soft core entry.

EXAMPLES
   These are some example lines which might be specified in
   /etc/security/limits.conf.

   *               soft    core            0
   *               hard    rss             1
   @student        hard    nproc           20
   @faculty        soft    nproc           20
   @faculty        hard    nproc           50
   ftp             hard    nproc           0
   @student        -       maxlogins       4



On Mon, Nov 29, 2010 at 6:51 AM, Jason Pell  wrote:
> Ok that's a good point i will check - I am not sure.
>
> Sent from my iPhone
> On Nov 29, 2010, at 5:53, Tyler Hobbs  wrote:
>
> I'm not familiar with ulimit on RedHat systems, but are you sure you
> have ulimit set correctly? Did you set it to '0' or 'unlimited'?  I ask
> because on a Debian system, I get this:
>
> tho...@~ $ ulimit -l
> unlimited
>
> Where you said that you got back '0'.
>
> - Tyler
>
> On Sun, Nov 28, 2010 at 1:15 AM, Jason Pell  wrote:
>>
>> Hi,
>>
>> I have selinux disabled via /etc/sysconfig/selinux already.  But I did
>> as you suggested anyway, even restarted the whole machine again too
>> and still no difference.  Do you know if there is a way to discover
>> exactly what this error means?
>>
>> Thanks
>> Jason
>>
>> On Sat, Nov 27, 2010 at 3:59 AM, Nate McCall  wrote:
>> > This might be an issue with selinux. You can try this quickly to
>> > temporarily disable selinux enforcement:
>> > /usr/sbin/setenforce 0  (as root)
>> >
>> > and then start cassandra as your user.
>> >
>> > On Fri, Nov 26, 2010 at 1:00 AM, Jason Pell 
>> > wrote:
>> >> I restarted the box :-) so it's well and truly set
>> >>
>> >> Sent from my iPhone
>> >> On Nov 26, 2010, at 17:57, Brandon Williams  wrote:
>> >>
>> >> On Thu, Nov 25, 2010 at 10:02 PM, Jason Pell 
>> >> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I have set the memlock limit to unlimited in /etc/security/limits.conf
>> >>>
>> >>> [devel...@localhost apache-cassandra-0.7.0-rc1]$ ulimit -l
>> >>> 0
>> >>>
>> >>> Running as a non-root user gets me an Unknown mlockall error 1
>> >>
>> >> Have you tried logging out and back in after changing limits.conf?
>> >> -Brandon
>> >
>
>


Re: Taking down a node in a 3-node cluster, RF=2

2010-11-29 Thread David Boxenhorn
For that matter, RF=1 and QUORUM are incompatible (if you want to be able to
take a node down).

In other words, if you use QUORUM, you need RF>=3.
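
The arithmetic behind this, as a quick sketch (quorum is a majority of the
replicas, i.e. floor(RF/2) + 1):

    // Quorum size for a given replication factor (integer division floors):
    static int quorum(int rf) { return rf / 2 + 1; }
    // rf=2 -> quorum=2: both replicas must be up, so neither can be taken down.
    // rf=3 -> quorum=2: any one of the three replicas may be down.
    // In general you can lose rf - quorum(rf) replicas and still satisfy QUORUM.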

On Sun, Nov 28, 2010 at 8:04 PM, Jake Luciani  wrote:

> Right.
>
>
> On Sun, Nov 28, 2010 at 1:03 PM, David Boxenhorn wrote:
>
>> OK. To sum up: RF=2 and QUORUM are incompatible (if you want to be able to
>> take a node down).
>>
>> Right?
>>
>> On Sun, Nov 28, 2010 at 7:59 PM, Jake Luciani  wrote:
>>
>>> I was wrong on this scenario and I'll explain where I was incorrect.
>>>
>>> Hints are stored for a downed node but they don't count towards meeting a
>>> consistency level.
>>> Let's take 2 scenarios:
>>>
>>> RF=6, Nodes=10
>>>
>>> If you READ/WRITE with CL.QUORUM you will need 4 live replicas. If one is
>>> down, you still have 5 active replicas to write to; one of these will
>>> store a hint and update the downed node when it comes back.
>>>
>>> RF=2, Nodes=3
>>>
>>> If you READ/WRITE with CL.QUORUM you need 2 live replicas.  If one of these
>>> 2 is down you can't meet the QUORUM level, so the write will fail.
>>>
>>> In your scenario your best bet is to update to RF=3; then any two replicas
>>> will satisfy QUORUM.
>>>
>>> Sorry for the confusion,
>>>
>>> -Jake
>>>
>>> On Sun, Nov 28, 2010 at 12:26 PM, David Boxenhorn wrote:
>>>
 Thank you, Jake. It does... except that in another context you told me:

 Hints only happen when a node is unavailable and you are writing with
 CL.ANY
 If you never write with CL.ANY then you can turn off hinted handoff.

 How do I reconcile this?


 On Sun, Nov 28, 2010 at 7:11 PM, Jake Luciani  wrote:

> If you read/write data with quorum then you can safely take a node down
> in this scenario.  Subsequent writes will use hinted handoff to be passed 
> to
> the node when it comes back up.
>
> More info is here: http://wiki.apache.org/cassandra/HintedHandoff
>
> Does that answer your question?
>
> -Jake
>
>
> On Sun, Nov 28, 2010 at 9:42 AM, Ran Tavory  wrote:
>
>> To me it makes sense that if hinted handoff is off, then cassandra
>> cannot satisfy 2 out of every 3 writes when one of the nodes is
>> down, since this node is a designated replica for 2/3 of the writes.
>> But I don't remember reading this somewhere. Does hinted handoff
>> affect David's situation?
>> (David, did you disable HH in your storage-config,
>> i.e. set hinted handoff to false?)
>>
>>
>> On Sun, Nov 28, 2010 at 4:32 PM, David Boxenhorn 
>> wrote:
>>
>>> For the vast majority of my data usage eventual consistency is fine
>>> (i.e. CL=ONE) but I have a small amount of critical data for which I 
>>> read
>>> and write using CL=QUORUM.
>>>
>>> If I have a cluster with 3 nodes and RF=2, and CL=QUORUM, does that
>>> mean that a value can be read from or written to any 2 nodes, or does it
>>> have to be the particular 2 nodes that store the data? If it is the
>>> particular 2 nodes that store the data, that means that I can't even 
>>> take
>>> down one node, since it will be the mandatory 2nd node for 1/3 of my 
>>> data...
>>>
>>>
>>
>>
>>
>> --
>> /Ran
>>
>>
>

>>>
>>
>


Booting Cassandra v0.7.0 on Windows: rename failed

2010-11-29 Thread Ramon Rockx
Hi,
 
Recently I downloaded Cassandra v0.7.0 rc1. When I try to run cassandra
it ends with the following logging:
 
 INFO 09:17:30,044 Enqueuing flush of
memtable-locationi...@839514767(643 bytes, 12 operations)
 INFO 09:17:30,045 Writing memtable-locationi...@839514767(643 bytes, 12
operations)
ERROR 09:17:30,233 Fatal exception in thread
Thread[FlushWriter:1,5,main]
java.io.IOError: java.io.IOException: rename failed of
d:\cassandra\data\system\LocationInfo-e-1-Data.db
 at
org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
214)
 at
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
Writer.java:184)
 at
org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
Writer.java:167)
 at
org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:161)
 at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
 at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
 at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
r.java:886)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
va:908)
 at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: rename failed of
d:\cassandra\data\system\LocationInfo-e-1-Data.db
 at
org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.jav
a:359)
 at
org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
210)
 ... 12 more

Operating system is Windows 7. Tried it also on Windows 2003 server.
I only modified a few (necessary) path settings in cassandra.yaml:

commitlog_directory: d:/cassandra/commitlog
data_file_directories:
- d:/cassandra/data
saved_caches_directory: d:/cassandra/saved_caches

Does anybody know what I'm doing wrong?

Regards,
Ramon


word_count example fails in multi-node configuration

2010-11-29 Thread RS
Hi guys,

I am trying to run word_count example from contrib directory (0.7 beta
3 and 0.7.0 rc 1).
It works fine in a single-node configuration, but fails with 2+ nodes.

It fails in the assert statement, which caused problems before
(https://issues.apache.org/jira/browse/CASSANDRA-1700).

Here's a simple ring I have and error messages.
---
Address         Status State   Load            Owns    Token

143797990709940316224804537595633718982
127.0.0.2   Up Normal  40.2 KB 51.38%
61078635599166706937511052402724559481
127.0.0.1   Up Normal  36.01 KB48.62%
143797990709940316224804537595633718982
---
[SERVER SIDE]

ERROR 17:39:57,098 Fatal exception in thread Thread[ReadStage:4,5,main]
java.lang.AssertionError:
(143797990709940316224804537595633718982,61078635599166706937511052402724559481]
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1273)
at 
org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:48)
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
---
[CLIENT_SIDE]
java.lang.RuntimeException: org.apache.thrift.TApplicationException:
Internal error processing get_range_slices
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189)
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148)
at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
at 
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: org.apache.thrift.TApplicationException: Internal error
processing get_range_slices
at 
org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724)
at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255)
... 11 more
---

Looks like tokens used in ColumnFamilySplits
(ColumnFamilyInputFormat.java) are on wrapping ranges (left_token >
right_token).
Any ideas how to fix this?

--
Regards,
Roman
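
For reference, a token range (left, right] "wraps" when left >= right: it
covers everything from left around the end of the ring back to right. A
membership-test sketch (BigInteger tokens, RandomPartitioner-style; not
Cassandra's actual code):

    import java.math.BigInteger;

    // true if token t falls in the ring range (left, right]
    static boolean inRange(BigInteger t, BigInteger left, BigInteger right) {
        if (left.compareTo(right) < 0)      // ordinary, non-wrapping range
            return t.compareTo(left) > 0 && t.compareTo(right) <= 0;
        // wrapping range: everything after left, plus everything up to right
        return t.compareTo(left) > 0 || t.compareTo(right) <= 0;
    }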


Cassandra 0.7 beta 3 outOfMemory (OOM)

2010-11-29 Thread cassandra

Hi community,

During my tests I had several OOM crashes.
Getting some hints to find the problem would be nice.

Cassandra first crashed after about 45 minutes of running an insert test script.
During the following tests the time to OOM got shorter, until it started to
crash even in "idle" mode.

Here the facts:
- cassandra 0.7 beta 3
- using lucandra to index about 3 million files ~1kb data
- inserting with one client to one cassandra node with about 200 files/s
- cassandra data files for this keyspace grow up to about 20 GB
- the keyspace only contains the two lucandra specific CFs

Cluster:
- cassandra single node on Windows 32-bit, Xeon 2.5 GHz, 4 GB RAM
- java jre 1.6.0_22
- heap space first 1 GB, later increased to 1.3 GB

Cassandra.yaml:
default + reduced "binary_memtable_throughput_in_mb" to 128

CFs:
default + reduced
min_compaction_threshold: 4
max_compaction_threshold: 8


I think the problem always appears during compaction,
and perhaps it is a result of large rows (some about 170 MB).

Are there more options we could use to work with little memory?

Is it a problem of compaction?
And how do we avoid it?
Slower inserts? More memory?
Even lower memtable_throughput or in_memory_compaction_limit?
Continuous manual major compaction?

I've read  
http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors

- row_size should be fixed since 0.7, and 200 MB is still far away from 2 GB
- only key cache is used a little bit 3600/2
- after a lot of writes cassandra crashes even in idle mode
- memtablesize was reduced and there are only 2 CFs

Several heap dumps in MAT show 60-99% heap usage by the compaction thread.

Here some log extract:

 INFO [CompactionExecutor:1] 2010-11-26 14:18:18,593  
CompactionIterator.java (line 134) Compacting large row  
6650325572717566efbfbf44545241434b53efbfbf31 (172967291 bytes)  
incrementally
 INFO [ScheduledTasks:1] 2010-11-26 14:18:41,421 GCInspector.java  
(line 133) GC for ParNew: 365 ms, 54551328 reclaimed leaving 459496840  
used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:42,437 GCInspector.java  
(line 133) GC for ParNew: 226 ms, 12469104 reclaimed leaving 554506776  
used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:43,453 GCInspector.java  
(line 133) GC for ParNew: 224 ms, 12777840 reclaimed leaving 649207976  
used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:44,468 GCInspector.java  
(line 133) GC for ParNew: 225 ms, 12564144 reclaimed leaving 744122872  
used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:45,468 GCInspector.java  
(line 133) GC for ParNew: 222 ms, 16020328 reclaimed leaving 835581584  
used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:46,468 GCInspector.java  
(line 133) GC for ParNew: 226 ms, 12697912 reclaimed leaving 930362712  
used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:47,468 GCInspector.java  
(line 133) GC for ParNew: 227 ms, 15816872 reclaimed leaving  
1022026288 used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:48,484 GCInspector.java  
(line 133) GC for ParNew: 258 ms, 12746584 reclaimed leaving  
1116758744 used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:49,484 GCInspector.java  
(line 133) GC for ParNew: 257 ms, 12802608 reclaimed leaving  
1211435176 used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 133) GC for ConcurrentMarkSweep: 4188 ms, 271308512 reclaimed  
leaving 1047605704 used; max is 1450442752
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 153) Pool NameActive   Pending
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ResponseStage 0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ReadStage 0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ReadRepair0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) MutationStage 0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) GossipStage   0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) AntientropyStage  0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) MigrationStage0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) StreamStage   0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) MemtablePostFlusher   0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) FlushWriter   0 0
 INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 160) MiscStage  

The key list of multiget_slice's parameter has been changed unexpectedly.

2010-11-29 Thread eggli

Hi everyone, we have been working on a Java product based on Cassandra since
0.5. Cassandra made a very big change in 0.7 beta 2, which turned all byte
arrays into ByteBuffers, and we found a problem that confuses us a lot. Here
are the details of what happened:

The multiget_slice method in Cassandra.Iface requires a list of keys for a
multi-get slice query, so we believed we had to give every individual key to
get the data we need. According to the Javadoc, we get back a Map that uses a
ByteBuffer as key and ColumnOrSuperColumn as value. We assumed the ByteBuffer
is the key we sent with the query, so for a key list (A, B, C) the result Map
should look like:


Key of A -> Data of A
Key of B -> Data of B
Key of C -> Data of C

To get Data of A from the result map, all we need to do is perform a
resultMap.get(A), but we hit a problem here: the result map's keys are
something else; they are not the keys we gave before. In the case above, they
are no longer (Key of A, Key of B, Key of C), while the values are exactly the
data we need, so it is very troublesome that we are unable to find the
corresponding data from a key.

We guessed that the key ByteBuffers were changed during the query process,
since they are passed by reference, and we found this in the server's source
code, in org.apache.cassandra.thrift.CassandraServer's getSlice method, which
looks like the place where the key is changed unexpectedly:

columnFamilies.get(StorageService.getPartitioner().decorateKey(command.key));

It looks like the key has been "decorated" for some purpose and modified in
the process due to the nature of ByteBuffer, and the decorated key has been
used as the key in the result map:

columnFamiliesMap.put(command.key, thriftifiedColumns);

Have we misinterpreted the Javadoc API, or is this a bug?
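
For reference, a minimal fragment of the usage described above, assuming the
0.7 Thrift signatures (client is assumed to be a connected Cassandra.Client
with set_keyspace already called, and "MyCF" is a made-up column family name).
ByteBuffer keys compare by content, so a lookup with the original buffer
should work once the buffers are no longer modified underneath you:

    import java.nio.ByteBuffer;
    import java.nio.charset.Charset;
    import java.util.*;
    import org.apache.cassandra.thrift.*;

    ByteBuffer keyA = ByteBuffer.wrap("A".getBytes(Charset.forName("UTF-8")));
    ByteBuffer keyB = ByteBuffer.wrap("B".getBytes(Charset.forName("UTF-8")));

    ColumnParent parent = new ColumnParent("MyCF");      // hypothetical CF name
    SlicePredicate predicate = new SlicePredicate();
    predicate.setSlice_range(new SliceRange(ByteBuffer.wrap(new byte[0]),
                                            ByteBuffer.wrap(new byte[0]),
                                            false, 100));

    Map<ByteBuffer, List<ColumnOrSuperColumn>> result =
        client.multiget_slice(Arrays.asList(keyA, keyB), parent, predicate,
                              ConsistencyLevel.ONE);

    // ByteBuffer.equals/hashCode are based on the remaining content, so this
    // lookup works as long as neither side's position/limit was disturbed.
    List<ColumnOrSuperColumn> dataOfA = result.get(keyA);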


Re: The key list of multiget_slice's parameter has been changed unexpectedly.

2010-11-29 Thread Sylvain Lebresne
You should start by trying 0.7 RC1. Some bugs with the use of
ByteBuffers have been corrected since beta2.

If you still have a problem, then it's likely a bug; the ByteBuffer
should not be changed from under you.
If it still doesn't work with RC1, it would be very helpful if you could
provide a simple script that reproduces the
behavior you describe.

On Mon, Nov 29, 2010 at 12:07 PM, eggli  wrote:
>
> Hi everyone, we have been working on a Java product based on Cassandra since
> 0.5. Cassandra made a very big change in 0.7 beta 2, which turned all byte
> arrays into ByteBuffers, and we found a problem that confuses us a lot. Here
> are the details of what happened:
>
> The multiget_slice method in Cassandra.Iface requires a list of keys for a
> multi-get slice query, so we believed we had to give every individual key to
> get the data we need. According to the Javadoc, we get back a Map that uses a
> ByteBuffer as key and ColumnOrSuperColumn as value. We assumed the ByteBuffer
> is the key we sent with the query, so for a key list (A, B, C) the result Map
> should look like:
>
>
> Key of A -> Data of A
> Key of B -> Data of B
> Key of C -> Data of C
>
> To get Data of A from the result map, all we need to do is perform a
> resultMap.get(A), but we hit a problem here: the result map's keys are
> something else; they are not the keys we gave before. In the case above, they
> are no longer (Key of A, Key of B, Key of C), while the values are exactly
> the data we need, so it is very troublesome that we are unable to find the
> corresponding data from a key.
>
> We guessed that the key ByteBuffers were changed during the query process,
> since they are passed by reference, and we found this in the server's source
> code, in org.apache.cassandra.thrift.CassandraServer's getSlice method, which
> looks like the place where the key is changed unexpectedly:
>
> columnFamilies.get(StorageService.getPartitioner().decorateKey(command.key));
>
> It looks like the key has been "decorated" for some purpose and modified in
> the process due to the nature of ByteBuffer, and the decorated key has been
> used as the key in the result map:
>
> columnFamiliesMap.put(command.key, thriftifiedColumns);
>
> Have we misinterpreted the Javadoc API, or is this a bug?
>


Re: Booting Cassandra v0.7.0 on Windows: rename failed

2010-11-29 Thread Gary Dusbabek
Windows is notoriously bad about hanging on to file handles.  Make
sure there are no explorer windows or command line windows open to
d:\cassandra\data\system\, and then hope for the best.

Gary.
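
One quick way to see what is actually holding a file open, assuming the
Sysinternals Handle tool is available on the box (the invocation below is
illustrative):

    C:\> handle.exe LocationInfo

That lists every process with an open handle whose name contains
"LocationInfo".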

On Mon, Nov 29, 2010 at 02:49, Ramon Rockx  wrote:
> Hi,
>
> Recently I downloaded Cassandra v0.7.0 rc1. When I try to run cassandra
> it ends with the following logging:
>
>  INFO 09:17:30,044 Enqueuing flush of
> memtable-locationi...@839514767(643 bytes, 12 operations)
>  INFO 09:17:30,045 Writing memtable-locationi...@839514767(643 bytes, 12
> operations)
> ERROR 09:17:30,233 Fatal exception in thread
> Thread[FlushWriter:1,5,main]
> java.io.IOError: java.io.IOException: rename failed of
> d:\cassandra\data\system\LocationInfo-e-1-Data.db
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
> 214)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
> Writer.java:184)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
> Writer.java:167)
>  at
> org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:161)
>  at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
>  at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
>  at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>  at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
> r.java:886)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:908)
>  at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: rename failed of
> d:\cassandra\data\system\LocationInfo-e-1-Data.db
>  at
> org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.jav
> a:359)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
> 210)
>  ... 12 more
>
> Operating system is Windows 7. Tried it also on Windows 2003 server.
> I only modified a few (necessary) path settings in cassandra.yaml:
>
> commitlog_directory: d:/cassandra/commitlog
> data_file_directories:
> - d:/cassandra/data
> saved_caches_directory: d:/cassandra/saved_caches
>
> Does anybody know what I'm doing wrong?
>
> Regards,
> Ramon
>


Re: Booting Cassandra v0.7.0 on Windows: rename failed

2010-11-29 Thread Jonathan Ellis
Please report a bug at https://issues.apache.org/jira/browse/CASSANDRA

On Mon, Nov 29, 2010 at 2:49 AM, Ramon Rockx  wrote:
> Hi,
>
> Recently I downloaded Cassandra v0.7.0 rc1. When I try to run cassandra
> it ends with the following logging:
>
>  INFO 09:17:30,044 Enqueuing flush of
> memtable-locationi...@839514767(643 bytes, 12 operations)
>  INFO 09:17:30,045 Writing memtable-locationi...@839514767(643 bytes, 12
> operations)
> ERROR 09:17:30,233 Fatal exception in thread
> Thread[FlushWriter:1,5,main]
> java.io.IOError: java.io.IOException: rename failed of
> d:\cassandra\data\system\LocationInfo-e-1-Data.db
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
> 214)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
> Writer.java:184)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
> Writer.java:167)
>  at
> org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:161)
>  at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
>  at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
>  at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>  at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
> r.java:886)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:908)
>  at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: rename failed of
> d:\cassandra\data\system\LocationInfo-e-1-Data.db
>  at
> org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.jav
> a:359)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
> 210)
>  ... 12 more
>
> Operating system is Windows 7. Tried it also on Windows 2003 server.
> I only modified a few (necessary) path settings in cassandra.yaml:
>
> commitlog_directory: d:/cassandra/commitlog
> data_file_directories:
> - d:/cassandra/data
> saved_caches_directory: d:/cassandra/saved_caches
>
> Does anybody know what I'm doing wrong?
>
> Regards,
> Ramon
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Solr DataImportHandler (DIH) and Cassandra

2010-11-29 Thread Mark

Is there any way to use DIH to import from Cassandra? Thanks


RE: Booting Cassandra v0.7.0 on Windows: rename failed

2010-11-29 Thread Viktor Jevdokimov
This isn't the first time Cassandra has had I/O issues on Windows.

I think it's not easy to review the source code and eliminate such issues, but
I would like the developers to keep such issues in mind in the future.

We're also running a Cassandra cluster on Windows, but on 0.7 beta1 (with a
similar issue, but for the commit log), and we are waiting for the 0.7 release
to use it fully in production.


Viktor

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Monday, November 29, 2010 5:09 PM
To: user
Subject: Re: Booting Cassandra v0.7.0 on Windows: rename failed

Please report a bug at https://issues.apache.org/jira/browse/CASSANDRA

On Mon, Nov 29, 2010 at 2:49 AM, Ramon Rockx  wrote:
> Hi,
>
> Recently I downloaded Cassandra v0.7.0 rc1. When I try to run cassandra
> it ends with the following logging:
>
>  INFO 09:17:30,044 Enqueuing flush of
> memtable-locationi...@839514767(643 bytes, 12 operations)
>  INFO 09:17:30,045 Writing memtable-locationi...@839514767(643 bytes, 12
> operations)
> ERROR 09:17:30,233 Fatal exception in thread
> Thread[FlushWriter:1,5,main]
> java.io.IOError: java.io.IOException: rename failed of
> d:\cassandra\data\system\LocationInfo-e-1-Data.db
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
> 214)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
> Writer.java:184)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
> Writer.java:167)
>  at
> org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:161)
>  at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
>  at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
>  at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>  at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
> r.java:886)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:908)
>  at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: rename failed of
> d:\cassandra\data\system\LocationInfo-e-1-Data.db
>  at
> org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.jav
> a:359)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
> 210)
>  ... 12 more
>
> Operating system is Windows 7. Tried it also on Windows 2003 server.
> I only modified a few (necessary) path settings in cassandra.yaml:
>
> commitlog_directory: d:/cassandra/commitlog
> data_file_directories:
> - d:/cassandra/data
> saved_caches_directory: d:/cassandra/saved_caches
>
> Does anybody know what I'm doing wrong?
>
> Regards,
> Ramon
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com




RE: Booting Cassandra v0.7.0 on Windows: rename failed

2010-11-29 Thread Aditya Muralidharan
I've run into this as well. Having confirmed that there are no handles on the
file (it's only ever created and used by Cassandra), and having stepped through
the code, I've concluded that something in the I/O stack (not sure if it's the
JVM or the OS) is lazy about releasing the file handle for RandomAccessFiles.
I was able to get past these issues by setting a breakpoint after the call to
close (on the file-to-be-renamed), waiting 30 seconds, then resuming the
thread. Basically, Cassandra won't start on Windows 7 in its current state.

AD
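
For what it's worth, a common workaround sketch for this class of Windows
problem is to retry the rename briefly. renameWithRetry below is a
hypothetical helper, not Cassandra code; the System.gc() call is there because
finalization is sometimes what finally releases a lingering handle:

    import java.io.File;
    import java.io.IOException;

    static void renameWithRetry(File from, File to) throws IOException {
        for (int i = 0; i < 10; i++) {
            if (from.renameTo(to))
                return;                    // rename succeeded
            System.gc();                   // may release a not-yet-finalized handle
            try { Thread.sleep(500); } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        throw new IOException("rename failed of " + from);
    }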

-Original Message-
From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] 
Sent: Monday, November 29, 2010 10:13 AM
To: user@cassandra.apache.org
Subject: RE: Booting Cassandra v0.7.0 on Windows: rename failed

This isn't the first time Cassandra has had I/O issues on Windows.

I think it's not easy to review the source code and eliminate such issues, but
I would like the developers to keep such issues in mind in the future.

We're also running a Cassandra cluster on Windows, but on 0.7 beta1 (with a
similar issue, but for the commit log), and we are waiting for the 0.7 release
to use it fully in production.


Viktor

-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Monday, November 29, 2010 5:09 PM
To: user
Subject: Re: Booting Cassandra v0.7.0 on Windows: rename failed

Please report a bug at https://issues.apache.org/jira/browse/CASSANDRA

On Mon, Nov 29, 2010 at 2:49 AM, Ramon Rockx  wrote:
> Hi,
>
> Recently I downloaded Cassandra v0.7.0 rc1. When I try to run cassandra
> it ends with the following logging:
>
>  INFO 09:17:30,044 Enqueuing flush of
> memtable-locationi...@839514767(643 bytes, 12 operations)
>  INFO 09:17:30,045 Writing memtable-locationi...@839514767(643 bytes, 12
> operations)
> ERROR 09:17:30,233 Fatal exception in thread
> Thread[FlushWriter:1,5,main]
> java.io.IOError: java.io.IOException: rename failed of
> d:\cassandra\data\system\LocationInfo-e-1-Data.db
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
> 214)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
> Writer.java:184)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTable
> Writer.java:167)
>  at
> org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:161)
>  at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
>  at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
>  at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>  at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecuto
> r.java:886)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.ja
> va:908)
>  at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: rename failed of
> d:\cassandra\data\system\LocationInfo-e-1-Data.db
>  at
> org.apache.cassandra.utils.FBUtilities.renameWithConfirm(FBUtilities.jav
> a:359)
>  at
> org.apache.cassandra.io.sstable.SSTableWriter.rename(SSTableWriter.java:
> 210)
>  ... 12 more
>
> Operating system is Windows 7. Tried it also on Windows 2003 server.
> I only modified a few (necessary) path settings in cassandra.yaml:
>
> commitlog_directory: d:/cassandra/commitlog
> data_file_directories:
> - d:/cassandra/data
> saved_caches_directory: d:/cassandra/saved_caches
>
> Does anybody know what I'm doing wrong?
>
> Regards,
> Ramon
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com




unsubscribe

2010-11-29 Thread Dave Therrien



Re: word_count example fails in multi-node configuration

2010-11-29 Thread Jeremy Hanna
Roman:

I logged a jira ticket about this for further investigation, if you'd like to 
follow that.

https://issues.apache.org/jira/browse/CASSANDRA-1787

On Nov 29, 2010, at 3:14 AM, RS wrote:

> Hi guys,
> 
> I am trying to run word_count example from contrib directory (0.7 beta
> 3 and 0.7.0 rc 1).
> It works fine in a single-node configuration, but fails with 2+ nodes.
> 
> It fails in the assert statement, which caused problems before
> (https://issues.apache.org/jira/browse/CASSANDRA-1700).
> 
> Here's a simple ring I have and error messages.
> ---
> Address         Status State   Load            Owns    Token
> 
> 143797990709940316224804537595633718982
> 127.0.0.2   Up Normal  40.2 KB 51.38%
> 61078635599166706937511052402724559481
> 127.0.0.1   Up Normal  36.01 KB48.62%
> 143797990709940316224804537595633718982
> ---
> [SERVER SIDE]
> 
> ERROR 17:39:57,098 Fatal exception in thread Thread[ReadStage:4,5,main]
> java.lang.AssertionError:
> (143797990709940316224804537595633718982,61078635599166706937511052402724559481]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1273)
>   at 
> org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:48)
>   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> ---
> [CLIENT_SIDE]
> java.lang.RuntimeException: org.apache.thrift.TApplicationException:
> Internal error processing get_range_slices
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189)
>   at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>   at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148)
>   at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>   at 
> org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: org.apache.thrift.TApplicationException: Internal error
> processing get_range_slices
>   at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>   at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724)
>   at 
> org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255)
>   ... 11 more
> ---
> 
> Looks like tokens used in ColumnFamilySplits
> (ColumnFamilyInputFormat.java) are on wrapping ranges (left_token >
> right_token).
> Any ideas how to fix this?
> 
> --
> Regards,
> Roman



Re: Issues getting JNA to work correctly under centos 5.5 using cassandra 0.7.0-rc1 and JNA 2.7.3

2010-11-29 Thread Nate McCall
What does the current line(s) in limits.conf look like?

On Mon, Nov 29, 2010 at 2:01 AM,   wrote:
> I checked and /etc/security/limits.conf on redhat supports zero (0) to
> mean unlimited.  Here is the sample from the man page.  Notice the
> soft core entry.
>
> EXAMPLES
>       These are some example lines which might be specified in
>       /etc/security/limits.conf.
>
>       *               soft    core            0
>       *               hard    rss             1
>       @student        hard    nproc           20
>       @faculty        soft    nproc           20
>       @faculty        hard    nproc           50
>       ftp             hard    nproc           0
>       @student        -       maxlogins       4
>
>
>
> On Mon, Nov 29, 2010 at 6:51 AM, Jason Pell  wrote:
>> Ok that's a good point i will check - I am not sure.
>>
>> Sent from my iPhone
>> On Nov 29, 2010, at 5:53, Tyler Hobbs  wrote:
>>
>> I'm not familiar with ulimit on RedHat systems, but are you sure you
>> have ulimit set correctly? Did you set it to '0' or 'unlimited'?  I ask
>> because on a Debian system, I get this:
>>
>> tho...@~ $ ulimit -l
>> unlimited
>>
>> Where you said that you got back '0'.
>>
>> - Tyler
>>
>> On Sun, Nov 28, 2010 at 1:15 AM, Jason Pell  wrote:
>>>
>>> Hi,
>>>
>>> I have selinux disabled via /etc/sysconfig/selinux already.  But I did
>>> as you suggested anyway, even restarted the whole machine again too
>>> and still no difference.  Do you know if there is a way to discover
>>> exactly what this error means?
>>>
>>> Thanks
>>> Jason
>>>
>>> On Sat, Nov 27, 2010 at 3:59 AM, Nate McCall  wrote:
>>> > This might be an issue with selinux. You can try this quickly to
>>> > temporarily disable selinux enforcement:
>>> > /usr/sbin/setenforce 0  (as root)
>>> >
>>> > and then start cassandra as your user.
>>> >
>>> > On Fri, Nov 26, 2010 at 1:00 AM, Jason Pell 
>>> > wrote:
>>> >> I restarted the box :-) so it's well and truly set
>>> >>
>>> >> Sent from my iPhone
>>> >> On Nov 26, 2010, at 17:57, Brandon Williams  wrote:
>>> >>
>>> >> On Thu, Nov 25, 2010 at 10:02 PM, Jason Pell 
>>> >> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I have set the memlock limit to unlimited in /etc/security/limits.conf
>>> >>>
>>> >>> [devel...@localhost apache-cassandra-0.7.0-rc1]$ ulimit -l
>>> >>> 0
>>> >>>
>>> >>> Running as a non-root user gets me an Unknown mlockall error 1
>>> >>
>>> >> Have you tried logging out and back in after changing limits.conf?
>>> >> -Brandon
>>> >
>>
>>
>


Re: word_count example fails in multi-node configuration

2010-11-29 Thread Jeremy Hanna
Roman:

I commented on the ticket - would you mind answering on there?  
https://issues.apache.org/jira/browse/CASSANDRA-1787

Tx,

Jeremy

On Nov 29, 2010, at 3:14 AM, RS wrote:

> Hi guys,
> 
> I am trying to run word_count example from contrib directory (0.7 beta
> 3 and 0.7.0 rc 1).
> It works fine in a single-node configuration, but fails with 2+ nodes.
> 
> It fails in the assert statement, which caused problems before
> (https://issues.apache.org/jira/browse/CASSANDRA-1700).
> 
> Here's a simple ring I have and error messages.
> ---
> Address         Status State   Load            Owns    Token
> 
> 143797990709940316224804537595633718982
> 127.0.0.2   Up Normal  40.2 KB 51.38%
> 61078635599166706937511052402724559481
> 127.0.0.1   Up Normal  36.01 KB48.62%
> 143797990709940316224804537595633718982
> ---
> [SERVER SIDE]
> 
> ERROR 17:39:57,098 Fatal exception in thread Thread[ReadStage:4,5,main]
> java.lang.AssertionError:
> (143797990709940316224804537595633718982,61078635599166706937511052402724559481]
>   at 
> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1273)
>   at 
> org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:48)
>   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:619)
> ---
> [CLIENT_SIDE]
> java.lang.RuntimeException: org.apache.thrift.TApplicationException:
> Internal error processing get_range_slices
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189)
>   at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>   at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148)
>   at 
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>   at 
> org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> Caused by: org.apache.thrift.TApplicationException: Internal error
> processing get_range_slices
>   at 
> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>   at 
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724)
>   at 
> org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255)
>   ... 11 more
> ---
> 
> Looks like tokens used in ColumnFamilySplits
> (ColumnFamilyInputFormat.java) are on wrapping ranges (left_token >
> right_token).
> Any ideas how to fix this?
> 
> --
> Regards,
> Roman



Re: word_count example fails in multi-node configuration

2010-11-29 Thread Jeremy Hanna
So final answer - known issue with RC1 - 
https://issues.apache.org/jira/browse/CASSANDRA-1781 - that should be fixed 
before 0.7.0 is completed.

On Nov 29, 2010, at 11:31 AM, Jeremy Hanna wrote:

> Roman:
> 
> I logged a jira ticket about this for further investigation, if you'd like to 
> follow that.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-1787
> 
> On Nov 29, 2010, at 3:14 AM, RS wrote:
> 
>> Hi guys,
>> 
>> I am trying to run word_count example from contrib directory (0.7 beta
>> 3 and 0.7.0 rc 1).
>> It works fine in a single-node configuration, but fails with 2+ nodes.
>> 
>> It fails in the assert statement, which caused problems before
>> (https://issues.apache.org/jira/browse/CASSANDRA-1700).
>> 
>> Here's a simple ring I have and error messages.
>> ---
>> Address         Status State   Load            Owns    Token
>> 
>> 143797990709940316224804537595633718982
>> 127.0.0.2   Up Normal  40.2 KB 51.38%
>> 61078635599166706937511052402724559481
>> 127.0.0.1   Up Normal  36.01 KB48.62%
>> 143797990709940316224804537595633718982
>> ---
>> [SERVER SIDE]
>> 
>> ERROR 17:39:57,098 Fatal exception in thread Thread[ReadStage:4,5,main]
>> java.lang.AssertionError:
>> (143797990709940316224804537595633718982,61078635599166706937511052402724559481]
>>  at 
>> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1273)
>>  at 
>> org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:48)
>>  at 
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>  at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>  at java.lang.Thread.run(Thread.java:619)
>> ---
>> [CLIENT_SIDE]
>> java.lang.RuntimeException: org.apache.thrift.TApplicationException:
>> Internal error processing get_range_slices
>>  at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277)
>>  at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292)
>>  at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189)
>>  at 
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>  at 
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>  at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148)
>>  at 
>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>  at 
>> org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>  at 
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> Caused by: org.apache.thrift.TApplicationException: Internal error
>> processing get_range_slices
>>  at 
>> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>>  at 
>> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724)
>>  at 
>> org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
>>  at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255)
>>  ... 11 more
>> ---
>> 
>> Looks like tokens used in ColumnFamilySplits
>> (ColumnFamilyInputFormat.java) are on wrapping ranges (left_token >
>> right_token).
>> Any ideas how to fix this?
>> 
>> --
>> Regards,
>> Roman
> 



Re: Issues getting JNA to work correctly under centos 5.5 using cassandra 0.7.0-rc1 and JNA 2.7.3

2010-11-29 Thread Jason Pell
*               -       memlock         0


On Tue, Nov 30, 2010 at 4:40 AM, Nate McCall  wrote:
> What does the current line(s) in limits.conf look like?
>
> On Mon, Nov 29, 2010 at 2:01 AM,   wrote:
>> I checked and /etc/security/limits.conf on redhat supports zero (0) to
>> mean unlimited.  Here is the sample from the man page.  Notice the
>> soft core entry.
>>
>> EXAMPLES
>>       These are some example lines which might be specified in
>>       /etc/security/limits.conf.
>>
>>       *               soft    core            0
>>       *               hard    rss             1
>>       @student        hard    nproc           20
>>       @faculty        soft    nproc           20
>>       @faculty        hard    nproc           50
>>       ftp             hard    nproc           0
>>       @student        -       maxlogins       4
>>
>>
>>
>> On Mon, Nov 29, 2010 at 6:51 AM, Jason Pell  wrote:
>>> Ok that's a good point i will check - I am not sure.
>>>
>>> Sent from my iPhone
>>> On Nov 29, 2010, at 5:53, Tyler Hobbs  wrote:
>>>
>>> I'm not familiar with ulimit on RedHat systems, but are you sure you
>>> have ulimit set correctly? Did you set it to '0' or 'unlimited'?  I ask
>>> because on a Debian system, I get this:
>>>
>>> tho...@~ $ ulimit -l
>>> unlimited
>>>
>>> Where you said that you got back '0'.
>>>
>>> - Tyler
>>>
>>> On Sun, Nov 28, 2010 at 1:15 AM, Jason Pell  wrote:

 Hi,

 I have selinux disabled via /etc/sysconfig/selinux already.  But I did
 as you suggested anyway, even restarted the whole machine again too
 and still no difference.  Do you know if there is a way to discover
 exactly what this error means?

 Thanks
 Jason

 On Sat, Nov 27, 2010 at 3:59 AM, Nate McCall  wrote:
 > This might be an issue with selinux. You can try this quickly to
 > temporarily disable selinux enforcement:
 > /usr/sbin/setenforce 0  (as root)
 >
 > and then start cassandra as your user.
 >
 > On Fri, Nov 26, 2010 at 1:00 AM, Jason Pell 
 > wrote:
 >> I restarted the box :-) so it's well and truly set
 >>
 >> Sent from my iPhone
 >> On Nov 26, 2010, at 17:57, Brandon Williams  wrote:
 >>
 >> On Thu, Nov 25, 2010 at 10:02 PM, Jason Pell 
 >> wrote:
 >>>
 >>> Hi,
 >>>
 >>> I have set the memlock limit to unlimited in /etc/security/limits.conf
 >>>
 >>> [devel...@localhost apache-cassandra-0.7.0-rc1]$ ulimit -l
 >>> 0
 >>>
 >>> Running as a non-root user gets me an Unknown mlockall error 1
 >>
 >> Have you tried logging out and back in after changing limits.conf?
 >> -Brandon
 >
>>>
>>>
>>
>


Re: Issues getting JNA to work correctly under centos 5.5 using cassandra 0.7.0-rc1 and JNA 2.7.3

2010-11-29 Thread Nate McCall
Ok, I was able to reproduce this with "0" as the value. Changing it to
"unlimited" will make this go away. A closer reading of the
limits.conf man page seems to leave some ambiguity when taken with the
examples:
"All items support the values -1, unlimited or infinity indicating no
limit, except for priority and nice."

I would recommend tightening this to a specific user. The line I ended
up with for the "cassandra" user was:

cassandra  -  memlock  unlimited

You probably want to add a line for nofile in there at ~16384 as well
while you're there, as that can be an issue depending on load.
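
For example (illustrative values only; log out and back in afterwards, then
verify with "ulimit -l" and "ulimit -n" as that user):

    cassandra  -  memlock  unlimited
    cassandra  -  nofile   16384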



On Mon, Nov 29, 2010 at 1:59 PM, Jason Pell  wrote:
> *               -       memlock         0
>
>
> On Tue, Nov 30, 2010 at 4:40 AM, Nate McCall  wrote:
>> What does the current line(s) in limits.conf look like?
>>
>> On Mon, Nov 29, 2010 at 2:01 AM,   wrote:
>>> I checked and /etc/security/limits.conf on redhat supports zero (0) to
>>> mean unlimited.  Here is the sample from the man page.  Notice the
>>> soft core entry.
>>>
>>> EXAMPLES
>>>       These are some example lines which might be specified in
>>>       /etc/security/limits.conf.
>>>
>>>       *               soft    core            0
>>>       *               hard    rss             1
>>>       @student        hard    nproc           20
>>>       @faculty        soft    nproc           20
>>>       @faculty        hard    nproc           50
>>>       ftp             hard    nproc           0
>>>       @student        -       maxlogins       4
>>>
>>>
>>>
>>> On Mon, Nov 29, 2010 at 6:51 AM, Jason Pell  wrote:
 Ok that's a good point i will check - I am not sure.

 Sent from my iPhone
 On Nov 29, 2010, at 5:53, Tyler Hobbs  wrote:

 I'm not familiar with ulimit on RedHat systems, but are you sure you
 have ulimit set correctly? Did you set it to '0' or 'unlimited'?  I ask
 because on a Debian system, I get this:

 tho...@~ $ ulimit -l
 unlimited

 Where you said that you got back '0'.

 - Tyler

 On Sun, Nov 28, 2010 at 1:15 AM, Jason Pell  wrote:
>
> Hi,
>
> I have selinux disabled via /etc/sysconfig/selinux already.  But I did
> as you suggested anyway, even restarted the whole machine again too
> and still no difference.  Do you know if there is a way to discover
> exactly what this error means?
>
> Thanks
> Jason
>
> On Sat, Nov 27, 2010 at 3:59 AM, Nate McCall  wrote:
> > This might be an issue with selinux. You can try this quickly to
> > temporarily disable selinux enforcement:
> > /usr/sbin/setenforce 0  (as root)
> >
> > and then start cassandra as your user.
> >
> > On Fri, Nov 26, 2010 at 1:00 AM, Jason Pell 
> > wrote:
> >> I restarted the box :-) so it's well and truly set
> >>
> >> Sent from my iPhone
> >> On Nov 26, 2010, at 17:57, Brandon Williams  wrote:
> >>
> >> On Thu, Nov 25, 2010 at 10:02 PM, Jason Pell 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I have set the memlock limit to unlimited in /etc/security/limits.conf
> >>>
> >>> [devel...@localhost apache-cassandra-0.7.0-rc1]$ ulimit -l
> >>> 0
> >>>
> >>> Running as a non-root user gets me an Unknown mlockall error 1
> >>
> >> Have you tried logging out and back in after changing limits.conf?
> >> -Brandon
> >


>>>
>>
>


Re: Solr DataImportHandler (DIH) and Cassandra

2010-11-29 Thread Aaron Morton
AFAIK there is nothing pre-written to pull the data out for you. You should be
able to create your own DataSource subclass
http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/DataSource.html
using the Hector Java library to pull data from Cassandra.

I'm guessing you will need to consider how to perform delta imports. Perhaps
using the secondary indexes in 0.7*, or maintaining your own queues or indexes
to know what has changed.

There is also the Lucandra project, not exactly what you're after but it may
be of interest anyway: https://github.com/tjake/Lucandra

Hope that helps.
Aaron

On 30 Nov, 2010, at 05:04 AM, Mark wrote:

Is there any way to use DIH to import from Cassandra? Thanks
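
As a rough illustration of the DataSource route, a skeletal subclass might
look like this (the DIH types are those from the link above; the Cassandra
read is left as a stub, and the row-to-map shape is an assumption):

    import java.util.Collections;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataSource;

    // Skeletal sketch only: the DIH contract is init/getData/close; what
    // "query" means (a column family name, a key range, ...) is up to your
    // entity configuration.
    public class CassandraDataSource extends DataSource<Iterator<Map<String, Object>>> {
        public void init(Context context, Properties initProps) {
            // open a Hector (or raw Thrift) connection here, driven by initProps
        }
        public Iterator<Map<String, Object>> getData(String query) {
            // range-slice the column family named by 'query' and turn each
            // row's columns into a Map of Solr field name -> value
            return Collections.<Map<String, Object>>emptyList().iterator();  // stub
        }
        public void close() {
            // release the connection
        }
    }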


Re: Issues getting JNA to work correctly under centos 5.5 using cassandra 0.7.0-rc1 and JNA 2.7.3

2010-11-29 Thread Jason Pell
Awesome, thanks, I will make the changes.

So is the man page inaccurate? Or is JNA doing something wrong?

Sent from my iPhone

On Nov 30, 2010, at 7:28, Nate McCall  wrote:

> Ok, I was able to reproduce this with "0" as the value. Changing it to
> "unlimited" will make this go away. A closer reading of the
> limits.conf man page seems to leave some ambiguity when taken with the
> examples:
> "All items support the values -1, unlimited or infinity indicating no
> limit, except for priority and nice."
> 
> I would recommend tightening this to a specific user. The line I ended
> up with for the "cassandra" user was:
> 
> cassandra  -  memlock  unlimited
> 
> You probably want to add a line for nofile in there at ~16384 as well
> while you're there, as that can be an issue depending on load.
> 
> 
> 
> On Mon, Nov 29, 2010 at 1:59 PM, Jason Pell  wrote:
>> *               -       memlock         0
>> 
>> 
>> On Tue, Nov 30, 2010 at 4:40 AM, Nate McCall  wrote:
>>> What does the current line(s) in limits.conf look like?
>>> 
>>> On Mon, Nov 29, 2010 at 2:01 AM,   wrote:
 I checked and /etc/security/limits.conf on redhat supports zero (0) to
 mean unlimited.  Here is the sample from the man page.  Notice the
 soft core entry.
 
 EXAMPLES
   These are some example lines which might be specified in
   /etc/security/limits.conf.
 
   *               soft    core            0
   *               hard    rss             1
   @student        hard    nproc           20
   @faculty        soft    nproc           20
   @faculty        hard    nproc           50
   ftp             hard    nproc           0
   @student        -       maxlogins       4
 
 
 
 On Mon, Nov 29, 2010 at 6:51 AM, Jason Pell  wrote:
> Ok that's a good point i will check - I am not sure.
> 
> Sent from my iPhone
> On Nov 29, 2010, at 5:53, Tyler Hobbs  wrote:
> 
> I'm not familiar with ulimit on RedHat systems, but are you sure you
> have ulimit set correctly? Did you set it to '0' or 'unlimited'?  I ask
> because on a Debian system, I get this:
> 
> tho...@~ $ ulimit -l
> unlimited
> 
> Where you said that you got back '0'.
> 
> - Tyler
> 
> On Sun, Nov 28, 2010 at 1:15 AM, Jason Pell  wrote:
>> 
>> Hi,
>> 
>> I have selinux disabled via /etc/sysconfig/selinux already.  But I did
>> as you suggested anyway, even restarted the whole machine again too
>> and still no difference.  Do you know if there is a way to discover
>> exactly what this error means?
>> 
>> Thanks
>> Jason
>> 
>> On Sat, Nov 27, 2010 at 3:59 AM, Nate McCall  wrote:
>>> This might be an issue with selinux. You can try this quickly to
>>> temporarily disable selinux enforcement:
>>> /usr/sbin/setenforce 0  (as root)
>>> 
>>> and then start cassandra as your user.
>>> 
>>> On Fri, Nov 26, 2010 at 1:00 AM, Jason Pell 
>>> wrote:
 I restarted the box :-) so it's well and truly set
 
 Sent from my iPhone
 On Nov 26, 2010, at 17:57, Brandon Williams  wrote:
 
 On Thu, Nov 25, 2010 at 10:02 PM, Jason Pell 
 wrote:
> 
> Hi,
> 
> I have set the memlock limit to unlimited in /etc/security/limits.conf
> 
> [devel...@localhost apache-cassandra-0.7.0-rc1]$ ulimit -l
> 0
> 
> Running as a non-root user gets me an Unknown mlockall error 1
 
 Have you tried logging out and back in after changing limits.conf?
 -Brandon
>>> 
> 
> 
 
>>> 
>> 


Re: Cassandra 0.7 beta 3 outOfMemory (OOM)

2010-11-29 Thread Aaron Morton
Sounds like you need to increase the heap size, and/or reduce
memtable_throughput_in_mb, and/or turn off the internal caches. Normally the
binary memtable thresholds only apply to bulk load operations, and it's the
per-CF memtable_* settings you want to change. I'm not familiar with Lucandra,
though.

See the section on JVM Heap Size here:
http://wiki.apache.org/cassandra/MemtableThresholds

Bottom line is you will need more JVM heap memory.

Hope that helps.
Aaron
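
For reference, a sketch of the knobs mentioned above, with illustrative values
only (assuming the 0.7 option names; these are not tuned recommendations):

    # cassandra.yaml (global):
    in_memory_compaction_limit_in_mb: 32   # rows above this compact incrementally

    # per column family (keyspace definition or CLI):
    #   memtable_throughput_in_mb: 64
    #   memtable_operations_in_millions: 0.3

On 29 Nov, 2010, at 10:28 PM, cassan...@ajowa.de wrote:

Hi community,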

During my tests I had several OOM crashes.
Getting some hints to find the problem would be nice.

Cassandra first crashed after about 45 minutes of running an insert test script.
During the following tests the time to OOM got shorter, until it started to
crash even in "idle" mode.

Here are the facts:
- cassandra 0.7 beta 3
- using Lucandra to index about 3 million files of ~1 KB each
- inserting with one client to one Cassandra node at about 200 files/s
- cassandra data files for this keyspace grow to about 20 GB
- the keyspace only contains the two Lucandra-specific CFs

Cluster:
- single Cassandra node on Windows 32-bit, Xeon 2.5 GHz, 4 GB RAM
- java jre 1.6.0_22
- heap space first 1 GB, later increased to 1.3 GB

Cassandra.yaml:
default + reduced "binary_memtable_throughput_in_mb" to 128

CFs:
default + reduced
min_compaction_threshold: 4
max_compaction_threshold: 8


I think the problem always appears during compaction,
and perhaps it is a result of large rows (some about 170 MB).

Are there more options we could use to work with less memory?

Is it a problem of compaction?
And how to avoid it?
Slower inserts? More memory?
Even lower memtable_throughput or in_memory_compaction_limit?
Continuous manual major compaction?

I've read
http://www.riptano.com/docs/0.6/troubleshooting/index#nodes-are-dying-with-oom-errors
- row_size should be fixed since 0.7, and 200 MB is still far from 2 GB
- only the key cache is used, a little bit (3600/2)
- after a lot of writes cassandra crashes even in idle mode
- memtable size was reduced and there are only 2 CFs

Several heap dumps in MAT show 60-99% heap usage by the compaction thread.

Here some log extract:

  INFO [CompactionExecutor:1] 2010-11-26 14:18:18,593  
CompactionIterator.java (line 134) Compacting large row  
6650325572717566efbfbf44545241434b53efbfbf31 (172967291 bytes)  
incrementally
  INFO [ScheduledTasks:1] 2010-11-26 14:18:41,421 GCInspector.java  
(line 133) GC for ParNew: 365 ms, 54551328 reclaimed leaving 459496840  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:42,437 GCInspector.java  
(line 133) GC for ParNew: 226 ms, 12469104 reclaimed leaving 554506776  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:43,453 GCInspector.java  
(line 133) GC for ParNew: 224 ms, 12777840 reclaimed leaving 649207976  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:44,468 GCInspector.java  
(line 133) GC for ParNew: 225 ms, 12564144 reclaimed leaving 744122872  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:45,468 GCInspector.java  
(line 133) GC for ParNew: 222 ms, 16020328 reclaimed leaving 835581584  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:46,468 GCInspector.java  
(line 133) GC for ParNew: 226 ms, 12697912 reclaimed leaving 930362712  
used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:47,468 GCInspector.java  
(line 133) GC for ParNew: 227 ms, 15816872 reclaimed leaving  
1022026288 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:48,484 GCInspector.java  
(line 133) GC for ParNew: 258 ms, 12746584 reclaimed leaving  
1116758744 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:49,484 GCInspector.java  
(line 133) GC for ParNew: 257 ms, 12802608 reclaimed leaving  
1211435176 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 133) GC for ConcurrentMarkSweep: 4188 ms, 271308512 reclaimed  
leaving 1047605704 used; max is 1450442752
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 153) Pool Name                    Active   Pending
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ResponseStage 0 0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ReadStage 0 0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) ReadRepair        0 0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) MutationStage 0 0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) GossipStage   0 0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,546 GCInspector.java  
(line 160) AntiEntropyStage  0 0
  INFO [ScheduledTasks:1] 2010-11-26 14:18:54,562 GCInspector.java  
(line 1
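
For reference, in 0.7 the per-CF memtable and compaction settings Aaron mentions live in the keyspace definitions in cassandra.yaml. A minimal sketch, assuming a Lucandra-style keyspace; the CF name and the values are purely illustrative (not the reporter's actual config), and the option names follow the 0.7 yaml defaults:

    keyspaces:
        - name: Lucandra
          replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
          replication_factor: 1
          column_families:
            - name: Documents
              compare_with: BytesType
              # flush this CF's memtable after ~64 MB of writes...
              memtable_throughput_in_mb: 64
              # ...or after ~300k operations, whichever comes first
              memtable_operations_in_millions: 0.3
              min_compaction_threshold: 4
              max_compaction_threshold: 8

The in_memory_compaction_limit_in_mb setting in the same file is also relevant here: rows larger than it (like the 170 MB row in the log above) are compacted incrementally rather than held fully in memory.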

Introduction to Cassandra

2010-11-29 Thread Aaron Morton
I did a talk last week at the Wellington Rails User Group as a basic introduction to Cassandra. The slides are here http://www.slideshare.net/aaronmorton/well-railedcassandra24112010-5901169 if anyone is interested.

Cheers
Aaron

Re: Introduction to Cassandra

2010-11-29 Thread Jonathan Ellis
That is a lot of slides. :)  Nice work!

On Mon, Nov 29, 2010 at 3:11 PM, Aaron Morton  wrote:
> I did a talk last week at the Wellington Rails User Group as a basic
> introduction to Cassandra. The slides are
> here http://www.slideshare.net/aaronmorton/well-railedcassandra24112010-5901169 if
> anyone is interested.
> Cheers
> Aaron
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Issues getting JNA to work correctly under centos 5.5 using cassandra 0.7.0-rc1 and JNA 2.7.3

2010-11-29 Thread jasonmpell
Hi,

Thanks for that, your suggestions worked a treat.  I created a new
cassandra user, set the value to unlimited,
and I get the desired log:

INFO 08:49:50,204 JNA mlockall successful



On Tue, Nov 30, 2010 at 7:56 AM, Jason Pell  wrote:
> Awesome thanks will make the changes
>
> So is the man page inaccurate? Or is JNA doing something wrong?
>
> Sent from my iPhone
>
> On Nov 30, 2010, at 7:28, Nate McCall  wrote:
>
>> Ok, I was able to reproduce this with "0" as the value. Changing it to
>> "unlimited" will make this go away. A closer reading of the
>> limits.conf man page seems to leave some ambiguity when taken with the
>> examples:
>> "All items support the values -1, unlimited or infinity indicating no
>> limit, except for priority and nice."
>>
>> I would recommend tightening this to a specific user. The line I ended
>> up with for the "cassandra" user was:
>>
>> cassandra        -       memlock       unlimited
>>
>> You probably want to add a line for nofile in there at ~16384 as well
>> while you're there, as that can be an issue depending on load.
>>
>>
>>
>> On Mon, Nov 29, 2010 at 1:59 PM, Jason Pell  wrote:
>>> *               -       memlock         0
>>>
>>>
>>> On Tue, Nov 30, 2010 at 4:40 AM, Nate McCall  wrote:
 What does the current line(s) in limits.conf look like?

 On Mon, Nov 29, 2010 at 2:01 AM,   wrote:
> I checked and /etc/security/limits.conf on redhat supports zero (0) to
> mean unlimited.  Here is the sample from the man page.  Notice the
> soft core entry.
>
> EXAMPLES
>       These are some example lines which might be specified in
>       /etc/security/limits.conf.
>
>       *               soft    core            0
>       *               hard    rss             10000
>       @student        hard    nproc           20
>       @faculty        soft    nproc           20
>       @faculty        hard    nproc           50
>       ftp             hard    nproc           0
>       @student        -       maxlogins       4
>
>
>
> On Mon, Nov 29, 2010 at 6:51 AM, Jason Pell  wrote:
>> Ok that's a good point i will check - I am not sure.
>>
>> Sent from my iPhone
>> On Nov 29, 2010, at 5:53, Tyler Hobbs  wrote:
>>
>> I'm not familiar with ulimit on RedHat systems, but are you sure you
>> have ulimit set correctly? Did you set it to '0' or 'unlimited'?  I ask
>> because on a Debian system, I get this:
>>
>> tho...@~ $ ulimit -l
>> unlimited
>>
>> Where you said that you got back '0'.
>>
>> - Tyler
>>
>> On Sun, Nov 28, 2010 at 1:15 AM, Jason Pell  wrote:
>>>
>>> Hi,
>>>
>>> I have selinux disabled via /etc/sysconfig/selinux already.  But I did
>>> as you suggested anyway, even restarted the whole machine again too
>>> and still no difference.  Do you know if there is a way to discover
>>> exactly what this error means?
>>>
>>> Thanks
>>> Jason
>>>
>>> On Sat, Nov 27, 2010 at 3:59 AM, Nate McCall  wrote:
 This might be an issue with selinux. You can try this quickly to
 temporarily disable selinux enforcement:
 /usr/sbin/setenforce 0  (as root)

 and then start cassandra as your user.

 On Fri, Nov 26, 2010 at 1:00 AM, Jason Pell 
 wrote:
> I restarted the box :-) so it's well and truly set
>
> Sent from my iPhone
> On Nov 26, 2010, at 17:57, Brandon Williams  wrote:
>
> On Thu, Nov 25, 2010 at 10:02 PM, Jason Pell 
> wrote:
>>
>> Hi,
>>
>> I have set the memlock limit to unlimited in 
>> /etc/security/limits.conf
>>
>> [devel...@localhost apache-cassandra-0.7.0-rc1]$ ulimit -l
>> 0
>>
>> Running as a non-root user gets me an Unknown mlockall error 1
>
> Have you tried logging out and back in after changing limits.conf?
> -Brandon

>>
>>
>

>>>
>
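
Putting Nate's two suggestions together, the relevant /etc/security/limits.conf entries for a dedicated "cassandra" user would look something like this (the nofile value is only the ballpark figure suggested above):

    cassandra        -       memlock         unlimited
    cassandra        -       nofile          16384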


batch_mutate vs number of write operations on CF

2010-11-29 Thread Narendra Sharma
Hi,

I am using Cassandra 0.7 beta3 and Hector.

I create a mutation map. The mutation involves adding a few columns for a
given row. After that I use the batch_mutate API to send the changes to
Cassandra.

Question:
If there are multiple column writes on the same row in a mutation_map, does
Cassandra show (in the JMX write count stats for the CF) that as 1 write
operation or as N write operations, where N is the number of entries in the
mutation map for that row?
Assume all the changes in mutation map are for one row.

Thanks,
Naren


Re: Solr DataImportHandler (DIH) and Cassandra

2010-11-29 Thread Mark
The DataSource subclass route is what I will probably be interested in.
Are there any working examples of this already out there?


On 11/29/10 12:32 PM, Aaron Morton wrote:

AFAIK there is nothing pre-written to pull the data out for you.

You should be able to create your own DataSource subclass
(http://lucene.apache.org/solr/api/org/apache/solr/handler/dataimport/DataSource.html)
using the Hector Java library to pull data from Cassandra.


I'm guessing you will need to consider how to perform delta imports. 
Perhaps using the secondary indexes in 0.7*, or maintaining your own
queues or indexes to know what has changed.


There is also the Lucandra project, not exactly what you're after but
it may be of interest anyway: https://github.com/tjake/Lucandra


Hope that helps.
Aaron


On 30 Nov, 2010, at 05:04 AM, Mark  wrote:


Is there any way to use DIH to import from Cassandra? Thanks
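
For anyone wanting to try the subclass route, the Solr side of it is just three methods. A bare skeleton only, with the Cassandra plumbing left as comments since it depends on which client you pick (the class name and the choice of an iterator-of-maps payload are assumptions for illustration):

    import java.util.Iterator;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.DataSource;

    // Skeleton only: what "query" means (a key range? a CF name?) is up to you.
    public class CassandraDataSource extends DataSource<Iterator<Map<String, Object>>> {
        @Override
        public void init(Context context, Properties initProps) {
            // open a Cassandra connection here, e.g. via Hector, using initProps
        }

        @Override
        public Iterator<Map<String, Object>> getData(String query) {
            // translate the query into a get_slice / get_range_slices and adapt
            // the rows into an iterator of field-name -> value maps for DIH
            throw new UnsupportedOperationException("not implemented");
        }

        @Override
        public void close() {
            // release the connection
        }
    }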


Re: batch_mutate vs number of write operations on CF

2010-11-29 Thread Tyler Hobbs
Using batch_mutate on a single row will count as 1 write operation, even if
you mutate multiple columns. Using batch_mutate on N rows will count as N
write operations.
- Tyler

On Mon, Nov 29, 2010 at 5:58 PM, Narendra Sharma
wrote:

> Hi,
>
> I am using Cassandra 0.7 beta3 and Hector.
>
> I create a mutation map. The mutation involves adding a few columns for a
> given row. After that I use the batch_mutate API to send the changes to
> Cassandra.
>
> Question:
> If there are multiple column writes on the same row in a mutation_map, does
> Cassandra show (in the JMX write count stats for the CF) that as 1 write
> operation or as N write operations, where N is the number of entries in the
> mutation map for that row?
> Assume all the changes in mutation map are for one row.
>
> Thanks,
> Naren
>
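
To make Tyler's counting concrete, here is a rough sketch against the raw 0.7 Thrift API (Hector wraps the same call underneath); the CF, row key, and column names are made up for illustration. Mutating three columns of one row in a single batch_mutate shows up as one write operation in the CF's JMX stats:

    import java.nio.ByteBuffer;
    import java.util.*;
    import org.apache.cassandra.thrift.*;

    public class BatchMutateSketch {
        static ByteBuffer bytes(String s) { return ByteBuffer.wrap(s.getBytes()); }

        // Wrap a single column insert in a Mutation
        static Mutation col(String name, String value, long ts) {
            Column c = new Column(bytes(name), bytes(value), ts);
            ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
            cosc.setColumn(c);
            Mutation m = new Mutation();
            m.setColumn_or_supercolumn(cosc);
            return m;
        }

        static void writeOneRow(Cassandra.Client client) throws Exception {
            long ts = System.currentTimeMillis() * 1000;
            List<Mutation> mutations = Arrays.asList(
                col("first", "Naren", ts), col("last", "Sharma", ts), col("age", "35", ts));
            // row key -> (column family -> mutations): one row, three columns
            Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
                new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
            mutationMap.put(bytes("row1"), Collections.singletonMap("Users", mutations));
            client.batch_mutate(mutationMap, ConsistencyLevel.QUORUM); // counts as 1 write op
        }
    }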


Re: word_count example fails in multi-node configuration

2010-11-29 Thread RS
It occurs in 0.7 beta 3 and 0.7.0 rc 1.

Thank you, Jeremy. I will follow the ticket.

-Roman


On Tue, Nov 30, 2010 at 2:50 AM, Jeremy Hanna
 wrote:
> Roman:
>
> I commented on the ticket - would you mind answering on there?  
> https://issues.apache.org/jira/browse/CASSANDRA-1787
>
> Tx,
>
> Jeremy
>
> On Nov 29, 2010, at 3:14 AM, RS wrote:
>
>> Hi guys,
>>
>> I am trying to run word_count example from contrib directory (0.7 beta
>> 3 and 0.7.0 rc 1).
>> It works fine in a single-node configuration, but fails with 2+ nodes.
>>
>> It fails in the assert statement, which caused problems before
>> (https://issues.apache.org/jira/browse/CASSANDRA-1700).
>>
>> Here's a simple ring I have and error messages.
>> ---
>> Address         Status State   Load            Owns    Token
>>
>> 143797990709940316224804537595633718982
>> 127.0.0.2       Up     Normal  40.2 KB         51.38%
>> 61078635599166706937511052402724559481
>> 127.0.0.1       Up     Normal  36.01 KB        48.62%
>> 143797990709940316224804537595633718982
>> ---
>> [SERVER SIDE]
>>
>> ERROR 17:39:57,098 Fatal exception in thread Thread[ReadStage:4,5,main]
>> java.lang.AssertionError:
>> (143797990709940316224804537595633718982,61078635599166706937511052402724559481]
>>       at 
>> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1273)
>>       at 
>> org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:48)
>>       at 
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>>       at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>       at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>       at java.lang.Thread.run(Thread.java:619)
>> ---
>> [CLIENT_SIDE]
>> java.lang.RuntimeException: org.apache.thrift.TApplicationException:
>> Internal error processing get_range_slices
>>       at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277)
>>       at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292)
>>       at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189)
>>       at 
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>       at 
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>       at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148)
>>       at 
>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>       at 
>> org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>       at 
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> Caused by: org.apache.thrift.TApplicationException: Internal error
>> processing get_range_slices
>>       at 
>> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>>       at 
>> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724)
>>       at 
>> org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
>>       at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255)
>>       ... 11 more
>> ---
>>
>> Looks like tokens used in ColumnFamilySplits
>> (ColumnFamilyInputFormat.java) are on wrapping ranges (left_token >
>> right_token).
>> Any ideas how to fix this?
>>
>> --
>> Regards,
>> Roman
>
>


Re: Re: word_count example fails in multi-node configuration

2010-11-29 Thread Bingbing Liu
try the OrderPreservingPartitioner


2010-11-30 



Bingbing Liu 



From: RS 
Sent: 2010-11-30  09:14:38 
To: user 
Cc: 
Subject: Re: word_count example fails in multi-node configuration 
 
It occurs in 0.7 beta 3 and 0.7.0 rc 1.
Thank you, Jeremy. I will follow the ticket.
-Roman
On Tue, Nov 30, 2010 at 2:50 AM, Jeremy Hanna
 wrote:
> Roman:
>
> I commented on the ticket - would you mind answering on there?  
> https://issues.apache.org/jira/browse/CASSANDRA-1787
>
> Tx,
>
> Jeremy
>
> On Nov 29, 2010, at 3:14 AM, RS wrote:
>
>> Hi guys,
>>
>> I am trying to run word_count example from contrib directory (0.7 beta
>> 3 and 0.7.0 rc 1).
>> It works fine in a single-node configuration, but fails with 2+ nodes.
>>
>> It fails in the assert statement, which caused problems before
>> (https://issues.apache.org/jira/browse/CASSANDRA-1700).
>>
>> Here's a simple ring I have and error messages.
>> ---
>> Address         Status State   Load            Owns    Token
>>
>> 143797990709940316224804537595633718982
>> 127.0.0.2       Up     Normal  40.2 KB         51.38%
>> 61078635599166706937511052402724559481
>> 127.0.0.1       Up     Normal  36.01 KB        48.62%
>> 143797990709940316224804537595633718982
>> ---
>> [SERVER SIDE]
>>
>> ERROR 17:39:57,098 Fatal exception in thread Thread[ReadStage:4,5,main]
>> java.lang.AssertionError:
>> (143797990709940316224804537595633718982,61078635599166706937511052402724559481]
>>   at 
>> org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1273)
>>   at 
>> org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:48)
>>   at 
>> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>>   at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>   at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>   at java.lang.Thread.run(Thread.java:619)
>> ---
>> [CLIENT_SIDE]
>> java.lang.RuntimeException: org.apache.thrift.TApplicationException:
>> Internal error processing get_range_slices
>>   at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:277)
>>   at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:292)
>>   at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:189)
>>   at 
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>>   at 
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>>   at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:148)
>>   at 
>> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>>   at 
>> org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>   at 
>> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
>> Caused by: org.apache.thrift.TApplicationException: Internal error
>> processing get_range_slices
>>   at 
>> org.apache.thrift.TApplicationException.read(TApplicationException.java:108)
>>   at 
>> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:724)
>>   at 
>> org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
>>   at 
>> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:255)
>>   ... 11 more
>> ---
>>
>> Looks like tokens used in ColumnFamilySplits
>> (ColumnFamilyInputFormat.java) are on wrapping ranges (left_token >
>> right_token).
>> Any ideas how to fix this?
>>
>> --
>> Regards,
>> Roman
>
>


Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Narendra Sharma
Is there any documentation available on what is possible with secondary
indexes? For example:
- Is it possible to define secondary index on columns within a SuperColumn?
- If I define a secondary index at run time, does Cassandra index all the
existing data or only new data is indexed?

Some documentation along with examples will be highly useful.

Thanks,
Naren


Re: Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Jonathan Ellis
On Mon, Nov 29, 2010 at 7:59 PM, Narendra Sharma
 wrote:
> Is there any documentation available on what is possible with secondary
> indexes?

Not yet.

> - Is it possible to define secondary index on columns within a SuperColumn?

No.

> - If I define a secondary index at run time, does Cassandra index all the
> existing data or only new data is indexed?

The former.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Narendra Sharma
Thanks Jonathan.

Couple of more questions:
1. Is there any technical limit on the number of secondary indexes that can
be created?

2. Is it possible to execute join queries spanning multiple secondary
indexes?

Thanks,
Naren

On Mon, Nov 29, 2010 at 6:02 PM, Jonathan Ellis  wrote:

> On Mon, Nov 29, 2010 at 7:59 PM, Narendra Sharma
>  wrote:
> > Is there any documentation available on what is possible with secondary
> > indexes?
>
> Not yet.
>
> > - Is it possible to define secondary index on columns within a
> SuperColumn?
>
> No.
>
> > - If I define a secondary index at run time, does Cassandra index all the
> > existing data or only new data is indexed?
>
> The former.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Jonathan Ellis
On Mon, Nov 29, 2010 at 11:26 PM, Narendra Sharma
 wrote:
> Thanks Jonathan.
>
> Couple of more questions:
> 1. Is there any technical limit on the number of secondary indexes that can
> be created?

Just as with traditional databases, the more indexes there are the
slower writes to that CF will be.

> 2. Is it possible to execute join queries spanning multiple secondary
> indexes?

What do secondary indexes have to do with joins?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Narendra Sharma
On Mon, Nov 29, 2010 at 9:32 PM, Jonathan Ellis  wrote:

> On Mon, Nov 29, 2010 at 11:26 PM, Narendra Sharma
>  wrote:
> > Thanks Jonathan.
> >
> > Couple of more questions:
> > 1. Is there any technical limit on the number of secondary indexes that
> can
> > be created?
>
> Just as with traditional databases, the more indexes there are the
> slower writes to that CF will be.
>
> > 2. Is it possible to execute join queries spanning multiple secondary
> > indexes?
>
> What do secondary indexes have to do with joins?
>

For example, if I want to get all employees that are male and have age = 35
years, how can secondary indexes be useful in such a scenario?

>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Cassandra 0.7 - documentation on Secondary Indexes

2010-11-29 Thread Tyler Hobbs
The 'employees with age = 35' scenario is exactly what they are useful for.

There's a quick section in the pycassa documentation that might be useful:

http://pycassa.github.com/pycassa/tutorial.html#indexes

On Mon, Nov 29, 2010 at 11:41 PM, Narendra Sharma  wrote:

>
>
> On Mon, Nov 29, 2010 at 9:32 PM, Jonathan Ellis  wrote:
>
>> On Mon, Nov 29, 2010 at 11:26 PM, Narendra Sharma
>>  wrote:
>> > Thanks Jonathan.
>> >
>> > Couple of more questions:
>> > 1. Is there any technical limit on the number of secondary indexes that
>> can
>> > be created?
>>
>> Just as with traditional databases, the more indexes there are the
>> slower writes to that CF will be.
>>
>> > 2. Is it possible to execute join queries spanning multiple secondary
>> > indexes?
>>
>> What do secondary indexes have to do with joins?
>>
>
> For example, if I want to get all employees that are male and have age = 35
> years, how can secondary indexes be useful in such a scenario?
>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>
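
In Thrift terms (0.7), the "male AND age = 35" query is a single get_indexed_slices call carrying two IndexExpressions. A rough sketch, assuming a Users CF with secondary indexes on hypothetical sex and age columns:

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public class IndexQuerySketch {
        static ByteBuffer bytes(String s) { return ByteBuffer.wrap(s.getBytes()); }

        static List<KeySlice> maleAge35(Cassandra.Client client) throws Exception {
            IndexClause clause = new IndexClause(
                Arrays.asList(
                    new IndexExpression(bytes("sex"), IndexOperator.EQ, bytes("male")),
                    new IndexExpression(bytes("age"), IndexOperator.EQ, bytes("35"))),
                bytes(""),   // start key, for paging through matching rows
                100);        // max rows per call
            SlicePredicate columns = new SlicePredicate();
            columns.setSlice_range(new SliceRange(bytes(""), bytes(""), false, 100));
            return client.get_indexed_slices(
                new ColumnParent("Users"), clause, columns, ConsistencyLevel.ONE);
        }
    }

As I understand the 0.7 behavior, at least one of the expressions has to be an EQ test against an indexed column; any remaining expressions are applied as a filter on those candidate rows.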


Re: Achieving isolation on single row modifications with batch_mutate

2010-11-29 Thread Tyler Hobbs
In this case, it sounds like you should combine columns A and B if you
are writing them both at the same time, reading them both at the same
time, and need them to be consistent.

Obviously, you're probably dealing with more than two columns here, but
there's generally not any value in splitting something into multiple columns
if you're always writing and reading all of them at the same time.

Or are you talking about chunking huge blobs across a row?

- Tyler

On Sat, Nov 27, 2010 at 10:12 AM, E S  wrote:

> I'm trying to figure out the best way to achieve single row modification
> isolation for readers.
>
> As an example, I have 2 rows (1,2) with 2 columns (a,b).  If I modify both
> rows,
> I don't care if the user sees the write operations completed on 1 and not
> on 2
> for a short time period (seconds).  I also don't care if when reading row 1
> the
> user gets the new value, and then on a re-read gets the old value (within a
> few
> seconds).  Because of this, I have been planning on using a consistency
> level of
> one.
>
> However, if I modify both columns A,B on a single row, I need both changes
> on
> the row to be visible/invisible atomically.  It doesn't matter if they both
> become visible and then both invisible as the data propagates across nodes,
> but
> a half-completed state on an initial read will basically be returning
> corrupt data, given my app's consistency requirements.  My understanding
> from the FAQ is that this single-row, multi-column change provides no read
> isolation, so I will have this problem.  Is this correct?  If so:
>
> Question 1:  Is there a way to get this type of isolation without using a
> distributed locking mechanism like cages?
>
> Question 2:  Are there any plans to implement this type of isolation within
> Cassandra?
>
> Question 3:  If I went with a distributed locking mechanism, what
> consistency
> level would I need to use with Cassandra?  Could I still get away with a
> consistency level of one?  It seems that if the initial write is done in a
> non-isolated way, but if cross-node row synchronizations are done all or
> nothing, I could still use one.
>
> Question 4:  Does anyone know of a good c# alternative to cages/zookeeper?
>
> Thanks for any help with this!
>
>
>
>
>


Re: get_count - cassandra 0.7.x predicate limit bug?

2010-11-29 Thread Tyler Hobbs
What error are you getting?

Remember, get_count() is still just about as much work for cassandra as
getting the whole row; the only advantage is it doesn't have to send the
whole row back to the client.

If you're counting 3+ million columns frequently, it's time to take a look
at counters.

- Tyler

On Fri, Nov 26, 2010 at 10:33 AM, Marcin  wrote:

> Hi guys,
>
> I have a key with 3 million+ columns, but when I try to run get_count
> on it, I get an error if I set the limit to more than ~46000.  Any ideas?
>
> In the previous API there was no predicate at all, so it simply counted the
> number of columns; now it's not so simple any more.
>
> Please let me know if that is a bug or I do something wrong.
>
>
> cheers,
> /Marcin
>
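
Until counters land, one workaround for counting a very wide row is to page with get_slice, restarting each page at the last column name seen. A sketch against the 0.7 Thrift API (the row key and CF name are placeholders):

    import java.nio.ByteBuffer;
    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public class PagedCountSketch {
        static ByteBuffer bytes(String s) { return ByteBuffer.wrap(s.getBytes()); }

        static long countColumns(Cassandra.Client client) throws Exception {
            final int pageSize = 10000;
            long total = 0;
            ByteBuffer start = bytes("");   // empty = beginning of the row
            boolean firstPage = true;
            while (true) {
                SlicePredicate predicate = new SlicePredicate();
                predicate.setSlice_range(new SliceRange(start, bytes(""), false, pageSize));
                List<ColumnOrSuperColumn> page = client.get_slice(
                    bytes("wide-row"), new ColumnParent("MyCF"),
                    predicate, ConsistencyLevel.ONE);
                // the start column is returned again on every page after the first
                total += firstPage ? page.size() : page.size() - 1;
                if (page.size() < pageSize) break;  // last (partial) page
                start = ByteBuffer.wrap(page.get(page.size() - 1).getColumn().getName());
                firstPage = false;
            }
            return total;
        }
    }

This does as much server-side work as repeated get_count calls, but it sidesteps the limit because each call only ever asks for one page.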


Re: Updating Cascal

2010-11-29 Thread Tyler Hobbs
Are you sure you're using the same key for batch_mutate() and get_slice()?
They appear different in the logs.

- Tyler

On Thu, Nov 25, 2010 at 10:14 AM, Michael Fortin  wrote:

> Hello,
> I forked Cascal (a Scala-based client for Cassandra) and I'm attempting to
> update it to cassandra 0.7.  I have it partially working, but I'm getting
> stuck on a few areas.
>
> I have most of the unit tests working from the original code, but I'm
> having an issue with batch_mutate(keyToFamilyMutations, consistency).  Does
> the log output mean anything?  I can't figure out why the columns are not
> getting inserted.  If I change the code from a batch_mutate to an
> insert(family, parent, column, consistency) it works.
>
> ### keyToFamilyMutations: {java.nio.HeapByteBuffer[pos=0 lim=16
> cap=16]={Standard=[Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43
> 6F 6C 75 6D 6E 2D 61 2D 31, value:56 61 6C 75 65 2D 31,
> timestamp:1290662894466035))),
> Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43 6F
> 6C 75 6D 6E 2D 61 2D 33, value:56 61 6C 75 65 2D 33,
> timestamp:1290662894467942))),
> Mutation(column_or_supercolumn:ColumnOrSuperColumn(column:Column(name:43 6F
> 6C 75 6D 6E 2D 61 2D 32, value:56 61 6C 75 65 2D 32,
> timestamp:1290662894467915)))]}}
> DEBUG 2010-11-25 00:28:14,534 [org.apache.cassandra.thrift.CassandraServer
> pool-1-thread-2] batch_mutate
> DEBUG 2010-11-25 00:28:14,583 [org.apache.cassandra.service.StorageProxy
> pool-1-thread-2] insert writing local RowMutation(keyspace='Test',
> key='ccfd5520f85411df858a001c4209', modifications=[Standard])
>
> DEBUG 2010-11-25 00:28:14,599 [org.apache.cassandra.thrift.CassandraServer
> pool-1-thread-2] get_slice
> DEBUG 2010-11-25 00:28:14,605 [org.apache.cassandra.service.StorageProxy
> pool-1-thread-2] weakread reading SliceFromReadCommand(table='Test',
> key='5374616e64617264',
> column_parent='QueryPath(columnFamilyName='Standard',
> superColumnName='null', columnName='null')', start='', finish='',
> reversed=false, count=2147483647) locally
> DEBUG 2010-11-25 00:28:14,608 [org.apache.cassandra.service.StorageProxy
> ReadStage:2] weakreadlocal reading SliceFromReadCommand(table='Test',
> key='5374616e64617264',
> column_parent='QueryPath(columnFamilyName='Standard',
> superColumnName='null', columnName='null')', start='', finish='',
> reversed=false, count=2147483647)
> ### get_slice: []
>
>
> The code looks like:
>  println("keyToFamilyMutations: %s".format(keyToFamilyMutations))
>  client.batch_mutate(keyToFamilyMutations, consistency)
>  …
>  client.client.get_slice(…)
>
> keyspaces:
>- name: Test
>  replica_placement_strategy:
> org.apache.cassandra.locator.SimpleStrategy
>  replication_factor: 1
>  column_families:
>- {name: Standard, compare_with: BytesType}
>
>
>
> Thanks,
> Mike


partial matching of keys

2010-11-29 Thread Arijit Mukherjee
Hi All

I was wondering if it is possible to match keys partially while
searching in Cassandra.

I have a requirement where I'm storing a large number of records, the
key being something like "A|B|T" where A and B are mobile numbers and
T is the time-stamp (the time when A called B). Such format ensure the
uniqueness of the keys. Now if I want to search for all records where
A called B, I would like to do a partial match with "A|B". Is this
possible?

I've another small question - where can I find some complete examples
of creating a cluster and communicating with it (for
insertion/deletion of records) using Hector or Pelops? So far, I've
been doing this via the Thrift interface, but it's becoming illegible
now...

Thanks in advance...

Regards
Arijit

-- 
"And when the night is cloudy,
There is still a light that shines on me,
Shine on until tomorrow, let it be."


Re: Introduction to Cassandra

2010-11-29 Thread Jim Morrison
Really great introduction, thanks Aaron. Bookmarked for the team.

J. 

Sent from my iPhone

On 29 Nov 2010, at 21:11, Aaron Morton  wrote:

> I did a talk last week at the Wellington Rails User Group as a basic 
> introduction to Cassandra. The slides are here 
> http://www.slideshare.net/aaronmorton/well-railedcassandra24112010-5901169 if 
> anyone is interested. 
> 
> Cheers
> Aaron
> 


Re: partial matching of keys

2010-11-29 Thread Tyler Hobbs
Yes, you can basically do this in two ways:

First, you can use an OrderPreservingPartitioner.  This stores your keys in
order, so you can grab the range of keys that begin with 'A|B'.  Because of
the drawbacks of OPP (unbalanced ring, hotspots), you almost certainly don't
want to do this.

Second, you can take advantage of column name sorting.  For example, you can
have a row for all of the calls that A has made; each column name can be
something like 'B|T'.  This allows you to quickly get all of the times when
A called B in chronological order.  (You can have a second row or column
family and swap B and T's positions if you're more interested in time
slices.)  This is very much like the Twitter clone, Twissandra:

https://github.com/ericflo/twissandra
http://twissandra.com/

As for examples, there are Hector examples here:

https://github.com/zznate/hector-examples

- Tyler

On Tue, Nov 30, 2010 at 12:11 AM, Arijit Mukherjee wrote:

> Hi All
>
> I was wondering if it is possible to match keys partially while
> searching in Cassandra.
>
> I have a requirement where I'm storing a large number of records, the
> key being something like "A|B|T" where A and B are mobile numbers and
> T is the time-stamp (the time when A called B). Such a format ensures the
> uniqueness of the keys. Now if I want to search for all records where
> A called B, I would like to do a partial match with "A|B". Is this
> possible?
>
> I've another small question - where can I find some complete examples
> of creating a cluster and communicating with it (for
> insertion/deletion of records) using Hector or Pelops? So far, I've
> been doing this via the Thrift interface, but it's becoming illegible
> now...
>
> Thanks in advance...
>
> Regards
> Arijit
>
> --
> "And when the night is cloudy,
> There is still a light that shines on me,
> Shine on until tomorrow, let it be."
>
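
For the second approach, "everything where A called B" becomes a column slice over the prefix 'B|' in A's row, since BytesType compares names lexically. A sketch against the 0.7 Thrift API, assuming a hypothetical CallsByCaller CF keyed by A with column names of the form 'B|T' (all names are illustrative; the trailing '~' works as an end-of-prefix marker only because it sorts after the digits and the '|' used in these names):

    import java.nio.ByteBuffer;
    import java.util.List;
    import org.apache.cassandra.thrift.*;

    public class PrefixSliceSketch {
        static ByteBuffer bytes(String s) { return ByteBuffer.wrap(s.getBytes()); }

        // All calls from caller a to callee b, in chronological order
        static List<ColumnOrSuperColumn> callsFromAToB(
                Cassandra.Client client, String a, String b) throws Exception {
            SlicePredicate predicate = new SlicePredicate();
            predicate.setSlice_range(
                new SliceRange(bytes(b + "|"), bytes(b + "|~"), false, 1000));
            return client.get_slice(
                bytes(a), new ColumnParent("CallsByCaller"),
                predicate, ConsistencyLevel.ONE);
        }
    }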