Re: Tree Search in Cassandra

2010-06-08 Thread David Boxenhorn
I am not worried about getting the occasional wrong result - if I were, I
couldn't use Cassandra. I am only worried about breaking the index as a
whole. If concurrent changes to the tree happen to modify the same record, I
don't mind if one of them "wins" as long as the result is a working tree.

On Tue, Jun 8, 2010 at 1:20 AM, Tatu Saloranta  wrote:

> On Mon, Jun 7, 2010 at 3:09 PM, Ian Soboroff  wrote:
> > I was going to say, if ordered trees are your problem, Cassandra is not
> your
> > solution. Try building something with Berkeley DB.
>
> Also -- while there are no official plans for this, there have been
> discussions on Voldemort list, wrt. possible future work to make some
> use of their pluggable backends.
> The most commonly used configuration is that of using BDBs; and
> supposedly it is not totally out of question to consider adding
> specific backend-dependant functionality in future.
> So it might make sense to ping Voldemort-ians too; it is another
> actively-developed distributed key/value store implementation, with
> slightly different trade-offs.
>
> -+ Tatu +-
>


Re: Using Cassandra via the Erlang Thrift Client API (HOW ??)

2010-06-08 Thread nilskyone

Greetings,


I am also exploring Erlang and Cassandra via Thrift, but when inserting
I've encountered this error:

(t...@ubuntu)11> thrift_client:call(C,'insert',[
"Keyspace1","1",#columnPath{column_family="Standard1", column="email"},
"t...@example.com",1,1 ]).

=ERROR REPORT==== 8-Jun-2010::15:07:58 ===
** Generic server <0.118.0> terminating 
** Last message in was {call,insert,
 ["Keyspace1","1",
  {columnPath,"Standard1",undefined,"email"},
  "t...@example.com",1,1]}
** When Server state == {state,cassandra_thrift,
 {protocol,thrift_binary_protocol,
  {binary_protocol,
   {transport,thrift_buffered_transport,<0.119.0>},
   true,true}},
 0}
** Reason for termination == 
** {'module could not be loaded',
   [{cassandra_thrift,function_info,[insert,params_type]},
{thrift_client,send_function_call,3},
{thrift_client,'-handle_call/3-fun-0-',3},
{thrift_client,catch_function_exceptions,2},
{thrift_client,handle_call,3},
{gen_server,handle_msg,5},
{proc_lib,init_p_do_apply,3}]}
** exception exit: undef
 in function  cassandra_thrift:function_info/2
called as cassandra_thrift:function_info(insert,params_type)
 in call from thrift_client:send_function_call/3
 in call from thrift_client:'-handle_call/3-fun-0-'/3
 in call from thrift_client:catch_function_exceptions/2
 in call from thrift_client:handle_call/3
 in call from gen_server:handle_msg/5
 in call from proc_lib:init_p_do_apply/3


Has anyone else encountered the problem above? :)


Thanks in advance :)

- Niel Riddle :)

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-via-the-Erlang-Thrift-Client-API-HOW-tp4672926p5152514.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Tree Search in Cassandra

2010-06-08 Thread Tatu Saloranta
On Tue, Jun 8, 2010 at 12:07 AM, David Boxenhorn  wrote:
> I am not worried about getting the occasional wrong result - if I were, I
> couldn't use Cassandra. I am only worried about breaking the index as a
> whole. If concurrent changes to the tree happen to modify the same record, I
> don't mind if one of them "wins" as long as the result is a working tree.

Right: I would expect it not to be just an occasional missing or extra
update, but rather corruption of the whole thing. The whole point of
B-tree (and similar) structures is to bucket up sets of things, splitting
and merging buckets. Seemingly minor flaws during that processing
can FUBAR the structure itself.

Or maybe I am completely misunderstanding how you were thinking of
implementing this.

-+ Tatu +-


Re: Tree Search in Cassandra

2010-06-08 Thread David Boxenhorn
As I said above, I was wondering if I could come up with a robust algorithm,
e.g. creating the new super columns and then attaching them at the end,
which will not FUBAR my index if it fails.
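
The approach above can be sketched in plain Python (a hypothetical model only: `store` is a dict standing in for a column family, and the keys and record layout are invented for illustration, not Cassandra API). The point is ordering: write the replacement nodes under fresh keys first, and make the single parent-pointer update the last write, so a failure at any earlier step leaves the old tree fully readable.

```python
def split_leaf(store, parent_key, leaf_key):
    """Split an over-full leaf without ever leaving the tree broken."""
    items = store[leaf_key]['items']
    mid = len(items) // 2

    # Step 1: write the two replacement leaves under fresh keys.
    # The old leaf and the parent are untouched; a crash here is harmless.
    left_key, right_key = leaf_key + '/l', leaf_key + '/r'
    store[left_key] = {'items': items[:mid]}
    store[right_key] = {'items': items[mid:]}

    # Step 2: repoint the parent in one final write. Readers see the old,
    # consistent tree until this write lands; afterwards they see the new
    # one. The orphaned old leaf can be garbage-collected lazily.
    children = list(store[parent_key]['children'])
    i = children.index(leaf_key)
    children[i:i + 1] = [left_key, right_key]
    store[parent_key] = {'children': children}

store = {
    'root': {'children': ['leaf1']},
    'leaf1': {'items': [1, 2, 3, 4]},
}
split_leaf(store, 'root', 'leaf1')
print(store['root']['children'])   # ['leaf1/l', 'leaf1/r']
```

Concurrent writers can still race on the parent record itself (one of them "wins", as discussed above), but a lost race or a mid-operation crash never leaves a pointer to a node that was not already written.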

On Tue, Jun 8, 2010 at 10:53 AM, Tatu Saloranta wrote:

> On Tue, Jun 8, 2010 at 12:07 AM, David Boxenhorn 
> wrote:
> > I am not worried about getting the occasional wrong result - if I were, I
> > couldn't use Cassandra. I am only worried about breaking the index as a
> > whole. If concurrent changes to the tree happen to modify the same
> record, I
> > don't mind if one of them "wins" as long as the result is a working tree.
>
> Right: I would expect it not to be just an occasional missing or extra
> update, but rather corruption of the whole thing. The whole point of
> b-tree (and alike) structures is to bucket up set of things, splitting
> up and merging buckets. Seemingly minor flaws during that processing
> can FUBAR the structure itself.
>
> Or maybe I am completely misunderstanding how you were thinking of
> implementing this.
>
> -+ Tatu +-
>


Re: Cassandra on flash storage

2010-06-08 Thread Héctor Izquierdo

On 08/06/10 03:17, Shuai Yuan wrote:
Would you please share the performance numbers you measured? Although I
don't have any experience with flash drives, I'm very interested in
switching to SSD.



I don't have any benchmarks at hand, but I can tell you that those io-drives
are incredibly fast, not to mention the access time, which is amazing.
There are some tests on the MySQL Performance Blog, for example:


http://www.mysqlperformanceblog.com/2009/05/01/raid-vs-ssd-vs-fusionio/

Our system is memory constrained (just 16GB per machine), so I thought
that the io-drives would help a lot. The random I/O is basically free.


Regards


Duplicate a node (replication).

2010-06-08 Thread xavier manach
Hi.

  I have a cluster with only one node holding a lot of data (500 GB).
  I want to add a new node with the same data (with ReplicationFactor 2).

The normal method is:
stop the node.
add a node.
change the replication factor to 2.
start the nodes.
use nodetool repair.

  But I don't know whether this other method is valid, and whether it can be
faster:
stop the nodes.
copy all SSTables.
change the replication factor.
start the nodes.
and
use nodetool repair.

  Do you have an idea of the fastest valid method?

Thx.


Re: Cassandra on flash storage

2010-06-08 Thread Jonathan Ellis
Cassandra is designed to do less random I/O than B-tree-based systems
like Tokyo Cabinet, so SSDs are not as useful for most workloads.

On Mon, Jun 7, 2010 at 8:37 AM, Héctor Izquierdo  wrote:
> Hi everyone.
>
> I wanted to know if anybody has had any experience with cassandra on flash
> storage. At work we have a cluster of 6 machines running Tokyotyrant on
> flash-io drives (320GB) each, but performance is not what we expected, and
> we'are having some issues with replication and availability. It's also hard
> to manage, and adding/removing nodes is pure hell.
>
> We can't afford test hardware with flash storage right now, so could
> somebody share his experience?
>
> Thank you very much
>
> Héctor Izquierdo
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Handling disk-full scenarios

2010-06-08 Thread Ian Soboroff
And three days later, AE stages are still running full-bore.  So I conclude
this is not a very good approach.

I wonder what will happen when I lose a disk (which is essentially the same
as what I did -- rm the data directory).  What happens if I lose a disk
while the AE stages are running?  Since my RF is 3, I assume that I have
data loss when three disks are gone.

Not very happy.  I'm going to blow away what I have, do another reload, then
try dropping a disk again, just to confirm the results... I can't really
believe this is how it should happen.

Ian

On Fri, Jun 4, 2010 at 12:50 PM, Ian Soboroff  wrote:

> Story continued, in hopes this experience is useful to someone...
>
> I shut down the node, removed the huge file, restarted the node, and told
> everybody to repair.  Two days later, AE stages are still running.
>
> Ian
>
>
> On Thu, Jun 3, 2010 at 2:21 AM, Jonathan Ellis  wrote:
>
>> this is why JBOD configuration is contraindicated for cassandra.
>> http://wiki.apache.org/cassandra/CassandraHardware
>>
>> On Tue, Jun 1, 2010 at 1:08 PM, Ian Soboroff  wrote:
>> > My nodes have 5 disks and are using them separately as data disks.  The
>> > usage on the disks is not uniform, and one is nearly full.  Is there
>> some
>> > way to manually balance the files across the disks?  Pretty much
>> anything
>> > done via nodetool incurs an anticompaction which obviously fails.
>> system/ is
>> > not the problem, it's in my data's keyspace.
>> >
>> > Ian
>> >
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>>
>
>


Re: Perl/Thrift/Cassandra strangeness

2010-06-08 Thread Ted Zlatanov
On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook  wrote: 

JS> The point is to get the "last" super-column.
...
JS> Is the Perl Thrift client problematic, or is there something else that
JS> I am missing?

Try Net::Cassandra::Easy; if it does what you want, look at the debug
output or trace the code to see how the predicate is specified so you
can duplicate that in your own code.

In general yes, the Perl Thrift interface is problematic.  It's slow and
semantically inconsistent.

Ted



Cassandra won't start after node crash

2010-06-08 Thread Lucas Di Pentima
Hello,

I've had a server crash, and after rebooting I cannot start the Cassandra 
instance, it's a one-node cluster. I'm running cassandra 0.6.1 on Debian Linux 
and jre 1.6.0_12.

Is my data lost, should I recreate the DB?

The error message is:


 INFO 12:46:30,823 Auto DiskAccessMode determined to be standard
 INFO 12:46:31,084 Sampling index for 
/usr/local/cassandra/data/system/LocationInfo-9-Data.db
 INFO 12:46:31,084 Sampling index for 
/usr/local/cassandra/data/system/LocationInfo-10-Data.db
 INFO 12:46:31,084 Sampling index for 
/usr/local/cassandra/data/system/LocationInfo-11-Data.db
 INFO 12:46:31,135 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignCampaignRuns-469-Data.db
 INFO 12:46:31,135 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignCampaignRuns-470-Data.db
 INFO 12:46:31,135 Sampling index for 
/usr/local/cassandra/data/Empire/Open-85-Data.db
 INFO 12:46:35,772 Sampling index for 
/usr/local/cassandra/data/Empire/Open-106-Data.db
 INFO 12:46:36,864 Sampling index for 
/usr/local/cassandra/data/Empire/Open-283-Data.db
 INFO 12:46:37,228 Sampling index for 
/usr/local/cassandra/data/Empire/Open-372-Data.db
 INFO 12:46:37,436 Sampling index for 
/usr/local/cassandra/data/Empire/Open-526-Data.db
 INFO 12:46:37,644 Sampling index for 
/usr/local/cassandra/data/Empire/Open-535-Data.db
 INFO 12:46:37,644 Sampling index for 
/usr/local/cassandra/data/Empire/Open-536-Data.db
 INFO 12:46:37,644 Sampling index for 
/usr/local/cassandra/data/Empire/Open-537-Data.db
ERROR 12:46:37,644 Corrupt file 
/usr/local/cassandra/data/Empire/Open-537-Data.db; skipped
java.io.UTFDataFormatException: malformed input around byte 0
at java.io.DataInputStream.readUTF(DataInputStream.java:639)
at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
at 
org.apache.cassandra.io.SSTableReader.loadIndexFile(SSTableReader.java:261)
at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:125)
at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:114)
at 
org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:178)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:248)
at org.apache.cassandra.db.Table.(Table.java:338)
at org.apache.cassandra.db.Table.open(Table.java:199)
at 
org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:91)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)
 INFO 12:46:37,644 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignRunClickStream-9-Data.db
 INFO 12:46:37,644 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignRunClickStream-454-Data.db
 INFO 12:46:37,696 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignRunOpenStream-9-Data.db
 INFO 12:46:37,696 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignRunOpenStream-14-Data.db
 INFO 12:46:37,696 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignRunOpenStream-27-Data.db
 INFO 12:46:37,748 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignRunOpenStream-456-Data.db
ERROR 12:46:37,748 Corrupt file 
/usr/local/cassandra/data/Empire/CampaignRunOpenStream-456-Data.db; skipped
java.io.UTFDataFormatException: malformed input around byte 48
at java.io.DataInputStream.readUTF(DataInputStream.java:617)
at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
at 
org.apache.cassandra.io.SSTableReader.loadIndexFile(SSTableReader.java:261)
at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:125)
at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:114)
at 
org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:178)
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:248)
at org.apache.cassandra.db.Table.(Table.java:338)
at org.apache.cassandra.db.Table.open(Table.java:199)
at 
org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:91)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)
 INFO 12:46:37,748 Sampling index for 
/usr/local/cassandra/data/Empire/Click-21-Data.db
 INFO 12:46:38,788 Sampling index for 
/usr/local/cassandra/data/Empire/Click-26-Data.db
 INFO 12:46:39,048 Sampling index for 
/usr/local/cassandra/data/Empire/Click-259-Data.db
 INFO 12:46:39,412 Sampling index for 
/usr/local/cassandra/data/Empire/Click-476-Data.db
 INFO 12:46:39,464 Sampling index for 
/usr/local/cassandra/data/Empire/Click-477-Data.db
 INFO 12:46:39,464 Sampling index for 
/usr/local/cassandra/data/Empire/Click-478-Data.db
 INFO 12:46:39,464 Sampling index for 
/usr/local/cassandra/data/Empire/CampaignRunUniqueOpen-9-Data.db
 INFO 12:46:39,464 Sampling index for 
/usr/local/cassandr

Re: Tree Search in Cassandra

2010-06-08 Thread Tatu Saloranta
On Tue, Jun 8, 2010 at 1:28 AM, David Boxenhorn  wrote:
> As I said above, I was wondering if I could come up with a robust algorithm,
> e.g. creating the new super columns and then attaching them at the end,
> which will not FUBAR my index if it fails.
>

Is this append-only? That is, never delete or insert in the middle? If
so, it might be easier to build something like this.

-+ Tatu +-


Re: Tree Search in Cassandra

2010-06-08 Thread David Boxenhorn
No, there will be deletes and inserts in the middle. But I can assume that
the index will only grow. There will be few deletes.

On Tue, Jun 8, 2010 at 7:04 PM, Tatu Saloranta  wrote:

> On Tue, Jun 8, 2010 at 1:28 AM, David Boxenhorn  wrote:
> > As I said above, I was wondering if I could come up with a robust
> algorithm,
> > e.g. creating the new super columns and then attaching them at the end,
> > which will not FUBAR my index if it fails.
> >
>
> Is this append-only? That is, never delete or insert in the middle? If
> so, it might be easier to build something like this.
>
> -+ Tatu +-
>


Re: Perl/Thrift/Cassandra strangeness

2010-06-08 Thread Jonathan Shook
I was misreading the result with the original slice range.
I should have been expecting exactly 2 ColumnOrSuperColumns, which is
what I got. I was erroneously expecting only 1.

Thanks!
Jonathan


2010/6/8 Ted Zlatanov :
> On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook  wrote:
>
> JS> The point is to get the "last" super-column.
> ...
> JS> Is the Perl Thrift client problematic, or is there something else that
> JS> I am missing?
>
> Try Net::Cassandra::Easy; if it does what you want, look at the debug
> output or trace the code to see how the predicate is specified so you
> can duplicate that in your own code.
>
> In general yes, the Perl Thrift interface is problematic.  It's slow and
> semantically inconsistent.
>
> Ted
>
>


Re: Duplicate a node (replication).

2010-06-08 Thread Jonathan Ellis
yes, if you're going from 1 to 2 then

1. nodetool drain & stop original node
2. copy everything from *your keyspaces* in data/ directories (but not
system keyspace!) to new node
3. start both nodes with replicationfactor=2 and autobootstrap=false
[the default]

will be faster.
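
Step 2 can be sketched roughly like this (a sketch only: it assumes the 0.6-era layout of one data directory containing per-keyspace subdirectories, and the paths are hypothetical — adjust to your DataFileDirectories):

```python
import os
import shutil

def copy_user_keyspaces(src_data_dir, dst_data_dir):
    """Copy every keyspace directory except 'system' to the new node."""
    for keyspace in os.listdir(src_data_dir):
        if keyspace == 'system':   # the system keyspace must NOT be copied
            continue
        src = os.path.join(src_data_dir, keyspace)
        if os.path.isdir(src):
            shutil.copytree(src, os.path.join(dst_data_dir, keyspace))

# e.g. copy_user_keyspaces('/var/lib/cassandra/data', '/mnt/newnode/data')
```

Run it (or an rsync equivalent) only while the original node is drained and stopped, so the SSTables on disk are complete and quiescent.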

On Tue, Jun 8, 2010 at 7:12 AM, xavier manach  wrote:
> Hi.
>
>   I have a cluster with only 1 node with a lot of datas (500 Go) .
>   I want add a new node with the same datas (with a ReplicationFactor 2)
>
> The method normal is :
> stop node.
> add a node.
> change replication factor to 2.
> start nodes
> use nodetool repair
>
>   But , I didn't know if this other method is valid, and if it's can be
> faster :
> stop nodes.
> copy all SSTables
> change replication factor.
> start nodes
> and
> use nodetool repair
>
>   Have you an idea for the faster valid method ?
>
> Thx.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Compaction bringing a node to its knees

2010-06-08 Thread Jonathan Ellis
I'm curious, did this help at all?

On Sat, May 29, 2010 at 3:03 PM, Jonathan Ellis  wrote:
> You could try setting the compaction thread to a lower priority.  You
> could add a thread priority to NamedThreadPool, and pass that up from
> CompactionExecutor constructor.  According to
> http://www.javamex.com/tutorials/threads/priority_what.shtml you have
> to run as root and add a JVM option to get this to work.
>
> On Sat, May 29, 2010 at 2:55 PM, James Golick  wrote:
>> I just experienced a compaction that brought a node to 100% of its IO
>> capacity and made its responses incredibly slow.
>> It wasn't enough to make the node actually appear as down, though, so it
>> slowed down the operation of the cluster considerably.
>> The CF being compacted contains a lot of relatively wide rows (hundreds of
>> thousands or millions of columns on the big end). Is that the problem?
>> Any suggestions on how to minimize impact here?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Perl/Thrift/Cassandra strangeness

2010-06-08 Thread Jonathan Shook
Possible bug...

Using a slice range with the empty sentinel values, and a count of 1
sometimes yields 2 ColumnOrSuperColumns, sometimes 1.
The inconsistency had led me to believe that the count was not
working, hence the additional confusion.

There was a particular key which returns exactly 2
ColumnOrSuperColumns. This happened repeatedly, even when other data
was inserted before or after. All of the other keys were returning the
expected 1 ColumnOrSuperColumn.

Once I added a 4th super column to the key in question, it started
behaving the same as the others, yielding exactly 1
ColumnOrSuperColumn.

Here is the code for the predicate:

my $predicate = new Cassandra::SlicePredicate();
my $slice_range = new Cassandra::SliceRange();
$slice_range->{start} = '';
$slice_range->{finish} = '';
$slice_range->{reversed} = 1;
$slice_range->{count} = 1;
$predicate->{slice_range} = $slice_range;

The columns are in the right order (reversed), so I'll get what I need
by accessing only the first result in each slice. If I wanted to
iterate the returned list of slices, it would manifest as a bug in my
client.

(Cassandra 6.1/Thrift/Perl)
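
For what it's worth, here is the behavior I would expect from that predicate, modeled in plain Python. This is only a simplified model of SliceRange semantics over sorted column names — not Cassandra code, and it ignores the detail that in the real API start/finish swap roles when reversed. With both bounds empty, reversed and count=1 should yield exactly the last column in comparator order:

```python
def get_slice_model(column_names, start='', finish='', reverse=False, count=100):
    """Simplified model of SliceRange over a list of column names.
    An empty start/finish means 'unbounded' on that side."""
    names = sorted(column_names)
    if start:
        names = [n for n in names if n >= start]
    if finish:
        names = [n for n in names if n <= finish]
    if reverse:
        names = names[::-1]      # walk from the high end
    return names[:count]         # count caps the number of results

cols = ['2010-06-01', '2010-06-05', '2010-06-08']
print(get_slice_model(cols, reverse=True, count=1))   # ['2010-06-08']
```

Getting two results back for count=1 would contradict this model, which is why it smells like a bug somewhere between the Perl serialization of `count` and the server.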


On Tue, Jun 8, 2010 at 11:18 AM, Jonathan Shook  wrote:
> I was misreading the result with the original slice range.
> I should have been expecting exactly 2 ColumnOrSuperColumns, which is
> what I got. I was erroneously expecting only 1.
>
> Thanks!
> Jonathan
>
>
> 2010/6/8 Ted Zlatanov :
>> On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook  wrote:
>>
>> JS> The point is to get the "last" super-column.
>> ...
>> JS> Is the Perl Thrift client problematic, or is there something else that
>> JS> I am missing?
>>
>> Try Net::Cassandra::Easy; if it does what you want, look at the debug
>> output or trace the code to see how the predicate is specified so you
>> can duplicate that in your own code.
>>
>> In general yes, the Perl Thrift interface is problematic.  It's slow and
>> semantically inconsistent.
>>
>> Ted
>>
>>
>


Re: Perl/Thrift/Cassandra strangeness

2010-06-08 Thread Jonathan Ellis
that does sound like a bug.  can you give us the data to insert that
allows reproducing this?

On Tue, Jun 8, 2010 at 10:20 AM, Jonathan Shook  wrote:
> Possible bug...
>
> Using a slice range with the empty sentinel values, and a count of 1
> sometimes yields 2 ColumnOrSuperColumns, sometimes 1.
> The inconsistency had lead me to believe that the count was not
> working, hence the additional confusion.
>
> There was a particular key which returns exactly 2
> ColumnOrSuperColumns. This happened repeatedly, even when other data
> was inserted before or after. All of the other keys were returning the
> expected 1 ColumnOrSuperColumn.
>
> Once I added a 4th super column to the key in question, it started
> behaving the same as the others, yielding exactly 1
> ColumnOrSuperColumn.
>
> here is the code: for the predicate:
>
>        my $predicate = new Cassandra::SlicePredicate();
>        my $slice_range = new Cassandra::SliceRange();
>        $slice_range->{start} = '';
>        $slice_range->{finish} = '';
>        $slice_range->{reversed} = 1;
>        $slice_range->{count} = 1;
>        $predicate->{slice_range} = $slice_range;
>
> The columns are in the right order (reversed), so I'll get what I need
> by accessing only the first result in each slice. If I wanted to
> iterate the returned list of slices, it would manifest as a bug in my
> client.
>
> (Cassandra 6.1/Thrift/Perl)
>
>
> On Tue, Jun 8, 2010 at 11:18 AM, Jonathan Shook  wrote:
>> I was misreading the result with the original slice range.
>> I should have been expecting exactly 2 ColumnOrSuperColumns, which is
>> what I got. I was erroneously expecting only 1.
>>
>> Thanks!
>> Jonathan
>>
>>
>> 2010/6/8 Ted Zlatanov :
>>> On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook  wrote:
>>>
>>> JS> The point is to get the "last" super-column.
>>> ...
>>> JS> Is the Perl Thrift client problematic, or is there something else that
>>> JS> I am missing?
>>>
>>> Try Net::Cassandra::Easy; if it does what you want, look at the debug
>>> output or trace the code to see how the predicate is specified so you
>>> can duplicate that in your own code.
>>>
>>> In general yes, the Perl Thrift interface is problematic.  It's slow and
>>> semantically inconsistent.
>>>
>>> Ted
>>>
>>>
>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Perl/Thrift/Cassandra strangeness

2010-06-08 Thread Jonathan Shook
I can't divulge this particular test data, as it was borrowed from a
dataset which is not public.
I will see if I can reproduce the scenario, however, using other data
suitable for a bug report.

On Tue, Jun 8, 2010 at 2:18 PM, Jonathan Ellis  wrote:
> that does sound like a bug.  can you give us the data to insert that
> allows reproducing this?
>
> On Tue, Jun 8, 2010 at 10:20 AM, Jonathan Shook  wrote:
>> Possible bug...
>>
>> Using a slice range with the empty sentinel values, and a count of 1
>> sometimes yields 2 ColumnOrSuperColumns, sometimes 1.
>> The inconsistency had lead me to believe that the count was not
>> working, hence the additional confusion.
>>
>> There was a particular key which returns exactly 2
>> ColumnOrSuperColumns. This happened repeatedly, even when other data
>> was inserted before or after. All of the other keys were returning the
>> expected 1 ColumnOrSuperColumn.
>>
>> Once I added a 4th super column to the key in question, it started
>> behaving the same as the others, yielding exactly 1
>> ColumnOrSuperColumn.
>>
>> here is the code: for the predicate:
>>
>>        my $predicate = new Cassandra::SlicePredicate();
>>        my $slice_range = new Cassandra::SliceRange();
>>        $slice_range->{start} = '';
>>        $slice_range->{finish} = '';
>>        $slice_range->{reversed} = 1;
>>        $slice_range->{count} = 1;
>>        $predicate->{slice_range} = $slice_range;
>>
>> The columns are in the right order (reversed), so I'll get what I need
>> by accessing only the first result in each slice. If I wanted to
>> iterate the returned list of slices, it would manifest as a bug in my
>> client.
>>
>> (Cassandra 6.1/Thrift/Perl)
>>
>>
>> On Tue, Jun 8, 2010 at 11:18 AM, Jonathan Shook  wrote:
>>> I was misreading the result with the original slice range.
>>> I should have been expecting exactly 2 ColumnOrSuperColumns, which is
>>> what I got. I was erroneously expecting only 1.
>>>
>>> Thanks!
>>> Jonathan
>>>
>>>
>>> 2010/6/8 Ted Zlatanov :
 On Mon, 7 Jun 2010 17:20:56 -0500 Jonathan Shook  wrote:

 JS> The point is to get the "last" super-column.
 ...
 JS> Is the Perl Thrift client problematic, or is there something else that
 JS> I am missing?

 Try Net::Cassandra::Easy; if it does what you want, look at the debug
 output or trace the code to see how the predicate is specified so you
 can duplicate that in your own code.

 In general yes, the Perl Thrift interface is problematic.  It's slow and
 semantically inconsistent.

 Ted


>>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Handling disk-full scenarios

2010-06-08 Thread Jonathan Ellis
Sounds like you ran into
https://issues.apache.org/jira/browse/CASSANDRA-1169.  The only
workaround until that is fixed is to re-run repair.

On Tue, Jun 8, 2010 at 7:17 AM, Ian Soboroff  wrote:
> And three days later, AE stages are still running full-bore.  So I conclude
> this is not a very good approach.
>
> I wonder what will happen when I lose a disk (which is essentially the same
> as what I did -- rm the data directory).  What happens if I lose a disk
> while the AE stages are running?  Since my RF is 3, I assume that I have
> data loss when three disks are gone.
>
> Not very happy.  I'm going to blow away what I have, do another reload, then
> try dropping a disk again, just to confirm the results... I can't really
> believe this is how it should happen.
>
> Ian
>
> On Fri, Jun 4, 2010 at 12:50 PM, Ian Soboroff  wrote:
>>
>> Story continued, in hopes this experience is useful to someone...
>>
>> I shut down the node, removed the huge file, restarted the node, and told
>> everybody to repair.  Two days later, AE stages are still running.
>>
>> Ian
>>
>> On Thu, Jun 3, 2010 at 2:21 AM, Jonathan Ellis  wrote:
>>>
>>> this is why JBOD configuration is contraindicated for cassandra.
>>> http://wiki.apache.org/cassandra/CassandraHardware
>>>
>>> On Tue, Jun 1, 2010 at 1:08 PM, Ian Soboroff  wrote:
>>> > My nodes have 5 disks and are using them separately as data disks.  The
>>> > usage on the disks is not uniform, and one is nearly full.  Is there
>>> > some
>>> > way to manually balance the files across the disks?  Pretty much
>>> > anything
>>> > done via nodetool incurs an anticompaction which obviously fails.
>>> > system/ is
>>> > not the problem, it's in my data's keyspace.
>>> >
>>> > Ian
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of Riptano, the source for professional Cassandra support
>>> http://riptano.com
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra 0.6.2 with thrift hector-0.6.0-13

2010-06-08 Thread Jonathan Ellis
Java transports buffer internally; there is no TBufferedTransport the
way there is in C#.

(moving to user@)

On Tue, Jun 8, 2010 at 10:31 AM, Subrata Roy  wrote:
> We are using Cassandra 0.6.2 with the Hector/Thrift client, and our
> application performance is really slow. We are not sure whether it is
> because of the Hector/Thrift connection or not.  Jonathan E. and other
> people have suggested that by using "TBufferedTransport(TSocket) instead of
> a TSocket directly" performance has drastically improved. If that is
> the case, why is TBufferedTransport(TSocket) not added as part of the
> Hector client by default? Is there any technical reason not to add it as
> part of the Cassandra 0.6.2 / hector-0.6.0-13 release?
>
>
>
> Thanks in advance for your valuable input.
>
>
>
> Regards
>
> Subrata
>
>
>
> /// Code snippet from Hector client: CassandraClientFactory.java
>
> private Cassandra.Client createThriftClient(String url, int port)
>     throws TTransportException, TException {
>   log.debug("Creating a new thrift connection to {}:{}", url, port);
>   TTransport tr;
>   if (useThriftFramedTransport) {
>     tr = new TFramedTransport(new TSocket(url, port, timeout));
>   } else {
>     tr = new TSocket(url, port, timeout);  // --- change to TBufferedTransport() for better performance
>   }
>   TProtocol proto = new TBinaryProtocol(tr);
>   Cassandra.Client client = new Cassandra.Client(proto);
>   try {
>     tr.open();
>   } catch (TTransportException e) {
>     // Thrift exceptions aren't very good in reporting, so we have to
>     // catch the exception here and add details to it.
>     log.error("Unable to open transport to " + url + ":" + port, e);
>     clientMonitor.incCounter(Counter.CONNECT_ERROR);
>     throw new TTransportException("Unable to open transport to " + url
>         + ":" + port + " , " + e.getLocalizedMessage(), e);
>   }
>   return client;
> }
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra won't start after node crash

2010-06-08 Thread Jonathan Ellis
Sounds like you had some bad hardware take down your index files.
(Cassandra fsyncs them after writing and before renaming them to being
live, so when pieces are missing it has always been hardware at fault,
in the cases I have seen.)

You could try rebuilding your index files from the data files, but
they may be toast, too.

So: step 1, run bin/sstable2json to make sure your data files are actually okay.

Step 2, rebuild your index files from your data files.

I can never muster up the energy to make an index rebuilder in Java.
So here's one in Python.

(I recommend testing this on a sstable + index pair that are known to
be good, before trusting it to rebuild a damaged index.  In particular
I think it might be broken with a 32bit python instead of 64bit.
Works On My Machine!)

# usage: buildindex <data file> <index file>

import sys, struct, stat, os

infname, outfname = sys.argv[1:3]
if '-Data' not in infname:
    raise Exception('%s does not look like a Cassandra data filename' % infname)

inf = open(infname, 'rb')
outf = open(outfname, 'wb')
fsize = os.stat(infname)[stat.ST_SIZE]

while inf.tell() < fsize:
    # read current row key and write index entry
    dataposition = inf.tell()
    keysize, = struct.unpack('>H', inf.read(2))
    key = inf.read(keysize)
    outf.write(struct.pack('>H', keysize))
    outf.write(key)
    outf.write(struct.pack('>q', dataposition))

    # skip to the next row
    datasize, = struct.unpack('>i', inf.read(4))
    inf.seek(inf.tell() + datasize)
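
The row framing the script assumes (2-byte big-endian key length, key bytes, 4-byte row length, row payload) can be sanity-checked without touching a real SSTable by running the same loop over a synthetic in-memory stream:

```python
import io
import struct

def make_row(key, payload):
    # same framing the rebuilder assumes: >H key length, key bytes,
    # >i row data length, then the row payload
    return (struct.pack('>H', len(key)) + key +
            struct.pack('>i', len(payload)) + payload)

data = make_row(b'k1', b'aaa') + make_row(b'key2', b'bb')
inf = io.BytesIO(data)

entries = []
while inf.tell() < len(data):
    dataposition = inf.tell()
    keysize, = struct.unpack('>H', inf.read(2))
    key = inf.read(keysize)
    entries.append((key, dataposition))   # what goes into the index file
    datasize, = struct.unpack('>i', inf.read(4))
    inf.seek(inf.tell() + datasize)       # skip to the next row

print(entries)   # [(b'k1', 0), (b'key2', 11)]
```

If a known-good Data/Index pair round-trips through the real script the same way, the layout assumption holds for your Cassandra version.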


On Tue, Jun 8, 2010 at 8:56 AM, Lucas Di Pentima
 wrote:
> Hello,
>
> I've had a server crash, and after rebooting I cannot start the Cassandra 
> instance, it's a one-node cluster. I'm running cassandra 0.6.1 on Debian 
> Linux and jre 1.6.0_12.
>
> Is my data lost, should I recreate the DB?
>
> The error message is:
>
> 
>  INFO 12:46:30,823 Auto DiskAccessMode determined to be standard
>  INFO 12:46:31,084 Sampling index for 
> /usr/local/cassandra/data/system/LocationInfo-9-Data.db
>  INFO 12:46:31,084 Sampling index for 
> /usr/local/cassandra/data/system/LocationInfo-10-Data.db
>  INFO 12:46:31,084 Sampling index for 
> /usr/local/cassandra/data/system/LocationInfo-11-Data.db
>  INFO 12:46:31,135 Sampling index for 
> /usr/local/cassandra/data/Empire/CampaignCampaignRuns-469-Data.db
>  INFO 12:46:31,135 Sampling index for 
> /usr/local/cassandra/data/Empire/CampaignCampaignRuns-470-Data.db
>  INFO 12:46:31,135 Sampling index for 
> /usr/local/cassandra/data/Empire/Open-85-Data.db
>  INFO 12:46:35,772 Sampling index for 
> /usr/local/cassandra/data/Empire/Open-106-Data.db
>  INFO 12:46:36,864 Sampling index for 
> /usr/local/cassandra/data/Empire/Open-283-Data.db
>  INFO 12:46:37,228 Sampling index for 
> /usr/local/cassandra/data/Empire/Open-372-Data.db
>  INFO 12:46:37,436 Sampling index for 
> /usr/local/cassandra/data/Empire/Open-526-Data.db
>  INFO 12:46:37,644 Sampling index for 
> /usr/local/cassandra/data/Empire/Open-535-Data.db
>  INFO 12:46:37,644 Sampling index for 
> /usr/local/cassandra/data/Empire/Open-536-Data.db
>  INFO 12:46:37,644 Sampling index for 
> /usr/local/cassandra/data/Empire/Open-537-Data.db
> ERROR 12:46:37,644 Corrupt file 
> /usr/local/cassandra/data/Empire/Open-537-Data.db; skipped
> java.io.UTFDataFormatException: malformed input around byte 0
>        at java.io.DataInputStream.readUTF(DataInputStream.java:639)
>        at java.io.RandomAccessFile.readUTF(RandomAccessFile.java:887)
>        at 
> org.apache.cassandra.io.SSTableReader.loadIndexFile(SSTableReader.java:261)
>        at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:125)
>        at org.apache.cassandra.io.SSTableReader.open(SSTableReader.java:114)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:178)
>        at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:248)
>        at org.apache.cassandra.db.Table.<init>(Table.java:338)
>        at org.apache.cassandra.db.Table.open(Table.java:199)
>        at 
> org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:91)
>        at 
> org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:177)
>  INFO 12:46:37,644 Sampling index for 
> /usr/local/cassandra/data/Empire/CampaignRunClickStream-9-Data.db
>  INFO 12:46:37,644 Sampling index for 
> /usr/local/cassandra/data/Empire/CampaignRunClickStream-454-Data.db
>  INFO 12:46:37,696 Sampling index for 
> /usr/local/cassandra/data/Empire/CampaignRunOpenStream-9-Data.db
>  INFO 12:46:37,696 Sampling index for 
> /usr/local/cassandra/data/Empire/CampaignRunOpenStream-14-Data.db
>  INFO 12:46:37,696 Sampling index for 
> /usr/local/cassandra/data/Empire/CampaignRunOpenStream-27-Data.db
>  INFO 12:46:37,748 Sampling index for 
> /usr/local/cassandra/data/Empire/CampaignRunOpenStream-456-Data.db
> ERROR 12:46:37,748 Corrupt file 
> /usr/local/cassandra/data/Empire/Campaig

Re: Getting keys in a range sorted with respect to last access time

2010-06-08 Thread Jonathan Ellis
On Mon, Jun 7, 2010 at 9:04 AM, Utku Can Topçu  wrote:
> Hey All,
>
> First of all I'll start with some questions on the default behavior of
> get_range_slices method defined in the thrift API.
>
> Given a keyrange with start-key "kstart" and end-key "kend", assuming
> kstart < kend:
> * Is it true that I'll get the range [kstart,kend) (kstart inclusive, kend
> exclusive)?

[start, end]

> * What's the default order of the rows in the result list? (assuming I am
> using an OPP)

lexically by unicode code point

> * (How) can we reverse the sorting order?

write your own ReversedOPP.  but maybe you mean "how do we scan in
reversed order," in which case the answer is, "extend
ColumnFamilyStore.getRangeRows" (not for the faint of heart, but not
impossible).

> * What would be the behavior in the case kstart>kend? Will I get an empty
> result list?

pretty sure it will error out.  easy to verify experimentally.

> Secondly, I have use case where I need to access the latest updated rows?
> How can this be possible? Writing a new partitioner?

No.  You'd want to maintain a separate row containing the most recent updates.
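[Editor's note: a sketch of the "separate row of recent updates" idea, using a plain in-memory dict to stand in for the Cassandra row. The `RecentIndex` name and the timestamp-based column naming are illustrative assumptions, not an existing API:

```python
import time

class RecentIndex:
    """In-memory stand-in for a single wide row whose column names are
    timestamps; a reversed slice of it yields the latest updates first."""

    def __init__(self):
        self.columns = {}  # column name (zero-padded timestamp) -> row key

    def record_update(self, row_key, ts=None):
        # In Cassandra this would be an insert into the index row with a
        # time-based column name; appending the row key keeps column
        # names unique when two updates share a timestamp.
        if ts is None:
            ts = int(time.time() * 1e6)
        self.columns['%020d:%s' % (ts, row_key)] = row_key

    def latest(self, n=10):
        # Equivalent to a reversed column slice: newest columns first.
        return [self.columns[c] for c in sorted(self.columns, reverse=True)[:n]]
```

Zero-padding the timestamp makes lexical column ordering match numeric time order, which is what a reversed slice of the real row would rely on.]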

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Performance Characteristics of CASSANDRA-16 (Memory Efficient Compactions)

2010-06-08 Thread Jonathan Ellis
of course.  compaction is always O(N) with the size of the data

On Mon, Jun 7, 2010 at 9:51 AM, Jeremy Davis
 wrote:
> Reads, ok.. What about Compactions? Is the cost of compacting going to be
> ever increasing with the number of columns?
>
>
>
> On Sat, Jun 5, 2010 at 7:30 AM, Jonathan Ellis  wrote:
>>
>> #16 is very simple: it allows you to make very large rows.  That is all.
>>
>> Other things being equal, doing reads from really big rows will be
>> slower (since the row index will take longer to read) and this patch
>> does not change this.
>>
>> On Fri, Jun 4, 2010 at 5:47 PM, Jeremy Davis
>>  wrote:
>> >
>> > https://issues.apache.org/jira/browse/CASSANDRA-16
>> >
>> > Can someone (Jonathan?)  help me understand the performance
>> > characteristics
>> > of this patch?
>> > Specifically: If I have an open ended CF, and I keep inserting with ever
>> > increasing column names (for example current Time), will things
>> > generally
>> > work out ok performance wise? Or will I pay some ever increasing penalty
>> > with the number of entries?
>> >
>> > My assumption is that you have bucketed things up for me by column name
>> > order, and as long as I don't delete/modify/create a column in one of
>> > the
>> > old buckets, then things will work out ok. Or is this not at all what is
>> > going on?
>> >
>> > Thanks,
>> > -JD
>> >
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of Riptano, the source for professional Cassandra support
>> http://riptano.com
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Cassandra on flash storage

2010-06-08 Thread Eric Yu
But I think using SSDs can boost read performance, and read performance is
the main issue for us in using Cassandra right now.

On Tue, Jun 8, 2010 at 10:16 PM, Jonathan Ellis  wrote:

> cassandra is designed to do less random i/o than b-tree based systems
> like tokyo cabinet.  ssds are not as useful for most workloads.
>
> On Mon, Jun 7, 2010 at 8:37 AM, Héctor Izquierdo 
> wrote:
> > Hi everyone.
> >
> > I wanted to know if anybody has had any experience with cassandra on
> flash
> > storage. At work we have a cluster of 6 machines running Tokyotyrant on
> > flash-io drives (320GB) each, but performance is not what we expected,
> and
> > we'are having some issues with replication and availability. It's also
> hard
> > to manage, and adding/removing nodes is pure hell.
> >
> > We can't afford test hardware with flash storage right now, so could
> > somebody share his experience?
> >
> > Thank you very much
> >
> > Héctor Izquierdo
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>


Re: Re: Range search on keys not working?

2010-06-08 Thread sina
What does OPP mean? And how can I make the "start" and "finish" parameters
useful and meaningful?


2010-06-09 



9527



From: Ben Browning 
Sent: 2010-06-02  21:08:57 
To: user 
Cc: 
Subject: Re: Range search on keys not working? 
 
They exist because when using OPP they are useful and make sense.
On Wed, Jun 2, 2010 at 8:59 AM, David Boxenhorn  wrote:
> So why do the "start" and "finish" range parameters exist?
>
> On Wed, Jun 2, 2010 at 3:53 PM, Ben Browning  wrote:
>>
>> Martin,
>>
>> On Wed, Jun 2, 2010 at 8:34 AM, Dr. Martin Grabmüller
>>  wrote:
>> > I think you can specify an end key, but it should be a key which does
>> > exist
>> > in your column family.
>>
>>
>> Logically, it doesn't make sense to ever specify an end key with
>> random partitioner. If you specified a start key of "aaa" and an end
>> key of "aac" you might get back as results "aaa", "zfc", "hik", etc.
>> And, even if you have a key of "aab" it might not show up. Key ranges
>> only make sense with order-preserving partitioner. The only time to
>> ever use a key range with random partitioner is when you want to
>> iterate over all keys in the CF.
>>
>> Ben
>>
>>
>> > But maybe I'm off the track here and someone else here knows more about
>> > this
>> > key range stuff.
>> >
>> > Martin
>> >
>> > 
>> > From: David Boxenhorn [mailto:da...@lookin2.com]
>> > Sent: Wednesday, June 02, 2010 2:30 PM
>> > To: user@cassandra.apache.org
>> > Subject: Re: Range search on keys not working?
>> >
>> > In other words, I should check the values as I iterate, and stop
>> > iterating
>> > when I get out of range?
>> >
>> > I'll try that!
>> >
>> > On Wed, Jun 2, 2010 at 3:15 PM, Dr. Martin Grabmüller
>> >  wrote:
>> >>
>> >> When not using OPP, you should not use something like 'CATEGORY/' as
>> >> the
>> >> end key.
>> >> Use the empty string as the end key and limit the number of returned
>> >> keys,
>> >> as you did with
>> >> the 'max' value.
>> >>
>> >> If I understand correctly, the end key is used to generate an end token
>> >> by
>> >> hashing it, and
>> >> there is not the same correspondence between 'CATEGORY' and 'CATEGORY/'
>> >> as
>> >> for
>> >> hash('CATEGORY') and hash('CATEGORY/').
>> >>
>> >> At least, this was the explanation I gave myself when I had the same
>> >> problem.
>> >>
>> >> The solution is to iterate through the keys by always using the last
>> >> key
>> >> returned as the
>> >> start key for the next call to get_range_slices, and the to drop the
>> >> first
>> >> element from
>> >> the result.
>> >>
>> >> HTH,
>> >>   Martin
>> >>
>> >> 
>> >> From: David Boxenhorn [mailto:da...@lookin2.com]
>> >> Sent: Wednesday, June 02, 2010 2:01 PM
>> >> To: user@cassandra.apache.org
>> >> Subject: Re: Range search on keys not working?
>> >>
>> >> The previous thread where we discussed this is called, "key is sorted?"
>> >>
>> >>
>> >> On Wed, Jun 2, 2010 at 2:56 PM, David Boxenhorn 
>> >> wrote:
>> >>>
>> >>> I'm not using OPP. But I was assured on earlier threads (I asked
>> >>> several
>> >>> times to be sure) that it would work as stated below: the results
>> >>> would not
>> >>> be ordered, but they would be correct.
>> >>>
>> >>> On Wed, Jun 2, 2010 at 2:51 PM, Torsten Curdt 
>> >>> wrote:
>> 
>>  Sounds like you are not using an order preserving partitioner?
>> 
>>  On Wed, Jun 2, 2010 at 13:48, David Boxenhorn 
>>  wrote:
>>  > Range search on keys is not working for me. I was assured in
>>  > earlier
>>  > threads
>>  > that range search would work, but the results would not be ordered.
>>  >
>>  > I'm trying to get all the rows that start with "CATEGORY."
>>  >
>>  > I'm doing:
>>  >
>>  > String start = "CATEGORY.";
>>  > .
>>  > .
>>  > .
>>  > keyspace.getSuperRangeSlice(columnParent, slicePredicate, start,
>>  > "CATEGORY/", max)
>>  > .
>>  > .
>>  > .
>>  >
>>  > in a loop, setting start to the last key each time - but I'm
>>  > getting
>>  > rows
>>  > that don't start with "CATEGORY."!!
>>  >
>>  > How do I get all rows that start with "CATEGORY."?
>> >>>
>> >>
>> >
>> >
>
>
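[Editor's note: Martin's iterate-and-drop pattern from the thread above can be sketched as follows. This is pure Python with a stub standing in for get_range_slices; the real Thrift call also takes a keyspace, column parent, predicate, and consistency level:

```python
def range_slice_stub(dataset, start, count):
    """Stand-in for get_range_slices: return up to `count` keys >= start
    in the store's own order (plain sorted order here)."""
    keys = sorted(dataset)
    if start:
        keys = [k for k in keys if k >= start]
    return keys[:count]

def iterate_all_keys(dataset, page_size=3):
    """Page through every key: reuse the last key of each page as the
    next start key, and drop the duplicated first element."""
    results = []
    start = ''
    while True:
        page = range_slice_stub(dataset, start, page_size)
        if start:
            page = page[1:]  # drop the repeated start key
        if not page:
            break
        results.extend(page)
        start = results[-1]
    return results
```

The same loop works against a random partitioner as long as you treat the keys as opaque (iterate everything, filter client-side) rather than expecting a lexical range.]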


Data loss and corruption

2010-06-08 Thread Hector Urroz
Hi all,

We're starting to prototype Cassandra for use in a production system and
became concerned about data corruption after reading the excellent article:

http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/

where Evan Weaver writes:

"Cassandra is an alpha product and could, theoretically, lose your data. In
particular, if you change the schema specified in the storage-conf.xml file,
you must follow these instructions carefully, or corruption will occur (this
is going to be fixed). Also, the on-disk storage format is subject to
change, making upgrading a bit difficult."

Is database corruption a well-known or common problem with Cassandra? What
sources of information would you recommend to help devise a strategy to
minimize corruption risk, and to detect and recover when corruption does
occur?

Thanks,

Hector Urroz


Seeds and AutoBootstrap

2010-06-08 Thread Per Olesen
Hi,

Just a quick question on seed nodes and auto bootstrap.

Am I correct in that a seed node won't be able to auto-bootstrap? And if so, 
won't a seed node newly added to an existing cluster take a long time 
before it actually starts getting any work? I mean, if it doesn't start 
by moving some data to itself, it will have to wait until new data comes in 
and is determined to live on that new node.
Correct?

/Per