Re: Truncate introspection

2011-06-28 Thread David Boxenhorn
Does drop work in a similar way?

When I drop a CF and add it back with a different schema, it seems to work.

But I notice that in between the drop and adding it back, when the CLI
tells me the CF doesn't exist, the old data is still there.

I've been assuming that this works, but just wanted to make sure...

On Tue, Jun 28, 2011 at 12:56 AM, Jonathan Ellis  wrote:
> Each node (independently) has logic that guarantees that any writes
> processed before the truncate will be wiped out.
>
> This does not mean that each node will wipe out the same data, or even
> that each node will process the truncate (which would result in a
> TimedOutException).
>
> It also does not mean you can't have writes immediately after the
> truncate that would race w/ a "truncate, check for zero sstables"
> procedure.
>
> On Mon, Jun 27, 2011 at 3:35 PM, Ethan Rowe  wrote:
>> If those went to zero, it would certainly tell me something happened.  :)  I
>> guess watching that would be a way of seeing something was going on.
>> Is the truncate itself propagating a ring-wide marker or anything so the CF
>> is logically "empty" before being physically removed?  That's the impression
>> I got from the docs but it wasn't totally clear to me.
>>
>> On Mon, Jun 27, 2011 at 3:33 PM, Jonathan Ellis  wrote:
>>>
>>> There's a JMX method to get the number of sstables in a CF, is that
>>> what you're looking for?
>>>
>>> On Mon, Jun 27, 2011 at 1:04 PM, Ethan Rowe  wrote:
>>> > Is there any straightforward means of seeing what's going on after
>>> > issuing a
>>> > truncate (on 0.7.5)?  I'm not seeing evidence that anything actually
>>> > happened.  I've disabled read repair on the column family in question
>>> > and
>>> > don't have anything actively reading/writing at present, apart from my
>>> > one-off tests to see if rows have disappeared.
>>> > Thanks in advance.
>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re : Re : get_range_slices result

2011-06-28 Thread karim abbouh
Can I have an example of using TimeUUIDType as a comparator in client
Java code?




From: karim abbouh 
To: "user@cassandra.apache.org" 
Sent: Monday, 27 June 2011, 17:59
Subject: Re : Re : get_range_slices result


i used TimeUUIDType as type in storage-conf.xml file

 

and I used it as the comparator in my Java code,
but at execution I get this exception:

Erreur --java.io.UnsupportedEncodingException: TimeUUIDType


how can I write it?

BR




From: David Boxenhorn 
To: user@cassandra.apache.org
Cc: karim abbouh 
Sent: Friday, 24 June 2011, 11:25
Subject: Re: Re : get_range_slices result

You can get the best of both worlds by repeating the key in a column,
and creating a secondary index on that column.

On Fri, Jun 24, 2011 at 1:16 PM, Sylvain Lebresne  wrote:
> On Fri, Jun 24, 2011 at 10:21 AM, karim abbouh  wrote:
>> I want the get_range_slices() function to return records sorted (ordered) by the
>> key (row ID) used during insertion.
>> Is it possible?
>
> You will have to use the OrderPreservingPartitioner. This is not
> without inconvenience, however.
> See for instance
> http://wiki.apache.org/cassandra/StorageConfiguration#line-100 or
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
> that give more details on the pros and cons (the short version being
> that the main advantage of
> OrderPreservingPartitioner is what you're asking for, but its main
> drawback is that load-balancing
> the cluster will likely be very hard).
>
> In general the advice is to stick with RandomPartitioner and design a
> data model that avoids needing
> range slices (or at least needing that the result is sorted). This is
> very often not too hard, more
> efficient, and much simpler than dealing with the load balancing
> problems of OrderPreservingPartitioner.
>
> --
> Sylvain
>
>>
>> 
>> From: aaron morton 
>> To: user@cassandra.apache.org
>> Sent: Thursday, 23 June 2011, 20:30
>> Subject: Re: get_range_slices result
>>
>> Not sure what your question is.
>> Does this help ? http://wiki.apache.org/cassandra/FAQ#range_rp
>> Cheers
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 23 Jun 2011, at 21:59, karim abbouh wrote:
>>
>> how can the get_range_slices() function return keys in sorted order?
>> BR
>>
>>
>>
>>
>

Re: Counter Column

2011-06-28 Thread Donal Zang

On 27/06/2011 19:19, Sylvain Lebresne wrote:

> Let me make that simpler.
>
> Don't ever use replicate_on_write=false (even if you "think" that it is
> what you want, there is a good chance it's not).
> Obviously, the default is replicate_on_write=true.

I may be wrong. But with 0.8.0, I think the default is 
replicate_on_write=false; you have to declare it explicitly.


--
Donal Zang
Computing Center, IHEP
19B YuquanLu, Shijingshan District,Beijing, 100049
zan...@ihep.ac.cn
86 010 8823 6018




Re: remove all the columns of a key in a column family

2011-06-28 Thread aaron morton
That error is thrown if you send a Deletion with a predicate that has neither 
columns nor a SliceRange. 

Send a Deletion that does not have a predicate. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 18:11, Donna Li wrote:

> To delete all the columns for a row, send a Mutation where the Deletion has 
> neither a super_column nor a predicate 
> I tested this, but it throws the exception “A SlicePredicate must be given a list of 
> Columns, a SliceRange, or both”
>  
> Best Regards
> Donna li
>  
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: 28 June 2011 12:30
> To: user@cassandra.apache.org
> Subject: Re: remove all the columns of a key in a column family
>  
> AFAIK that is still not supported. 
>  
> To delete all the columns for a row, send a Mutation where the Deletion has 
> neither a super_column nor a predicate 
>  
> Cheers
>  
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 28 Jun 2011, at 15:50, Donna Li wrote:
> 
> 
>  
> Cassandra version is 0.7.2. When I use batch_mutate, the following exception is 
> thrown: “TException: Deletion does not yet support SliceRange predicates”. Which 
> version supports deleting the whole row of a key?
>  
>  
> Best Regards
> Donna li
>  
> From: Donna Li 
> Sent: 28 June 2011 10:59
> To: user@cassandra.apache.org
> Subject: remove all the columns of a key in a column family
>  
> All:
> Can I remove all the columns of a key in a column family without knowing 
> what columns the column family has?
>  
>  
> Best Regards
> Donna li
>  
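
aaron's advice in this thread (a Deletion with no predicate and no super_column removes the whole row, while a predicate naming columns removes only those) can be illustrated with a small in-memory stand-in; the `Deletion` record here is a hypothetical stand-in for the generated Thrift struct, not the real class:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class DeletionSketch {
    // Hypothetical stand-in for the Thrift Deletion: a null predicate
    // means "delete the entire row".
    public record Deletion(Set<String> predicateColumns) {}

    public static void apply(Map<String, byte[]> row, Deletion d) {
        if (d.predicateColumns() == null) {
            row.clear();                                   // no predicate: the whole row goes
        } else {
            row.keySet().removeAll(d.predicateColumns());  // only the named columns go
        }
    }

    public static void main(String[] args) {
        Map<String, byte[]> row = new TreeMap<>();
        row.put("a", new byte[0]);
        row.put("b", new byte[0]);
        apply(row, new Deletion(null));   // whole-row delete
        System.out.println(row.isEmpty()); // prints true
    }
}
```

This mirrors why Donna's attempt failed: she sent a predicate (which must then name columns or a SliceRange) instead of omitting the predicate entirely.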



Re: Truncate introspection

2011-06-28 Thread aaron morton
Drop CF takes a snapshot of the CF first, and then marks SSTables on disk as 
compacted so they will be safely deleted later. Finally it removes the CF from 
the meta data. 

If you see the SSTables on disk, you should see 0 length .compacted files for 
every one of them. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 20:00, David Boxenhorn wrote:

> Does drop work in a similar way?
> 
> When I drop a CF and add it back with a different schema, it seems to work.
> 
> But I notice that in between the drop and adding it back, when the CLI
> tells me the CF doesn't exist, the old data is still there.
> 
> I've been assuming that this works, but just wanted to make sure...
> 
> On Tue, Jun 28, 2011 at 12:56 AM, Jonathan Ellis  wrote:
>> Each node (independently) has logic that guarantees that any writes
>> processed before the truncate will be wiped out.
>> 
>> This does not mean that each node will wipe out the same data, or even
>> that each node will process the truncate (which would result in a
>> TimedOutException).
>> 
>> It also does not mean you can't have writes immediately after the
>> truncate that would race w/ a "truncate, check for zero sstables"
>> procedure.
>> 
>> On Mon, Jun 27, 2011 at 3:35 PM, Ethan Rowe  wrote:
>>> If those went to zero, it would certainly tell me something happened.  :)  I
>>> guess watching that would be a way of seeing something was going on.
>>> Is the truncate itself propagating a ring-wide marker or anything so the CF
>>> is logically "empty" before being physically removed?  That's the impression
>>> I got from the docs but it wasn't totally clear to me.
>>> 
>>> On Mon, Jun 27, 2011 at 3:33 PM, Jonathan Ellis  wrote:
 
 There's a JMX method to get the number of sstables in a CF, is that
 what you're looking for?
 
 On Mon, Jun 27, 2011 at 1:04 PM, Ethan Rowe  wrote:
> Is there any straightforward means of seeing what's going on after
> issuing a
> truncate (on 0.7.5)?  I'm not seeing evidence that anything actually
> happened.  I've disabled read repair on the column family in question
> and
> don't have anything actively reading/writing at present, apart from my
> one-off tests to see if rows have disappeared.
> Thanks in advance.
 
 
 
 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>> 



Re: Re : Re : get_range_slices result

2011-06-28 Thread aaron morton
First thing: you really should upgrade from 0.6; the current release is 0.8. 

Info on time UUIDs:
http://wiki.apache.org/cassandra/FAQ#working_with_timeuuid_in_java

If you are using a higher-level client like Hector or Pelops, it will take care 
of the encoding for you. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 28 Jun 2011, at 22:20, karim abbouh wrote:

> Can I have an example of using TimeUUIDType as a comparator in client
> Java code?
> 
> From: karim abbouh 
> To: "user@cassandra.apache.org" 
> Sent: Monday, 27 June 2011, 17:59
> Subject: Re : Re : get_range_slices result
> 
> i used TimeUUIDType as type in storage-conf.xml file
>  
> 
> and I used it as the comparator in my Java code,
> but at execution I get this exception: 
> Erreur --java.io.UnsupportedEncodingException: TimeUUIDType
> 
> 
> how can I write it?
> 
> BR
> 
> From: David Boxenhorn 
> To: user@cassandra.apache.org
> Cc: karim abbouh 
> Sent: Friday, 24 June 2011, 11:25
> Subject: Re: Re : get_range_slices result
> 
> You can get the best of both worlds by repeating the key in a column,
> and creating a secondary index on that column.
> 
> On Fri, Jun 24, 2011 at 1:16 PM, Sylvain Lebresne  
> wrote:
> > On Fri, Jun 24, 2011 at 10:21 AM, karim abbouh  wrote:
> >> I want the get_range_slices() function to return records sorted (ordered) by the
> >> key (row ID) used during insertion.
> >> Is it possible?
> >
> > You will have to use the OrderPreservingPartitioner. This is not
> > without inconvenience, however.
> > See for instance
> > http://wiki.apache.org/cassandra/StorageConfiguration#line-100 or
> > http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
> > that give more details on the pros and cons (the short version being
> > that the main advantage of
> > OrderPreservingPartitioner is what you're asking for, but its main
> > drawback is that load-balancing
> > the cluster will likely be very hard).
> >
> > In general the advice is to stick with RandomPartitioner and design a
> > data model that avoids needing
> > range slices (or at least needing that the result is sorted). This is
> > very often not too hard, more
> > efficient, and much simpler than dealing with the load balancing
> > problems of OrderPreservingPartitioner.
> >
> > --
> > Sylvain
> >
> >>
> >> 
> >> From: aaron morton 
> >> To: user@cassandra.apache.org
> >> Sent: Thursday, 23 June 2011, 20:30
> >> Subject: Re: get_range_slices result
> >>
> >> Not sure what your question is.
> >> Does this help ? http://wiki.apache.org/cassandra/FAQ#range_rp
> >> Cheers
> >> -
> >> Aaron Morton
> >> Freelance Cassandra Developer
> >> @aaronmorton
> >> http://www.thelastpickle.com
> >> On 23 Jun 2011, at 21:59, karim abbouh wrote:
> >>
> >> how can the get_range_slices() function return keys in sorted order?
> >> BR
> >>
> >>
> >>
> >>
> >
> 
> 
> 
> 
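
To give a feel for what TimeUUIDType compares on, here is how a time-based (version 1) UUID is laid out, built with nothing but the JDK. This is an illustrative sketch only (the clock-sequence and node fields are left zero, so it must not be used to generate unique IDs); use the library the FAQ recommends, or a high-level client, in real code:

```java
import java.util.UUID;

public class TimeUuidSketch {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
    public static final long UUID_EPOCH_OFFSET = 0x01b21dd213814000L;

    public static UUID fromMillis(long millis) {
        long time = millis * 10_000L + UUID_EPOCH_OFFSET; // 60-bit timestamp
        long msb = ((time & 0xFFFF_FFFFL) << 32)          // time_low
                 | (((time >>> 32) & 0xFFFFL) << 16)      // time_mid
                 | (0x1L << 12)                           // version 1 (time-based)
                 | ((time >>> 48) & 0x0FFFL);             // time_hi
        long lsb = 0x8000000000000000L;                   // IETF variant; clock-seq/node zeroed
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID u = fromMillis(System.currentTimeMillis());
        System.out.println(u + "  version=" + u.version());
    }
}
```

Because the timestamp sits in the high-order fields, TimeUUIDType can order columns chronologically; karim's UnsupportedEncodingException came from passing the string "TimeUUIDType" where column-name bytes (a serialized UUID like the one above) were expected.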



Re: Counter Column

2011-06-28 Thread Sylvain Lebresne
On Tue, Jun 28, 2011 at 12:53 PM, Donal Zang  wrote:
> On 27/06/2011 19:19, Sylvain Lebresne wrote:
>>
>> Let me make that simpler.
>>
>> Don't ever use replicate_on_write=false (even if you "think" that it is
>> what you want, there is a good chance it's not).
>> Obviously, the default is replicate_on_write=true.
>
> I may be wrong. But with 0.8.0, I think the default is
> replicate_on_write=false, you have to declare it explicitly.

No, after having checked, you are right: the default is false and it
is a bug (literally a bug; the default is true in the code, but it
doesn't get applied correctly and it ends up defaulting to false
purely because that is the default value of a boolean in Java --
https://issues.apache.org/jira/browse/CASSANDRA-2835). That bug will
be fixed in 0.8.2.

I sincerely apologize for that; you have to explicitly set
replicate_on_write to true for now. The rest stays true.

--
Sylvain

>
> --
> Donal Zang
> Computing Center, IHEP
> 19B YuquanLu, Shijingshan District,Beijing, 100049
> zan...@ihep.ac.cn
> 86 010 8823 6018
>
>
>


Re: Clock skew

2011-06-28 Thread Eric Evans
On Tue, 2011-06-28 at 11:54 +1200, aaron morton wrote:
> Without exception the timestamp is set by the client, not the server.
> The one exception to the without exception rule is CounterColumnType
> operations.

And CQL...

-- 
Eric Evans
eev...@rackspace.com



[RELEASE] Apache Cassandra 0.8.1 released

2011-06-28 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 0.8.1.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version fixes a number of bugs[1,3] and upgrading is highly encouraged. It
also ships with a few improvements that did not make it in time for 0.8.0.
Please see the release notes[2] for details, and more generally always pay
attention to the release notes before upgrading.

If you were to encounter any problem, please let us know[4].

Have fun!


[1]: http://goo.gl/qbvPB (CHANGES.txt)
[2]: http://goo.gl/7uQXl (NEWS.txt)
[3]: http://goo.gl/IbOQW (JIRA Release Notes)
[4]: https://issues.apache.org/jira/browse/CASSANDRA


custom reconciling columns?

2011-06-28 Thread Yang
For example, say I have an application that needs to read off a user's browsing
history, and I model the user ID as the key
and the history data within the row. With the current approach, I could model
each visit as a column.
The possible issue is that *possibly* (I'm still doing a lot of profiling on
this to verify) a lot of time is spent on serialization into and out of the
message; plus, I do not need the full features provided by the column: for
example, I do not need a timestamp on each visit, etc.
So it might be faster to put the entire history in a blob, where each visit
only takes up a few bytes, and
my code manipulates the blob.

The problem is, I still need to avoid read-before-write, so I would send only the
latest visit and let Cassandra do the reconcile, which appends the
visit to the blob - and that needs custom reconcile behavior.

is there a way to incorporate such custom reconcile under current code
framework? (I see custom sorting, but no custom reconcile)

thanks
yang
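
The column-per-visit model Yang describes first already avoids read-before-write, because each visit is an independent blind insert and the server merges columns by name. The idea can be sketched with a sorted map standing in for a row (illustrative stand-in only, no Cassandra API involved):

```java
import java.util.NavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class VisitRowSketch {
    // One "row" per user: column name = visit time (microseconds),
    // column value = visit payload. Each visit is written blindly,
    // with no prior read; merging concurrent writes is just a union
    // of column names, which is why no custom reconcile is needed.
    private final NavigableMap<Long, String> row = new ConcurrentSkipListMap<>();

    public void recordVisit(long timestampMicros, String page) {
        row.put(timestampMicros, page); // blind write, no read-before-write
    }

    public NavigableMap<Long, String> history() {
        return row;                     // columns come back time-ordered
    }

    public static void main(String[] args) {
        VisitRowSketch v = new VisitRowSketch();
        v.recordVisit(200L, "/checkout");
        v.recordVisit(100L, "/home");
        System.out.println(v.history()); // visits in time order
    }
}
```

The blob-with-custom-reconcile alternative trades this simple union merge for an append operation that, as discussed later in the thread, has the same delivery and idempotency problems counters have.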


Re: Error trying to move a node - 0.7

2011-06-28 Thread Ben Frank
Hey Aaron,
   I think you're right, I'm using version 0.7 and indeed the node I'm
trying to move is the only node in that data center - I'll steal some
hardware to add to the ring to confirm.

-Ben

On Sun, Jun 19, 2011 at 4:06 PM, aaron morton wrote:

> I *think* someone had a similar problem once before, moving a node that was
> the only node in a DC.
>
> What version are you using?
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17 Jun 2011, at 07:42, Ben Frank wrote:
>
> > Hi All,
> >   I'm getting the following error when trying to move a nodes token:
> >
> > nodetool -h 145.6.92.82 -p 18080 move
> 56713727820156410577229101238628035242
> > cassandra.in.sh executing for environment DEV1
> > Exception in thread "main" java.lang.AssertionError
> >at
> >
> org.apache.cassandra.locator.TokenMetadata.firstTokenIndex(TokenMetadata.java:393)
> >at
> >
> org.apache.cassandra.locator.TokenMetadata.ringIterator(TokenMetadata.java:418)
> >at
> >
> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:94)
> >at
> >
> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:807)
> >at
> >
> org.apache.cassandra.service.StorageService.calculatePendingRanges(StorageService.java:773)
> >at
> >
> org.apache.cassandra.service.StorageService.startLeaving(StorageService.java:1468)
> >at
> >
> org.apache.cassandra.service.StorageService.move(StorageService.java:1605)
> >at
> >
> org.apache.cassandra.service.StorageService.move(StorageService.java:1580)
> > .
> > .
> > .
> >
> > my ring looks like this:
> >
> > Address Status State   LoadOwnsToken
> >
> > 113427455640312821154458202477256070484
> > 145.6.99.80  Up Normal  1.63 GB 36.05%
> > 4629135223504085509237477504287125589
> > 145.6.92.82  Up Normal  2.86 GB 1.09%
> > 6479163079760931522618457053473150444
> > 145.6.99.81  Up Normal  2.01 GB 62.86%
> > 113427455640312821154458202477256070484
> >
> >
> > '80' and '81' are configured to be in the East coast data center and '82'
> is
> > in the West
> >
> > Anyone shed any light as to what might be going on here?
> >
> > -Ben
>
>


Re: Clock skew

2011-06-28 Thread Dominic Williams
Hi, yes you are correct, and this is a potential problem.

IMPORTANT: If you need to serialize writes from your application servers,
for example using distributed locking, then before releasing locks you must
sleep for a period equal to the maximum variance between the clocks on your
application server nodes.

I had a problem with the clocks on my nodes which led to all kinds of
problems. There is a slightly out-of-date post, which may not mention the
above point, on my experiences here:
http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/

Hope this helps
Dominic

On 27 June 2011 23:03, A J  wrote:

> During writes, the timestamp field in the column is the system-time of
> that node (correct me if that is not the case and the system-time of
> the co-ordinator is what gets applied to all the replicas).
> During reads, the latest write wins.
>
> What if there is a clock skew ? It could lead to a stale write
> over-riding the actual latest write, just because the clock of that
> node is ahead of the other node. Right ?
>
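
The failure mode A J describes follows directly from last-write-wins reconciliation: a replica keeps whichever cell carries the larger client timestamp, regardless of which write actually happened last. A minimal sketch (the `Cell` type is a hypothetical stand-in, not a Cassandra class):

```java
public class LwwSketch {
    // Hypothetical stand-in for a column: a value plus its client timestamp.
    public record Cell(String value, long timestampMicros) {}

    // Cassandra-style reconcile: the higher timestamp wins.
    public static Cell reconcile(Cell a, Cell b) {
        return a.timestampMicros() >= b.timestampMicros() ? a : b;
    }

    public static void main(String[] args) {
        // A client with a fast clock stamps a logically *older* write "later".
        Cell staleButFastClock = new Cell("stale", 2_000_000L);
        Cell freshButSlowClock = new Cell("fresh", 1_000_000L);
        System.out.println(reconcile(staleButFastClock, freshButSlowClock).value()); // prints stale
    }
}
```

This is why Dominic's sleep-before-releasing-the-lock rule works: waiting out the maximum clock variance guarantees the next lock holder's timestamps exceed the previous holder's, so the stale-wins case above cannot occur between serialized writers.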


Query indexed column with key filter‏

2011-06-28 Thread Daning

I found this code

// Start and finish keys, *and* column relations (KEY > foo AND KEY < bar and name1 = value1).
if (select.isKeyRange() && (select.getKeyFinish() != null) && (select.getColumnRelations().size() > 0))
    throw new InvalidRequestException("You cannot combine key range and by-column clauses in a SELECT");

in

http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/cql/QueryProcessor.java


This operation is exactly what I want - query by column, then filter by 
key. I want to know why this query is not supported, and what's a good 
workaround for it? At the moment my workaround is to create a column 
which is exactly the same as the key.


Thanks,

Daning


Re: Clock skew

2011-06-28 Thread A J
Thanks.

On Tue, Jun 28, 2011 at 1:31 PM, Dominic Williams
 wrote:
> Hi, yes you are correct, and this is a potential problem.
> IMPORTANT: If you need to serialize writes from your application servers,
> for example using distributed locking, then before releasing locks you must
> sleep for a period equal to the maximum variance between the clocks on your
> application server nodes.
> I had a problem with the clocks on my nodes which led to all kinds of
> problems. There is a slightly out-of-date post, which may not mention the
> above point, on my experiences
> here http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/
> Hope this helps
> Dominic
> On 27 June 2011 23:03, A J  wrote:
>>
>> During writes, the timestamp field in the column is the system-time of
>> that node (correct me if that is not the case and the system-time of
>> the co-ordinator is what gets applied to all the replicas).
>> During reads, the latest write wins.
>>
>> What if there is a clock skew ? It could lead to a stale write
>> over-riding the actual latest write, just because the clock of that
>> node is ahead of the other node. Right ?
>
>


Problem with PHPCassa accessing Indexes

2011-06-28 Thread Jean-Nicolas Boulay Desjardins
Hi,

I am having problem accessing data via an index with PHPCassa. I have
var_dump() the results:

array(6) { ["birthdate"]=> int(3546927995491989807) ["email"]=>
string(20) "jnbdzjn...@gmail.com" ["firstname"]=> string(12)
"Jean-Nicolas" ["lastname"]=> string(17) "Boulay Desjardins"
["password"]=> string(8) "password" ["username"]=> string(5) "jnbdz" }

object(cassandra_IndexExpression)#76 (3) { ["column_name"]=> string(5)
"email" ["op"]=> int(0) ["value"]=> string(20) "jnbdzjn...@gmail.com"
}

object(cassandra_IndexClause)#77 (3) { ["expressions"]=> array(1) {
[0]=> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=>
string(5) "email" ["op"]=> int(0) ["value"]=> string(20)
"jnbdzjn...@gmail.com" } } ["start_key"]=> string(0) "" ["count"]=>
int(100) }

Here is the code:

$columnFamily = CASSANDRA::selectColumnFamily('Users');

    $this->selectUser = $columnFamily->get('jnbdz');

    var_dump($this->selectUser);

    echo '';
    echo '';

    $index_exp =
CassandraUtil::create_index_expression('email',
'jnbdzjn...@gmail.com');
var_dump($index_exp);
    $index_clause =
CassandraUtil::create_index_clause(array($index_exp));
echo '';
echo '';
var_dump($index_clause);
    $rows = $column_family->get_indexed_slices($index_clause);
echo '';
echo '';
var_dump($rows);
    var_dump($row);

Thanks in advance for any help


Re: Problem with PHPCassa accessing Indexes

2011-06-28 Thread Tyler Hobbs
The result of get_indexed_slices() is an Iterator object, not an
array.  It doesn't look like you're treating it accordingly.

See the bottom of this section for an example:
http://thobbs.github.com/phpcassa/tutorial.html#indexes

On Tue, Jun 28, 2011 at 2:06 PM, Jean-Nicolas Boulay Desjardins
 wrote:
> Hi,
>
> I am having problem accessing data via an index with PHPCassa. I have
> var_dump() the results:
>
> array(6) { ["birthdate"]=> int(3546927995491989807) ["email"]=>
> string(20) "jnbdzjn...@gmail.com" ["firstname"]=> string(12)
> "Jean-Nicolas" ["lastname"]=> string(17) "Boulay Desjardins"
> ["password"]=> string(8) "password" ["username"]=> string(5) "jnbdz" }
>
> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=> string(5)
> "email" ["op"]=> int(0) ["value"]=> string(20) "jnbdzjn...@gmail.com"
> }
>
> object(cassandra_IndexClause)#77 (3) { ["expressions"]=> array(1) {
> [0]=> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=>
> string(5) "email" ["op"]=> int(0) ["value"]=> string(20)
> "jnbdzjn...@gmail.com" } } ["start_key"]=> string(0) "" ["count"]=>
> int(100) }
>
> Here is the code:
>
> $columnFamily = CASSANDRA::selectColumnFamily('Users');
>
>     $this->selectUser = $columnFamily->get('jnbdz');
>
>     var_dump($this->selectUser);
>
>     echo '';
>     echo '';
>
>     $index_exp =
> CassandraUtil::create_index_expression('email',
> 'jnbdzjn...@gmail.com');
> var_dump($index_exp);
>     $index_clause =
> CassandraUtil::create_index_clause(array($index_exp));
> echo '';
> echo '';
> var_dump($index_clause);
>     $rows = $column_family->get_indexed_slices($index_clause);
> echo '';
> echo '';
> var_dump($rows);
>     var_dump($row);
>
> Thanks in advance for any help
>



-- 
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library


Re: Problem with PHPCassa accessing Indexes

2011-06-28 Thread Jean-Nicolas Boulay Desjardins
Actually I am not getting any results from: get_indexed_slices()

It seems my code dies at: $rows =
$column_family->get_indexed_slices($index_clause);

Because everything echoed after that is not shown on the page.

Plus I don't get any errors.

Any ideas?

On Tue, Jun 28, 2011 at 3:23 PM, Tyler Hobbs  wrote:
> The result of get_indexed_slices() is an Iterator object, not an
> array.  It doesn't look like you're treating it accordingly.
>
> See the bottom of this section for an example:
> http://thobbs.github.com/phpcassa/tutorial.html#indexes
>
> On Tue, Jun 28, 2011 at 2:06 PM, Jean-Nicolas Boulay Desjardins
>  wrote:
>> Hi,
>>
>> I am having problem accessing data via an index with PHPCassa. I have
>> var_dump() the results:
>>
>> array(6) { ["birthdate"]=> int(3546927995491989807) ["email"]=>
>> string(20) "jnbdzjn...@gmail.com" ["firstname"]=> string(12)
>> "Jean-Nicolas" ["lastname"]=> string(17) "Boulay Desjardins"
>> ["password"]=> string(8) "password" ["username"]=> string(5) "jnbdz" }
>>
>> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=> string(5)
>> "email" ["op"]=> int(0) ["value"]=> string(20) "jnbdzjn...@gmail.com"
>> }
>>
>> object(cassandra_IndexClause)#77 (3) { ["expressions"]=> array(1) {
>> [0]=> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=>
>> string(5) "email" ["op"]=> int(0) ["value"]=> string(20)
>> "jnbdzjn...@gmail.com" } } ["start_key"]=> string(0) "" ["count"]=>
>> int(100) }
>>
>> Here is the code:
>>
>> $columnFamily = CASSANDRA::selectColumnFamily('Users');
>>
>>     $this->selectUser = $columnFamily->get('jnbdz');
>>
>>     var_dump($this->selectUser);
>>
>>     echo '';
>>     echo '';
>>
>>     $index_exp =
>> CassandraUtil::create_index_expression('email',
>> 'jnbdzjn...@gmail.com');
>> var_dump($index_exp);
>>     $index_clause =
>> CassandraUtil::create_index_clause(array($index_exp));
>> echo '';
>> echo '';
>> var_dump($index_clause);
>>     $rows = $column_family->get_indexed_slices($index_clause);
>> echo '';
>> echo '';
>> var_dump($rows);
>>     var_dump($row);
>>
>> Thanks in advance for any help
>>
>
>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
>



-- 
Name / Nom: Boulay Desjardins, Jean-Nicolas
Website / Site Web: www.jeannicolas.com


Server-side CQL parameters substitution

2011-06-28 Thread Michal Augustýn
Hi all,

In most SQL implementations, it's possible to declare parameters in the
SQL command text (e.g. "SELECT * FROM T WHERE Id=@myId"). The
client application then sends the SQL command and the parameter values
separately - the server is responsible for the parameter
substitution.

In the CQL API (~the "execute_cql_query" method), we must compose the
command (~substitute the parameters) in the client application, so the same
code must be re-implemented in every driver (Java, Python, Node.js,
.NET, ...). And that's IMHO tedious and error-prone.

So do you/we plan to improve the CQL API in this way?

Thanks!

Augi

P.S.: Yes, I'm working on a .NET driver and I'm too lazy to implement
client-side parameter substitution ;-)
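
Until the server does the substitution, each driver has to splice values into the CQL text itself. The core of that, for string values, is quoting plus doubling embedded single quotes (CQL's escape rule for string literals). A hedged sketch with a hypothetical helper, not any driver's actual API; note the naive @name replacement would need real tokenizing in a driver (e.g. @my vs @myId, @ inside literals):

```java
import java.util.Map;

public class CqlBindSketch {
    // Escape a CQL string literal: wrap in single quotes and double
    // any embedded single quote.
    public static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // Replace each @name token with its quoted value.
    // Naive: a real driver would tokenize the statement instead.
    public static String bind(String cql, Map<String, String> params) {
        String out = cql;
        for (Map.Entry<String, String> e : params.entrySet()) {
            out = out.replace("@" + e.getKey(), quote(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bind("SELECT * FROM T WHERE Id=@myId",
                                Map.of("myId", "o'brien")));
        // SELECT * FROM T WHERE Id='o''brien'
    }
}
```

Server-side binding would remove exactly this kind of per-driver string handling, which is the point of Michal's request.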


Re: Clock skew

2011-06-28 Thread AJ
Yikes!  I just read your blog Dominic.  Now I'm worried since my app was 
going to be mostly cloud-based.  But, you didn't mention anything about 
sleeping for 'max clock variance' after making the ntp-related config 
changes (maybe you haven't had the time to blog it).


I'm curious, do you think the sleep is required even in a 
non-virtualized environment?  Is it only needed when implementing some 
kind of lock?  Does the type of lock make a difference?


Thanks!
aj (the other one)

On 6/28/2011 11:31 AM, Dominic Williams wrote:

Hi, yes you are correct, and this is a potential problem.

IMPORTANT: If you need to serialize writes from your application 
servers, for example using distributed locking, then before releasing 
locks you must sleep for a period equal to the maximum variance 
between the clocks on your application server nodes.


I had a problem with the clocks on my nodes which led to all kinds of 
problems. There is a slightly out-of-date post, which may not 
mention the above point, on my experiences here 
http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/


Hope this helps
Dominic

On 27 June 2011 23:03, A J  wrote:


During writes, the timestamp field in the column is the system-time of
that node (correct me if that is not the case and the system-time of
the co-ordinator is what gets applied to all the replicas).
During reads, the latest write wins.

What if there is a clock skew ? It could lead to a stale write
over-riding the actual latest write, just because the clock of that
node is ahead of the other node. Right ?






Re: Problem with PHPCassa accessing Indexes

2011-06-28 Thread Tyler Hobbs
What does the output of 'describe keyspace ' show for the
keyspace the CF is in?

On Tue, Jun 28, 2011 at 2:35 PM, Jean-Nicolas Boulay Desjardins
 wrote:
> Actually I am not getting any results from: get_indexed_slices()
>
> It seems my code dies at: $rows =
> $column_family->get_indexed_slices($index_clause);
>
> Because everything after that is echo is not shown on the page.
>
> Plus I don't get any errors.
>
> Any ideas?
>
> On Tue, Jun 28, 2011 at 3:23 PM, Tyler Hobbs  wrote:
>> The result of get_indexed_slices() is an Iterator object, not an
>> array.  It doesn't look like you're treating it accordingly.
>>
>> See the bottom of this section for an example:
>> http://thobbs.github.com/phpcassa/tutorial.html#indexes
>>
>> On Tue, Jun 28, 2011 at 2:06 PM, Jean-Nicolas Boulay Desjardins
>>  wrote:
>>> Hi,
>>>
>>> I am having problem accessing data via an index with PHPCassa. I have
>>> var_dump() the results:
>>>
>>> array(6) { ["birthdate"]=> int(3546927995491989807) ["email"]=>
>>> string(20) "jnbdzjn...@gmail.com" ["firstname"]=> string(12)
>>> "Jean-Nicolas" ["lastname"]=> string(17) "Boulay Desjardins"
>>> ["password"]=> string(8) "password" ["username"]=> string(5) "jnbdz" }
>>>
>>> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=> string(5)
>>> "email" ["op"]=> int(0) ["value"]=> string(20) "jnbdzjn...@gmail.com"
>>> }
>>>
>>> object(cassandra_IndexClause)#77 (3) { ["expressions"]=> array(1) {
>>> [0]=> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=>
>>> string(5) "email" ["op"]=> int(0) ["value"]=> string(20)
>>> "jnbdzjn...@gmail.com" } } ["start_key"]=> string(0) "" ["count"]=>
>>> int(100) }
>>>
>>> Here is the code:
>>>
>>> $columnFamily = CASSANDRA::selectColumnFamily('Users');
>>>
>>>     $this->selectUser = $columnFamily->get('jnbdz');
>>>
>>>     var_dump($this->selectUser);
>>>
>>>     echo '';
>>>     echo '';
>>>
>>>     $index_exp =
>>> CassandraUtil::create_index_expression('email',
>>> 'jnbdzjn...@gmail.com');
>>> var_dump($index_exp);
>>>     $index_clause =
>>> CassandraUtil::create_index_clause(array($index_exp));
>>> echo '';
>>> echo '';
>>> var_dump($index_clause);
>>>     $rows = $column_family->get_indexed_slices($index_clause);
>>> echo '';
>>> echo '';
>>> var_dump($rows);
>>>     var_dump($row);
>>>
>>> Thanks in advance for any help
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> Software Engineer, DataStax
>> Maintainer of the pycassa Cassandra Python client library
>>
>
>
>
> --
> Name / Nom: Boulay Desjardins, Jean-Nicolas
> Website / Site Web: www.jeannicolas.com
>



-- 
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library


Re: custom reconciling columns?

2011-06-28 Thread aaron morton
There is no facility to do custom reconciliation for a column. An append-style 
operation would run into many of the same problems as the Counter type, e.g. 
not every node may get an append and there is a chance for lost appends unless 
you go to all the trouble Counters do. 
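A toy illustration of the lost-append problem under timestamp-based (last-write-wins) reconciliation; the data and function names are hypothetical, not Cassandra code:

```python
# Toy model: why append-style reconciliation is unsafe under
# Cassandra's last-write-wins column merge.

def lww_reconcile(a, b):
    """Cassandra-style reconcile: keep the column with the newer timestamp.
    Columns are modeled as (timestamp, value) tuples."""
    return a if a[0] >= b[0] else b

# Two replicas each accept a different append to the same column.
replica1 = (100, "history:visit-A")   # this replica saw only visit A
replica2 = (101, "history:visit-B")   # this replica saw only visit B

# On read repair, last-write-wins keeps one blob and silently drops the other.
merged = lww_reconcile(replica1, replica2)
# visit-A is lost; a true append-merge would have to combine both writes,
# which is the distributed-coordination problem Counters go to such trouble for.
```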

I would go with using a row for the user and columns for each item. Then you 
can have fast, no-look writes. 

What problems are you seeing with the reads ?

Cheers


-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 04:20, Yang wrote:

> for example, if I have an application that needs to read off a user browsing 
> history, and I model the user ID as the key,
> and the history data within the row. with current approach, I could model 
> each visit as  a column, 
> the possible issue is that *possibly* (I'm still doing a lot of profiling on 
> this to verify) that a lot of time is spent on serialization into the message 
> and out of the
> message, plus I do not need the full features provided by the column : for 
> example I do not need a timestamp on each visit, etc,
> so it might be faster to put the entire history in a blob, and each visit 
> only takes up a few bytes in the blob, and 
> my code manipulates the blob.
> 
> problem is, I still need to avoid the read-before-write, so I send only the 
> latest visit, and let cassandra do the reconcile, which appends the
> visit to the blob, so this needs custom reconcile behavior.  
> 
> is there a way to incorporate such custom reconcile under current code 
> framework? (I see custom sorting, but no custom reconcile)
> 
> thanks
> yang



Re: Problem with PHPCassa accessing Indexes

2011-06-28 Thread Jean-Nicolas Boulay Desjardins
Sorry, my mistake. The variable name was wrong. Weird, I did not get any errors.

Thanks anyways.

But I do have another question. When looking in cassandra-cli I did
"get Users[jnbdz];" and I got:

A long is exactly 8 bytes: 10

And I don't get the data.

Am I missing something?

Thanks in advance.
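The CLI error above most likely comes from a column comparator/validator that requires a serialized long to be exactly 8 bytes; the check below is a simplified Python sketch of that behavior (an assumption about the cause, not Cassandra's actual code):

```python
import struct

def validate_long(raw):
    """LongType-style check: a serialized long must be exactly 8 bytes,
    big-endian. Anything else is rejected with the length it saw."""
    if len(raw) != 8:
        raise ValueError("A long is exactly 8 bytes: %d" % len(raw))
    return struct.unpack(">q", raw)[0]

validate_long(struct.pack(">q", 42))   # fine: exactly 8 bytes
# validate_long(b"ten bytes!")         # raises: "A long is exactly 8 bytes: 10"
```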

On Tue, Jun 28, 2011 at 6:00 PM, Tyler Hobbs  wrote:
> What does the output of 'describe keyspace ' show for the
> keyspace the CF is in?
>
> On Tue, Jun 28, 2011 at 2:35 PM, Jean-Nicolas Boulay Desjardins
>  wrote:
>> Actually I am not getting any results from: get_indexed_slices()
>>
>> It seems my code dies at: $rows =
>> $column_family->get_indexed_slices($index_clause);
>>
>> Because everything after that is echo is not shown on the page.
>>
>> Plus I don't get any errors.
>>
>> Any ideas?
>>
>> On Tue, Jun 28, 2011 at 3:23 PM, Tyler Hobbs  wrote:
>>> The result of get_indexed_slices() is an Iterator object, not an
>>> array.  It doesn't look like you're treating it accordingly.
>>>
>>> See the bottom of this section for an example:
>>> http://thobbs.github.com/phpcassa/tutorial.html#indexes
>>>
>>> On Tue, Jun 28, 2011 at 2:06 PM, Jean-Nicolas Boulay Desjardins
>>>  wrote:
 Hi,

 I am having problem accessing data via an index with PHPCassa. I have
 var_dump() the results:

 array(6) { ["birthdate"]=> int(3546927995491989807) ["email"]=>
 string(20) "jnbdzjn...@gmail.com" ["firstname"]=> string(12)
 "Jean-Nicolas" ["lastname"]=> string(17) "Boulay Desjardins"
 ["password"]=> string(8) "password" ["username"]=> string(5) "jnbdz" }

 object(cassandra_IndexExpression)#76 (3) { ["column_name"]=> string(5)
 "email" ["op"]=> int(0) ["value"]=> string(20) "jnbdzjn...@gmail.com"
 }

 object(cassandra_IndexClause)#77 (3) { ["expressions"]=> array(1) {
 [0]=> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=>
 string(5) "email" ["op"]=> int(0) ["value"]=> string(20)
 "jnbdzjn...@gmail.com" } } ["start_key"]=> string(0) "" ["count"]=>
 int(100) }

 Here is the code:

 $columnFamily = CASSANDRA::selectColumnFamily('Users');

     $this->selectUser = $columnFamily->get('jnbdz');

     var_dump($this->selectUser);

     echo '';
     echo '';

     $index_exp =
 CassandraUtil::create_index_expression('email',
 'jnbdzjn...@gmail.com');
 var_dump($index_exp);
     $index_clause =
 CassandraUtil::create_index_clause(array($index_exp));
 echo '';
 echo '';
 var_dump($index_clause);
     $rows = $column_family->get_indexed_slices($index_clause);
 echo '';
 echo '';
 var_dump($rows);
     var_dump($row);

 Thanks in advance for any help

>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> Software Engineer, DataStax
>>> Maintainer of the pycassa Cassandra Python client library
>>>
>>
>>
>>
>> --
>> Name / Nom: Boulay Desjardins, Jean-Nicolas
>> Website / Site Web: www.jeannicolas.com
>>
>
>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
>



-- 
Name / Nom: Boulay Desjardins, Jean-Nicolas
Website / Site Web: www.jeannicolas.com


Re: custom reconciling columns?

2011-06-28 Thread Yang
I can see that as my user history grows, the read time grows proportionally (or
faster than linearly).
If my business requirements ask me to keep a month's history for each user,
it could become too slow. I was suspecting that it's actually the
serializing and deserializing that's taking the time (I can definitely tell it's
CPU bound).



On Tue, Jun 28, 2011 at 3:04 PM, aaron morton wrote:

> There is no facility to do custom reconciliation for a column. An append
> style operation would run into many of the same problems as the Counter
> type, e.g. not every node may get an append and there is a chance for lost
> appends unless you go to all the trouble Counter's do.
>
> I would go with using a row for the user and columns for each item. Then
> you can have fast no look writes.
>
> What problems are you seeing with the reads ?
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29 Jun 2011, at 04:20, Yang wrote:
>
> > for example, if I have an application that needs to read off a user
> browsing history, and I model the user ID as the key,
> > and the history data within the row. with current approach, I could model
> each visit as  a column,
> > the possible issue is that *possibly* (I'm still doing a lot of profiling
> on this to verify) that a lot of time is spent on serialization into the
> message and out of the
> > message, plus I do not need the full features provided by the column :
> for example I do not need a timestamp on each visit, etc,
> > so it might be faster to put the entire history in a blob, and each visit
> only takes up a few bytes in the blob, and
> > my code manipulates the blob.
> >
> > problem is, I still need to avoid the read-before-write, so I send only
> the latest visit, and let cassandra do the reconcile, which appends the
> > visit to the blob, so this needs custom reconcile behavior.
> >
> > is there a way to incorporate such custom reconcile under current code
> framework? (I see custom sorting, but no custom reconcile)
> >
> > thanks
> > yang
>
>


Re: Query indexed column with key filter

2011-06-28 Thread aaron morton
Currently these are two different types of query: using a key range is 
equivalent to the get_range_slices() API function, and column clauses are a 
get_indexed_slices() call. So you would be asking for a potentially painful 
join between the two.

Creating a column with the same value as the key sounds reasonable. 
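Pending such support, the workaround amounts to doing the join client-side: run the indexed query, then filter the returned keys yourself. A minimal sketch with hypothetical row data (not the phpcassa/Thrift API):

```python
def filter_by_key_range(indexed_rows, start_key, end_key):
    """Client-side 'join': apply a key-range predicate (KEY > foo AND
    KEY < bar) to rows already returned by an indexed query."""
    return {key: cols for key, cols in indexed_rows.items()
            if start_key < key < end_key}

# Hypothetical result of an indexed query on name1 = value1:
indexed_result = {
    "alpha": {"name1": "value1"},
    "delta": {"name1": "value1"},
    "omega": {"name1": "value1"},
}

# Keep only the rows whose key falls between "b" and "p".
matching = filter_by_key_range(indexed_result, "b", "p")
```

Note this pulls every indexed match over the wire before filtering, which is exactly the "potentially painful" part.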

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 05:31, Daning wrote:

> I found this code
> 
> // Start and finish keys, *and* column relations (KEY > foo AND KEY < bar and name1 = value1).
> if (select.isKeyRange() && (select.getKeyFinish() != null) && (select.getColumnRelations().size() > 0))
>     throw new InvalidRequestException("You cannot combine key range and by-column clauses in a SELECT");
> 
> in
> 
> http://svn.apache.org/repos/asf/cassandra/trunk/src/java/org/apache/cassandra/cql/QueryProcessor.java
> 
> 
> This operation is exactly what I want - query by column then filter by key. I 
> want to know why this query is not supported, and what's a good workaround 
> for it? At the moment my workaround is to create a column which is exactly 
> the same as the key.
> 
> Thanks,
> 
> Daning



Re: Server-side CQL parameters substitution

2011-06-28 Thread aaron morton
see https://issues.apache.org/jira/browse/CASSANDRA-2475

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 08:45, Michal Augustýn wrote:

> Hi all,
> 
> in most SQL implementations, it's possible to declare parameters in
> SQL command text (i.e. "SELECT * FROM T WHERE Id=@myId"). Then the
> client application sends this SQL command and parameters values
> separately - the server is responsible for the parameters
> substitution.
> 
> In the CQL API (i.e. the "execute_cql_query" method), we must compose the
> command (substitute the parameters) in the client application, and the same
> code must be re-implemented in all drivers (Java, Python, Node.js,
> .NET, ...) respectively. And that's IMHO tedious and error-prone.
> 
> So do you/we plan to improve the CQL API in this way?
> 
> Thanks!
> 
> Augi
> 
> P.S.: Yes, I'm working on .NET driver and I'm too lazy to implement
> client-side parameters substitution ;-)



Re: custom reconciling columns?

2011-06-28 Thread aaron morton
Can you provide some more info:

- how big are the rows, e.g. number of columns and column size  ? 
- how much data are you asking for ? 
- what sort of read query are you using ? 
- what sort of numbers are you seeing ?
- are you deleting columns or using TTL ? 

I would consider issues with the data churn, data model and query before 
looking at serialisation. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 29 Jun 2011, at 10:37, Yang wrote:

> I can see that as my user history grows, the reads time proportionally ( or 
> faster than linear) grows.
> if my business requirements ask me to keep a month's history for each user, 
> it could become too slow.- I was suspecting that it's actually the 
> serializing and deserializing that's taking time (I can definitely it's cpu 
> bound)
> 
> 
> 
> On Tue, Jun 28, 2011 at 3:04 PM, aaron morton  wrote:
> There is no facility to do custom reconciliation for a column. An append 
> style operation would run into many of the same problems as the Counter type, 
> e.g. not every node may get an append and there is a chance for lost appends 
> unless you go to all the trouble Counter's do.
> 
> I would go with using a row for the user and columns for each item. Then you 
> can have fast no look writes.
> 
> What problems are you seeing with the reads ?
> 
> Cheers
> 
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29 Jun 2011, at 04:20, Yang wrote:
> 
> > for example, if I have an application that needs to read off a user 
> > browsing history, and I model the user ID as the key,
> > and the history data within the row. with current approach, I could model 
> > each visit as  a column,
> > the possible issue is that *possibly* (I'm still doing a lot of profiling 
> > on this to verify) that a lot of time is spent on serialization into the 
> > message and out of the
> > message, plus I do not need the full features provided by the column : for 
> > example I do not need a timestamp on each visit, etc,
> > so it might be faster to put the entire history in a blob, and each visit 
> > only takes up a few bytes in the blob, and
> > my code manipulates the blob.
> >
> > problem is, I still need to avoid the read-before-write, so I send only the 
> > latest visit, and let cassandra do the reconcile, which appends the
> > visit to the blob, so this needs custom reconcile behavior.
> >
> > is there a way to incorporate such custom reconcile under current code 
> > framework? (I see custom sorting, but no custom reconcile)
> >
> > thanks
> > yang
> 
> 



Re: custom reconciling columns?

2011-06-28 Thread Nate McCall
I agree with Aaron's suggestion on data model and query here. Since
there is a time component, you can split the row on a fixed duration
for a given user, so the row key would become userId_[timestamp
rounded to day].

This provides you an easy way to roll up the information for the date
ranges you need since the key suffix can be created without a read.
This also benefits from spreading the read load over the cluster
instead of just the replicas since you have 30 rows in this case
instead of one.
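The day-bucketing scheme above can be sketched as follows; the user id, dates, and key format are illustrative assumptions:

```python
from datetime import date, timedelta

def day_bucket_key(user_id, day):
    """Row key of the form userId_[timestamp rounded to day]."""
    return "%s_%s" % (user_id, day.isoformat())

def keys_for_range(user_id, start, end):
    """All row keys needed to cover [start, end] inclusive.
    Computable purely from the dates -- no read required."""
    days = (end - start).days + 1
    return [day_bucket_key(user_id, start + timedelta(n)) for n in range(days)]

# One month of history = 30 rows, spreading reads across the cluster.
keys = keys_for_range("jnbdz", date(2011, 6, 1), date(2011, 6, 30))
```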

On Tue, Jun 28, 2011 at 5:55 PM, aaron morton  wrote:
> Can you provide some more info:
> - how big are the rows, e.g. number of columns and column size  ?
> - how much data are you asking for ?
> - what sort of read query are you using ?
> - what sort of numbers are you seeing ?
> - are you deleting columns or using TTL ?
> I would consider issues with the data churn, data model and query before
> looking at serialisation.
> Cheers
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 29 Jun 2011, at 10:37, Yang wrote:
>
> I can see that as my user history grows, the reads time proportionally ( or
> faster than linear) grows.
> if my business requirements ask me to keep a month's history for each user,
> it could become too slow.- I was suspecting that it's actually the
> serializing and deserializing that's taking time (I can definitely it's cpu
> bound)
>
>
> On Tue, Jun 28, 2011 at 3:04 PM, aaron morton 
> wrote:
>>
>> There is no facility to do custom reconciliation for a column. An append
>> style operation would run into many of the same problems as the Counter
>> type, e.g. not every node may get an append and there is a chance for lost
>> appends unless you go to all the trouble Counter's do.
>>
>> I would go with using a row for the user and columns for each item. Then
>> you can have fast no look writes.
>>
>> What problems are you seeing with the reads ?
>>
>> Cheers
>>
>>
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 29 Jun 2011, at 04:20, Yang wrote:
>>
>> > for example, if I have an application that needs to read off a user
>> > browsing history, and I model the user ID as the key,
>> > and the history data within the row. with current approach, I could
>> > model each visit as  a column,
>> > the possible issue is that *possibly* (I'm still doing a lot of
>> > profiling on this to verify) that a lot of time is spent on serialization
>> > into the message and out of the
>> > message, plus I do not need the full features provided by the column :
>> > for example I do not need a timestamp on each visit, etc,
>> > so it might be faster to put the entire history in a blob, and each
>> > visit only takes up a few bytes in the blob, and
>> > my code manipulates the blob.
>> >
>> > problem is, I still need to avoid the read-before-write, so I send only
>> > the latest visit, and let cassandra do the reconcile, which appends the
>> > visit to the blob, so this needs custom reconcile behavior.
>> >
>> > is there a way to incorporate such custom reconcile under current code
>> > framework? (I see custom sorting, but no custom reconcile)
>> >
>> > thanks
>> > yang
>>
>
>
>


8.0.1 Released - Debian Package ETA?

2011-06-28 Thread Oleg Tsvinev
Hi,

First of all, thank you for releasing v0.8.1 and congrats! The list of fixes
and improvements is impressive.
Is there any ETA for Debian package? Is there a (standard) way to build it
from sources?

Thank you,
  Oleg


Re: 8.0.1 Released - Debian Package ETA?

2011-06-28 Thread Dan Kuebrich
0.8.1 should be up--I've already installed it.  Here's directions:
http://wiki.apache.org/cassandra/DebianPackaging

On Tue, Jun 28, 2011 at 8:24 PM, Oleg Tsvinev wrote:

> Hi,
>
> First of all, thank you for releasing v8.0.1 and congrats! the list of
> fixes and improvements is impressive.
> Is there any ETA for Debian package? Is there a (standard) way to build it
> from sources?
>
> Thank you,
>   Oleg
>


Re: 8.0.1 Released - Debian Package ETA?

2011-06-28 Thread Oleg Tsvinev
Thank you Dan! But I only see 0.8.0 there :(

On Tue, Jun 28, 2011 at 5:35 PM, Dan Kuebrich wrote:

> 0.8.1 should be up--I've already installed it.  Here's directions:
> http://wiki.apache.org/cassandra/DebianPackaging
>
>
> On Tue, Jun 28, 2011 at 8:24 PM, Oleg Tsvinev wrote:
>
>> Hi,
>>
>> First of all, thank you for releasing v8.0.1 and congrats! the list of
>> fixes and improvements is impressive.
>> Is there any ETA for Debian package? Is there a (standard) way to build it
>> from sources?
>>
>> Thank you,
>>   Oleg
>>
>
>


Re: 8.0.1 Released - Debian Package ETA?

2011-06-28 Thread Dan Kuebrich
Try running   apt-get update   (as opposed to upgrade) to pull down the
latest listings from the repo.

On Tue, Jun 28, 2011 at 8:40 PM, Oleg Tsvinev wrote:

> Thank you Dan! But I only see 0.8.0 there :(
>
>
> On Tue, Jun 28, 2011 at 5:35 PM, Dan Kuebrich wrote:
>
>> 0.8.1 should be up--I've already installed it.  Here's directions:
>> http://wiki.apache.org/cassandra/DebianPackaging
>>
>>
>> On Tue, Jun 28, 2011 at 8:24 PM, Oleg Tsvinev wrote:
>>
>>> Hi,
>>>
>>> First of all, thank you for releasing v8.0.1 and congrats! the list of
>>> fixes and improvements is impressive.
>>> Is there any ETA for Debian package? Is there a (standard) way to build
>>> it from sources?
>>>
>>> Thank you,
>>>   Oleg
>>>
>>
>>
>


Re: 8.0.1 Released - Debian Package ETA?

2011-06-28 Thread Oleg Tsvinev
Nope, only see 0.8.0. I updated sources in Synaptic Package manager, which
does the same as apt-get update.

On Tue, Jun 28, 2011 at 5:44 PM, Dan Kuebrich wrote:

> Try running   apt-get update   (as opposed to upgrade) to pull down the
> latest listings from the repo.
>
>
> On Tue, Jun 28, 2011 at 8:40 PM, Oleg Tsvinev wrote:
>
>> Thank you Dan! But I only see 0.8.0 there :(
>>
>>
>> On Tue, Jun 28, 2011 at 5:35 PM, Dan Kuebrich wrote:
>>
>>> 0.8.1 should be up--I've already installed it.  Here's directions:
>>> http://wiki.apache.org/cassandra/DebianPackaging
>>>
>>>
>>> On Tue, Jun 28, 2011 at 8:24 PM, Oleg Tsvinev wrote:
>>>
 Hi,

 First of all, thank you for releasing v8.0.1 and congrats! the list of
 fixes and improvements is impressive.
 Is there any ETA for Debian package? Is there a (standard) way to build
 it from sources?

 Thank you,
   Oleg

>>>
>>>
>>
>


Re: RAID or no RAID

2011-06-28 Thread mcasandra

aaron morton wrote:
> 
>> Not sure what the intended purpose is, but we've mostly used it as an
>> emergency disk-capacity-increase option
> 
> Thats what I've used it for.  
> 
> Cheers
> 

How does compaction work in terms of utilizing multiple data dirs? Also, is
there a reference on wiki somewhere that says not to use multiple data dirs?


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/RAID-or-no-RAID-tp6522904p6527219.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Problem with PHPCassa accessing Indexes

2011-06-28 Thread Tyler Hobbs
The output of 'describe keyspace' would be useful for figuring this
out, as well.

On Tue, Jun 28, 2011 at 5:10 PM, Jean-Nicolas Boulay Desjardins
 wrote:
> Sorry, my mistake. The variable name was wrong. Weird, I did not get any 
> errors.
>
> Thanks anyways.
>
> But I do Have a another question. When looking in cassandra-cli I did
> "get Users[jnbdz];" and I got:
>
> A long is exactly 8 bytes: 10
>
> And I don't get the data.
>
> Am I missing something?
>
> Thanks in advance.
>
> On Tue, Jun 28, 2011 at 6:00 PM, Tyler Hobbs  wrote:
>> What does the output of 'describe keyspace ' show for the
>> keyspace the CF is in?
>>
>> On Tue, Jun 28, 2011 at 2:35 PM, Jean-Nicolas Boulay Desjardins
>>  wrote:
>>> Actually I am not getting any results from: get_indexed_slices()
>>>
>>> It seems my code dies at: $rows =
>>> $column_family->get_indexed_slices($index_clause);
>>>
>>> Because everything after that is echo is not shown on the page.
>>>
>>> Plus I don't get any errors.
>>>
>>> Any ideas?
>>>
>>> On Tue, Jun 28, 2011 at 3:23 PM, Tyler Hobbs  wrote:
 The result of get_indexed_slices() is an Iterator object, not an
 array.  It doesn't look like you're treating it accordingly.

 See the bottom of this section for an example:
 http://thobbs.github.com/phpcassa/tutorial.html#indexes

 On Tue, Jun 28, 2011 at 2:06 PM, Jean-Nicolas Boulay Desjardins
  wrote:
> Hi,
>
> I am having problem accessing data via an index with PHPCassa. I have
> var_dump() the results:
>
> array(6) { ["birthdate"]=> int(3546927995491989807) ["email"]=>
> string(20) "jnbdzjn...@gmail.com" ["firstname"]=> string(12)
> "Jean-Nicolas" ["lastname"]=> string(17) "Boulay Desjardins"
> ["password"]=> string(8) "password" ["username"]=> string(5) "jnbdz" }
>
> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=> string(5)
> "email" ["op"]=> int(0) ["value"]=> string(20) "jnbdzjn...@gmail.com"
> }
>
> object(cassandra_IndexClause)#77 (3) { ["expressions"]=> array(1) {
> [0]=> object(cassandra_IndexExpression)#76 (3) { ["column_name"]=>
> string(5) "email" ["op"]=> int(0) ["value"]=> string(20)
> "jnbdzjn...@gmail.com" } } ["start_key"]=> string(0) "" ["count"]=>
> int(100) }
>
> Here is the code:
>
> $columnFamily = CASSANDRA::selectColumnFamily('Users');
>
>     $this->selectUser = $columnFamily->get('jnbdz');
>
>     var_dump($this->selectUser);
>
>     echo '';
>     echo '';
>
>     $index_exp =
> CassandraUtil::create_index_expression('email',
> 'jnbdzjn...@gmail.com');
> var_dump($index_exp);
>     $index_clause =
> CassandraUtil::create_index_clause(array($index_exp));
> echo '';
> echo '';
> var_dump($index_clause);
>     $rows = $column_family->get_indexed_slices($index_clause);
> echo '';
> echo '';
> var_dump($rows);
>     var_dump($row);
>
> Thanks in advance for any help
>



 --
 Tyler Hobbs
 Software Engineer, DataStax
 Maintainer of the pycassa Cassandra Python client library

>>>
>>>
>>>
>>> --
>>> Name / Nom: Boulay Desjardins, Jean-Nicolas
>>> Website / Site Web: www.jeannicolas.com
>>>
>>
>>
>>
>> --
>> Tyler Hobbs
>> Software Engineer, DataStax
>> Maintainer of the pycassa Cassandra Python client library
>>
>
>
>
> --
> Name / Nom: Boulay Desjardins, Jean-Nicolas
> Website / Site Web: www.jeannicolas.com
>



-- 
Tyler Hobbs
Software Engineer, DataStax
Maintainer of the pycassa Cassandra Python client library


Re: 8.0.1 Released - Debian Package ETA?

2011-06-28 Thread Kyle Ambroff
0.8.1 has been added to the 08x distribution, but not unstable.

See:
http://www.apache.org/dist/cassandra/debian/dists/unstable/main/binary-amd64/Packages

vs

http://www.apache.org/dist/cassandra/debian/dists/08x/main/binary-amd64/Packages

Change your apt sources to use 08x instead of unstable.

On Tue, Jun 28, 2011 at 5:55 PM, Oleg Tsvinev  wrote:
> Nope, only see 0.8.0. I updated sources in Synaptic Package manager, which
> does the same as apt-get update.
>
> On Tue, Jun 28, 2011 at 5:44 PM, Dan Kuebrich 
> wrote:
>>
>> Try running   apt-get update   (as opposed to upgrade) to pull down the
>> latest listings from the repo.
>>
>> On Tue, Jun 28, 2011 at 8:40 PM, Oleg Tsvinev 
>> wrote:
>>>
>>> Thank you Dan! But I only see 0.8.0 there :(
>>>
>>> On Tue, Jun 28, 2011 at 5:35 PM, Dan Kuebrich 
>>> wrote:

 0.8.1 should be up--I've already installed it.  Here's
 directions: http://wiki.apache.org/cassandra/DebianPackaging

 On Tue, Jun 28, 2011 at 8:24 PM, Oleg Tsvinev 
 wrote:
>
> Hi,
> First of all, thank you for releasing v8.0.1 and congrats! the list of
> fixes and improvements is impressive.
> Is there any ETA for Debian package? Is there a (standard) way to build
> it from sources?
> Thank you,
>   Oleg
>>>
>>
>
>


Re: 8.0.1 Released - Debian Package ETA?

2011-06-28 Thread Oleg Tsvinev
Yes, I figured that out. Thank you for your help!

On Tue, Jun 28, 2011 at 8:31 PM, Kyle Ambroff  wrote:

> 0.8.1 has been added to the 08x distribution, but not unstable.
>
> See:
>
> http://www.apache.org/dist/cassandra/debian/dists/unstable/main/binary-amd64/Packages
>
> vs
>
>
> http://www.apache.org/dist/cassandra/debian/dists/08x/main/binary-amd64/Packages
>
> Change your apt sources to use 08x instead of unstable.
>
> On Tue, Jun 28, 2011 at 5:55 PM, Oleg Tsvinev 
> wrote:
> > Nope, only see 0.8.0. I updated sources in Synaptic Package manager,
> which
> > does the same as apt-get update.
> >
> > On Tue, Jun 28, 2011 at 5:44 PM, Dan Kuebrich 
> > wrote:
> >>
> >> Try running   apt-get update   (as opposed to upgrade) to pull down the
> >> latest listings from the repo.
> >>
> >> On Tue, Jun 28, 2011 at 8:40 PM, Oleg Tsvinev 
> >> wrote:
> >>>
> >>> Thank you Dan! But I only see 0.8.0 there :(
> >>>
> >>> On Tue, Jun 28, 2011 at 5:35 PM, Dan Kuebrich 
> >>> wrote:
> 
>  0.8.1 should be up--I've already installed it.  Here's
>  directions: http://wiki.apache.org/cassandra/DebianPackaging
> 
>  On Tue, Jun 28, 2011 at 8:24 PM, Oleg Tsvinev  >
>  wrote:
> >
> > Hi,
> > First of all, thank you for releasing v8.0.1 and congrats! the list
> of
> > fixes and improvements is impressive.
> > Is there any ETA for Debian package? Is there a (standard) way to
> build
> > it from sources?
> > Thank you,
> >   Oleg
> >>>
> >>
> >
> >
>


Re: custom reconciling columns?

2011-06-28 Thread Yang
btw I use only one box now just because I'm running it in a dev JUnit test,
not that it's going to be that way in production

On Tue, Jun 28, 2011 at 10:06 PM, Yang  wrote:

> ok, here is the profiling result. I think this is consistent (having been
> trying to recover how to effectively use yourkit ...)  see attached picture
>
> since I actually do not use the thrift interface, but just directly use the
> thrift.CassandraServer and run my code in the same JVM as cassandra,
> and was running the whole thing on a single box, there is no message
> serialization/deserialization cost. but more columns did add on to more
> time.
>
> the time was spent in the ConcurrentSkipListMap operations that implement
> the memtable.
>
>
> regarding breaking up the row, I'm not sure it would reduce my run time,
> since our requirement is to read the entire rolling window history (we
> already have
> the TTL enabled , so the history is limited to a certain length, but it is
> quite long: over 1000 , in some  cases, can be 5000 or more ) .  I think
> accessing roughly 1000 items is not an uncommon requirement for many
> applications. in our case, each column has about 30 bytes of data, besides
> the meta data such as ttl, timestamp.
> at history length of 3000, the read takes about 12ms (remember this is
> completely in-memory, no disk access)
>
> I just took a look at the expiring column logic, it looks that the
> expiration does not come into play until when the
> CassandraServer.internal_get()===>thriftifyColumns() gets called. so the
> above memtable access time is still spent. yes, then breaking up the row is
> going to be helpful, but only to the degree of preventing accessing
> expired columns (btw  if this is actually built into cassandra code it
> would be nicer, so instead of spending multiple key lookups, I locate to the
> row once, and then within the row, there are different "generation" buckets,
> so those old generation buckets that are beyond expiration are not read );
> currently just accessing the 3000 live columns is already quite slow.
>
> I'm trying to see whether there are some easy magic bullets for a drop-in
> replacement for concurrentSkipListMap...
>
> Yang
>
>
>
>
> On Tue, Jun 28, 2011 at 4:18 PM, Nate McCall  wrote:
>
>> I agree with Aaron's suggestion on data model and query here. Since
>> there is a time component, you can split the row on a fixed duration
>> for a given user, so the row key would become userId_[timestamp
>> rounded to day].
>>
>> This provides you an easy way to roll up the information for the date
>> ranges you need since the key suffix can be created without a read.
>> This also benefits from spreading the read load over the cluster
>> instead of just the replicas since you have 30 rows in this case
>> instead of one.
>>
>> On Tue, Jun 28, 2011 at 5:55 PM, aaron morton 
>> wrote:
>> > Can you provide some more info:
>> > - how big are the rows, e.g. number of columns and column size  ?
>> > - how much data are you asking for ?
>> > - what sort of read query are you using ?
>> > - what sort of numbers are you seeing ?
>> > - are you deleting columns or using TTL ?
>> > I would consider issues with the data churn, data model and query before
>> > looking at serialisation.
>> > Cheers
>> > -
>> > Aaron Morton
>> > Freelance Cassandra Developer
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> > On 29 Jun 2011, at 10:37, Yang wrote:
>> >
>> > I can see that as my user history grows, the reads time proportionally (
>> or
>> > faster than linear) grows.
>> > if my business requirements ask me to keep a month's history for each
>> user,
>> > it could become too slow.- I was suspecting that it's actually the
>> > serializing and deserializing that's taking time (I can definitely it's
>> cpu
>> > bound)
>> >
>> >
>> > On Tue, Jun 28, 2011 at 3:04 PM, aaron morton 
>> > wrote:
>> >>
>> >> There is no facility to do custom reconciliation for a column. An
>> append
>> >> style operation would run into many of the same problems as the Counter
>> >> type, e.g. not every node may get an append and there is a chance for
>> lost
>> >> appends unless you go to all the trouble Counter's do.
>> >>
>> >> I would go with using a row for the user and columns for each item.
>> Then
>> >> you can have fast no look writes.
>> >>
>> >> What problems are you seeing with the reads ?
>> >>
>> >> Cheers
>> >>
>> >>
>> >> -
>> >> Aaron Morton
>> >> Freelance Cassandra Developer
>> >> @aaronmorton
>> >> http://www.thelastpickle.com
>> >>
>> >> On 29 Jun 2011, at 04:20, Yang wrote:
>> >>
>> >> > for example, if I have an application that needs to read off a user
>> >> > browsing history, and I model the user ID as the key,
>> >> > and the history data within the row. with current approach, I could
>> >> > model each visit as  a column,
>> >> > the possible issue is that *possibly* (I'm still doing a lot of
>> >> > profiling on this to verify) that a lot of time i

Re: Sharing Cassandra with Solandra

2011-06-28 Thread AJ

On 6/27/2011 3:39 PM, David Strauss wrote:

On Mon, 2011-06-27 at 15:06 -0600, AJ wrote:

Would anyone care to talk about their experiences with using Solandra
along side another application that uses Cassandra (also on the same
node)?  I'm curious about any resource contention issues or
compatibility between C* versions and Sol.  Also, I read the developer
somewhere say that you have to run Solandra on every C* node in the
ring.  I'm not sure if I interpreted that correctly.  Also, what's the
index size to data size ratio to expect (ballpark)?  How does it
perform?  Any caveats?

We're currently keeping the clusters separate at Pantheon Systems
because our core API (which runs on standard Cassandra) is often ready
for the next Cassandra version at a different time than Solandra.
Solandra recently gained dual 0.7/0.8 support, but we're still opting to
use the version on Cassandra that Solandra is primarily being built and
tested on (which is currently 0.8).


Thanks.  But, I'm finally cluing in that Solandra is also developed by 
DataStax, so I feel safer about future compatibility.