Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-30 Thread Philippe
I currently have
scf[c1][sc1]=value
scf[c1][sc2]=value
...
scf[c2][sc1]=value
scf[c2][sc2]=value
scf[c2][sc3]=value
scf[c2][sc4]=value

99% of the time, I do multiget super slices: for multiple keys, I query for
columns explicitly (c1, c2, c10, c12).
1% of the time, I do a multigetrange superslice where, for multiple keys, I
query for a range of super columns.
As Tyler said, it can be done by specifying supercolumns in the slice
predicate; each one implicitly returns all of its columns. I use Hector and it
works great.

Now interestingly enough, column names sc1, sc2, sc3 are in fact home-made
composite columns.

I could and would switch to full composite columns because I am fishing for
every drop of performance I can. However, for that I would need what this
quote proposes: "Letting multiget_slice accept multiple SlicePredicates per
key could also accomplish this."
Can anyone on the dev team comment on doing this? Is it a no-no?

Thanks
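
To make the two access patterns concrete, here is a minimal in-memory sketch in Python. It is not Hector or Thrift code: the nested dict standing in for a super column family row, the sorted list of (first, second) tuples standing in for a composite row, and all helper names are assumptions made for illustration. It suggests why fetching whole super columns by name is a single request, while the composite equivalent seems to need one contiguous range per first component.

```python
from bisect import bisect_left

# Hypothetical stand-in for one row of a super column family:
# super column name -> {subcolumn name -> value}
super_row = {
    "c1": {"sc1": "v11", "sc2": "v12"},
    "c2": {"sc1": "v21", "sc2": "v22", "sc3": "v23", "sc4": "v24"},
    "c10": {"sc1": "v101"},
}


def get_super_columns(row, names):
    """Return all subcolumns of each requested super column (missing names are skipped)."""
    return {name: dict(row[name]) for name in names if name in row}


# The same data as composite columns: ((first, second), value) pairs sorted by name.
composite_row = sorted(
    ((first, second), value)
    for first, cols in super_row.items()
    for second, value in cols.items()
)


def slice_prefix(row, first):
    """One contiguous range: every column whose first component equals `first`."""
    names = [name for name, _ in row]
    start = bisect_left(names, (first,))  # (first,) sorts before (first, anything)
    result = []
    for name, value in row[start:]:
        if name[0] != first:
            break
        result.append((name, value))
    return result


def get_by_prefixes(row, firsts):
    """One range per first component: the 'multiple slice predicates per key' case."""
    return {first: slice_prefix(row, first) for first in firsts}
```

In this model, get_super_columns(super_row, ["c1", "c2"]) corresponds to the explicit super column query, and get_by_prefixes(composite_row, ["c1", "c2"]) to what a composite layout would have to do.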

2011/12/29 Edward Capriolo 

> Hum...
>
> Do you have this?
> scf [b][1][a]=value
> scf [b][1][x]=value
> scf [b][7][b]=value
>
> and you want to slice:
> scf [b][1][*]
>
> Which would result in
>
> scf [b][1][a]=value
> scf [b][1][x]=value
>
> ?
>
> The composite version of this would be:
> cf [b][1:a]=value
> cf [b][1:x]=value
> cf [b][7:b]=value
>
> I am not sure exactly what you are doing, because a SlicePredicate
> takes either a list of columns or a SliceRange, and a ColumnPath takes a
> single SuperColumn.
>
> I do not see how this is done with Columns or SuperColumns. Maybe you
> can provide a code snippet and/or some sample data?
>
> On 12/29/11, Aditya wrote:
> > @Edward: Perhaps you missed that I always need to retrieve 'all
> > columns' under the supercolumn at any time, and as per my query
> > requirements, if I use composite columns instead of supercolumns then it
> > is impossible to do wildcard queries like the one asked about in this
> > thread's subject line, which is much easier to do with supercolumns.
> >
> > On Thu, Dec 29, 2011 at 11:06 PM, Edward Capriolo
> > wrote:
> >
> >> The use case in question was: Only accessing some columns.
> >>
> >> Even if that is not the case:
> >>
> >> SuperColumns: 1 extra level of nesting
> >> Composite Columns: Arbitrary levels of nesting
> >>
> >> SuperColumns: More overhead (space on disk) than using your own
> >> delimiter '_'
> >> SuperColumns: Likely going to be replaced in a future C* version, behind
> >> the scenes, by composite columns anyway
> >> SuperColumns: Usually an afterthought for API developers (support for
> >> them comes "later")
> >> SuperColumns: Almost always utilized incorrectly by users; users speak
> >> of '10%' performance gains after they switch away from them.
> >>
> >> There are some cases (a small %) where SuperColumns are a better
> >> choice, but this is rare. With composites and concatenating columns
> >> they have no great purpose any more (bad analogy coming!), like a
> >> mechanical typewriter.
> >>
> >> On 12/29/11, Philippe wrote:
> >> > Would you stand by that statement in case all columns inside the super
> >> > column need to be read?  Why?
> >> >
> >> > Thanks
> >> > On 28 Dec 2011 19:26, "Edward Capriolo" wrote:
> >> >
> >> >> Super columns have the same fundamental problem and perform worse in
> >> >> general. So switching from composites to super columns is NEVER a
> >> >> good idea.
> >> >>
> >> >>
> >> >> On Wed, Dec 28, 2011 at 1:19 PM, Aditya wrote:
> >> >>
> >> >>> Since I have around 20 items to query, I guess making 20 queries to
> >> >>> retrieve activities by all followees on all of those 20 columns
> >> >>> would be too inefficient. So, to take advantage of more efficient
> >> >>> queries, are supercolumns recommended for this case? Anyway, if I use
> >> >>> supercolumns, I need to retrieve the entire supercolumn at any point
> >> >>> in time, and I am writing subcolumn(s) to the supercolumn at
> >> >>> different times, not all at once.
> >> >>>
> >> >>> On Wed, Dec 28, 2011 at 8:07 PM, Edward Capriolo
> >> >>> wrote:
> >> >>>
> >> >>>> You need to execute one get slice operation for each item id or,
> >> >>>> if the row is not large, you can try one large get slice on the
> >> >>>> entire row and deal with the results client side.
> >> >>>>
> >> >>>> If you try method 1: when doing slices on composites you can set
> >> >>>> the start inclusive or exclusive values to get only the column you
> >> >>>> want and not some extra columns up to slice range size.
> >> >>>>
> >> >>>>
> >> >>>> On Tuesday, December 27, 2011, Aditya wrote:
> >> >>>> > I need to store data of all activities by user's followees in a
> >> >>>> > single row. I am trying to do that making use of composite column
> >> >>>> > names in a single user specific row named 'rowX'.
> >> >>>> > On any activity by a user's followee on an item, a column is
> >> >>>> > stored in 'rowX'. The column has a composite type column name made
> >> >>>> > up of
> >> 
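
The two options discussed in this thread (one slice per item id, or one large slice filtered on the client) can be sketched in plain Python. This is only an illustration under assumptions: the row is modelled as a list of ((item_id, name), value) pairs taken from the small example earlier in the thread, the function names are hypothetical, and no Cassandra client API is involved.

```python
# Hypothetical contents of row 'b' from the example in this thread,
# with composite column names modelled as (integer, string) tuples.
row_b = [((1, "a"), "value"), ((1, "x"), "value"), ((7, "b"), "value")]


def filter_client_side(columns, wanted_ids):
    """Option 2: read the whole row once, keep columns whose first component is wanted."""
    wanted = set(wanted_ids)
    return [(name, value) for name, value in columns if name[0] in wanted]


def slice_one_id(columns, item_id):
    """Option 1: one range per id, start (item_id,) inclusive, end (item_id + 1,) exclusive."""
    start, end = (item_id,), (item_id + 1,)
    return [(name, value) for name, value in columns if start <= name < end]


def slice_each_id(columns, wanted_ids):
    """Option 1 repeated for every id in the list (one request per id in a real client)."""
    return {item_id: slice_one_id(columns, item_id) for item_id in wanted_ids}
```

The exclusive end bound is what keeps columns of the next id out of the result, which is the point about inclusive and exclusive slice bounds made above. Which option is cheaper depends on row size and on how many ids are requested.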

Dealing with "Corrupt (negative) value length encountered"

2011-12-30 Thread Philippe
Hello,
Running a combination of 0.8.6 and 0.8.8 with RF=3, I am getting the
following while repairing one node (all other nodes completed successfully).
Can I just stop the instance, erase the SSTable and restart cleanup ?
Thanks

ERROR [Thread-402484] 2011-12-29 14:51:03,687 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[Thread-402484,5,main]
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOError: java.io.IOException: Corrupt (negative) value length encountered
    at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:154)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:189)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:117)
Caused by: java.util.concurrent.ExecutionException: java.io.IOError: java.io.IOException: Corrupt (negative) value length encountered
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:138)
    ... 3 more
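
For readers unfamiliar with this message, here is a sketch of the kind of check that could produce it. It is not Cassandra's code: the layout (a 4-byte big-endian signed length followed by that many bytes) and the function name are assumptions for illustration only. The idea is that a reader of length-prefixed values can detect some corruption cheaply, because a negative length can never be valid.

```python
import io
import struct


def read_with_length(stream):
    """Read one length-prefixed value; a negative length means the data is corrupt."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated length prefix")
    (length,) = struct.unpack(">i", header)  # signed 32-bit, big-endian
    if length < 0:
        raise IOError("Corrupt (negative) value length encountered")
    value = stream.read(length)
    if len(value) != length:
        raise EOFError("truncated value")
    return value


# A well-formed value, then a prefix whose sign bit is set.
good = io.BytesIO(struct.pack(">i", 3) + b"abc")
bad = io.BytesIO(b"\xff\xff\xff\xff")
```

Calling read_with_length(good) returns the three payload bytes, while read_with_length(bad) raises the error. A check like this only says that the bytes being read are damaged. It does not say whether the damage is on disk or happened in transit.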


CLI exception :: A long is exactly 8 bytes: 1

2011-12-30 Thread Sasha Dolgy
Hi Everyone,

Been a while .. without any problems.  Thanks for grinding out a good
product!  On 1.0.6, I applied an update to a column family to add a
secondary index, and now via the CLI, when I perform a "get user where
something=1" I receive the following result:

org.apache.cassandra.db.marshal.MarshalException: A long is exactly 8 bytes: 1

This behaviour doesn't seem to be affecting phpcassa or hector
retrieving the results of that query ... is this a silly something
i've done, or something a bit more buggy with the CLI?

Thanks in advance,
-sd

--
Sasha Dolgy
sasha.do...@gmail.com
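
A possible reading of the error, sketched in Python. Two things here are assumptions that the thread does not confirm: that the indexed column is validated as a 64-bit long, and that the trailing "1" in the message is the byte length of the value received. Under those assumptions, a long encoded properly is 8 bytes, while the text "1" is a single byte and would be rejected.

```python
import struct


def encode_long(n):
    """Big-endian signed 64-bit encoding: always exactly 8 bytes."""
    return struct.pack(">q", n)


def validate_long(raw):
    """Hypothetical validator mirroring the wording of the error message."""
    if len(raw) != 8:
        raise ValueError("A long is exactly 8 bytes: %d" % len(raw))
    return struct.unpack(">q", raw)[0]


as_long = encode_long(1)        # 8 bytes: seven zero bytes followed by 0x01
as_text = "1".encode("utf-8")   # 1 byte: the character '1'
```

Here validate_long(as_long) returns 1, and validate_long(as_text) raises with the same wording as the CLI error. If that is what is happening, the difference between clients would come down to how each one serializes the comparison value.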


Re: CLI exception :: A long is exactly 8 bytes: 1

2011-12-30 Thread Moshiur Rahman
I think you need to specify the data type in your command. You have to run
the following command first:
assume <CFName> keys as <TypeName>
where <TypeName> is, e.g., utf8.

Otherwise, you need to specify the type with each command, e.g.,
utf8('keyname').
http://wiki.apache.org/cassandra/CassandraCli

Moshiur




Re: CLI exception :: A long is exactly 8 bytes: 1

2011-12-30 Thread Sasha Dolgy
As per the wiki link you sent, I changed my query to:

get user where something = '1';

It still throws the error. This was fine *before* I ran the update CF
command.

For reference, the wiki's example under "To Query Data" is:
get User where age = '12';

On Fri, Dec 30, 2011 at 6:05 PM, Moshiur Rahman  wrote:
> I think you need to mention data type in your command. You have to run the
> following command first:
> assume <CFName> keys as <TypeName, e.g., utf8>
>
> Otherwise, you need to mention type with each command, e.g.,
> utf8('keyname').
> http://wiki.apache.org/cassandra/CassandraCli
>
> Moshiur


rename column family

2011-12-30 Thread Jim Newsham


How can I rename a column family? If the version matters, I'm interested in
both 0.8.x and 1.0.x.


Thanks,
Jim



Cassandra performance question

2011-12-30 Thread Dom Wong
Hi, could anyone tell me whether this is possible with Cassandra on an
appropriately sized EC2 cluster?

100,000 clients, each writing 50k to its own row at 5-second intervals.
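
As a rough sizing aid, the aggregate load implied by these numbers can be worked out directly. The sketch below assumes "50k" means 50 kB (50,000 bytes) per write, which the post does not state, and it ignores replication and any per-write overhead. The function name is hypothetical.

```python
def write_load(clients, payload_bytes, interval_seconds):
    """Aggregate writes per second and payload bytes per second across all clients."""
    writes_per_second = clients / interval_seconds
    bytes_per_second = writes_per_second * payload_bytes
    return writes_per_second, bytes_per_second


# 100,000 clients, 50 kB each (assumed), one write every 5 seconds.
writes, throughput = write_load(100_000, 50 * 1000, 5)
print(writes, throughput / 1e6)  # 20000.0 writes/s, 1000.0 MB/s
```

Under that assumption the cluster would need to absorb about 20,000 writes per second and roughly 1 GB/s of incoming payload before replication, so ingest bandwidth is likely to matter at least as much as the write rate.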


Re: Cassandra performance question

2011-12-30 Thread Jeremy Hanna
This might be helpful: 
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html




Re: Cassandra performance question

2011-12-30 Thread Chris Marino
We did some benchmarking as well, although we were primarily interested in
the networking issues:

http://blog.vcider.com/2011/09/virtual-networks-can-run-cassandra-up-to-60-faster/

CM
