Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-29 Thread Philippe
Would you stand by that statement if all the columns inside the super
column need to be read?  Why?

Thanks
On Dec 28, 2011 at 19:26, "Edward Capriolo"  wrote:

> Super columns have the same fundamental problem and perform worse in
> general. So switching from composites to super columns is NEVER a good idea.
>
>
> On Wed, Dec 28, 2011 at 1:19 PM, Aditya  wrote:
>
>> Since I have around 20 items to query, I guess making 20 queries to
>> retrieve activities by all followees on all of those 20 columns would be
>> too inefficient. To take advantage of more efficient queries, are
>> supercolumns recommended for this case? In any case, if I use
>> supercolumns, I need to retrieve the entire supercolumn at any point in
>> time, and I write subcolumn(s) to the supercolumn at different times, not
>> all at once.
>>
>> On Wed, Dec 28, 2011 at 8:07 PM, Edward Capriolo 
>> wrote:
>>
>>> You need to execute one get slice operation for each item id or, if the
>>> row is not large, you can try one large get slice on the entire row and
>>> deal with the results client side.
>>>
>>> If you try method 1: when doing slices on composites, you can make the
>>> start and end values inclusive or exclusive to get only the columns you
>>> want, rather than extra columns up to the slice range size.
>>>
>>>
>>> On Tuesday, December 27, 2011, Aditya  wrote:
>>> > I need to store data on all activities by a user's followees in a single
>>> > row. I am trying to do that using composite column names in a
>>> > single user-specific row named 'rowX'.
>>> > On any activity by a user's followee on an item, a column is stored in
>>> > 'rowX'. The column has a composite column name made up of
>>> > itemId+userId (which makes it a unique column name) in rowX, and the
>>> > column value contains the activity data related to that item by that
>>> > followee.
>>> >
>>> > Now I want to retrieve activity by all users on a list of items. So I
>>> > need to retrieve all composite columns whose first component
>>> > matches an itemId. Is it possible to run such a query against Cassandra?
>>> > I am using Hector.
>>>
>>
>>
>


Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-29 Thread Aditya
@Edward: Perhaps you missed that I always need to retrieve 'all
columns' under the supercolumn at any time. Given my query
requirements, if I use composite columns instead of supercolumns, it is
impossible to do wildcard queries like the one in this thread's
subject line, whereas this is much easier with supercolumns.

On Thu, Dec 29, 2011 at 11:06 PM, Edward Capriolo wrote:

> The use case in question was: Only accessing some columns.
>
> Even if that is not the case:
>
> SuperColumns: 1 extra level of nesting
> Composite Columns: Arbitrary levels of nesting
>
> SuperColumns: More overhead (space on disk) than using your own delimiter
> '_'
> SuperColumns: Likely going to be replaced in a future c* version behind
> the scenes by composite columns anyway
> SuperColumns: Usually an afterthought for API developers (support for
> them comes "later")
> SuperColumns: Almost always utilized incorrectly by users; users speak
> of '10%' performance gains after they switch away from them.
>
> There are some (a small % of cases) where SuperColumns are a better
> choice, but this is rare. With composites and concatenating columns
> they have no great purpose any more, (bad analogy coming!) like a
> mechanical typewriter.


Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-29 Thread Aditya
Another point worth noting is that there will be at most 8-10 subcolumns
per supercolumn.
I need to write one subcolumn at a time (but always read the entire
supercolumn at any time).

On Fri, Dec 30, 2011 at 12:20 AM, Aditya  wrote:

> @Edward: Perhaps you missed that I always need to retrieve 'all
> columns' under the supercolumn at any time. Given my query
> requirements, if I use composite columns instead of supercolumns, it is
> impossible to do wildcard queries like the one in this thread's
> subject line, whereas this is much easier with supercolumns.


Re: column family names

2011-12-29 Thread Edward Capriolo
Use the source :)

[edward@ec cas-trunk]$ grep regex ./*
./build.xml:  
./build.xml:
./build.xml:
./CHANGES.txt:   matches a '^\w+' regex. (CASSANDRA-1377)
./NEWS.txt:   to the '^\w+' regex convention.
./NEWS.txt: - Keyspace and column family names that do not confirm to
a '^\w+' regex

 * disallow invalid keyspace and column family names. This includes name that
   matches a '^\w+' regex. (CASSANDRA-1377)

https://issues.apache.org/jira/browse/CASSANDRA-1377
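
A minimal sketch of what that rule implies for a name, assuming the check
is a full match of the whole name against '\w+' with ASCII semantics
(letters, digits and underscore). The helper name is_valid_name is
hypothetical and is not taken from the Cassandra source:

```python
import re

# Assumption: a keyspace or column family name is accepted only when the
# entire string matches \w+ (ASCII letters, digits, underscore).
NAME_PATTERN = re.compile(r"\w+", re.ASCII)


def is_valid_name(name: str) -> bool:
    """Return True when the whole name consists of word characters."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ("users", "user_activity", "my.cf", "my-cf"):
    print(candidate, is_valid_name(candidate))
```

Under that assumption, names containing '.' or '-' are rejected, which is
consistent with the exception described in the question below.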



On 12/29/11, Scott Lewis  wrote:
> I've noticed when creating column families that the name of the column
> family apparently has some restrictions...e.g. the presence of a '.'
> character in the column family name seems to throw an exception.  Is
> there anywhere articulated the restrictions on column family names (and
> keyspace names...if there are any such restrictions).  If so, where?
>
> Thanksinadvance,
>
> Scott
>
>
>


Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-29 Thread Edward Capriolo
The use case in question was: Only accessing some columns.

Even if that is not the case:

SuperColumns: 1 extra level of nesting
Composite Columns: Arbitrary levels of nesting

SuperColumns: More overhead (space on disk) than using your own delimiter '_'
SuperColumns: Likely going to be replaced in a future c* version behind
the scenes by composite columns anyway
SuperColumns: Usually an afterthought for API developers (support for
them comes "later")
SuperColumns: Almost always utilized incorrectly by users; users speak
of '10%' performance gains after they switch away from them.

There are some (a small % of cases) where SuperColumns are a better
choice, but this is rare. With composites and concatenating columns
they have no great purpose any more, (bad analogy coming!) like a
mechanical typewriter.

On 12/29/11, Philippe  wrote:
> Would you stand by that statement if all the columns inside the super
> column need to be read?  Why?
>
> Thanks


Async Thrift batch mutate operation

2011-12-29 Thread Mayank Mishra
Hi,

I am working with Cassandra 1.0.2 and Thrift 0.6.1. I am trying to use the
async Thrift library to perform a batch mutate operation on Cassandra. I
tried the Cassandra server in both sync and async RPC server modes.

While the insert operation works like a charm, batch mutate fails to write
even a single row to Cassandra. I do get a callback for operation
completion, but nothing shows up in the Cassandra SSTables or commit logs.

The only promising sign I can see is an increase in Cassandra's heap
memory usage.
I didn't find anything helpful in the StorageService JMX options. I tried
manually flushing the node, checking the commit logs, and changing the
memtable threshold too, but nothing helped.

Am I missing something? I am quite sure people must be using it.

With regards,
Mayank


Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-29 Thread Edward Capriolo
Hum...

Do you have this?
scf [b][1][a]=value
scf [b][1][x]=value
scf [b][7][b]=value

and you want to slice:
scf [b][1][*]

Which would result in

scf [b][1][a]=value
scf [b][1][x]=value

?

The composite version of this would be:
cf [b][1:a]=value
cf [b][1:x]=value
cf [b][7:b]=value

I am not sure exactly what you are doing, because a SlicePredicate
takes either a list of columns or a SliceRange. A ColumnPath takes a
single SuperColumn.

I do not see how this is done with Columns or SuperColumns. Maybe you
can provide a code snippet and/or some sample data?
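
To make the two layouts above concrete, here is a rough in-memory sketch
(plain Python dicts, not a Cassandra or Hector call). The names scf, cf
and to_composite are hypothetical; tuples stand in for composite column
names:

```python
# Hypothetical model of the super column layout:
# row key -> super column name -> subcolumn name -> value
scf = {"b": {1: {"a": "value", "x": "value"}, 7: {"b": "value"}}}


def to_composite(super_row):
    """Flatten {super: {sub: value}} into {(super, sub): value}."""
    return {
        (super_name, sub_name): value
        for super_name, subcolumns in super_row.items()
        for sub_name, value in subcolumns.items()
    }


# Composite layout: row key -> (first component, second component) -> value
cf = {row_key: to_composite(row) for row_key, row in scf.items()}

# Sorted composite names keep every column that shares a first component
# next to each other, which is what makes a prefix slice possible.
for name in sorted(cf["b"]):
    print(name, cf["b"][name])
```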

On 12/29/11, Aditya  wrote:
> @Edward: Perhaps you missed that I always need to retrieve 'all
> columns' under the supercolumn at any time. Given my query
> requirements, if I use composite columns instead of supercolumns, it is
> impossible to do wildcard queries like the one in this thread's
> subject line, whereas this is much easier with supercolumns.


Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-29 Thread Aditya
On Fri, Dec 30, 2011 at 1:42 AM, Edward Capriolo wrote:

> Hum...
>
> Do you have this?
> scf [b][1][a]=value
> scf [b][1][x]=value
> scf [b][7][b]=value
>
> and you want to slice:
> scf [b][1][*]
>
> Which would result in
>
> scf [b][1][a]=value
> scf [b][1][x]=value
>
> ?
>

Yes, this is exactly what I have!
As for the queries, I want to retrieve columns matching a list of
wildcard names, something like the following:

scf [b][1][*]
scf [b][7][*]

This type of query is not possible with composite columns, but it is
very easily achievable with supercolumns: I can simply query for
a list of supercolumns (with all their subcolumns) by name. Right?

So this is easier in terms of designing a query, but since I don't
understand much about the internals, I am not sure if this is the best
option for me, though looking at my retrieval needs I feel somewhat
biased towards using supercolumns.

>
> The composite version of this would be:
> cf [b][1:a]=value
> cf [b][1:x]=value
> cf [b][7:b]=value
>
> I am not sure exactly what you are doing because A SlicePredicate
> takes either a list of columns or a SliceRange. A ColumnPath takes a
> Single SuperColumn.
>
> I do not see how this is done with Columns or SuperColumns. Maybe you
> can provide a code snippet and/or some sample data?

Re: column family names

2011-12-29 Thread Scott Lewis

Hi Edward,

Thanks...although it looks from CASSANDRA-1377 comments that perhaps the 
reserved characters in CF and keyspace names haven't been decided yet?  
If it hasn't been decided yet I would suggest making it as relaxed as 
possible (allowing '.' in addition to '-' and perhaps others...if 
possible)...and then having some minimal docs on the restrictions...in 
addition to the source...just so that users can easily comply with the 
restrictions.  I'll agree to help with the docs if things are decided.


Thanks,

Scott




Re: column family names

2011-12-29 Thread Edward Capriolo
On 12/29/11, Scott Lewis  wrote:
> Hi Edward,
>
> Thanks...although it looks from CASSANDRA-1377 comments that perhaps the
> reserved characters in CF and keyspace names haven't been decided yet?
> If it hasn't been decided yet I would suggest making it as relaxed as
> possible (allowing '.' in addition to '-' and perhaps others...if
> possible)...and then having some minimal docs on the restrictions...in
> addition to the source...just so that users can easily comply with the
> restrictions.  I'll agree to help with the docs if things are decided.
>
> Thanks,
>
> Scott


Re: column family names

2011-12-29 Thread Edward Capriolo
I never use '.' or '-' in anything. It tends to get object mapping,
code generation libraries, and interpreters upset. I just use a-z and
lower case and know that no one can take that away from me
(hopefully).

On 12/29/11, Edward Capriolo  wrote:
> On 12/29/11, Scott Lewis  wrote:
>> Hi Edward,
>>
>> Thanks...although it looks from CASSANDRA-1377 comments that perhaps the
>> reserved characters in CF and keyspace names haven't been decided yet?
>> If it hasn't been decided yet I would suggest making it as relaxed as
>> possible (allowing '.' in addition to '-' and perhaps others...if
>> possible)...and then having some minimal docs on the restrictions...in
>> addition to the source...just so that users can easily comply with the
>> restrictions.  I'll agree to help with the docs if things are decided.
>>
>> Thanks,
>>
>> Scott


Re: column family names

2011-12-29 Thread Scott Lewis

Hi Edward,

On 12/29/2011 12:51 PM, Edward Capriolo wrote:

> I never use '.' or '-' in anything. It tends to get object mapping,
> code generation libraries, and interpreters upset. I just use a-z and
> lower case and know that no one can take that away from me
> (hopefully).

I don't necessarily disagree with these personal conventions, but on
things like unique names it's my observation that others have other
approaches...so in general I think it's desirable to be as lenient with
these naming restrictions as possible at the lower layers.


Scott




Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-29 Thread Edward Capriolo
vi ./src/java/org/apache/cassandra/db/marshal/CompositeType.java

'end-of-component' byte should always be 0 for actual column name.
 * However, it can set to 1 for query bounds. This allows to query for the
 * equivalent of 'give me the full super-column'. That is, if during a slice
 * query uses:
 *   start = <3><"foo".getBytes()><0>
 *   end   = <3><"foo".getBytes()><1>

So with composite columns you can do:

scf [b][1][*]

by setting the start and end component.
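
As a rough illustration of the bounds in that comment, the sketch below
builds the start and end values for the "foo" prefix. It assumes each
component is serialized as a 2-byte big-endian length, the raw bytes, and
then the end-of-component byte; the function names are hypothetical, and
in practice a client library such as Hector would normally build these
bytes for you:

```python
import struct


def encode_component(value: bytes, end_of_component: int) -> bytes:
    """Serialize one component: <length><value bytes><end-of-component>."""
    return struct.pack(">H", len(value)) + value + bytes([end_of_component])


def prefix_bounds(first_component: bytes):
    """Return (start, end) slice bounds covering every column whose
    first component equals first_component."""
    start = encode_component(first_component, 0)  # <3><"foo"><0>
    end = encode_component(first_component, 1)    # <3><"foo"><1>
    return start, end


start, end = prefix_bounds(b"foo")
print(start.hex())  # 0003666f6f00
print(end.hex())    # 0003666f6f01
```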

But you cannot do
scf [b][1][*]
scf [b][7][*]
in a single operation with composites.
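
The two workarounds suggested earlier in the thread (one slice per item
id, or one read of the whole row filtered client side) can be sketched
against an in-memory stand-in for the row. This is only a model of the
idea: the sorted list of (itemId, userId) tuples and every function name
here are hypothetical, and no Cassandra client is involved:

```python
from bisect import bisect_left

# Hypothetical row: composite (itemId, userId) names in sorted order.
row = sorted({
    (1, "a"): "value",
    (1, "x"): "value",
    (7, "b"): "value",
    (9, "c"): "value",
}.items())
names = [name for name, _ in row]


def slice_by_item(item_id):
    """One range lookup: every column whose first component is item_id."""
    low = bisect_left(names, (item_id,))
    high = bisect_left(names, (item_id + 1,))
    return row[low:high]


def multi_slice(item_ids):
    """Method 1: one slice per item id, results concatenated."""
    columns = []
    for item_id in item_ids:
        columns.extend(slice_by_item(item_id))
    return columns


def filter_client_side(item_ids):
    """Method 2: read the whole row once and filter on the client."""
    wanted = set(item_ids)
    return [(name, value) for name, value in row if name[0] in wanted]


print(multi_slice([1, 7]))
print(filter_client_side([1, 7]))
```

Both functions return the same columns here; which one is cheaper against
a real cluster depends on the row size and the number of item ids.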

You seem to say you can query for a list of supercolumns. I am not
sure how this works, because the ColumnParent seems to accept only a
single SuperColumn, but if you can do it I am not calling you a liar.

Maybe this is a good case for 'server side scanners'. Oh man, I know
jbellis read this and put my face up on a dart board.




On 12/29/11, Aditya  wrote:
> On Fri, Dec 30, 2011 at 1:42 AM, Edward Capriolo
> wrote:
>
>> Hum...
>>
>> Do you have this?
>> scf [b][1][a]=value
>> scf [b][1][x]=value
>> scf [b][7][b]=value
>>
>> and you want to slice:
>> scf [b][1][*]
>>
>> Which would result in
>>
>> scf [b][1][a]=value
>> scf [b][1][x]=value
>>
>> ?
>>
>
> Exactly I have this!
> And as for the queries, I want to retrieve columns (satisfying from a list
> of wildcard names) , something like below :
>
> scf [b][1][*]
> scf [b][7][*]
>
> Now this type of queries are not possible with composite columns but it is
> very easily achievable through use of supercolumns, i can simply query for
> a list of  supercolumns(with entire subcolumns) by name. Right?
>
> So this is easier in terms of designing a query but since I don't
> understand much about the internals and all, I am not sure if this is best
> option for me, though by looking at my retrieval needs I feel somewhat
> biased towards using supercolumns.
>
>>
>> The composite version of this would be:
>> cf [b][1:a]=value
>> cf [b][1:x]=value
>> cf [b][7:b]=value
>>
>> I am not sure exactly what you are doing, because a SlicePredicate
>> takes either a list of columns or a SliceRange. A ColumnPath takes a
>> single SuperColumn.
>>
>> I do not see how this is done with Columns or SuperColumns. Maybe you
>> can provide a code snippet and/or some sample data?
>>
>> On 12/29/11, Aditya  wrote:
>> > @Edward: Perhaps you missed that I always need to retrieve 'all
>> > columns' under the supercolumn at any time. Given my query
>> > requirements, if I use composite columns instead of supercolumns it is
>> > impossible to do wildcard queries like the one in this thread's
>> > subject line, which is much easier to do with supercolumns.
>> >
>> > On Thu, Dec 29, 2011 at 11:06 PM, Edward Capriolo
>> > wrote:
>> >
>> >> The use case in question was: Only accessing some columns.
>> >>
>> >> Even if that is not the case:
>> >>
>> >> SuperColumns: 1 extra level of nesting
>> >> Composite Columns: Arbitrary levels of nesting
>> >>
>> >> SuperColumns: More overhead (space on disk) than using your own
>> >> delimiter '_'
>> >> SuperColumns: Likely going to be replaced in future c* version behind
>> >> the scenes by composite columns anyway
>> >> SuperColumns: Usually an afterthought for API developers, (support for
>> >> them comes "later")
>> >> SuperColumns: Almost always utilized incorrectly by users, users speak
>> >> of '10%' performance gains after they switch away from them.
>> >>
>> >> There are some cases (a small %) where SuperColumns are a better
>> >> choice, but this is rare. With composites and concatenated column
>> >> names they no longer have much purpose, (bad analogy coming!) like a
>> >> mechanical typewriter.
>> >>

Re: Retrieve all composite columns from a row, whose composite name's first component matches from a list of Integers

2011-12-29 Thread Tyler Hobbs
On Thu, Dec 29, 2011 at 3:13 PM, Edward Capriolo wrote:

>
> You seem to say you can query for a list of supercolumns. I am not
> sure how this works, because the ColumnParent seems to accept only a
> single SuperColumn, but if you can do it I am not calling you a liar.
>

If you don't specify a super column ColumnParent, then
SlicePredicate.columns are assumed to be super column names.


>
> Maybe this is a good case for 'server side scanners'. Oh man, I know
> jbellis will read this and put my face up on a dartboard.


Letting multiget_slice accept multiple SlicePredicates per key could also
accomplish this.

-- 
Tyler Hobbs
DataStax 


Re: Consistency Level

2011-12-29 Thread Kamal Bahadur
Thanks for the response, Peter! I checked everything and it looks good to me.

I have been stuck on this for almost 2 days now. Has anyone had this issue?

Thanks,
Kamal

On Wed, Dec 28, 2011 at 2:05 PM, Kamal Bahadur wrote:

> Hi All,
>
> My Cassandra cluster has 4 nodes with an RF of 2. I am trying to verify that
> my data gets replicated to 2 nodes with a write consistency level of ONE.
> All the tests that I have done so far tell me that the data is not getting
> replicated for some reason.
>
> I executed the getendpoints command to find the nodes where my record
> lives, kept those two nodes running, and brought the other nodes
> down. When I tried to read the record using Hector, I got this
> exception: "May not be enough replicas present to handle consistency level"
>
> I tried to read the data using cassandra-cli, but I get "null".
>
> I ran a manual repair command, but I still get the same exception. I
> noticed that as soon as the number of active nodes drops below 3, I
> get this exception.
>
> With consistency level ONE, I would assume that with just one node up and
> running (the one that has the data, of course) I should get my data back.
> But this is not happening.
>
> Will read repair happen automatically even if I read and write using
> consistency level ONE?
>
> Any help will be much appreciated.
>
> Thanks,
> Kamal
>
> Environment details:
>
> Cluster: *4 nodes*
> RF: *2*
> Hector: *1.0-1*
> Cassandra: *0.8.6*
> Read CL: *ONE*
> Write CL: *ONE*
>
> Output of describe keyspace:
>
> Keyspace: MyKF:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:2]
>   Column Families:
>     ColumnFamily: MyCF
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator: org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 20.0/14400
>       Memtable thresholds: 0.58124999/1440/124 (millions of ops/minutes/MB)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: [MyCF.MyCF_appId_idx, MyCF.MyCF_bcId_idx]
>       Column Metadata:
>         Column Name: .appId
>           Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>           Index Name: MyCF_appId_idx
>           Index Type: KEYS
>         Column Name: .bcId
>           Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>           Index Name: MyCF_bcId_idx
>           Index Type: KEYS
>
> My hector code:
>
> public class MyHectorTemplate extends HectorTemplateImpl {
>     public void init() {
>         CassandraHostConfigurator cassandraHostConfigurator =
>                 new CassandraHostConfigurator(servers);
>         cassandraHostConfigurator.setLoadBalancingPolicy(new LeastActiveBalancingPolicy());
>         ThriftCluster cluster = new ThriftCluster(
>                 configuration.getString("cassandra_cluster_name"),
>                 cassandraHostConfigurator);
>         setCluster(cluster);
>         setKeyspaceName(configuration.getString("cassandra_keyspace_name"));
>         ConfigurableConsistencyLevel configurableConsistencyLevelPolicy =
>                 new ConfigurableConsistencyLevel();
>         configurableConsistencyLevelPolicy.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
>         setConfigurableConsistencyLevelPolicy(configurableConsistencyLevelPolicy);
>         super.init();
>         logger.info("--->" + getConfigurableConsistencyLevelPolicy().get(OperationType.READ).toString());
>     }
> }
>
> Here is the log on one of the nodes:
>
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,631 ClientState.java (line 87) logged in: # groups=[]>
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,651 CassandraServer.java (line 670) scan
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,665 StorageProxy.java (line 889) restricted ranges for query [-1,-1] are [[-1,0], (0,42535295865117307932921825928971026432], (42535295865117307932921825928971026432,85070591730234615865843651857942052864], (85070591730234615865843651857942052864,127605887595351923798765477786913079296], (127605887595351923798765477786913079296,-1]]
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,666 StorageProxy.java (line 976) scan ranges are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,679 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,679 ReadCallback.java (line 203) Live nodes do not satisfy ConsistencyLevel (1 required)
>
>
>


Re: Consistency Level

2011-12-29 Thread Peter Schuller
> Thanks for the response, Peter! I checked everything and it looks good to me.
>
> I have been stuck on this for almost 2 days now. Has anyone had this issue?

While it is certainly possible that you're running into a bug, it
seems unlikely to me since it is the kind of bug that would affect
almost anyone if it is failing with Unavailable due to unrelated (not
in replica sets) nodes being down.

Can you please post back with (1) the ring layout ('nodetool ring'),
and (2) the exact row key that you're testing with?

You might also want to run at DEBUG level (modify
log4j-server.properties at the top); the strategy (assuming you are
using NetworkTopologyStrategy) will then log the selected endpoints, so
you can confirm that it is indeed picking the endpoints you expect
based on getendpoints.
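
To make the replica-set check concrete, here is a minimal sketch of the
placement rule as I understand it for SimpleStrategy (the strategy shown in
the posted keyspace output). The first replica is the node owning the first
token at or after the key's token, wrapping around the ring, and the
remaining replicas are the next nodes clockwise. SimpleRing, its methods and
the node names n1..n4 are hypothetical, and this is not Cassandra's actual
implementation. The uncoveredRanges check is included because the posted
log shows a scan over every range; whether that is the cause here is only a
guess.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of a token ring; an assumption-laden sketch, not Cassandra code.
final class SimpleRing {
    final TreeMap<BigInteger, String> ring = new TreeMap<>(); // token -> node name
    final int rf;

    SimpleRing(int rf) { this.rf = rf; }

    void addNode(String name, String token) { ring.put(new BigInteger(token), name); }

    // Owner of the first token >= keyToken (wrapping), then the next rf-1 nodes clockwise.
    List<String> replicasFor(BigInteger keyToken) {
        List<String> nodes = new ArrayList<>(ring.tailMap(keyToken, true).values());
        nodes.addAll(ring.headMap(keyToken, false).values());
        return nodes.subList(0, Math.min(rf, nodes.size()));
    }

    // True when at least `required` replicas of the key are alive.
    boolean readable(BigInteger keyToken, Set<String> live, int required) {
        int alive = 0;
        for (String n : replicasFor(keyToken)) {
            if (live.contains(n)) alive++;
        }
        return alive >= required;
    }

    // Ranges (named by their owning token) that have no live replica at all.
    List<BigInteger> uncoveredRanges(Set<String> live) {
        List<BigInteger> out = new ArrayList<>();
        for (BigInteger t : ring.keySet()) {
            if (!readable(t, live, 1)) out.add(t);
        }
        return out;
    }
}
```

With four evenly spaced tokens, RF 2 and only two adjacent nodes alive, a
key owned by those two nodes is readable at ONE in this model, but one range
is left with no live replica.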

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Newbie question about writer/reader consistency

2011-12-29 Thread Jeremiah Jordan
So you can do this with Cassandra, but you need more logic in your code.
Basically, you get the last safe number, M, then read N..M; if there are any
gaps, you retry reading the missing numbers. As long as you are not overwriting
data, and you only update the last safe number after a successful write to
Cassandra, you can do this. We currently do something very similar to this for
some of our data.
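
A minimal sketch of that read loop, with in-memory maps standing in for the
Cassandra reads and writes. GapFreeReader, write and readRange are
hypothetical names, and the sketch assumes the numbers are consecutive
integers so that a gap can be detected at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical illustration; a real client would issue Cassandra reads and writes here.
final class GapFreeReader {
    final Map<Long, String> rows = new ConcurrentSkipListMap<>(); // stand-in for the data rows
    volatile long lastSafeScn = 0;                                 // stand-in for the marker column

    // Writer: store the row first, publish the marker only after the write succeeded.
    void write(long scn, String data) {
        rows.put(scn, data);
        lastSafeScn = scn;
    }

    // Reader: read from..lastSafe, re-reading any missing number up to maxAttempts times.
    SortedMap<Long, String> readRange(long from, int maxAttempts) {
        long to = lastSafeScn;
        SortedMap<Long, String> result = new TreeMap<>();
        List<Long> missing = new ArrayList<>();
        for (long scn = from; scn <= to; scn++) {
            missing.add(scn);
        }
        for (int attempt = 0; attempt < maxAttempts && !missing.isEmpty(); attempt++) {
            List<Long> stillMissing = new ArrayList<>();
            for (long scn : missing) {
                String value = rows.get(scn);
                if (value == null) {
                    stillMissing.add(scn);
                } else {
                    result.put(scn, value);
                }
            }
            missing = stillMissing;
        }
        if (!missing.isEmpty()) {
            throw new IllegalStateException("gaps remain: " + missing);
        }
        return result;
    }
}
```

The reader never looks past the marker it read at the start, so rows written
afterwards are simply picked up on the next call.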

-Jeremiah


On Dec 26, 2011, at 12:38 PM, Vladimir Mosgalin wrote:

> Hello everybody.
> 
> I am a developer of a finance-related application, and I'm currently evaluating
> various NoSQL databases for our current goal: storing various views which show
> the state of the system in different aspects after each transaction.
> 
> The write load seems to be bigger than a typical SQL database would handle
> without problems: under a test load of tens of transactions per second, each
> transaction generates changes in a dozen views, which generates hundreds of
> messages per second in total. Each message ("change") for each view must be
> stored, as well as the resulting view (generated as a kind-of update of the old
> view); this means multiple inserts and updates per message, which go as a single
> transaction. I started to look into NoSQL databases. I'm a bit puzzled by the
> guarantees of atomicity and isolation that Cassandra provides, so my question
> is about how to attain (if possible at all) the required level of consistency
> in Cassandra. I've read various documents and introductions to Cassandra's data
> model but still can't understand the basics of data consistency. This discussion
> http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-n
> makes me feel disappointed about consistency in Cassandra, but I wonder if
> there is a way to work around it.
> 
> The requirements are like this. There is one writer, which modifies two
> "tables" (I'm sorry for using "SQL" terms, I just don't want to create
> more confusion by mapping them into Cassandra terms at this stage). For
> the first table, it's a simple insert; the index is a unique SCN which is
> guaranteed to be larger than the previous one.
> 
> Let's say it inserts
> SCN DATA
> 1   AAA
> 2   BBB
> 3   CCC
> 
> The goal for the client (reader) is to get all the data from SCN N to SCN M
> without gaps. It is fine if it can't see the very latest SCN yet, that is, if
> it gets "1:AAA" and "2:BBB" on request "SCN: 1..END"; what is NOT fine is to
> get something like "1:AAA" and "3:CCC". In other words, does Cassandra provide
> consistency between writer and reader regarding the order of changes? Or under
> some conditions (say, very fast writes, but always from a single writer, and
> many concurrent reads) might it be possible to get that kind of gap?
> 
> The second question is similar, but on a bigger scale. The second table must be
> modified in a more complicated way; both inserts and updates of old data are
> required. Sometimes it's a few inserts and a few updates, which must be done
> atomically: under no conditions should a reader be able to see the mid-state of
> these inserts/updates. Fortunately, all these new changes will have a new key
> (new SCNs), so if it were possible to use a column in a separate table
> which stores the "last safe SCN" it would work, but I have no faith that
> Cassandra offers such a level of consistency. For example, let's say it works
> like this:
> 
> current last safe SCN: 1000
> 
> update (must be viewed as an atomic "transaction"):
> SCN   DATA
> 1001  AAA
> 1002  BBB
> 800   1001
> 1003  DDD
> 
> new last safe SCN: 1003
> 
> Here, readers need a means to filter out lines with SCN>1000 until the writer
> is done writing the "1003:DDD" line. They also need to filter out the
> "800:1001" line because it references an SCN which is after the current "last
> safe" one.
> 
> "last safe SCN" is stored somewhere, and for this pattern to work I once again
> need "execution order" consistency: no reader should ever see the "last safe:
> 1003" line before all the previous lines were committed, and any reader who saw
> the "last safe: 1003" line must be able to see all the lines from that update
> just as they are right now.
> 
> Is this possible to do in Cassandra?
>