RE: unique key generation

2011-02-09 Thread Brendan Poole

Are you sure about those odds? Winning the UK national lottery has odds of 
13,983,816 to 1, so for just 2 days in a row the odds are
 
13,983,816^2 = 1.9554711 x 10^14
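The figures in this thread are easy to sanity-check (editor's sketch in Python; note that the 1.84467441e19 figure quoted later in the thread equals 2^64):

```python
import math

lottery = 13_983_816      # 1-in-N odds of the UK lottery jackpot
uuid_space = 2 ** 64      # the 1.84467441e19 figure quoted in this thread

print(lottery ** 2)       # 195547109921856, i.e. ~1.9554711e14 (two wins in a row)
print(uuid_space)         # 18446744073709551616

# How many consecutive daily jackpot wins match a 1-in-2^64 event?
days = math.log(uuid_space) / math.log(lottery)
print(round(days, 1))     # ~2.7, i.e. roughly 3 wins in a row
```

This matches the correction posted later in the thread: a 1/2^64 collision is comparable to winning the lottery about 3 days running, not 1e11 days.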
 


 Brendan Poole
 Systems Developer
 NewLaw Solicitors
 Helmont House 
 Churchill Way
 Cardiff
 brendan.po...@new-law.co.uk
 029 2078 4283
 www.new-law.co.uk





From: Kallin Nagelberg [mailto:kallin.nagelb...@gmail.com] 
Sent: 08 February 2011 03:38
To: user@cassandra.apache.org
Subject: Re: unique key generation



Pretty sure it also uses the MAC address, so chances are very slim. I'll check out 
TimeUUID too, thanks.

On 7 Feb 2011 17:11, "Victor Kabdebon"  
wrote:

Hello Kallin.
If you use TimeUUID, the chance of generating the same uuid twice is 
the following:
assuming that both clients generate the uuid in the same millisecond, 
the chance of generating the same uuid is:



1 / 1.84467441 × 10^19

Which is equal to the probability of winning a national lottery for 
1e11 days in a row (for 270 million years).
Well, if you do have a collision you should play the lottery :).

Best regards,
Victor Kabdebon
http://www.voxnucleus.fr


2011/2/7 Kallin Nagelberg  


>
> Hey,
>
> I am developing a session management system using Cassandra and need
> to generate uni...

Please consider the environment before printing this e-mail

Important - The information contained in this email (and any attached files) is 
confidential and may be legally privileged and protected by law.  

The intended recipient is authorised to access it.  If you are not the intended 
recipient, please notify the sender immediately and delete or destroy all 
copies. You must not disclose the 
contents of this email to anyone. Unauthorised use, dissemination, 
distribution, publication or copying of this communication is prohibited. 

NewLaw Solicitors does not accept any liability for any inaccuracies or 
omissions in the contents of this email that may have arisen as a result of 
transmission.  This message and any 
attachments are believed to be free of any virus or defect that might affect 
any computer system into which it is received and opened.  However,it is the 
responsibility of the recipient to 
ensure that it is virus free; therefore, no responsibility is accepted for any 
loss or damage in any way arising from its use. 

NewLaw Solicitors is the trading name of NewLaw Legal Ltd, a limited company 
registered in England and Wales with registered number 07200038.  
NewLaw Legal Ltd is regulated by the Solicitors Regulation Authority whose 
website is http://www.sra.org.uk 

The registered office of NewLaw Legal Ltd is at Helmont House, Churchill Way, 
Cardiff, CF10 2HE. Tel: 0845 756 6870, Fax: 0845 756 6871, Email: 
i...@new-law.co.uk. www.new-law.co.uk.  

We use the word ‘partner’ to refer to a shareowner or director of the company, 
or an employee or consultant of the company who is a lawyer with equivalent 
standing and qualifications. A list 
of the directors is displayed at the above address, together with a list of 
those persons who are designated as partners. <>

Re: How do secondary indices work

2011-02-09 Thread Timo Nentwig

On Feb 8, 2011, at 21:23, Aaron Morton wrote:

>>> 1) Is data stored in some external data structure, or is it stored in an
>>> actual Cassandra table, as columns within column families?

Yes: they are stored in their own files next to the CF files, and each node exposes its own IndexColumnFamilies in JMX.

And they are built asynchronously.


Re: regarding space taken by different column families in Cassandra

2011-02-09 Thread abhinav prakash rai
After 1 hour from when the application was done, the size of the data folder became
14 GB and the result of cfstats matches this number (and Space used
(live) became equal to Space used (total)).

CF1 - Space used (live) : 7196278850
      Space used (total): 7196278850
CF2 - Space used (live) : 2458866899
      Space used (total): 2458866899
CF3 - Space used (live) : 2871096369
      Space used (total): 2967445550
CF4 - Space used (live) : 1536044466
      Space used (total): 1536044466

After the application was done, what kind of operation was going on in Cassandra,
and how much space would it require?
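The sums can be checked directly (editor's sketch; values copied from the two cfstats listings in this thread):

```python
# cfstats "Space used" values from this thread, in bytes.
live_after_compaction = [7196278850, 2458866899, 2871096369, 1536044466]
totals_before = [14214373706, 9065746112, 6114084611, 3433016989]

print(sum(live_after_compaction) / 1e9)  # ~14.06 GB: the "14 GB" data folder
print(sum(totals_before) / 1e9)          # ~32.83 GB: the "about 32 GB" sum
```

So both figures quoted in the thread are internally consistent: live space sums to roughly 14 GB, and the earlier Space used (total) values sum to roughly 32 GB.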

regards,
abhinav






On Wed, Feb 9, 2011 at 12:46 PM, abhinav prakash rai
wrote:

> I am using 4 column family in my application , the result of cfstats for
> space taken by different CF are as below-
>
> CF1-Space used (live)  :7196159547
>Space used (total): 14214373706
> CF2-   Space used (live)  :2456495851
>
>Space used (total): 9065746112
>
> CF3-   Space used (live)  :2864007861
>
>Space used (total) :6114084611
>
> CF4-   Space used (live)  :1531088094
>Space used (total) :3433016989
>
> whereas I can see the total size of the data directory is 17GB, which is not
> equal to the sum of Space used (total) of the above 4 column families. If I assume
> Space used (total) is in bytes, the sum comes to about 32 GB, which is not the
> space taken by data_file_directories.
>
> Can someone help me figure out how much space is used by each CF?
>
> I am using replication_factor= 1.
>
> Regards,
> abhinav
>
>


-- 
Regards,
Abhinav P. Rai


Re: How do secondary indices work

2011-02-09 Thread altanis
Thank you for the reply, although I didn't quite understand it. All I got
was that index data is stored in some kind of external data structure.

Alexander

>
> On Feb 8, 2011, at 21:23, Aaron Morton wrote:
>
>>> 1) Is data stored in some external data structure, or is it stored in an
>>> actual Cassandra table, as columns within column families?
>
> Yes. Own files next to the CF files and own node IndexColumnFamilies in
> JMX.
>
> And they are built asynchronously.
>
>


Re: How do secondary indices work

2011-02-09 Thread altanis
Thank you for the links, I did read a bit in the comments of the ticket,
but I couldn't get much out of it.

I am mainly interested in how the index is stored and partitioned, not how
it is used. I think the people in the dev list will probably be better
qualified to answer that. My questions always seem to get moved to the
user list, and usually with good cause, but I think this time it should be
in the dev list :) Please move it back, if you can.

Alexander

> AFAIK this was the ticket the original work was done under 
> https://issues.apache.org/jira/browse/CASSANDRA-1415
>
> also  http://www.datastax.com/docs/0.7/data_model/secondary_indexes
> and  http://pycassa.github.com/pycassa/tutorial.html#indexes may help
>
> (sorry on reflection the email prob did not need to be moved from dev, my
> bad)
> Aaron
>
> On 09 Feb 2011, at 09:16 AM, Aaron Morton  wrote:
>
> Moving to the user group.
>
>
>
> On 08 Feb 2011, at 11:39 PM, alta...@ceid.upatras.gr wrote:
>
> Hello,
>
> I'd like some information about how secondary indices work under the hood.
>
> 1) Is data stored in some external data structure, or is it stored in an
> actual Cassandra table, as columns within column families?
> 2) Is data stored sorted or not? How is it partitioned?
> 3) How can I access index data?
>
> Thanks in advance,
>
> Alexander Altanis
>


Implementing an LRU in Cassandra

2011-02-09 Thread Utku Can Topçu
Hi All,

I'm sure people here have tried to solve similar problems.
Say I'm tracking pages and I want to access the 1000 least recently used unique
pages (i.e. column names). How can I achieve this?

Using a row with, say, ttl=60 seconds would solve the problem of accessing
the least recently used unique pages in the last minute.
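The TTL-row idea can be modeled in plain Python (an editor's sketch, not a Cassandra API): entries expire after `ttl` seconds, mimicking TTL columns, and the survivors can be returned oldest-first.

```python
import time

class RecentPages:
    """Sketch of the TTL-column idea: remember pages touched in the last `ttl` seconds."""

    def __init__(self, ttl=60):
        self.ttl = ttl
        self.seen = {}  # page -> last-touch timestamp

    def touch(self, page, now=None):
        self.seen[page] = time.time() if now is None else now

    def recent(self, now=None):
        now = time.time() if now is None else now
        # Expire entries older than ttl, as Cassandra TTL columns would.
        self.seen = {p: t for p, t in self.seen.items() if now - t <= self.ttl}
        # Least recently touched pages first.
        return sorted(self.seen, key=self.seen.get)
```

For example, touching "a" at t=0 and t=50 and "b" at t=30, then reading at t=80, yields ["b", "a"]; at t=100 only "a" survives the 60-second window.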

Thanks for any comments and help.

Regards,
Utku


Re: How do secondary indices work

2011-02-09 Thread Stu Hood
Alexander:

The secondary indexes in 0.7.0 (type KEYS) are stored internally in a column
family, and are kept synchronized with the base data via locking on a local
node, meaning they are always consistent on the local node. Eventual
consistency still applies between nodes, but a returned result will always
match your query.

This index column family stores a mapping from index values to a sorted list
of matching row keys. When you query for rows between x and y matching a
value z (via the get_indexed_slices call), Cassandra performs a lookup to
the index column family for the slice of columns in row z between x and y.
If any matches are found in the index, they are row keys that match the
index clause, and we query the base data to return you those rows.

Iterating through all of the rows matching an index clause on your cluster
is guaranteed to touch N/RF of the nodes in your cluster, because each node
only knows about data that is indexed locally.

Some portions of the indexing implementation are not fully baked yet: for
instance, although the API allows you to specify multiple columns, only one
index will actually be used per query, and the rest of the clauses will be
brute forced.

A second secondary index implementation has been on the back burner for a
while: it provides an identical API, but does not use a column family to
store the index, and should be more efficient for append only data. See
https://issues.apache.org/jira/browse/CASSANDRA-1472
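A toy in-memory model of the KEYS index described above (editor's sketch; Cassandra's real implementation lives in internal column families, this only illustrates the value-to-sorted-row-keys mapping and the slice lookup behind get_indexed_slices):

```python
from bisect import bisect_left, bisect_right, insort

class KeysIndexSketch:
    """Local KEYS-style index: one 'index row' per value, holding sorted row keys."""

    def __init__(self):
        self.index = {}  # indexed value -> sorted list of base row keys
        self.base = {}   # row key -> row data

    def insert(self, row_key, value, row):
        # Base data and index are updated together, as on a single node.
        self.base[row_key] = row
        insort(self.index.setdefault(value, []), row_key)

    def get_indexed_slices(self, value, start_key, end_key):
        # Slice the index row for `value` between start_key and end_key,
        # then fetch the matching rows from the base data.
        keys = self.index.get(value, [])
        lo = bisect_left(keys, start_key)
        hi = bisect_right(keys, end_key)
        return [(k, self.base[k]) for k in keys[lo:hi]]
```

Querying `get_indexed_slices("nyc", "r1", "r9")` on rows indexed under "nyc" returns their keys in sorted order along with the base rows, mirroring the two-step lookup Stu describes.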

Thanks,
Stu

On Wed, Feb 9, 2011 at 2:35 AM,  wrote:

> Thank you for the links, I did read a bit in the comments of the ticket,
> but I couldn't get much out of it.
>
> I am mainly interested in how the index is stored and partitioned, not how
> it is used. I think the people in the dev list will probably be better
> qualified to answer that. My questions always seem to get moved to the
> user list, and usually with good cause, but I think this time it should be
> in the dev list :) Please move it back, if you can.
>
> Alexander
>
> > AFAIK this was the ticket the original work was done under
> > https://issues.apache.org/jira/browse/CASSANDRA-1415
> >
> > also  http://www.datastax.com/docs/0.7/data_model/secondary_indexes
> > and  http://pycassa.github.com/pycassa/tutorial.html#indexes may help
> >
> > (sorry on reflection the email prob did not need to be moved from dev, my
> > bad)
> > Aaron
> >
> > On 09 Feb, 2011,at 09:16 AM, Aaron Morton 
> wrote:
> >
> > Moving to the user group.
> >
> >
> >
> > On 08 Feb, 2011,at 11:39 PM, alta...@ceid.upatras.gr wrote:
> >
> > Hello,
> >
> > I'd like some information about how secondary indices work under the
> hood.
> >
> > 1) Is data stored in some external data structure, or is it stored in an
> > actual Cassandra table, as columns within column families?
> > 2) Is data stored sorted or not? How is it partitioned?
> > 3) How can I access index data?
> >
> > Thanks in a advance,
> >
> > Alexander Altanis
> >
>


Anyone want to help out with http://wiki.apache.org/cassandra/MavenPlugin

2011-02-09 Thread Stephen Connolly
Until the release vote passes at mojo, you will need to do the
following to follow the example:

svn co https://svn.codehaus.org/mojo/trunk/sandbox/cassandra-maven-plugin
cd cassandra-maven-plugin
mvn install
cd ..

Otherwise the example should be fine.

It's a wiki page, so I'm hoping that people can make the example a bit
better... specifically some hector people might be able to put in
actual example code for accessing cassandra from the index.jsp.

-Stephen


Re: How do secondary indices work

2011-02-09 Thread altanis
Thank you very much, this is the information I was looking for. I started
adding secondary index functionality to Cassandra myself, and it turns out
I am doing almost exactly the same thing. I will try to change my code to
use your implementation as well to compare results.

Alexander

> Alexander:
>
> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a
> column
> family, and are kept synchronized with the base data via locking on a
> local
> node, meaning they are always consistent on the local node. Eventual
> consistency still applies between nodes, but a returned result will always
> match your query.
>
> This index column family stores a mapping from index values to a sorted
> list
> of matching row keys. When you query for rows between x and y matching a
> value z (via the get_indexed_slices call), Cassandra performs a lookup to
> the index column family for the slice of columns in row z between x and y.
> If any matches are found in the index, they are row keys that match the
> index clause, and we query the base data to return you those rows.
>
> Iterating through all of the rows matching an index clause on your cluster
> is guaranteed to touch N/RF of the nodes in your cluster, because each
> node
> only knows about data that is indexed locally.
>
> Some portions of the indexing implementation are not fully baked yet: for
> instance, although the API allows you to specify multiple columns, only
> one
> index will actually be used per query, and the rest of the clauses will be
> brute forced.
>
> A second secondary index implementation has been on the back burner for a
> while: it provides an identical API, but does not use a column family to
> store the index, and should be more efficient for append only data. See
> https://issues.apache.org/jira/browse/CASSANDRA-1472
>
> Thanks,
> Stu
>
> On Wed, Feb 9, 2011 at 2:35 AM,  wrote:
>
>> Thank you for the links, I did read a bit in the comments of the ticket,
>> but I couldn't get much out of it.
>>
>> I am mainly interested in how the index is stored and partitioned, not
>> how
>> it is used. I think the people in the dev list will probably be better
>> qualified to answer that. My questions always seem to get moved to the
>> user list, and usually with good cause, but I think this time it should
>> be
>> in the dev list :) Please move it back, if you can.
>>
>> Alexander
>>
>> > AFAIK this was the ticket the original work was done under
>> > https://issues.apache.org/jira/browse/CASSANDRA-1415
>> >
>> > also  http://www.datastax.com/docs/0.7/data_model/secondary_indexes
>> > and  http://pycassa.github.com/pycassa/tutorial.html#indexes may help
>> >
>> > (sorry on reflection the email prob did not need to be moved from dev,
>> my
>> > bad)
>> > Aaron
>> >
>> > On 09 Feb, 2011,at 09:16 AM, Aaron Morton 
>> wrote:
>> >
>> > Moving to the user group.
>> >
>> >
>> >
>> > On 08 Feb, 2011,at 11:39 PM, alta...@ceid.upatras.gr wrote:
>> >
>> > Hello,
>> >
>> > I'd like some information about how secondary indices work under the
>> hood.
>> >
>> > 1) Is data stored in some external data structure, or is it stored in
>> an
>> > actual Cassandra table, as columns within column families?
>> > 2) Is data stored sorted or not? How is it partitioned?
>> > 3) How can I access index data?
>> >
>> > Thanks in a advance,
>> >
>> > Alexander Altanis
>> >
>>
>



Re: unique key generation

2011-02-09 Thread Victor Kabdebon
Yes, I made a mistake, I know! But I hoped nobody would notice :).

It is the odds of winning 3 days in a row (standard probability fail). Still,
it is totally unlikely.

Sorry about this mistake,

Best regards,
Victor K.


Re: ApplicationState Schema has drifted from DatabaseDescriptor

2011-02-09 Thread Gary Dusbabek
Aaron,

It looks like you're experiencing a side-effect of CASSANDRA-2083.
There was at least one place (when node B received updated schema from
node A) where gossip was not being updated with the correct schema
even though DatabaseDescriptor had the right version.  I'm pretty sure
this is what you're seeing.

Gary.


On Wed, Feb 9, 2011 at 00:08, Aaron Morton  wrote:
> I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest
> stable 0.7 build "2011-02-08_20-41-25" (the upgraded node was jb-cass1 below).
> This is a long email; you can jump to the end and help me out by checking
> something on your 0.7 cluster.
> This is the value from o.a.c.gms.FailureDetector.AllEndpointStates on
> jb-cass05 (114.67)
> /192.168.114.63   X3:2011-02-08_20-41-25
> SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455   LOAD:2.84182972E8
> STATUS:NORMAL,0
> /192.168.114.64   SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> LOAD:2.84354156E8   STATUS:NORMAL,34028236692093846346337460743176821145
> /192.168.114.66   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.59171601E8   STATUS:NORMAL,102084710076281539039012382229530463435
> /192.168.114.65   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.70907168E8   STATUS:NORMAL,68056473384187692692674921486353642290
> jb08.wetafx.co.nz/192.168.114.67
> SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:1.155260665E9
> STATUS:NORMAL,136112946768375385385349842972707284580
> Notice the schema for nodes 63 and 64 starts with 2f55 and for 65, 66 and 67
> it starts with 075.
> This is the output from pycassa calling describe_versions when connected to
> both the 63 (jb-cass1) and 67 (jb-cass5) nodes
> In [34]: sys.describe_schema_versions()
> Out[34]:
> {'2f555eb0-3332-11e0-9e8d-c4f8bbf76455': ['192.168.114.63',
>                                           '192.168.114.64',
>                                           '192.168.114.65',
>                                           '192.168.114.66',
>                                           '192.168.114.67']}
> It's reporting all nodes on the 2f55 schema. The SchemaCheckVerbHandler is
> getting the value from DatabaseDescriptor. FailureDetector MBean is getting
> them from Gossiper.endpointStateMap . Requests are working though, so the
> CFid's must be matching up.
> Commit https://github.com/apache/cassandra/commit/ecbd71f6b4bb004d26e585ca8a7e642436a5c1a4 added
> code to the 0.7 branch in the HintedHandOffManager to check the schema
> versions of nodes it has hints for. This is now failing on the new node as
> follows...
> ERROR [HintedHandoff:1] 2011-02-09 16:11:23,559 AbstractCassandraDaemon.java
> (line
> org.apache.cassandra.service.AbstractCassandraDaemon$1.uncaughtException(AbstractCassandraDaemon.java:114))
> Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach
> schema agreement with /192.168.114.64 in 6ms
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Could not reach schema agreement with
> /192.168.114.64 in 6ms
>         at
> org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:256)
>         at
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
>         at
> org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>         at
> org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
> (the nodes can all see each other, checked with nodetool during the 60
> seconds)
> If I restart one of the nodes with the 075 schema (without upgrading it) it
> reads the schema from the system tables and goes back to the 2f55 schema.
> e.g. the 64 node was also on the 075 schema; I restarted it and it moved to the
> 2f55 schema and logged appropriately. While writing this email I checked again
> with the 65 node, and the schema it was reporting to other nodes changed after
> a restart from 075 to 2f55.
> INFO [main] 2011-02-09 17:17:11,457 DatabaseDescriptor.java (line
> org.apache.cassandra.config.DatabaseDescriptor) Loading schema version
> 2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> I've been reading the code for migrations and gossip but don't have a theory
> as to what is going on.
>
> REQUEST FOR HELP:
> If you have a 0.7 cluster, can you please check whether this has happened, so I
> can know if this is a real problem or just an Aaron problem. You can check by...
> - getting the values from the o.a.c.gms.FailureDetector.AllEndPointStates
> - running describe_schema_versions via the API, her

Re: How do secondary indices work

2011-02-09 Thread altanis
One more question: does each node keep an index of their own values, or is
the index global?

Alexander

> Thank you very much, this is the information I was looking for. I started
> adding secondary index functionality to Cassandra myself, and it turns out
> I am doing almost exactly the same thing. I will try to change my code to
> use your implementation as well to compare results.
>
> Alexander
>
>> Alexander:
>>
>> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a
>> column
>> family, and are kept synchronized with the base data via locking on a
>> local
>> node, meaning they are always consistent on the local node. Eventual
>> consistency still applies between nodes, but a returned result will
>> always
>> match your query.
>>
>> This index column family stores a mapping from index values to a sorted
>> list
>> of matching row keys. When you query for rows between x and y matching a
>> value z (via the get_indexed_slices call), Cassandra performs a lookup
>> to
>> the index column family for the slice of columns in row z between x and
>> y.
>> If any matches are found in the index, they are row keys that match the
>> index clause, and we query the base data to return you those rows.
>>
>> Iterating through all of the rows matching an index clause on your
>> cluster
>> is guaranteed to touch N/RF of the nodes in your cluster, because each
>> node
>> only knows about data that is indexed locally.
>>
>> Some portions of the indexing implementation are not fully baked yet:
>> for
>> instance, although the API allows you to specify multiple columns, only
>> one
>> index will actually be used per query, and the rest of the clauses will
>> be
>> brute forced.
>>
>> A second secondary index implementation has been on the back burner for
>> a
>> while: it provides an identical API, but does not use a column family to
>> store the index, and should be more efficient for append only data. See
>> https://issues.apache.org/jira/browse/CASSANDRA-1472
>>
>> Thanks,
>> Stu
>>
>> On Wed, Feb 9, 2011 at 2:35 AM,  wrote:
>>
>>> Thank you for the links, I did read a bit in the comments of the
>>> ticket,
>>> but I couldn't get much out of it.
>>>
>>> I am mainly interested in how the index is stored and partitioned, not
>>> how
>>> it is used. I think the people in the dev list will probably be better
>>> qualified to answer that. My questions always seem to get moved to the
>>> user list, and usually with good cause, but I think this time it should
>>> be
>>> in the dev list :) Please move it back, if you can.
>>>
>>> Alexander
>>>
>>> > AFAIK this was the ticket the original work was done under
>>> > https://issues.apache.org/jira/browse/CASSANDRA-1415
>>> >
>>> > also  http://www.datastax.com/docs/0.7/data_model/secondary_indexes
>>> > and  http://pycassa.github.com/pycassa/tutorial.html#indexes may help
>>> >
>>> > (sorry on reflection the email prob did not need to be moved from
>>> dev,
>>> my
>>> > bad)
>>> > Aaron
>>> >
>>> > On 09 Feb, 2011,at 09:16 AM, Aaron Morton 
>>> wrote:
>>> >
>>> > Moving to the user group.
>>> >
>>> >
>>> >
>>> > On 08 Feb, 2011,at 11:39 PM, alta...@ceid.upatras.gr wrote:
>>> >
>>> > Hello,
>>> >
>>> > I'd like some information about how secondary indices work under the
>>> hood.
>>> >
>>> > 1) Is data stored in some external data structure, or is it stored in
>>> an
>>> > actual Cassandra table, as columns within column families?
>>> > 2) Is data stored sorted or not? How is it partitioned?
>>> > 3) How can I access index data?
>>> >
>>> > Thanks in a advance,
>>> >
>>> > Alexander Altanis
>>> >
>>>
>>
>
>



[no subject]

2011-02-09 Thread Onur AKTAS

unsubscribe   

unsubscribe

2011-02-09 Thread Onur AKTAS

unsubscribe   

Out of control memory consumption

2011-02-09 Thread Huy Le
Hi,

There is already an email thread on memory issues on this list, but I am
creating a new thread as we are experiencing a different memory consumption
issue.

We have a 12-server cluster.  We use the random partitioner with manually
generated server tokens.  Memory usage on one server keeps growing out of
control.  We ran flush, cleared the key and row caches, and ran GC, but heap
memory usage won't go down.  The only way to get heap memory usage to go down
is to restart Cassandra.  We have to do this once a day.  All other servers
have heap memory usage of less than 500MB.  This issue happened on both
Cassandra 0.6.6 and 0.6.11.

Our JVM info:

java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)

And JVM memory allocation:  -Xms3G -Xmx3G

Non-heap memory usage is 138MB.

Any recommendation on where we should look to see why memory usage keeps growing?

Thanks!

Huy


Re: How do secondary indices work

2011-02-09 Thread Jonathan Ellis
"Iterating through all of the rows matching an index clause on your
cluster is guaranteed to touch N/RF of the nodes in your cluster,
because each node only knows about data that is indexed locally."

On Wed, Feb 9, 2011 at 9:13 AM,   wrote:
> One more question: does each node keep an index of their own values, or is
> the index global?
>
> Alexander
>
>> Thank you very much, this is the information I was looking for. I started
>> adding secondary index functionality to Cassandra myself, and it turns out
>> I am doing almost exactly the same thing. I will try to change my code to
>> use your implementation as well to compare results.
>>
>> Alexander
>>
>>> Alexander:
>>>
>>> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a
>>> column
>>> family, and are kept synchronized with the base data via locking on a
>>> local
>>> node, meaning they are always consistent on the local node. Eventual
>>> consistency still applies between nodes, but a returned result will
>>> always
>>> match your query.
>>>
>>> This index column family stores a mapping from index values to a sorted
>>> list
>>> of matching row keys. When you query for rows between x and y matching a
>>> value z (via the get_indexed_slices call), Cassandra performs a lookup
>>> to
>>> the index column family for the slice of columns in row z between x and
>>> y.
>>> If any matches are found in the index, they are row keys that match the
>>> index clause, and we query the base data to return you those rows.
>>>
>>> Iterating through all of the rows matching an index clause on your
>>> cluster
>>> is guaranteed to touch N/RF of the nodes in your cluster, because each
>>> node
>>> only knows about data that is indexed locally.
>>>
>>> Some portions of the indexing implementation are not fully baked yet:
>>> for
>>> instance, although the API allows you to specify multiple columns, only
>>> one
>>> index will actually be used per query, and the rest of the clauses will
>>> be
>>> brute forced.
>>>
>>> A second secondary index implementation has been on the back burner for
>>> a
>>> while: it provides an identical API, but does not use a column family to
>>> store the index, and should be more efficient for append only data. See
>>> https://issues.apache.org/jira/browse/CASSANDRA-1472
>>>
>>> Thanks,
>>> Stu
>>>
>>> On Wed, Feb 9, 2011 at 2:35 AM,  wrote:
>>>
 Thank you for the links, I did read a bit in the comments of the
 ticket,
 but I couldn't get much out of it.

 I am mainly interested in how the index is stored and partitioned, not
 how
 it is used. I think the people in the dev list will probably be better
 qualified to answer that. My questions always seem to get moved to the
 user list, and usually with good cause, but I think this time it should
 be
 in the dev list :) Please move it back, if you can.

 Alexander

 > AFAIK this was the ticket the original work was done under
 > https://issues.apache.org/jira/browse/CASSANDRA-1415
 >
 > also  http://www.datastax.com/docs/0.7/data_model/secondary_indexes
 > and  http://pycassa.github.com/pycassa/tutorial.html#indexes may help
 >
 > (sorry on reflection the email prob did not need to be moved from
 dev,
 my
 > bad)
 > Aaron
 >
 > On 09 Feb, 2011,at 09:16 AM, Aaron Morton 
 wrote:
 >
 > Moving to the user group.
 >
 >
 >
 > On 08 Feb, 2011,at 11:39 PM, alta...@ceid.upatras.gr wrote:
 >
 > Hello,
 >
 > I'd like some information about how secondary indices work under the
 hood.
 >
 > 1) Is data stored in some external data structure, or is it stored in
 an
 > actual Cassandra table, as columns within column families?
 > 2) Is data stored sorted or not? How is it partitioned?
 > 3) How can I access index data?
 >
 > Thanks in a advance,
 >
 > Alexander Altanis
 >

>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Out of control memory consumption

2011-02-09 Thread Chris Burroughs
On 02/09/2011 11:15 AM, Huy Le wrote:
> There is already an email thread on memory issue on this email list, but I
> creating a new thread as we are experiencing a different memory consumption
> issue.
> 
> We are 12-server cluster.  We use random partitioner with manually generated
> server tokens.  Memory usage on one server keeps growing out of control.  We
> ran flush and cleared key and row caches but and ran GC but heap memory
> usage won't go down.  The only way to heap memory usage to go down is the
> restart cassandra.  We have to do this one a day.  All other servers have
> heap memory usage less than 500MB.  This issue happened on both Cassandra
> 0.6.6 and 0.6.11.
> 

If the heap usage continues to grow, an OOM will eventually be thrown.
Are you experiencing OOMs on these boxes?  If you are not OOMing, then
what problem are you experiencing (excessive garbage collection CPU use,
for one example)?



> Our JVM info:
> 
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
> 
> And JVM memory allocation:  -Xms3G -Xmx3G
> 
> Non-heap memory usage is 138MB.
> 
> Any recommendation where should look to see why memory usage keep growing?
> 
> Thanks!
> 
> Huy

Are you using standard, mmap_index_only, or mmap io?  Are you using JNA?



Re: Anyone want to help out with http://wiki.apache.org/cassandra/MavenPlugin

2011-02-09 Thread Stephen Connolly
Oh, you might have to check out and install mojo-sandbox-parent (a sibling
svn url); sandbox projects are not allowed to deploy releases... the vote on
dev@mojo will promote from sandbox and release in one vote. 32h to go.

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
On 9 Feb 2011 16:35, "Nate McCall"  wrote:
> Stephen,
> I get an error regarding a non-resolvable parent pom. Is there any
> local additional local configuration or parameters that should be
> passed with the install phase?
>
> I'd be happy to look at this over the next several days as it would
> make the Hector integration testing setup and tear down much easier.
>
> -Nate
>
> On Wed, Feb 9, 2011 at 5:41 AM, Stephen Connolly
>  wrote:
>> Until the release vote passes at mojo, you will need to do the
>> following to follow the example:
>>
>> svn co https://svn.codehaus.org/mojo/trunk/sandbox/cassandra-maven-plugin
>> cd cassandra-maven-plugin
>> mvn install
>> cd ..
>>
>> Otherwise the example should be fine.
>>
>> It's a wiki page, so I'm hoping that people can make the example a bit
>> better... specifically some hector people might be able to put in
>> actual example code for accessing cassandra from the index.jsp.
>>
>> -Stephen
>>


Re: Out of control memory consumption

2011-02-09 Thread Peter Schuller
> We are a 12-server cluster.  We use the random partitioner with manually
> generated server tokens.  Memory usage on one server keeps growing out of
> control.  We ran flush, cleared the key and row caches, and ran GC, but heap
> memory usage won't go down.  The only way to get heap memory usage to go
> down is to restart Cassandra.  We have to do this once a day.  All other
> servers have heap memory usage less than 500MB.  This issue happened on both
> Cassandra 0.6.6 and 0.6.11.

To be clear: You are not talking about the size of the Java process in
top, but the actual amount of heap used as reported by the JVM via
jmx/jconsole/etc?

Is the amount of memory that you consider high the heap size
just after a concurrent mark/sweep?

Are you actually seeing OOMs, or are you restarting the node
pre-emptively in response to seeing heap usage go up?


> And JVM memory allocation:      -Xms3G -Xmx3G

Just FYI: it is entirely expected that the JVM will be 3G (a bit
higher) in size (even with standard I/O), and further that the amount
of live data in the heap will approach 3G. The concurrent mark/sweep
GC won't trigger until occupancy reaches the initiating limit (with a
modern Cassandra at default settings).

If you've got a 3 gig heap size and the other nodes stay at 500 mb,
the question is why *don't* they increase in heap usage. Unless your
500 mb is the report of the actual live data set as evidenced by
post-CMS heap usage.

-- 
/ Peter Schuller
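The distinction Peter draws here (process size in top versus heap usage as reported by the JVM itself over JMX) can be checked with the standard java.lang.management API; a minimal sketch, printing the same figures jconsole shows:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapCheck {
    public static void main(String[] args) {
        // Heap usage as seen by the JVM itself (what jconsole/jmx report),
        // as opposed to the process's resident size shown by top.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.printf("heap used: %d MB of %d MB max%n",
                heap.getUsed() / (1024 * 1024),
                heap.getMax() / (1024 * 1024));

        // Non-heap memory (permgen, code cache) is reported separately,
        // which is where a figure like the 138MB in this thread comes from.
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        System.out.printf("non-heap used: %d MB%n",
                nonHeap.getUsed() / (1024 * 1024));
    }
}
```

Comparing the post-CMS value of `heap.getUsed()` over time is the cleanest way to see the actual live data set Peter mentions.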


Re: Out of control memory consumption

2011-02-09 Thread Peter Schuller
(If you're looking at e.g. jconsole graphs a screenshot of the graph
would not hurt.)


-- 
/ Peter Schuller


Specifying row caching on per query basis ?

2011-02-09 Thread Ertio Lew
Is there any way to specify, on a per-query basis (like we specify the
consistency level), which rows should be cached while you're reading them
from a row-cache-enabled CF? I believe this could lead to much more
efficient use of the cache space (if you use the same data for different
features/parts of your application, which have different caching
needs).


Re: Do supercolumns have a purpose?

2011-02-09 Thread Mike Malone
On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn  wrote:

> Shaun, I agree with you, but marking them as deprecated is not good enough
> for me. I can't easily stop using supercolumns. I need an upgrade path.
>

David,

Cassandra is open source and community developed. The right thing to do is
what's best for the community, which sometimes conflicts with what's best
for individual users. Such strife should be minimized, but it will never be
eliminated. Luckily, because this is an open source, liberally licensed
project, if you feel strongly about something you should feel free to add
whatever features you want yourself. I'm sure other people in your situation
will thank you for it.

At a minimum I think it would behoove you to re-read some of the comments
here re: why super columns aren't really needed and take another look at
your data model and code. I would actually be quite surprised to find a use
of super columns that could not be trivially converted to normal columns. In
fact, it should be possible to do at the framework/client library layer -
you probably wouldn't even need to change any application code.

Mike

On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts  wrote:
>
>>
>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>> you should deprecate SuperColumns. They are already distracting you, and as
>> the years go by the cost of supporting them as you add more and more
>> functionality is only likely to get worse. It would be better to concentrate
>> on making the "core" column families better (and I'm sure we can all think
>> of lots of things we'd like).
>>
>> Just dropping SuperColumns would be bad for your reputation -- and for
>> users like David who are currently using them. But if you mark them clearly
>> as deprecated and explain why and what to do instead (perhaps putting a bit
>> of effort into migration tools... or even a "virtual" layer supporting
>> arbitrary hierarchical data), then you can drop them in a few years (when
>> you get to 1.0, say), without people feeling betrayed.
>>
>> -- Shaun
>>
>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>
>> "My main point was to say that it's think it is better to create tickets
>> for what you want, rather than for something else completely different that
>> would, as a by-product, give you what you want."
>>
>> Then let me say what I want: I want supercolumn families to have any
>> feature that regular column families have.
>>
>> My data model is full of supercolumns. I used them, even though I knew it
>> didn't *have to*, "because they were there", which implied to me that I was
>> supposed to use them for some good reason. Now I suspect that they will
>> gradually become less and less functional, as features are added to regular
>> column families and not supported for supercolumn families.
>>
>>
>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne 
>> wrote:
>>
>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone  wrote:
>>>
 On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne 
 wrote:

> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote:
>
>> The advantage would be to enable secondary indexes on supercolumn
>> families.
>>
>
> Then I suggest opening a ticket for adding secondary indexes to
> supercolumn families and voting on it. This will be 1 or 2 order of
> magnitude less work than getting rid of super column internally, and
> probably a much better solution anyway.
>

 I realize that this is largely subjective, and on such matters code
 speaks louder than words, but I don't think I agree with you on the issue of
 which alternative is less work, or even which is a better solution.

>>>
>>> You are right, I probably put too much emphasis on that sentence. My main
>>> point was to say that I think it is better to create tickets for what you
>>> want, rather than for something else completely different that would, as a
>>> by-product, give you what you want.
>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>> super columns, then there is a good chance this would be less work than
>>> getting rid of super columns. But to be fair, secondary indexes on super
>>> columns may not make too much sense without #598, which itself would require
>>> quite some work, so clearly I spoke a bit quickly.
>>>
>>>
 If the goal is to have a hierarchical model, limiting the depth to two
 seems arbitrary. Why not go all the way and allow an arbitrarily deep
 hierarchy?

 If a more sophisticated hierarchical model is deemed unnecessary, or
 impractical, allowing a depth of two seems inconsistent and
 unnecessary. It's pretty trivial to overlay a hierarchical model on top of
 the map-of-sorted-maps model that Cassandra implements. Ed Anuff has
 implemented a custom comparator that does the job [1]. Google's Megastore
 has a similar architecture and goes even further [2].

 It seems to me t
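Mike's suggestion above, converting a supercolumn layout to plain columns at the framework/client-library layer, usually amounts to packing the supercolumn name and subcolumn name into a single composite column name. A minimal sketch of the idea (the `:` delimiter and the helper names here are hypothetical illustrations, not part of any client API; a real implementation would use a length-prefixed encoding, as in Ed Anuff's composite comparator):

```java
import java.util.Arrays;

public class CompositeName {
    // Hypothetical delimiter; assumes it never occurs in component names.
    private static final char DELIM = ':';

    // Pack (superColumn, subColumn) into a single column name.
    static String pack(String superName, String subName) {
        return superName + DELIM + subName;
    }

    // Split a packed name back into its two components.
    static String[] unpack(String packed) {
        int i = packed.indexOf(DELIM);
        return new String[] { packed.substring(0, i), packed.substring(i + 1) };
    }

    public static void main(String[] args) {
        // row["address"]["city"] in a supercolumn family becomes
        // row["address:city"] in a standard column family.
        String packed = pack("address", "city");
        System.out.println(packed);
        System.out.println(Arrays.toString(unpack(packed)));
    }
}
```

Reading "all subcolumns of a supercolumn" then becomes an ordinary column slice over the `"address:"` prefix, which is why the conversion can stay invisible to application code.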

Re: Out of control memory consumption

2011-02-09 Thread Huy Le
>
> If the heap usages continues to grow an OOM will eventually be thrown.
> Are you experiencing OOMs on these boxes?  If you are not OOMing, then
> what problem are you experiencing (excessive CPU use garbage collection
> for one example)?
>
>
>
No OOM.  The JVM is just too busy doing GC when the used heap size is big,
making this node unresponsive to its peers in the cluster.


>
> > Our JVM info:
> >
> > java version "1.6.0_21"
> > Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> > Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
> >
> > And JVM memory allocation:  -Xms3G -Xmx3G
> >
> > Non-heap memory usage is 138MB.
> >
> > Any recommendation where should look to see why memory usage keep
> growing?
> >
> > Thanks!
> >
> > Huy
>
> Are you using standard, mmap_index_only, or mmap io?  Are you using JNA?
>
>

We use standard disk access mode with JNA.

Huy

-- 
Huy Le
Spring Partners, Inc.
http://springpadit.com


Re: Do supercolumns have a purpose?

2011-02-09 Thread Norman Maurer
I still think super-columns are useful; you just need to be aware of
the limitations...

Bye,
Norman


2011/2/9 Mike Malone :
> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn  wrote:
>>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>
> David,
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized, it will never be
> eliminated. Luckily, because this is an open source, liberal licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
> Mike
>>
>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts  wrote:
>>>
>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>> you should deprecate SuperColumns. They are already distracting you, and as
>>> the years go by the cost of supporting them as you add more and more
>>> functionality is only likely to get worse. It would be better to concentrate
>>> on making the "core" column families better (and I'm sure we can all think
>>> of lots of things we'd like).
>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>> users like David who are currently using them. But if you mark them clearly
>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>> of effort into migration tools... or even a "virtual" layer supporting
>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>> you get to 1.0, say), without people feeling betrayed.
>>>
>>> -- Shaun
>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>
>>> "My main point was to say that it's think it is better to create tickets
>>> for what you want, rather than for something else completely different that
>>> would, as a by-product, give you what you want."
>>>
>>> Then let me say what I want: I want supercolumn families to have any
>>> feature that regular column families have.
>>>
>>> My data model is full of supercolumns. I used them, even though I knew it
>>> didn't *have to*, "because they were there", which implied to me that I was
>>> supposed to use them for some good reason. Now I suspect that they will
>>> gradually become less and less functional, as features are added to regular
>>> column families and not supported for supercolumn families.
>>>
>>>
>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne 
>>> wrote:

 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone  wrote:
>
> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne 
> wrote:
>>
>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn 
>> wrote:
>>>
>>> The advantage would be to enable secondary indexes on supercolumn
>>> families.
>>
>> Then I suggest opening a ticket for adding secondary indexes to
>> supercolumn families and voting on it. This will be 1 or 2 order of
>> magnitude less work than getting rid of super column internally, and
>> probably a much better solution anyway.
>
> I realize that this is largely subjective, and on such matters code
> speaks louder than words, but I don't think I agree with you on the issue 
> of
> which alternative is less work, or even which is a better solution.

 You are right, I probably put too much emphasis on that sentence. My main
 point was to say that I think it is better to create tickets for what you
 want, rather than for something else completely different that would, as a
 by-product, give you what you want.
 Then I suspect that *if* the only goal is to get secondary indexes on
 super columns, then there is a good chance this would be less work than
 getting rid of super columns. But to be fair, secondary indexes on super
 columns may not make too much sense without #598, which itself would require
 quite some work, so clearly I spoke a bit quickly.

>
> If the goal is to have a hierarchical model, limiting the depth to two
> seems arbitrary. Why not go all the way and allow an arbitrarily deep
> hierarchy?
> If a more sophisticated hierarchical model is deemed unnecessary, or
> impractical, allowing a depth of two seems inconsistent and
> unnecessary. It's pretty trivial to overlay a hierarchical model on top of
> the m

Re: Out of control memory consumption

2011-02-09 Thread Huy Le
>
> To be clear: You are not talking about the size of the Java process in
> top, but the actual amount of heap used as reported by the JVM via
> jmx/jconsole/etc?
>
This is the memory usage shown in JMX that we are talking about.



> Is the memory amount of memory that you consider high, the heap size
> just after a concurrent mark/sweep?
>
>
Memory usage grows over time.



> Are you actually seeing OOM:s or are you restarting the node
> pre-emptively in response to seeing heap usage go up?
>
>
>
No OOM.  We pre-emptively restart it before it becomes unresponsive due to
GC.




> > And JVM memory allocation:  -Xms3G -Xmx3G
>
> Just FYI: So it is entirely expected that the JVM will be 3G (a bit
> higher) in size (even with standard I/O) and further that the amount
> of live data in the heap be approaching 3G. The concurrent mark/sweep
> GC won't trigger until the initial occupancy reaches the limit (if
> modern Cassandra with default settings).
>
>
Our CMS settings are:

-XX:CMSInitiatingOccupancyFraction=35 \
-XX:+UseCMSInitiatingOccupancyOnly \



> If you've got a 3 gig heap size and the other nodes stay at 500 mb,
> the question is why *don't* they increase in heap usage. Unless your
> 500 mb is the report of the actual live data set as evidenced by
> post-CMS heap usage.
>
>
What's considered to be "live data"?  If we clear the caches and run flush on
the keyspace, shouldn't that free up memory?

Thanks!

Huy


> --
> / Peter Schuller
>



-- 
Huy Le
Spring Partners, Inc.
http://springpadit.com
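As an aside on the flags quoted above: with -Xmx3G and CMSInitiatingOccupancyFraction=35 (plus UseCMSInitiatingOccupancyOnly), a concurrent collection should begin once occupancy reaches roughly 35%, i.e. well before the heap looks full. A quick sanity check of the arithmetic (treating the whole 3G heap as an upper-bound stand-in for the old generation):

```java
public class CmsThreshold {
    public static void main(String[] args) {
        long heapBytes = 3L * 1024 * 1024 * 1024;  // -Xmx3G
        int occupancyFraction = 35;                // -XX:CMSInitiatingOccupancyFraction=35

        // With -XX:+UseCMSInitiatingOccupancyOnly, CMS cycles start only
        // when occupancy crosses this fraction (of the old generation; the
        // full heap is used here as a rough upper bound).
        long triggerBytes = heapBytes * occupancyFraction / 100;
        System.out.printf("CMS should start near %d MB%n",
                triggerBytes / (1024 * 1024));  // prints "CMS should start near 1075 MB"
    }
}
```

So heap usage climbing toward 3G without a CMS cycle bringing it back down would suggest the live set really is that large, which is the distinction Peter is asking about.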


Re: Using Cassandra-cli

2011-02-09 Thread Jonathan Ellis
"help update column family"?

On Wed, Feb 9, 2011 at 1:15 PM, Eranda Sooriyabandara <0704...@gmail.com> wrote:
> Hi Vishan, Aron and all,
>
> Thanks for the help. I tried it and successfully worked for me.
> But I could not find a place that mentions the attributes of some
> commands.
>
> e.g.
> update column family  [with = [and = ...]];
> create keyspace  [with = [and = ...]];
> (we can use comparator=UTF8Type and default_validation_class=UTF8Type as
> changed attributes)
>
> Is there any documentation which mentions the applicable attributes
> in each case?
>
> thanks
> Eranda
>
> P.S. I put a blog post on Cassandra-cli at
> http://emsooriyabandara.blogspot.com/
> Please correct me if I got it wrong anywhere.
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Specifying row caching on per query basis ?

2011-02-09 Thread Jonathan Ellis
Currently there is not.

On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew  wrote:
> Is there any way to specify on per query basis(like we specify the
> Consistency level), what rows be cached while you're reading them,
> from a row_cache enabled CF. I believe, this could lead to much more
> efficient use of the cache space!!( if you use same data for different
> features/ parts in your application which have different caching
> needs).
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Out of control memory consumption

2011-02-09 Thread Robert Coli
On Wed, Feb 9, 2011 at 11:04 AM, Huy Le  wrote:
> Memory usage grows overtime.

It is relatively typical for caches to exert memory pressure over time
as they fill. What are your cache settings, for how many
columnfamilies, and with what sized memtables? What version of
Cassandra?

=Rob


Re: Specifying row caching on per query basis ?

2011-02-09 Thread Ertio Lew
Is this under consideration for future releases, or being thought about?



On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis  wrote:
> Currently there is not.
>
> On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew  wrote:
>> Is there any way to specify on per query basis(like we specify the
>> Consistency level), what rows be cached while you're reading them,
>> from a row_cache enabled CF. I believe, this could lead to much more
>> efficient use of the cache space!!( if you use same data for different
>> features/ parts in your application which have different caching
>> needs).
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: Specifying row caching on per query basis ?

2011-02-09 Thread Jonathan Ellis
Not really, no.  If you can't trust LRU to cache the hottest rows
perhaps you should split the data into different ColumnFamilies.

On Wed, Feb 9, 2011 at 1:43 PM, Ertio Lew  wrote:
> Is this under consideration for future releases ? or being thought about!?
>
>
>
> On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis  wrote:
>> Currently there is not.
>>
>> On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew  wrote:
>>> Is there any way to specify on per query basis(like we specify the
>>> Consistency level), what rows be cached while you're reading them,
>>> from a row_cache enabled CF. I believe, this could lead to much more
>>> efficient use of the cache space!!( if you use same data for different
>>> features/ parts in your application which have different caching
>>> needs).
>>>
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
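The LRU behaviour Jonathan is referring to is the standard "least recently used" eviction policy: every read refreshes a row's position, so the hottest rows stay resident without any per-query hints. A toy sketch of the policy itself (not Cassandra's actual row cache implementation), using java.util.LinkedHashMap's access-order mode:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true: iteration order follows access recency,
        // not insertion order.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once over capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("row1", "a");
        cache.put("row2", "b");
        cache.get("row1");      // row1 is now the most recently used
        cache.put("row3", "c"); // evicts row2, the coldest entry
        System.out.println(cache.keySet()); // prints [row1, row3]
    }
}
```

Under this policy, rows read for "different features" simply compete on recency, which is why splitting data with genuinely different access patterns into separate column families (each with its own cache) is the suggested alternative.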


Re: Specifying row caching on per query basis ?

2011-02-09 Thread Edward Capriolo
On Wed, Feb 9, 2011 at 2:43 PM, Ertio Lew  wrote:
> Is this under consideration for future releases ? or being thought about!?
>
>
>
> On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis  wrote:
>> Currently there is not.
>>
>> On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew  wrote:
>>> Is there any way to specify on per query basis(like we specify the
>>> Consistency level), what rows be cached while you're reading them,
>>> from a row_cache enabled CF. I believe, this could lead to much more
>>> efficient use of the cache space!!( if you use same data for different
>>> features/ parts in your application which have different caching
>>> needs).
>>>
>>
>>
>>
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support
>> http://www.datastax.com
>>
>

I have suggested an implementation in this issue:

https://issues.apache.org/jira/browse/CASSANDRA-2035


Exceptions on 0.7.0

2011-02-09 Thread shimi
I have a 4-node test cluster where I am testing the port to 0.7.0 from 0.6.X.
On 3 out of the 4 nodes I get exceptions in the log.
I am using RP.
Changes that I made:
1. changed the replication factor from 3 to 4
2. configured the nodes to use the dynamic snitch
3. RR of 0.33

I ran repair on 2 nodes before I noticed the errors. One of them is getting
the first error and the other the second.
I restarted the nodes but I still get the exceptions.

I get the following exception from 2 nodes:
 WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java (line
84) Cannot provide an optimal Bloom
Filter for 1986622313 elements (1/4 buckets per element).
ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190
AbstractCassandraDaemon.java (line 91) Fatal exception in
thread Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.EOFException
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
at
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
at
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
at
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:76)
at
org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)
at
org.apache.cassandra.io.LazilyCompactedRow.<init>(LazilyCompactedRow.java:88)
at
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:136)
at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
at
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
at
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
at
org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:101)
... 29 more


On another node I get:

ERROR [pool-1-thread-2] 2011-02-09 19:48:32,137 Cassandra.java (line 2876)
Internal error processing get_range_
slices
java.lang.RuntimeException: error reading 1 of 1970563183
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:82)
at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:39)
at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
at
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
at
org.apache.commons.collections.iterators.CollatingIterator.anyHasNext(CollatingIterator.java:364)
at
org.apache.commons.collections.iterators.CollatingIterator.hasNext(CollatingIterat

Re: Specifying row caching on per query basis ?

2011-02-09 Thread buddhasystem

Jonathan, what if the data is really homogeneous, but spans a long period of
time? I have decided that the users who hit the database for the recent past
should have a better ride. Splitting it into a separate CF also has costs,
right?

In fact, if I were to go this way, do you think I can crank down the key
caches? If yes, down to what level, zero?

Thanks!



Jonathan Ellis-3 wrote:
> 
> Not really, no.  If you can't trust LRU to cache the hottest rows
> perhaps you should split the data into different ColumnFamilies.
> 
> On Wed, Feb 9, 2011 at 1:43 PM, Ertio Lew  wrote:
>> Is this under consideration for future releases ? or being thought
>> about!?
>>
>>
>>
>> On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis 
>> wrote:
>>> Currently there is not.
>>>
>>> On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew  wrote:
 Is there any way to specify on per query basis(like we specify the
 Consistency level), what rows be cached while you're reading them,
 from a row_cache enabled CF. I believe, this could lead to much more
 efficient use of the cache space!!( if you use same data for different
 features/ parts in your application which have different caching
 needs).

>>>
>>>
>>>
>>> --
>>> Jonathan Ellis
>>> Project Chair, Apache Cassandra
>>> co-founder of DataStax, the source for professional Cassandra support
>>> http://www.datastax.com
>>>
>>
> 
> 
> 
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
> 
> 

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Specifying-row-caching-on-per-query-basis-tp6008838p6009462.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: read latency in cassandra

2011-02-09 Thread Robert Coli
On Fri, Feb 4, 2011 at 11:13 AM, Dan Kuebrich  wrote:
> Is 2 seconds the normal "I went to disk" latency for cassandra?

Cassandra exposes metrics on a per-CF basis which indicate latency.
This includes both cache hits and misses, as well as requests for rows
which do not exist. It does NOT include an assortment of other
latency-causing things, like Thrift.

If you see two seconds of latency from the perspective of your
application, you should compare it to the latency numbers Cassandra
reports. If you are getting timed-out exceptions, that does seem
relatively likely to be the cold cache "I went to disk" case, and the
Cassandra latency numbers should reflect that.

=Rob


Default Listen Port

2011-02-09 Thread Jeremy.Truelove
What's the easiest way to change the port nodes listen on for communication
with other nodes? It appears that the default is 8080, which collides with my
tomcat server on one of our dev boxes. I tried doing something in
cassandra.yaml like

listen_address: 192.1.fake.2:

but that doesn't work; it throws an exception. Also, can you not put the
actual names of servers in the config, or does it always have to be the
actual ip address currently? Thanks.

jt



___

This e-mail may contain information that is confidential, privileged or 
otherwise protected from disclosure. If you are not an intended recipient of 
this e-mail, do not duplicate or redistribute it by any means. Please delete it 
and any attachments and notify the sender that you have received it in error. 
Unless specifically indicated, this e-mail is not an offer to buy or sell or a 
solicitation to buy or sell any securities, investment products or other 
financial product or service, an official confirmation of any transaction, or 
an official statement of Barclays. Any views or opinions presented are solely 
those of the author and do not necessarily represent those of Barclays. This 
e-mail is subject to terms available at the following link: 
www.barcap.com/emaildisclaimer. By messaging with Barclays you consent to the 
foregoing.  Barclays Capital is the investment banking division of Barclays 
Bank PLC, a company registered in England (number 1026167) with its registered 
office at 1 Churchill Place, London, E14 5HP.  This email may relate to or be 
sent from other members of the Barclays Group.
___


Re: Default Listen Port

2011-02-09 Thread Chris Burroughs
On 02/09/2011 04:00 PM, jeremy.truel...@barclayscapital.com wrote:
> What's the easiest way to change the port nodes listen for comm on
> from other nodes? It appears that the default is 8080 which collides
> with my tomcat server on one of our dev boxes. I tried doing
> something in cassandra.yaml like
> 
> listen_address: 192.1.fake.2:
> 
> but that doesn't work it throws an exception. Also can you not put
> the actual name of servers in the config or does it always have to be
> the actual ip address currently? Thanks.
> 


8080 is used by jmx [1].  You can change that in cassandra-env.sh.

hostnames are allowed.


[1] http://wiki.apache.org/cassandra/FAQ#ports


RE: Default Listen Port

2011-02-09 Thread Jeremy.Truelove
Thanks for the heads up that worked.

-Original Message-
From: Chris Burroughs [mailto:chris.burrou...@gmail.com] 
Sent: Wednesday, February 09, 2011 4:04 PM
To: user@cassandra.apache.org
Cc: Truelove, Jeremy: IT (NYK)
Subject: Re: Default Listen Port

On 02/09/2011 04:00 PM, jeremy.truel...@barclayscapital.com wrote:
> What's the easiest way to change the port nodes listen for comm on
> from other nodes? It appears that the default is 8080 which collides
> with my tomcat server on one of our dev boxes. I tried doing
> something in cassandra.yaml like
> 
> listen_address: 192.1.fake.2:
> 
> but that doesn't work it throws an exception. Also can you not put
> the actual name of servers in the config or does it always have to be
> the actual ip address currently? Thanks.
> 


8080 is used by jmx [1].  You can change that in cassandra-env.sh.

hostnames are allowed.


[1] http://wiki.apache.org/cassandra/FAQ#ports



Re: Default Listen Port

2011-02-09 Thread Edward Capriolo
On Wed, Feb 9, 2011 at 4:00 PM,   wrote:
> What’s the easiest way to change the port nodes listen for comm on from
> other nodes? It appears that the default is 8080 which collides with my
> tomcat server on one of our dev boxes. I tried doing something in
> cassandra.yaml like
>
>
>
> listen_address: 192.1.fake.2:
>
>
>
> but that doesn’t work it throws an exception. Also can you not put the
> actual name of servers in the config or does it always have to be the actual
> ip address currently? Thanks.
>
>
>
> jt
>
>
>
>
>

You are having a collision on 8080 which is the default JMX port.

In conf/cassandra-env.sh
look for JMX_PORT="8080"

9160 is the thrift port used by clients
7000 is the storage port (used between nodes)

If you change the JMX port you have to specify it when using nodetool:
'nodetool -h localhost -p <port> ring'
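The change amounts to editing one line and restarting the node. A sketch of the steps; the replacement port 7199 here is just an arbitrary free port chosen for illustration:

```shell
# Find the current setting in conf/cassandra-env.sh:
grep 'JMX_PORT=' conf/cassandra-env.sh     # JMX_PORT="8080"

# Change it to a port that does not collide with Tomcat (7199 is just
# an example), then restart the node:
sed -i 's/JMX_PORT="8080"/JMX_PORT="7199"/' conf/cassandra-env.sh

# nodetool talks JMX, so pass it the new port from now on:
nodetool -h localhost -p 7199 ring
```

The thrift (9160) and storage (7000) ports are unaffected; only JMX clients such as nodetool and jconsole need the new port.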


Re: Do supercolumns have a purpose?

2011-02-09 Thread Bill de hÓra
On Thu, 2011-02-03 at 15:35 -0800, Mike Malone wrote:

>  In my dealings with the Cassandra code, super columns end up making a
> mess all over the place when algorithms need to be special cased and
> branch based on the column/supercolumn distinction.
> 
> 
> I won't even mention what it does to the thrift interface.

My observation is similar, in that they (SCFs) make the "type system" in
Cassandra disjoint. This makes me doubt that moving to Avro would
simplify anything for Cassandra users. It also means knock-on effects
such as no common supertype in APIs for languages like Java (so the
surface area of clients like Hector blows up badly when you compare it
to the HBase client). I can't wait to see how CQL fares with SCFs; a sane
query language will be closed under its operations and I doubt that can be
done atm.

That said, I keep finding uses for them, which is irksome; but maybe I'm
being lazy when it comes to modelling and now that secondary indexes are
in, I should pretend SCFs don't exist. 

Bill 



What will happen if I try to compact with insufficient headroom?

2011-02-09 Thread buddhasystem

One of my nodes is 76% full. I know that one of CFs represents 90% of the
data, others are really minor. Can I still compact under these conditions?
Will it crash and lose the data? Will it try to create one very large file
out of fragments, for that dominating CF?

TIA

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-will-happen-if-I-try-to-compact-with-insufficient-headroom-tp6009619p6009619.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: ApplicationState Schema has drifted from DatabaseDescriptor

2011-02-09 Thread Aaron Morton
Thanks Gary. I'll keep an eye on things and see if it happens again.

From reading the code I'm wondering if there is a small chance of a race condition in HintedHandoffManager.waitForSchemaAgreement().

Could the following happen? I'm a little unsure on exactly how the endpoint state is removed from the map in Gossiper.

1) node 1 starts
2) Gossiper calls StorageService.onAlive() when the endpoints are detected as alive.
3) HintedHandoffManager.deliverHints() adds a runnable to the HintedHandoff TP
4) This happens several times, and node 1 gets busy delivering hints but there is only 1 thread in the thread pool.
5) Node n is removed from the cluster and the endpoint state is deleted in the Gossiper on node 1
6) Node 1 gets around to processing the hints for node n and Gossiper.getEndpointStateForEndpoint() returns null for node n

Thanks
Aaron

On 10 Feb, 2011, at 03:03 AM, Gary Dusbabek wrote:

Aaron,

It looks like you're experiencing a side-effect of CASSANDRA-2083.
There was at least one place (when node B received updated schema from
node A) where gossip was not being updated with the correct schema
even though DatabaseDescriptor had the right version.  I'm pretty sure
this is what you're seeing.

Gary.


On Wed, Feb 9, 2011 at 00:08, Aaron Morton  wrote:
> I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest
> stable 0.7 build "2011-02-08_20-41-25" (the upgraded node was jb-cass1 below).
> This is a long email; you can jump to the end and help me out by checking
> something on your 0.7 cluster.
> This is the value from o.a.c.gms.FailureDetector.AllEndpointStates on
> jb-cass05 (192.168.114.67)
> /192.168.114.63   X3:2011-02-08_20-41-25
> SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455   LOAD:2.84182972E8
> STATUS:NORMAL,0
> /192.168.114.64   SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> LOAD:2.84354156E8   STATUS:NORMAL,34028236692093846346337460743176821145
> /192.168.114.66   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.59171601E8   STATUS:NORMAL,102084710076281539039012382229530463435
> /192.168.114.65   SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455
> LOAD:2.70907168E8   STATUS:NORMAL,68056473384187692692674921486353642290
> jb08.wetafx.co.nz/192.168.114.67
> SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455   LOAD:1.155260665E9
> STATUS:NORMAL,136112946768375385385349842972707284580
> Notice the schema for nodes 63 and 64 starts with 2f55 and for 65, 66 and 67
> it starts with 075.
> This is the output from pycassa calling describe_versions when connected to
> both the 63 (jb-cass1) and 67 (jb-cass5) nodes
> In [34]: sys.describe_schema_versions()
> Out[34]:
> {'2f555eb0-3332-11e0-9e8d-c4f8bbf76455': ['192.168.114.63',
>                                           '192.168.114.64',
>                                           '192.168.114.65',
>                                           '192.168.114.66',
>                                           '192.168.114.67']}
> It's reporting all nodes on the 2f55 schema. The SchemaCheckVerbHandler is
> getting the value from DatabaseDescriptor. FailureDetector MBean is getting
> them from Gossiper.endpointStateMap . Requests are working though, so the
> CFid's must be matching up.
> Commit https://github.com/apache/cassandra/commit/ecbd71f6b4bb004d26e585ca8a7e642436a5c1a4 added
> code to the 0.7 branch in the HintedHandOffManager to check the schema
> versions of nodes it has hints for. This is now failing on the new node as
> follows...
> ERROR [HintedHandoff:1] 2011-02-09 16:11:23,559 AbstractCassandraDaemon.java
> (line
> org.apache.cassandra.service.AbstractCassandraDaemon$1.uncaughtException(AbstractCassandraDaemon.java:114))
> Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach
> schema agreement with /192.168.114.64 in 6ms
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Could not reach schema agreement with
> /192.168.114.64 in 6ms
>         at
> org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:256)
>         at
> org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
>         at
> org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>         at
> org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
>         at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>         ... 3 more
> (the nodes can all see each other, checked with nodetool during the 60
> seconds)
> If I restart one of the nodes with the 075 schema (without upgrading it) it
> reads 

Re: ApplicationState Schema has drifted from DatabaseDescriptor

2011-02-09 Thread Brandon Williams
On Wed, Feb 9, 2011 at 4:31 PM, Aaron Morton wrote:

> Thanks Gary. I'll keep an eye on things and see if it happens again.
>
> From reading the code I'm wondering if there is a small chance of a race
> condition in HintedHandoffManager.waitForSchemaAgreement() .
>
> Could the following happen? I'm a little unsure on exactly how the endpoint
> state is removed from the map in Gossiper.
>
> 1) node 1 starts
> 2) Gossiper calls StorageService.onAlive() when the endpoints are detected
> as alive.
> 3) HintedHandoffManager.deliverHints() adds a runnable to the HintedHandoff
> TP
> 4) This happens several times, and node 1 gets busy delivering hints but
> there is only 1 thread in the thread pool.
> 5) Node n is removed from the cluster and the endpoint state is deleted in
> the Gossiper on node 1
> 6) Node 1 gets around to processing the hints for node n and
> Gossiper.getEndpointStateForEndpoint() returns null for node n
>

Yes, this is currently possible, but you have to decommission the node
before the schema check/sleep portion of HH is over, which is unlikely in
practice.  It will be especially unlikely after
https://issues.apache.org/jira/browse/CASSANDRA-2115.

-Brandon


RE: Exceptions on 0.7.0

2011-02-09 Thread Dan Hendry
Out of curiosity, do you really have on the order of 1,986,622,313 elements
(I believe elements=keys) in the cf?

 

Dan

 

From: shimi [mailto:shim...@gmail.com] 
Sent: February-09-11 15:06
To: user@cassandra.apache.org
Subject: Exceptions on 0.7.0

 

I have a 4 node test cluster were I test the port to 0.7.0 from 0.6.X

On 3 out of the 4 nodes I get exceptions in the log.

I am using RP.

Changes that I did:

1. changed the replication factor from 3 to 4

2. configured the nodes to use Dynamic Snitch

3. RR of 0.33

 

I ran repair on 2 nodes before I noticed the errors. One of them is getting
the first error and the other the second.

I restart the nodes but I still get the exceptions.

 

The following Exception I get from 2 nodes:

 WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java (line
84) Cannot provide an optimal Bloom

Filter for 1986622313 elements (1/4 buckets per element).

ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190
AbstractCassandraDaemon.java (line 91) Fatal exception in 

thread Thread[CompactionExecutor:1,1,main]

java.io.IOError: java.io.EOFException

at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentity
Iterator.java:105)

at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentity
Iterator.java:34)

at
org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIter
ator.java:284)

at
org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIt
erator.java:326)

at
org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIte
rator.java:230)

at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.jav
a:68)

at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator
.java:136)

at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131
)

at
com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)

at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator
.java:136)

at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131
)

at
org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:7
6)

at
org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)

at
org.apache.cassandra.io.LazilyCompactedRow.<init>(LazilyCompactedRow.java:88
)

at
org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterato
r.java:136)

at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.jav
a:107)

at
org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.jav
a:42)

at
org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.jav
a:73)

at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator
.java:136)

at
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131
)

at
org.apache.commons.collections.iterators.FilterIterator.setNextObject(Filter
Iterator.java:183)

at
org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterat
or.java:94)

at
org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.jav
a:323)

at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)

at
org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)

at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)

at java.util.concurrent.FutureTask.run(FutureTask.java:138)

at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
va:886)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
08)

at java.lang.Thread.run(Thread.java:619)

Caused by: java.io.EOFException

at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)

at
org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)

at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:7
6)

at
org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:3
5)

at
org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentity
Iterator.java:101)

... 29 more

 

 

On another node I get:

 

ERROR [pool-1-thread-2] 2011-02-09 19:48:32,137 Cassandra.java (line 2876)
Internal error processing get_range_

slices

java.lang.RuntimeException: error reading 1 of 1970563183

at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleS
liceReader.java:82)

at
org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleS
liceReader.java:39)

at
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator
.java:136)

at
com.google.common.collect.AbstractIt
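The "Cannot provide an optimal Bloom Filter for 1986622313 elements (1/4 buckets per element)" warning means the row count is so large that the filter is capped well below its optimal size, which drives up false positives. A rough sketch of the standard false-positive estimate, treating "buckets per element" as roughly bits per element; this is illustrative arithmetic, not Cassandra's exact sizing code:

```python
import math


def false_positive_rate(bits_per_element: float, k: int) -> float:
    """Classic Bloom filter estimate: (1 - e^(-k*n/m))^k,
    rewritten in terms of m/n = bits per element."""
    return (1.0 - math.exp(-k / bits_per_element)) ** k


# A comfortably sized filter keeps false positives rare...
assert false_positive_rate(15, 8) < 0.01
# ...while a starved one (few bits per key, one hash) misfires often.
assert false_positive_rate(2, 1) > 0.3
```

The practical upshot matches the thread: with ~2 billion claimed elements, the filter degrades, and the huge element count itself (alongside the EOFException) is a hint of SSTable corruption rather than a genuine key count.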

Re: unsubscribe

2011-02-09 Thread Chance Li
unsubscribe


Re: unsubscribe

2011-02-09 Thread Aaron Morton
Instructions are here: http://wiki.apache.org/cassandra/FAQ#unsubscribe

On 10 Feb, 2011, at 02:38 PM, Chance Li wrote:

unsubscribe


Re: Using Cassandra-cli

2011-02-09 Thread Eranda Sooriyabandara
Hi all,
Thanks Jonathan and Eric, you both describes what I want. Now I am looking
forward to play with them.

thanks
Eranda


Re: time to live rows

2011-02-09 Thread Wangpei (Peter)
AFAIK 2nd index only works for operator EQ.

-Original Message-
From: Kallin Nagelberg [mailto:kallin.nagelb...@gmail.com]
Sent: 9 February 2011 3:36
To: user@cassandra.apache.org
Subject: Re: time to live rows

I'm thinking if this row expiry notion doesn't pan out then I might
create a 'lastAccessed' column with a secondary index (i think that's
right) on it. Then I can periodically run a query to find all
lastAccessed columns less than a certain value and manually delete
them. Sound reasonable?

-Kal
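One wrinkle with the lastAccessed plan is Peter's point that secondary indexes only support EQ, so a "less than" query on the index won't work directly. A common workaround is to index a coarse time bucket instead and sweep expired buckets with equality lookups. A hypothetical sketch; the bucket size, helper names, and 48-hour horizon are all invented for illustration:

```python
# Sketch: instead of a range query on 'lastAccessed' (2nd indexes only
# support EQ), index a coarse time bucket and delete expired buckets
# one EQ query at a time.
BUCKET_SECONDS = 3600  # one bucket per hour; an arbitrary choice


def bucket_of(ts: float) -> int:
    """Map a Unix timestamp to its hour bucket id."""
    return int(ts // BUCKET_SECONDS)


def expired_buckets(now: float, ttl: int, horizon: int = 48):
    """Bucket ids old enough that every session in them has expired."""
    newest_dead = bucket_of(now - ttl)
    return [newest_dead - i for i in range(horizon)]


# Each returned id would drive one EQ index query
# (bucket == id), followed by deletes of the matching rows.
assert bucket_of(7200) == 2
```

Sessions get re-bucketed on each access, so a row only lands in an expired bucket if it has genuinely gone untouched for the TTL.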


Re: Row Key Types

2011-02-09 Thread Wangpei (Peter)
Did you set compare_with attribute of your ColumnFamily to TimeUUIDType?

-Original Message-
From: Bill Speirs [mailto:bill.spe...@gmail.com]
Sent: 2 February 2011 0:47
To: Cassandra Usergroup
Subject: Row Key Types

What is the type of a Row Key? Can you define how they are compared?

I ask because I'm using TimeUUIDs as my row keys, but when I make a
call to get a range of row keys (get_range in phpcassa) I have to
specify the UTF8 range of '' to '----'
instead of the TimeUUID range of
'----' to
'----'.

This works, but feels wrong/inefficient... thoughts?

Thanks...

Bill-
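TimeUUIDType orders values by the 60-bit timestamp embedded in a version-1 UUID, which is why lexical UTF8 bounds feel like the wrong tool. A small standard-library sketch of what that embedded timestamp actually is (the constant is the count of 100-ns intervals between the Gregorian epoch, 1582-10-15, and the Unix epoch):

```python
import time
import uuid

# Offset, in 100-ns units, from the UUID epoch (1582-10-15)
# to the Unix epoch (1970-01-01).
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000

# uuid1() embeds the current time; TimeUUIDType compares on it.
u = uuid.uuid1()
unix_seconds = (u.time - GREGORIAN_TO_UNIX_100NS) / 1e7

# The recovered timestamp should be (roughly) "now".
assert abs(unix_seconds - time.time()) < 5
```

Note also that under RandomPartitioner, row keys (as opposed to column names) are ordered by token rather than by this timestamp, so key-range scans over TimeUUID row keys don't return time order anyway.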


RE: Do supercolumns have a purpose?

2011-02-09 Thread Viktor Jevdokimov
SCFs are very useful and I hope they live forever. We need them!


Best regards/ Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

Konstitucijos pr. 23,
LT-08105 Vilnius,
Lithuania



-Original Message-
From: norman.mau...@googlemail.com [mailto:norman.mau...@googlemail.com] On 
Behalf Of Norman Maurer
Sent: Wednesday, February 09, 2011 20:59
To: user@cassandra.apache.org
Subject: Re: Do supercolumns have a purpose?

I still think super-columns are useful you just need to be aware of
the limitations...

Bye,
Norman


2011/2/9 Mike Malone :
> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn  wrote:
>>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>
> David,
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized, it will never be
> eliminated. Luckily, because this is an open source, liberal licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
> Mike
>>
>> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts  wrote:
>>>
>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>> you should deprecate SuperColumns. They are already distracting you, and as
>>> the years go by the cost of supporting them as you add more and more
>>> functionality is only likely to get worse. It would be better to concentrate
>>> on making the "core" column families better (and I'm sure we can all think
>>> of lots of things we'd like).
>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>> users like David who are currently using them. But if you mark them clearly
>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>> of effort into migration tools... or even a "virtual" layer supporting
>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>> you get to 1.0, say), without people feeling betrayed.
>>>
>>> -- Shaun
>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>
>>> "My main point was to say that I think it is better to create tickets
>>> for what you want, rather than for something else completely different that
>>> would, as a by-product, give you what you want."
>>>
>>> Then let me say what I want: I want supercolumn families to have any
>>> feature that regular column families have.
>>>
>>> My data model is full of supercolumns. I used them, even though I knew it
>>> didn't *have to*, "because they were there", which implied to me that I was
>>> supposed to use them for some good reason. Now I suspect that they will
>>> gradually become less and less functional, as features are added to regular
>>> column families and not supported for supercolumn families.
>>>
>>>
>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne 
>>> wrote:

 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone  wrote:
>
> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne 
> wrote:
>>
>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn 
>> wrote:
>>>
>>> The advantage would be to enable secondary indexes on supercolumn
>>> families.
>>
>> Then I suggest opening a ticket for adding secondary indexes to
>> supercolumn families and voting on it. This will be 1 or 2 order of
>> magnitude less work than getting rid of super column internally, and
>> probably a much better solution anyway.
>
> I realize that this is largely subjective, and on such matters code
> speaks louder than words, but I don't think I agree with you on the issue 
> of
> which alternative is less work, or even which is a better solution.

 You are right, I probably put too much emphasis in that sentence

Re: Do supercolumns have a purpose?

2011-02-09 Thread David Boxenhorn
Mike, my problem is that I have a database and codebase that already use
supercolumns. If I had to do it over, I wouldn't use them, for the reasons
you point out. In fact, I have a feeling that over time supercolumns will
become deprecated de facto, if not de jure. That's why I would like to see
them represented internally as regular columns, with an upgrade path for
backward compatibility.

I would love to do it myself! (I haven't looked at the code base, but I
don't understand why it should be so hard.) But my employer has other
ideas...
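Mike's suggestion that supercolumns could be emulated at the framework/client layer usually means packing each (supercolumn, subcolumn) pair into a single composite column name. A minimal sketch of that mapping; the separator byte and helper names are invented, not any real client library's API:

```python
# Hypothetical client-layer flattening of a supercolumn family into a
# regular column family: column name = super name + separator + sub name.
SEP = b"\x00"  # assumes names never contain the separator byte


def pack(super_name: bytes, sub_name: bytes) -> bytes:
    """Flatten a (supercolumn, subcolumn) pair into one column name."""
    return super_name + SEP + sub_name


def unpack(column_name: bytes):
    """Recover the (supercolumn, subcolumn) pair from a packed name."""
    super_name, _, sub_name = column_name.partition(SEP)
    return super_name, sub_name


assert unpack(pack(b"addresses", b"home")) == (b"addresses", b"home")
```

Because the packed names sort first on the super name, reading one whole supercolumn becomes an ordinary column slice over the packed range, which is why application code mostly wouldn't need to change.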


On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone  wrote:

> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn  wrote:
>
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>>
>
> David,
>
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. Such strife should be minimized, it will never be
> eliminated. Luckily, because this is an open source, liberal licensed
> project, if you feel strongly about something you should feel free to add
> whatever features you want yourself. I'm sure other people in your situation
> will thank you for it.
>
> At a minimum I think it would behoove you to re-read some of the comments
> here re: why super columns aren't really needed and take another look at
> your data model and code. I would actually be quite surprised to find a use
> of super columns that could not be trivially converted to normal columns. In
> fact, it should be possible to do at the framework/client library layer -
> you probably wouldn't even need to change any application code.
>
> Mike
>
> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts  wrote:
>>
>>>
>>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>>> you should deprecate SuperColumns. They are already distracting you, and as
>>> the years go by the cost of supporting them as you add more and more
>>> functionality is only likely to get worse. It would be better to concentrate
>>> on making the "core" column families better (and I'm sure we can all think
>>> of lots of things we'd like).
>>>
>>> Just dropping SuperColumns would be bad for your reputation -- and for
>>> users like David who are currently using them. But if you mark them clearly
>>> as deprecated and explain why and what to do instead (perhaps putting a bit
>>> of effort into migration tools... or even a "virtual" layer supporting
>>> arbitrary hierarchical data), then you can drop them in a few years (when
>>> you get to 1.0, say), without people feeling betrayed.
>>>
>>> -- Shaun
>>>
>>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>>
>>> "My main point was to say that I think it is better to create tickets
>>> for what you want, rather than for something else completely different that
>>> would, as a by-product, give you what you want."
>>>
>>> Then let me say what I want: I want supercolumn families to have any
>>> feature that regular column families have.
>>>
>>> My data model is full of supercolumns. I used them, even though I knew it
>>> didn't *have to*, "because they were there", which implied to me that I was
>>> supposed to use them for some good reason. Now I suspect that they will
>>> gradually become less and less functional, as features are added to regular
>>> column families and not supported for supercolumn families.
>>>
>>>
>>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne 
>>> wrote:
>>>
 On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone wrote:

> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote:
>
>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote:
>>
>>> The advantage would be to enable secondary indexes on supercolumn
>>> families.
>>>
>>
>> Then I suggest opening a ticket for adding secondary indexes to
>> supercolumn families and voting on it. This will be 1 or 2 order of
>> magnitude less work than getting rid of super column internally, and
>> probably a much better solution anyway.
>>
>
> I realize that this is largely subjective, and on such matters code
> speaks louder than words, but I don't think I agree with you on the issue 
> of
> which alternative is less work, or even which is a better solution.
>

 You are right, I probably put too much emphasis in that sentence. My main
 point was to say that I think it is better to create tickets for what
 you
 want, rather than for something else completely different that would, as a
 by-product, give you what you want.
 Then I suspect that *if* the only goal is to get secondary indexes on
 super columns, then there is a good chance this would be less work than
 getting rid of super columns. But to be fair, secondary indexes on super
 columns may not make too much sense without #598,