RE: unique key generation
Are you sure about those odds? Winning the UK national lottery has a chance of 1 in 13,983,816, so for just 2 days in a row the odds are 13,983,816^2 = 1.9554711 x 10^14.

Brendan Poole, Systems Developer, NewLaw Solicitors, Cardiff

From: Kallin Nagelberg [mailto:kallin.nagelb...@gmail.com]
Sent: 08 February 2011 03:38
To: user@cassandra.apache.org
Subject: Re: unique key generation

Pretty sure it also uses the mac address, so chances are very slim. I'll check out time uuid too, thanks.

On 7 Feb 2011 17:11, "Victor Kabdebon" wrote:
Hello Kallin. If you use timeUUID, the chance of generating the same uuid twice is the following: assuming both clients generate the uuid at the same millisecond, the chance of a collision is 1/(1.84467441 x 10^19), which is equal to the probability of winning a national lottery for 1e11 days in a row (for 270 million years). Well, if you do have a collision you should play the lottery :).
Best regards, Victor Kabdebon http://www.voxnucleus.fr

2011/2/7 Kallin Nagelberg
> Hey,
> I am developing a session management system using Cassandra and need
> to generate uni...
Re: How do secondary indices work
On Feb 8, 2011, at 21:23, Aaron Morton wrote: >>> 1) Is data stored in some external data structure, or is it stored in an >>> actual Cassandra table, as columns within column families?

Yes. The indexes get their own files next to the CF files, and their own per-node IndexColumnFamilies in JMX. And they are built asynchronously.
Re: regarding space taken by different column families in Cassandra
An hour after the application was done, the size of the data folder had become 14 GB, and the result of cfstats matches this number (Space used (live) became equal to Space used (total)).

CF1 - Space used (live): 7196278850   Space used (total): 7196278850
CF2 - Space used (live): 2458866899   Space used (total): 2458866899
CF3 - Space used (live): 2871096369   Space used (total): 2967445550
CF4 - Space used (live): 1536044466   Space used (total): 1536044466

After the application was done, what kind of operation was going on in Cassandra, and how much space would it require?

regards, abhinav

On Wed, Feb 9, 2011 at 12:46 PM, abhinav prakash rai wrote:
> I am using 4 column families in my application. The result of cfstats for space taken by the different CFs is as below:
>
> CF1 - Space used (live): 7196159547   Space used (total): 14214373706
> CF2 - Space used (live): 2456495851   Space used (total): 9065746112
> CF3 - Space used (live): 2864007861   Space used (total): 6114084611
> CF4 - Space used (live): 1531088094   Space used (total): 3433016989
>
> whereas I can see the total size of the data directory is 17 GB, which is not equal to the sum of Space used (total) for the above 4 column families. If I assume Space used (total) is in bytes, the sum comes to about 32 GB, which is not the space taken by data_file_directories.
>
> Can someone help me find out how much space is used by each CF?
>
> I am using replication_factor=1.
>
> Regards, abhinav
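For what it's worth, the gap between "live" and "total" generally comes from obsolete SSTables that compaction has superseded but that have not yet been deleted, which would explain why the two figures converged once the node went idle. A quick way to sanity-check the arithmetic is to sum the per-CF totals out of nodetool cfstats and compare against the data directory. A minimal sketch (the host and the regex are illustrative, not from this thread):

    import re
    import subprocess

    # Sum every per-CF "Space used (total)" value reported by cfstats;
    # compare the result against `du -sb <data_file_directories>`.
    out = subprocess.check_output(["nodetool", "-h", "localhost", "cfstats"]).decode()
    totals = [int(n) for n in re.findall(r"Space used \(total\)\s*:\s*(\d+)", out)]
    print("sum of cfstats totals: %.2f GB" % (sum(totals) / 1e9))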
Re: How do secondary indices work
Thank you for the reply, although I didn't quite understand you. All I got was that Index data is stored in some kind of external data structure. Alexander > > On Feb 8, 2011, at 21:23, Aaron Morton wrote: > 1) Is data stored in some external data structure, or is it stored in an actual Cassandra table, as columns within column families? > > Yes. Own files next to the CF files and own node IndexColumnFamilies in > JMX. > > And they are built asynchronously. > >
Re: How do secondary indices work
Thank you for the links, I did read a bit in the comments of the ticket, but I couldn't get much out of it. I am mainly interested in how the index is stored and partitioned, not how it is used. I think the people in the dev list will probably be better qualified to answer that. My questions always seem to get moved to the user list, and usually with good cause, but I think this time it should be in the dev list :) Please move it back, if you can.

Alexander

> AFAIK this was the ticket the original work was done under
> https://issues.apache.org/jira/browse/CASSANDRA-1415
>
> also http://www.datastax.com/docs/0.7/data_model/secondary_indexes
> and http://pycassa.github.com/pycassa/tutorial.html#indexes may help
>
> (sorry on reflection the email prob did not need to be moved from dev, my bad)
> Aaron
>
> On 09 Feb, 2011, at 09:16 AM, Aaron Morton wrote:
> Moving to the user group.
>
> On 08 Feb, 2011, at 11:39 PM, alta...@ceid.upatras.gr wrote:
> Hello,
> I'd like some information about how secondary indices work under the hood.
> 1) Is data stored in some external data structure, or is it stored in an actual Cassandra table, as columns within column families?
> 2) Is data stored sorted or not? How is it partitioned?
> 3) How can I access index data?
> Thanks in advance,
> Alexander Altanis
Implemeting a LRU in Cassandra
Hi All, I'm sure people here have tried to solve similar problems. Say I'm tracking pages, and I want to access the 1000 least recently used unique pages (i.e. column names). How can I achieve this? Using a row with, say, ttl=60 seconds would solve the problem of accessing the least recently used unique pages in the last minute. Thanks for any comments and help. Regards, Utku
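A minimal pycassa sketch of the TTL idea above (the keyspace, column family and row key are made-up names, and per-column TTL requires Cassandra 0.7+):

    import time
    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    pages = pycassa.ColumnFamily(pool, 'RecentPages')

    # Re-inserting a column on every hit resets its TTL, so after 60
    # seconds without access the column expires; what is left in the
    # row is the set of pages touched within the last minute.
    def touch(page_name):
        pages.insert('tracked', {page_name: str(int(time.time()))}, ttl=60)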
Re: How do secondary indices work
Alexander:

The secondary indexes in 0.7.0 (type KEYS) are stored internally in a column family, and are kept synchronized with the base data via locking on a local node, meaning they are always consistent on the local node. Eventual consistency still applies between nodes, but a returned result will always match your query.

This index column family stores a mapping from index values to a sorted list of matching row keys. When you query for rows between x and y matching a value z (via the get_indexed_slices call), Cassandra performs a lookup to the index column family for the slice of columns in row z between x and y. If any matches are found in the index, they are row keys that match the index clause, and we query the base data to return you those rows.

Iterating through all of the rows matching an index clause on your cluster is guaranteed to touch N/RF of the nodes in your cluster, because each node only knows about data that is indexed locally.

Some portions of the indexing implementation are not fully baked yet: for instance, although the API allows you to specify multiple columns, only one index will actually be used per query, and the rest of the clauses will be brute forced.

A second secondary index implementation has been on the back burner for a while: it provides an identical API, but does not use a column family to store the index, and should be more efficient for append-only data. See https://issues.apache.org/jira/browse/CASSANDRA-1472

Thanks, Stu

On Wed, Feb 9, 2011 at 2:35 AM, wrote:
> Thank you for the links, I did read a bit in the comments of the ticket, but I couldn't get much out of it.
> I am mainly interested in how the index is stored and partitioned, not how it is used. [...]
>
> Alexander
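For reference, this is roughly what the query path Stu describes looks like from pycassa (the keyspace, column family and column names are placeholders):

    import pycassa
    from pycassa.index import create_index_clause, create_index_expression

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    users = pycassa.ColumnFamily(pool, 'Users')

    # Find rows whose indexed 'state' column equals 'UT'. Under the hood
    # this slices the index CF's row for 'UT' to collect matching row
    # keys, then reads those rows from the base CF.
    expr = create_index_expression('state', 'UT')
    clause = create_index_clause([expr], count=100)
    for key, columns in users.get_indexed_slices(clause):
        print(key, columns)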
Anyone want to help out with http://wiki.apache.org/cassandra/MavenPlugin
Until the release vote passes at mojo, you will need to do the following to follow the example:

    svn co https://svn.codehaus.org/mojo/trunk/sandbox/cassandra-maven-plugin
    cd cassandra-maven-plugin
    mvn install
    cd ..

Otherwise the example should be fine.

It's a wiki page, so I'm hoping that people can make the example a bit better... specifically some hector people might be able to put in actual example code for accessing cassandra from the index.jsp.

-Stephen
Re: How do secondary indices work
Thank you very much, this is the information I was looking for. I started adding secondary index functionality to Cassandra myself, and it turns out I am doing almost exactly the same thing. I will try to change my code to use your implementation as well to compare results.

Alexander

> Alexander:
>
> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a
> column family, and are kept synchronized with the base data via locking
> on a local node, meaning they are always consistent on the local node. [...]
Re: unique key generation
Yes, I made a mistake, I know! But I hoped nobody would notice :). It is the odds of winning 3 days in a row (standard probability fail). Still, it is totally unlikely. Sorry about this mistake, Best regards, Victor K.
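For the record, the corrected arithmetic: with lottery odds of 1 in 13,983,816 per draw, winning n draws in a row becomes as unlikely as the quoted timeUUID collision when

    \left(\frac{1}{13\,983\,816}\right)^{n} = \frac{1}{1.84467441 \times 10^{19}}
    \quad\Longrightarrow\quad
    n = \frac{\log\left(1.84467441 \times 10^{19}\right)}{\log\left(13\,983\,816\right)} \approx 2.7

so roughly three consecutive wins, not the 1e11 days originally quoted. (1.84467441 x 10^19 is 2^64.)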
Re: ApplicationState Schema has drifted from DatabaseDescriptor
Aaron, It looks like you're experiencing a side-effect of CASSANDRA-2083. There was at least one place (when node B received updated schema from node A) where gossip was not being updated with the correct schema even though DatabaseDescriptor had the right version. I'm pretty sure this is what you're seeing. Gary.

On Wed, Feb 9, 2011 at 00:08, Aaron Morton wrote:
> I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest stable 0.7 build "2011-02-08_20-41-25" (the upgraded node was jb-cass1 below). This is a long email; you can jump to the end and help me out by checking something on your 0.7 cluster.
> This is the value from o.a.c.gms.FailureDetector.AllEndpointStates on jb-cass05 (192.168.114.67):
> /192.168.114.63 X3:2011-02-08_20-41-25 SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455 LOAD:2.84182972E8 STATUS:NORMAL,0
> /192.168.114.64 SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455 LOAD:2.84354156E8 STATUS:NORMAL,34028236692093846346337460743176821145
> /192.168.114.66 SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455 LOAD:2.59171601E8 STATUS:NORMAL,102084710076281539039012382229530463435
> /192.168.114.65 SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455 LOAD:2.70907168E8 STATUS:NORMAL,68056473384187692692674921486353642290
> jb08.wetafx.co.nz/192.168.114.67 SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455 LOAD:1.155260665E9 STATUS:NORMAL,136112946768375385385349842972707284580
> Notice the schema for nodes 63 and 64 starts with 2f55 and for 65, 66 and 67 it starts with 075.
> This is the output from pycassa calling describe_schema_versions when connected to both the 63 (jb-cass1) and 67 (jb-cass5) nodes:
> In [34]: sys.describe_schema_versions()
> Out[34]: {'2f555eb0-3332-11e0-9e8d-c4f8bbf76455': ['192.168.114.63', '192.168.114.64', '192.168.114.65', '192.168.114.66', '192.168.114.67']}
> It's reporting all nodes on the 2f55 schema. The SchemaCheckVerbHandler is getting the value from DatabaseDescriptor. The FailureDetector MBean is getting them from Gossiper.endpointStateMap. Requests are working though, so the CF ids must be matching up.
> Commit https://github.com/apache/cassandra/commit/ecbd71f6b4bb004d26e585ca8a7e642436a5c1a4 added code to the 0.7 branch in the HintedHandOffManager to check the schema versions of nodes it has hints for. This is now failing on the new node as follows...
> ERROR [HintedHandoff:1] 2011-02-09 16:11:23,559 AbstractCassandraDaemon.java (line org.apache.cassandra.service.AbstractCassandraDaemon$1.uncaughtException(AbstractCassandraDaemon.java:114)) Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach schema agreement with /192.168.114.64 in 60000ms
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Could not reach schema agreement with /192.168.114.64 in 60000ms
>     at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:256)
>     at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
>     at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>     at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>     ... 3 more
> (the nodes can all see each other, checked with nodetool during the 60 seconds)
> If I restart one of the nodes with the 075 schema (without upgrading it) it reads the schema from the system tables and goes back to the 2f55 schema. e.g. the 64 node was also on the 075 schema; I restarted it and it moved to the 2f55 and logged appropriately. While writing this email I checked again with the 65 node, and the schema it was reporting to other nodes changed after a restart from 075 to 2f55:
> INFO [main] 2011-02-09 17:17:11,457 DatabaseDescriptor.java (line org.apache.cassandra.config.DatabaseDescriptor) Loading schema version 2f555eb0-3332-11e0-9e8d-c4f8bbf76455
> I've been reading the code for migrations and gossip and don't have a theory as to what is going on.
>
> REQUEST FOR HELP:
> If you have a 0.7 cluster can you please check if this has happened, so I can know whether this is a real problem or just an Aaron problem. You can check by...
> - getting the values from the o.a.c.gms.FailureDetector.AllEndPointStates
> - running describe_schema_versions via the API, her
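If you want to run the same check without pycassa, the raw Thrift API call works too. A sketch, assuming the thrift-generated cassandra bindings that ship with 0.7 are on your path and the node speaks framed transport:

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra  # thrift-generated bindings

    socket = TSocket.TSocket('192.168.114.63', 9160)
    transport = TTransport.TFramedTransport(socket)
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    # Maps each schema version UUID to the endpoints reporting it; more
    # than one key means the cluster disagrees about the schema.
    print(client.describe_schema_versions())
    transport.close()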
Re: How do secondary indices work
One more question: does each node keep an index of their own values, or is the index global?

Alexander

> Thank you very much, this is the information I was looking for. I started
> adding secondary index functionality to Cassandra myself, and it turns out
> I am doing almost exactly the same thing. I will try to change my code to
> use your implementation as well to compare results.
>
> Alexander
> [...]
Out of control memory consumption
Hi, There is already an email thread on memory issues on this list, but I am creating a new thread as we are experiencing a different memory consumption issue.

We have a 12-server cluster. We use the random partitioner with manually generated server tokens. Memory usage on one server keeps growing out of control. We ran flush, cleared the key and row caches, and ran GC, but heap memory usage won't go down. The only way to get heap memory usage to go down is to restart Cassandra. We have to do this once a day. All other servers have heap memory usage less than 500MB. This issue happened on both Cassandra 0.6.6 and 0.6.11.

Our JVM info:

java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)

And JVM memory allocation: -Xms3G -Xmx3G

Non-heap memory usage is 138MB.

Any recommendation where we should look to see why memory usage keeps growing?

Thanks!

Huy
Re: How do secondary indices work
"Iterating through all of the rows matching an index clause on your cluster is guaranteed to touch N/RF of the nodes in your cluster, because each node only knows about data that is indexed locally." On Wed, Feb 9, 2011 at 9:13 AM, wrote: > One more question: does each node keep an index of their own values, or is > the index global? > > Alexander > >> Thank you very much, this is the information I was looking for. I started >> adding secondary index functionality to Cassandra myself, and it turns out >> I am doing almost exactly the same thing. I will try to change my code to >> use your implementation as well to compare results. >> >> Alexander >> >>> Alexander: >>> >>> The secondary indexes in 0.7.0 (type KEYS) are stored internally in a >>> column >>> family, and are kept synchronized with the base data via locking on a >>> local >>> node, meaning they are always consistent on the local node. Eventual >>> consistency still applies between nodes, but a returned result will >>> always >>> match your query. >>> >>> This index column family stores a mapping from index values to a sorted >>> list >>> of matching row keys. When you query for rows between x and y matching a >>> value z (via the get_indexed_slices call), Cassandra performs a lookup >>> to >>> the index column family for the slice of columns in row z between x and >>> y. >>> If any matches are found in the index, they are row keys that match the >>> index clause, and we query the base data to return you those rows. >>> >>> Iterating through all of the rows matching an index clause on your >>> cluster >>> is guaranteed to touch N/RF of the nodes in your cluster, because each >>> node >>> only knows about data that is indexed locally. >>> >>> Some portions of the indexing implementation are not fully baked yet: >>> for >>> instance, although the API allows you to specify multiple columns, only >>> one >>> index will actually be used per query, and the rest of the clauses will >>> be >>> brute forced. >>> >>> A second secondary index implementation has been on the back burner for >>> a >>> while: it provides an identical API, but does not use a column family to >>> store the index, and should be more efficient for append only data. See >>> https://issues.apache.org/jira/browse/CASSANDRA-1472 >>> >>> Thanks, >>> Stu >>> >>> On Wed, Feb 9, 2011 at 2:35 AM, wrote: >>> Thank you for the links, I did read a bit in the comments of the ticket, but I couldn't get much out of it. I am mainly interested in how the index is stored and partitioned, not how it is used. I think the people in the dev list will probably be better qualified to answer that. My questions always seem to get moved to the user list, and usually with good cause, but I think this time it should be in the dev list :) Please move it back, if you can. Alexander > AFAIK this was the ticket the original work was done under > https://issues.apache.org/jira/browse/CASSANDRA-1415 > > also http://www.datastax.com/docs/0.7/data_model/secondary_indexes > and http://pycassa.githubcom/pycassa/tutorial.html#indexes may help > > (sorry on reflection the email prob did not need to be moved from dev, my > bad) > Aaron > > On 09 Feb, 2011,at 09:16 AM, Aaron Morton wrote: > > Moving to the user group. > > > > On 08 Feb, 2011,at 11:39 PM, alta...@ceid.upatras.gr wrote: > > Hello, > > I'd like some information about how secondary indices work under the hood. 
> > 1) Is data stored in some external data structure, or is it stored in an > actual Cassandra table, as columns within column families? > 2) Is data stored sorted or not? How is it partitioned? > 3) How can I access index data? > > Thanks in a advance, > > Alexander Altanis > >>> >> >> > > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Out of control memory consumption
On 02/09/2011 11:15 AM, Huy Le wrote:
> There is already an email thread on memory issues on this list, but I am
> creating a new thread as we are experiencing a different memory consumption issue.
>
> We have a 12-server cluster. We use the random partitioner with manually generated
> server tokens. Memory usage on one server keeps growing out of control. We
> ran flush, cleared the key and row caches, and ran GC, but heap memory
> usage won't go down. The only way to get heap memory usage to go down is to
> restart Cassandra. We have to do this once a day. All other servers have
> heap memory usage less than 500MB. This issue happened on both Cassandra
> 0.6.6 and 0.6.11.

If heap usage continues to grow, an OOM will eventually be thrown. Are you experiencing OOMs on these boxes? If you are not OOMing, then what problem are you experiencing (excessive CPU use in garbage collection, for one example)?

> Our JVM info:
>
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
>
> And JVM memory allocation: -Xms3G -Xmx3G
>
> Non-heap memory usage is 138MB.
>
> Any recommendation where we should look to see why memory usage keeps growing?

Are you using standard, mmap_index_only, or mmap io? Are you using JNA?
Re: Anyone want to help out with http://wiki.apache.org/cassandra/MavenPlugin
Oh, you might have to check out and install mojo-sandbox-parent (a sibling svn url); sandbox projects are not allowed to deploy releases... the vote on dev@mojo will promote from sandbox and release in one vote. 32 h to go.

- Stephen

--- Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen

On 9 Feb 2011 16:35, "Nate McCall" wrote:
> Stephen,
> I get an error regarding a non-resolvable parent pom. Is there any
> additional local configuration or parameters that should be passed with
> the install phase?
>
> I'd be happy to look at this over the next several days as it would
> make the Hector integration testing setup and tear down much easier.
>
> -Nate
>
> On Wed, Feb 9, 2011 at 5:41 AM, Stephen Connolly wrote:
>> Until the release vote passes at mojo, you will need to do the
>> following to follow the example:
>>
>> svn co https://svn.codehaus.org/mojo/trunk/sandbox/cassandra-maven-plugin
>> cd cassandra-maven-plugin
>> mvn install
>> cd ..
>>
>> Otherwise the example should be fine. [...]
Re: Out of control memory consumption
> We have a 12-server cluster. We use the random partitioner with manually generated
> server tokens. Memory usage on one server keeps growing out of control. We
> ran flush, cleared the key and row caches, and ran GC, but heap memory
> usage won't go down. The only way to get heap memory usage to go down is to
> restart Cassandra. We have to do this once a day. All other servers have
> heap memory usage less than 500MB. This issue happened on both Cassandra
> 0.6.6 and 0.6.11.

To be clear: You are not talking about the size of the Java process in top, but the actual amount of heap used as reported by the JVM via jmx/jconsole/etc?

Is the amount of memory that you consider high the heap size just after a concurrent mark/sweep?

Are you actually seeing OOMs, or are you restarting the node pre-emptively in response to seeing heap usage go up?

> And JVM memory allocation: -Xms3G -Xmx3G

Just FYI: It is entirely expected that the JVM will be 3G (a bit higher) in size (even with standard I/O), and further that the amount of live data in the heap will approach 3G. The concurrent mark/sweep GC won't trigger until the initial occupancy reaches the limit (if modern Cassandra with default settings).

If you've got a 3 gig heap size and the other nodes stay at 500 mb, the question is why *don't* they increase in heap usage. Unless your 500 mb is the report of the actual live data set as evidenced by post-CMS heap usage.

-- / Peter Schuller
Re: Out of control memory consumption
(If you're looking at e.g. jconsole graphs a screenshot of the graph would not hurt.) -- / Peter Schuller
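One way to get at the post-CMS number without a GUI is jstat from the JDK (a general tip, not something from this thread): jstat -gcutil <pid> 5s prints old-generation occupancy (the O column) every five seconds, and the value it drops back to right after each CMS cycle approximates the live data set Peter is asking about.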
Specifying row caching on per query basis ?
Is there any way to specify, on a per-query basis (like we specify the consistency level), which rows get cached while you're reading them from a row_cache enabled CF? I believe this could lead to much more efficient use of the cache space (if you use the same data for different features/parts of your application which have different caching needs).
Re: Do supercolumns have a purpose?
On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn wrote:
> Shaun, I agree with you, but marking them as deprecated is not good enough
> for me. I can't easily stop using supercolumns. I need an upgrade path.

David, Cassandra is open source and community developed. The right thing to do is what's best for the community, which sometimes conflicts with what's best for individual users. Such strife should be minimized; it will never be eliminated. Luckily, because this is an open source, liberally licensed project, if you feel strongly about something you should feel free to add whatever features you want yourself. I'm sure other people in your situation will thank you for it.

At a minimum I think it would behoove you to re-read some of the comments here re: why super columns aren't really needed and take another look at your data model and code. I would actually be quite surprised to find a use of super columns that could not be trivially converted to normal columns. In fact, it should be possible to do it at the framework/client library layer - you probably wouldn't even need to change any application code. A sketch of that conversion follows below.

Mike

On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts wrote:
>> I'm a newbie here, but, with apologies for my presumptuousness, I think
>> you should deprecate SuperColumns. They are already distracting you, and as
>> the years go by the cost of supporting them as you add more and more
>> functionality is only likely to get worse. It would be better to concentrate
>> on making the "core" column families better (and I'm sure we can all think
>> of lots of things we'd like).
>>
>> Just dropping SuperColumns would be bad for your reputation -- and for
>> users like David who are currently using them. But if you mark them clearly
>> as deprecated and explain why and what to do instead (perhaps putting a bit
>> of effort into migration tools... or even a "virtual" layer supporting
>> arbitrary hierarchical data), then you can drop them in a few years (when
>> you get to 1.0, say), without people feeling betrayed.
>>
>> -- Shaun
>>
>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote:
>>
>> "My main point was to say that I think it is better to create tickets
>> for what you want, rather than for something else completely different that
>> would, as a by-product, give you what you want."
>>
>> Then let me say what I want: I want supercolumn families to have any
>> feature that regular column families have.
>>
>> My data model is full of supercolumns. I used them, even though I knew I
>> didn't *have to*, "because they were there", which implied to me that I was
>> supposed to use them for some good reason. Now I suspect that they will
>> gradually become less and less functional, as features are added to regular
>> column families and not supported for supercolumn families.
>>
>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne wrote:
>>> On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone wrote:
>>>> On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne wrote:
>>>>> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote:
>>>>>> The advantage would be to enable secondary indexes on supercolumn
>>>>>> families.
>>>>>
>>>>> Then I suggest opening a ticket for adding secondary indexes to
>>>>> supercolumn families and voting on it. This will be 1 or 2 orders of
>>>>> magnitude less work than getting rid of super columns internally, and
>>>>> probably a much better solution anyway.
>>>>
>>>> I realize that this is largely subjective, and on such matters code
>>>> speaks louder than words, but I don't think I agree with you on the issue
>>>> of which alternative is less work, or even which is a better solution.
>>>
>>> You are right, I put probably too much emphasis in that sentence. My main
>>> point was to say that I think it is better to create tickets for what you
>>> want, rather than for something else completely different that would, as a
>>> by-product, give you what you want.
>>> Then I suspect that *if* the only goal is to get secondary indexes on
>>> super columns, then there is a good chance this would be less work than
>>> getting rid of super columns. But to be fair, secondary indexes on super
>>> columns may not make too much sense without #598, which itself would require
>>> quite some work, so clearly I spoke a bit quickly.
>>>
>>>> If the goal is to have a hierarchical model, limiting the depth to two
>>>> seems arbitrary. Why not go all the way and allow an arbitrarily deep
>>>> hierarchy? If a more sophisticated hierarchical model is deemed unnecessary,
>>>> or impractical, allowing a depth of two seems inconsistent and unnecessary.
>>>> It's pretty trivial to overlay a hierarchical model on top of the
>>>> map-of-sorted-maps model that Cassandra implements. Ed Anuff has implemented
>>>> a custom comparator that does the job [1]. Google's Megastore has a similar
>>>> architecture and goes even further [2]. It seems to me t
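A minimal sketch of the client-layer conversion Mike describes: flatten each (supercolumn, subcolumn) pair into a single column name with a delimiter. These helpers are illustrative, not an existing library API, and assume an ASCII/UTF8 comparator and names that never contain the separator:

    SEP = ':'

    def pack(supercolumn, subcolumn):
        # ('address', 'city') -> 'address:city'
        return supercolumn + SEP + subcolumn

    def unpack(column_name):
        supercolumn, _, subcolumn = column_name.partition(SEP)
        return supercolumn, subcolumn

    def slice_bounds(supercolumn):
        # All subcolumns of one supercolumn form a contiguous slice:
        # ';' sorts immediately after ':' in ASCII, so this range covers
        # exactly the columns whose names start with 'supercolumn:'.
        return supercolumn + SEP, supercolumn + ';'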
Re: Out of control memory consumption
> If heap usage continues to grow, an OOM will eventually be thrown.
> Are you experiencing OOMs on these boxes? If you are not OOMing, then
> what problem are you experiencing (excessive CPU use in garbage collection,
> for one example)?

No OOM. The JVM is just too busy doing GC when the used heap size is big, making this node unresponsive to its peers in the cluster.

> Are you using standard, mmap_index_only, or mmap io? Are you using JNA?

We use standard disk access mode with JNA.

Huy

-- Huy Le Spring Partners, Inc. http://springpadit.com
Re: Do supercolumns have a purpose?
I still think super-columns are useful; you just need to be aware of the limitations...

Bye, Norman

2011/2/9 Mike Malone :
> On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn wrote:
>> Shaun, I agree with you, but marking them as deprecated is not good enough
>> for me. I can't easily stop using supercolumns. I need an upgrade path.
>
> David,
> Cassandra is open source and community developed. The right thing to do is
> what's best for the community, which sometimes conflicts with what's best
> for individual users. [...]
Re: Out of control memory consumption
> To be clear: You are not talking about the size of the Java process in
> top, but the actual amount of heap used as reported by the JVM via
> jmx/jconsole/etc?

Yes, it is the memory usage shown in JMX that we are talking about.

> Is the amount of memory that you consider high the heap size
> just after a concurrent mark/sweep?

Memory usage grows over time.

> Are you actually seeing OOMs, or are you restarting the node
> pre-emptively in response to seeing heap usage go up?

No OOM. We pre-emptively restart it before it becomes unresponsive due to GC.

> Just FYI: It is entirely expected that the JVM will be 3G (a bit
> higher) in size (even with standard I/O), and further that the amount
> of live data in the heap will approach 3G. The concurrent mark/sweep
> GC won't trigger until the initial occupancy reaches the limit (if
> modern Cassandra with default settings).

Our CMS settings are:

-XX:CMSInitiatingOccupancyFraction=35 \
-XX:+UseCMSInitiatingOccupancyOnly \

> If you've got a 3 gig heap size and the other nodes stay at 500 mb,
> the question is why *don't* they increase in heap usage. Unless your
> 500 mb is the report of the actual live data set as evidenced by
> post-CMS heap usage.

What's considered to be "live data"? If we clear the caches and run flush on the keyspace, shouldn't that free up memory?

Thanks!

Huy

-- Huy Le Spring Partners, Inc. http://springpadit.com
Re: Using Cassandra-cli
"help update column family"? On Wed, Feb 9, 2011 at 1:15 PM, Eranda Sooriyabandara <0704...@gmail.com> wrote: > Hi Vishan, Aron and all, > > Thanks for the help. I tried it and successfully worked for me. > But I could not find a place where mention about the attributes of some > commands. > > e.g. > update column family [with = [and = ...]]; > create keyspace [with = [and = ...]]; > (we can use comparator=UTF8Type and default_validation_class=UTF8Type as > changed attributes) > > Is there any documentaries which mentioned about those applicable attributes > in each case? > > thanks > Eranda > > P.S. I put a blog post on Cassandra-cli in > http://emsooriyabandara.blogspot.com/ please correct me if I am got it wrong > in any place > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Specifying row caching on per query basis ?
Currently there is not. On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew wrote: > Is there any way to specify on per query basis(like we specify the > Consistency level), what rows be cached while you're reading them, > from a row_cache enabled CF. I believe, this could lead to much more > efficient use of the cache space!!( if you use same data for different > features/ parts in your application which have different caching > needs). > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Out of control memory consumption
On Wed, Feb 9, 2011 at 11:04 AM, Huy Le wrote: > Memory usage grows overtime. It is relatively typical for caches to exert memory pressure over time as they fill. What are your cache settings, for how many columnfamilies, and with what sized memtables? What version of Cassandra? =Rob
Re: Specifying row caching on per query basis ?
Is this under consideration for future releases ? or being thought about!? On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis wrote: > Currently there is not. > > On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew wrote: >> Is there any way to specify on per query basis(like we specify the >> Consistency level), what rows be cached while you're reading them, >> from a row_cache enabled CF. I believe, this could lead to much more >> efficient use of the cache space!!( if you use same data for different >> features/ parts in your application which have different caching >> needs). >> > > > > -- > Jonathan Ellis > Project Chair, Apache Cassandra > co-founder of DataStax, the source for professional Cassandra support > http://www.datastax.com >
Re: Specifying row caching on per query basis ?
Not really, no. If you can't trust LRU to cache the hottest rows perhaps you should split the data into different ColumnFamilies. On Wed, Feb 9, 2011 at 1:43 PM, Ertio Lew wrote: > Is this under consideration for future releases ? or being thought about!? > > > > On Thu, Feb 10, 2011 at 12:56 AM, Jonathan Ellis wrote: >> Currently there is not. >> >> On Wed, Feb 9, 2011 at 12:04 PM, Ertio Lew wrote: >>> Is there any way to specify on per query basis(like we specify the >>> Consistency level), what rows be cached while you're reading them, >>> from a row_cache enabled CF. I believe, this could lead to much more >>> efficient use of the cache space!!( if you use same data for different >>> features/ parts in your application which have different caching >>> needs). >>> >> >> >> >> -- >> Jonathan Ellis >> Project Chair, Apache Cassandra >> co-founder of DataStax, the source for professional Cassandra support >> http://www.datastax.com >> > -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra support http://www.datastax.com
Re: Specifying row caching on per query basis ?
On Wed, Feb 9, 2011 at 2:43 PM, Ertio Lew wrote:
> Is this under consideration for future releases ? or being thought about!?
> [...]

I have mentioned a suggested implementation in this issue: https://issues.apache.org/jira/browse/CASSANDRA-2035
Exceptions on 0.7.0
I have a 4 node test cluster where I test the port to 0.7.0 from 0.6.X. On 3 out of the 4 nodes I get exceptions in the log. I am using RP. Changes that I made:

1. changed the replication factor from 3 to 4
2. configured the nodes to use the dynamic snitch
3. RR of 0.33

I ran repair on 2 nodes before I noticed the errors. One of them is having the first error and the other the second. I restarted the nodes but I still get the exceptions.

The following exception I get from 2 nodes:

WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java (line 84) Cannot provide an optimal Bloom Filter for 1986622313 elements (1/4 buckets per element).
ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
    at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
    at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
    at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:76)
    at org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)
    at org.apache.cassandra.io.LazilyCompactedRow.<init>(LazilyCompactedRow.java:88)
    at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:136)
    at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
    at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
    at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
    at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
    at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
    at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
    at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:101)
    ... 29 more

On another node I get:

ERROR [pool-1-thread-2] 2011-02-09 19:48:32,137 Cassandra.java (line 2876) Internal error processing get_range_slices
java.lang.RuntimeException: error reading 1 of 1970563183
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:82)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:39)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:108)
    at org.apache.commons.collections.iterators.CollatingIterator.anyHasNext(CollatingIterator.java:364)
    at org.apache.commons.collections.iterators.CollatingIterator.hasNext(CollatingIterat
Re: Specifying row caching on per query basis ?
Jonathan, what if the data is really homogeneous, but spread over a long period of time, and I decide that users who hit the database for the recent past should have a better ride? Splitting into a separate CF also has costs, right? In fact, if I were to go this way, do you think I could crank down the key caches? If yes, down to what level, zero? Thanks!

Jonathan Ellis-3 wrote:
> Not really, no. If you can't trust LRU to cache the hottest rows
> perhaps you should split the data into different ColumnFamilies.
> [...]
Re: read latency in cassandra
On Fri, Feb 4, 2011 at 11:13 AM, Dan Kuebrich wrote: > Is 2 seconds the normal "I went to disk" latency for cassandra? Cassandra exposes metrics on a per-CF basis which indicate latency. This includes both cache hits and misses, as well as requests for rows which do not exist. It does NOT include an assortment of other latency causing things, like thrift. If you see two seconds of latency from the perspective of your application, you should compare it to the latency numbers Cassandra reports. If you are getting timed-out exceptions, that does seem relatively likely to be the cold cache "I went to disk" case, and the Cassandra latency numbers should reflect that. =Rob
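As a general pointer (not something Rob spelled out here), the easiest place to read those per-CF numbers is nodetool: nodetool -h <host> cfstats prints a Read Latency line for each column family, and comparing that against what your application measures tells you whether the two seconds is being spent inside Cassandra or somewhere in between.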
Default Listen Port
What's the easiest way to change the port nodes listen on for comm from other nodes? It appears that the default is 8080, which collides with my tomcat server on one of our dev boxes. I tried doing something in cassandra.yaml like

listen_address: 192.1.fake.2:

but that doesn't work; it throws an exception. Also, can you not put the actual name of servers in the config, or does it always have to be the actual ip address currently? Thanks.

jt
Re: Default Listen Port
On 02/09/2011 04:00 PM, jeremy.truel...@barclayscapital.com wrote: > What's the easiest way to change the port nodes listen on for comms > from other nodes? It appears that the default is 8080, which collides > with my tomcat server on one of our dev boxes. I tried doing > something in cassandra.yaml like > > listen_address: 192.1.fake.2: > > but that doesn't work; it throws an exception. Also, can you not put > the actual name of servers in the config, or does it always have to be > the actual IP address currently? Thanks. > 8080 is used by JMX [1]. You can change that in cassandra-env.sh. Hostnames are allowed. [1] http://wiki.apache.org/cassandra/FAQ#ports
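Spelled out, the fix is a one-line edit plus a restart; a sketch assuming the stock 0.7 file layout, with 8081 as an arbitrary free port and cass1.example.com as a placeholder hostname:

    # conf/cassandra-env.sh
    JMX_PORT="8081"

    # conf/cassandra.yaml -- listen_address takes a bare host or IP, no port suffix
    listen_address: cass1.example.com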
RE: Default Listen Port
Thanks for the heads up, that worked. -Original Message- From: Chris Burroughs [mailto:chris.burrou...@gmail.com] Sent: Wednesday, February 09, 2011 4:04 PM To: user@cassandra.apache.org Cc: Truelove, Jeremy: IT (NYK) Subject: Re: Default Listen Port On 02/09/2011 04:00 PM, jeremy.truel...@barclayscapital.com wrote: > What's the easiest way to change the port nodes listen on for comms > from other nodes? It appears that the default is 8080, which collides > with my tomcat server on one of our dev boxes. I tried doing > something in cassandra.yaml like > > listen_address: 192.1.fake.2: > > but that doesn't work; it throws an exception. Also, can you not put > the actual name of servers in the config, or does it always have to be > the actual IP address currently? Thanks. > 8080 is used by JMX [1]. You can change that in cassandra-env.sh. Hostnames are allowed. [1] http://wiki.apache.org/cassandra/FAQ#ports
Re: Default Listen Port
On Wed, Feb 9, 2011 at 4:00 PM, wrote: > What's the easiest way to change the port nodes listen on for comms from > other nodes? It appears that the default is 8080, which collides with my > tomcat server on one of our dev boxes. I tried doing something in > cassandra.yaml like > > listen_address: 192.1.fake.2: > > but that doesn't work; it throws an exception. Also, can you not put the > actual name of servers in the config, or does it always have to be the actual > IP address currently? Thanks. > > jt You are having a collision on 8080, which is the default JMX port. In conf/cassandra-env.sh look for JMX_PORT="8080". 9160 is the thrift port used by clients. 7000 is the storage port (used between nodes). If you change the JMX port you have to specify it when using nodetool: 'nodetool -h localhost -p <port> ring'
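For example, if JMX_PORT was moved to 8081 as suggested above, the nodetool invocation becomes:

    nodetool -h localhost -p 8081 ring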
Re: Do supercolumns have a purpose?
On Thu, 2011-02-03 at 15:35 -0800, Mike Malone wrote: > In my dealings with the Cassandra code, super columns end up making a > mess all over the place when algorithms need to be special cased and > branch based on the column/supercolumn distinction. > > > I won't even mention what it does to the thrift interface. My observation is similar, in that they (SCFs) make the "type system" in Cassandra disjoint. This makes me doubt that moving to Avro would simplify anything for Cassandra users. It also means knock-on effects, such as no common supertype in APIs for languages like Java (so the surface area of clients like Hector blows up badly when you compare it to the HBase client). I can't wait to see how CQL fares with SCFs; a sane query language will be closed under its operations, and I doubt that can be done atm. That said, I keep finding uses for them, which is irksome; but maybe I'm being lazy when it comes to modelling, and now that secondary indexes are in, I should pretend SCFs don't exist. Bill
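For completeness, the usual "pretend SCFs don't exist" modelling trick is to fold the super column name into the column name with a separator. A rough pycassa sketch, assuming a standard (non-super) CF called UserEvents with a UTF8 comparator, and that ':' never appears in the name parts; the CF and helper names here are made up for illustration:

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    events = pycassa.ColumnFamily(pool, 'UserEvents')  # standard CF, not super

    # super-column layout: row -> super 'day' -> sub 'event' -> value
    # flattened layout:    row -> column 'day:event' -> value
    def insert_flat(row_key, day, event, value):
        events.insert(row_key, {'%s:%s' % (day, event): value})

    # reading one former super column becomes a column-name slice;
    # ';' sorts just after ':', so the slice covers the 'day:' prefix
    def get_day(row_key, day):
        return events.get(row_key, column_start=day + ':', column_finish=day + ';')

Only the name packing and unpacking changes, which is why the thread below argues the conversion can live entirely in the client library layer.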
What will happen if I try to compact with insufficient headroom?
One of my nodes is 76% full. I know that one of the CFs represents 90% of the data; the others are really minor. Can I still compact under these conditions? Will it crash and lose the data? Will it try to create one very large file out of fragments for that dominating CF? TIA
Re: ApplicationState Schema has drifted from DatabaseDescriptor
Thanks Gary. I'll keep an eye on things and see if it happens again. From reading the code I'm wondering if there is a small chance of a race condition in HintedHandoffManager.waitForSchemaAgreement(). Could the following happen? I'm a little unsure on exactly how the endpoint state is removed from the map in Gossiper.

1) Node 1 starts.
2) Gossiper calls StorageService.onAlive() when the endpoints are detected as alive.
3) HintedHandoffManager.deliverHints() adds a runnable to the HintedHandoff TP.
4) This happens several times, and node 1 gets busy delivering hints but there is only 1 thread in the thread pool.
5) Node n is removed from the cluster and the endpoint state is deleted in the Gossiper on node 1.
6) Node 1 gets around to processing the hints for node n and Gossiper.getEndpointStateForEndpoint() returns null for node n.

Thanks
Aaron

On 10 Feb, 2011, at 03:03 AM, Gary Dusbabek wrote: Aaron, It looks like you're experiencing a side-effect of CASSANDRA-2083. There was at least one place (when node B received updated schema from node A) where gossip was not being updated with the correct schema even though DatabaseDescriptor had the right version. I'm pretty sure this is what you're seeing. Gary. On Wed, Feb 9, 2011 at 00:08, Aaron Morton wrote: > I noticed this after I upgraded one node in a 0.7 cluster of 5 to the latest > stable 0.7 build "2011-02-08_20-41-25" (upgraded node was jb-cass1 below). > This is a long email; you can jump to the end and help me out by checking > something on your 0.7 cluster. > This is the value from o.a.c.gms.FailureDetector.AllEndpointStates on > jb-cass05 (192.168.114.67): > /192.168.114.63 X3:2011-02-08_20-41-25 > SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455 LOAD:2.84182972E8 > STATUS:NORMAL,0 > /192.168.114.64 SCHEMA:2f555eb0-3332-11e0-9e8d-c4f8bbf76455 > LOAD:2.84354156E8 STATUS:NORMAL,34028236692093846346337460743176821145 > /192.168.114.66 SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455 > LOAD:2.59171601E8 STATUS:NORMAL,102084710076281539039012382229530463435 > /192.168.114.65 SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455 > LOAD:2.70907168E8 STATUS:NORMAL,68056473384187692692674921486353642290 > jb08.wetafx.co.nz/192.168.114.67 > SCHEMA:075cbd1f-3316-11e0-9e8d-c4f8bbf76455 LOAD:1.155260665E9 > STATUS:NORMAL,136112946768375385385349842972707284580 > Notice the schema for nodes 63 and 64 starts with 2f55 and for 65, 66 and 67 > it starts with 075. > This is the output from pycassa calling describe_versions when connected to > both the 63 (jb-cass1) and 67 (jb-cass5) nodes > In [34]: sys.describe_schema_versions() > Out[34]: > {'2f555eb0-3332-11e0-9e8d-c4f8bbf76455': ['192.168.114.63', > '192.168.114.64', > '192.168.114.65', > '192.168.114.66', > '192.168.114.67']} > It's reporting all nodes on the 2f55 schema. The SchemaCheckVerbHandler is > getting the value from DatabaseDescriptor. The FailureDetector MBean is getting > them from Gossiper.endpointStateMap. Requests are working though, so the > CF ids must be matching up. > Commit https://github.com/apache/cassandra/commit/ecbd71f6b4bb004d26e585ca8a7e642436a5c1a4 added > code to the 0.7 branch in the HintedHandOffManager to check the schema > versions of nodes it has hints for. This is now failing on the new node as > follows...
> ERROR [HintedHandoff:1] 2011-02-09 16:11:23,559 AbstractCassandraDaemon.java (line org.apache.cassandra.service.AbstractCassandraDaemon$1.uncaughtException(AbstractCassandraDaemon.java:114)) Fatal exception in thread Thread[HintedHandoff:1,1,main]
> java.lang.RuntimeException: java.lang.RuntimeException: Could not reach schema agreement with /192.168.114.64 in 6ms
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Could not reach schema agreement with /192.168.114.64 in 6ms
>     at org.apache.cassandra.db.HintedHandOffManager.waitForSchemaAgreement(HintedHandOffManager.java:256)
>     at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:267)
>     at org.apache.cassandra.db.HintedHandOffManager.access$100(HintedHandOffManager.java:88)
>     at org.apache.cassandra.db.HintedHandOffManager$2.runMayThrow(HintedHandOffManager.java:391)
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> ... 3 more
> (the nodes can all see each other, checked with nodetool during the 60 seconds)
> If I restart one of the nodes with the 075 schema (without upgrading it) it reads
Re: ApplicationState Schema has drifted from DatabaseDescriptor
On Wed, Feb 9, 2011 at 4:31 PM, Aaron Morton wrote: > Thanks Gary. I'll keep an eye on things and see if it happens again. > > From reading the code I'm wondering if there is a small chance of a race > condition in HintedHandoffManager.waitForSchemaAgreement() . > > Could the following happen? I'm a little unsure on exactly how the endpoint > state is removed from the map in Gossiper. > > 1) node 1 starts > 2) Gossiper calls StorageService.onAlive() when the endpoints are detected > as alive. > 3) HintedHandoffManager.deliverHints() adds a runnable to the HintedHandoff > TP > 4) This happens several times, and node 1 gets busy delivering hints but > there is only 1 thread in the thread pool. > 5) Node n is removed from the cluster and the endpoint state is deleted in > the Gossiper on node 1 > 6) Node 1 gets around to processing the hints for node n and > Gossiper.getEndpointStateForEndpoint() returns null for node n > Yes, this is currently possible, but you have to decommission the node before the schema check/sleep portion of HH is over, which is unlikely in practice. It will be especially unlikely after https://issues.apache.org/jira/browse/CASSANDRA-2115. -Brandon
RE: Exceptions on 0.7.0
Out of curiosity, do you really have on the order of 1,986,622,313 elements (I believe elements=keys) in the cf? Dan From: shimi [mailto:shim...@gmail.com] Sent: February-09-11 15:06 To: user@cassandra.apache.org Subject: Exceptions on 0.7.0 I have a 4 node test cluster where I test the port to 0.7.0 from 0.6.X. On 3 out of the 4 nodes I get exceptions in the log. I am using RP. Changes that I made: 1. changed the replication factor from 3 to 4 2. configured the nodes to use the Dynamic Snitch 3. RR of 0.33. I ran repair on 2 nodes before I noticed the errors; one of them gets the first error and the other the second. I restarted the nodes but I still get the exceptions. The following exception I get from 2 nodes:

WARN [CompactionExecutor:1] 2011-02-09 19:50:51,281 BloomFilter.java (line 84) Cannot provide an optimal Bloom Filter for 1986622313 elements (1/4 buckets per element).
ERROR [CompactionExecutor:1] 2011-02-09 19:51:10,190 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.EOFException
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:105)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:34)
    at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
    at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
    at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:76)
    at org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)
    at org.apache.cassandra.io.LazilyCompactedRow.<init>(LazilyCompactedRow.java:88)
    at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:136)
    at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
    at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:42)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
    at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
    at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
    at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
    at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
    at org.apache.cassandra.utils.FBUtilities.readByteArray(FBUtilities.java:280)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:76)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:101)
    ... 29 more

On another node I get:

ERROR [pool-1-thread-2] 2011-02-09 19:48:32,137 Cassandra.java (line 2876) Internal error processing get_range_slices
java.lang.RuntimeException: error reading 1 of 1970563183
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:82)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:39)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIt
Re: unsubscribe
unsubscribe
Re: unsubscribe
Instructions are here: http://wiki.apache.org/cassandra/FAQ#unsubscribe On 10 Feb, 2011, at 02:38 PM, Chance Li wrote: unsubscribe
Re: Using Cassandra-cli
Hi all, Thanks Jonathan and Eric, you both described what I want. Now I am looking forward to playing with them. Thanks, Eranda
Re: time to live rows
AFAIK the 2nd index only works for the EQ operator. -Original Message- From: Kallin Nagelberg [mailto:kallin.nagelb...@gmail.com] Sent: 9 February 2011 3:36 To: user@cassandra.apache.org Subject: Re: time to live rows I'm thinking if this row expiry notion doesn't pan out then I might create a 'lastAccessed' column with a secondary index (I think that's right) on it. Then I can periodically run a query to find all lastAccessed columns less than a certain value and manually delete them. Sound reasonable? -Kal
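A wrinkle with that plan: since an index clause needs at least one EQ expression, the lastAccessed range has to ride along with an equality on some other indexed column, and a common workaround is a constant dummy column. A hedged pycassa sketch, where the Sessions CF, the 'bucket' dummy column and the cutoff value are all made-up names for illustration:

    import pycassa
    from pycassa.index import create_index_expression, create_index_clause, EQ, LT

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    sessions = pycassa.ColumnFamily(pool, 'Sessions')

    # 'bucket' holds the same value in every row, purely to satisfy the
    # at-least-one-EQ rule; 'lastAccessed' carries the real predicate.
    cutoff = 1297200000  # example epoch-seconds expiry horizon
    clause = create_index_clause([
        create_index_expression('bucket', 'all', EQ),
        create_index_expression('lastAccessed', cutoff, LT),
    ], count=1000)

    for key, cols in sessions.get_indexed_slices(clause):
        sessions.remove(key)  # manual expiry of stale sessions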
Re: Row Key Types
Did you set the compare_with attribute of your ColumnFamily to TimeUUIDType? -Original Message- From: Bill Speirs [mailto:bill.spe...@gmail.com] Sent: 2 February 2011 0:47 To: Cassandra Usergroup Subject: Row Key Types What is the type of a Row Key? Can you define how they are compared? I ask because I'm using TimeUUIDs as my row keys, but when I make a call to get a range of row keys (get_range in phpcassa) I have to specify the UTF8 range of '' to '----' instead of the TimeUUID range of '----' to '----'. This works, but feels wrong/inefficient... thoughts? Thanks... Bill-
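One thing worth knowing before reaching for get_range with TimeUUID keys: under RandomPartitioner, rows come back in token (hash) order rather than key order, so the start/finish keys behave more like resume cursors than a true range. A small pycassa illustration, with the CF name assumed:

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
    timeline = pycassa.ColumnFamily(pool, 'Timeline')

    # the only order-safe full scan under RandomPartitioner is open-ended;
    # rows arrive in hash order, so don't expect TimeUUID (time) ordering
    for key, columns in timeline.get_range(start='', finish=''):
        print key, columns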
RE: Do supercolumns have a purpose?
SCFs are very useful and I hope they live forever. We need them! Best regards / Pagarbiai Viktor Jevdokimov Senior Developer Email: viktor.jevdoki...@adform.com Phone: +370 5 212 3063 Fax: +370 5 261 0453 Konstitucijos pr. 23, LT-08105 Vilnius, Lithuania -Original Message- From: norman.mau...@googlemail.com [mailto:norman.mau...@googlemail.com] On Behalf Of Norman Maurer Sent: Wednesday, February 09, 2011 20:59 To: user@cassandra.apache.org Subject: Re: Do supercolumns have a purpose? I still think super-columns are useful, you just need to be aware of the limitations... Bye, Norman 2011/2/9 Mike Malone : > On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn wrote: >> >> Shaun, I agree with you, but marking them as deprecated is not good enough >> for me. I can't easily stop using supercolumns. I need an upgrade path. > > David, > Cassandra is open source and community developed. The right thing to do is > what's best for the community, which sometimes conflicts with what's best > for individual users. Such strife should be minimized, it will never be > eliminated. Luckily, because this is an open source, liberal licensed > project, if you feel strongly about something you should feel free to add > whatever features you want yourself. I'm sure other people in your situation > will thank you for it. > At a minimum I think it would behoove you to re-read some of the comments > here re: why super columns aren't really needed and take another look at > your data model and code. I would actually be quite surprised to find a use > of super columns that could not be trivially converted to normal columns. In > fact, it should be possible to do at the framework/client library layer - > you probably wouldn't even need to change any application code. > Mike >> >> On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts wrote: >>> >>> I'm a newbie here, but, with apologies for my presumptuousness, I think >>> you should deprecate SuperColumns. They are already distracting you, and as >>> the years go by the cost of supporting them as you add more and more >>> functionality is only likely to get worse. It would be better to concentrate >>> on making the "core" column families better (and I'm sure we can all think >>> of lots of things we'd like). >>> Just dropping SuperColumns would be bad for your reputation -- and for >>> users like David who are currently using them. But if you mark them clearly >>> as deprecated and explain why and what to do instead (perhaps putting a bit >>> of effort into migration tools... or even a "virtual" layer supporting >>> arbitrary hierarchical data), then you can drop them in a few years (when >>> you get to 1.0, say), without people feeling betrayed. >>> >>> -- Shaun >>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote: >>> >>> "My main point was to say that I think it is better to create tickets >>> for what you want, rather than for something else completely different that >>> would, as a by-product, give you what you want."
>>> >>> Then let me say what I want: I want supercolumn families to have any >>> feature that regular column families have. >>> >>> My data model is full of supercolumns. I used them, even though I knew it >>> didn't *have to*, "because they were there", which implied to me that I was >>> supposed to use them for some good reason. Now I suspect that they will >>> gradually become less and less functional, as features are added to regular >>> column families and not supported for supercolumn families. >>> >>> >>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne >>> wrote: On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone wrote: > > On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne > wrote: >> >> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn >> wrote: >>> >>> The advantage would be to enable secondary indexes on supercolumn >>> families. >> >> Then I suggest opening a ticket for adding secondary indexes to >> supercolumn families and voting on it. This will be 1 or 2 orders of >> magnitude less work than getting rid of super column internally, and >> probably a much better solution anyway. > > I realize that this is largely subjective, and on such matters code > speaks louder than words, but I don't think I agree with you on the issue > of > which alternative is less work, or even which is a better solution. You are right, I put probably too much emphasis in that sentence
Re: Do supercolumns have a purpose?
Mike, my problem is that I have a database and codebase that already uses supercolumns. If I had to do it over, it wouldn't use them, for the reasons you point out. In fact, I have a feeling that over time supercolumns will become deprecated de facto, if not de jure. That's why I would like to see them represented internally as regular columns, with an upgrade path for backward compatibility. I would love to do it myself! (I haven't looked at the code base, but I don't understand why it should be so hard.) But my employer has other ideas... On Wed, Feb 9, 2011 at 8:14 PM, Mike Malone wrote: > On Tue, Feb 8, 2011 at 2:03 AM, David Boxenhorn wrote: > >> Shaun, I agree with you, but marking them as deprecated is not good enough >> for me. I can't easily stop using supercolumns. I need an upgrade path. >> > > David, > > Cassandra is open source and community developed. The right thing to do is > what's best for the community, which sometimes conflicts with what's best > for individual users. Such strife should be minimized, it will never be > eliminated. Luckily, because this is an open source, liberal licensed > project, if you feel strongly about something you should feel free to add > whatever features you want yourself. I'm sure other people in your situation > will thank you for it. > > At a minimum I think it would behoove you to re-read some of the comments > here re: why super columns aren't really needed and take another look at > your data model and code. I would actually be quite surprised to find a use > of super columns that could not be trivially converted to normal columns. In > fact, it should be possible to do at the framework/client library layer - > you probably wouldn't even need to change any application code. > > Mike > > On Tue, Feb 8, 2011 at 3:53 AM, Shaun Cutts wrote: >> >>> >>> I'm a newbie here, but, with apologies for my presumptuousness, I think >>> you should deprecate SuperColumns. They are already distracting you, and as >>> the years go by the cost of supporting them as you add more and more >>> functionality is only likely to get worse. It would be better to concentrate >>> on making the "core" column families better (and I'm sure we can all think >>> of lots of things we'd like). >>> >>> Just dropping SuperColumns would be bad for your reputation -- and for >>> users like David who are currently using them. But if you mark them clearly >>> as deprecated and explain why and what to do instead (perhaps putting a bit >>> of effort into migration tools... or even a "virtual" layer supporting >>> arbitrary hierarchical data), then you can drop them in a few years (when >>> you get to 1.0, say), without people feeling betrayed. >>> >>> -- Shaun >>> >>> On Feb 6, 2011, at 3:48 AM, David Boxenhorn wrote: >>> >>> "My main point was to say that I think it is better to create tickets >>> for what you want, rather than for something else completely different that >>> would, as a by-product, give you what you want." >>> >>> Then let me say what I want: I want supercolumn families to have any >>> feature that regular column families have. >>> >>> My data model is full of supercolumns. I used them, even though I knew it >>> didn't *have to*, "because they were there", which implied to me that I was >>> supposed to use them for some good reason. Now I suspect that they will >>> gradually become less and less functional, as features are added to regular >>> column families and not supported for supercolumn families.
>>> >>> >>> On Fri, Feb 4, 2011 at 10:58 AM, Sylvain Lebresne >>> wrote: On Fri, Feb 4, 2011 at 12:35 AM, Mike Malone wrote: > On Thu, Feb 3, 2011 at 6:44 AM, Sylvain Lebresne > wrote: > >> On Thu, Feb 3, 2011 at 3:00 PM, David Boxenhorn wrote: >> >>> The advantage would be to enable secondary indexes on supercolumn >>> families. >>> >> >> Then I suggest opening a ticket for adding secondary indexes to >> supercolumn families and voting on it. This will be 1 or 2 orders of >> magnitude less work than getting rid of super column internally, and >> probably a much better solution anyway. >> > > I realize that this is largely subjective, and on such matters code > speaks louder than words, but I don't think I agree with you on the issue > of > which alternative is less work, or even which is a better solution. > You are right, I put probably too much emphasis in that sentence. My main point was to say that I think it is better to create tickets for what you want, rather than for something else completely different that would, as a by-product, give you what you want. Then I suspect that *if* the only goal is to get secondary indexes on super columns, then there is a good chance this would be less work than getting rid of super columns. But to be fair, secondary indexes on super columns may not make too much sense without #598,