Counter values are less than expected [1.0.6 - Java/Pelops]
Hi all, I have a problem with counters that I'd like to solve before going into production. When a user writes a comment on my platform I increment a counter (there is one counter per user) and I write a new column in that user's row. Everything worked fine, but yesterday I noticed that the column count of the row was different from the counter value ... In my test environment the user had 7 comments, so 7 columns and 7 as the value of his counter column. I wrote 3 comments within a few minutes: the counter value was still 7, but the column count was 10! The counter and the column are written in the same operation. I checked my application log but everything was normal. I wrote one more comment today to check, and now the counter is 8 and the column count is 11. I'm trying to get permission to read the Cassandra log (no comment), but in the meanwhile I'd like to know if anyone has faced a problem like this one ... I've read that people sometimes get counters bigger than expected due to client retries of successful operations marked as failed ... I will post the log results ... Thanks for any help. Regards, Carlo
R: Re: Counter values are less than expected [1.0.6 - Java/Pelops]
Cannot reproduce ... Written at CL QUORUM, RF = 3, cluster of 5 nodes ... I suppose it's an issue with the client, since it's not the first "strange behaviour" with CounterColumns ...

Original Message
From: aa...@thelastpickle.com
Date: 20/07/2012 11.12
Subject: Re: Counter values are less than expected [1.0.6 - Java/Pelops]

Nothing jumps out, can you reproduce the problem? If you can repro it let us know, along with the RF / CL. Good luck.

-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
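[Editor's sketch] The pattern in this thread, rendered as minimal CQL (all table and column names here are hypothetical; the original code is Thrift/Pelops). The point is that the comment insert and the counter increment are two independent mutations, and only the insert is idempotent on retry: a dropped increment leaves the counter *behind* the column count, while a retried-but-already-applied increment leaves it *ahead* -- the two symptoms discussed above.

CREATE TABLE user_comments (
    userid uuid,
    commentid timeuuid,
    body text,
    PRIMARY KEY (userid, commentid)
);

CREATE TABLE user_comment_count (
    userid uuid PRIMARY KEY,
    comments counter
);

-- issued together when a comment is posted, but they remain two separate
-- mutations; counters cannot even share a batch with regular writes:
INSERT INTO user_comments (userid, commentid, body) VALUES (?, ?, ?);
UPDATE user_comment_count SET comments = comments + 1 WHERE userid = ?;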
Generic questions over Cassandra 1.1/1.2
Hi all, I've been in production with Cassandra since version 0.6, then upgraded to 0.7 and finally to 1.0. If I look at my schema now it's "senseless" for 1.0, but many things have changed since 0.6 ... secondary indexes, counters, expiring columns and more. Now I am going to write a new application using Cassandra, so I started reading the documentation in order to "model" the new DB using all the new features and not reinvent the wheel. So let's start with a couple of questions ... sorry if they are stupid :-)

1) SCFs are deprecated, and I see that what used to be an application-level pattern (use a CF and build a row key as ROW+SC name; if you want to keep the sorting, use the OPP) has become a Cassandra concept (compound keys). Is that right? Moreover, can I avoid the OPP when using compound keys, since within a partition the data are ordered by the remaining components of the primary key? Finally, I've tried to use ORDER BY to sort data and it works -- but can I use ORDER BY together with a WHERE clause on a secondary index?

CREATE TABLE ctable (
    basekey uuid,
    extensionkey uuid,
    myvalue varchar,
    PRIMARY KEY (basekey, extensionkey)
);

SELECT * FROM ctable WHERE basekey = ? AND myvalue = ? ORDER BY extensionkey DESC LIMIT 5;

I haven't been able to do it.

2) Is Cassandra still schemaless? One thing I loved is that to create a new column I didn't have to "alter" any CF first. Trying CQL 3, I noticed that if I try to insert a new "column" not defined in the schema I get an exception.

Thanks in advance for any help. Carlo
R: Re: Generic questions over Cassandra 1.1/1.2
Aaron, first of all thanks for your precious help every time.

> Some resources for CQL 3, it may match your needs. If not you can still use Thrift through your favourite client... There have been a few articles on the DS blog http://www.datastax.com/dev/blog A talk at the conference by Eric http://www.datastax.com/events/cassandrasummit2012/presentations I did a webinar about it last month http://www.datastax.com/resources/webinars/collegecredit

I will read all the links carefully. The idea was to keep Pelops (the client I am familiar with) and include CQL 3 through the new Java driver DataStax is going to provide.

>> SELECT * FROM ctable WHERE basekey = ? AND myvalue = ? ORDER BY extensionkey DESC LIMIT 5 -- I haven't been able to do it.
> That looks ok, what was the error? What Cassandra version and what CQL version?

Cassandra 1.2 beta2 and CQL 3 -- honestly, I don't remember the exact error (only that it was about the ORDER BY) and I don't have Cassandra here to try. I will write more about this tomorrow ... however, if I avoided the WHERE on the secondary indexed column, the query was OK.

>> 2) Is Cassandra still schemaless? One thing I loved is that to create a new column I didn't have to "alter" any CF first. Trying CQL 3, I noticed that if I try to insert a new "column" not defined in the schema I get an exception.
> CQL 3 requires a schema, however altering the schema is easier. And in 1.2 it will support concurrent schema modifications. The Thrift API is still schemaless.

What does it mean that it will support "concurrent schema modifications"? (If the answer is in the links above, I will know tomorrow :) ) I imagined that only CQL required a schema. What happens in a situation like this?
1) I create a table using CQL
2) I add a new column using Thrift
3) I query for the column using CQL

One more question: is there any noticeable performance difference between Thrift and CQL 3?

Thanks, Carlo
Re: Generic questions over Cassandra 1.1/1.2
>> Aaron, first of all thanks for your precious help every time ...
> Thanks for using Cassandra since version 0.6 :)

ahahah :-)

> There are two types of CQL 3 tables, regular ones and those that use "COMPACT STORAGE". Regular CQL 3 tables are not visible to Thrift as they store some extra data that Thrift clients may not understand. COMPACT STORAGE tables are visible to Thrift for read and write, not sure about schema mods. They do not support the compound primary key.

Thanks for the answer ... The error described before is: "ORDER BY is only supported when the partition key is restricted by an EQ or an IN." But I don't see how I failed to respect that rule ... Cheers, Carlo
R: Re: Generic questions over Cassandra 1.1/1.2
From: sylv...@datastax.com

> The error message is indeed somewhat misleading and I've just committed a fix to return a better message. But at the end of the day, the limitation is that ORDER BY is just not supported with 2ndary indexes.
>
> --Sylvain

Mmm, this is not good news for the model I just designed ... however, thanks for the information. Please write it down somewhere in the DataStax documentation, because I didn't find it anywhere and I lost time trying to understand what I did wrong :-)
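[Editor's sketch] Given that limitation, the usual workaround (a minimal sketch -- the ctable_by_value name and layout are hypothetical, not from the thread) is to denormalize: move the value you were indexing into the primary key of a second table, so the WHERE clause becomes an EQ restriction on the partition key and the clustering order does the sorting:

CREATE TABLE ctable_by_value (
    basekey uuid,
    myvalue varchar,
    extensionkey uuid,
    PRIMARY KEY ((basekey, myvalue), extensionkey)
) WITH CLUSTERING ORDER BY (extensionkey DESC);

-- the problematic query becomes a plain partition slice,
-- already sorted newest-first by the clustering order:
SELECT * FROM ctable_by_value
WHERE basekey = ? AND myvalue = ?
LIMIT 5;

The cost is writing every row twice (once per table), which is the usual trade-off in Cassandra data modeling.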
Migrate a model from 0.6
Hi all, more than a year ago I wrote about migrating an old schema to a new model. Since the company had other priorities we never did it, and now I'm trying to upgrade my 0.6 data model to the newest 2.0 model. The DB contains mainly comments written by users about companies. Comments must be validated (when they come into the application they are in "pending" status, and then they can be "approved" or "rejected"). The main queries, with very intensive use (and which should perform very fast), are:

1) Get all approved comments of a company, sorted by insertion time
2) Get all approved comments of a user, sorted by insertion time
3) Get the latest X approved comments in a city with a vote higher than Y, sorted by insertion time

User/company comments are fewer than 100 in 90% of situations: in general, when dealing with user and company comments, the amount of data is a few kilobytes. Comments in a city can be more than 200,000, and that is a fast-growing number. In my old data model I had a companies table, a users table and a comments table, the last containing the comments, plus 3 more column families (company_comments/user_comments/city_comments) containing only a set of time-sorted uuid pointers into the comments table. I have no idea how many tables I should spread the data over in the new model. I've been reading lots of documentation; to make the model easier I thought of something like this ... users and companies tables as in the old model. As for comments:

CREATE TABLE comments (
    location text,
    id timeuuid,
    status text,
    companyid uuid,
    userid uuid,
    text text,
    title text,
    vote varint,
    PRIMARY KEY ((location, status, vote), id)
) WITH CLUSTERING ORDER BY (id DESC);

CREATE INDEX companyid_key ON comments (companyid);
CREATE INDEX userid_key ON comments (userid);

This model should provide, out of the box, query number 3:

SELECT * FROM comments WHERE location = 'city' AND status = 'approved' AND vote IN (3,4,5) ORDER BY id DESC LIMIT X;

But the other 2 queries go through a secondary index and are client-side intensive:

SELECT * FROM comments WHERE companyid = '123';
SELECT * FROM comments WHERE userid = '123';

These will retrieve all company/user comments, but they are 1 - not filtered by status, 2 - not sorted in any way. Considering the amounts of data described above, how would you model the platform? Thanks for any help
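[Editor's sketch] For queries 1 and 2, the alternative to secondary indexes is the table-per-query approach (table names here are hypothetical): duplicate each approved comment at write time into one table per entity, so both queries become single-partition slices that come back already filtered and sorted:

CREATE TABLE comments_by_company (
    companyid uuid,
    id timeuuid,
    userid uuid,
    title text,
    text text,
    vote varint,
    PRIMARY KEY (companyid, id)
) WITH CLUSTERING ORDER BY (id DESC);

CREATE TABLE comments_by_user (
    userid uuid,
    id timeuuid,
    companyid uuid,
    title text,
    text text,
    vote varint,
    PRIMARY KEY (userid, id)
) WITH CLUSTERING ORDER BY (id DESC);

-- insert only once a comment is approved (or delete on rejection),
-- so no status filtering is needed at read time:
SELECT * FROM comments_by_company WHERE companyid = ? LIMIT 100;
SELECT * FROM comments_by_user WHERE userid = ? LIMIT 100;

With fewer than ~100 comments per user/company these partitions stay tiny, while the 200,000-comment city partitions remain isolated in the main table.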
Clustering order and secondary index
Hi all, I'm trying to migrate my old project, born with Cassandra 0.6 and grown through 0.7/1.0, to the latest 2.0. I have an easy question for you all: do queries that use only a secondary index not respect any clustering order? Thanks
backend query of a Cassandra db
Hello, I have a working Cassandra cluster that performs very well for a high-traffic web application. Now I need to build a backend web application to query Cassandra on many non-indexed columns ... what is the best way to do that? Apache Hive? Pig? I'm on Cassandra 2. Thanks
Moving a CF between keyspaces
Hi all, for some reason I have a CF in one keyspace and I need to duplicate this CF and its content into another keyspace. Is there any best practice for doing this, or do I need to read/write all the rows? Best regards, Carlo
R: Re: AntiEntropy?
>From "Cassandra the definitive guide" - Basic Maintenance - Repair Running nodetool repair causes Cassandra to execute a Major Compaction [...] AntiEntropyService implements the Singleton pattern and defines the static Differencer class as well, which is used to compare two trees. If it finds any differences, it launches a repair for the ranges that don't agree. So, although Cassandra takes care of such matters automatically on occasion you can run it yourself as well So now I'm confused ... Cassandra doc says that I have to run it by myself, Cassandra book says I don't have to. Did I misunderstand something? >> I looked around in the code, it seems that AntiEntropy operations are >> not automatically run in the server daemon, but only >> manually invoked through nodetool, am I correct? > >Yes, and it's important that you do run repair: >http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
R: Re: Re: AntiEntropy?
> The book is wrong, at least for current versions of Cassandra (I'm
> basing that on the quote you pasted, I don't know the context).

To be sure that I didn't misunderstand (English is not my mother tongue), here is what the entire "repair" paragraph says ...

"Basic Maintenance. There are a few tasks that you'll need to perform before or after more impactful tasks. For example, it makes sense to take a snapshot only after you've performed a flush. So in this section we look at some of these basic maintenance tasks: repair, snapshot, and cleanup.

Repair. Running nodetool repair causes Cassandra to execute a major compaction. A Merkle tree of the data on the target node is computed, and the Merkle tree is compared with those of other replicas. This step makes sure that any data that might be out of sync with other nodes isn't forgotten. During a major compaction (see "Compaction" in the Glossary), the server initiates a TreeRequest/TreeResponse conversation to exchange Merkle trees with neighboring nodes. The Merkle tree is a hash representing the data in that column family. If the trees from the different nodes don't match, they have to be reconciled (or "repaired") in order to determine the latest data values they should all be set to. This tree comparison validation is the responsibility of the org.apache.cassandra.service.AntiEntropyService class. AntiEntropyService implements the Singleton pattern and defines the static Differencer class as well, which is used to compare two trees. If it finds any differences, it launches a repair for the ranges that don't agree. So although Cassandra takes care of such matters automatically on occasion, you can run it yourself as well."

> nodetool repair must be scheduled by the operator to run regularly.
> The name "repair" is a bit unfortunate; it is not meant to imply that
> it only needs to run when something is "wrong".
>
> --
> / Peter Schuller
R: Re: Re: Re: AntiEntropy?
Thanks for the confirmation, Peter. In the company I work for I have suggested many times that we run repair at least once every 10 days (GCGraceSeconds is set to approximately 10 days in our config) -- but this book has been used against me :-) I will ask to run repair asap.

> Original Message
> From: peter.schul...@infidyne.com
> Date: 13/07/2011 5.07
> To: , "cbert...@libero.it"
> Subject: Re: Re: Re: AntiEntropy?
>
>> To be sure that I didn't misunderstand (English is not my mother tongue) here
>> is what the entire "repair paragraph" says ...
>
> Read it, I maintain my position - the book is wrong or at the very
> least strongly misleading.
>
> You *definitely* need to run nodetool repair periodically for the
> reasons documented in the link I sent before, unless you have specific
> reasons not to and know what you're doing.
>
> --
> / Peter Schuller
R: Re: Re: Re: Re: AntiEntropy?
> Note that if GCGraceSeconds is 10 days, you want to run repair often
> enough that there will never be a moment where there is more than
> exactly 10 days since the last successfully completed repair
> *STARTED*.
>
> When scheduling repairs, factor in things like - what happens if
> repair fails? Who gets alerted and how, and will there be time to fix
> the problem? How long does repair take?

Peter, thanks for the tip. I'm still very surprised by what I read in the book about repair. Best Regards, Carlo
Too many open files during Repair operation
Hi all. In production we want to run nodetool repair, but each time we do we get the "too many open files" error. We've increased the number of file descriptors available to Cassandra to 8192, but we still get the same error after a few seconds. Should I increase it more?

WARN [Thread-7] 2011-07-19 12:34:00,348 CustomTThreadPoolServer.java (line 131) Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
        at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:68)
        at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:39)
        at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:121)
        at org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:155)
Caused by: java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
        at java.net.ServerSocket.implAccept(ServerSocket.java:462)
        at java.net.ServerSocket.accept(ServerSocket.java:430)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
        ... 5 more

The command is: nodetool repair keyspacename -h host

Cassandra 0.7.5, 1 cluster, 5 nodes. Each node gives the same output. One more question: when repair starts throwing this kind of exception (very fast) we stop the repair process ... is that dangerous for the data? Best Regards, Carlo
2800 file descriptors?
Hi all, I wonder if it is normal that Cassandra (5 nodes, 0.7.5) has more than 2800 file descriptors open, and growing. I still have the problem that during repair I run into "too many open files". Best regards
R: Re: 2800 file descriptors?
> For the "too many open files" issue, maybe you could try: ulimit -n 5000 > && . Ok, thanks for the tip but I get this error running nodetool repair and not during cassandra execution. I however wonder if this is normal or not ... in production do you get similar numbers? Isn't it too much? best regards
My "nodetool" in Java
Hi all, I'd like to build something like "nodetool" to show the status of the ring (nodes up/down, info on a single node), all via Java. Do you have any tips for this? (I don't want to run nodetool from Java and capture the output ...) I really have no idea how to do it ... :-)
R: Re: My "nodetool" in Java
It was easier than I thought :-) thanks

> Original Message
> From: jeremy.hanna1...@gmail.com
> Date: 20/07/2011 22.25
> Subject: Re: My "nodetool" in Java
>
> If you look at the bin/nodetool file, it's just a shell script to run org.apache.cassandra.tools.NodeCmd. You could probably call that directly from your code.
Can not repair
Hi all, I can't get repair working in my production cluster. We have been live for 6 months, but we did not perform any deletes before, so we didn't need to run repair. Now we have been live for 2 weeks with a new version of our software that performs deletes, but we cannot get nodetool repair working. The first problem I see in the log is:

ERROR 16:34:49,790 Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.io.IOException: Keys must be written in ascending order.
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:111)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:128)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:451)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

I've read that this can be related to a change of partitioner, but I've never modified it. I should add that we were online with 0.6.5 and now we are on 0.7.5: to migrate from one version to the other we followed the documentation (conversion of the yaml, nodetool drain and so on) ... The system is working even with this "problem", but repair is not. The FD limit of the user running the repair is unlimited, but every time we get a "Too many open files". I'm a little bit worried, because if some deleted data reappears the webapp will export wrong data ... Best regards, Carlo
Counters and Top 10
Hi all, I'm using Cassandra in production for a small social network (~10,000 people). Now I have to assign some "credits" to each user operation (login, write a post and so on) and then be able to provide, at any moment, the top 10 most active users. I'm on Cassandra 0.7.6 and I'd like to migrate to a new version in order to use counters for the user points, but ... what about the top 10? I was thinking of a specific row that always keeps the 10 most active users ... but I think it would be heavy (to write, and to handle in a thread-safe way) ... can counters provide something like a "value-ordered list"? Thanks for any help. Best regards, Carlo
R: Re: Counters and Top 10
Hi all, I've read all your messages concerning the top 10 ... every solution is possible, but I still haven't found the best one. Using a composite column name, as suggested, would be smart because it leads to a sorted row where I can read my top 10 at any moment, but it can slow down the whole platform since, for every operation, I have to read data from Cassandra, recalculate, and store the data back. Using counters I could just say "hey, +1 on this" and forget. But with counters I don't have any kind of value sorting ... I know Redis, but I think it's too much to adopt a new key-value DB just for this sorting ... I think I'll use a thread that runs every X to generate the top-10 row ... it won't be realtime, but at least it will keep platform performance at a good level. Thank you all and merry Christmas.

> Original Message
> From: ben...@noisette.ch
> Date: 25/12/2011 10.19
> Subject: Re: Counters and Top 10
>
> With composite column names you can even have a column composed of score (int) and userid (uuid or whatever). Empty column value to avoid repeating the user UUID.
>
> 2011/12/22 R. Verlangen:
>> I would suggest you to create a CF with a single row (or multiple for
>> historical data) with a date as key (utf8, e.g. 2011-12-22) and multiple
>> columns for every user's score. The column (utf8) would then be the score +
>> something unique of the user (e.g. hex representation of the TimeUUID). The
>> value would be the TimeUUID of the user.
>>
>> By default columns will be sorted and you can perform a slice to get the top
>> 10.
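[Editor's sketch] For reference, R. Verlangen's single-row suggestion translates roughly into this CQL 3 shape (a hedged sketch -- table and column names are hypothetical, and the thread itself predates CQL 3):

CREATE TABLE top_users (
    day text,            -- one partition per period, e.g. '2011-12-22'
    score int,
    userid timeuuid,
    PRIMARY KEY (day, score, userid)
) WITH CLUSTERING ORDER BY (score DESC, userid ASC);

-- the top 10 is a slice off the head of the partition:
SELECT score, userid FROM top_users WHERE day = '2011-12-22' LIMIT 10;

-- the awkward part mentioned above: changing a score means removing the
-- old (score, userid) entry and inserting the new one, a read-modify-write
DELETE FROM top_users WHERE day = ? AND score = ? AND userid = ?;
INSERT INTO top_users (day, score, userid) VALUES (?, ?, ?);

That read-modify-write is exactly why counters (fire-and-forget, but unsorted) plus a periodic batch job ended up being the compromise here.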
Migration from 0.7 to 1.0
Hi, I'm going to migrate from Cassandra 0.7 to 1.0 in production and I'd like to know the best way to do it ...

"Upgrading from version 0.7.1+ or 0.8.2+ can be done with a rolling restart, one node at a time. (0.8.0 or 0.8.1 are NOT network-compatible with 1.0: upgrade to the most recent 0.8 release first.) You do not need to bring down the whole cluster at once. - After upgrading, run nodetool scrub against each node before running repair, moving nodes, or adding new ones."

So what I'd do, for each node, is:

1 - run nodetool drain
2 - stop the Cassandra process
3 - start the new Cassandra 1.0
4 - run nodetool scrub on the node

Is this right? Am I missing something (I will back up everything before the upgrade)? Should I worry about any particular/known problems? As far as maintenance is concerned, is it enough to run a repair every x (x < GCGraceSeconds)? Best regards, Carlo
R: Re: Migration from 0.7 to 1.0
Aaron, first of all thanks for your great support.

> I'm paranoid, so I would upgrade 1 node and let it soak in for a few hours. Nothing like upgrading an entire cluster and then discovering a problem.

OK, but as far as my application is concerned, is it safe to keep a cluster that is part 1.0 and part 0.7? I've read that they can communicate, but will it lead to "strange" situations? Will my application keep working (Java/Pelops)?

> You can take some extra steps when doing a rolling restart, see http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/

This is what I was looking for! :-) Thanks for the repair tips ...

Best regards, Carlo

Original Message
From: aa...@thelastpickle.com
Date: 04/01/2012 22.00
Subject: Re: Migration from 0.7 to 1.0

Sounds good. You can take some extra steps when doing a rolling restart, see http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/ Also make sure repair *does not* run until all the nodes have been upgraded.

> Do I miss something (I will backup everything before the upgrade)?

I'm paranoid, so I would upgrade 1 node and let it soak in for a few hours. Nothing like upgrading an entire cluster and then discovering a problem.

> As far as maintenance is concerned, is it enough to run a repair every x? (x < GCGraceSeconds)

Once for each node within that time frame: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

Cheers
-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
Schema clone ...
Hi, I have created a new dev cluster with Cassandra 1.0 -- I would like to have the same CFs that I have in the 0.7 one, but I don't need the data to be there, just the schema. What is the fastest way to do it without issuing 30 "create column family ..." commands? Best regards, Carlo
R: Re: Schema clone ...
I was just trying it, but ... in the 0.7 CLI there is no show schema command. When I connect with the 1.0 CLI to my 0.7 cluster ...

[default@social] show schema;
null

I always get "null" as the answer! :-| Any tip for this?

ty, Cheers, Carlo

Original Message
From: aa...@thelastpickle.com
Date: 09/01/2012 11.33
To: , "cbert...@libero.it"
Subject: Re: Schema clone ...

Try show schema in the CLI.

Cheers
-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
R: Re: Schema clone ...
> * Grab the system sstables from one of the 0.7 nodes and spin up a temp 1.0 machine with them, then use the command.

Probably I'm still sleeping but I can't get what I want! :-( I've copied the SSTables of a node to my own computer, where I installed a Cassandra 1.0 just for this purpose. I've copied them into the data folder under the keyspace name:

carlo@ubpc:/store/cassandra/data/social$

so here I now have lots of files like these:

User-f-74-Data.db
User-f-74-Filter.db
User-f-74-Index.db
User-f-74-Statistics.db

But now, how do I tell Cassandra "hey, load the content of social"? Did I miss something?

Cheers, Carlo

Original Message
Subject: Re: Schema clone ...

ah, sorry brain not good work. It's only in 0.8. You could either:
* write the CLI script by hand, or
* grab the system sstables from one of the 0.7 nodes and spin up a temp 1.0 machine with them, then use the command, or
* see if your Cassandra client software can help.

Hope that helps.

-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
R: Re: AW: How to control location of data?
Each node of the ring has a unique token, representing the node's logical position in the cluster. When you perform an operation on a row, a token is computed from the row key ... the node whose token is "closest" to the row token will store the data (along with the RF-1 following nodes) -- this technique should keep data balanced across the cluster (if you use the RandomPartitioner).

Regards, Carlo

Original Message
From: andreas.rudo...@spontech-spine.com
Date: 10/01/2012 15.05
To: "user@cassandra.apache.org"
Subject: Re: AW: How to control location of data?

Hi! Thank you for your last reply. I'm still wondering if I got you right ...

> ... A partitioner decides into which partition a piece of data belongs

Does your statement imply that the partitioner does not take any decisions at all on the (physical) storage location? Or put another way: what do you mean by "partition"? To quote http://wiki.apache.org/cassandra/ArchitectureInternals: "... AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc. replicas of each key range. Primary replica is always determined by the token ring (...)"

> ... You can select different placement strategies and partitioners for different keyspaces, thereby choosing known data to be stored on known hosts. This is however discouraged for various reasons -- i.e. you need a lot of knowledge about your data to keep the cluster balanced. What is your use case for this requirement? There is probably a more suitable solution.

What we want is to partition the cluster with respect to keyspaces. That is, we want to establish an association between nodes and keyspaces so that a node of the cluster holds data from a keyspace if and only if that node is a *member* of that keyspace. To our knowledge Cassandra has no built-in way to specify such a membership relation. Therefore we thought of implementing our own replica placement strategy, until we started to assume that the partitioner had to be replaced too to accomplish the task. Do you have any ideas?

From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
Sent: Tuesday, 10 January 2012 09:53
To: user@cassandra.apache.org
Subject: How to control location of data?

Hi! We're evaluating Cassandra for our storage needs. One of the key benefits we see is the online replication of the data, that is, an easy way to share data across nodes. But we need to precisely control on which node group specific parts of a keyspace (columns/column families) are stored. Now we're having trouble understanding the documentation. Could anyone help us find answers to these questions?

- What does the term "replica" mean: if a key is stored on exactly three nodes in a cluster, is it correct to say that there are three replicas of that key, or are there just two replicas (copies) and one original?
- What is the relation between the Cassandra concepts "Partitioner" and "Replica Placement Strategy"? According to documentation found on the DataStax web site and the architecture internals page on the Cassandra wiki, the first storage location of a key (and its associated data) is determined by the "Partitioner", whereas additional storage locations are defined by the "Replica Placement Strategy". I'm wondering if I could completely redefine the way nodes are selected to store a key by just implementing my own subclass of AbstractReplicationStrategy and configuring that subclass into the keyspace.
- How can I suppress that the "Partitioner" is consulted at all to determine what node stores a key first?
- Is a keyspace always distributed across the whole cluster? Is it possible to configure Cassandra in such a way that more or less freely chosen parts of a keyspace (columns) are stored on arbitrarily chosen nodes?

Any tips would be very appreciated :-)
R: Re: Schema clone ...
I got it :-) Thanks for your patience Aaron ... the problem was that the cluster name in the yaml was different. Now it works, I've cloned the schema.

Regards, Carlo

Original Message
From: aa...@thelastpickle.com
Date: 10/01/2012 20.13
Subject: Re: Schema clone ...

> * Grab the system sstables from one of the 0.7 nodes and spin up a temp 1.0 machine with them, then use the command.

Grab the *system* tables: Migrations, Schema etc. in cassandra/data/system.

Cheers
-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
R: Cassandra is not storing values correctly.
> When I store some values in a certain column these values are only stored while Cassandra is running.

What do you mean?

> When I restart Cassandra the values that I stored are mysteriously gone.

Are you sure that your cluster is OK? And that you are not writing the columns with a TTL?

> And also the old values that I deleted before reappear.

Definitely: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

Regards, Carlo

Original Message
From: linuxispossi...@gmail.com
Date: 04/05/2012 9.57
Subject: Cassandra is not storing values correctly.

Hi everyone, I'm using Cassandra as the main storage for my PHP platform, but actually it seems that Cassandra is not working properly. When I store some values in a certain column these values are only stored while Cassandra is running. When I restart Cassandra the values that I stored are mysteriously gone. No trace. And also the old values that I deleted before reappear. Have a good day.
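[Editor's sketch] If the TTL hypothesis raised above is right, the behaviour is easy to reproduce (a minimal CQL sketch with hypothetical names -- whether the original poster's PHP client actually sets a TTL is an assumption):

CREATE TABLE settings (
    name text PRIMARY KEY,
    value text
);

INSERT INTO settings (name, value) VALUES ('theme', 'dark') USING TTL 60;

SELECT value FROM settings WHERE name = 'theme';  -- returns 'dark'
-- ... 60 seconds later the same SELECT returns no row, with no error:
SELECT value FROM settings WHERE name = 'theme';

An expired column simply vanishes, which from the application's point of view looks exactly like "Cassandra lost my data".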
CL1 and CLQ with a 5-node cluster and 3 nodes alive
Hi all, I'm experiencing some problems after 3 years of Cassandra in production (from 0.6 to 1.0.6) -- twice in 3 weeks, 2 nodes crashed with an OutOfMemory exception. In the log I can read the warning about the little heap available ... for now I'm increasing my RAM a little, my Java heap (1/4 of the RAM), and reducing the row size and memtable thresholds. Any other tips?

Now a question -- why, with 2 nodes offline, did my whole application stop providing the service, even when a Consistency Level ONE read was invoked? I'd have expected this behaviour:

- CL1 operations keep working
- more than 80% of CLQ operations keep working (the offline nodes were 2 and 5; with a clockwise key distribution, only writes to the fifth node should impact node 2)
- most of all, CLALL operations (which I don't use) failing

The situation instead was that ALL services stopped responding, throwing a TTransportException ...

Thanks in advance, Carlo
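[Editor's note] The 80% expectation can be checked by enumerating the replica sets, assuming SimpleStrategy with RF = 3 (replicas on 3 consecutive nodes clockwise -- RF = 3 is stated elsewhere in these threads) and that the dead nodes were 2 and 5:

keys primary on node 1 -> replicas {1,2,3} -> 2 alive: ONE ok, QUORUM ok
keys primary on node 2 -> replicas {2,3,4} -> 2 alive: ONE ok, QUORUM ok
keys primary on node 3 -> replicas {3,4,5} -> 2 alive: ONE ok, QUORUM ok
keys primary on node 4 -> replicas {4,5,1} -> 2 alive: ONE ok, QUORUM ok
keys primary on node 5 -> replicas {5,1,2} -> 1 alive: ONE ok, QUORUM fails

So reads at ONE should have kept working for every key, and QUORUM for 80% of the ring; a total outage points at the client or at zombie nodes rather than at the consistency levels themselves.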
R: Re: CL1 and CLQ with a 5-node cluster and 3 nodes alive
Hi Aaron, thanks for your help.

> If you have more than 500 million rows you may want to check the bloom_filter_fp_chance; the old default was 0.000744 and the new (post 1.0) number is 0.01 for size-tiered.

I really don't think I have more than 500 million rows ... any smart way to count the number of rows inside the keyspace?

>> Now a question -- why, with 2 nodes offline, did my whole application stop providing the service, even when a Consistency Level ONE read was invoked?
> What error did the client get and what client are you using? It also depends on if/how the node fails. The later versions try to shut down when there is an OOM, not sure what 1.0 does.

The exception was a TTransportException -- I am using the Pelops client.

> If the node went into a zombie state the clients may have been timing out. They should then move on to another node. If it had started shutting down the client should have gotten some immediate errors.

It didn't shut down, it was more like in a zombie state.

One more question: I'm experiencing some wrong counters (which are very important in my platform, since they are used to keep user points and generate the Top-X users) -- could it be related to this problem? The problem is that for some users (not all) the counter column increased its value. After such a crash, on 1.0, is there any best practice to follow? (nodetool or something?)

Cheers, Carlo

> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
Data disappear immediately after reading?
Hi all, I know the subject doesn't say much, but this is what I'm experiencing right now with my cluster. After some years without any problem, I'm now having problems with counters and, the most serious problem, data loss immediately after a read. I have some web services that I use to query data on Cassandra, and in the last month the following problem happened twice: I call my WS, it shows data. I refresh the page -- the data are no longer available! I can then call the WS 200 times but I won't see the data anymore ... today my colleague experienced the same problem. The WS are ABSOLUTELY read-only on the DB and there are no writes that could erase these data. Does anyone understand what is going on? I have no idea, and worse, I don't know how to fix it. Any help would really be appreciated. Kind Regards, Carlo
R: Data disappear immediately after reading?
Sorry, I forgot to mention: Apache Cassandra 1.0.7 on Ubuntu 10.04. The data that are disappearing are not counters but common rows.
Refactoring old project
Hi all, in my very old Cassandra schema (started with 0.6 -- so without secondary indexes -- and now on 1.0.6) I have a rating & review platform with about 1 million reviews. The core of the application is the review that a user can leave about a company. At the time I created many CFs: Comments, UserComments, CompanyComments, CityComments -- and I used TimeUUIDs to keep data sorted the way I needed (UserComments/CompanyComments/CityComments did not hold the real comments, just a "reference" [id] into the Comments CF). Since I need comments sorted by date, what would be the best way to write this again using Cassandra 2.0? Obviously all these CFs would merge into one. What I need is to perform queries like:

Get the latest X comments in a specific city
Get the latest X comments of a company
Get the latest X comments of a user

I can't sort client side because, even though for a user/company I can have up to 200 reviews, for a city I can have 50,000 comments and more. I know that Murmur3 is the suggested partitioner, but I wonder if this is not a case for the Order Preserving one. A row entry would be something like:

CommentID (RowKey) -- companyId -- userId -- text - vote - city

Another idea is to use a composite key made of (city, commentid), so I would have all comments sorted by city for free, and could perform client-side sorting for user/company comments. Am I missing something? TIA, Carlo
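[Editor's sketch] The composite-key idea at the end maps directly onto a CQL 3 table (hypothetical names): Murmur3 can stay as the partitioner, because the date ordering happens inside each city partition via the clustering column, not across row keys as the OPP would do:

CREATE TABLE comments_by_city (
    city text,
    commentid timeuuid,
    companyid uuid,
    userid uuid,
    text text,
    vote int,
    PRIMARY KEY (city, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);

-- "latest X comments in a specific city" is a head slice of one partition:
SELECT * FROM comments_by_city WHERE city = ? LIMIT 20;

Analogous tables keyed by companyid and by userid would cover the other two queries without any client-side sorting, at the cost of writing each comment three times.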
Online shop with Cassandra
Hi all, for an online shop owned by my company I would like to remove MySQL for everything concerning the frontend and use Cassandra instead. The site has more than a million visits each day. What you need to know is:

- Products (deals) are divided by city
- Each product can stay online for X time and sell at most Y items

CREATE TABLE deals (
    city text,
    deal_id timeuuid,
    availability int,
    deal_info text,
    PRIMARY KEY ((city, deal_id))
);

The only problem I see in this model is how to guarantee the "availability" of a deal and not "overbook" -- how do I solve the problem of "remaining items" in real time? I have many ideas for solving it in the web application, but I wonder if there is anything ready in Cassandra that might help. Kindest regards, Carlo
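[Editor's sketch] One built-in candidate, assuming Cassandra 2.0 or later is an option: lightweight transactions (conditional updates). A hedged sketch of the compare-and-set loop against the table above:

-- read the current availability (say it returns 5):
SELECT availability FROM deals WHERE city = ? AND deal_id = ?;

-- decrement only if nobody bought an item in between; Cassandra runs a
-- Paxos round and tells the client whether the update was [applied]:
UPDATE deals
SET availability = 4
WHERE city = ? AND deal_id = ?
IF availability = 5;

-- if [applied] comes back false, re-read and retry; refuse the sale at 0.

Conditional updates cost several round trips per operation, so they fit the final "reserve an item" step rather than every page view.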
Ring up but read fails ...
Hi all, I built a Java webapp using Cassandra as the DB. At startup the application creates a Cassandra pool using the Pelops client ... in my dev and test environments everything works, but in production I have some strange problems. So I built a JSP to check the status of the Cassandra DB, doing nothing more than:

try {
    // connect to the webapp pool
    // make some selects at QUORUM
    // print OK
} catch (Exception e) {
    // print KO
}

Well, this page "often" returns KO. While in dev/test it returns OK if the nodes are up (and KO if some nodes are down), in production it "often" (not always) prints KO even when nodetool ring reports all nodes "Up". Here is an extract of the webapp errors:

ERROR UserNameCmd:38 - java.net.SocketException: Broken pipe
ERROR UidCmd:61 - java.net.SocketException: Broken pipe
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
...
        at org.apache.cassandra.thrift.Cassandra$Client.send_get_slice(Cassandra.java:386)
        at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:371)

Any idea for "debugging"? :) (5 nodes, replication factor = 3) Best regards, Carlo
R: Re: Ring up but read fails ...
> I've seen this when you leave a socket open and idle for a long time. The connection times out.

It could be the situation ... any idea about the solution? I create the pool once at startup and rely on it ...

> Perhaps you use the wrong transport in Thrift. Which version of Cassandra do you use?

Cassandra 0.6.8

Best Regards
-- Carlo --
R: Re: R: Re: Ring up but read fails ...
> Reconnect and try again?

Sorry, what do you mean by "Reconnect and try again"? Do you mean shutting down the old pool and creating a new pool of connections? I don't have the possibility to handle single connections using Pelops ...

From Dominic Williams' blog: "To work with a Cassandra cluster, you need to start off by defining a connection pool. This is typically done once in the startup code of your application." [...] "One of the key design decisions that at the time of writing distinguishes Pelops, is that the data processing code written by developers does not involve connection pooling or management. Instead, classes like Mutator and Selector borrow connections to Cassandra from a Pelops pool for just the periods that they need to read and write to the underlying Thrift API. This has two advantages. Firstly, obviously, code becomes cleaner and developers are freed from connection management concerns. But also more subtly this enables the Pelops library to completely manage connection pooling itself, and for example keep track of how many outstanding operations are currently running against each cluster node. This for example, enables Pelops to perform more effective client load balancing by ensuring that new operations are performed against the node to which it currently has the least outstanding operations running. Because of this architectural choice, it will even be possible to offer strategies in the future where for example nodes are actually queried to determine their load."

TIA
-- Carlo --
Backend application for Cassandra
Hi all, I've built a web application using Cassandra. Data are stored so that they can be quickly read/sorted according to my web-app's needs. Everything is working quite well. Now the big "problem" is that the "other side" of my company needs to build reports over these data, and the queries they need would be very "heavy" in terms of client-side complexity. I'd like to know if you have any tips that may help ... I've read something about Kundera and Lucandra, but I don't know whether they could be solutions ... Have you already faced problems like this? Could you suggest any valid product/solution? I've heard (from team-mates) tips like "export all your CFs into a relational model and query that" ... and I behaved as if I didn't hear it :) TIA for any help. Best Regards, Carlo
Are row-keys sorted by the compareWith?
Hi all, I created a CF in which I need to get the rows inside sorted by time. Each row represents a comment. I created a few rows using a generated TimeUUID as the row key, but when I call the Pelops method "getColumnsFromRows" I don't get the data back as I expect: the rows are not sorted by TimeUUID. I thought it was probably because of the random part of the TimeUUID, so I created a new CF ... This time I created a few rows using the Java System.currentTimeMillis(), which returns a long. I called "getColumnsFromRows" again, and again the same result: the data are not sorted! I've read many times that rows are sorted as specified in the compareWith, but I can't see it. To work around this for the moment I've used a SuperColumnFamily with a SINGLE ROW ... but I think this is just a workaround and not the solution. Now when I call "getSuperColumnsFromRow" I get all the SuperColumns as I expected: sorted by TimeUUID. Why doesn't the same happen with rows? I'm confused. TIA for any help. Best Regards, Carlo
R: Re: Are row-keys sorted by the compareWith?
Sorry Dan, I just noticed I replied to you and not to the group! I didn't want to bother you, it was just a mistake. Best Regards, Carlo

Original Message
From: d...@reactive.org
Date: 21/02/2011 4.23
To: , "cbert...@libero.it"
Subject: Re: Are row-keys sorted by the compareWith?

Hi Carlo,
As Jonathan mentions, the compareWith on a column family definition defines the order of the columns *within* a row ... In order to control the ordering of rows you'll need to use the OrderPreservingPartitioner (http://www.datastax.com/docs/0.7/operations/clustering#tokens-partitioners-ring).

As for getColumnsFromRows: it should be returning you a map of lists. The map is insertion-order-preserving and populated based on the provided list of row keys (so if you iterate over the entries in the map they should be in the same order as the list of row keys). The list for each row entry is definitely in the order that Cassandra provides them; take a look at org.scale7.cassandra.pelops.Selector#toColumnList if you need more info.

Cheers,
Dan

--
Dan Washusen
Sent with Sparrow
I: Re: Are row-keys sorted by the compareWith?
> As Jonathan mentions, the compareWith on a column family definition defines the order of the columns *within* a row ... In order to control the ordering of rows you'll need to use the OrderPreservingPartitioner (http://www.datastax.com/docs/0.7/operations/clustering#tokens-partitioners-ring).

Thanks for your answer and for your time, I will take a look at this.

> As for getColumnsFromRows: it should be returning you a map of lists. The map is insertion-order-preserving and populated based on the provided list of row keys (so if you iterate over the entries in the map they should be in the same order as the list of row keys).

Mmm ... well, it didn't happen like this. In my code I had a CF named Comments and also a CF called UserComments. UserComments uses a uuid as row key to keep, TimeUUID-sorted, the "pointers" to the comments of the user. When I get the sorted list of keys from UserComments and use this list as the row-keys list in getColumnsFromRows, I don't get the data back sorted as I expect them to be. It looks as if Cassandra/Pelops does not care how I order the row-keys list. I am sure about that because I tried something different: I iterated over my row-keys list and made many getColumnFromRow calls instead of one getColumnsFromRows, and when I iterate that way the data are correctly sorted. But this cannot be a solution ... I am using Cassandra 0.6.9.

I'll take advantage of your knowledge of Pelops to ask you something: I am evaluating the migration to Cassandra 0.7 ... as far as you know, in terms of written code, is it a heavy job?

Best Regards, Carlo
WriteMultiColumns just writes one column ... amazing!
Hi all, I'm almost sure I'm just tired and doing something stupid, but I can't understand this problem. In one super column family I have just 2 rows, called ALL and INCREMENTAL. For some reason I sometimes need to duplicate a SuperColumn from the ALL row into the INCREMENTAL one ... very easy (Cassandra 0.7.4, Java, Pelops):

private static void mirrorizeEngineSuperColumn(Bytes superColumnId) {
    Mutator mutator = Pelops.createMutator(SocialContext.POOL_NAME_VALUE);
    Selector selector = Pelops.createSelector(SocialContext.POOL_NAME_VALUE);
    try {
        // read the supercolumn from the ALL row at QUORUM
        SuperColumn sc = selector.getSuperColumnFromRow(MotoreFamily,
                SocialColumn.MOTORE_ALL_ROW, superColumnId, ConsistencyLevel.QUORUM);
        LOG.debug("Column list size of supercolumn is " + sc.getColumnsSize());
        // write the same subcolumns (with their original timestamps)
        // under the INCREMENTAL row
        mutator.writeSubColumns(MotoreFamily, SocialColumn.MOTORE_INCREMENTALI_ROW,
                superColumnId, sc.getColumns());
        mutator.execute(ConsistencyLevel.QUORUM);
    } catch (NotFoundException nfe) {
        LOG.debug("Supercolumn not found ...");
    } catch (Exception e) {
        LOG.error(e.toString());
    }
}

When I print it, the column list size is correct (3 or 4, depending on which supercolumn I'm working on), but when I write them I find only one column of that column list ... here is the output produced (viewed with cassandra-cli) -- compare the 3 super_columns in the INCREMENTAL row and you'll see they're different from the ones in the ALL row:

RowKey: ALL
=> (super_column=54b05120-552f-11e0-9d1f-020054554e01,
     (column=54fc9c60-552f-11e0-9d1f-020054554e01, value=0003, timestamp=1300872296917000)
     (column=746595b0-553f-11e0-9e66-020054554e01, value=0002, timestamp=1300879284037000)
     (column=6ec46ef0-5540-11e0-9e66-020054554e01, value=0004, timestamp=1300879641811000)
     (column=99d911d0-5541-11e0-af7b-020054554e01, value=0001, timestamp=1300880138869000))
=> (super_column=97351e20-5545-11e0-9464-001d72d09363,
     (column=9763cf40-5545-11e0-9464-001d72d09363, value=0004, timestamp=1300881876938000)
     (column=1e5b7a40-5549-11e0-8da1-020054554e01, value=0005, timestamp=1300883402593000)
     (column=89f7c3e0-560b-11e0-a6ac-020054554e01, value=0005, timestamp=1300966880883000))
=> (super_column=cadf5940-55ed-11e0-9b97-020054554e01,
     (column=cb03aa20-55ed-11e0-9b97-020054554e01, value=0004, timestamp=1300954178721000)
     (column=27092500-5609-11e0-b1f1-020054554e01, value=0004, timestamp=1300965858839000)
     (column=5cdf88d0-560a-11e0-a6ac-020054554e01, value=0005, timestamp=1300966438198000)
     (column=c6e34110-561c-11e0-9399-020054554e01, value=0005, timestamp=1300974305208000))
=> (super_column=309d66a0-5602-11e0-9cc8-020054554e01,
     (column=30d8e900-5602-11e0-9cc8-020054554e01, value=0005, timestamp=1300963602927000)
     (column=8c8a4f40-5603-11e0-9cc8-020054554e01, value=0005, timestamp=1300963728307000)
     (column=62246620-5606-11e0-9e06-020054554e01, value=0005, timestamp=1300964702748000)
     (column=db951080-561b-11e0-8880-020054554e01, value=0003, timestamp=1300973895462000))
=> (super_column=e44f1860-560c-11e0-b696-020054554e01,
     (column=e5045ea0-560c-11e0-b696-020054554e01, value=0005, timestamp=1300967480905000))
=> (super_column=e53d7000-560c-11e0-b696-020054554e01,
     (column=e56395a0-560c-11e0-b696-020054554e01, value=0005, timestamp=1300967620609000))
=> (super_column=90ce8370-5615-11e0-b696-020054554e01,
     (column=9100de10-5615-11e0-b696-020054554e01, value=0005, timestamp=1300971213814000)
     (column=a5171450-5615-11e0-b696-020054554e01, value=0005, timestamp=1300971294115000)
     (column=9fb68390-5617-11e0-9ed9-020054554e01, value=0002, timestamp=1300972093565000)
     (column=79889ed0-561a-11e0-bf27-020054554e01, value=0002, timestamp=130097330153))
---
RowKey: INCREMENTAL
=> (super_column=cadf5940-55ed-11e0-9b97-020054554e01,
     (column=c6e34110-561c-11e0-9399-020054554e01, value=0005, timestamp=1300974305208000))
=> (super_column=309d66a0-5602-11e0-9cc8-020054554e01,
     (column=db951080-561b-11e0-8880-020054554e01, value=0003, timestamp=1300973895462000))
=> (super_column=90ce8370-5615-11e0-b696-020054554e01,
     (column=9fb68390-5617-11e0-9ed9-020054554e01, value=0002, timestamp=1300972093565000))

I think I'm going crazy! TIA for any help. Best regards, Carlo
R: WriteMultiColumns just writes one column ... amazing!
I answer myself :) But I put the answer here, maybe it will be useful to someone in the future. The problem was the timestamp: if you "copy" a column from another row but don't change its timestamp, it will only be written if a column with the same name (key) has not been deleted in the destination row after that timestamp. Best regards Carlo
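In case it helps future readers: the practical fix is to re-stamp each copied column before writing it. A minimal sketch, replacing the writeSubColumns call inside mirrorizeEngineSuperColumn above; the three-argument Thrift Column constructor and the microsecond convention are assumptions to check against your exact 0.7.x version:

import java.util.ArrayList;
import java.util.List;
import org.apache.cassandra.thrift.Column;

// Copy the sub-columns but give each a fresh timestamp, so the writes
// are not shadowed by a newer deletion of that column in the destination row.
List<Column> copies = new ArrayList<Column>();
long now = System.currentTimeMillis() * 1000L; // microseconds, as in the cli timestamps above
for (Column c : sc.getColumns()) {
    copies.add(new Column(c.name, c.value, now)); // same name and value, new timestamp
}
mutator.writeSubColumns(MotoreFamily, SocialColumn.MOTORE_INCREMENTALI_ROW,
        superColumnId, copies);
mutator.execute(ConsistencyLevel.QUORUM);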
Filter on row iterator
Hi all, I have a column family with about 300 rows. Row names fall into 2 categories: number (e.g. 12345) and e_number (e.g. e_12345). Is there any way to extract only the rows whose names are numbers? For the moment I'm iterating over all rows with a KeyRange and filtering client-side, but I don't like this solution. I've seen that a KeyRange can be created using tokens instead of keys, but I don't understand how they work and did not find any working example. (Java/Pelops/Cassandra 0.7.5) TIA Carlo
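For reference, the client-side filter described above comes down to a few lines; allRowKeys below is a hypothetical list holding the keys returned by the KeyRange iteration:

import java.util.ArrayList;
import java.util.List;

// Keep only the row keys that are plain numbers (e.g. "12345"),
// dropping the "e_"-prefixed ones (e.g. "e_12345").
List<String> numericKeys = new ArrayList<String>();
for (String key : allRowKeys) {
    if (key.matches("\\d+")) {
        numericKeys.add(key);
    }
}

Note that with RandomPartitioner the range is in token (MD5) order rather than key order, so a key range cannot express "numeric keys only" server-side anyway; some client-side filtering, or a separate "index" row listing the numeric keys, is hard to avoid.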
Sorting in Cassandra
Hi, I need some help sorting data the right way in Cassandra. I have a SuperColumnFamily like this:

UserData {
  UserID (ROW) {
    SuperColumnKey1 { firstCol: value, secondCol: value }
    SuperColumnKey2 { firstCol: value, secondCol: value }
  }
}

Both CompareWith settings (supercolumns & columns) are UTF8Type ... First question: when I get all the supercolumns within the UserID row, I'd expect to receive them sorted alphabetically ... but it does not happen (I could not figure out what the order is ...). Am I assuming wrong? Second question: can I get all the data back sorted on the firstCol column? Imagine the SC key is the ID of a company and firstCol is its name ... how can I get all the companies of a user sorted by name (alphabetical order)? I am using the Pelops client on Cassandra 0.6.5. Thanks in advance for any help.
R: Re: Sorting in Cassandra
Aaron, first of all thanks for your time.

1. You cannot return just the super columns, you have to get their sub columns as well. The returned data is ordered, please provide an example of where it is not.

I don't know what I did before, but now I checked and the data are sorted as I expected them to be :-o I know I can't get a SC without its sub columns, and this is ok.

2. Pull back the entire row and filter/sort the columns client side. It's not possible to return columns of the same name from different super columns (I think that's what you are asking). Let me know if you think you have too much data per row to do that.

Probably I explained myself badly. What I want is to get the entire row back already ordered by a specific column key, not by the SC key ... example:

UID (ROW) {
  Company0 { name: zaz, address: street x, phone: 123, other cols }
  Company1 { name: abacus, address: street y, phone: 234, other cols }
  Company2 { name: more, address: street x, phone: 345, other cols }
}

What I want is to get all the data back from Cassandra sorted by the name of the company, not by the SC key:

UID (ROW) {
  Company1 { name: abacus, address: street y, phone: 234, other cols }
  Company2 { name: more, address: street x, phone: 345, other cols }
  Company0 { name: zaz, address: street x, phone: 123, other cols }
}

As far as I know Cassandra, I don't think it's possible, since I cannot be sure that each SC contains that specific column (name), right? Is the only way to sort them client-side? Best Regards
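Since the answer is indeed client-side sorting, here is a minimal sketch of it in Java. The 0.6-era Thrift SuperColumn/Column types expose getColumns()/getName()/getValue() with byte-array names and values (an assumption to check against your version); the "name" subcolumn and the UTF-8 encoding come from the example above, and a supercolumn missing the subcolumn is simply pushed to the end:

import java.nio.charset.Charset;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.SuperColumn;

public class SuperColumnSorter {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Sort a row's supercolumns in place by the UTF-8 value of their "name" subcolumn.
    public static void sortByName(List<SuperColumn> superColumns) {
        Collections.sort(superColumns, new Comparator<SuperColumn>() {
            public int compare(SuperColumn a, SuperColumn b) {
                return subColumnValue(a, "name").compareTo(subColumnValue(b, "name"));
            }
        });
    }

    // Returns the sub-column's value, or a high sentinel if it is absent,
    // so supercolumns without a "name" end up last.
    private static String subColumnValue(SuperColumn sc, String name) {
        for (Column c : sc.getColumns()) {
            if (name.equals(new String(c.getName(), UTF8))) {
                return new String(c.getValue(), UTF8);
            }
        }
        return "\uffff";
    }
}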
Client-side sorting
Hi all, do you know of any component for client-side sorting of Cassandra structures? Things like ordering groups of SuperColumns based on the value of a subcolumn, and similar operations (ordering by AsciiType/BytesType and so on)? I'd like to avoid the DTO/VO pattern + Comparable interface. For example ... inside Cassandra:

UID (ROW) {
  Company1 { name: webcompany, address: street c, other columns }
  Company2 { name: acompany, address: street b, other columns }
  Company3 { name: thecompany, address: street a, other columns }
}

Sort AsciiType on the *name* subcolumn:

UID (ROW) {
  Company2 { name: acompany, address: street b, other columns }
  Company3 { name: thecompany, address: street a, other columns }
  Company1 { name: webcompany, address: street c, other columns }
}

Sort AsciiType on the *address* subcolumn:

UID (ROW) {
  Company3 { name: thecompany, address: street a, other columns }
  Company2 { name: acompany, address: street b, other columns }
  Company1 { name: webcompany, address: street c, other columns }
}

If one exists I'd like to avoid reinventing the wheel. Best Regards
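I'm not aware of a ready-made component for this, but a small comparator factory gets close to the generic behaviour described, without a DTO per column family. A sketch under the same Thrift-type assumptions as in the previous thread; byte-wise unsigned comparison matches how AsciiType/BytesType order values:

import java.util.Arrays;
import java.util.Comparator;

import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.SuperColumn;

public class SubColumnComparators {
    // Order SuperColumns by the raw bytes of one sub-column;
    // a missing sub-column (empty bytes) sorts first.
    public static Comparator<SuperColumn> bySubColumn(final byte[] subColumnName) {
        return new Comparator<SuperColumn>() {
            public int compare(SuperColumn a, SuperColumn b) {
                return compareBytes(value(a), value(b));
            }
            private byte[] value(SuperColumn sc) {
                for (Column c : sc.getColumns()) {
                    if (Arrays.equals(subColumnName, c.getName())) return c.getValue();
                }
                return new byte[0];
            }
        };
    }

    // Lexicographic comparison over unsigned bytes.
    private static int compareBytes(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = (x[i] & 0xff) - (y[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return x.length - y.length;
    }
}

Usage would be Collections.sort(superColumns, SubColumnComparators.bySubColumn("name".getBytes())), and the same factory covers the *address* case.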
R: Client-side sorting
> Hi all,
> do you know of any component for client-side sorting of Cassandra structures?

Sorry, I forgot: I am using Java ...
TimeUUID makes me crazy
I am going crazy using TimeUUID in Cassandra via Java. I've read the FAQ but it didn't help. Can I use a TimeUUID as a row identifier (if converted to a string)? I have a CF and a SCF like these:

TIMEUUID OPECID (ROW) {
  phone: 123
  address: street xyz
}

String USERID (ROW) {
  TIMEUUID OPECID (SuperColumnName) {
    collection of columns;
  }
}

In one situation the TimeUUID is a row identifier, while in the other it is the SuperColumn name. I get many "UUIDs must be exactly 16 bytes" errors when I try to read data that did not raise any exception during the save. At time T0 this one works:

mutator.writeColumns(UuidHelper.timeUuidFromBytes(OpecID).toString(), opecfamily, notNull); // (notNull contains a list of columns, among them opecstatus)

Immediately afterwards this one raises an exception:

selector.getColumnFromRow(UuidHelper.timeUuidFromBytes(OpecID).toString(), opecfamily, "opecstatus", ConsistencyLevel.ONE)

I hope someone can help me understand it ...
R: Re: TimeUUID makes me crazy
I am using Pelops with Cassandra 0.6.x. The error that is raised is InvalidRequestException(why: UUIDs must be exactly 16 bytes). For the UUID I am using the UuidHelper class provided.
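For what it's worth, that error usually means some name is reaching a TimeUUIDType comparator as something other than 16 raw bytes: for instance, if opecfamily's CompareWith is TimeUUIDType, the literal column name "opecstatus" (not a UUID) would be rejected on the read exactly like this. Where the 16-byte form is what's needed, the conversion from a java.util.UUID is short; a sketch (whether the result must then be wrapped in Pelops' Bytes type is an assumption to verify):

import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytes {
    // Convert a java.util.UUID into the 16-byte representation Cassandra
    // expects wherever TimeUUIDType is the comparator.
    public static byte[] toBytes(UUID uuid) {
        ByteBuffer bb = ByteBuffer.allocate(16);
        bb.putLong(uuid.getMostSignificantBits());
        bb.putLong(uuid.getLeastSignificantBits());
        return bb.array();
    }
}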
R: Indexes on Columns & SubColumns Clarification
In each family, both CF and SCF, data are grouped by rows. Just to give an idea ...

Super Column Family Name {
  Row 1 {
    SuperColumn1 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
    SuperColumn2 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
  }
  Row N {
    SuperColumn1 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
    SuperColumn2 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
    SuperColumn3 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
  }
}

Column Family Name {
  Row1 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
  RowN { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
}

Your representation looks like a SCF ...

detailed_log {            // super column family
  username {              // row
    uuid {                // supercolumn identifier
      price : 100         // column price
      min : 10            // column min
      max : 500           // column max
    }
    uuid {                // supercolumn identifier
      price : 100         // column price
      min : 10            // column min
      max : 500           // column max
    }
  }
}

detailed_log can contain from 0 to N rows, each row can contain from 0 to N SuperColumns, and each SuperColumn can contain from 0 to N columns.

> SELECT * FROM detailed_log WHERE username = 'foobar' AND uuid RANGE(start_UUID -> end_UUID);

In Pelops (the Java client I use) it would be something like getSuperColumnsFromRow, which retrieves super columns from a row given the row key, the name of the column family containing the super columns, a super column selector predicate, and the Cassandra consistency level, and returns a list of matching super columns:

List<SuperColumn> result = selector.getSuperColumnsFromRow(username, "detailed_log", Selector.newColumnsPredicateAll(false, howmany), ConsistencyLevel.ONE);

This will retrieve "howmany" SuperColumns from the row username, sorted by the sorting definition in your storage-conf. Hope this helps. Best Regards Carlo
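To honour the UUID range in the pseudo-query, rather than just taking the first "howmany" supercolumns, a slice predicate with explicit start and finish names should do it. A sketch, assuming Pelops exposes a Selector.newColumnsPredicate(Bytes, Bytes, boolean, int) overload and a Bytes.fromUuid helper (both worth verifying against your Pelops version):

// Slice the row to the supercolumns whose TimeUUID names fall between
// startUuid and endUuid, relying on the CF's TimeUUIDType ordering.
// SlicePredicate is org.apache.cassandra.thrift.SlicePredicate.
SlicePredicate range = Selector.newColumnsPredicate(
        Bytes.fromUuid(startUuid),  // start of the range
        Bytes.fromUuid(endUuid),    // end of the range
        false,                      // natural (not reversed) order
        1000);                      // upper bound on returned supercolumns
List<SuperColumn> slice = selector.getSuperColumnsFromRow(
        username, "detailed_log", range, ConsistencyLevel.ONE);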