Counter values are less than expected [1.0.6 - Java/Pelops]
Hi all, I have a problem with counters that I'd like to solve before going into production. When a user writes a comment on my platform I increment a counter (there is one counter per user) and I write a new column in that user's row. Everything worked fine, but yesterday I noticed that the column count of the row was different from the counter value ... In my test environment the user had 7 comments, so 7 columns and 7 as the value of his counter column. I wrote 3 comments within a few minutes: the counter value was still 7, but the column count was 10! The counter and the column are written in the same operation. I checked my application log but everything was normal. I wrote one more comment today to check, and now the counter is 8 and the column count is 11. I'm trying to get permission to read the Cassandra log (no comment), but in the meanwhile I'd like to know if anyone has faced a problem like this one ... I've read that people sometimes get counters bigger than expected due to client retries of successful operations marked as failed ... I will post the log results ... Thanks for any help. Regards, Carlo
R: Re: Counter values are less than expected [1.0.6 - Java/Pelops]
Cannot reproduce ... Written at CL QUORUM, RF = 3, cluster of 5 nodes ... I suppose it's an issue with the client, since it's not the first "strange behaviour" with CounterColumns ...

Original Message
From: aa...@thelastpickle.com
Date: 20/07/2012 11.12
Subject: Re: Counter values are less than expected [1.0.6 - Java/Pelops]

Nothing jumps out, can you reproduce the problem? If you can repro it let us know, along with the RF / CL. Good luck.

-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
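[Editor's sketch] The pattern in this thread, rendered as minimal CQL (all table and column names here are hypothetical; the original code is Thrift/Pelops). The point is that the comment insert and the counter increment are two independent mutations, and only the insert is idempotent on retry: a dropped increment leaves the counter *behind* the column count, while a retried-but-already-applied increment leaves it *ahead* -- the two symptoms discussed above.

CREATE TABLE user_comments (
    userid uuid,
    commentid timeuuid,
    body text,
    PRIMARY KEY (userid, commentid)
);

CREATE TABLE user_comment_count (
    userid uuid PRIMARY KEY,
    comments counter
);

-- issued together when a comment is posted, but they remain two separate
-- mutations; counters cannot even share a batch with regular writes:
INSERT INTO user_comments (userid, commentid, body) VALUES (?, ?, ?);
UPDATE user_comment_count SET comments = comments + 1 WHERE userid = ?;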
Generic questions over Cassandra 1.1/1.2
Hi all, I've been in production with Cassandra since version 0.6, then upgraded to 0.7 and finally to 1.0. If I look at my schema now it's "senseless" for 1.0, but many things have changed since 0.6 ... secondary indexes, counters, expiring columns and more. Now I am going to write a new application using Cassandra, so I started reading the documentation in order to "model" the new DB using all the new features and not reinvent the wheel. So let's start with a couple of questions ... sorry if they are stupid :-)

1) SCFs are deprecated, and I see that what used to be an application-level pattern (use a CF and build a row key as ROW+SC name; if you want to keep the sorting, use the OPP) has become a Cassandra concept (compound keys). Is that right? Moreover, can I avoid the OPP when using compound keys, since within a partition the data are ordered by the remaining components of the primary key? Finally, I've tried to use ORDER BY to sort data and it works -- but can I use ORDER BY together with a WHERE clause on a secondary index?

CREATE TABLE ctable (
    basekey uuid,
    extensionkey uuid,
    myvalue varchar,
    PRIMARY KEY (basekey, extensionkey)
);

SELECT * FROM ctable WHERE basekey = ? AND myvalue = ? ORDER BY extensionkey DESC LIMIT 5;

I haven't been able to do it.

2) Is Cassandra still schemaless? One thing I loved is that to create a new column I didn't have to "alter" any CF first. Trying CQL 3, I noticed that if I try to insert a new "column" not defined in the schema I get an exception.

Thanks in advance for any help. Carlo
R: Re: Generic questions over Cassandra 1.1/1.2
Aaron, first of all thanks for your precious help every time.

> Some resources for CQL 3, it may match your needs. If not you can still use Thrift through your favourite client... There have been a few articles on the DS blog http://www.datastax.com/dev/blog A talk at the conference by Eric http://www.datastax.com/events/cassandrasummit2012/presentations I did a webinar about it last month http://www.datastax.com/resources/webinars/collegecredit

I will read all the links carefully. The idea was to keep Pelops (the client I am familiar with) and include CQL 3 through the new Java driver DataStax is going to provide.

>> SELECT * FROM ctable WHERE basekey = ? AND myvalue = ? ORDER BY extensionkey DESC LIMIT 5 -- I haven't been able to do it.
> That looks ok, what was the error? What Cassandra version and what CQL version?

Cassandra 1.2 beta2 and CQL 3 -- honestly, I don't remember the exact error (only that it was about the ORDER BY) and I don't have Cassandra here to try. I will write more about this tomorrow ... however, if I avoided the WHERE on the secondary indexed column, the query was OK.

>> 2) Is Cassandra still schemaless? One thing I loved is that to create a new column I didn't have to "alter" any CF first. Trying CQL 3, I noticed that if I try to insert a new "column" not defined in the schema I get an exception.
> CQL 3 requires a schema, however altering the schema is easier. And in 1.2 it will support concurrent schema modifications. The Thrift API is still schemaless.

What does it mean that it will support "concurrent schema modifications"? (If the answer is in the links above, I will know tomorrow :) ) I imagined that only CQL required a schema. What happens in a situation like this?
1) I create a table using CQL
2) I add a new column using Thrift
3) I query for the column using CQL

One more question: is there any noticeable performance difference between Thrift and CQL 3?

Thanks, Carlo
Re: Generic questions over Cassandra 1.1/1.2
>> Aaron, first of all thanks for your precious help every time ...
> Thanks for using Cassandra since version 0.6 :)

ahahah :-)

> There are two types of CQL 3 tables, regular ones and those that use "COMPACT STORAGE". Regular CQL 3 tables are not visible to Thrift as they store some extra data that Thrift clients may not understand. COMPACT STORAGE tables are visible to Thrift for read and write, not sure about schema mods. They do not support the compound primary key.

Thanks for the answer ... The error described before is: "ORDER BY is only supported when the partition key is restricted by an EQ or an IN." But I don't see how I failed to respect that rule ... Cheers, Carlo
R: Re: Generic questions over Cassandra 1.1/1.2
From: sylv...@datastax.com

> The error message is indeed somewhat misleading and I've just committed a fix to return a better message. But at the end of the day, the limitation is that ORDER BY is just not supported with 2ndary indexes.
>
> --Sylvain

Mmm, this is not good news for the model I just designed ... however, thanks for the information. Please write it down somewhere in the DataStax documentation, because I didn't find it anywhere and I lost time trying to understand what I did wrong :-)
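[Editor's sketch] Given that limitation, the usual workaround (a minimal sketch -- the ctable_by_value name and layout are hypothetical, not from the thread) is to denormalize: move the value you were indexing into the primary key of a second table, so the WHERE clause becomes an EQ restriction on the partition key and the clustering order does the sorting:

CREATE TABLE ctable_by_value (
    basekey uuid,
    myvalue varchar,
    extensionkey uuid,
    PRIMARY KEY ((basekey, myvalue), extensionkey)
) WITH CLUSTERING ORDER BY (extensionkey DESC);

-- the problematic query becomes a plain partition slice,
-- already sorted newest-first by the clustering order:
SELECT * FROM ctable_by_value
WHERE basekey = ? AND myvalue = ?
LIMIT 5;

The cost is writing every row twice (once per table), which is the usual trade-off in Cassandra data modeling.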
Migrate a model from 0.6
Hi all, more than a year ago I wrote about migrating an old schema to a new model. Since the company had other priorities we never did it, and now I'm trying to upgrade my 0.6 data model to the newest 2.0 model. The DB contains mainly comments written by users about companies. Comments must be validated (when they come into the application they are in "pending" status, and then they can be "approved" or "rejected"). The main queries, with very intensive use (and which should perform very fast), are:

1) Get all approved comments of a company, sorted by insertion time
2) Get all approved comments of a user, sorted by insertion time
3) Get the latest X approved comments in a city with a vote higher than Y, sorted by insertion time

User/company comments are fewer than 100 in 90% of situations: in general, when dealing with user and company comments, the amount of data is a few kilobytes. Comments in a city can be more than 200,000, and that is a fast-growing number. In my old data model I had a companies table, a users table and a comments table, the last containing the comments, plus 3 more column families (company_comments/user_comments/city_comments) containing only a set of time-sorted uuid pointers into the comments table. I have no idea how many tables I should spread the data over in the new model. I've been reading lots of documentation; to make the model easier I thought of something like this ... users and companies tables as in the old model. As for comments:

CREATE TABLE comments (
    location text,
    id timeuuid,
    status text,
    companyid uuid,
    userid uuid,
    text text,
    title text,
    vote varint,
    PRIMARY KEY ((location, status, vote), id)
) WITH CLUSTERING ORDER BY (id DESC);

CREATE INDEX companyid_key ON comments (companyid);
CREATE INDEX userid_key ON comments (userid);

This model should provide, out of the box, query number 3:

SELECT * FROM comments WHERE location = 'city' AND status = 'approved' AND vote IN (3,4,5) ORDER BY id DESC LIMIT X;

But the other 2 queries go through a secondary index and are client-side intensive:

SELECT * FROM comments WHERE companyid = '123';
SELECT * FROM comments WHERE userid = '123';

These will retrieve all company/user comments, but they are 1 - not filtered by status, 2 - not sorted in any way. Considering the amounts of data described above, how would you model the platform? Thanks for any help
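[Editor's sketch] For queries 1 and 2, the alternative to secondary indexes is the table-per-query approach (table names here are hypothetical): duplicate each approved comment at write time into one table per entity, so both queries become single-partition slices that come back already filtered and sorted:

CREATE TABLE comments_by_company (
    companyid uuid,
    id timeuuid,
    userid uuid,
    title text,
    text text,
    vote varint,
    PRIMARY KEY (companyid, id)
) WITH CLUSTERING ORDER BY (id DESC);

CREATE TABLE comments_by_user (
    userid uuid,
    id timeuuid,
    companyid uuid,
    title text,
    text text,
    vote varint,
    PRIMARY KEY (userid, id)
) WITH CLUSTERING ORDER BY (id DESC);

-- insert only once a comment is approved (or delete on rejection),
-- so no status filtering is needed at read time:
SELECT * FROM comments_by_company WHERE companyid = ? LIMIT 100;
SELECT * FROM comments_by_user WHERE userid = ? LIMIT 100;

With fewer than ~100 comments per user/company these partitions stay tiny, while the 200,000-comment city partitions remain isolated in the main table.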
Clustering order and secondary index
Hi all, I'm trying to migrate my old project, born with Cassandra 0.6 and grown through 0.7/1.0, to the latest 2.0. I have an easy question for you all: do queries that use only a secondary index not respect any clustering order? Thanks
backend query of a Cassandra db
Hello, I have a working Cassandra cluster that performs very well for a high-traffic web application. Now I need to build a backend web application to query Cassandra on many non-indexed columns ... what is the best way to do that? Apache Hive? Pig? I'm on Cassandra 2. Thanks
Moving a CF between keyspaces
Hi all, for some reason I have a CF in one keyspace and I need to duplicate this CF and its content into another keyspace. Is there any best practice for doing this, or do I need to read/write all the rows? Best regards, Carlo
R: Re: AntiEntropy?
>From "Cassandra the definitive guide" - Basic Maintenance - Repair Running nodetool repair causes Cassandra to execute a Major Compaction [...] AntiEntropyService implements the Singleton pattern and defines the static Differencer class as well, which is used to compare two trees. If it finds any differences, it launches a repair for the ranges that don't agree. So, although Cassandra takes care of such matters automatically on occasion you can run it yourself as well So now I'm confused ... Cassandra doc says that I have to run it by myself, Cassandra book says I don't have to. Did I misunderstand something? >> I looked around in the code, it seems that AntiEntropy operations are >> not automatically run in the server daemon, but only >> manually invoked through nodetool, am I correct? > >Yes, and it's important that you do run repair: >http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
R: Re: Re: AntiEntropy?
> The book is wrong, at least for current versions of Cassandra (I'm
> basing that on the quote you pasted, I don't know the context).

To be sure that I didn't misunderstand (English is not my mother tongue), here is what the entire "repair" paragraph says ...

"Basic Maintenance. There are a few tasks that you'll need to perform before or after more impactful tasks. For example, it makes sense to take a snapshot only after you've performed a flush. So in this section we look at some of these basic maintenance tasks: repair, snapshot, and cleanup.

Repair. Running nodetool repair causes Cassandra to execute a major compaction. A Merkle tree of the data on the target node is computed, and the Merkle tree is compared with those of other replicas. This step makes sure that any data that might be out of sync with other nodes isn't forgotten. During a major compaction (see "Compaction" in the Glossary), the server initiates a TreeRequest/TreeResponse conversation to exchange Merkle trees with neighboring nodes. The Merkle tree is a hash representing the data in that column family. If the trees from the different nodes don't match, they have to be reconciled (or "repaired") in order to determine the latest data values they should all be set to. This tree comparison validation is the responsibility of the org.apache.cassandra.service.AntiEntropyService class. AntiEntropyService implements the Singleton pattern and defines the static Differencer class as well, which is used to compare two trees. If it finds any differences, it launches a repair for the ranges that don't agree. So although Cassandra takes care of such matters automatically on occasion, you can run it yourself as well."

> nodetool repair must be scheduled by the operator to run regularly.
> The name "repair" is a bit unfortunate; it is not meant to imply that
> it only needs to run when something is "wrong".
>
> --
> / Peter Schuller
R: Re: Re: Re: AntiEntropy?
Thanks for the confirmation, Peter. In the company I work for I have suggested many times that we run repair at least once every 10 days (GCGraceSeconds is set to approximately 10 days in our config) -- but this book has been used against me :-) I will ask to run repair asap.

> Original Message
> From: peter.schul...@infidyne.com
> Date: 13/07/2011 5.07
> To: , "cbert...@libero.it"
> Subject: Re: Re: Re: AntiEntropy?
>
>> To be sure that I didn't misunderstand (English is not my mother tongue) here
>> is what the entire "repair paragraph" says ...
>
> Read it, I maintain my position - the book is wrong or at the very
> least strongly misleading.
>
> You *definitely* need to run nodetool repair periodically for the
> reasons documented in the link I sent before, unless you have specific
> reasons not to and know what you're doing.
>
> --
> / Peter Schuller
R: Re: Re: Re: Re: AntiEntropy?
> Note that if GCGraceSeconds is 10 days, you want to run repair often
> enough that there will never be a moment where there is more than
> exactly 10 days since the last successfully completed repair
> *STARTED*.
>
> When scheduling repairs, factor in things like - what happens if
> repair fails? Who gets alerted and how, and will there be time to fix
> the problem? How long does repair take?

Peter, thanks for the tip. I'm still very surprised by what I read in the book about repair. Best Regards, Carlo
Too many open files during Repair operation
Hi all. In production we want to run nodetool repair, but each time we do we get the "too many open files" error. We've increased the number of file descriptors available to Cassandra to 8192, but we still get the same error after a few seconds. Should I increase it more?

WARN [Thread-7] 2011-07-19 12:34:00,348 CustomTThreadPoolServer.java (line 131) Transport error occurred during acceptance of message.
org.apache.thrift.transport.TTransportException: java.net.SocketException: Too many open files
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:124)
        at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:68)
        at org.apache.cassandra.thrift.TCustomServerSocket.acceptImpl(TCustomServerSocket.java:39)
        at org.apache.thrift.transport.TServerTransport.accept(TServerTransport.java:31)
        at org.apache.cassandra.thrift.CustomTThreadPoolServer.serve(CustomTThreadPoolServer.java:121)
        at org.apache.cassandra.thrift.CassandraDaemon$ThriftServer.run(CassandraDaemon.java:155)
Caused by: java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:408)
        at java.net.ServerSocket.implAccept(ServerSocket.java:462)
        at java.net.ServerSocket.accept(ServerSocket.java:430)
        at org.apache.thrift.transport.TServerSocket.acceptImpl(TServerSocket.java:119)
        ... 5 more

The command is: nodetool repair keyspacename -h host

Cassandra 0.7.5, 1 cluster, 5 nodes. Each node gives the same output. One more question: when repair starts throwing this kind of exception (very fast) we stop the repair process ... is that dangerous for the data? Best Regards, Carlo
2800 file descriptors?
Hi all, I wonder if it is normal that Cassandra (5 nodes, 0.7.5) has more than 2800 file descriptors open, and growing. I still have the problem that during repair I run into "too many open files". Best regards
R: Re: 2800 file descriptors?
> For the "too many open files" issue, maybe you could try: ulimit -n 5000 > && . Ok, thanks for the tip but I get this error running nodetool repair and not during cassandra execution. I however wonder if this is normal or not ... in production do you get similar numbers? Isn't it too much? best regards
My "nodetool" in Java
Hi all, I'd like to build something like "nodetool" to show the status of the ring (nodes up/down, info on a single node), all via Java. Do you have any tips for this? (I don't want to run nodetool from Java and capture the output ...) I really have no idea how to do it ... :-)
R: Re: My "nodetool" in Java
It was easier than I thought :-) thanks

> Original Message
> From: jeremy.hanna1...@gmail.com
> Date: 20/07/2011 22.25
> Subject: Re: My "nodetool" in Java
>
> If you look at the bin/nodetool file, it's just a shell script to run org.apache.cassandra.tools.NodeCmd. You could probably call that directly from your code.
Can not repair
Hi all, I can't get repair working in my production cluster. We have been live for 6 months, but we did not perform any deletes before, so we didn't need to run repair. Now we have been live for 2 weeks with a new version of our software that performs deletes, but we cannot get nodetool repair working. The first problem I see in the log is:

ERROR 16:34:49,790 Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.io.IOException: Keys must be written in ascending order.
        at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:111)
        at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:128)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:451)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

I've read that this can be related to a change of partitioner, but I've never modified it. I should add that we were online with 0.6.5 and now we are on 0.7.5: to migrate from one version to the other we followed the documentation (conversion of the yaml, nodetool drain and so on) ... The system is working even with this "problem", but repair is not. The FD limit of the user running the repair is unlimited, but every time we get a "Too many open files". I'm a little bit worried, because if some deleted data reappears the webapp will export wrong data ... Best regards, Carlo
Counters and Top 10
Hi all, I'm using Cassandra in production for a small social network (~10,000 people). Now I have to assign some "credits" to each user operation (login, write a post and so on) and then be able to provide, at any moment, the top 10 most active users. I'm on Cassandra 0.7.6 and I'd like to migrate to a new version in order to use counters for the user points, but ... what about the top 10? I was thinking of a specific row that always keeps the 10 most active users ... but I think it would be heavy (to write, and to handle in a thread-safe way) ... can counters provide something like a "value-ordered list"? Thanks for any help. Best regards, Carlo
R: Re: Counters and Top 10
Hi all, I've read all your messages concerning the top 10 ... every solution is possible, but I still haven't found the best one. Using a composite column name, as suggested, would be smart because it leads to a sorted row where I can read my top 10 at any moment, but it can slow down the whole platform since, for every operation, I have to read data from Cassandra, recalculate, and store the data back. Using counters I could just say "hey, +1 on this" and forget. But with counters I don't have any kind of value sorting ... I know Redis, but I think it's too much to adopt a new key-value DB just for this sorting ... I think I'll use a thread that runs every X to generate the top-10 row ... it won't be realtime, but at least it will keep platform performance at a good level. Thank you all and merry Christmas.

> Original Message
> From: ben...@noisette.ch
> Date: 25/12/2011 10.19
> Subject: Re: Counters and Top 10
>
> With composite column names you can even have a column composed of score (int) and userid (uuid or whatever). Empty column value to avoid repeating the user UUID.
>
> 2011/12/22 R. Verlangen:
>> I would suggest you to create a CF with a single row (or multiple for
>> historical data) with a date as key (utf8, e.g. 2011-12-22) and multiple
>> columns for every user's score. The column (utf8) would then be the score +
>> something unique of the user (e.g. hex representation of the TimeUUID). The
>> value would be the TimeUUID of the user.
>>
>> By default columns will be sorted and you can perform a slice to get the top
>> 10.
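[Editor's sketch] For reference, R. Verlangen's single-row suggestion translates roughly into this CQL 3 shape (a hedged sketch -- table and column names are hypothetical, and the thread itself predates CQL 3):

CREATE TABLE top_users (
    day text,            -- one partition per period, e.g. '2011-12-22'
    score int,
    userid timeuuid,
    PRIMARY KEY (day, score, userid)
) WITH CLUSTERING ORDER BY (score DESC, userid ASC);

-- the top 10 is a slice off the head of the partition:
SELECT score, userid FROM top_users WHERE day = '2011-12-22' LIMIT 10;

-- the awkward part mentioned above: changing a score means removing the
-- old (score, userid) entry and inserting the new one, a read-modify-write
DELETE FROM top_users WHERE day = ? AND score = ? AND userid = ?;
INSERT INTO top_users (day, score, userid) VALUES (?, ?, ?);

That read-modify-write is exactly why counters (fire-and-forget, but unsorted) plus a periodic batch job ended up being the compromise here.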
Migration from 0.7 to 1.0
Hi, I'm going to migrate from Cassandra 0.7 to 1.0 in production and I'd like to know the best way to do it ...

"Upgrading from version 0.7.1+ or 0.8.2+ can be done with a rolling restart, one node at a time. (0.8.0 or 0.8.1 are NOT network-compatible with 1.0: upgrade to the most recent 0.8 release first.) You do not need to bring down the whole cluster at once. - After upgrading, run nodetool scrub against each node before running repair, moving nodes, or adding new ones."

So what I'd do, for each node, is:

1 - run nodetool drain
2 - stop the Cassandra process
3 - start the new Cassandra 1.0
4 - run nodetool scrub on the node

Is this right? Am I missing something (I will back up everything before the upgrade)? Should I worry about any particular/known problems? As far as maintenance is concerned, is it enough to run a repair every x (x < GCGraceSeconds)? Best regards, Carlo
R: Re: Migration from 0.7 to 1.0
Aaron, first of all thanks for your great support.

> I'm paranoid, so I would upgrade 1 node and let it soak in for a few hours. Nothing like upgrading an entire cluster and then discovering a problem.

OK, but as far as my application is concerned, is it safe to keep a cluster that is part 1.0 and part 0.7? I've read that they can communicate, but will it lead to "strange" situations? Will my application keep working (Java/Pelops)?

> You can take some extra steps when doing a rolling restart, see http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/

This is what I was looking for! :-) Thanks for the repair tips ...

Best regards, Carlo

Original Message
From: aa...@thelastpickle.com
Date: 04/01/2012 22.00
Subject: Re: Migration from 0.7 to 1.0

Sounds good. You can take some extra steps when doing a rolling restart, see http://blog.milford.io/2011/11/rolling-upgrades-for-cassandra/ Also make sure repair *does not* run until all the nodes have been upgraded.

> Do I miss something (I will backup everything before the upgrade)?

I'm paranoid, so I would upgrade 1 node and let it soak in for a few hours. Nothing like upgrading an entire cluster and then discovering a problem.

> As far as maintenance is concerned, is it enough to run a repair every x? (x < GCGraceSeconds)

Once for each node within that time frame: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

Cheers
-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
Schema clone ...
Hi, I have created a new dev cluster with Cassandra 1.0 -- I would like to have the same CFs that I have in the 0.7 one, but I don't need the data to be there, just the schema. What is the fastest way to do it without issuing 30 "create column family ..." commands? Best regards, Carlo
R: Re: Schema clone ...
I was just trying it, but ... in the 0.7 CLI there is no show schema command. When I connect with the 1.0 CLI to my 0.7 cluster ...

[default@social] show schema;
null

I always get "null" as the answer! :-| Any tip for this?

ty, Cheers, Carlo

Original Message
From: aa...@thelastpickle.com
Date: 09/01/2012 11.33
To: , "cbert...@libero.it"
Subject: Re: Schema clone ...

Try show schema in the CLI.

Cheers
-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
R: Re: Schema clone ...
> * Grab the system sstables from one of the 0.7 nodes and spin up a temp 1.0 machine with them, then use the command.

Probably I'm still sleeping but I can't get what I want! :-( I've copied the SSTables of a node to my own computer, where I installed a Cassandra 1.0 just for this purpose. I've copied them into the data folder under the keyspace name:

carlo@ubpc:/store/cassandra/data/social$

so here I now have lots of files like these:

User-f-74-Data.db
User-f-74-Filter.db
User-f-74-Index.db
User-f-74-Statistics.db

But now, how do I tell Cassandra "hey, load the content of social"? Did I miss something?

Cheers, Carlo

Original Message
Subject: Re: Schema clone ...

ah, sorry brain not good work. It's only in 0.8. You could either:
* write the CLI script by hand, or
* grab the system sstables from one of the 0.7 nodes and spin up a temp 1.0 machine with them, then use the command, or
* see if your Cassandra client software can help.

Hope that helps.

-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
R: Re: AW: How to control location of data?
Each node of the ring has a unique token, representing the node's logical position in the cluster. When you perform an operation on a row, a token is computed from the row key ... the node whose token is "closest" to the row token will store the data (along with the RF-1 following nodes) -- this technique should keep data balanced across the cluster (if you use the RandomPartitioner).

Regards, Carlo

Original Message
From: andreas.rudo...@spontech-spine.com
Date: 10/01/2012 15.05
To: "user@cassandra.apache.org"
Subject: Re: AW: How to control location of data?

Hi! Thank you for your last reply. I'm still wondering if I got you right ...

> ... A partitioner decides into which partition a piece of data belongs

Does your statement imply that the partitioner does not take any decisions at all on the (physical) storage location? Or put another way: what do you mean by "partition"? To quote http://wiki.apache.org/cassandra/ArchitectureInternals: "... AbstractReplicationStrategy controls what nodes get secondary, tertiary, etc. replicas of each key range. Primary replica is always determined by the token ring (...)"

> ... You can select different placement strategies and partitioners for different keyspaces, thereby choosing known data to be stored on known hosts. This is however discouraged for various reasons -- i.e. you need a lot of knowledge about your data to keep the cluster balanced. What is your use case for this requirement? There is probably a more suitable solution.

What we want is to partition the cluster with respect to keyspaces. That is, we want to establish an association between nodes and keyspaces so that a node of the cluster holds data from a keyspace if and only if that node is a *member* of that keyspace. To our knowledge Cassandra has no built-in way to specify such a membership relation. Therefore we thought of implementing our own replica placement strategy, until we started to assume that the partitioner had to be replaced too to accomplish the task. Do you have any ideas?

From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
Sent: Tuesday, 10 January 2012 09:53
To: user@cassandra.apache.org
Subject: How to control location of data?

Hi! We're evaluating Cassandra for our storage needs. One of the key benefits we see is the online replication of the data, that is, an easy way to share data across nodes. But we need to precisely control on which node group specific parts of a keyspace (columns/column families) are stored. Now we're having trouble understanding the documentation. Could anyone help us find answers to these questions?

- What does the term "replica" mean: if a key is stored on exactly three nodes in a cluster, is it correct to say that there are three replicas of that key, or are there just two replicas (copies) and one original?
- What is the relation between the Cassandra concepts "Partitioner" and "Replica Placement Strategy"? According to documentation found on the DataStax web site and the architecture internals page on the Cassandra wiki, the first storage location of a key (and its associated data) is determined by the "Partitioner", whereas additional storage locations are defined by the "Replica Placement Strategy". I'm wondering if I could completely redefine the way nodes are selected to store a key by just implementing my own subclass of AbstractReplicationStrategy and configuring that subclass into the keyspace.
- How can I suppress that the "Partitioner" is consulted at all to determine what node stores a key first?
- Is a keyspace always distributed across the whole cluster? Is it possible to configure Cassandra in such a way that more or less freely chosen parts of a keyspace (columns) are stored on arbitrarily chosen nodes?

Any tips would be very appreciated :-)
R: Re: Schema clone ...
I got it :-) Thanks for your patience Aaron ... the problem was that the cluster name in the yaml was different. Now it works, I've cloned the schema.

Regards, Carlo

Original Message
From: aa...@thelastpickle.com
Date: 10/01/2012 20.13
Subject: Re: Schema clone ...

> * Grab the system sstables from one of the 0.7 nodes and spin up a temp 1.0 machine with them, then use the command.

Grab the *system* tables: Migrations, Schema etc. in cassandra/data/system.

Cheers
-Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
R: Cassandra is not storing values correctly.
> When I store some values in a certain column these values are only stored while Cassandra is running.

What do you mean?

> When I restart Cassandra the values that I stored are mysteriously gone.

Are you sure that your cluster is OK? And that you are not writing the columns with a TTL?

> And also the old values that I deleted before reappear.

Definitely: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair

Regards, Carlo

Original Message
From: linuxispossi...@gmail.com
Date: 04/05/2012 9.57
Subject: Cassandra is not storing values correctly.

Hi everyone, I'm using Cassandra as the main storage for my PHP platform, but actually it seems that Cassandra is not working properly. When I store some values in a certain column these values are only stored while Cassandra is running. When I restart Cassandra the values that I stored are mysteriously gone. No trace. And also the old values that I deleted before reappear. Have a good day.
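[Editor's sketch] If the TTL hypothesis raised above is right, the behaviour is easy to reproduce (a minimal CQL sketch with hypothetical names -- whether the original poster's PHP client actually sets a TTL is an assumption):

CREATE TABLE settings (
    name text PRIMARY KEY,
    value text
);

INSERT INTO settings (name, value) VALUES ('theme', 'dark') USING TTL 60;

SELECT value FROM settings WHERE name = 'theme';  -- returns 'dark'
-- ... 60 seconds later the same SELECT returns no row, with no error:
SELECT value FROM settings WHERE name = 'theme';

An expired column simply vanishes, which from the application's point of view looks exactly like "Cassandra lost my data".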
CL1 and CLQ with a 5-node cluster and 3 nodes alive
Hi all, I'm experiencing some problems after 3 years of Cassandra in production (from 0.6 to 1.0.6) -- twice in 3 weeks, 2 nodes crashed with an OutOfMemory exception. In the log I can read the warning about the little heap available ... for now I'm increasing my RAM a little, my Java heap (1/4 of the RAM), and reducing the row size and memtable thresholds. Any other tips?

Now a question -- why, with 2 nodes offline, did my whole application stop providing the service, even when a Consistency Level ONE read was invoked? I'd have expected this behaviour:

- CL1 operations keep working
- more than 80% of CLQ operations keep working (the offline nodes were 2 and 5; with a clockwise key distribution, only writes to the fifth node should impact node 2)
- most of all, CLALL operations (which I don't use) failing

The situation instead was that ALL services stopped responding, throwing a TTransportException ...

Thanks in advance, Carlo
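[Editor's note] The 80% expectation can be checked by enumerating the replica sets, assuming SimpleStrategy with RF = 3 (replicas on 3 consecutive nodes clockwise -- RF = 3 is stated elsewhere in these threads) and that the dead nodes were 2 and 5:

keys primary on node 1 -> replicas {1,2,3} -> 2 alive: ONE ok, QUORUM ok
keys primary on node 2 -> replicas {2,3,4} -> 2 alive: ONE ok, QUORUM ok
keys primary on node 3 -> replicas {3,4,5} -> 2 alive: ONE ok, QUORUM ok
keys primary on node 4 -> replicas {4,5,1} -> 2 alive: ONE ok, QUORUM ok
keys primary on node 5 -> replicas {5,1,2} -> 1 alive: ONE ok, QUORUM fails

So reads at ONE should have kept working for every key, and QUORUM for 80% of the ring; a total outage points at the client or at zombie nodes rather than at the consistency levels themselves.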
R: Re: CL1 and CLQ with a 5-node cluster and 3 nodes alive
Hi Aaron, thanks for your help.

> If you have more than 500 million rows you may want to check the bloom_filter_fp_chance; the old default was 0.000744 and the new (post 1.0) number is 0.01 for size-tiered.

I really don't think I have more than 500 million rows ... any smart way to count the number of rows inside the keyspace?

>> Now a question -- why, with 2 nodes offline, did my whole application stop providing the service, even when a Consistency Level ONE read was invoked?
> What error did the client get and what client are you using? It also depends on if/how the node fails. The later versions try to shut down when there is an OOM, not sure what 1.0 does.

The exception was a TTransportException -- I am using the Pelops client.

> If the node went into a zombie state the clients may have been timing out. They should then move on to another node. If it had started shutting down the client should have gotten some immediate errors.

It didn't shut down, it was more like in a zombie state.

One more question: I'm experiencing some wrong counters (which are very important in my platform, since they are used to keep user points and generate the Top-X users) -- could it be related to this problem? The problem is that for some users (not all) the counter column increased its value. After such a crash, on 1.0, is there any best practice to follow? (nodetool or something?)

Cheers, Carlo

> Cheers
>
> -
> Aaron Morton
> Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
Data disappear immediately after reading?
Hi all, I know the subject doesn't say much, but this is what I'm experiencing right now with my cluster. After some years without any problem, I'm now having problems with counters and, the most serious problem, data loss immediately after a read. I have some web services that I use to query data on Cassandra, and in the last month the following problem happened twice: I call my WS, it shows data. I refresh the page -- the data are no longer available! I can then call the WS 200 times but I won't see the data anymore ... today my colleague experienced the same problem. The WS are ABSOLUTELY read-only on the DB and there are no writes that could erase these data. Does anyone understand what is going on? I have no idea, and worse, I don't know how to fix it. Any help would really be appreciated. Kind Regards, Carlo
R: Data disappear immediately after reading?
Sorry, I forgot to mention: Apache Cassandra 1.0.7 on Ubuntu 10.04. The data that are disappearing are not counters but common rows.
Refactoring old project
Hi all, in my very old Cassandra schema (started with 0.6 -- so without secondary indexes -- and now on 1.0.6) I have a rating & review platform with about 1 million reviews. The core of the application is the review that a user can leave about a company. At the time I created many CFs: Comments, UserComments, CompanyComments, CityComments -- and I used TimeUUIDs to keep data sorted the way I needed (UserComments/CompanyComments/CityComments did not hold the real comments, just a "reference" [id] into the Comments CF). Since I need comments sorted by date, what would be the best way to write this again using Cassandra 2.0? Obviously all these CFs would merge into one. What I need is to perform queries like:

Get the latest X comments in a specific city
Get the latest X comments of a company
Get the latest X comments of a user

I can't sort client side because, even though for a user/company I can have up to 200 reviews, for a city I can have 50,000 comments and more. I know that Murmur3 is the suggested partitioner, but I wonder if this is not a case for the Order Preserving one. A row entry would be something like:

CommentID (RowKey) -- companyId -- userId -- text - vote - city

Another idea is to use a composite key made of (city, commentid), so I would have all comments sorted by city for free, and could perform client-side sorting for user/company comments. Am I missing something? TIA, Carlo
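[Editor's sketch] The composite-key idea at the end maps directly onto a CQL 3 table (hypothetical names): Murmur3 can stay as the partitioner, because the date ordering happens inside each city partition via the clustering column, not across row keys as the OPP would do:

CREATE TABLE comments_by_city (
    city text,
    commentid timeuuid,
    companyid uuid,
    userid uuid,
    text text,
    vote int,
    PRIMARY KEY (city, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);

-- "latest X comments in a specific city" is a head slice of one partition:
SELECT * FROM comments_by_city WHERE city = ? LIMIT 20;

Analogous tables keyed by companyid and by userid would cover the other two queries without any client-side sorting, at the cost of writing each comment three times.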
Online shop with Cassandra
Hi all, for an online shop owned by my company I would like to remove MySQL for everything concerning the frontend and use Cassandra instead. The site has more than a million visits each day. What you need to know is:

- Products (deals) are divided by city
- Each product can stay online for X time and sell at most Y items

CREATE TABLE deals (
    city text,
    deal_id timeuuid,
    availability int,
    deal_info text,
    PRIMARY KEY ((city, deal_id))
);

The only problem I see in this model is how to guarantee the "availability" of a deal and not "overbook" -- how do I solve the problem of "remaining items" in real time? I have many ideas for solving it in the web application, but I wonder if there is anything ready in Cassandra that might help. Kindest regards, Carlo
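[Editor's sketch] One built-in candidate, assuming Cassandra 2.0 or later is an option: lightweight transactions (conditional updates). A hedged sketch of the compare-and-set loop against the table above:

-- read the current availability (say it returns 5):
SELECT availability FROM deals WHERE city = ? AND deal_id = ?;

-- decrement only if nobody bought an item in between; Cassandra runs a
-- Paxos round and tells the client whether the update was [applied]:
UPDATE deals
SET availability = 4
WHERE city = ? AND deal_id = ?
IF availability = 5;

-- if [applied] comes back false, re-read and retry; refuse the sale at 0.

Conditional updates cost several round trips per operation, so they fit the final "reserve an item" step rather than every page view.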
Ring up but read fails ...
Hi all, I built a Java webapp using Cassandra as the DB. At startup the application creates a Cassandra pool using the Pelops client ... in my dev and test environments everything works, but in production I have some strange problems. So I built a JSP to check the status of the Cassandra DB, doing nothing more than:

try {
    // connect to the webapp pool
    // make some selects at QUORUM
    // print OK
} catch (Exception e) {
    // print KO
}

Well, this page "often" returns KO. While in dev/test it returns OK if the nodes are up (and KO if some nodes are down), in production it "often" (not always) prints KO even when nodetool ring reports all nodes "Up". Here is an extract of the webapp errors:

ERROR UserNameCmd:38 - java.net.SocketException: Broken pipe
ERROR UidCmd:61 - java.net.SocketException: Broken pipe
org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe
...
        at org.apache.cassandra.thrift.Cassandra$Client.send_get_slice(Cassandra.java:386)
        at org.apache.cassandra.thrift.Cassandra$Client.get_slice(Cassandra.java:371)

Any idea for "debugging"? :) (5 nodes, replication factor = 3) Best regards, Carlo
R: Re: Ring up but read fails ...
> I've seen this when you leave a socket open and idle for a long time. The connection times out.

It could be the situation ... any idea about the solution? I create the pool once at startup and rely on it ...

> Perhaps you use the wrong transport in Thrift. Which version of Cassandra do you use?

Cassandra 0.6.8

Best Regards
-- Carlo --
R: Re: R: Re: Ring up but read fails ...
> Reconnect and try again?

Sorry, what do you mean by "Reconnect and try again"? Do you mean shutting down the old pool and creating a new pool of connections? I don't have the possibility to handle single connections using Pelops ...

From Dominic Williams' blog: "To work with a Cassandra cluster, you need to start off by defining a connection pool. This is typically done once in the startup code of your application." [...] "One of the key design decisions that at the time of writing distinguishes Pelops, is that the data processing code written by developers does not involve connection pooling or management. Instead, classes like Mutator and Selector borrow connections to Cassandra from a Pelops pool for just the periods that they need to read and write to the underlying Thrift API. This has two advantages. Firstly, obviously, code becomes cleaner and developers are freed from connection management concerns. But also more subtly this enables the Pelops library to completely manage connection pooling itself, and for example keep track of how many outstanding operations are currently running against each cluster node. This for example, enables Pelops to perform more effective client load balancing by ensuring that new operations are performed against the node to which it currently has the least outstanding operations running. Because of this architectural choice, it will even be possible to offer strategies in the future where for example nodes are actually queried to determine their load."

TIA
-- Carlo --
Backend application for Cassandra
Hi all, I've built a web application using Cassandra. Data are stored so that they can be quickly read/sorted according to my web-app's needs. Everything is working quite well. Now the big "problem" is that the "other side" of my company needs to build reports over these data, and the queries they need would be very "heavy" in terms of client-side complexity. I'd like to know if you have any tips that may help ... I've read something about Kundera and Lucandra, but I don't know whether they could be solutions ... Have you already faced problems like this? Could you suggest any valid product/solution? I've heard (from team-mates) tips like "export all your CFs into a relational model and query that" ... and I behaved as if I didn't hear it :) TIA for any help. Best Regards, Carlo
Are row-keys sorted by the compareWith?
Hi all, I created a CF in which I need to get the rows inside sorted by time. Each row represents a comment. I created a few rows using a generated TimeUUID as the row key, but when I call the Pelops method "getColumnsFromRows" I don't get the data back as I expect: the rows are not sorted by TimeUUID. I thought it was probably because of the random part of the TimeUUID, so I created a new CF ... This time I created a few rows using the Java System.currentTimeMillis(), which returns a long. I called "getColumnsFromRows" again, and again the same result: the data are not sorted! I've read many times that rows are sorted as specified in the compareWith, but I can't see it. To work around this for the moment I've used a SuperColumnFamily with a SINGLE ROW ... but I think this is just a workaround and not the solution. Now when I call "getSuperColumnsFromRow" I get all the SuperColumns as I expected: sorted by TimeUUID. Why doesn't the same happen with rows? I'm confused. TIA for any help. Best Regards, Carlo
R: Re: Are row-keys sorted by the compareWith?
Sorry Dan, I just noticed I replied to you and not to the group! I didn't want to bother you, it was just a mistake. Best Regards, Carlo

Original Message
From: d...@reactive.org
Date: 21/02/2011 4.23
To: , "cbert...@libero.it"
Subject: Re: Are row-keys sorted by the compareWith?

Hi Carlo,
As Jonathan mentions, the compareWith on a column family definition defines the order of the columns *within* a row ... In order to control the ordering of rows you'll need to use the OrderPreservingPartitioner (http://www.datastax.com/docs/0.7/operations/clustering#tokens-partitioners-ring).

As for getColumnsFromRows: it should be returning you a map of lists. The map is insertion-order-preserving and populated based on the provided list of row keys (so if you iterate over the entries in the map they should be in the same order as the list of row keys). The list for each row entry is definitely in the order that Cassandra provides them; take a look at org.scale7.cassandra.pelops.Selector#toColumnList if you need more info.

Cheers,
Dan

--
Dan Washusen
Sent with Sparrow
I: Re: Are row-keys sorted by the compareWith?
> As Jonathan mentions, the compareWith on a column family definition defines the order of the columns *within* a row ... In order to control the ordering of rows you'll need to use the OrderPreservingPartitioner (http://www.datastax.com/docs/0.7/operations/clustering#tokens-partitioners-ring).

Thanks for your answer and for your time, I will take a look at this.

> As for getColumnsFromRows: it should be returning you a map of lists. The map is insertion-order-preserving and populated based on the provided list of row keys (so if you iterate over the entries in the map they should be in the same order as the list of row keys).

Mmm ... well, it didn't happen like this. In my code I had a CF named Comments and also a CF called UserComments. UserComments uses a uuid as row key to keep, TimeUUID-sorted, the "pointers" to the comments of the user. When I get the sorted list of keys from UserComments and use this list as the row-keys list in getColumnsFromRows, I don't get the data back sorted as I expect them to be. It looks as if Cassandra/Pelops does not care how I order the row-keys list. I am sure about that because I tried something different: I iterated over my row-keys list and made many getColumnFromRow calls instead of one getColumnsFromRows, and when I iterate that way the data are correctly sorted. But this cannot be a solution ... I am using Cassandra 0.6.9.

I'll take advantage of your knowledge of Pelops to ask you something: I am evaluating the migration to Cassandra 0.7 ... as far as you know, in terms of written code, is it a heavy job?

Best Regards, Carlo
WriteMultiColumns just writes one column ... amazing!
Hi all, I'm almost sure I'm just tired and doing something stupid, but I can't understand this problem. In one super column family I have just 2 rows, called ALL and INCREMENTAL. For some reason I sometimes need to duplicate a SuperColumn from the ALL row into the INCREMENTAL one ... very easy (Cassandra 0.7.4, Java, Pelops):

private static void mirrorizeEngineSuperColumn(Bytes superColumnId) {
    Mutator mutator = Pelops.createMutator(SocialContext.POOL_NAME_VALUE);
    Selector selector = Pelops.createSelector(SocialContext.POOL_NAME_VALUE);
    try {
        // read the supercolumn from the ALL row at QUORUM
        SuperColumn sc = selector.getSuperColumnFromRow(MotoreFamily,
                SocialColumn.MOTORE_ALL_ROW, superColumnId, ConsistencyLevel.QUORUM);
        LOG.debug("Column list size of supercolumn is " + sc.getColumnsSize());
        // write the same subcolumns (with their original timestamps)
        // under the INCREMENTAL row
        mutator.writeSubColumns(MotoreFamily, SocialColumn.MOTORE_INCREMENTALI_ROW,
                superColumnId, sc.getColumns());
        mutator.execute(ConsistencyLevel.QUORUM);
    } catch (NotFoundException nfe) {
        LOG.debug("Supercolumn not found ...");
    } catch (Exception e) {
        LOG.error(e.toString());
    }
}

When I print it, the column list size is correct (3 or 4, depending on which supercolumn I'm working on), but when I write them I find only one column of that column list ... here is the output produced (viewed with cassandra-cli) -- compare the 3 super_columns in the INCREMENTAL row and you'll see they're different from the ones in the ALL row:

RowKey: ALL
=> (super_column=54b05120-552f-11e0-9d1f-020054554e01,
     (column=54fc9c60-552f-11e0-9d1f-020054554e01, value=0003, timestamp=1300872296917000)
     (column=746595b0-553f-11e0-9e66-020054554e01, value=0002, timestamp=1300879284037000)
     (column=6ec46ef0-5540-11e0-9e66-020054554e01, value=0004, timestamp=1300879641811000)
     (column=99d911d0-5541-11e0-af7b-020054554e01, value=0001, timestamp=1300880138869000))
=> (super_column=97351e20-5545-11e0-9464-001d72d09363,
     (column=9763cf40-5545-11e0-9464-001d72d09363, value=0004, timestamp=1300881876938000)
     (column=1e5b7a40-5549-11e0-8da1-020054554e01, value=0005, timestamp=1300883402593000)
     (column=89f7c3e0-560b-11e0-a6ac-020054554e01, value=0005, timestamp=1300966880883000))
=> (super_column=cadf5940-55ed-11e0-9b97-020054554e01,
     (column=cb03aa20-55ed-11e0-9b97-020054554e01, value=0004, timestamp=1300954178721000)
     (column=27092500-5609-11e0-b1f1-020054554e01, value=0004, timestamp=1300965858839000)
     (column=5cdf88d0-560a-11e0-a6ac-020054554e01, value=0005, timestamp=1300966438198000)
     (column=c6e34110-561c-11e0-9399-020054554e01, value=0005, timestamp=1300974305208000))
=> (super_column=309d66a0-5602-11e0-9cc8-020054554e01,
     (column=30d8e900-5602-11e0-9cc8-020054554e01, value=0005, timestamp=1300963602927000)
     (column=8c8a4f40-5603-11e0-9cc8-020054554e01, value=0005, timestamp=1300963728307000)
     (column=62246620-5606-11e0-9e06-020054554e01, value=0005, timestamp=1300964702748000)
     (column=db951080-561b-11e0-8880-020054554e01, value=0003, timestamp=1300973895462000))
=> (super_column=e44f1860-560c-11e0-b696-020054554e01,
     (column=e5045ea0-560c-11e0-b696-020054554e01, value=0005, timestamp=1300967480905000))
=> (super_column=e53d7000-560c-11e0-b696-020054554e01,
     (column=e56395a0-560c-11e0-b696-020054554e01, value=0005, timestamp=1300967620609000))
=> (super_column=90ce8370-5615-11e0-b696-020054554e01,
     (column=9100de10-5615-11e0-b696-020054554e01, value=0005, timestamp=1300971213814000)
     (column=a5171450-5615-11e0-b696-020054554e01, value=0005, timestamp=1300971294115000)
     (column=9fb68390-5617-11e0-9ed9-020054554e01, value=0002, timestamp=1300972093565000)
     (column=79889ed0-561a-11e0-bf27-020054554e01, value=0002, timestamp=130097330153))
---
RowKey: INCREMENTAL
=> (super_column=cadf5940-55ed-11e0-9b97-020054554e01,
     (column=c6e34110-561c-11e0-9399-020054554e01, value=0005, timestamp=1300974305208000))
=> (super_column=309d66a0-5602-11e0-9cc8-020054554e01,
     (column=db951080-561b-11e0-8880-020054554e01, value=0003, timestamp=1300973895462000))
=> (super_column=90ce8370-5615-11e0-b696-020054554e01,
     (column=9fb68390-5617-11e0-9ed9-020054554e01, value=0002, timestamp=1300972093565000))

I think I'm going crazy! TIA for any help. Best regards, Carlo
R: WriteMultiColumns just writes one column ... amazing!
I answer myself :) But I put the answer here, maybe it will be useful to someone in the future. The problem was the timestamp: if you "copy" a column from another row but don't change its timestamp, it will only be written if a column with the same name (key) has not been deleted in the destination row after that timestamp. Best regards Carlo
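In case it helps future readers: the practical fix is to re-stamp each copied column before writing it. A minimal sketch, replacing the writeSubColumns call inside mirrorizeEngineSuperColumn above; the three-argument Thrift Column constructor and the microsecond convention are assumptions to check against your exact 0.7.x version:

import java.util.ArrayList;
import java.util.List;
import org.apache.cassandra.thrift.Column;

// Copy the sub-columns but give each a fresh timestamp, so the writes
// are not shadowed by a newer deletion of that column in the destination row.
List<Column> copies = new ArrayList<Column>();
long now = System.currentTimeMillis() * 1000L; // microseconds, as in the cli timestamps above
for (Column c : sc.getColumns()) {
    copies.add(new Column(c.name, c.value, now)); // same name and value, new timestamp
}
mutator.writeSubColumns(MotoreFamily, SocialColumn.MOTORE_INCREMENTALI_ROW,
        superColumnId, copies);
mutator.execute(ConsistencyLevel.QUORUM);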
Filter on row iterator
Hi all, I have a column family with about 300 rows. Row names fall into 2 categories: number (e.g. 12345) and e_number (e.g. e_12345). Is there any way to extract only the rows whose names are numbers? For the moment I'm iterating over all rows with a KeyRange and filtering client-side, but I don't like this solution. I've seen that a KeyRange can be created using tokens instead of keys, but I don't understand how they work and did not find any working example. (Java/Pelops/Cassandra 0.7.5) TIA Carlo
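For reference, the client-side filter described above comes down to a few lines; allRowKeys below is a hypothetical list holding the keys returned by the KeyRange iteration:

import java.util.ArrayList;
import java.util.List;

// Keep only the row keys that are plain numbers (e.g. "12345"),
// dropping the "e_"-prefixed ones (e.g. "e_12345").
List<String> numericKeys = new ArrayList<String>();
for (String key : allRowKeys) {
    if (key.matches("\\d+")) {
        numericKeys.add(key);
    }
}

Note that with RandomPartitioner the range is in token (MD5) order rather than key order, so a key range cannot express "numeric keys only" server-side anyway; some client-side filtering, or a separate "index" row listing the numeric keys, is hard to avoid.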
Sorting in Cassandra
Hi, I need some help sorting data the right way in Cassandra. I have a SuperColumnFamily like this:

UserData {
  UserID (ROW) {
    SuperColumnKey1 { firstCol: value, secondCol: value }
    SuperColumnKey2 { firstCol: value, secondCol: value }
  }
}

Both CompareWith settings (supercolumns & columns) are UTF8Type ... First question: when I get all the supercolumns within the UserID row, I'd expect to receive them sorted alphabetically ... but it does not happen (I could not figure out what the order is ...). Am I assuming wrong? Second question: can I get all the data back sorted on the firstCol column? Imagine the SC key is the ID of a company and firstCol is its name ... how can I get all the companies of a user sorted by name (alphabetical order)? I am using the Pelops client on Cassandra 0.6.5. Thanks in advance for any help.
R: Re: Sorting in Cassandra
Aaron, first of all thanks for your time.

1. You cannot return just the super columns, you have to get their sub columns as well. The returned data is ordered, please provide an example of where it is not.

I don't know what I did before, but now I checked and the data are sorted as I expected them to be :-o I know I can't get a SC without its sub columns, and this is ok.

2. Pull back the entire row and filter/sort the columns client side. It's not possible to return columns of the same name from different super columns (I think that's what you are asking). Let me know if you think you have too much data per row to do that.

Probably I explained myself badly. What I want is to get the entire row back already ordered by a specific column key, not by the SC key ... example:

UID (ROW) {
  Company0 { name: zaz, address: street x, phone: 123, other cols }
  Company1 { name: abacus, address: street y, phone: 234, other cols }
  Company2 { name: more, address: street x, phone: 345, other cols }
}

What I want is to get all the data back from Cassandra sorted by the name of the company, not by the SC key:

UID (ROW) {
  Company1 { name: abacus, address: street y, phone: 234, other cols }
  Company2 { name: more, address: street x, phone: 345, other cols }
  Company0 { name: zaz, address: street x, phone: 123, other cols }
}

As far as I know Cassandra, I don't think it's possible, since I cannot be sure that each SC contains that specific column (name), right? Is the only way to sort them client-side? Best Regards
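Since the answer is indeed client-side sorting, here is a minimal sketch of it in Java. The 0.6-era Thrift SuperColumn/Column types expose getColumns()/getName()/getValue() with byte-array names and values (an assumption to check against your version); the "name" subcolumn and the UTF-8 encoding come from the example above, and a supercolumn missing the subcolumn is simply pushed to the end:

import java.nio.charset.Charset;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.SuperColumn;

public class SuperColumnSorter {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Sort a row's supercolumns in place by the UTF-8 value of their "name" subcolumn.
    public static void sortByName(List<SuperColumn> superColumns) {
        Collections.sort(superColumns, new Comparator<SuperColumn>() {
            public int compare(SuperColumn a, SuperColumn b) {
                return subColumnValue(a, "name").compareTo(subColumnValue(b, "name"));
            }
        });
    }

    // Returns the sub-column's value, or a high sentinel if it is absent,
    // so supercolumns without a "name" end up last.
    private static String subColumnValue(SuperColumn sc, String name) {
        for (Column c : sc.getColumns()) {
            if (name.equals(new String(c.getName(), UTF8))) {
                return new String(c.getValue(), UTF8);
            }
        }
        return "\uffff";
    }
}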
Client-side sorting
Hi all, do you know of any component for client-side sorting of Cassandra structures? Things like ordering groups of SuperColumns based on the value of a subcolumn, and similar operations (ordering by AsciiType/BytesType and so on)? I'd like to avoid the DTO/VO pattern + Comparable interface. For example ... inside Cassandra:

UID (ROW) {
  Company1 { name: webcompany, address: street c, other columns }
  Company2 { name: acompany, address: street b, other columns }
  Company3 { name: thecompany, address: street a, other columns }
}

Sort AsciiType on the *name* subcolumn:

UID (ROW) {
  Company2 { name: acompany, address: street b, other columns }
  Company3 { name: thecompany, address: street a, other columns }
  Company1 { name: webcompany, address: street c, other columns }
}

Sort AsciiType on the *address* subcolumn:

UID (ROW) {
  Company3 { name: thecompany, address: street a, other columns }
  Company2 { name: acompany, address: street b, other columns }
  Company1 { name: webcompany, address: street c, other columns }
}

If one exists I'd like to avoid reinventing the wheel. Best Regards
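I'm not aware of a ready-made component for this, but a small comparator factory gets close to the generic behaviour described, without a DTO per column family. A sketch under the same Thrift-type assumptions as in the previous thread; byte-wise unsigned comparison matches how AsciiType/BytesType order values:

import java.util.Arrays;
import java.util.Comparator;

import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.SuperColumn;

public class SubColumnComparators {
    // Order SuperColumns by the raw bytes of one sub-column;
    // a missing sub-column (empty bytes) sorts first.
    public static Comparator<SuperColumn> bySubColumn(final byte[] subColumnName) {
        return new Comparator<SuperColumn>() {
            public int compare(SuperColumn a, SuperColumn b) {
                return compareBytes(value(a), value(b));
            }
            private byte[] value(SuperColumn sc) {
                for (Column c : sc.getColumns()) {
                    if (Arrays.equals(subColumnName, c.getName())) return c.getValue();
                }
                return new byte[0];
            }
        };
    }

    // Lexicographic comparison over unsigned bytes.
    private static int compareBytes(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = (x[i] & 0xff) - (y[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return x.length - y.length;
    }
}

Usage would be Collections.sort(superColumns, SubColumnComparators.bySubColumn("name".getBytes())), and the same factory covers the *address* case.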
R: Client-side sorting
> Hi all,
> do you know of any component for client-side sorting of Cassandra structures?

Sorry, I forgot: I am using Java ...
TimeUUID makes me crazy
I am going crazy using TimeUUID in Cassandra via Java. I've read the FAQ but it didn't help. Can I use a TimeUUID as a row identifier (if converted to a string)? I have a CF and a SCF like these:

TIMEUUID OPECID (ROW) {
  phone: 123
  address: street xyz
}

String USERID (ROW) {
  TIMEUUID OPECID (SuperColumnName) {
    collection of columns;
  }
}

In one situation the TimeUUID is a row identifier, while in the other it is the SuperColumn name. I get many "UUIDs must be exactly 16 bytes" errors when I try to read data that did not raise any exception during the save. At time T0 this one works:

mutator.writeColumns(UuidHelper.timeUuidFromBytes(OpecID).toString(), opecfamily, notNull); // (notNull contains a list of columns, among them opecstatus)

Immediately afterwards this one raises an exception:

selector.getColumnFromRow(UuidHelper.timeUuidFromBytes(OpecID).toString(), opecfamily, "opecstatus", ConsistencyLevel.ONE)

I hope someone can help me understand it ...
R: Re: TimeUUID makes me crazy
I am using Pelops with Cassandra 0.6.x. The error that is raised is InvalidRequestException(why: UUIDs must be exactly 16 bytes). For the UUID I am using the UuidHelper class provided.
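For what it's worth, that error usually means some name is reaching a TimeUUIDType comparator as something other than 16 raw bytes: for instance, if opecfamily's CompareWith is TimeUUIDType, the literal column name "opecstatus" (not a UUID) would be rejected on the read exactly like this. Where the 16-byte form is what's needed, the conversion from a java.util.UUID is short; a sketch (whether the result must then be wrapped in Pelops' Bytes type is an assumption to verify):

import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytes {
    // Convert a java.util.UUID into the 16-byte representation Cassandra
    // expects wherever TimeUUIDType is the comparator.
    public static byte[] toBytes(UUID uuid) {
        ByteBuffer bb = ByteBuffer.allocate(16);
        bb.putLong(uuid.getMostSignificantBits());
        bb.putLong(uuid.getLeastSignificantBits());
        return bb.array();
    }
}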
R: Indexes on Columns & SubColumns Clarification
In each family, both CF and SCF, data are grouped by rows. Just to give an idea ...

Super Column Family Name {
  Row 1 {
    SuperColumn1 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
    SuperColumn2 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
  }
  Row N {
    SuperColumn1 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
    SuperColumn2 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
    SuperColumn3 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
  }
}

Column Family Name {
  Row1 { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
  RowN { Column1 Key: Column1 Value ... ColumnN Key: ColumnN Value }
}

Your representation looks like a SCF ...

detailed_log {            // super column family
  username {              // row
    uuid {                // supercolumn identifier
      price : 100         // column price
      min : 10            // column min
      max : 500           // column max
    }
    uuid {                // supercolumn identifier
      price : 100         // column price
      min : 10            // column min
      max : 500           // column max
    }
  }
}

detailed_log can contain from 0 to N rows, each row can contain from 0 to N SuperColumns, and each SuperColumn can contain from 0 to N columns.

> SELECT * FROM detailed_log WHERE username = 'foobar' AND uuid RANGE(start_UUID -> end_UUID);

In Pelops (the Java client I use) it would be something like getSuperColumnsFromRow, which retrieves super columns from a row given the row key, the name of the column family containing the super columns, a super column selector predicate, and the Cassandra consistency level, and returns a list of matching super columns:

List<SuperColumn> result = selector.getSuperColumnsFromRow(username, "detailed_log", Selector.newColumnsPredicateAll(false, howmany), ConsistencyLevel.ONE);

This will retrieve "howmany" SuperColumns from the row username, sorted by the sorting definition in your storage-conf. Hope this helps. Best Regards Carlo
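To honour the UUID range in the pseudo-query, rather than just taking the first "howmany" supercolumns, a slice predicate with explicit start and finish names should do it. A sketch, assuming Pelops exposes a Selector.newColumnsPredicate(Bytes, Bytes, boolean, int) overload and a Bytes.fromUuid helper (both worth verifying against your Pelops version):

// Slice the row to the supercolumns whose TimeUUID names fall between
// startUuid and endUuid, relying on the CF's TimeUUIDType ordering.
// SlicePredicate is org.apache.cassandra.thrift.SlicePredicate.
SlicePredicate range = Selector.newColumnsPredicate(
        Bytes.fromUuid(startUuid),  // start of the range
        Bytes.fromUuid(endUuid),    // end of the range
        false,                      // natural (not reversed) order
        1000);                      // upper bound on returned supercolumns
List<SuperColumn> slice = selector.getSuperColumnsFromRow(
        username, "detailed_log", range, ConsistencyLevel.ONE);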