question about how updates work internally with caching

2012-04-23 Thread DE VITO Dominique
Hi,

Let's suppose a column (name+value) is cached in memory, with timestamp T.

1) An update, for this column, arrives with exactly the *same* timestamp, and 
the *same* value.
Is the commitlog updated?

2) An update, for this column, arrives with a timestamp < T.
Is the commitlog updated?

Thanks for your help.

Regards,
Dominique



Re: question about how updates work internally with caching

2012-04-23 Thread Sylvain Lebresne
On Mon, Apr 23, 2012 at 10:19 AM, DE VITO Dominique
 wrote:
> Hi,
>
> Let's suppose a column (name+value) is cached in memory, with timestamp T.
>
> 1) An update, for this column, arrives with exactly the *same* timestamp,
> and the *same* value.
> Is the commitlog updated?
>
> 2) An update, for this column, arrives with a timestamp < T.
> Is the commitlog updated?

Yes to both; the commit log is always updated. In fact, the commit log
insertion is done in parallel with, and independently of, the in-memory
updates (which include cache updates).
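
To make the ordering concrete, here is a minimal sketch of the behaviour
described above (illustrative Java only, not actual Cassandra internals;
class and field names are invented):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: the commit log append is unconditional, while timestamp
// reconciliation only affects the in-memory state (memtable + cache).
public class WritePathSketch {
    static final class Column {
        final String value;
        final long timestamp;
        Column(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    private final StringBuilder commitLog = new StringBuilder();             // stand-in for the log segment
    private final Map<String, Column> memtable = new ConcurrentHashMap<>();  // stand-in for memtable/cache

    void apply(String name, Column update) {
        // 1. Always appended, even for stale or duplicate timestamps.
        commitLog.append(name).append('@').append(update.timestamp).append('\n');
        // 2. Last-write-wins by timestamp in memory; an older-or-equal timestamp
        //    is ignored here (real Cassandra breaks exact ties by comparing
        //    values), but it was still logged above.
        memtable.merge(name, update,
                (current, incoming) -> incoming.timestamp > current.timestamp ? incoming : current);
    }
}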

--
Sylvain



Re: Two Random Ports in Private port range

2012-04-23 Thread aaron morton
If you've seen Lord of the Rings and can remember the scene where Frodo has to 
pass the spider, you will be well equipped to understand JMX ports. 

The ports are randomly opened…
"A common problem with RMI and firewall is that the JMX default agent will not 
let you specify which port to use to export the server's RMI stub. "
https://blogs.oracle.com/jmxetc/entry/troubleshooting_connection_problems_in_jconsole
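
For reference, the workaround the article describes (starting the RMI registry
and the JMX connector yourself, so both ports are fixed) looks roughly like the
sketch below; ports 7199 and 7200 are arbitrary choices:

import java.lang.management.ManagementFactory;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

// Pin both JMX ports: the registry on 7199 and the exported RMI stub on 7200,
// instead of letting the default agent export the stub on a random ephemeral port.
public class FixedPortJmx {
    public static void main(String[] args) throws Exception {
        LocateRegistry.createRegistry(7199);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi://localhost:7200/jndi/rmi://localhost:7199/jmxrmi");
        JMXConnectorServer server =
            JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();                  // now only 7199 and 7200 need firewall holes
        Thread.sleep(Long.MAX_VALUE);    // keep the demo process alive
    }
}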

Cheers
p.s. In the analogy above you are Frodo, not the Spider. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/04/2012, at 6:33 AM, W F wrote:

> Yes, they are.
> 
> What are they used for and are they specifically documented somewhere?
> 
> Thanks!
> 
> On Fri, Apr 20, 2012 at 11:25 AM, Kirk True  wrote:
> Are these the dynamic JMX ports?
> 
> Sent from my iPad
> 
> On Apr 19, 2012, at 8:58 AM, W F  wrote:
> 
>> Hi All,
>> 
>> I did a web search of the archives (hope I looked in the right place) and 
>> could not find a request like this.
>> 
>> When Cassandra is running, it seems to create two random TCP listen ports.
>> 
>> For example: "50378 and 58692", "49952, 52792".
>> 
>> What are these for, and is there documentation regarding this?
>> 
>> Sorry if this is already in the archive!
>> 
>> Thanks ~A
> 



Re: Help with Wide Rows with CounterColumns

2012-04-23 Thread aaron morton
No. 

CounterColumnType only works with column values, which are not sorted. Sorting 
counters while they are being updated is potentially very expensive. 

You have a few options:

1) If the list of counters is short (say < 100 columns), get all the columns and 
sort client side (see the sketch below).
2) Run a periodic task that gets all the columns, sorts client side, pivots so 
column name and values are swapped, and writes them back to another row. You 
can now get the top X columns. 
3) Depending on your requirements, consider a different server such as Redis. 
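
For option 1, the client-side sort is simple once you have the columns; a
sketch (the counters map, with UID keys and counter values, stands in for
whatever your client returns):

import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of option 1: fetch all counter columns for the row (client call not
// shown), then sort by counter value client side and keep the top N.
public class TopCounters {
    static Map<String, Long> topN(Map<String, Long> counters, int n) {
        return counters.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));  // LinkedHashMap keeps the sorted order
    }
}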

Hope that helps.  
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/04/2012, at 7:40 AM, Praveen Baratam wrote:

> Hello All,
> 
> I have a particular requirement where I need to update CounterColumns in a 
> row by a specific UID, which is the key for the CounterColumn in that row, and 
> then query for those columns in that row such that we get the top 5 UIDs with 
> the highest counter values.
> 
> create column family Counters
> with comparator = 'UTF8Type'
> and key_validation_class = 'UTF8Type'
> and default_validation_class = 'CounterColumnType';  
> 
> Can it be done?



Re: nodetool decommission hangs

2012-04-23 Thread aaron morton
You can check the streaming progress with nodetool netstats. That will tell you 
what it thinks it is moving. 

nodetool ring will also tell you what state the nodes are in. 

That said, this looks a little suspicious…
 
>>  INFO [StreamStage:1] 2012-04-21 11:07:45,262 StreamOut.java (line 160) 
>> Stream context metadata [], 0 sstables.

If netstats is showing nothing, could you please reply with some details of what 
was in the schema, and whether there was *any* data at all.

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/04/2012, at 2:12 PM, Aniket Chakrabarti wrote:

> Actually it's an empty cluster, i.e. no data is present; I am just trying to 
> get the decommission to work. The cluster just has a keyspace and a column 
> family with no keys in it.
> 
> On 4/21/2012 10:05 PM, Ji Cheng wrote:
>> 
>> It needs to stream all its data to the other node, which may take a long 
>> time. You can use nodetool netstats to see the progress of streaming. 
>> 
>> Best,
>> Cheng
>> 
>> On Sun, Apr 22, 2012 at 4:11 AM, Aniket Chakrabarti 
>>  wrote:
>> Hi,
>> 
>> I am trying to use nodetool decommission to remove a node from the 
>> cluster(it is just a 2 node cluster). It hangs with the following log:
>> 
>> INFO [RMI TCP Connection(4)-164.107.119.50] 2012-04-21 11:07:15,095 
>> StorageService.java (line 668) LEAVING: sleeping 30000 ms for pending range 
>> setup
>>  INFO [RMI TCP Connection(4)-164.107.119.50] 2012-04-21 11:07:45,251 
>> StorageService.java (line 668) LEAVING: streaming data to other nodes
>>  INFO [StreamStage:1] 2012-04-21 11:07:45,259 StreamOut.java (line 114) 
>> Beginning transfer to /10.2.246.0
>>  INFO [StreamStage:1] 2012-04-21 11:07:45,262 StreamOut.java (line 95) 
>> Flushing memtables for [CFS(Keyspace='keyspace1', 
>> ColumnFamily='Standard1')]...
>>  INFO [StreamStage:1] 2012-04-21 11:07:45,262 StreamOut.java (line 160) 
>> Stream context metadata [], 0 sstables.
>>  INFO [StreamStage:1] 2012-04-21 11:07:45,264 StreamOutSession.java (line 
>> 203) Streaming to /10.2.246.0
>> 
>> 
>> Any pointers will be very helpful.
>> 
>> Thanks,
>> Aniket
>> 



Re: repair strange behavior

2012-04-23 Thread aaron morton
> What is strange - when the streams for the second repair start, they have the 
> same or even bigger total volume,
What measure are you using?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/04/2012, at 10:16 PM, Igor wrote:

> but after repair all nodes should be in sync regardless of whether new files 
> were compacted or not.
> Do you suggest major compaction after repair? I'd like to avoid it.
> 
> On 04/22/2012 11:52 AM, Philippe wrote:
>> 
>> Repairs generate new files that then need to be compacted.
>> Maybe that's where the temporary extra volume comes from?
>> 
>> On 21 Apr 2012 at 20:43, "Igor" wrote:
>> Hi
>> 
>> I can't understand the repair behavior in my case. I have a 12-node ring (all 
>> 1.0.7):
>> 
>> 10.254.237.2  LA  ADS-LA-1    Up  Normal  50.92 GB   0.00%  0
>> 10.254.238.2  TX  TX-24-RACK  Up  Normal  33.29 GB   0.00%  1
>> 10.254.236.2  VA  ADS-VA-1    Up  Normal  50.07 GB   0.00%  2
>> 10.254.93.2   IL  R1          Up  Normal  49.29 GB   0.00%  3
>> 10.253.4.2    AZ  R1          Up  Normal  37.83 GB   0.00%  5
>> 10.254.180.2  GB  GB-1        Up  Normal  42.86 GB  50.00%  85070591730234615865843651857942052863
>> 10.254.191.2  LA  ADS-LA-1    Up  Normal  47.64 GB   0.00%  85070591730234615865843651857942052864
>> 10.254.221.2  TX  TX-24-RACK  Up  Normal  43.42 GB   0.00%  85070591730234615865843651857942052865
>> 10.254.217.2  VA  ADS-VA-1    Up  Normal  38.44 GB   0.00%  85070591730234615865843651857942052866
>> 10.254.94.2   IL  R1          Up  Normal  49.31 GB   0.00%  85070591730234615865843651857942052867
>> 10.253.5.2    AZ  R1          Up  Normal  49.01 GB   0.00%  85070591730234615865843651857942052869
>> 10.254.179.2  GB  GB-1        Up  Normal  27.08 GB  50.00%  170141183460469231731687303715884105727
>> 
>> I have a single keyspace 'meter' and two column families (one, 'ids', is small, 
>> and the second is bigger). The strange thing happened today when I tried to run
>> "nodetool -h 10.254.180.2 repair -pr meter ids"
>> twice, one after another. The first repair finished successfully
>> 
>>  INFO 16:33:02,492 [repair #db582370-8bba-11e1--5b777f708bff] ids is 
>> fully synced
>>  INFO 16:33:02,526 [repair #db582370-8bba-11e1--5b777f708bff] session 
>> completed successfully
>> 
>> after moving nearly 50 GB of data, and I started a second session one hour later:
>> 
>> INFO 17:44:37,842 [repair #aa415d00-8bd9-11e1--5b777f708bff] new session: 
>> will sync localhost/10.254.180.2, /10.254.221.2, /10.254.191.2, /10.254.217.2, 
>> /10.253.5.2, /10.254.94.2 on range 
>> (5,85070591730234615865843651857942052863] for meter.[ids]
>> 
>> What is strange - when the streams for the second repair start, they have the 
>> same or even bigger total volume, and I expected that the second run would 
>> move less data (or even no data at all).
>> 
>> Is it OK? Or should I fix something?
>> 
>> Thanks!
>> 
> 



Re: repair strange behavior

2012-04-23 Thread Igor

Hi, Aaron

Just the sum of the total volume of all streams between nodes.

But it seems I understand what happened: after repair my column family goes 
through several minor compactions, and during these compactions it creates 
new tombstones (my CF contains data with TTL, so it can discover and mark 
newly expired data each time it runs a minor compaction). As these tombstones 
are arranged and created differently on each node (sstables have different 
sizes and so on, so size-tiered compaction works slightly differently), 
each subsequent repair discovers new ranges to sync.


When I ran a *major* compaction and then ran repair, it finished in 
minutes (instead of hours), as far as I understand because after major 
compaction the tombstones on all nodes are almost the same.


Does that sound reasonable?

I'll try to find the best strategy to minimize repair streams, as I'm afraid 
of major compactions for other, possibly large, CFs.


On 04/23/2012 12:34 PM, aaron morton wrote:
> > What is strange - when the streams for the second repair start, they have
> > the same or even bigger total volume,
>
> What measure are you using?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com

> On 22/04/2012, at 10:16 PM, Igor wrote:
>
> > but after repair all nodes should be in sync regardless of whether new
> > files were compacted or not.
> > Do you suggest major compaction after repair? I'd like to avoid it.
> > [...]


Bad Request: No indexed columns present in by-columns clause with "equals" operator

2012-04-23 Thread mdione.ext

 I understand the error message, but I don't understand why I get it. 
Here's the CF:

cqlsh:avatars> describe columnfamily HBX_FILE;

CREATE COLUMNFAMILY HBX_FILE (
  KEY blob PRIMARY KEY,
  HBX_FIL_DATE text,
  HBX_FIL_LARGE ascii,
  HBX_FIL_MEDIUM ascii,
  HBX_FIL_SMALL ascii,
  HBX_FIL_STATUS text,
  HBX_FIL_TINY ascii
) WITH
  comment='' AND
  comparator=text AND
  read_repair_chance=1.00 AND
  gc_grace_seconds=864000 AND
  default_validation=blob AND
  min_compaction_threshold=4 AND
  max_compaction_threshold=32 AND
  replicate_on_write=True;

CREATE INDEX HBX_FILE_HBX_FIL_STATUS_idx ON HBX_FILE (HBX_FIL_STATUS);

  The query and the error:

cqlsh:avatars> SELECT HBX_FIL_SMALL FROM HBX_FILE WHERE KEY=1 AND 
HBX_FIL_STATUS='actif';
Bad Request: No indexed columns present in by-columns clause with "equals" 
operator

  A query that works:

cqlsh:avatars> SELECT HBX_FIL_STATUS FROM HBX_FILE WHERE KEY=1;
 HBX_FIL_STATUS

  Actif

Just in case, here's the CLI's output for the same CF:

[default@avatars] describe HBX_FILE;
ColumnFamily: HBX_FILE
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Row Cache Provider: 
org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
  Key cache size / save period in seconds: 20.0/14400
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 1.0
  Replicate on write: true
  Bloom Filter FP chance: default
  Built indexes: []
  Column Metadata:
Column Name: HBX_FIL_DATE
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
Column Name: HBX_FIL_LARGE
  Validation Class: org.apache.cassandra.db.marshal.AsciiType
Column Name: HBX_FIL_MEDIUM
  Validation Class: org.apache.cassandra.db.marshal.AsciiType
Column Name: HBX_FIL_SMALL
  Validation Class: org.apache.cassandra.db.marshal.AsciiType
Column Name: HBX_FIL_STATUS
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Name: HBX_FILE_HBX_FIL_STATUS_idx
  Index Type: KEYS
Column Name: HBX_FIL_TINY
  Validation Class: org.apache.cassandra.db.marshal.AsciiType
  Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy

  And the same error, in other words, in the CLI:

[default@avatars] get HBX_FILE where HBX_FIL_STATUS = 'actif';
No indexed columns present in index clause with operator EQ

  Am I missing something? It might well be that I'm too tired...

--
Marcos Dione
SysAdmin
Astek Sud-Est
pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
04 97 12 62 45 - mdione@orange.com






auto-generate data

2012-04-23 Thread puneet loya
Can we auto-generate random data in Cassandra?


Thanks and Regards,

Puneet


Re: Bad Request: No indexed columns present in by-columns clause with "equals" operator

2012-04-23 Thread Dave Brosius

Works for me on trunk... what version are you using?

On 04/23/2012 08:39 AM, mdione@orange.com wrote:

>   I understand the error message, but I don't understand why I get it.
> Here's the CF:
> [...]






RE: Bad Request: No indexed columns present in by-columns clause with "equals" operator

2012-04-23 Thread mdione.ext
De : Dave Brosius [mailto:dbros...@mebigfatguy.com]
> Works for me on trunk... what version are you using?

  Beh, I forgot that detail: 1.0.9.

--
Marcos Dione
SysAdmin
Astek Sud-Est
pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
04 97 12 62 45 - mdione@orange.com




RE: Bad Request: No indexed columns present in by-columns clause with "equals" operator

2012-04-23 Thread mdione.ext
De : mdione@orange.com [mailto:mdione@orange.com]
> De : Dave Brosius [mailto:dbros...@mebigfatguy.com]
> > Works for me on trunk... what version are you using?
> 
>   Beh, I forgot that detail: 1.0.9.

  I also forgot to mention: the index was recently created, after the database 
was populated, but there is not much data in the database.

--
Marcos Dione
SysAdmin
Astek Sud-Est
pour FT/TGPF/OPF/PORTAIL/DOP/HEBEX @ Marco Polo
04 97 12 62 45 - mdione@orange.com




Failing to delete commitlog at startup/shutdown (Windows)

2012-04-23 Thread Conan Cook
Hi,

I'm experiencing a problem running a suite of integration tests on Windows
7, using Cassandra 1.0.9 and Java 1.6.0_31.  A new Cassandra instance is
spun up for each test class and shut down afterwards, using the Maven
Failsafe plugin.  The problem is that the commit log file seems to be kept
open, and so subsequent test classes fail to delete it.  Here is the stack
trace:

java.io.IOException: Failed to delete
D:\amee.realtime.api\server\engine\tmp\var\lib\cassandra\commitlog\CommitLog-1335190398587.log
at
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
 at
org.apache.cassandra.io.util.FileUtils.deleteRecursive(FileUtils.java:220)
at
org.apache.cassandra.io.util.FileUtils.deleteRecursive(FileUtils.java:216)
...

I've tried to delete the file when shutting down Cassandra and before
firing up a new one.  I've tried setting the failsafe plugin's forkMode to
both "once" and "always", so that it fires up a new JVM for each test or a
single JVM for all tests; the results are similar.  Debugging through the
code takes me right down to the native method call in the Windows
filesystem class in the JVM, and an access denied error is returned; I'm
also unable to delete it manually through Windows Explorer or a terminal
window at that point (with the JVM suspended), and running Process Explorer
indicates that a Java process has a handle open to that file.

I've read a number of posts and mails mentioning this problem and there is
a JIRA saying a similar problem is fixed (
https://issues.apache.org/jira/browse/CASSANDRA-1348).  I've tried a number
of things to clean up the Commitlog file after each test is complete, and
have followed the recommendations made here (I'm also using Hector's
EmbeddedServerHelper to start/stop Cassandra):
http://stackoverflow.com/questions/7944287/how-to-cleanup-embedded-cassandra-after-unittest

Does anyone have any ideas on how to avoid this issue?  I don't have any
way of knowing what it is that's holding onto this file other than a Java
process.
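
One frequently reported cause of this symptom is the JVM holding memory-mapped
commit log segments until the MappedByteBuffer is garbage collected; if that is
what's happening here, a retry-after-GC loop sometimes helps (a heuristic
sketch, not a guaranteed fix):

import java.io.File;

// Heuristic workaround: mapped buffers release their file handle only when
// collected, so force a GC and retry the delete a few times before giving up.
public class RetryDelete {
    static boolean deleteWithRetry(File f, int attempts) throws InterruptedException {
        for (int i = 0; i < attempts; i++) {
            if (f.delete()) {
                return true;
            }
            System.gc();        // may unmap lingering MappedByteBuffers
            Thread.sleep(500);  // give the finalizer/cleaner a moment
        }
        return false;
    }
}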

Thanks!


Conan


Re: Failing to delete commitlog at startup/shutdown (Windows)

2012-04-23 Thread Steve Neely
We used a modified version of Ran's embedded Cassandra for a while:
http://prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/
which worked well for us. You have way more control over that.

Recently, we switched to having a single Cassandra installation that runs
all the time. Kind of like you'd treat a regular relational DB. Just fire
up Cassandra, leave it running and point your tests at that instance. Seems
like starting up your data store every time you execute integration tests
will slow them down and isn't really helpful.

BTW, you may want to scrub the test data out of Cassandra when your test
suite finishes.

-- Steve


On Mon, Apr 23, 2012 at 8:41 AM, Conan Cook  wrote:

> Hi,
>
> I'm experiencing a problem running a suite of integration tests on Windows
> 7, using Cassandra 1.0.9 and Java 1.6.0_31.
> [...]
>
> Thanks!
>
> Conan


cassandra.input.split.size and number of mappers

2012-04-23 Thread Filippo Diotalevi
Hi,
I'm finding it very difficult to understand how Hadoop and Cassandra 
(CDH3u3 and 1.0.8 respectively) split the work between mappers.


The thing that confuses me is that, for any value of cassandra.input.split.size 
I set, I always get 1 (at most 2) mappers per node.

I'm trying to debug the Cassandra code connecting with a 3-node cluster, and I 
notice the following things:

** ColumnFamilyInputFormat.getRangeMap returns (correctly, I assume) 3 ranges  
[TokenRange(start_token:0, end_token:56713727820156410577229101238628035242, ….
TokenRange(start_token:56713727820156410577229101238628035242, 
end_token:113427455640312814857969558651062452224, ….
TokenRange(start_token:113427455640312814857969558651062452224, end_token:0, 
…….]

** Inside the SplitCallable object, the getSubSplits method always returns 1 
split.
Regardless of the splitSize, the call to client.describe_splits(..) always 
returns 1 split (which is the original range).


I should also mention that the CF I'm trying to map/reduce is composed of 
around 1500 rows, and I've tried split sizes ranging from 1000 to 10 without 
change, except for a "sweet spot" split size of 120 that creates exactly 2 
mappers per node. However, decreasing the split size below 120 has the effect 
of Hadoop creating again 1 mapper per node.

It seems to me that, with my current Cassandra configuration, the 
describe_splits RPC call always returns 1 or 2, regardless of the 
keys_per_split value passed.

Is it maybe a Cassandra configuration issue? Or can it be a bug in the code?
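
For what it's worth, cassandra.input.split.size is normally set through
ConfigHelper on the job's Hadoop Configuration (sketch below, assuming the
1.0.x org.apache.cassandra.hadoop package); one thing worth ruling out is that
the value is being set on a Configuration the job doesn't actually use:

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

// Sketch: set the split size on the same Configuration the job is built from.
// The value is a target number of rows per split, passed to describe_splits.
public class SplitSizeConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        ConfigHelper.setInputSplitSize(conf, 120);
        System.out.println(conf.get("cassandra.input.split.size"));  // prints 120
    }
}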

Thanks,
--  
Filippo Diotalevi



High latency of cassandra

2012-04-23 Thread 马超
Hi all,

I have some trouble with Cassandra in production:

I built an RPC server which uses the Hector client to manipulate
Cassandra. Weird things happen nowadays: the latency of the RPC sometimes
becomes very high (10 to 70 seconds) for several minutes, then drops back
to a normal level (30 ms on average) after that time. I investigated the
debug log of Cassandra. During the high-latency time, Cassandra outputs
lots of messages like:
"IncomingTcpConnection.java(116) Version is now 3. "
Everything seems to be blocked during that time.

Our settings are as follows:
the Cassandra version is 1.0.1 and the Hector version is 0.7.0, for
compatibility with the Thrift version we use (0.5.0).
The cluster contains 4 nodes and all of them are seeds. gc_grace_seconds
is 0 since we don't need to delete data.

p.s. It worked well for a long time (3 months) but went crazy these
days after we pushed the new RPC server, which supports saving bigger data
(2 MB on average). I'm not sure if this is the reason.

Hoping for your reply~~

Thanks,

Chao.


Re: auto-generate data

2012-04-23 Thread Tyler Hobbs
Yes, use the stress tool:
http://www.datastax.com/docs/1.0/references/stress_java
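
If you need data shaped like your own schema instead of stress's synthetic
layout, a small client-side generator also works; a sketch using Hector (the
keyspace "ks" and column family "cf" are hypothetical and assumed to exist):

import java.util.UUID;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// Sketch: insert 1000 rows of random data in a single batched mutation.
public class RandomLoader {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "localhost:9160");
        Keyspace ks = HFactory.createKeyspace("ks", cluster);         // hypothetical keyspace
        Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());
        for (int i = 0; i < 1000; i++) {
            mutator.addInsertion(UUID.randomUUID().toString(), "cf",  // hypothetical CF
                    HFactory.createStringColumn("payload", UUID.randomUUID().toString()));
        }
        mutator.execute();
        HFactory.shutdownCluster(cluster);
    }
}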

On Mon, Apr 23, 2012 at 8:25 AM, puneet loya  wrote:

> Can we auto-generate random data in cassandra?
>
>
> Thanks and Regaads,
>
> Puneet
>



-- 
Tyler Hobbs
DataStax 


Cassandra dying when gets many deletes

2012-04-23 Thread crypto five
Hi,

I have 50 million rows in a column family on a 4 GB RAM box. I allocated 2 GB
to Cassandra.
I have a program which traverses this CF and cleans some data there; it
generates about 20k delete statements per second.
After about 3 million deletions Cassandra stops responding to queries:
it doesn't react to the CLI, nodetool, etc.
I see in the logs that it tries to free some memory but can't, even if I
wait a whole day.
I also see the following in the logs:

INFO [ScheduledTasks:1] 2012-04-23 18:38:13,333 StorageService.java (line
2647) Unable to reduce heap usage since there are no dirty column families

When I am looking at memory dump I see that memory goes to
ConcurrentSkipListMap(10%), HeapByteBuffer(13%), DecoratedKey(6%),
int[](6%), BigInteger(8.2%), ConcurrentSkipListMap$HeadIndex(7.2%),
ColumnFamily(6.5%), ThreadSafeSortedColumns(13.7%), long[](5.9%).

What can I do to make cassandra stop dying?
Why it can't free the memory?
Any ideas?

Thank you.


Re: Kundera 2.0.6 Released

2012-04-23 Thread Codevally
Thanks guys for all the hard work. Special thanks to Vivek and Amresh. 
We are finally integrating Kundera into our production code and it will go 
live very soon.

Best Regards

/Roshan.

On Saturday, April 21, 2012 8:08:18 AM UTC+10, Kundera Team wrote:
>
>  Hi All,
>
> We are happy to announce release of Kundera 2.0.6.
>
> Kundera is a JPA 2.0 based, object-datastore mapping library for NoSQL 
> datastores. The idea behind Kundera is to make working with NoSQL databases
> drop-dead simple and fun. It currently supports Cassandra, HBase, MongoDB 
> and relational databases.
>
> Major Changes in this release:
> ---
> * HBase 0.90.x migration.
> * Enhanced Persistence Context.
> * Named and native queries support (including CQL support for Cassandra)
> * UPDATE and DELETE queries support.
> * DDL auto-schema creation.
> * Performance improvements.
>
>
> To download, use or contribute to Kundera, visit:
> http://github.com/impetus-opensource/Kundera
> Latest released tag version is 2.0.6. Kundera maven libraries are now 
> available at: 
> https://oss.sonatype.org/content/repositories/releases/com/impetus 
>
> Sample codes and examples for using Kundera can be found here:
> http://github.com/impetus-opensource/Kundera-Examples
>
> Thank you all for your contributions!
>
> Regards,
> Kundera Team.
>


Re: Cassandra dying when gets many deletes

2012-04-23 Thread Віталій Тимчишин
See https://issues.apache.org/jira/browse/CASSANDRA-3741
I did post a fix there that helped me.

2012/4/24 crypto five 

> Hi,
>
> I have 50 million rows in a column family on a 4 GB RAM box.
> [...]
>
> Thank you.



-- 
Best regards,
 Vitalii Tymchyshyn


Highest and lowest valid values for UUIDs/TimeUUIDs

2012-04-23 Thread Drew Kutcharian
Hi All,

Considering that UUIDs are compared as numbers in Java [1], what are the lowest 
and highest possible values a valid UUID can have? How about TimeUUIDs?

The reason I ask is that I would like to pick a "default" UUID value in a 
composite column definition like Composite(UUID1, UUID2) where UUID1 can be set 
to the default value if not supplied. In addition, it'd be nice if the 
"default" columns are always sorted before the rest of the columns.

I was thinking of just doing "new UUID(Long.MAX_VALUE, Long.MAX_VALUE)" or "new 
UUID(Long.MIN_VALUE, Long.MIN_VALUE)" but not sure if that's going to cause 
other issues that I'm not aware of.

Thanks,

Drew


[1] Here's the compareTo of java.util.UUID as a reference:

public int compareTo(UUID val) {
    // The ordering is intentionally set up so that the UUIDs
    // can simply be numerically compared as two numbers
    return (this.mostSigBits < val.mostSigBits ? -1 :
            (this.mostSigBits > val.mostSigBits ? 1 :
             (this.leastSigBits < val.leastSigBits ? -1 :
              (this.leastSigBits > val.leastSigBits ? 1 :
               0))));
}
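
Under that comparison, new UUID(Long.MIN_VALUE, Long.MIN_VALUE) and new
UUID(Long.MAX_VALUE, Long.MAX_VALUE) are indeed the extremes, but only for
Java's ordering; note that Cassandra's UUIDType and TimeUUIDType comparators
order the bytes differently (TimeUUIDType compares the embedded timestamp
first), so the server-side extremes are not necessarily the same. A quick
check of the Java side:

import java.util.UUID;

// Demonstrates the extremes under java.util.UUID's signed two-long comparison.
public class UuidExtremes {
    public static void main(String[] args) {
        UUID lowest  = new UUID(Long.MIN_VALUE, Long.MIN_VALUE);
        UUID highest = new UUID(Long.MAX_VALUE, Long.MAX_VALUE);
        UUID random  = UUID.randomUUID();
        System.out.println(lowest.compareTo(random));   // -1: sorts before any version-4 UUID
        System.out.println(highest.compareTo(random));  //  1: sorts after any version-4 UUID
    }
}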



Re: Cassandra dying when gets many deletes

2012-04-23 Thread crypto five
Thank you Vitalii.

Looking at Jonathan's answer to your patch, I think it's probably not my
case. I see that the liveRatio is calculated in my case, but the calculations
look strange:

WARN [MemoryMeter:1] 2012-04-23 23:29:48,430 Memtable.java (line 181)
setting live ratio to maximum of 64 instead of Infinity
 INFO [MemoryMeter:1] 2012-04-23 23:29:48,432 Memtable.java (line 186)
CFS(Keyspace='lexems', ColumnFamily='countersCF') liveRatio is 64.0
(just-counted was 64.0).  calculation took 63355ms for 0 columns

Looking at the comment in the code ("If it gets higher than 64 something
is probably broken."), it looks like that's probably the problem.
I'm not sure how to investigate it.

2012/4/23 Віталій Тимчишин 

> See https://issues.apache.org/jira/browse/CASSANDRA-3741
> I did post a fix there that helped me.
> [...]
>
> --
> Best regards,
>  Vitalii Tymchyshyn