Re: Understanding Virtual Nodes on Cassandra 1.2

2013-01-29 Thread aaron morton
> After I searched some documents on the Datastax website and some old tickets, it seems 
> that it works for the random partitioner only, and leaves the order-preserving 
> partitioner out of luck.
Links ? 

>   or allow add Virtual Nodes manually?
I've not looked into it, but there is a cassandra.initial_token startup param that 
takes a comma-separated list of tokens for the node.

There also appears to be support for the ordered partitioners to generate random 
tokens. 

But you would still have the problem of having to balance your row keys around 
the token space. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 10:31 AM, Zhong Li  wrote:

> Hi All,
> 
> Virtual Nodes is a great feature. After I searched some documents on the Datastax 
> website and some old tickets, it seems that it works for the random partitioner only, 
> and leaves the order-preserving partitioner out of luck. I may have misunderstood, 
> please correct me. If it doesn't love the order-preserving partitioner, would it be 
> possible to add support for multiple initial_token(s) for the order-preserving 
> partitioner, or to allow adding Virtual Nodes manually? 
> 
> Thanks,
> 
> Zhong



Re: Cass returns Incorrect column data on writes during flushing

2013-01-29 Thread aaron morton
> Ie. Query for a single column works but the column does not appear in slice 
> queries depending on the other columns in the query
> 
> cfq.getKey("foo").getColumn("A") returns "A"
> cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
> cfq.getKey("foo").withColumnSlice("A","B","C") returns "A","B" and "C"
Can you replicate this using cassandra-cli or CQL ? 
That makes it clearer what's happening and removes any potential issues with the 
client or your code.
If you cannot repro it, show us your Astyanax code.
 
Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 1:15 PM, Elden Bishop  wrote:

> I'm trying to track down some really worrying behavior. It appears that 
> writing multiple columns while a table flush is occurring can result in 
> Cassandra recording its data in a way that makes columns visible only to some 
> queries but not others.
> 
> Ie. Query for a single column works but the column does not appear in slice 
> queries depending on the other columns in the query
> 
> cfq.getKey("foo").getColumn("A") returns "A"
> cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
> cfq.getKey("foo").withColumnSlice("A","B","C") returns "A","B" and "C"
> 
> This is a permanent condition meaning that even hours later with no reads or 
> writes the DB will return the same results. I can reproduce this 100% of the 
> time by writing multiple columns and then reading a different set of multiple 
> columns. Columns written during the flush may or may not appear.
> 
> Details
> 
> # There are no log errors
> # All single column queries return correct data.
> # Slice queries may or may not return the column depending on which other 
> columns are in the query.
> # This is on a stock "unzip and run" installation of Cassandra using default 
> options only; basically doing the cassandra getting started tutorial and 
> using the Demo table described in that tutorial.
> # Cassandra 1.2.0 using Astynax and Java 1.6.0_37.
> # There are no errors but there is always a "flushing high traffic column 
> family" that happens right before the incoherent state occurs
> # To reproduce, just update multiple columns at the same time, using random 
> rows, and then verify the writes by reading multiple columns; a simplified sketch of 
> the loop is below. I can generate the error on 100% of runs. Once the state is screwed 
> up, the multi-column read will not contain the column but the single-column read will.
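> 
> A simplified sketch of the kind of loop that triggers it (Astyanax 1.x API; column
> family setup and error handling are elided, the names here are illustrative rather
> than my actual code, and assume it runs in a method that declares throws
> ConnectionException):
> 
> java.util.Random random = new java.util.Random();
> for (int i = 0; i < 100000; i++) {
>     String row = "row-" + random.nextInt(1000);
> 
>     // write several columns to the row in one mutation
>     MutationBatch m = keyspace.prepareMutationBatch();
>     ColumnListMutation<String> rowMutation = m.withRow(CF_TEST, row);
>     rowMutation.putColumn("A", "A", null);
>     rowMutation.putColumn("B", "B", null);
>     rowMutation.putColumn("C", "C", null);
>     m.execute();
> 
>     // verify with a multi-column slice; single-column reads always look correct
>     ColumnList<String> cols = keyspace.prepareQuery(CF_TEST)
>             .getKey(row)
>             .withColumnSlice("A", "B")
>             .execute()
>             .getResult();
>     if (cols.getColumnByName("A") == null) {
>         System.out.println("column A missing from slice for row " + row);
>     }
> }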
> 
> Log snippet
>  INFO 15:47:49,066 GC for ParNew: 320 ms for 1 collections, 20712 used; 
> max is 1052770304
>  INFO 15:47:58,076 GC for ParNew: 330 ms for 1 collections, 232839680 used; 
> max is 1052770304
>  INFO 15:48:00,374 flushing high-traffic column family CFS(Keyspace='BUGS', 
> ColumnFamily='Test') (estimated 50416978 bytes)
>  INFO 15:48:00,374 Enqueuing flush of 
> Memtable-Test@1575891161(4529586/50416978 serialized/live bytes, 279197 ops)
>  INFO 15:48:00,378 Writing Memtable-Test@1575891161(4529586/50416978 
> serialized/live bytes, 279197 ops)
>  INFO 15:48:01,142 GC for ParNew: 654 ms for 1 collections, 239478568 used; 
> max is 1052770304
>  INFO 15:48:01,474 Completed flushing 
> /var/lib/cassandra/data/BUGS/Test/BUGS-Test-ia-45-Data.db (4580066 bytes) for 
> commitlog position ReplayPosition(segmentId=1359415964165, position=7462737)
> 
> 
> Any ideas on what could be going on? I could not find anything like this in 
> the open bugs and the only workaround seems to be never doing multi-column 
> reads or writes. I'm concerned that the DB can get into a state where 
> different queries can return such inconsistent results. All with no warning 
> or errors. There is no way to even verify data correctness; every column can 
> seem correct when queried and then disappear during slice queries depending 
> on the other columns in the query.
> 
> 
> Thanks



RE: getting error for decimal type data

2013-01-29 Thread Rishabh Agrawal
Can you provide the specs of the column family using describe?

From: Kuldeep Mishra [mailto:kuld.cs.mis...@gmail.com]
Sent: Tuesday, January 29, 2013 12:37 PM
To: user@cassandra.apache.org
Subject: getting error for decimal type data

While I am trying to list column family data using cassandra-cli, I am 
getting the following problem for decimal-type data;
any suggestion will be appreciated.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:80)
at java.math.BigDecimal.getValueString(BigDecimal.java:2885)
at java.math.BigDecimal.toPlainString(BigDecimal.java:2869)
at org.apache.cassandra.cql.jdbc.JdbcDecimal.getString(JdbcDecimal.java:72)
at 
org.apache.cassandra.db.marshal.DecimalType.getString(DecimalType.java:62)
at org.apache.cassandra.cli.CliClient.printSliceList(CliClient.java:2873)
at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1486)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
at 
org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:210)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:337)


--
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199










Re: Cassandra pending compaction tasks keeps increasing

2013-01-29 Thread Wei Zhu
Thanks for the reply. Here is some information:

Do you have wide rows ? Are you seeing logging about "Compacting wide rows" ? 

* I don't see any log about "wide rows"

Are you seeing GC activity logged or seeing CPU steal on a VM ? 

* There is some GC, but CPU general is under 20%. We have heap size of 8G, RAM 
is at 72G.

Have you tried disabling multithreaded_compaction ? 

* By default, it's disabled. We enabled it, but didn't see much difference; it was 
even a little slower with it enabled. Is it bad to enable it? We have SSDs, and 
according to the comment in the yaml, it should help when using SSDs.

Are you using Key Caches ? Have you tried disabling 
compaction_preheat_key_cache? 

* We have fairly big key caches; we set them to 10% of the heap, which is 800M. Yes, 
compaction_preheat_key_cache is disabled. 

Can you enabled DEBUG level logging and make them available ? 

* Will try it tomorrow. Do I need to restart the server to change the log level?


-Wei

- Original Message -
From: "aaron morton" 
To: user@cassandra.apache.org
Sent: Monday, January 28, 2013 11:31:42 PM
Subject: Re: Cassandra pending compaction tasks keeps increasing







* Why nodetool repair increases the data size that much? It's not likely that 
much data needs to be repaired. Will that happen for all the subsequent repair? 
Repair only detects differences in entire rows. If you have very wide rows then 
small differences in rows can result in a large amount of streaming. 
Streaming creates new SSTables on the receiving side, which then need to be 
compacted. So repair often results in compaction doing its thing for a while. 








* How to make LCS run faster? After almost a day, the LCS tasks only dropped by 
1000. I am afraid it will never catch up. We set 


This is going to be tricky to diagnose, sorry for asking silly questions... 


Do you have wide rows ? Are you seeing logging about "Compacting wide rows" ? 
Are you seeing GC activity logged or seeing CPU steal on a VM ? 
Have you tried disabling multithreaded_compaction ? 
Are you using Key Caches ? Have you tried disabling 
compaction_preheat_key_cache? 
Can you enabled DEBUG level logging and make them available ? 


Cheers 








- 
Aaron Morton 
Freelance Cassandra Developer 
New Zealand 


@aaronmorton 
http://www.thelastpickle.com 


On 29/01/2013, at 8:59 AM, Derek Williams < de...@fyrie.net > wrote: 



I could be wrong about this, but when repair is run, it isn't just values that 
are streamed between nodes, it's entire sstables. This causes a lot of data that 
was already correct on the node to be written again as duplicates, which then 
need to be compacted away. 


As for speeding it up, no idea. 



On Mon, Jan 28, 2013 at 12:16 PM, Wei Zhu < wz1...@yahoo.com > wrote: 


Any thoughts? 


Thanks. 
-Wei 

- Original Message - 

From: "Wei Zhu" < wz1...@yahoo.com > 
To: user@cassandra.apache.org 

Sent: Friday, January 25, 2013 10:09:37 PM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 



To recap the problem, 
1.1.6 on SSD, 5 nodes, RF = 3, one CF only. 
After the data load, initially all 5 nodes have very even data sizes (135G each). I 
ran nodetool repair -pr on node 1, which has replicas on node 2 and node 3 since 
we set RF = 3. 
It appears that a huge amount of data got transferred. Node 1 has 220G, and nodes 2 
and 3 have around 170G. Pending LCS tasks on node 1 are 15K, and nodes 2 and 3 have 
around 7K each. 
Questions: 

* Why nodetool repair increases the data size that much? It's not likely that 
much data needs to be repaired. Will that happen for all the subsequent repair? 
* How to make LCS run faster? After almost a day, the LCS tasks only dropped by 
1000. I am afraid it will never catch up. We set 


* compaction_throughput_mb_per_sec = 500 
* multithreaded_compaction: true 



Both Disk and CPU util are less than 10%. I understand LCS is single threaded, 
any chance to speed it up? 


* We use the default SSTable size of 5M. Will increasing the SSTable size help? 
What will happen if I change the setting after the data is loaded? 


Any suggestion is very much appreciated. 

-Wei 


- Original Message - 

From: "Wei Zhu" < wz1...@yahoo.com > 
To: user@cassandra.apache.org 

Sent: Thursday, January 24, 2013 11:46:04 PM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 

I believe I am running into this one: 

https://issues.apache.org/jira/browse/CASSANDRA-4765 

By the way, I am using 1.1.6 (I thought I was using 1.1.7) and this one is fixed 
in 1.1.7. 



- Original Message - 

From: "Wei Zhu" < wz1...@yahoo.com > 
To: user@cassandra.apache.org 
Sent: Thursday, January 24, 2013 11:18:59 PM 
Subject: Re: Cassandra pending compaction tasks keeps increasing 

Thanks Derek, 
in the cassandra-env.sh, it says 

# reduce the per-thread stack size to minimize the impact of Thrift 
# thread-per-client. (Best practice is for client connections to 
# be pooled anyway.) Only do so on Linux where it is known to be 
# suppor

ConfigHelper.setThriftContact() undefined in cassandra v1.2

2013-01-29 Thread Tejas Patil
I am trying out the example given in Cassandra Definitive guide, Ch 12.
This statement gives an error and I am not able to figure out the replacement
for it:
*ConfigHelper.setThriftContact(job.getConfiguration(), "localhost",  9160);*

Also,

*IColumn column = columns.get(columnName.getBytes());*
*String value = new String(column.value());*

column.value() gives a compilation error. Any solutions?
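
My best guess so far (unverified; the method names below are an assumption from
browsing the 1.2-era ConfigHelper and may not match your jar) is that the single
setThriftContact() call has been split into separate input address and port setters:

*ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost");*
*ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");*

Can someone confirm whether that is the intended replacement?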

Thanks,
Tejas Patil


Re: getting error for decimal type data

2013-01-29 Thread Kuldeep Mishra
ColumnFamily: STUDENT
  Key Validation Class: org.apache.cassandra.db.marshal.LongType
  Default column value validator:
org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: [STUDENT.STUDENT_AGE_idx,
STUDENT.STUDENT_BIG_DECIMAL_idx, STUDENT.STUDENT_PERCENTAGE_idx,
STUDENT.STUDENT_ROLL_NUMBER_idx, STUDENT.STUDENT_SEMESTER_idx,
STUDENT.STUDENT_STUDENT_NAME_idx, STUDENT.STUDENT_UNIQUE_ID_idx]
  Column Metadata:
Column Name: PERCENTAGE
  Validation Class: org.apache.cassandra.db.marshal.FloatType
  Index Name: STUDENT_PERCENTAGE_idx
  Index Type: KEYS
Column Name: AGE
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Name: STUDENT_AGE_idx
  Index Type: KEYS
Column Name: SEMESTER
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Name: STUDENT_SEMESTER_idx
  Index Type: KEYS
Column Name: ROLL_NUMBER
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: STUDENT_ROLL_NUMBER_idx
  Index Type: KEYS
Column Name: UNIQUE_ID
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: STUDENT_UNIQUE_ID_idx
  Index Type: KEYS
Column Name: STUDENT_NAME
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Name: STUDENT_STUDENT_NAME_idx
  Index Type: KEYS
*Column Name: BIG_DECIMAL
  Validation Class: org.apache.cassandra.db.marshal.DecimalType
  Index Name: STUDENT_BIG_DECIMAL_idx
  Index Type: KEYS*
  Compaction Strategy:
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
sstable_compression:
org.apache.cassandra.io.compress.SnappyCompressor


The value of *BIG_DECIMAL* is 2.28542855225E-825373481
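
For what it's worth, a decimal with an exponent that large cannot be printed in
plain (non-scientific) notation without building a string of roughly 825 million
characters, which looks consistent with the OutOfMemoryError coming from
BigDecimal.toPlainString() in the stack trace above. A standalone sketch (my own
illustration, not Cassandra code) reproduces the same failure with the value
quoted above:

import java.math.BigDecimal;

public class PlainStringBlowup {
    public static void main(String[] args) {
        // same value as the BIG_DECIMAL cell above
        BigDecimal d = new BigDecimal("2.28542855225E-825373481");
        // toPlainString() has to materialise ~825 million leading zeros,
        // so it throws OutOfMemoryError on any normal heap size
        System.out.println(d.toPlainString().length());
    }
}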



Thanks
Kuldeep

On Tue, Jan 29, 2013 at 1:52 PM, Rishabh Agrawal <
rishabh.agra...@impetus.co.in> wrote:

>  Can you provide specs of the column family using describe.
>
>
>
> *From:* Kuldeep Mishra [mailto:kuld.cs.mis...@gmail.com]
> *Sent:* Tuesday, January 29, 2013 12:37 PM
> *To:* user@cassandra.apache.org
> *Subject:* getting error for decimal type data
>
>
>
> while I an trying to list column family data using cassandra-cli then I am
> getting following problem for decimal type data,
> any suggestion will be appreciated.
>
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at
> java.lang.AbstractStringBuilder.(AbstractStringBuilder.java:45)
> at java.lang.StringBuilder.(StringBuilder.java:80)
> at java.math.BigDecimal.getValueString(BigDecimal.java:2885)
> at java.math.BigDecimal.toPlainString(BigDecimal.java:2869)
> at
> org.apache.cassandra.cql.jdbc.JdbcDecimal.getString(JdbcDecimal.java:72)
> at
> org.apache.cassandra.db.marshal.DecimalType.getString(DecimalType.java:62)
> at
> org.apache.cassandra.cli.CliClient.printSliceList(CliClient.java:2873)
> at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1486)
> at
> org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
> at
> org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:210)
> at org.apache.cassandra.cli.CliMain.main(CliMain.java:337)
>
>
> --
> Thanks and Regards
> Kuldeep Kumar Mishra
> +919540965199
>
> --
>
>
>
>
>
>
>



-- 
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199


RE: getting error for decimal type data

2013-01-29 Thread Rishabh Agrawal
Did you try accessing this CF from CQL? I think it should work from there. Also try 
accessing it through any API and see if the error persists.

Thanks
Rishabh  Agrawal
From: Kuldeep Mishra [mailto:kuld.cs.mis...@gmail.com]
Sent: Tuesday, January 29, 2013 2:51 PM
To: user@cassandra.apache.org
Subject: Re: getting error for decimal type data

ColumnFamily: STUDENT
  Key Validation Class: org.apache.cassandra.db.marshal.LongType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
  GC grace seconds: 864000
  Compaction min/max thresholds: 4/32
  Read repair chance: 0.1
  DC Local Read repair chance: 0.0
  Replicate on write: true
  Caching: KEYS_ONLY
  Bloom Filter FP chance: default
  Built indexes: [STUDENT.STUDENT_AGE_idx, STUDENT.STUDENT_BIG_DECIMAL_idx, 
STUDENT.STUDENT_PERCENTAGE_idx, STUDENT.STUDENT_ROLL_NUMBER_idx, 
STUDENT.STUDENT_SEMESTER_idx,  STUDENT.STUDENT_STUDENT_NAME_idx, 
STUDENT.STUDENT_UNIQUE_ID_idx]
  Column Metadata:
Column Name: PERCENTAGE
  Validation Class: org.apache.cassandra.db.marshal.FloatType
  Index Name: STUDENT_PERCENTAGE_idx
  Index Type: KEYS
Column Name: AGE
  Validation Class: org.apache.cassandra.db.marshal.IntegerType
  Index Name: STUDENT_AGE_idx
  Index Type: KEYS
Column Name: SEMESTER
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Name: STUDENT_SEMESTER_idx
  Index Type: KEYS
Column Name: ROLL_NUMBER
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: STUDENT_ROLL_NUMBER_idx
  Index Type: KEYS
Column Name: UNIQUE_ID
  Validation Class: org.apache.cassandra.db.marshal.LongType
  Index Name: STUDENT_UNIQUE_ID_idx
  Index Type: KEYS
Column Name: STUDENT_NAME
  Validation Class: org.apache.cassandra.db.marshal.UTF8Type
  Index Name: STUDENT_STUDENT_NAME_idx
  Index Type: KEYS
Column Name: BIG_DECIMAL
  Validation Class: org.apache.cassandra.db.marshal.DecimalType
  Index Name: STUDENT_BIG_DECIMAL_idx
  Index Type: KEYS
  Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
  Compression Options:
sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor


The value of BIG_DECIMAL is 2.28542855225E-825373481



Thanks
Kuldeep
On Tue, Jan 29, 2013 at 1:52 PM, Rishabh Agrawal 
mailto:rishabh.agra...@impetus.co.in>> wrote:
Can you provide specs of the column family using describe.

From: Kuldeep Mishra 
[mailto:kuld.cs.mis...@gmail.com]
Sent: Tuesday, January 29, 2013 12:37 PM
To: user@cassandra.apache.org
Subject: getting error for decimal type data

while I an trying to list column family data using cassandra-cli then I am 
getting following problem for decimal type data,
any suggestion will be appreciated.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.AbstractStringBuilder.(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.(StringBuilder.java:80)
at java.math.BigDecimal.getValueString(BigDecimal.java:2885)
at java.math.BigDecimal.toPlainString(BigDecimal.java:2869)
at org.apache.cassandra.cql.jdbc.JdbcDecimal.getString(JdbcDecimal.java:72)
at 
org.apache.cassandra.db.marshal.DecimalType.getString(DecimalType.java:62)
at org.apache.cassandra.cli.CliClient.printSliceList(CliClient.java:2873)
at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1486)
at 
org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
at 
org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:210)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:337)


--
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199











--
Thanks and Regards
Kuldeep Kumar Mishra
+919540965199









Re: Cassandra timeout whereas it is not much busy

2013-01-29 Thread Nicolas Lalevée

On 29 Jan 2013, at 08:08, aaron morton wrote:

>> From what I could read there seems to be a contention issue around the 
>> flushing (the "switchlock" ?). Cassandra would then be slow, but not using 
>> the entire cpu. I would be in the strange situation I was where I reported 
>> my issue in this thread.
>> Does my theory make sense?
> If you are seeing contention around the switch lock you will see a pattern in 
> the logs where a "Writing…" message is immediately followed by an "Enqueing…" 
> message. This happens when the flush_queue is full and the thread flushing 
> (either because of memory, commit log or snapshot etc) is waiting. 
> 
> See the comments for memtable_flush_queue_size in the yaml file. 
> 
> If you increase the value you will flush more frequently, as C* leaves room in 
> memory to handle the case where the queue is full. 
> 
> If you have spare IO you could consider increasing memtable_flush_writers

ok. I see.

I think that the RAM upgrade will fix most of my issues. But if I come to see 
that situation again, I'll definitely look into tuning memtable_flush_writers.

Thanks for your help.

Nicolas

> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/01/2013, at 4:19 AM, Nicolas Lalevée  wrote:
> 
>> I did some testing, I have a theory.
>> 
>> First, it seems we have "a lot" of CFs, and two are particularly hungry for 
>> RAM, consuming quite a big amount of RAM for the bloom filters. Cassandra 
>> does not force the flush of the memtables if it has more than 6G of Xmx 
>> (luckily for us, this is the maximum reasonable we can give).
>> Since our machines have 8G, this leaves quite little room for the disk 
>> cache. Thanks to this systemtap script [1], I have seen that the hit ratio 
>> is about 10%.
>> 
>> Then I tested with an Xmx at 4G. So %wa drops down. The disk cache hit 
>> ratio rises to 80%. On the other hand, flushing is happening very often. I 
>> cannot say how often, since I have too many CFs to graph them all. But of the 
>> ones I graph, none of their memtables goes above 10M, whereas they usually go 
>> up to 200M.
>> 
>> I have not tested further, since it is quite obvious that the machines need 
>> more RAM. And they're about to receive more.
>> 
>> But I guess that if I had to put more write and read pressure, with still an 
>> xmx at 4G, the %wa would still be quite low, but the flushing would be even 
>> more intensive. And I guess that it would go wrong. From what I could read 
>> there seems to be a contention issue around the flushing (the "switchlock" 
>> ?). Cassandra would then be slow, but not using the entire cpu. I would be 
>> in the strange situation I was where I reported my issue in this thread.
>> Does my theory makes sense ?
>> 
>> Nicolas
>> 
>> [1] http://sourceware.org/systemtap/wiki/WSCacheHitRate
>> 
>> On 23 Jan 2013, at 18:35, Nicolas Lalevée wrote:
>> 
>>> On 22 Jan 2013, at 21:50, Rob Coli wrote:
>>> 
 On Wed, Jan 16, 2013 at 1:30 PM, Nicolas Lalevée
  wrote:
> Here is the long story.
> After some long useless staring at the monitoring graphs, I gave a try to
> using the openjdk 6b24 rather than openjdk 7u9
 
 OpenJDK 6 and 7 are both counter-recommended with regards to
 Cassandra. I've heard reports of mysterious behavior like the behavior
 you describe, when using OpenJDK 7.
 
 Try using the Sun/Oracle JVM? Is your JNA working?
>>> 
>>> JNA is working.
>>> I tried both oracle-jdk6 and oracle-jdk7, no difference with openjdk6. And 
>>> since ubuntu is only maintaining openjdk, we'll stick with it until 
>>> oracle's one proven better.
>>> oracle vs openjdk, I tested for now under "normal" pressure though.
>>> 
>>> What amazes me is that no matter how much I google it and ask around, I still 
>>> don't know for sure the difference between the openjdk and oracle's jdk…
>>> 
>>> Nicolas
>>> 
>> 
> 



Re: problem with Cassandra map-reduce support

2013-01-29 Thread Brian Jeltema
In hadoop-0.20.2, org.apache.hadoop.mapreduce.JobContext is a class. Looks like 
in hadoop-0.21+ JobContext has morphed into an interface.
I'd guess that Hadoop support in Cassandra is based on the older Hadoop.
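
A quick way to confirm which form is on the runtime classpath (just a reflection
check, nothing Cassandra-specific; assumes only that the Hadoop jars are on the
classpath):

public class CheckJobContext {
    public static void main(String[] args) throws Exception {
        Class<?> jc = Class.forName("org.apache.hadoop.mapreduce.JobContext");
        // Hadoop 0.20.x/1.x ships JobContext as a class; 0.21+ ships it as an interface.
        System.out.println(jc.getName() + " is "
                + (jc.isInterface() ? "an interface" : "a class"));
    }
}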

Brian

On Jan 29, 2013, at 3:42 AM, Tejas Patil wrote:

> I am trying to run a map-reduce job to read data from Cassandra v1.2.0.
> I started off with the code here:
> https://svn.apache.org/repos/asf/cassandra/trunk/examples/hadoop_word_count/src/WordCount.java
> 
> While running it over hadoop-0.22.0, I get this:
> 
> Exception in thread "main" java.lang.IncompatibleClassChangeError: Found 
> interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>   at 
> org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
>   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
>   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
>   at MyHadoopApp.run(MyHadoopApp.java:163)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>   at MyHadoopApp.main(MyHadoopApp.java:82)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
> 
> Does anyone knows about this ?
> 
> Thanks,
> Tejas Patil



Re: JDBC, Select * Cql2 vs Cql3 problem ?

2013-01-29 Thread Andy Cobley
When connecting to Cassandra 1.2.0 from CQLSH the table was created with:

CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
'replication_factor' : 1};
cqlsh> use test;
cqlsh:test> create columnfamily users (KEY varchar Primary key, password 
varchar, gender varchar) ;
cqlsh:test> INSERT INTO users (KEY, password) VALUES ('jsmith', 'ch@ngem3a');
cqlsh:test> INSERT INTO users (KEY, gender) VALUES ('jbrown', 'male');

The stack trace (generated by et.printStackTrace()) is:

Can not execute statement java.lang.NullPointerException
at org.apache.cassandra.cql.jdbc.TypedColumn.<init>(TypedColumn.java:45)
at 
org.apache.cassandra.cql.jdbc.CassandraResultSet.createColumn(CassandraResultSet.java:972)
at 
org.apache.cassandra.cql.jdbc.CassandraResultSet.populateColumns(CassandraResultSet.java:156)
at 
org.apache.cassandra.cql.jdbc.CassandraResultSet.<init>(CassandraResultSet.java:130)
at 
org.apache.cassandra.cql.jdbc.CassandraStatement.doExecute(CassandraStatement.java:167)
at 
org.apache.cassandra.cql.jdbc.CassandraStatement.executeQuery(CassandraStatement.java:227)
at uk.ac.dundee.computing.aec.test.test.doGet(test.java:51)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:728)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:305)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:680)

Hope that helps !

Andy


On 29 Jan 2013, at 07:17, aaron morton  wrote:

> What is your table spec ? 
> Do you have the full stack trace from the exception ? 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/01/2013, at 8:15 AM, Andy Cobley  wrote:
> 
>> I have the following code in my app using the JDBC 
>> (cassandra-jdbc-1.1.2.jar) drivers to CQL:
>> 
>> try {
>>  rs= stmt.executeQuery("SELECT * FROM users");
>> }catch(Exception et){
>>  System.out.println("Can not execute statement "+et);
>> }
>> 
>> When connecting to a CQL2 server (cassandra 1.1.5) the code works as 
>> expected returning a result set .  When connecting to CQL3 (Cassandra 1.2) I 
>> catch the following exception:
>> 
>> Can not execute statement java.lang.NullPointerException
>> 
>> The Select statement (Select * from users) does work from CQLSH as expected. 
>>  Is there a problem with my code or something else ?
>> 
>> Andy C
>> School of Computing
>> University of Dundee.
>> 
>> 
>> 
>> The University of Dundee is a Scottish Registered Charity, No. SC015096.
> 


The University of Dundee is a Scottish Registered Charity, No. SC015096.




Re: JNA not found.

2013-01-29 Thread chandra Varahala
I think you need the jna jar and the jna-platform jar in the cassandra lib folder.

-chandra



On Mon, Jan 28, 2013 at 10:02 PM, Tim Dunphy  wrote:

> I went to github to try to download jna again. I downloaded version 3.5.1
>
> [root@cassandra-node01 cassandrahome]# ls -l lib/jna-3.5.1.jar
> -rw-r--r-- 1 root root 692603 Jan 28 21:57 lib/jna-3.5.1.jar
>
> I noticed in the datastax docs that java 7 was not recommended so I
> downgraded to java 6
>
> [root@cassandra-node01 cassandrahome]# java -version
> java version "1.6.0_34"
> Java(TM) SE Runtime Environment (build 1.6.0_34-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 20.9-b04, mixed mode)
>
> And now if I try to start cassandra with that library it fails with this
> message:
>
> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>  INFO 22:00:14,318 Logging initialized
>  INFO 22:00:14,333 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
> VM/1.6.0_34
>  INFO 22:00:14,334 Heap size: 301727744/302776320
>  INFO 22:00:14,334 Classpath:
> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna-3.5.1.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
> Killed
>
> I moved the library back out of the lib directory and cassandra starts
> again, albeit without JNA working, quite naturally.
>
>
> Both my cassandra and java installs are tarball installs.
>
> Thanks
> Tim
>
>
> On Mon, Jan 28, 2013 at 6:29 PM, Tim Dunphy  wrote:
>
>> Hey List,
>>
>>  I just downloaded 1.2.1 and have set it up across my cluster, when I
>> noticed the following notice:
>>
>>  INFO 18:14:53,828 JNA not found. Native methods will be disabled.
>>
>> So I downloaded jna.jar from git hub and moved it to the cassandra /lib
>> directory. I changed mod to 755 as per the datastax docs. I've also tried
>> installing the jna package (via yum, I am using centos 6.2). Nothing seems
>> to do the trick, I keep getting this message. What can I do to get
>> cassandra 1.2.1 to recognize JNA?
>>
>> Thanks
>> Tim
>> --
>> GPG me!!
>>
>> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>>
>>
>
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
>
>


SStable Writer and composite key

2013-01-29 Thread POUGET Laurent
Hi,


I have some trouble requesting my data. I use SSTableSimpleUnsortedWriter to 
write SSTables. Writing and importing work fine.
I think I'm misusing CompositeType.Builder with SSTableSimpleUnsortedWriter.
Do you have any idea ?

Thanks

Here is my case :

/**
* CREATE STATEMENT
*/

CREATE TABLE raw_data (
  id text,
  date text,
  request text,
  data1 text,
  data2 text,
  PRIMARY KEY (id, date, request)
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};

/**
* JAVA CODE
*/

List<AbstractType<?>> compositeList = new ArrayList<AbstractType<?>>();

compositeList.add(UTF8Type.instance);
compositeList.add(UTF8Type.instance);

IPartitioner partitioner = StorageService.getPartitioner();
dir = Directories.create(keyspace.getKeyspaceName(), 
columnFamily.getName()).getDirectoryForNewSSTables(0);

simpleUnsortedWriter = new SSTableSimpleUnsortedWriter(dir, partitioner,
        keyspace.getKeyspaceName(), columnFamily.getName(),
        UTF8Type.instance, null, 32);

CompositeType.Builder builderRequestDate = new CompositeType.Builder(CompositeType.getInstance(compositeList));
CompositeType.Builder builderUrl = new CompositeType.Builder(CompositeType.getInstance(compositeList));

simpleUnsortedWriter.newRow(bytes(id));

builderRequestDate.add(bytes("date"));
builderRequestDate.add(bytes("request"));

long timestamp = System.currentTimeMillis() * 1000;

simpleUnsortedWriter.addColumn(builderRequestDate.build(), bytes(date), timestamp);
simpleUnsortedWriter.addColumn(builderUrl.build(), bytes(request), timestamp);

simpleUnsortedWriter.addColumn(bytes("data1"), bytes(data1), timestamp);
simpleUnsortedWriter.addColumn(bytes("data2"), bytes(data2), timestamp);

simpleUnsortedWriter.close();
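
If I understand the CQL3 storage layout correctly, each non-key column of a row
should instead be written as a single cell whose composite name is (date value,
request value, CQL3 column name), with the writer built on a three-component
comparator rather than UTF8Type.instance. Is something like the following sketch
(untested, just my understanding, variable setup as above, Arrays from java.util)
closer to what is expected?

CompositeType comparator = CompositeType.getInstance(Arrays.<AbstractType<?>>asList(
        UTF8Type.instance, UTF8Type.instance, UTF8Type.instance));

// writer created with the composite comparator instead of UTF8Type.instance
SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(dir, partitioner,
        keyspace.getKeyspaceName(), columnFamily.getName(), comparator, null, 32);

long timestamp = System.currentTimeMillis() * 1000;
writer.newRow(bytes(id));

// one cell per non-key column: clustering values first, CQL3 column name last
CompositeType.Builder data1Name = new CompositeType.Builder(comparator);
data1Name.add(bytes(date));
data1Name.add(bytes(request));
data1Name.add(bytes("data1"));
writer.addColumn(data1Name.build(), bytes(data1), timestamp);

CompositeType.Builder data2Name = new CompositeType.Builder(comparator);
data2Name.add(bytes(date));
data2Name.add(bytes(request));
data2Name.add(bytes("data2"));
writer.addColumn(data2Name.build(), bytes(data2), timestamp);

writer.close();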




Laurent Pouget
Design and Development Engineer
Tel: 01.84.95.11.20

Car & Boat Media
22 Rue Joubert, 75009 Paris







Re: Understanding Virtual Nodes on Cassandra 1.2

2013-01-29 Thread Zhong Li
I misunderstood this: 
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 , especially 
"If you want to get started with vnodes on a fresh cluster, however, that is 
fairly straightforward. Just don't set the initial_token parameter in 
your conf/cassandra.yaml and instead enable the num_tokens parameter. A good 
default value for this is 256"

Also I couldn't find documentation about setting multiple tokens for 
cassandra.initial_token. 

Anyway, I just tested it; it does work to set a comma-separated list of tokens. 

Thanks,

Zhong


On Jan 29, 2013, at 3:06 AM, aaron morton wrote:

>> After I searched some document on Datastax website and some old ticket, 
>> seems that it works for random partitioner only, and leaves order preserved 
>> partitioner out of the luck.
> Links ? 
> 
>>   or allow add Virtual Nodes manually?
> If not looked into it but there is a cassandra.inital_token startup param 
> that takes a comma separated list of tokens for the node.
> 
> There also appears to be support for the ordered partitions to generate 
> random tokens. 
> 
> But you would still have the problem of having to balance your row keys 
> around the token space. 
> 
> Cheers
>  
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 29/01/2013, at 10:31 AM, Zhong Li  wrote:
> 
>> Hi All,
>> 
>> Virtual Nodes is great feature. After I searched some document on Datastax 
>> website and some old ticket, seems that it works for random partitioner 
>> only, and leaves order preserved partitioner out of the luck. I may 
>> misunderstand, please correct me. if it doesn't love order preserved 
>> partitioner, would be possible to add support multiple initial_token(s) for  
>> order preserved partitioner  or allow add Virtual Nodes manually? 
>> 
>> Thanks,
>> 
>> Zhong
> 



Uneven CPU load on a 4 node cluster

2013-01-29 Thread Jabbar
Hello,

I've been testing a four-node Cassandra 1.2 cluster (identical nodes) for a number
of days. I have written a C# client using cassandra-sharp which inserts
data into a table.

The keyspace definition is

CREATE KEYSPACE "data"
 WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};


The table definition is

CREATE TABLE datapoints (
 siteid bigint,
 time timestamp,
 channel int,
 data float,
 PRIMARY KEY ((siteid, channel),time)
)


I am finding that the CPU load on one of the servers stays at ~90% whilst
the load on the other servers stays < 40%. All the servers are supposed to
be identical.

The client library I  am using does load balancing between all nodes.

I have also used the cassandra stress tool as follows

cassandra-stress -d 192.168.21.7,192.168.21.9,192.168.21.12,192.168.21.14
--replication-factor 3 -n 1000 -t 100

and have found that  it behaves similarly.

Can somebody explain why this happens?




-- 
Thanks

 A Jabbar Azam


Re: Uneven CPU load on a 4 node cluster

2013-01-29 Thread Jabbar
Forgot to mention that I also used

ALTER KEYSPACE "Keyspace1" WITH REPLICATION =
  { 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

to change the replication factor for Keyspace1. For some reason the command
line doesn't let me change the replication factor. I get the following error:

Unable to create stress keyspace: Keyspace names must be case-insensitively
unique ("Keyspace1" conflicts with "Keyspace1")


On 29 January 2013 16:29, Jabbar  wrote:

> Hello,
>
> I've been testing a four identical node cassanda 1.2 cluster for a number
> of days. I have written a c# client using cassandra sharp() which inserts
> data into a table.
>
> The keyspace difinition is
>
> CREATE KEYSPACE "data"
>  WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'dc1' : 3};
>
>
> The table definition is
>
> CREATE TABLE datapoints (
>   siteid bigint,
>   time timestamp,
>   channel int,
>  data float,
>   PRIMARY KEY ((siteid, channel),time)
> )
>
>
> I am finding that the CPU load on one of the servers stays at ~90% whilst
> the load on the other servers stays < 40%. All the servers are supposed to
> be identical.
>
> The client library I  am using does load balancing between all nodes.
>
> I have also used the cassandra stress tool as follows
>
> cassandra-stress -d 192.168.21.7,192.168.21.9,192.168.21.12,192.168.21.14
> --replication-factor 3 -n 1000 -t 100
>
> and have found that  it behaves similarly.
>
> Can somebody explain why this happens?
>
>
>
>
> --
> Thanks
>
>  A Jabbar Azam
>



-- 
Thanks

 A Jabbar Azam


Re: Problem on node join the ring

2013-01-29 Thread Daning Wang
Thanks very much Aaron.

* Other nodes still report it is in "Joining"
* Here are bootstrap information in the log

[ca...@dsat305e.prod:/usr/local/cassy log]$ grep -i boot system.log
 INFO [main] 2013-01-28 20:16:07,488 StorageService.java (line 774)
JOINING: schema complete, ready to bootstrap
 INFO [main] 2013-01-28 20:16:07,489 StorageService.java (line 774)
JOINING: getting bootstrap token
 INFO [main] 2013-01-28 20:16:37,518 StorageService.java (line 774)
JOINING: Starting to bootstrap...

* I tried to run repair -pr, but it gives an exception:

[ca...@dsat305e.prod:/usr/local/cassy log]$ nodetool -h localhost repair -pr
Exception in thread "main" java.lang.AssertionError
at
org.apache.cassandra.locator.TokenMetadata.getToken(TokenMetadata.java:304)
at
org.apache.cassandra.service.StorageService.getPrimaryRangeForEndpoint(StorageService.java:2080)
at
org.apache.cassandra.service.StorageService.getLocalPrimaryRange(StorageService.java:211)
at
org.apache.cassandra.service.StorageService.forceTableRepairPrimaryRange(StorageService.java:1993)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:111)
at
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:45)
at
com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:235)
at
com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at
com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:250)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)



On Mon, Jan 28, 2013 at 11:55 PM, aaron morton wrote:

>  there is no streaming anymore
>
> Nodes only bootstrap once, when they are first started.
>
> I have turned on the debug, this is what it is doing now(cpu is pretty
> much idle), no any error message.
>
> Looks like it is receiving writes and reads, looks like it's part of the
> ring.
>
> Is this ring output from the Joining node or from one of the others ? Do
> the other nodes
> see this node as up or joining ?
>
> When starting the node was there a log line with "Bootstrap variables" ?
>
> Anyways I would try running a nodetool repair -pr on the joining node. If
> you are not using QUOURM / QUOURM you maybe getting inconsistent results
> now.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 29/01/2013, at 9:51 AM, Daning Wang  wrote:
>
> I added a new node to the ring (version 1.1.6); after more than 30 hours, it is
> still in the 'Joining' state
>
> Address DC  RackStatus State   Load
>  Effective-Ownership Token
>
>  141784319550391026443072753096570088105
> 10.28.78.123datacenter1 rack1   Up Normal  18.73 GB
>  50.00%  0
> 10.4.17.138 datacenter1 rack1   Up Normal  15 GB
> 39.29%  24305883351495604533098186245126300818
> 10.93.95.51 datacenter1 rack1   Up Normal  17.96 GB
>  41.67%  42535295865117307932921825928971026432
> 10.170.1.26 datacenter1 rack1   Up Joining 6.89 GB
> 0.00%   56713727820156410577229101238628035242
> 10.6.115.239datacenter1 rack1   Up Normal  20.3 GB
> 50.00%  85070591730234615865843651857942052864
> 10.28.20.200datacenter1 rack1   Up Normal  22.68 GB
>  60.71%  127605887595351923798765477786913079296
> 10.240.113.171  datacenter1 rack1   Up Normal  18.4 GB
> 58.33%  141784319550391026443072753096570088105
>
>
> Since after a while the cpu usage goes down to 0, it looks like it is stuck. I
> have restarted the server several times in the last 30 hours. When the server has
> just started, you can see streaming in 'nodetool netstats', but after a few
> minutes there is no streaming anymore.
>
> I have turned on the debug, this is what it is doing now(cpu is pretty
> much idle), no any error message.
>
> Please help, I can provide more info if needed.
>
> Thanks in advance,
>
>
> DEBUG [MutationStage:17] 2013-01-28 12:47:59,618
> RowMutationVerbHandler.java (line 44) Applying RowMutation(keyspace='dsat',
> key='52f5298affbb8bf0', modifications=[ColumnFamily(dsatcache
> [_meta:false:278@1359406079725000!3888000,])])
> DEBUG [MutationStage:17] 2013-01-28 12:47:59,618 Table.java (line 395)
> applying mutation of row 52f5298affbb8bf0
> DEBUG [MutationStage:17] 2013-01-28 12:47:59,618
> RowMutationVerbHandler.java (line 56) RowMutation(keyspace='dsat',
> key='52f5298affbb8bf0', modifications=[ColumnFamily(dsatcache
> [_

Re: JNA not found.

2013-01-29 Thread Tim Dunphy
Hi Chandra,

Thanks for your reply. Well I have added both jna.jar and platform.jar to
my lib directory (jna 3.3.0):

[root@cassandra-node01 cassandrahome]# ls -l lib/jna.jar lib/platform.jar
-rw-r--r-- 1 root root 865400 Jan 29 12:14 lib/jna.jar
-rw-r--r-- 1 root root 841291 Jan 29 12:14 lib/platform.jar

But sadly I get the same result:

[root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
-Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
 INFO 12:14:52,493 Logging initialized
 INFO 12:14:52,507 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
VM/1.6.0_34
 INFO 12:14:52,507 Heap size: 301727744/302776320
 INFO 12:14:52,508 Classpath:
/etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/platform.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
Killed

And still, when I remove those library files, cassandra starts without a
problem, except for the fact that it is not able to use JNA.

I'd appreciate any input the list might have.

Thanks
Tim

On Tue, Jan 29, 2013 at 8:54 AM, chandra Varahala <
hadoopandcassan...@gmail.com> wrote:

> I think you need Jna  jar and  jna-plaform jar in  cassandra lib folder
>
> -chandra
>
>
>
> On Mon, Jan 28, 2013 at 10:02 PM, Tim Dunphy  wrote:
>
>> I went to github to try to download jna again. I downloaded version 3.5.1
>>
>> [root@cassandra-node01 cassandrahome]# ls -l lib/jna-3.5.1.jar
>> -rw-r--r-- 1 root root 692603 Jan 28 21:57 lib/jna-3.5.1.jar
>>
>> I noticed in the datastax docs that java 7 was not recommended so I
>> downgraded to java 6
>>
>> [root@cassandra-node01 cassandrahome]# java -version
>> java version "1.6.0_34"
>> Java(TM) SE Runtime Environment (build 1.6.0_34-b04)
>> Java HotSpot(TM) 64-Bit Server VM (build 20.9-b04, mixed mode)
>>
>> And now if I try to start cassandra with that library it fails with this
>> message:
>>
>> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
>> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
>> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>>  INFO 22:00:14,318 Logging initialized
>>  INFO 22:00:14,333 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
>> VM/1.6.0_34
>>  INFO 22:00:14,334 Heap size: 301727744/302776320
>>  INFO 22:00:14,334 Classpath:
>> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-

Re: Understanding Virtual Nodes on Cassandra 1.2

2013-01-29 Thread Zhong Li
One more question: can I add a virtual node manually without rebooting and rebuilding 
a host's data?

I checked the nodetool command; there is no option to add a node.

Thanks.

Zhong 


On Jan 29, 2013, at 11:09 AM, Zhong Li wrote:

> I was misunderstood this  
> http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2 , especially 
> "If you want to get started with vnodes on a fresh cluster, however, that is 
> fairly straightforward. Just don’t set the initial_token parameter in 
> yourconf/cassandra.yaml and instead enable the num_tokens parameter. A good 
> default value for this is 256"
> 
> Also I couldn't find document about set multiple tokens for 
> cassandra.inital_token 
> 
> Anyway, I just tested, it does work to set  comma separated list of tokens. 
> 
> Thanks,
> 
> Zhong
> 
> 
> On Jan 29, 2013, at 3:06 AM, aaron morton wrote:
> 
>>> After I searched some document on Datastax website and some old ticket, 
>>> seems that it works for random partitioner only, and leaves order preserved 
>>> partitioner out of the luck.
>> Links ? 
>> 
>>>   or allow add Virtual Nodes manually?
>> If not looked into it but there is a cassandra.inital_token startup param 
>> that takes a comma separated list of tokens for the node.
>> 
>> There also appears to be support for the ordered partitions to generate 
>> random tokens. 
>> 
>> But you would still have the problem of having to balance your row keys 
>> around the token space. 
>> 
>> Cheers
>>  
>> -
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> 
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 29/01/2013, at 10:31 AM, Zhong Li  wrote:
>> 
>>> Hi All,
>>> 
>>> Virtual Nodes is great feature. After I searched some document on Datastax 
>>> website and some old ticket, seems that it works for random partitioner 
>>> only, and leaves order preserved partitioner out of the luck. I may 
>>> misunderstand, please correct me. if it doesn't love order preserved 
>>> partitioner, would be possible to add support multiple initial_token(s) for 
>>>  order preserved partitioner  or allow add Virtual Nodes manually? 
>>> 
>>> Thanks,
>>> 
>>> Zhong
>> 
> 



Re: JNA not found.

2013-01-29 Thread Jabbar
Try downloading jna-3.5.1.jar and copying into the lib directory. I made
the same mistake :)
On Jan 29, 2013 5:20 PM, "Tim Dunphy"  wrote:

> Hi Chandra,
>
> Thanks for your reply. Well I have added both jna.jar and platform.jar to
> my lib directory (jna 3.3.0):
>
> [root@cassandra-node01 cassandrahome]# ls -l lib/jna.jar lib/platform.jar
> -rw-r--r-- 1 root root 865400 Jan 29 12:14 lib/jna.jar
> -rw-r--r-- 1 root root 841291 Jan 29 12:14 lib/platform.jar
>
> But sadly I get the same result:
>
> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>  INFO 12:14:52,493 Logging initialized
>  INFO 12:14:52,507 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
> VM/1.6.0_34
>  INFO 12:14:52,507 Heap size: 301727744/302776320
>  INFO 12:14:52,508 Classpath:
> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/platform.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
> Killed
>
> And still when I remove those library files cassandra starts without a
> problem except for the fact that it is not able to use JNA.
>
> I'd appreciate any input the list might have.
>
> Thanks
> Tim
>
> On Tue, Jan 29, 2013 at 8:54 AM, chandra Varahala <
> hadoopandcassan...@gmail.com> wrote:
>
>> I think you need Jna  jar and  jna-plaform jar in  cassandra lib folder
>>
>> -chandra
>>
>>
>>
>> On Mon, Jan 28, 2013 at 10:02 PM, Tim Dunphy wrote:
>>
>>> I went to github to try to download jna again. I downloaded version 3.5.1
>>>
>>> [root@cassandra-node01 cassandrahome]# ls -l lib/jna-3.5.1.jar
>>> -rw-r--r-- 1 root root 692603 Jan 28 21:57 lib/jna-3.5.1.jar
>>>
>>> I noticed in the datastax docs that java 7 was not recommended so I
>>> downgraded to java 6
>>>
>>> [root@cassandra-node01 cassandrahome]# java -version
>>> java version "1.6.0_34"
>>> Java(TM) SE Runtime Environment (build 1.6.0_34-b04)
>>> Java HotSpot(TM) 64-Bit Server VM (build 20.9-b04, mixed mode)
>>>
>>> And now if I try to start cassandra with that library it fails with this
>>> message:
>>>
>>> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
>>> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
>>> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>>>  INFO 22:00:14,318 Logging initialized
>>>  INFO 22:00:14,333 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
>>> VM/1.6.0_34
>>>  INFO 22:00:14,334 Heap size: 301727744/302776320
>>>  INFO 22:00:14,334 Classpath:
>>> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-so

Re: JNA not found.

2013-01-29 Thread Jabbar
Oops, you've already done that. I've used the same methods for Java 6 and
Java 7.
On Jan 29, 2013 6:35 PM, "Jabbar"  wrote:

> Try downloading jna-3.5.1.jar and copying into the lib directory. I made
> the same mistake :)
> On Jan 29, 2013 5:20 PM, "Tim Dunphy"  wrote:
>
>> Hi Chandra,
>>
>> Thanks for your reply. Well I have added both jna.jar and platform.jar to
>> my lib directory (jna 3.3.0):
>>
>> [root@cassandra-node01 cassandrahome]# ls -l lib/jna.jar
>> lib/platform.jar
>> -rw-r--r-- 1 root root 865400 Jan 29 12:14 lib/jna.jar
>> -rw-r--r-- 1 root root 841291 Jan 29 12:14 lib/platform.jar
>>
>> But sadly I get the same result:
>>
>> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
>> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
>> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>>  INFO 12:14:52,493 Logging initialized
>>  INFO 12:14:52,507 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
>> VM/1.6.0_34
>>  INFO 12:14:52,507 Heap size: 301727744/302776320
>>  INFO 12:14:52,508 Classpath:
>> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/platform.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>> Killed
>>
>> And still when I remove those library files cassandra starts without a
>> problem except for the fact that it is not able to use JNA.
>>
>> I'd appreciate any input the list might have.
>>
>> Thanks
>> Tim
>>
>> On Tue, Jan 29, 2013 at 8:54 AM, chandra Varahala <
>> hadoopandcassan...@gmail.com> wrote:
>>
>>> I think you need Jna  jar and  jna-plaform jar in  cassandra lib folder
>>>
>>> -chandra
>>>
>>>
>>>
>>> On Mon, Jan 28, 2013 at 10:02 PM, Tim Dunphy wrote:
>>>
 I went to github to try to download jna again. I downloaded version
 3.5.1

 [root@cassandra-node01 cassandrahome]# ls -l lib/jna-3.5.1.jar
 -rw-r--r-- 1 root root 692603 Jan 28 21:57 lib/jna-3.5.1.jar

 I noticed in the datastax docs that java 7 was not recommended so I
 downgraded to java 6

 [root@cassandra-node01 cassandrahome]# java -version
 java version "1.6.0_34"
 Java(TM) SE Runtime Environment (build 1.6.0_34-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.9-b04, mixed mode)

 And now if I try to start cassandra with that library it fails with
 this message:

 [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
 xss =  -ea
 -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
 -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
  INFO 22:00:14,318 Logging initialized
  INFO 22:00:14,333 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
 VM/1.6.0_34
  INFO 22:00:14,334 Heap size: 301727744/302776320
  INFO 22:00:14,334 Classpath:
 /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apa

Re: JNA not found.

2013-01-29 Thread Yogi Nerella
Chandra,

Try adding the following option, which may give you more info in the log or
console.

-Xcheck:jni

Do you have any custom C++ libraries using the JNA interface? You should add
your custom libraries to LD_LIBRARY_PATH
or provide them in -Djava.library.path.
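
If it helps to rule things out, JNA itself can be exercised outside of Cassandra
with a small direct-mapping check against libc. This is only a sketch (the class
name and the MCL_* constant values are assumptions taken from the Linux headers);
it roughly mirrors the mlockall call Cassandra makes, it is not Cassandra code:

import com.sun.jna.Native;

public class JnaCheck {
    // Linux mlockall() flags; values assumed from <sys/mman.h>
    private static final int MCL_CURRENT = 1;
    private static final int MCL_FUTURE  = 2;

    // direct mapping: this native method is bound against libc below
    private static native int mlockall(int flags);

    static {
        Native.register("c");
    }

    public static void main(String[] args) {
        int rc = mlockall(MCL_CURRENT | MCL_FUTURE);
        System.out.println(rc == 0
                ? "mlockall succeeded - JNA is working"
                : "mlockall returned " + rc + " - check ulimit -l / memlock limits");
    }
}

If that fails with the jar on the classpath, the problem is with the JNA setup or
the memlock limit rather than with Cassandra itself.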

Yogi


On Tue, Jan 29, 2013 at 10:35 AM, Jabbar  wrote:

> Try downloading jna-3.5.1.jar and copying into the lib directory. I made
> the same mistake :)
> On Jan 29, 2013 5:20 PM, "Tim Dunphy"  wrote:
>
>> Hi Chandra,
>>
>> Thanks for your reply. Well I have added both jna.jar and platform.jar to
>> my lib directory (jna 3.3.0):
>>
>> [root@cassandra-node01 cassandrahome]# ls -l lib/jna.jar
>> lib/platform.jar
>> -rw-r--r-- 1 root root 865400 Jan 29 12:14 lib/jna.jar
>> -rw-r--r-- 1 root root 841291 Jan 29 12:14 lib/platform.jar
>>
>> But sadly I get the same result:
>>
>> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
>> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
>> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>>  INFO 12:14:52,493 Logging initialized
>>  INFO 12:14:52,507 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
>> VM/1.6.0_34
>>  INFO 12:14:52,507 Heap size: 301727744/302776320
>>  INFO 12:14:52,508 Classpath:
>> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/platform.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>> Killed
>>
>> And still when I remove those library files cassandra starts without a
>> problem except for the fact that it is not able to use JNA.
>>
>> I'd appreciate any input the list might have.
>>
>> Thanks
>> Tim
>>
>> On Tue, Jan 29, 2013 at 8:54 AM, chandra Varahala <
>> hadoopandcassan...@gmail.com> wrote:
>>
>>> I think you need Jna  jar and  jna-plaform jar in  cassandra lib folder
>>>
>>> -chandra
>>>
>>>
>>>
>>> On Mon, Jan 28, 2013 at 10:02 PM, Tim Dunphy wrote:
>>>
 I went to github to try to download jna again. I downloaded version
 3.5.1

 [root@cassandra-node01 cassandrahome]# ls -l lib/jna-3.5.1.jar
 -rw-r--r-- 1 root root 692603 Jan 28 21:57 lib/jna-3.5.1.jar

 I noticed in the datastax docs that java 7 was not recommended so I
 downgraded to java 6

 [root@cassandra-node01 cassandrahome]# java -version
 java version "1.6.0_34"
 Java(TM) SE Runtime Environment (build 1.6.0_34-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.9-b04, mixed mode)

 And now if I try to start cassandra with that library it fails with
 this message:

 [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
 xss =  -ea
 -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
 -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
 -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
  INFO 22:00:14,318 Logging initialized
  INFO 22:00:14,333 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
 VM/1.6.0_34
  INFO 22:00:14,334 Heap size: 301727744/302776320
  INFO 22:00:14,334 Classpath:
 /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc

Re: Cass returns Incorrect column data on writes during flushing

2013-01-29 Thread Elden Bishop
Sure thing, Here is a console dump showing the error. Notice that column '9801' 
is NOT NULL on the first two queries but IS NULL on the last query. I get this 
behavior constantly on any writes that coincide with a flush. The column is 
always readable by itself but disappears depending on the other columns being 
queried.

$
$ bin/cqlsh -2
cqlsh>
cqlsh> SELECT '9801' FROM BUGS.Test WHERE KEY='a';

 9801
---------------------
 0.02271159951509616

cqlsh> SELECT '9801','6814' FROM BUGS.Test WHERE KEY='a';

 9801                | 6814
---------------------+--------------------
 0.02271159951509616 | 0.6612351709326891

cqlsh> SELECT '9801','6814','' FROM BUGS.Test WHERE KEY='a';

 9801 | 6814               | 
------+--------------------+--------------------
 null | 0.6612351709326891 | 0.8921380283891902

cqlsh> exit;
$
$
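
For reference, the same three reads expressed with Astyanax look roughly like the
sketch below. The column family definition, the serializers and the already-built
Keyspace handle are assumptions on my part, based on the description above:

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.Column;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;

public class SliceCheck {
    // CF "Test" in keyspace BUGS, String row keys and String column names (assumed)
    private static final ColumnFamily<String, String> CF_TEST =
            new ColumnFamily<String, String>("Test",
                    StringSerializer.get(), StringSerializer.get());

    // 'keyspace' is an Astyanax Keyspace already connected to BUGS
    static void check(Keyspace keyspace) throws ConnectionException {
        // single column read - returns the expected value
        Column<String> single = keyspace.prepareQuery(CF_TEST)
                .getKey("a").getColumn("9801")
                .execute().getResult();
        System.out.println(single.getStringValue());

        // slice over the same key - this is where the column goes missing
        ColumnList<String> slice = keyspace.prepareQuery(CF_TEST)
                .getKey("a").withColumnSlice("9801", "6814")
                .execute().getResult();
        System.out.println(slice.getColumnByName("9801")); // null in the failing case
    }
}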

From: aaron morton mailto:aa...@thelastpickle.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, January 29, 2013 12:21 AM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Cass returns Incorrect column data on writes during flushing

Ie. Query for a single column works but the column does not appear in slice 
queries depending on the other columns in the query

cfq.getKey("foo").getColumn("A") returns "A"
cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
cfq.getKey("foo").withColumnSlice("A","B","C") returns "A","B" and "C"
Can you replicate this using cassandra-cli or CQL ?
Makes it clearer what's happening and removes any potential issues with the 
client or your code.
If you cannot repro it, show us your Astyanax code.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 1:15 PM, Elden Bishop 
mailto:ebis...@exacttarget.com>> wrote:

I'm trying to track down some really worrying behavior. It appears that writing 
multiple columns while a table flush is occurring can result in Cassandra 
recording its data in a way that makes columns visible only to some queries but 
not others.

Ie. Query for a single column works but the column does not appear in slice 
queries depending on the other columns in the query

cfq.getKey("foo").getColumn("A") returns "A"
cfq.getKey("foo").withColumnSlice("A", "B") returns "B" only
cfq.getKey("foo").withColumnSlice("A","B","C") returns "A","B" and "C"

This is a permanent condition meaning that even hours later with no reads or 
writes the DB will return the same results. I can reproduce this 100% of the 
time by writing multiple columns and then reading a different set of multiple 
columns. Columns written during the flush may or may not appear.

Details

# There are no log errors
# All single column queries return correct data.
# Slice queries may or may not return the column depending on which other 
columns are in the query.
# This is on a stock "unzip and run" installation of Cassandra using default 
options only; basically doing the cassandra getting started tutorial and using 
the Demo table described in that tutorial.
# Cassandra 1.2.0 using Astynax and Java 1.6.0_37.
# There are no errors but there is always a "flushing high traffic column 
family" that happens right before the incoherent state occurs
# to reproduce just update multiple columns at the same time, using random rows 
and then verify the writes by reading multiple columns. I can generate the 
error on 100% of runs. Once the state is screwed up, the multi column read will 
not contain the column but the single column read will.

Log snippet
 INFO 15:47:49,066 GC for ParNew: 320 ms for 1 collections, 20712 used; max 
is 1052770304
 INFO 15:47:58,076 GC for ParNew: 330 ms for 1 collections, 232839680 used; max 
is 1052770304
 INFO 15:48:00,374 flushing high-traffic column family CFS(Keyspace='BUGS', 
ColumnFamily='Test') (estimated 50416978 bytes)
 INFO 15:48:00,374 Enqueuing flush of Memtable-Test@1575891161(4529586/50416978 
serialized/live bytes, 279197 ops)
 INFO 15:48:00,378 Writing Memtable-Test@1575891161(4529586/50416978 
serialized/live bytes, 279197 ops)
 INFO 15:48:01,142 GC for ParNew: 654 ms for 1 collections, 239478568 used; max 
is 1052770304
 INFO 15:48:01,474 Completed flushing 
/var/lib/cassandra/data/BUGS/Test/BUGS-Test-ia-45-Data.db (4580066 bytes) for 
commitlog position ReplayPosition(segmentId=1359415964165, position=7462737)


Any ideas on what could be going on? I could not find anything like this in the 
open bugs and the only workaround seems to be never doing multi-column reads or 
writes. I'm concerned that the DB can get into a state where different queries 
can return such inconsistent results. All with no warning or errors. There is 
no way to even verify data correctness; every column can seem correct when 
queried and then disappear during slice queries depending on the other columns 
in the query.


Than

trying to create encrypted ephemeral drive on Amazon

2013-01-29 Thread Brian Tarbox
I've heard that on Amazon EC2 I should be using ephemeral drives...but I
want/need to be using encrypted volumes.

On my local machine I use cryptsetup to encrypt a device and then mount it
and so on...but on Amazon I get the error:

"Cannot open device /dev/xvdb for read-only access".

Reading further I wonder if this is even possible based on this statement
in the Amazon doc set "*An instance store is dedicated to a particular
instance; however, the disk subsystem is shared among instances on a host
computer*"

How are other folks achieving performance and encryption on EC2?

Thanks.


Re: data not shown up after some time

2013-01-29 Thread aaron morton
> How can I check for this secondary index read fails?

Your description was that reads which use a secondary index (not the row key) 
failed…
> if I do a simple “list ;” the data is shown, but if I do a “get  
> where =’’;”


If you can retrieve the row using its row key, but not via the secondary index 
( in your example) then the index is broken. 

If you are on pre 1.1.9 try upgrading. 

cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 8:19 PM, Matthias Zeilinger 
 wrote:

> How can I check for this secondary index read fails?
> Is it in the system.log or over the nodetool?
>  
> Br,
> Matthias Zeilinger
> Production Operation – Shared Services
>  
> P: +43 (0) 50 858-31185
> M: +43 (0) 664 85-34459
> E: matthias.zeilin...@bwinparty.com
>  
> bwin.party services (Austria) GmbH
> Marxergasse 1B
> A-1030 Vienna
>  
> www.bwinparty.com
>  
> From: aaron morton [mailto:aa...@thelastpickle.com] 
> Sent: Dienstag, 29. Jänner 2013 08:04
> To: user@cassandra.apache.org
> Subject: Re: data not shown up after some time
>  
> If you are seeing failed secondary index reads you may be seeing this 
> https://issues.apache.org/jira/browse/CASSANDRA-5079
>  
> Cheers
>   
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 29/01/2013, at 3:31 AM, Matthias Zeilinger 
>  wrote:
> 
> 
> Hi,
>  
> No I have checked the TTL: 7776000
>  
> Very interesting is, if I do a simple “list ;” the data is shown, but if
> I do a “get  where =’’;” it returns “0 Row Returned”.
>  
> How can that be?
>  
> Br,
> Matthias Zeilinger
> Production Operation – Shared Services
>  
> P: +43 (0) 50 858-31185
> M: +43 (0) 664 85-34459
> E: matthias.zeilin...@bwinparty.com
>  
> bwin.party services (Austria) GmbH
> Marxergasse 1B
> A-1030 Vienna
>  
> www.bwinparty.com
>  
> From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] 
> Sent: Montag, 28. Jänner 2013 15:25
> To: user@cassandra.apache.org
> Subject: RE: data not shown up after some time
>  
> Are you sure your app is setting TTL correctly?
> TTL is in seconds. For 90 days it has to be 90*24*60*60 = 7776000.
> What if you set by accident 777600 (10 times less) – that would be 9 days, 
> almost what you see.
>  
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
>  
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> Follow us on Twitter: @adforminsider
> Take a ride with Adform's Rich Media Suite
> 
> 
> 
>  
> From: Matthias Zeilinger [mailto:matthias.zeilin...@bwinparty.com] 
> Sent: Monday, January 28, 2013 15:57
> To: user@cassandra.apache.org
> Subject: data not shown up after some time
>  
> Hi,
>  
> I'm a simple operations guy and new to Cassandra.
> I have the problem that one of our application is writing data into Cassandra 
> (but not deleting them, because we should have a 90 days TTL).
> The application operates in 1 KS with 5 CF. my current setup:
>  
> 3 node cluster and KS has a RF of 3 (I know it's not the best setup)
>  
> I can see now the problem that after 10 days most (nearly all) data are not 
> showing anymore in the cli and also our application cannot see the data.
> I assume that it has something to do with gc_grace_seconds, which is set to 
> 10 days.
>  
> I have read a lot of documentation about tombstones, but our application doesn't 
> perform deletes.
> How can I see in the cli if a row key has any tombstones or not?
>  
> Could it be that there are some ghost tombstones?
>  
> Thx for your help
>  
> Br,
> Matthias Zeilinger
> Production Operation – Shared Services
>  
> P: +43 (0) 50 858-31185
> M: +43 (0) 664 85-34459
> E: matthias.zeilin...@bwinparty.com
>  
> bwin.party services (Austria) GmbH
> Marxergasse 1B
> A-1030 Vienna
>  
> www.bwinparty.com
>  



Re: Cassandra pending compaction tasks keeps increasing

2013-01-29 Thread aaron morton
> * Will try it tomorrow. Do I need to restart server to change the log level?
You can set it via JMX, and supposedly log4j is configured to watch the config 
file. 
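
For what it's worth, doing that over JMX from a small Java client could look like
the sketch below. Assumptions on my part: the default JMX port 7199, no JMX
authentication, and that this Cassandra version exposes setLog4jLevel on the
StorageService MBean (worth double checking before relying on it):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class SetLogLevel {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // turn DEBUG on for the compaction package only, not the whole server
            mbs.invoke(ss, "setLog4jLevel",
                    new Object[] { "org.apache.cassandra.db.compaction", "DEBUG" },
                    new String[] { "java.lang.String", "java.lang.String" });
        } finally {
            connector.close();
        }
    }
}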

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 9:36 PM, Wei Zhu  wrote:

> Thanks for the reply. Here is some information:
> 
> Do you have wide rows ? Are you seeing logging about "Compacting wide rows" ? 
> 
> * I don't see any log about "wide rows"
> 
> Are you seeing GC activity logged or seeing CPU steal on a VM ? 
> 
> * There is some GC, but CPU in general is under 20%. We have a heap size of 8G, 
> RAM is at 72G.
> 
> Have you tried disabling multithreaded_compaction ? 
> 
> * By default, it's disabled. We enabled it, but didn't see much difference. 
> It was even a little slower with it enabled. Is it bad to enable it? We have SSDs, 
> and according to the comment in the yaml, it should help when using SSDs.
> 
> Are you using Key Caches ? Have you tried disabling 
> compaction_preheat_key_cache? 
> 
> * We have fairly big Key caches, we set as 10% of Heap which is 800M. Yes, 
> compaction_preheat_key_cache is disabled. 
> 
> Can you enabled DEBUG level logging and make them available ? 
> 
> * Will try it tomorrow. Do I need to restart server to change the log level?
> 
> 
> -Wei
> 
> - Original Message -
> From: "aaron morton" 
> To: user@cassandra.apache.org
> Sent: Monday, January 28, 2013 11:31:42 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> 
> 
> 
> 
> 
> 
> * Why nodetool repair increases the data size that much? It's not likely that 
> much data needs to be repaired. Will that happen for all the subsequent 
> repair? 
> Repair only detects differences in entire rows. If you have very wide rows 
> then small differences in rows can result in a large amount of streaming. 
> Streaming creates new SSTables on the receiving side, which then need to be 
> compacted. So repair often results in compaction doing its thing for a 
> while. 
> 
> 
> 
> 
> 
> 
> 
> 
> * How to make LCS run faster? After almost a day, the LCS tasks only dropped 
> by 1000. I am afraid it will never catch up. We set 
> 
> 
> This is going to be tricky to diagnose, sorry for asking silly questions... 
> 
> 
> Do you have wide rows ? Are you seeing logging about "Compacting wide rows" ? 
> Are you seeing GC activity logged or seeing CPU steal on a VM ? 
> Have you tried disabling multithreaded_compaction ? 
> Are you using Key Caches ? Have you tried disabling 
> compaction_preheat_key_cache? 
> Can you enabled DEBUG level logging and make them available ? 
> 
> 
> Cheers 
> 
> 
> 
> 
> 
> 
> 
> 
> - 
> Aaron Morton 
> Freelance Cassandra Developer 
> New Zealand 
> 
> 
> @aaronmorton 
> http://www.thelastpickle.com 
> 
> 
> On 29/01/2013, at 8:59 AM, Derek Williams < de...@fyrie.net > wrote: 
> 
> 
> 
> I could be wrong about this, but when repair is run, it isn't just values 
> that are streamed between nodes, it's entire sstables. This causes a lot of 
> duplicate data to be written which was already correct on the node, which 
> needs to be compacted away. 
> 
> 
> As for speeding it up, no idea. 
> 
> 
> 
> On Mon, Jan 28, 2013 at 12:16 PM, Wei Zhu < wz1...@yahoo.com > wrote: 
> 
> 
> Any thoughts? 
> 
> 
> Thanks. 
> -Wei 
> 
> - Original Message - 
> 
> From: "Wei Zhu" < wz1...@yahoo.com > 
> To: user@cassandra.apache.org 
> 
> Sent: Friday, January 25, 2013 10:09:37 PM 
> Subject: Re: Cassandra pending compaction tasks keeps increasing 
> 
> 
> 
> To recap the problem, 
> 1.1.6 on SSD, 5 nodes, RF = 3, one CF only. 
> After data load, initially all 5 nodes have very even data size (135G, each). 
> I ran nodetool repair -pr on node 1 which have replicates on node 2, node 3 
> since we set RF = 3. 
> It appears that huge amount of data got transferred. Node 1 has 220G, node 2, 
> 3 have around 170G. Pending LCS task on node 1 is 15K and node 2, 3 have 
> around 7K each. 
> Questions: 
> 
> * Why nodetool repair increases the data size that much? It's not likely that 
> much data needs to be repaired. Will that happen for all the subsequent 
> repair? 
> * How to make LCS run faster? After almost a day, the LCS tasks only dropped 
> by 1000. I am afraid it will never catch up. We set 
> 
> 
> * compaction_throughput_mb_per_sec = 500 
> * multithreaded_compaction: true 
> 
> 
> 
> Both Disk and CPU util are less than 10%. I understand LCS is single 
> threaded, any chance to speed it up? 
> 
> 
> * We use default SSTable size as 5M, Will increase the size of SSTable help? 
> What will happen if I change the setting after the data is loaded. 
> 
> 
> Any suggestion is very much appreciated. 
> 
> -Wei 
> 
> 
> - Original Message - 
> 
> From: "Wei Zhu" < wz1...@yahoo.com > 
> To: user@cassandra.apache.org 
> 
> Sent: Thursday, January 24, 2013 11:46:04 PM 
> Subject: Re: Cassandra pending compaction tasks keeps in

Re: ConfigHelper.setThriftContact() undefined in cassandra v1.2

2013-01-29 Thread aaron morton
> I am trying out the example given in Cassandra Definitive guide, Ch 12. 
That book may be out of date. 
You might be better off with info from 
http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration and 
http://wiki.apache.org/cassandra/HadoopSupport as well as the sample in the 
source distribution. 

> This statement gives an error and I am not able to figure out the replacement for 
> it:
What is the error?

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 9:37 PM, Tejas Patil  wrote:

> I am trying out the example given in Cassandra Definitive guide, Ch 12. This 
> statement gives error and I am not able to figure out the replacement for it:
> ConfigHelper.setThriftContact(job.getConfiguration(), "localhost",  9160);
> 
> Also,
> 
> IColumn column = columns.get(columnName.getBytes());
> String value = new String(column.value());
> 
> column.value() gives compilation error. any solutions ?
> 
> Thanks,
> Tejas Patil



Re: getting error for decimal type data

2013-01-29 Thread aaron morton
The cli is probably trying to read more data than it can keep in memory. 
Try using the LIMIT clause for the list statement, or getting a single row, to 
reduce the size of the read.

Alternatively try increasing the heap size for the cassandra-cli in 
bin/cassandra-cli

 
>   Built indexes: [STUDENT.STUDENT_AGE_idx, 
> STUDENT.STUDENT_BIG_DECIMAL_idx, STUDENT.STUDENT_PERCENTAGE_idx, 
> STUDENT.STUDENT_ROLL_NUMBER_idx, STUDENT.STUDENT_SEMESTER_idx,  
> STUDENT.STUDENT_STUDENT_NAME_idx, STUDENT.STUDENT_UNIQUE_ID_idx]
> 
> 
You have a lot of indexes there. Consider if they are all needed. 

cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 10:41 PM, Rishabh Agrawal  
wrote:

> Did you try accessing this CF from CQL? I think it should work from there; also 
> try accessing it through any API and see if the error persists.
>  
> Thanks
> Rishabh  Agrawal
> From: Kuldeep Mishra [mailto:kuld.cs.mis...@gmail.com] 
> Sent: Tuesday, January 29, 2013 2:51 PM
> To: user@cassandra.apache.org
> Subject: Re: getting error for decimal type data
>  
> ColumnFamily: STUDENT
>   Key Validation Class: org.apache.cassandra.db.marshal.LongType
>   Default column value validator: 
> org.apache.cassandra.db.marshal.BytesType
>   Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>   GC grace seconds: 864000
>   Compaction min/max thresholds: 4/32
>   Read repair chance: 0.1
>   DC Local Read repair chance: 0.0
>   Replicate on write: true
>   Caching: KEYS_ONLY
>   Bloom Filter FP chance: default
>   Built indexes: [STUDENT.STUDENT_AGE_idx, 
> STUDENT.STUDENT_BIG_DECIMAL_idx, STUDENT.STUDENT_PERCENTAGE_idx, 
> STUDENT.STUDENT_ROLL_NUMBER_idx, STUDENT.STUDENT_SEMESTER_idx,  
> STUDENT.STUDENT_STUDENT_NAME_idx, STUDENT.STUDENT_UNIQUE_ID_idx]
>   Column Metadata:
> Column Name: PERCENTAGE
>   Validation Class: org.apache.cassandra.db.marshal.FloatType
>   Index Name: STUDENT_PERCENTAGE_idx
>   Index Type: KEYS   
> Column Name: AGE
>   Validation Class: org.apache.cassandra.db.marshal.IntegerType
>   Index Name: STUDENT_AGE_idx
>   Index Type: KEYS
> Column Name: SEMESTER
>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>   Index Name: STUDENT_SEMESTER_idx
>   Index Type: KEYS
> Column Name: ROLL_NUMBER
>   Validation Class: org.apache.cassandra.db.marshal.LongType
>   Index Name: STUDENT_ROLL_NUMBER_idx
>   Index Type: KEYS   
> Column Name: UNIQUE_ID
>   Validation Class: org.apache.cassandra.db.marshal.LongType
>   Index Name: STUDENT_UNIQUE_ID_idx
>   Index Type: KEYS
> Column Name: STUDENT_NAME
>   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>   Index Name: STUDENT_STUDENT_NAME_idx
>   Index Type: KEYS
> Column Name: BIG_DECIMAL
>   Validation Class: org.apache.cassandra.db.marshal.DecimalType
>   Index Name: STUDENT_BIG_DECIMAL_idx
>   Index Type: KEYS
>   Compaction Strategy: 
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
>   Compression Options:
> sstable_compression: org.apache.cassandra.io.compress.SnappyCompressor
> 
> 
> for value of BIG_DECIMAL is 2.28542855225E-825373481
> 
> 
> 
> Thanks 
> Kuldeep
> 
> On Tue, Jan 29, 2013 at 1:52 PM, Rishabh Agrawal 
>  wrote:
> Can you provide specs of the column family using describe.
>  
> From: Kuldeep Mishra [mailto:kuld.cs.mis...@gmail.com] 
> Sent: Tuesday, January 29, 2013 12:37 PM
> To: user@cassandra.apache.org
> Subject: getting error for decimal type data
>  
> While I am trying to list column family data using cassandra-cli, I am 
> getting the following problem for decimal type data,
> any suggestion will be appreciated.  
> 
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
> at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
> at java.lang.StringBuilder.<init>(StringBuilder.java:80)
> at java.math.BigDecimal.getValueString(BigDecimal.java:2885)
> at java.math.BigDecimal.toPlainString(BigDecimal.java:2869)
> at 
> org.apache.cassandra.cql.jdbc.JdbcDecimal.getString(JdbcDecimal.java:72)
> at 
> org.apache.cassandra.db.marshal.DecimalType.getString(DecimalType.java:62)
> at org.apache.cassandra.cli.CliClient.printSliceList(CliClient.java:2873)
> at org.apache.cassandra.cli.CliClient.executeList(CliClient.java:1486)
> at 
> org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)
> at 
> org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:210)
> at org.apache.cassandra.cli.CliMain.main(CliMain.java:337)
> 
> 
> -- 
> Thanks and Regards
> Kuldeep Kumar Mishra
> +919540965199
>  
> 
> 
> 
> 
> 
> 

Re: problem with Cassandra map-reduce support

2013-01-29 Thread aaron morton
Brian,
Could you raise a ticket at 
https://issues.apache.org/jira/browse/CASSANDRA ?

Thanks

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 30/01/2013, at 1:23 AM, Brian Jeltema  wrote:

> In hadoop-0.20.2, org.apache.hadoop.mapreduce.JobContext is a class. Looks 
> like in hadoop-0.21+ JobContext has morphed into an interface.
> I'd guess that Hadoop support in Cassandra is based on the older Hadoop.
> 
> Brian
> 
> On Jan 29, 2013, at 3:42 AM, Tejas Patil wrote:
> 
>> I am trying to run a map-reduce job to read data from Cassandra v1.2.0.
>> I started off with the code here:
>> https://svn.apache.org/repos/asf/cassandra/trunk/examples/hadoop_word_count/src/WordCount.java
>> 
>> While running it over hadoop-0.22.0, I get this:
>> 
>> Exception in thread "main" java.lang.IncompatibleClassChangeError: Found 
>> interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>>  at 
>> org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
>>  at 
>> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
>>  at 
>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
>>  at 
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
>>  at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
>>  at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
>>  at java.security.AccessController.doPrivileged(Native Method)
>>  at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at 
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
>>  at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
>>  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
>>  at MyHadoopApp.run(MyHadoopApp.java:163)
>>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>>  at MyHadoopApp.main(MyHadoopApp.java:82)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>  at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>  at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>  at java.lang.reflect.Method.invoke(Method.java:601)
>>  at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
>> 
>> Does anyone knows about this ?
>> 
>> Thanks,
>> Tejas Patil
> 



Re: why set replica placement strategy at keyspace level ?

2013-01-29 Thread Manu Zhang

On Tue 29 Jan 2013 03:39:17 PM CST, aaron morton wrote:



  So If I write to CF Users with rowkey="dean"
and to CF Schedules with rowkey="dean", it is actually one row?

In my mental model that's correct.
A RowMutation is a row key and a collection of (internal) ColumnFamilies which 
contain the columns to write for a single CF.

This is the thing that is committed to the log, and then the changes in the 
ColumnFamilies are applied to each CF in an isolated way.


.(must have missed that several times in the
documentation).

http://wiki.apache.org/cassandra/FAQ#batch_mutate_atomic

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 9:28 AM, "Hiller, Dean"  wrote:


"If you write to 4 CF's with the same row key that is considered one
mutation"

Hm, I never considered this, never knew either (very un-intuitive from
a user perspective IMHO).  So if I write to CF Users with rowkey="dean"
and to CF Schedules with rowkey="dean", it is actually one row?  (it's so
un-intuitive that I had to ask to make sure I am reading that correctly).

I guess I really don't have that case since most of my row keys are GUID's
anyways, but very interesting and unexpected (not sure I really mind, was
just taken aback)

Ps. Not sure I ever minded losing atomic commits to the same row across
CF's as I never expected it in the first place, having used cassandra for
more than a year (must have missed that several times in the
documentation).

Thanks,
Dean

On 1/28/13 12:41 PM, "aaron morton"  wrote:



Another thing that's been confusing me is that when we talk about the
data model should the row key be inside or outside a column family?

My mental model is:

cluster == database
keyspace == table
row == a row in a table
CF == a family of columns in one row

(I think that's different to others, but it works for me)


Is it important to store rows of different column families that share
the same row key to the same node?

Makes the failure models a little easier to understand. e.g. Everything
key for user "amorton" is either available or not.


Meanwhile, what's the drawback of setting RPS and RF at column family
level?

Other than it's baked in?

We process all mutations for a row at the same time. If you write to 4
CF's with the same row key that is considered one mutation, for one row.
That one RowMutation is directed to the replicas using the
ReplicationStrategy and atomically applied to the commit log.

If you have RS per CF that one mutation would be split into 4, which
would then be sent to different replicas. Even if they went to the same
replicas they would be written to the commit log as different mutations.

So if you have RS per CF you lose atomic commits for writes to the same
row.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 28/01/2013, at 11:22 PM, Manu Zhang  wrote:


On Mon 28 Jan 2013 04:42:49 PM CST, aaron morton wrote:

The row is the unit of replication, all values with the same storage
engine row key in a KS are on the same nodes. if they were per CF this
would not hold.

Not that it would be the end of the world, but that is the first thing
that comes to mind.

Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 27/01/2013, at 4:15 PM, Manu Zhang  wrote:


Although I've got to know Cassandra for quite a while, this question
only has occurred to me recently:

Why are the replica placement strategy and replica factors set at the
keyspace level?

Would setting them at the column family level offers more flexibility?

Is this because it's easier for user to manage an application? Or
related to internal implementation? Or it's just that I've overlooked
something?




Is it important to store rows of different column families that share
the same row key to the same node? AFAIK, Cassandra doesn't support get
all of them in a single call.

Meanwhile, what's the drawback of setting RPS and RF at column family
level?

Another thing that's been confusing me is that when we talk about the
data model should the row key be inside or outside a column family?

Thanks









From that wiki page, "mutations against a single key are atomic but not 
isolated". I think a row mutation is isolated now, but is it across 
column families? By the way, the wiki page really needs updating.
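
To make the "one mutation per row key" point concrete: a client-side write like the
sketch below (Astyanax, with the Users/Schedules column families borrowed from
Dean's example purely as assumptions) is grouped into a single RowMutation for the
key "dean", which is exactly why keeping the placement strategy at the keyspace
level guarantees both CFs end up on the same replicas:

import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

public class SameKeyTwoCfs {
    private static final ColumnFamily<String, String> CF_USERS =
            new ColumnFamily<String, String>("Users",
                    StringSerializer.get(), StringSerializer.get());
    private static final ColumnFamily<String, String> CF_SCHEDULES =
            new ColumnFamily<String, String>("Schedules",
                    StringSerializer.get(), StringSerializer.get());

    static void write(Keyspace keyspace) throws ConnectionException {
        MutationBatch m = keyspace.prepareMutationBatch();
        // same row key in two column families of the same keyspace
        m.withRow(CF_USERS, "dean").putColumn("name", "Dean", null);
        m.withRow(CF_SCHEDULES, "dean").putColumn("monday", "standup", null);
        // applied to the commit log as one row mutation on the replicas for "dean"
        m.execute();
    }
}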


Re: ConfigHelper.setThriftContact() undefined in cassandra v1.2

2013-01-29 Thread Edward Capriolo
About as definitive as the word maybe. O'Reilly's SEO keeps it close to the top of
search results but it's probably not the thing you want.

On Tuesday, January 29, 2013, aaron morton  wrote:
> I am trying out the example given in Cassandra Definitive guide, Ch 12.
>
> That book may be out of date.
> You might be better off with info from
http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
 and http://wiki.apache.org/cassandra/HadoopSupport as well as the sample
in the source distribution.
>
> his statement gives error and I am not able to figure out the replacement
for it:
>
> What is the error?
> Cheers
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com
> On 29/01/2013, at 9:37 PM, Tejas Patil  wrote:
>
> I am trying out the example given in Cassandra Definitive guide, Ch 12.
This statement gives error and I am not able to figure out the replacement
for it:
> ConfigHelper.setThriftContact(job.getConfiguration(), "localhost",  9160);
> Also,
> IColumn column = columns.get(columnName.getBytes());
> String value = new String(column.value());
> column.value() gives compilation error. any solutions ?
> Thanks,
> Tejas Patil
>


Re: ConfigHelper.setThriftContact() undefined in cassandra v1.2

2013-01-29 Thread Tejas Patil
Hey Aaron,

It gives compilation errors saying that the method is undefined.

Thanks,
Tejas Patil


On Tue, Jan 29, 2013 at 4:17 PM, Edward Capriolo wrote:

>
> About as definitive as the word maybe. O'Reilly's SEO keeps it close to the top
> of search results but it's probably not the thing you want.
>
>
> On Tuesday, January 29, 2013, aaron morton 
> wrote:
> > I am trying out the example given in Cassandra Definitive guide, Ch 12.
> >
> > That book may be out of date.
> > You might be better off with info from
> http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration
>  and http://wiki.apache.org/cassandra/HadoopSupport as well as the sample
> in the source distribution.
> >
> > his statement gives error and I am not able to figure out the
> replacement for it:
> >
> > What is the error?
> > Cheers
> > -
> > Aaron Morton
> > Freelance Cassandra Developer
> > New Zealand
> > @aaronmorton
> > http://www.thelastpickle.com
> > On 29/01/2013, at 9:37 PM, Tejas Patil  wrote:
> >
> > I am trying out the example given in Cassandra Definitive guide, Ch 12.
> This statement gives error and I am not able to figure out the replacement
> for it:
> > ConfigHelper.setThriftContact(job.getConfiguration(), "localhost",
>  9160);
> > Also,
> > IColumn column = columns.get(columnName.getBytes());
> > String value = new String(column.value());
> > column.value() gives compilation error. any solutions ?
> > Thanks,
> > Tejas Patil
> >
>


Re: ConfigHelper.setThriftContact() undefined in cassandra v1.2

2013-01-29 Thread Michael Kjellman
Pretty sure you are looking for something like:

// thrift input job settings
ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160");
ConfigHelper.setInputInitialAddress(job.getConfiguration(), "127.0.0.1");
ConfigHelper.setInputPartitioner(job.getConfiguration(), "RandomPartitioner");

// thrift output job settings
ConfigHelper.setOutputRpcPort(job.getConfiguration(), "9160");
ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "127.0.0.1");
ConfigHelper.setOutputPartitioner(job.getConfiguration(), "RandomPartitioner");
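
The input side that replaces setThriftContact(), plus the column read that replaces
new String(column.value()), could then look something like this. A sketch against
the 1.2 API only; "your_keyspace", "your_cf" and "your_col" are placeholders, not
names from the book:

import java.util.Arrays;
import org.apache.cassandra.db.IColumn;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.utils.ByteBufferUtil;

// in run(), alongside the settings above
ConfigHelper.setInputColumnFamily(job.getConfiguration(), "your_keyspace", "your_cf");
SlicePredicate predicate = new SlicePredicate()
        .setColumn_names(Arrays.asList(ByteBufferUtil.bytes("your_col")));
ConfigHelper.setInputSlicePredicate(job.getConfiguration(), predicate);

// in the mapper, instead of new String(column.value()):
// value() returns a ByteBuffer in 1.2, and ByteBufferUtil.string() throws a
// CharacterCodingException, which is an IOException, so map() can declare it
IColumn column = columns.get(ByteBufferUtil.bytes("your_col"));
String value = ByteBufferUtil.string(column.value());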

From: Tejas Patil mailto:tejas.patil...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, January 29, 2013 4:29 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: ConfigHelper.setThriftContact() undefined in cassandra v1.2

Hey Aaron,

It gives compilation errors saying that the method is undefined.

Thanks,
Tejas Patil


On Tue, Jan 29, 2013 at 4:17 PM, Edward Capriolo 
mailto:edlinuxg...@gmail.com>> wrote:

About as definitive as the word maybe. O'Reilly's SEO keeps it close to the top of 
search results but it's probably not the thing you want.


On Tuesday, January 29, 2013, aaron morton 
mailto:aa...@thelastpickle.com>> wrote:
> I am trying out the example given in Cassandra Definitive guide, Ch 12.
>
> That book may be out of date.
> You might be better off with info from 
> http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration and 
> http://wiki.apache.org/cassandra/HadoopSupport as well as the sample in the 
> source distribution.
>
> his statement gives error and I am not able to figure out the replacement for 
> it:
>
> What is the error?
> Cheers
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com
> On 29/01/2013, at 9:37 PM, Tejas Patil 
> mailto:tejas.patil...@gmail.com>> wrote:
>
> I am trying out the example given in Cassandra Definitive guide, Ch 12. This 
> statement gives error and I am not able to figure out the replacement for it:
> ConfigHelper.setThriftContact(job.getConfiguration(), "localhost",  9160);
> Also,
> IColumn column = columns.get(columnName.getBytes());
> String value = new String(column.value());
> column.value() gives compilation error. any solutions ?
> Thanks,
> Tejas Patil
>



Re: problem with Cassandra map-reduce support

2013-01-29 Thread Tejas Patil
I really, really need this running. I cannot get the hadoop-0.20.2 tarball from
the Apache Hadoop project website. Is there any place where I can get it?

thanks,
Tejas Patil


On Tue, Jan 29, 2013 at 1:10 PM, aaron morton wrote:

> Brian,
> Could you raise a ticket at
> https://issues.apache.org/jira/browse/CASSANDRA ?
>
> Thanks
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 30/01/2013, at 1:23 AM, Brian Jeltema 
> wrote:
>
> In hadoop-0.20.2, org.apache.hadoop.mapreduce.JobContext is a class. Looks
> like in hadoop-0.21+ JobContext has morphed into an interface.
> I'd guess that Hadoop support in Cassandra is based on the older Hadoop.
>
> Brian
>
> On Jan 29, 2013, at 3:42 AM, Tejas Patil wrote:
>
> I am trying to run a map-reduce job to read data from Cassandra v1.2.0.
> I started off with the code here:
>
> https://svn.apache.org/repos/asf/cassandra/trunk/examples/hadoop_word_count/src/WordCount.java
>
> While running it over hadoop-0.22.0, I get this:
>
> Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
> interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>  at
> org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
>  at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
> at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
>  at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
> at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
>  at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
>  at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
>  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
> at MyHadoopApp.run(MyHadoopApp.java:163)
>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
> at MyHadoopApp.main(MyHadoopApp.java:82)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
>
> Does anyone knows about this ?
>
> Thanks,
> Tejas Patil
>
>
>
>


Re: problem with Cassandra map-reduce support

2013-01-29 Thread Edward Capriolo
http://archive.apache.org/dist/hadoop/core/ has older releases.

On Tue, Jan 29, 2013 at 8:08 PM, Tejas Patil wrote:

> I really really need this running. I cannot get hadoop-0.20.2 tarball
> from apache hadoop project website. Is there any place where I can get it ?
>
> thanks,
> Tejas Patil
>
>
> On Tue, Jan 29, 2013 at 1:10 PM, aaron morton wrote:
>
>> Brian,
>> Could you raise a ticket at
>> https://issues.apache.org/jira/browse/CASSANDRA ?
>>
>> Thanks
>>
>>-
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 30/01/2013, at 1:23 AM, Brian Jeltema 
>> wrote:
>>
>> In hadoop-0.20.2, org.apache.hadoop.mapreduce.JobContext is a class.
>> Looks like in hadoop-0.21+ JobContext has morphed into an interface.
>> I'd guess that Hadoop support in Cassandra is based on the older Hadoop.
>>
>> Brian
>>
>> On Jan 29, 2013, at 3:42 AM, Tejas Patil wrote:
>>
>> I am trying to run a map-reduce job to read data from Cassandra v1.2.0.
>> I started off with the code here:
>>
>> https://svn.apache.org/repos/asf/cassandra/trunk/examples/hadoop_word_count/src/WordCount.java
>>
>> While running it over hadoop-0.22.0, I get this:
>>
>> Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
>> interface org.apache.hadoop.mapreduce.JobContext, but class was expected
>>  at
>> org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
>>  at
>> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
>> at
>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
>>  at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
>> at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
>>  at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>  at
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
>> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
>>  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
>> at MyHadoopApp.run(MyHadoopApp.java:163)
>>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
>> at MyHadoopApp.main(MyHadoopApp.java:82)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>  at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:601)
>>  at org.apache.hadoop.util.RunJar.main(RunJar.java:192)
>>
>> Does anyone knows about this ?
>>
>> Thanks,
>> Tejas Patil
>>
>>
>>
>>
>


RE: change Storage port

2013-01-29 Thread ANAND_BALARAMAN
Hi

I tried setting the storage port in the program using 
System.setProperty("cassandra.storage_port", "7002").
I am still not able to communicate with the ByteOrdered cluster. It seems like the port 
is still pointing to 7000. Not sure how to validate this setting.

Any inputs related to this would be really helpful.

Regards
Anand B
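
One thing worth checking here: System.setProperty() in the driver only changes the
driver JVM; the map and reduce tasks run in their own JVMs and never see it. A
heavily hedged sketch of pushing the property down to the task JVMs instead
(whether the bulk-loading path actually honours cassandra.storage_port is an
assumption you would need to verify for your Cassandra version):

import org.apache.hadoop.conf.Configuration;

// in the job driver, before submitting the job
Configuration conf = job.getConfiguration();
// mapred.child.java.opts is passed to every task JVM; append rather than overwrite
String childOpts = conf.get("mapred.child.java.opts", "");
conf.set("mapred.child.java.opts",
        childOpts + " -Dcassandra.storage_port=7002"); // 7002 = the ByteOrdered cluster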

From: anand_balara...@homedepot.com [mailto:anand_balara...@homedepot.com]
Sent: Friday, January 25, 2013 3:54 PM
To: user@cassandra.apache.org
Subject: RE: change Storage port

I am also facing a similar issue. I am using BulkOutputFormat for loading data 
from Hadoop to Cassandra.

All I need is to set the storage_port explicitly in my job/program.
Can anyone help me in setting the storage_port configuration parameter from my 
java program?

Anand B

From: chandra Varahala [mailto:hadoopandcassan...@gmail.com]
Sent: Friday, January 25, 2013 3:33 PM
To: user@cassandra.apache.org
Subject: change Storage port

Hello,

I am using two Cassandra instances/clusters (random & byte order) on the same server 
and am trying to load data from Hadoop to Cassandra.
The storage ports are configured as 7000 & 7002 in the yaml files.
Is there a way I can pass the storage port from the Hadoop driver for a specific 
instance/cluster?


thanks
chandra






Re: JNA not found.

2013-01-29 Thread chandra Varahala
We had this issue before, but after adding those two jars the error was gone.
We used Cassandra 1.0.8 (JNA 3.3.0, JNA platform 3.3.0). What version of
Cassandra are you using?

-chandra


On Tue, Jan 29, 2013 at 12:19 PM, Tim Dunphy  wrote:

> Hi Chandra,
>
> Thanks for your reply. Well I have added both jna.jar and platform.jar to
> my lib directory (jna 3.3.0):
>
> [root@cassandra-node01 cassandrahome]# ls -l lib/jna.jar lib/platform.jar
> -rw-r--r-- 1 root root 865400 Jan 29 12:14 lib/jna.jar
> -rw-r--r-- 1 root root 841291 Jan 29 12:14 lib/platform.jar
>
> But sadly I get the same result:
>
>
> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>  INFO 12:14:52,493 Logging initialized
>  INFO 12:14:52,507 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
> VM/1.6.0_34
>  INFO 12:14:52,507 Heap size: 301727744/302776320
>  INFO 12:14:52,508 Classpath:
> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/platform.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
> Killed
>
> And still when I remove those library files cassandra starts without a
> problem exception the fact that it is not able to use JNA.
>
> I'd appreciate any input the list might have.
>
> Thanks
> Tim
>
>
> On Tue, Jan 29, 2013 at 8:54 AM, chandra Varahala <
> hadoopandcassan...@gmail.com> wrote:
>
>> I think you need Jna  jar and  jna-plaform jar in  cassandra lib folder
>>
>> -chandra
>>
>>
>>
>> On Mon, Jan 28, 2013 at 10:02 PM, Tim Dunphy wrote:
>>
>>> I went to github to try to download jna again. I downloaded version 3.5.1
>>>
>>> [root@cassandra-node01 cassandrahome]# ls -l lib/jna-3.5.1.jar
>>> -rw-r--r-- 1 root root 692603 Jan 28 21:57 lib/jna-3.5.1.jar
>>>
>>> I noticed in the datastax docs that java 7 was not recommended so I
>>> downgraded to java 6
>>>
>>> [root@cassandra-node01 cassandrahome]# java -version
>>> java version "1.6.0_34"
>>> Java(TM) SE Runtime Environment (build 1.6.0_34-b04)
>>> Java HotSpot(TM) 64-Bit Server VM (build 20.9-b04, mixed mode)
>>>
>>> And now if I try to start cassandra with that library it fails with this
>>> message:
>>>
>>> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
>>> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
>>> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>>>  INFO 22:00:14,318 Logging initialized
>>>  INFO 22:00:14,333 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
>>> VM/1.6.0_34
>>>  INFO 22:00:14,334 Heap size: 301727744/302776320
>>>  INFO 22:00:14,334 Classpath:
>>> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/e

Re: JNA not found.

2013-01-29 Thread Tim Dunphy
Hi Chandra,

 I'm using Cassandra 1.2.1 and jna/platform 3.5.1.

 One thing I should mention is that I tried putting the jar files into my
java jre/lib directory. The theory being those jars would be available to
all java apps. In that case Cassandra will start but still not recognize
JNA. If I copy the jars to the cassandra/lib directory, I have the same
crashing issue. Even if I symlink from the jre/lib directory to the
cassandra/lib directory the same issue occurs. It's like this version of
Cassandra can't stand having the jna jar in its lib directory. I'm
beginning to wonder if anyone has gotten JNA to work with this version
of cassandra and if so how. I've only tried a tarball install so far, I
can't say about the package install which may well work.
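
For reference, the quick checks I know of for whether the library is actually
being picked up (the log path below is just the default from the bundled
log4j-server.properties and may differ in your setup):

# look for JNA-related lines in the startup/system log
grep -i jna /var/log/cassandra/system.log
# JNA uses mlockall, which is sensitive to the memlock limit of the Cassandra user
ulimit -l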

Thanks
Tim

On Tue, Jan 29, 2013 at 10:07 PM, chandra Varahala <
hadoopandcassan...@gmail.com> wrote:

> we had this issue before, but after adding those two jar the  error gone.
> We used 1.0.8 cassandra (JNA 3.3.0, JNA platform. 3.3.0).  what version
>  cassnadra  you are using ?
>
> -chandra
>
>
> On Tue, Jan 29, 2013 at 12:19 PM, Tim Dunphy  wrote:
>
>> Hi Chandra,
>>
>> Thanks for your reply. Well I have added both jna.jar and platform.jar to
>> my lib directory (jna 3.3.0):
>>
>> [root@cassandra-node01 cassandrahome]# ls -l lib/jna.jar
>> lib/platform.jar
>> -rw-r--r-- 1 root root 865400 Jan 29 12:14 lib/jna.jar
>> -rw-r--r-- 1 root root 841291 Jan 29 12:14 lib/platform.jar
>>
>> But sadly I get the same result:
>>
>>
>> [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f
>> xss =  -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M
>> -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
>>   INFO 12:14:52,493 Logging initialized
>>  INFO 12:14:52,507 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
>> VM/1.6.0_34
>>  INFO 12:14:52,507 Heap size: 301727744/302776320
>>  INFO 12:14:52,508 Classpath:
>> /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/cassandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/platform.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar
>> Killed
>>
>> And still when I remove those library files cassandra starts without a
>> problem exception the fact that it is not able to use JNA.
>>
>> I'd appreciate any input the list might have.
>>
>> Thanks
>> Tim
>>
>>
>> On Tue, Jan 29, 2013 at 8:54 AM, chandra Varahala <
>> hadoopandcassan...@gmail.com> wrote:
>>
>>> I think you need Jna  jar and  jna-plaform jar in  cassandra lib folder
>>>
>>> -chandra
>>>
>>>
>>>
>>> On Mon, Jan 28, 2013 at 10:02 PM, Tim Dunphy wrote:
>>>
 I went to github to try to download jna again. I downloaded version
 3.5.1

 [root@cassandra-node01 cassandrahome]# ls -l lib/jna-3.5.1.jar
 -rw-r--r-- 1 root root 692603 Jan 28 21:57 lib/jna-3.5.1.jar

 I noticed in the datastax docs that java 7 was not recommended so I
 downgraded to java 6

 [root@cassandra-node01 cassandrahome]# java -version
 java version "1.6.0_34"
 Java(TM) SE Runtime Environment (build 1.6.0_34-b04)
 Java HotSpot(TM) 64-Bit Server VM (build 20.9-b04, mixed mode)

 And now if I try to start c

Re: How Cassandra guarantees the replicas if any node is down?

2013-01-29 Thread Michael Kjellman
To get started, look at:

HintedHandoff: http://wiki.apache.org/cassandra/HintedHandoff
Operations: http://wiki.apache.org/cassandra/Operations (specifically the repair 
and repair -pr operations)

There should be a ton of information on this you can easily Google.
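
As a concrete illustration, the usual way to bring replicas back in sync after
a node has been down is a scheduled repair on each node; a minimal sketch,
assuming a placeholder keyspace name:

# run periodically on every node, one node at a time
nodetool -h localhost repair -pr MyKeyspace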

Best,
Michael

From: "dong.yajun" mailto:dongt...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, January 29, 2013 7:29 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: How Cassandra guarantees the replicas if any node is down?

Dear list,

I am new to Cassandra. A question I'd like to ask is how Cassandra 
guarantees the correct replicas of data in a cluster when a node crashes. If 
anyone can give me some pointers, it would be much appreciated.

--
Rick Dong



Poor key cache hit rate

2013-01-29 Thread Keith
Hi all,

I am running 1.1.9 with 2 data centers and 3 nodes each.  Recently I have been 
seeing a terrible key cache hit rate (around 1-3%) with a 98% row cache hit 
rate.  The seed node appears to take higher traffic than the other nodes 
(approximately twice) but I believe I have astyanax configured properly with 
ring describe and token aware.  Any ideas or steps on how to debug?  I also see 
high GC load.  Perhaps I need more nodes?
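
For anyone wanting to reproduce the numbers: the global cache hit rates and
per-column-family stats are visible through nodetool (exact output wording
varies a little by version):

nodetool -h localhost info
nodetool -h localhost cfstats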

Re: How Cassandra guarantees the replicas if any node is down?

2013-01-29 Thread dong.yajun
thanks Michael.

I found it -  :)

---
Rick

On Wed, Jan 30, 2013 at 11:31 AM, Michael Kjellman
wrote:

> Do get started look at:
>
> HintedHandoff: http://wiki.apache.org/cassandra/HintedHandoff
> Operations: http://wiki.apache.org/cassandra/Operations (specifically
> repair and repair –pr operations)
>
> There should be a ton of information on this you can easily Google.
>
> Best,
> Michael
>
> From: "dong.yajun" 
> Reply-To: "user@cassandra.apache.org" 
> Date: Tuesday, January 29, 2013 7:29 PM
> To: "user@cassandra.apache.org" 
> Subject: How Cassandra guarantees the replicas if any node is down?
>
> Dear list,
>
> I am newer to Cassandra, a question I'd like to know is how Cassandra
> guarantees the correct replica of data in a cluster when a node is crash?
> if anyone can give me some pointers, more appreciated.
>
>
> --
> *Rick Dong *
>
>


-- 
*Ric Dong *
Newegg Ecommerce, MIS department


Is there any way to fetch all data efficiently from a column family?

2013-01-29 Thread dong.yajun
hey List,

I am considering a way to read all data from a column family; the following
are my thoughts:

1. make a snapshot on all nodes at the same time for a particular column
family in the cluster,

2. copy these sstables to local disk from cassandra nodes.

3. compact these sstables to a single one,

4. parse the sstable into individual rows.

My problem is step 2: assuming the replication factor is 3, the amount of data
I need to copy is (3 * number of bytes for all rows in this
column family). Are there any proposals on this?

-- 
*Rick Dong *


Re: Is there any way to fetch all data efficiently from a column family?

2013-01-29 Thread Michael Kjellman
How often do you need to do this? How many rows in your column families?

If it's not a frequent operation you can just page the data n number of rows at 
a time using nothing special but C* and a driver.
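
For example, a very rough paging sketch with nothing but the bundled
cassandra-cli (keyspace, column family, page size and start key are
placeholders; check "help list;" for the exact syntax in your version):

$ bin/cassandra-cli -h localhost -k MyKeyspace
[default@MyKeyspace] list MyCF limit 1000;
[default@MyKeyspace] list MyCF['<last key from previous page>':] limit 1000;

Each pass restarts the range scan from the last key the previous page returned.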

Or another option is to write a map/reduce job, if an entire cf is the only 
input you need. There are example Hadoop 
map/reduce jobs in the examples folder included with Cassandra. Or if you don't 
want to write a M/R job you could look into Pig.

Your method sounds a bit crazy IMHO and I'd definitely recommend against it. 
Better to let the database (C*) do its thing. If you're super worried about 
more than 1 sstable you can do major compactions but that's not recommended as 
it will take a while to get a new sstable big enough to merge with the other 
big sstable. So avoid that unless you really know what you're doing, which is 
what it sounds like you're proposing in point 3 ;)

From: "dong.yajun" mailto:dongt...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, January 29, 2013 9:02 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Is there any way to fetch all data efficiently from a column family?

hey List,

I consider a way that can read all data from a column family, the following is 
my thoughts:

1. make a snapshot for all nodes at the same time with a special column family 
in a cluster,

2. copy these sstables to local disk from cassandra nodes.

3. compact these sstables to a single one,

4. parse the sstable to each rows.

My problem is the step2, assume that the replication factor is 3, then I need 
to copy the data size is: (3 * number of bytes for all rows with this column 
family), is there any proposals on this?

--
Rick Dong



Re: Is there any way to fetch all data efficiently from a column family?

2013-01-29 Thread dong.yajun
Thanks Michael.

> How many rows in your column families?
about 500w rows, each row has about 1k of data.

> How often do you need to do this?
once a day.

> example Hadoop map/reduce jobs in the examples folder
Thanks, I have seen the source code; it uses the Thrift API as the
RecordReader to iterate the rows. I don't think it's a high-performance
method.

> you could look into Pig
could you please describe Pig in more detail?

> So avoid that unless you really know what you're doing which is what ...
that step is to purge the tombstones; another option is using a map/reduce
job to do the purging without major compactions.


Best

Rick.


On Wed, Jan 30, 2013 at 1:15 PM, Michael Kjellman
wrote:

> How often do you need to do this? How many rows in your column families?
>
> If it's not a frequent operation you can just page the data n number of
> rows at a time using nothing special but C* and a driver.
>
> Or another option is you can write a map/reduce job if you need an entire
> cf to be an input if you only need one cf to be your input. There are
> example Hadoop map/reduce jobs in the examples folder included with
> Cassandra. Or if you don't want to write a M/R job you could look into Pig.
>
> Your method sounds a bit crazy IMHO and I'd definitely recommend against
> it. Better to let the database (C*) do it's thing. If you're super worried
> about more than 1 sstable you can do major compactions but that's not
> recommended as it will take a while to get a new sstable big enough to
> merge with the other big sstable. So avoid that unless you really know what
> you're doing which is what it sounds like your proposing in point 3 ;)
>
> From: "dong.yajun" 
> Reply-To: "user@cassandra.apache.org" 
> Date: Tuesday, January 29, 2013 9:02 PM
> To: "user@cassandra.apache.org" 
> Subject: Is there any way to fetch all data efficiently from a column
> family?
>
> hey List,
>
> I consider a way that can read all data from a column family, the
> following is my thoughts:
>
> 1. make a snapshot for all nodes at the same time with a special column
> family in a cluster,
>
> 2. copy these sstables to local disk from cassandra nodes.
>
> 3. compact these sstables to a single one,
>
> 4. parse the sstable to each rows.
>
> My problem is the step2, assume that the replication factor is 3, then I
> need to copy the data size is: (3 * number of bytes for all rows with this
> column family), is there any proposals on this?
>
> --
> *Rick Dong *
>
>


-- 
*Ric Dong *
Newegg Ecommerce, MIS department


Re: Is there any way to fetch all data efficiently from a column family?

2013-01-29 Thread Michael Kjellman
Yes, wide rows, but that doesn't seem horrible by any means. People have gotten by 
with Thrift for many, many years in the community. If you are running this once 
a day, it doesn't sound like latency should be a major concern, and I doubt the 
protocol is going to be your primary bottleneck.

To answer your question about describing pig:
http://pig.apache.org -- "Apache Pig is a platform for analyzing large data 
sets that consists of a high-level language for expressing data analysis 
programs, coupled with infrastructure for evaluating these programs. The 
salient property of Pig programs is that their structure is amenable to 
substantial parallelization, which in turns enables them to handle very large 
data sets."

Pretty much, pig lets you write in Pig Latin to create Map-Reduce programs 
without writing an actual java Map Reduce program.

Here is a really old wiki article that really needs to be updated about the 
various Hadoop support built into C*: 
http://wiki.apache.org/cassandra/HadoopSupport

On your last point, compaction deals with tombstones yes but generally you only 
run minor compactions. A major compaction says, take every sstable for this cf 
and make one MASSIVE sstable from all the little sstables. This is different 
than standard C* operations. Map/Reduce doesn't purge anything and has nothing 
to do with compactions. It is just a somewhat sane idea I thought of to let you 
iterate over a large amount of data stored in C*, and conveniently C* provides 
Input and Output formats to Hadoop so you can do fun things like iterate over 
500w rows with 1k columns each.

Honestly, the best thing you can do is benchmark Hadoop and see how it will 
work for your work load and specific project requirements.

Best,
Michael

From: "dong.yajun" mailto:dongt...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, January 29, 2013 10:11 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Is there any way to fetch all data efficiently from a column 
family?

Thanks Michael.

> How many rows in your column families?
abort 500w rows, each row has abort 1k data.

> How often do you need to do this?
once a day.

> example Hadoop map/reduce jobs in the examples folder
thanks, I have saw the source code, it uses the Thrift API as the recordReader 
to interate the rows,  I don't think it's a high performance method.

> you could look into Pig
could you please describe more details in Pig?

> So avoid that unless you really know what you're doing which is what ...
the step is to purge the bombstones, another option is using the map/reduce job 
to do  the purging things without major compactions.


Best

Rick.

On Wed, Jan 30, 2013 at 1:15 PM, Michael Kjellman 
<mkjell...@barracuda.com> wrote:
How often do you need to do this? How many rows in your column families?

If it's not a frequent operation you can just page the data n number of rows at 
a time using nothing special but C* and a driver.

Or another option is you can write a map/reduce job if you need an entire cf to 
be an input if you only need one cf to be your input. There are example Hadoop 
map/reduce jobs in the examples folder included with Cassandra. Or if you don't 
want to write a M/R job you could look into Pig.

Your method sounds a bit crazy IMHO and I'd definitely recommend against it. 
Better to let the database (C*) do it's thing. If you're super worried about 
more than 1 sstable you can do major compactions but that's not recommended as 
it will take a while to get a new sstable big enough to merge with the other 
big sstable. So avoid that unless you really know what you're doing which is 
what it sounds like your proposing in point 3 ;)

From: "dong.yajun" mailto:dongt...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, January 29, 2013 9:02 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Is there any way to fetch all data efficiently from a column family?

hey List,

I consider a way that can read all data from a column family, the following is 
my thoughts:

1. make a snapshot for all nodes at the same time with a special column family 
in a cluster,

2. copy these sstables to local disk from cassandra nodes.

3. compact these sstables to a single one,

4. parse the sstable to each rows.

My problem is the step2, assume that the replication factor is 3, then I need 
to copy the data size is: (3 * number of bytes for all rows with this column 
family), is there any proposals on this?

--
Rick Dong




--
Ric Dong
Newegg Ecommerce, MIS department



Re: Is there any way to fetch all data efficiently from a column family?

2013-01-29 Thread Michael Kjellman
And finally, to make wide rows with C* and Hadoop even better, these problems 
have already been solved by tickets such as (not an exhaustive list):

https://issues.apache.org/jira/browse/CASSANDRA-3264
https://issues.apache.org/jira/browse/CASSANDRA-2878

And a nicer, more updated doc for the 1.1 branch from Datastax:
http://www.datastax.com/docs/1.1/cluster_architecture/hadoop_integration

From: Michael Kjellman <mkjell...@barracuda.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, January 29, 2013 10:36 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Is there any way to fetch all data efficiently from a column 
family?

Yes, wide rows, but doesn't seem horrible by any means. People have gotten by 
with Thrift for many many years in the community. If you are running this once 
a day doesn't sound like latency should be a major concern and I doubt the 
proto is going to be your primary bottleneck.

To answer your question about describing pig:
http://pig.apache.org -- "Apache Pig is a platform for analyzing large data 
sets that consists of a high-level language for expressing data analysis 
programs, coupled with infrastructure for evaluating these programs. The 
salient property of Pig programs is that their structure is amenable to 
substantial parallelization, which in turns enables them to handle very large 
data sets."

Pretty much, pig lets you write in Pig Latin to create Map-Reduce programs 
without writing an actual java Map Reduce program.

Here is a really old wiki article that really needs to be updated about the 
various Hadoop support built into C*: 
http://wiki.apache.org/cassandra/HadoopSupport

On your last point, compaction deals with tombstones yes but generally you only 
run minor compactions. A major compaction says, take every sstable for this cf 
and make one MASSIVE sstable from all the little sstables. This is different 
than standard C* operations. Map/Reduce doesn't purge anything and has nothing 
to do with compactions. It is just a somewhat sane idea I thought of to let you 
iterate over a large amount of data stored in C*, and conveniently C* provides 
Input and Output formats to Hadoop so you can do fun things like iterate over 
500w rows with 1k columns each.

Honestly, the best thing you can do is benchmark Hadoop and see how it will 
work for your work load and specific project requirements.

Best,
Michael

From: "dong.yajun" mailto:dongt...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, January 29, 2013 10:11 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Re: Is there any way to fetch all data efficiently from a column 
family?

Thanks Michael.

> How many rows in your column families?
abort 500w rows, each row has abort 1k data.

> How often do you need to do this?
once a day.

> example Hadoop map/reduce jobs in the examples folder
thanks, I have saw the source code, it uses the Thrift API as the recordReader 
to interate the rows,  I don't think it's a high performance method.

> you could look into Pig
could you please describe more details in Pig?

> So avoid that unless you really know what you're doing which is what ...
the step is to purge the bombstones, another option is using the map/reduce job 
to do  the purging things without major compactions.


Best

Rick.

On Wed, Jan 30, 2013 at 1:15 PM, Michael Kjellman 
<mkjell...@barracuda.com> wrote:
How often do you need to do this? How many rows in your column families?

If it's not a frequent operation you can just page the data n number of rows at 
a time using nothing special but C* and a driver.

Or another option is you can write a map/reduce job if you need an entire cf to 
be an input if you only need one cf to be your input. There are example Hadoop 
map/reduce jobs in the examples folder included with Cassandra. Or if you don't 
want to write a M/R job you could look into Pig.

Your method sounds a bit crazy IMHO and I'd definitely recommend against it. 
Better to let the database (C*) do it's thing. If you're super worried about 
more than 1 sstable you can do major compactions but that's not recommended as 
it will take a while to get a new sstable big enough to merge with the other 
big sstable. So avoid that unless you really know what you're doing which is 
what it sounds like your proposing in point 3 ;)

From: "dong.yajun" mailto:dongt...@gmail.com>>
Reply-To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Date: Tuesday, January 29, 2013 9:02 PM
To: "user@cassandra.apache.org" 
mailto:user@cassandra.apache.org>>
Subject: Is there any way to fetch all data efficiently from a column family?

hey List,

I consider a 

RE: data not shown up after some time

2013-01-29 Thread Matthias Zeilinger
Hi,

Thx for the great support.
I have checked everything and after a rebuild_index all data were searchable. I 
will upgrade to 1.1.9 asap.
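
In case it helps someone else, the rebuild is just the standard nodetool
command; keyspace, column family and index names below are placeholders (see
"nodetool help" for the exact index-name format in your version):

nodetool -h localhost rebuild_index MyKeyspace MyCF MyCF.my_index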

Many thx,

Br,
Matthias Zeilinger
Production Operation - Shared Services

P: +43 (0) 50 858-31185
M: +43 (0) 664 85-34459
E: matthias.zeilin...@bwinparty.com

bwin.party services (Austria) GmbH 
Marxergasse 1B
A-1030 Vienna

www.bwinparty.com 

-Original Message-
From: aaron morton [mailto:aa...@thelastpickle.com] 
Sent: Tuesday, 29 January 2013 21:51 
To: user@cassandra.apache.org
Subject: Re: data not shown up after some time

> How can I check for this secondary index read fails?

Your description was that reads which use a secondary index (not the row key) 
failed.
> if I do a simple "list <cf>;" the data is shown, but if I do a "get <cf> 
> where <column>='<value>';"


If you can retrieve the row using its row key, but not via the secondary index 
(the <column> in your example) then the index is broken. 

If you are on pre 1.1.9 try upgrading. 

cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 8:19 PM, Matthias Zeilinger 
 wrote:

> How can I check for this secondary index read fails?
> Is it in the system.log or over the nodetool?
>  
> Br,
> Matthias Zeilinger
> Production Operation - Shared Services
>  
> P: +43 (0) 50 858-31185
> M: +43 (0) 664 85-34459
> E: matthias.zeilin...@bwinparty.com
>  
> bwin.party services (Austria) GmbH
> Marxergasse 1B
> A-1030 Vienna
>  
> www.bwinparty.com
>  
> From: aaron morton [mailto:aa...@thelastpickle.com]
> Sent: Tuesday, 29 January 2013 08:04
> To: user@cassandra.apache.org
> Subject: Re: data not shown up after some time
>  
> If you are seeing failed secondary index reads you may be seeing this 
> https://issues.apache.org/jira/browse/CASSANDRA-5079
>  
> Cheers
>   
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 29/01/2013, at 3:31 AM, Matthias Zeilinger 
>  wrote:
> 
> 
> Hi,
>  
> No I have checked the TTL: 7776000
>  
> Very interesting is, if I do a simple "list <cf>;" the data is shown, but if 
> I do a "get <cf> where <column>='<value>';" it returns "0 Row Returned".
>  
> How can that be?
>  
> Br,
> Matthias Zeilinger
> Production Operation - Shared Services
>  
> P: +43 (0) 50 858-31185
> M: +43 (0) 664 85-34459
> E: matthias.zeilin...@bwinparty.com
>  
> bwin.party services (Austria) GmbH
> Marxergasse 1B
> A-1030 Vienna
>  
> www.bwinparty.com
>  
> From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
> Sent: Monday, 28 January 2013 15:25
> To: user@cassandra.apache.org
> Subject: RE: data not shown up after some time
>  
> Are you sure your app is setting TTL correctly?
> TTL is in seconds. For 90 days it have to be 90*24*60*60=7776000.
> What If you set by accident 777600 (10 times less) - that will be 9 days, 
> almost what you see.
>  
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
>  
> Email: viktor.jevdoki...@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453 J. Jasinskio 16C, LT-01112 
> Vilnius, Lithuania Follow us on Twitter: @adforminsider Take a ride 
> with Adform's Rich Media Suite  
> 
>  
> From: Matthias Zeilinger [mailto:matthias.zeilin...@bwinparty.com]
> Sent: Monday, January 28, 2013 15:57
> To: user@cassandra.apache.org
> Subject: data not shown up after some time
>  
> Hi,
>  
> I´m a simple operations guy and new to Cassandra.
> I have the problem that one of our application is writing data into Cassandra 
> (but not deleting them, because we should have a 90 days TTL).
> The application operates in 1 KS with 5 CF. my current setup:
>  
> 3 node cluster and KS has a RF of 3 (I know it´s not the best setup)
>  
> I can see now the problem that after 10 days most (nearly all) data are not 
> showing anymore in the cli and also our application cannot see the data.
> I assume that it has something to do with the gc_grace_seconds, it is set to 
> 10 days.
>  
> I have read many documentations about tombstones, but our application doesn´t 
> perform deletes.
> How can I see in the cli, if I row key has any tombstone or not.
>  
> Could it be that there are some ghost tombstones?
>  
> Thx for your help
>  
> Br,
> Matthias Zeilinger
> Production Operation - Shared Services
>  
> P: +43 (0) 50 858-31185
> M: +43 (0) 664 85-34459
> E: matthias.zeilin...@bwinparty.com
>  
> bwin.party services (Austria) GmbH
> Marxergasse 1B