Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Richard Low
On 19 September 2013 02:06, Jayadev Jayaraman  wrote:

We use vnodes with num_tokens = 256 (256 tokens per node). After loading
> some data with sstableloader, we find that the cluster is heavily
> imbalanced:
>

How did you select the tokens?  Is this a brand new cluster which started
on first boot with num_tokens = 256 and chose random tokens?  Or did you
start with num_tokens = 1 and then increase it?

Richard.


Row size in cfstats vs cfhistograms

2013-09-19 Thread Rene Kochen
Hi all,

I use Cassandra 1.0.11

If I do cfstats for a particular column family, I see a "Compacted row
maximum size" of 43388628

However, when I do a cfhistograms I do not see such a big row in the Row
Size column. The biggest row there is 126934.

Can someone explain this?

Thanks!

Rene


cqlsh startup error "Can't locate transport factory function cqlshlib.tfactory.regular_transport_factory"

2013-09-19 Thread Oisin Kim
Hi,

cqlsh stopped working for me recently. I'm unsure how or why it broke, and I
couldn't find anything in the mail archives (or Google) that gave me an
indication of how to fix the problem.

Here's the output I see when I have Cassandra running locally (default config
except using Random Partitioner) and try to run cqlsh (running with --debug and
with the local IP makes no difference):

oisin@/usr/local/Cellar/cassandra/1.2.9/bin: ./cqlsh
Can't locate transport factory function 
cqlshlib.tfactory.regular_transport_factory

I installed Cassandra 1.2.9 and Python 2.7.2 via brew and used pip to install
cql. I can connect via cassandra-cli to create and view keyspaces etc. without
any issues.

Any help greatly appreciated, thanks.

Regards,
Oisin

Versions:

oisin@/usr/local/Cellar/cassandra/1.2.9/bin: ./cassandra -v
xss =  -ea -javaagent:/usr/local/Cellar/cassandra/1.2.9/jamm-0.2.5.jar 
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4096M -Xmx4096M 
-Xmn800M -XX:+HeapDumpOnOutOfMemoryError
1.2.9



oisin@/usr/local/Cellar: python -V
Python 2.7.2

oisin@/usr/local/Cellar: pip -V
pip 1.4.1 from /usr/local/lib/python2.7/site-packages/pip-1.4.1-py2.7.egg 
(python 2.7)




Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Michał Michalski
I believe the reason is that cfhistograms tells you about the sizes of
the rows returned by a given node in response to read requests, while
cfstats tracks the largest row stored on a given node.
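For comparison, a quick sketch of the two nodetool views on a single node
(1.0-era syntax; the keyspace and column family names are placeholders):

# per-CF stats, including the largest compacted row this node has seen
nodetool -h localhost cfstats
# per-CF histograms, including the row size distribution
nodetool -h localhost cfhistograms <keyspace> <column_family>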


M.

On 19.09.2013 11:31, Rene Kochen wrote:

Hi all,

I use Cassandra 1.0.11

If I do cfstats for a particular column family, I see a "Compacted row
maximum size" of 43388628

However, when I do a cfhistograms I do not see such a big row in the Row
Size column. The biggest row there is 126934.

Can someone explain this?

Thanks!

Rene





Re: cqlsh startup error "Can't locate transport factory function cqlshlib.tfactory.regular_transport_factory"

2013-09-19 Thread Oisin Kim
Fixed this issue. For anyone else who hits it: the version of Python installed
via brew was 2.7.5 and needed to be put on the PATH, as OS X ships its own
version of Python (2.7.2 currently).
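A minimal sketch of the workaround, assuming brew's default /usr/local prefix
(adjust for your own setup):

# put the brew-installed Python ahead of the system one, e.g. in ~/.bash_profile
export PATH="/usr/local/bin:$PATH"
# verify which interpreter cqlsh will now pick up
which python
python -V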



On Thursday 19 September 2013 at 10:33, Oisin Kim wrote:

> Hi,
> 
> cqlsh stopped working for me recently. I'm unsure how or why it broke, and I
> couldn't find anything in the mail archives (or Google) that gave me an
> indication of how to fix the problem.
> 
> Here's the output I see when I have Cassandra running locally (default config
> except using Random Partitioner) and try to run cqlsh (running with --debug and
> with the local IP makes no difference):
> 
> oisin@/usr/local/Cellar/cassandra/1.2.9/bin: ./cqlsh
> Can't locate transport factory function 
> cqlshlib.tfactory.regular_transport_factory
> 
> I installed Cassandra 1.2.9 and Python 2.7.2 via brew and used pip to install
> cql. I can connect via cassandra-cli to create and view keyspaces etc. without
> any issues.
> 
> Any help greatly appreciated, thanks.
> 
> Regards,
> Oisin
> 
> Versions:
> 
> oisin@/usr/local/Cellar/cassandra/1.2.9/bin: ./cassandra -v
> xss =  -ea -javaagent:/usr/local/Cellar/cassandra/1.2.9/jamm-0.2.5.jar 
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4096M -Xmx4096M 
> -Xmn800M -XX:+HeapDumpOnOutOfMemoryError
> 1.2.9
> 
> 
> 
> oisin@/usr/local/Cellar: python -V
> Python 2.7.2
> 
> oisin@/usr/local/Cellar: pip -V
> pip 1.4.1 from /usr/local/lib/python2.7/site-packages/pip-1.4.1-py2.7.egg 
> (python 2.7)
> 
> 
> 
> 




Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Richard Low
On 19 September 2013 10:31, Rene Kochen  wrote:

I use Cassandra 1.0.11
>
> If I do cfstats for a particular column family, I see a "Compacted row
> maximum size" of 43388628
>
> However, when I do a cfhistograms I do not see such a big row in the Row
> Size column. The biggest row there is 126934.
>
> Can someone explain this?
>

The 'Row Size' column is showing the number of rows that have a size
indicated by the value in the 'Offset' column.  So if your output is like

Offset  Row Size
1131752  10
1358102  100

It means you have 100 rows with size between 1131752 and 1358102 bytes.  It
doesn't mean there are rows of size 100.

Richard.


Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Rene Kochen
And how does cfstats track the maximum size? What does "Compacted" mean in
"Compacted row maximum size"?

Thanks again!

Rene


2013/9/19 Michał Michalski 

> I believe the reason is that cfhistograms tells you about the sizes of the
> rows returned by a given node in response to read requests, while
> cfstats tracks the largest row stored on a given node.
>
> M.
>
> On 19.09.2013 11:31, Rene Kochen wrote:
>
>  Hi all,
>>
>> I use Cassandra 1.0.11
>>
>> If I do cfstats for a particular column family, I see a "Compacted row
>> maximum size" of 43388628
>>
>> However, when I do a cfhistograms I do not see such a big row in the Row
>> Size column. The biggest row there is 126934.
>>
>> Can someone explain this?
>>
>> Thanks!
>>
>> Rene
>>
>>
>


Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Rene Kochen
That is indeed how I read it. The largest bucket with any rows is 3 rows at an
offset of 126934, while cfstats reports 43388628.

Thanks,

Rene


2013/9/19 Richard Low 

> On 19 September 2013 10:31, Rene Kochen  wrote:
>
> I use Cassandra 1.0.11
>>
>> If I do cfstats for a particular column family, I see a "Compacted row
>> maximum size" of 43388628
>>
>> However, when I do a cfhistograms I do not see such a big row in the Row
>> Size column. The biggest row there is 126934.
>>
>> Can someone explain this?
>>
>
> The 'Row Size' column is showing the number of rows that have a size
> indicated by the value in the 'Offset' column.  So if your output is like
>
> Offset  Row Size
> 1131752  10
> 1358102  100
>
> It means you have 100 rows with size between 1131752 and 1358102 bytes.
>  It doesn't mean there are rows of size 100.
>
> Richard.
>


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Hi Richard, 
This is a brand new cluster which started with num_tokens = 256 on first boot
and chose random tokens. The attached ring status is after data is loaded into
the cluster for the first time using sstableloader, and it remains that way even
after Cassandra is restarted.

Thanks,
Suruchi

On Sep 19, 2013, at 3:46, Richard Low  wrote:

> On 19 September 2013 02:06, Jayadev Jayaraman  wrote:
> 
>> We use vnodes with num_tokens = 256 (256 tokens per node). After loading
>> some data with sstableloader, we find that the cluster is heavily
>> imbalanced:
> 
> How did you select the tokens?  Is this a brand new cluster which started on 
> first boot with num_tokens = 256 and chose random tokens?  Or did you start 
> with num_tokens = 1 and then increase it?
> 
> Richard.


Reverse compaction on 1.1.11?

2013-09-19 Thread Michael Theroux
Hello,

Quick question.  Is there a tool that allows sstablesplit (reverse compaction) 
against 1.1.11 sstables?  I seem to recall a separate utility somewhere, but 
I'm having difficulty locating it,

Thanks,
-Mike

Re: Cannot get secondary indexes on fields in compound primary key to work (Cassandra 2.0.0)

2013-09-19 Thread Petter von Dolwitz (Hem)
For the record:

https://issues.apache.org/jira/browse/CASSANDRA-5975 (2.0.1) resolved this
issue for me.






2013/9/8 Petter von Dolwitz (Hem) 

> Thank you for your reply.
>
> I will look into this. I cannot get my head around why the scenario I
> am describing does not work, though. Should I report an issue around this or
> is this expected behaviour? A similar setup is described in this blog post
> by the development lead.
>
> http://www.datastax.com/dev/blog/cql3-for-cassandra-experts
>
>
>
>
> 2013/9/6 Robert Coli 
>
>> On Fri, Sep 6, 2013 at 6:18 AM, Petter von Dolwitz (Hem) <
>> petter.von.dolw...@gmail.com> wrote:
>>
>>> I am struggling with getting secondary indexes to work. I have created
>>> secondary indexes on some fields that are part of the compound primary key
>>> but only one of the indexes seems to work (the one set on the field 'e' on
>>> the table definition below). Using any other secondary index in a where
>>> clause causes the message "Request did not complete within rpc_timeout.".
>>> It seems like if I put a value in the where clause that does not exist in a
>>> column with a secondary index then cassandra quickly returns with the result
>>> (0 rows), but if I put in a value that does exist I get a timeout. There is no
>>> exception in the logs in connection with this. I've tried to increase the
>>> timeout to a minute but it does not help.
>>>
>>
>> In general unless you absolutely need the atomicity of the update of a
>> secondary index with the underlying storage row, you are better off making
>> a manual secondary index column family.
>>
>> =Rob
>>
>
>


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Richard Low
I think what has happened is that Cassandra was started with num_tokens =
1, then shut down and num_tokens set to 256.  When this happens, on the first
boot Cassandra chooses a single random token.  Then, when restarted, it
splits that token's range into 256 adjacent ranges.

You can see something like this has happened because the tokens for each
node are sequential.

The way to fix it, assuming you don't want the data, is to shut down your
cluster, wipe the whole data and commitlog directories, then start
Cassandra again.
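A rough sketch of that reset, assuming the default data_file_directories and
commitlog_directory locations from cassandra.yaml (adjust to your config):

# on every node, with Cassandra stopped:
rm -rf /var/lib/cassandra/data/*
rm -rf /var/lib/cassandra/commitlog/*
# confirm num_tokens is already set before the next start
grep num_tokens /path/to/conf/cassandra.yaml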

Richard.


On 19 September 2013 13:16, Suruchi Deodhar <
suruchi.deod...@generalsentiment.com> wrote:

> Hi Richard,
> This is a brand new cluster which started with num_tokens = 256 on first
> boot and chose random tokens. The attached ring status is after data is
> loaded into the cluster for the first time using sstableloader, and it remains
> that way even after Cassandra is restarted.
>
> Thanks,
> Suruchi
>
> On Sep 19, 2013, at 3:46, Richard Low  wrote:
>
> On 19 September 2013 02:06, Jayadev Jayaraman  wrote:
>
>  We use vnodes with num_tokens = 256 (256 tokens per node). After
>> loading some data with sstableloader, we find that the cluster is heavily
>> imbalanced:
>>
>
> How did you select the tokens?  Is this a brand new cluster which started
> on first boot with num_tokens = 256 and chose random tokens?  Or did you
> start with num_tokens = 1 and then increase it?
>
> Richard.
>
>


Re: Reverse compaction on 1.1.11?

2013-09-19 Thread Hiller, Dean
Can you describe what you mean by reverse compaction?  I mean once you put
a row together and blow away sstables that contained it before, you can't
possibly know how to split it since that information is gone.

Perhaps you want the simple sstable2json script in the bin directory so
you can inspect the file?

Dean

On 9/19/13 7:21 AM, "Michael Theroux"  wrote:

>Hello,
>
>Quick question.  Is there a tool that allows sstablesplit (reverse
>compaction) against 1.1.11 sstables?  I seem to recall a separate utility
>somewhere, but I'm having difficulty locating it,
>
>Thanks,
>-Mike



Re: Reverse compaction on 1.1.11?

2013-09-19 Thread Nate McCall
See https://issues.apache.org/jira/browse/CASSANDRA-4766

The original gist posted by Rob therein might be helpful/work with earlier
versions (I have not tried).

Worst case, might be a good reason to upgrade to 1.2.x (if you're suffering
pressure from a large SSTable, the additional offheap structures will help
a bunch and you may not need to split).


On Thu, Sep 19, 2013 at 8:21 AM, Michael Theroux wrote:

> Hello,
>
> Quick question.  Is there a tool that allows sstablesplit (reverse
> compaction) against 1.1.11 sstables?  I seem to recall a separate utility
> somewhere, but I'm having difficulty locating it,
>
> Thanks,
> -Mike


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 7:03 AM, Richard Low  wrote:

> I think what has happened is that Cassandra was started with num_tokens =
> 1, then shutdown and num_tokens set to 256.  When this happens, the first
> time Cassandra chooses a single random token.  Then when restarted it
> splits the token into 256 adjacent ranges.
>

Suruchi,

By which mechanism did you install Cassandra? I ask out of concern that
there may be an issue in the some packaging leading to the above sequence
of events.

=Rob


Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread Nate McCall
As opposed to stopping compaction altogether, have you experimented with
turning down compaction_throughput_mb_per_sec (16mb default) and/or
explicitly setting concurrent_compactors (defaults to the number of cores,
iirc).
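For reference, a sketch of both knobs (the values below are illustrative, not
recommendations):

# cassandra.yaml
compaction_throughput_mb_per_sec: 8
concurrent_compactors: 2

# or throttle a running node on the fly, without a restart
nodetool -h localhost setcompactionthroughput 8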


On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar wrote:

> Hi,
> In general leveled compaction are I/O heavy so when there are bunch of
> writes do we need to stop leveled compactions at all?
> I found the nodetool stop COMPACTION, which states it stops compaction
> happening, does this work for any type of compaction? Also it states in
> documents 'eventually cassandra restarts the compaction', isn't there a way
> to control when to start the compaction again manually ?
> If this is not applicable for leveled compactions in 1.2, then what can be
> used for stopping/restarting those?
>
>
>
> Thanks,
> Rashmi
>


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Hi Robert,
I downloaded apache-cassandra-1.2.9.tar.gz from
http://cassandra.apache.org/download/ (
http://apache.mirrors.tds.net/cassandra/1.2.9/apache-cassandra-1.2.9-bin.tar.gz)
and installed it on the individual nodes of the cassandra cluster.
Thanks,
Suruchi


On Thu, Sep 19, 2013 at 12:35 PM, Robert Coli  wrote:

> On Thu, Sep 19, 2013 at 7:03 AM, Richard Low  wrote:
>
>> I think what has happened is that Cassandra was started with num_tokens =
>> 1, then shutdown and num_tokens set to 256.  When this happens, the first
>> time Cassandra chooses a single random token.  Then when restarted it
>> splits the token into 256 adjacent ranges.
>>
>
> Suruchi,
>
> By which mechanism did you install Cassandra? I ask out of concern that
> there may be an issue in the some packaging leading to the above sequence
> of events.
>
> =Rob
>


Re: Problem with counter columns

2013-09-19 Thread Robert Coli
On Wed, Sep 18, 2013 at 11:07 AM, Yulian Oifa  wrote:

> I am using counter columns in a Cassandra cluster with 3 nodes.
>
> The current Cassandra version is 0.8.10.
>
> How can I debug and find the problem?
>

The problem is using Counters in Cassandra 0.8.

But seriously, I don't know whether the particular issue you describe is
fixed upstream. But if it isn't, no one will fix it in 0.8, so you should
probably...

1) upgrade to Cassandra 1.2.9 (note that you likely need to pass through
1.0/1.1)
2) attempt to reproduce
3) if you can, file a JIRA and update this thread with a link to it

=Rob


Re: Row size in cfstats vs cfhistograms

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 3:08 AM, Rene Kochen wrote:

> And how does cfstats track the maximum size? What does "Compacted" mean in
> "Compacted row maximum size"?
>

That maximum size is "the largest row that I have encountered in the course
of compaction, since I started."

Hence "compacted," to try to indicate that it is not necessarily the row of
maximum size which currently exists. For example, if you had a huge row at
some time in the past and have now removed it (and have not restarted in
the interim) this value will be misleading.

=Rob


1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread rash aroskar
Hi,
In general, leveled compactions are I/O heavy, so when there is a big bunch of
writes, do we need to stop leveled compactions at all?
I found nodetool stop COMPACTION, which states that it stops the compaction
in progress. Does this work for any type of compaction? Also, the documents
state that 'eventually cassandra restarts the compaction'; isn't there a way
to control when to start the compaction again manually?
If this is not applicable to leveled compactions in 1.2, then what can be
used for stopping/restarting those?



Thanks,
Rashmi


Re: questions related to the SSTable file

2013-09-19 Thread Robert Coli
On Tue, Sep 17, 2013 at 6:51 PM, java8964 java8964 wrote:

> I thought I was clearer, but your clarification confused me again.
>


> But there is no way we can be sure that these SSTable files will ONLY
> contain modified data. So the statement being quoted above is not exactly
> right. I agree that all the modified data in that period will be in the
> incremental sstable files, but a lot of other unmodified data will be in
> them too.
>

The incremental backup directory only includes SSTables recently flushed
from memtables. It does not include SSTables created as a result of
compaction.
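To make that concrete, a sketch assuming the default data directory layout
(once incremental backups are enabled, the backups directory sits under each
column family's data directory):

# cassandra.yaml
incremental_backups: true

# only freshly flushed sstables are hard-linked here, per column family:
ls /var/lib/cassandra/data/<keyspace>/<column_family>/backups/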

Memtables, by definition, only contain modified or new data. Yes, there is
one new copy per replica and the ones processed after the first might
appear "unmodified", which may be what you are talking about?

=Rob


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Hi Rob,
Do you suggest I should try with some other installation mechanism? Are
there any known problems with the tar installation of cassandra 1.2.9 that
I should be aware of? Please do let me know.
Thanks,
Suruchi


On Thu, Sep 19, 2013 at 1:04 PM, Suruchi Deodhar <
suruchi.deod...@generalsentiment.com> wrote:

> Hi Robert,
> I downloaded apache-cassandra-1.2.9.tar.gz from
> http://cassandra.apache.org/download/ (
> http://apache.mirrors.tds.net/cassandra/1.2.9/apache-cassandra-1.2.9-bin.tar.gz)
>  and installed it on the individual nodes of the cassandra cluster.
> Thanks,
> Suruchi
>
>
> On Thu, Sep 19, 2013 at 12:35 PM, Robert Coli wrote:
>
>> On Thu, Sep 19, 2013 at 7:03 AM, Richard Low  wrote:
>>
>>> I think what has happened is that Cassandra was started with num_tokens
>>> = 1, then shutdown and num_tokens set to 256.  When this happens, the first
>>> time Cassandra chooses a single random token.  Then when restarted it
>>> splits the token into 256 adjacent ranges.
>>>
>>
>> Suruchi,
>>
>> By which mechanism did you install Cassandra? I ask out of concern that
>> there may be an issue in the some packaging leading to the above sequence
>> of events.
>>
>> =Rob
>>
>
>


Re: What are the steps to go from SimpleSnitch to GossipingPropertyFileSnitch in a live cluster?

2013-09-19 Thread Juan Manuel Formoso
Just FYI, I did it with a rolling restart and everything worked great.


On Wed, Sep 18, 2013 at 5:01 PM, Juan Manuel Formoso wrote:

> Besides making sure the datacenter name is the same in the
> cassandra-rackdc.properties file and the one originally created (
> datacenter1), what else do I have to take into account?
>
> Can I do a rolling restart or should I kill the entire cluster and then
> startup one at a time?
>
> --
> *Juan Manuel Formoso
> *Senior Geek
> http://twitter.com/juanformoso
> http://seniorgeek.com.ar
> LLAP
>



-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar <
suruchi.deod...@generalsentiment.com> wrote:

> Do you suggest I should try with some other installation mechanism? Are
> there any known problems with the tar installation of cassandra 1.2.9 that
> I should be aware of?
>

I was asking in the context of this JIRA :

https://issues.apache.org/jira/browse/CASSANDRA-2356

Which does not seem to apply in your case!

=Rob


Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread sankalp kohli
You cannot manually start leveled compaction. It runs based on the data in each level.


On Thu, Sep 19, 2013 at 9:19 AM, Nate McCall  wrote:

> As opposed to stopping compaction altogether, have you experimented with
> turning down compaction_throughput_mb_per_sec (16mb default) and/or
> explicitly setting concurrent_compactors (defaults to the number of cores,
> iirc).
>
>
> On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar 
> wrote:
>
>> Hi,
>> In general leveled compaction are I/O heavy so when there are bunch of
>> writes do we need to stop leveled compactions at all?
>> I found the nodetool stop COMPACTION, which states it stops compaction
>> happening, does this work for any type of compaction? Also it states in
>> documents 'eventually cassandra restarts the compaction', isn't there a way
>> to control when to start the compaction again manually ?
>> If this is not applicable for leveled compactions in 1.2, then what can
>> be used for stopping/restarting those?
>>
>>
>>
>> Thanks,
>> Rashmi
>>
>
>


Re: Rebalancing vnodes cluster

2013-09-19 Thread Robert Coli
On Wed, Sep 18, 2013 at 4:26 PM, Nimi Wariboko Jr
wrote:

> When I started with cassandra I had originally set it up to use tokens. I
> then migrated to vnodes (using shuffle), but my cluster isn't balanced (
> http://imgur.com/73eNhJ3).
>

Are you saying that (other than the imbalance that is the subject of this
thread) you were able to use "shuffle" successfully on a cluster with
~150GB per node?

1) How long did it take?
2) Did you experience any difficulties while doing so?
3) Have you run cleanup yet?
4) What version of Cassandra?

=Rob


Re: AssertionError: sstableloader

2013-09-19 Thread Yuki Morishita
Sounds like a bug.
Would you mind filing a JIRA at https://issues.apache.org/jira/browse/CASSANDRA?

Thanks,

On Thu, Sep 19, 2013 at 2:12 PM, Vivek Mishra  wrote:
> Hi,
> I am trying to use sstableloader to load some external data and getting
> given below error:
> Established connection to initial hosts
> Opening sstables and calculating sections to stream
> Streaming relevant part of
> /home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db to
> [/127.0.0.1]
> progress: [/127.0.0.1 1/1 (100%)] [total: 100% - 0MB/s (avg:
> 0MB/s)]Exception in thread "STREAM-OUT-/127.0.0.1" java.lang.AssertionError:
> Reference counter -1 for
> /home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db
> at
> org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1017)
> at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:120)
> at
> org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:73)
> at
> org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:45)
> at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
> at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:384)
> at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:357)
> at java.lang.Thread.run(Thread.java:722)
>
>
> Any pointers?
>
> -Vivek



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Richard Low
The only thing you need to guarantee is that Cassandra doesn't start with
num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all the
data before starting it with a higher num_tokens.
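A quick sanity check before the very first start (a sketch; the config path
depends on your install):

grep -n '^num_tokens' /path/to/conf/cassandra.yaml
# should print something like: num_tokens: 256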


On 19 September 2013 19:07, Robert Coli  wrote:

> On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar <
> suruchi.deod...@generalsentiment.com> wrote:
>
>> Do you suggest I should try with some other installation mechanism? Are
>> there any known problems with the tar installation of cassandra 1.2.9 that
>> I should be aware of?
>>
>
> I was asking in the context of this JIRA :
>
> https://issues.apache.org/jira/browse/CASSANDRA-2356
>
> Which does not seem to apply in your case!
>
> =Rob
>


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Thanks for your replies. I wiped out my data from the cluster and also
cleared the commitlog before restarting it with num_tokens=256. I then
uploaded data using sstableloader.

However, I am still not able to see a uniform distribution of data across
nodes of the cluster.

The output of the bin/nodetool -h localhost status command looks as
follows. Some nodes have as little as 1.12 MB of data while some have as much
as 912.57 MB.

Datacenter: us-east
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a

Is there something else that I should be doing differently?

Thanks for your help!

Suruchi



On Thu, Sep 19, 2013 at 3:20 PM, Richard Low  wrote:

> The only thing you need to guarantee is that Cassandra doesn't start with
> num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all the
> data before starting it with higher num_tokens.
>
>
> On 19 September 2013 19:07, Robert Coli  wrote:
>
>> On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar <
>> suruchi.deod...@generalsentiment.com> wrote:
>>
>>> Do you suggest I should try with some other installation mechanism? Are
>>> there any known problems with the tar installation of cassandra 1.2.9 that
>>> I should be aware of?
>>>
>>
>> I was asking in the context of this JIRA :
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-2356
>>
>> Which does not seem to apply in your case!
>>
>> =Rob
>>
>
>


AssertionError: sstableloader

2013-09-19 Thread Vivek Mishra
Hi,
I am trying to use sstableloader to load some external data and getting
given below error:
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of
/home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db to [/
127.0.0.1]
progress: [/127.0.0.1 1/1 (100%)] [total: 100% - 0MB/s (avg:
0MB/s)]Exception in thread "STREAM-OUT-/127.0.0.1"
java.lang.AssertionError: Reference counter -1 for
/home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db
at
org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1017)
at org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:120)
at
org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:73)
at
org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:45)
at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:384)
at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:357)
at java.lang.Thread.run(Thread.java:722)


Any pointers?

-Vivek


Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread rash aroskar
Thanks for responses.
Nate - I haven't tried changing compaction_throughput_mb_per_sec. In my
cassandra.yaml I had set it to 32 to begin with. Do you think 32 can be too
much if Cassandra only gets writes once in a while, but when it does, it gets
a big chunk at once?


On Thu, Sep 19, 2013 at 12:33 PM, sankalp kohli wrote:

> You cannot start level compaction. It will run based on data in each
> level.
>
>
> On Thu, Sep 19, 2013 at 9:19 AM, Nate McCall wrote:
>
>> As opposed to stopping compaction altogether, have you experimented with
>> turning down compaction_throughput_mb_per_sec (16mb default) and/or
>> explicitly setting concurrent_compactors (defaults to the number of cores,
>> iirc).
>>
>>
>> On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar 
>> wrote:
>>
>>> Hi,
>>> In general leveled compaction are I/O heavy so when there are bunch of
>>> writes do we need to stop leveled compactions at all?
>>> I found the nodetool stop COMPACTION, which states it stops compaction
>>> happening, does this work for any type of compaction? Also it states in
>>> documents 'eventually cassandra restarts the compaction', isn't there a way
>>> to control when to start the compaction again manually ?
>>> If this is not applicable for leveled compactions in 1.2, then what can
>>> be used for stopping/restarting those?
>>>
>>>
>>>
>>> Thanks,
>>> Rashmi
>>>
>>
>>
>


Re: how can i get the column value? Need help!.. cassandra 1.28 and pig 0.11.1

2013-09-19 Thread Cyril Scetbon
Hi,

Did you try to build 1.2.10 and use it for your tests? I've got the same
issue and will give it a try as soon as it's released (expected at the end of 
the week).

Regards
-- 
Cyril SCETBON

On Sep 2, 2013, at 3:09 PM, Miguel Angel Martin junquera 
 wrote:

> hi all:
> 
> More info :
> 
> https://issues.apache.org/jira/browse/CASSANDRA-5941
> 
> 
> 
> I tried this (and generated cassandra 1.2.9) but it does not work for me:
> 
> git clone http://git-wip-us.apache.org/repos/asf/cassandra.git
> cd cassandra
> git checkout cassandra-1.2
> patch -p1 < 5867-bug-fix-filter-push-down-1.2-branch.txt
> ant
> 
> 
> 
> Miguel Angel Martín Junquera
> Analyst Engineer.
> miguelangel.mar...@brainsins.com
> 
> 
> 
> 2013/9/2 Miguel Angel Martin junquera 
> hi:
> 
> I tested this in the new cassandra 1.2.9 version and the issue still persists.
> 
> :-(
> 
> 
> 
> 
> 
> 
> Miguel Angel Martín Junquera
> Analyst Engineer.
> miguelangel.mar...@brainsins.com
> 
> 
> 
> 2013/8/30 Miguel Angel Martin junquera 
> I try this:
> 
> rows = LOAD 
> 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING 
> CqlStorage();
> dump rows;
> ILLUSTRATE rows;
> describe rows;
> 
> values2= FOREACH rows GENERATE  TOTUPLE (id) as (mycolumn:tuple(name,value));
> dump values2;
> describe values2;
> 
> But I get this results:
> 
> 
> 
> -
> | rows | id:chararray   | age:int   | title:chararray   | 
> -
> |  | (id, 6)| (age, 30) | (title, QA)   | 
> -
> 
> rows: {id: chararray,age: int,title: chararray}
> 2013-08-30 09:54:37,831 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1031: Incompatable field schema: left is 
> "tuple_0:tuple(mycolumn:tuple(name:bytearray,value:bytearray))", right is 
> "org.apache.pig.builtin.totuple_id_1:tuple(id:chararray)"
> 
> 
> 
> 
> 
> or 
> 
> 
> 
> 
> 
> values2= FOREACH rows GENERATE  TOTUPLE (id) ;
> dump values2;
> describe values2;
> 
> 
> 
> and  the results are:
> 
> 
> ...
> (((id,6)))
> (((id,5)))
> values2: {org.apache.pig.builtin.totuple_id_8: (id: chararray)}
> 
> 
> 
> Aggg!
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Miguel Angel Martín Junquera
> Analyst Engineer.
> miguelangel.mar...@brainsins.com
> 
> 
> 
> 2013/8/28 Miguel Angel Martin junquera 
> hi:
> 
> I cannot understand why the schema is defined like
> "id:chararray,age:int,title:chararray" and is not defined as tuples or
> bags of tuples, if we have key-value pair columns.
> 
> 
> I tried again to change the schema but it does not work.
> 
> any ideas ...
> 
> perhaps the issue is in the definition of the cql3 tables?
> 
> regards
> 
> 
> 2013/8/28 Miguel Angel Martin junquera 
> hi all:
> 
> 
> Regards
> 
> Still I cannot resolve this issue.
> 
> does anybody have this issue or try to test this simple example?
> 
> 
> I am stumped; I cannot find a working solution.
> 
> I appreciate any comment or help
> 
> 
> 2013/8/22 Miguel Angel Martin junquera 
> hi all:
> 
> 
> 
> 
> I'm testing the new CqlStorage() with cassandra 1.2.8 and pig 0.11.1.
> 
> 
> I am using this sample data test:
> 
>  
> http://frommyworkshop.blogspot.com.es/2013/07/hadoop-map-reduce-with-cassandra.html
> 
> And I load and dump data Righ with this script:
> 
> rows = LOAD 
> 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING 
> CqlStorage();
> 
> dump rows;
> describe rows;
> 
> results:
> 
> ((id,6),(age,30),(title,QA))
> ((id,5),(age,30),(title,QA))
> rows: {id: chararray,age: int,title: chararray}
> 
> 
> But I cannot get the column values.
> 
> I tried to define other schemas in LOAD like I used with CassandraStorage():
> 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-Pig-how-to-get-column-values-td5641158.html
> 
> 
> example:
> 
> rows = LOAD 
> 'cql://keyspace1/test?page_size=1&split_size=4&where_clause=age%3D30' USING 
> CqlStorage() AS (columns: bag {T: tuple(name, value)});
> 
> 
> and I get this error:
> 
> 2013-08-22 12:24:45,426 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1031: Incompatable schema: left is 
> "columns:bag{T:tuple(name:bytearray,value:bytearray)}", right is 
> "id:chararray,age:int,title:chararray"
> 
> 
> 
> I tried to use the FLATTEN, SUBSTRING, SPLIT UDFs but have not gotten good results:
> 
> Example:
> 
> when I flatten, I get a set of tuples like
> (title,QA)
> (title,QA)
> 2013-08-22 12:42:20,673 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input 
> paths to process : 1
> A: {title: chararray}
> 
> 
> but I cannot get the value QA.
> 
> SUBSTRING only works with title.
> 
> 
> 
> example:
> 
> B = FOREACH A GENERATE SUBSTRING(title,2,5);
> 
> dump B;
> describe B;
> 
> 
> results:
> 
> (tle)
> (tle)
> B: {chararray}
> 
> 
> 
> I tried this like Eric Lee in the other mail and have the same results:
> 
> 
>

Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Suruchi Deodhar
Yes, the key distribution does vary across the nodes. For example, on the
node with the highest data, Number of Keys (estimate) is 6527744 for a
particular column family, whereas for the same column family on the node
with least data, Number of Keys (estimate) = 3840.

Is there a way to control this distribution by setting some parameter of
Cassandra?

I am using the Murmur3 partitioner with NetworkTopologyStrategy.

Thanks,
Suruchi



On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia wrote:

> Can you check cfstats to see number of keys per node?
>
>
> On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar <
> suruchi.deod...@generalsentiment.com> wrote:
>
>> Thanks for your replies. I wiped out my data from the cluster and also
>> cleared the commitlog before restarting it with num_tokens=256. I then
>> uploaded data using sstableloader.
>>
>> However, I am still not able to see a uniform distribution of data across
>> nodes of the cluster.
>>
>> The output of the bin/nodetool -h localhost status command looks as
>> follows. Some nodes have as little as 1.12 MB of data while some have as
>> much as 912.57 MB.
>>
>> Datacenter: us-east
>> ===
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
>> UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
>> UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
>> UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
>> UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
>> UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
>> UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
>> UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
>> UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
>> UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
>> UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
>> UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
>> UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
>> UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
>> UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
>> UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
>> UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
>> UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
>> UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
>> UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
>> UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
>> UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
>> UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
>> UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
>> UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a
>>
>> Is there something else that I should be doing differently?
>>
>> Thanks for your help!
>>
>> Suruchi
>>
>>
>>
>> On Thu, Sep 19, 2013 at 3:20 PM, Richard Low  wrote:
>>
>>> The only thing you need to guarantee is that Cassandra doesn't start
>>> with num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all
>>> the data before starting it with higher num_tokens.
>>>
>>>
>>> On 19 September 2013 19:07, Robert Coli  wrote:
>>>
 On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar <
 suruchi.deod...@generalsentiment.com> wrote:

> Do you suggest I should try with some other installation mechanism?
> Are there any known problems with the tar installation of cassandra 1.2.9
> that I should be aware of?
>

 I was asking in the context of this JIRA :

 https://issues.apache.org/jira/browse/CASSANDRA-2356

 Which does not seem to apply in your case!

 =Rob

>>>
>>>
>>
>


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Mohit Anchlia
Can you check cfstats to see number of keys per node?

On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar <
suruchi.deod...@generalsentiment.com> wrote:

> Thanks for your replies. I wiped out my data from the cluster and also
> cleared the commitlog before restarting it with num_tokens=256. I then
> uploaded data using sstableloader.
>
> However, I am still not able to see a uniform distribution of data across
> nodes of the cluster.
>
> The output of the bin/nodetool -h localhost status command looks as
> follows. Some nodes have as little as 1.12 MB of data while some have as
> much as 912.57 MB.
>
> Datacenter: us-east
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
> UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
> UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
> UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
> UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
> UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
> UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
> UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
> UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
> UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
> UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
> UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
> UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
> UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
> UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
> UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
> UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
> UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
> UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
> UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
> UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
> UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
> UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
> UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a
>
> Is there something else that I should be doing differently?
>
> Thanks for your help!
>
> Suruchi
>
>
>
> On Thu, Sep 19, 2013 at 3:20 PM, Richard Low  wrote:
>
>> The only thing you need to guarantee is that Cassandra doesn't start with
>> num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all the
>> data before starting it with higher num_tokens.
>>
>>
>> On 19 September 2013 19:07, Robert Coli  wrote:
>>
>>> On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar <
>>> suruchi.deod...@generalsentiment.com> wrote:
>>>
 Do you suggest I should try with some other installation mechanism? Are
 there any known problems with the tar installation of cassandra 1.2.9 that
 I should be aware of?

>>>
>>> I was asking in the context of this JIRA :
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-2356
>>>
>>> Which does not seem to apply in your case!
>>>
>>> =Rob
>>>
>>
>>
>


Storing binary blobs data in Cassandra Column family?

2013-09-19 Thread Raihan Jamal
I need to store binary byte data in a Cassandra column family in all my
columns. Each column will have its own binary byte data. Below is the code
where I will be getting the binary byte data. My rowKey is going to be a String
but all my columns have to store binary blob data.

// serialize the Avro record into a byte array
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
ByteArrayOutputStream os = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().binaryEncoder(os, null);
writer.write(record, e);
e.flush();
byte[] byteData = os.toByteArray();
os.close();
// write byteData in Cassandra for the columns


I am not sure what the right way is to create the Cassandra column
family for the above use case. Below is the column family I have created,
but I am not sure it is the right way to do it:

create column family TESTING
with key_validation_class = 'UTF8Type'
and comparator = 'BytesType'
and default_validation_class = 'UTF8Type'
and gc_grace = 86400
and column_metadata = [ {column_name : 'lmd', validation_class :
DateType}];




*Raihan Jamal*


Re: 1.2 leveled compactions can affect big bunch of writes? how to stop/restart them?

2013-09-19 Thread Juan Manuel Formoso
concurrent_compactors is ignored when using leveled compaction.


On Thu, Sep 19, 2013 at 1:19 PM, Nate McCall  wrote:

> As opposed to stopping compaction altogether, have you experimented with
> turning down compaction_throughput_mb_per_sec (16mb default) and/or
> explicitly setting concurrent_compactors (defaults to the number of cores,
> iirc).
>
>
> On Thu, Sep 19, 2013 at 10:58 AM, rash aroskar 
> wrote:
>
>> Hi,
>> In general leveled compaction are I/O heavy so when there are bunch of
>> writes do we need to stop leveled compactions at all?
>> I found the nodetool stop COMPACTION, which states it stops compaction
>> happening, does this work for any type of compaction? Also it states in
>> documents 'eventually cassandra restarts the compaction', isn't there a way
>> to control when to start the compaction again manually ?
>> If this is not applicable for leveled compactions in 1.2, then what can
>> be used for stopping/restarting those?
>>
>>
>>
>> Thanks,
>> Rashmi
>>
>
>


-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Richard Low
On 19 September 2013 20:36, Suruchi Deodhar <
suruchi.deod...@generalsentiment.com> wrote:

> Thanks for your replies. I wiped out my data from the cluster and also
> cleared the commitlog before restarting it with num_tokens=256. I then
> uploaded data using sstableloader.
>
> However, I am still not able to see a uniform distribution of data across
> nodes of the cluster.
>
> The output of the bin/nodetool -h localhost status command looks as
> follows. Some nodes have as little as 1.12 MB of data while some have as
> much as 912.57 MB.
>

Now the 'Owns (effective)' column is showing the tokens are roughly
balanced.  So now the problem is the data isn't uniform - either you have
some rows much larger than others or some nodes are missing data that could
be replicated by running repair.
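A sketch of checking both possibilities (the node address is a placeholder):

# fill in any missing replicas on each node
nodetool -h <node> repair
# then compare per-CF key counts and max row sizes across nodes
nodetool -h <node> cfstats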

Richard.


Re: AssertionError: sstableloader

2013-09-19 Thread Vivek Mishra
More to add on this:

This is happening for column families created via CQL3 with collection type
columns and without "WITH COMPACT STORAGE".
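For illustration, a hypothetical definition of the failing shape, reusing the
keyspace/table names from the sstable path above (the columns themselves are
made up):

cqlsh> USE "Demo";
cqlsh> CREATE TABLE "Users" (user_id text PRIMARY KEY, emails set<text>);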


On Fri, Sep 20, 2013 at 12:51 AM, Yuki Morishita  wrote:

> Sounds like a bug.
> Would you mind filing JIRA at
> https://issues.apache.org/jira/browse/CASSANDRA?
>
> Thanks,
>
> On Thu, Sep 19, 2013 at 2:12 PM, Vivek Mishra 
> wrote:
> > Hi,
> > I am trying to use sstableloader to load some external data and getting
> > given below error:
> > Established connection to initial hosts
> > Opening sstables and calculating sections to stream
> > Streaming relevant part of
> > /home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db to
> > [/127.0.0.1]
> > progress: [/127.0.0.1 1/1 (100%)] [total: 100% - 0MB/s (avg:
> > 0MB/s)]Exception in thread "STREAM-OUT-/127.0.0.1"
> java.lang.AssertionError:
> > Reference counter -1 for
> > /home/impadmin/source/Examples/data/Demo/Users/Demo-Users-ja-1-Data.db
> > at
> >
> org.apache.cassandra.io.sstable.SSTableReader.releaseReference(SSTableReader.java:1017)
> > at
> org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:120)
> > at
> >
> org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:73)
> > at
> >
> org.apache.cassandra.streaming.messages.FileMessage$1.serialize(FileMessage.java:45)
> > at
> >
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
> > at
> >
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:384)
> > at
> >
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:357)
> > at java.lang.Thread.run(Thread.java:722)
> >
> >
> > Any pointers?
> >
> > -Vivek
>
>
>
> --
> Yuki Morishita
>  t:yukim (http://twitter.com/yukim)
>


Re: Decomissioning a datacenter

2013-09-19 Thread Juan Manuel Formoso
Not forever, just while I decommission the nodes, I assume. What I don't
understand is the wording "no longer reference".


On Thu, Sep 19, 2013 at 6:17 PM, Robert Coli  wrote:

> On Thu, Sep 19, 2013 at 1:52 PM, Juan Manuel Formoso 
> wrote:
>
>>
>> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_decomission_dc_t.html
>>
>> When it says "Change all keyspaces so they no longer reference the data
>> center being removed.", does that mean setting my replication_strategy so
>> that datacenter1:0,datacenter2:N ? (assuming I'm removing datacenter1)
>>
>
> I would presume it means remove datacenter1 entirely, not set it to 0
> forever.
>
> =Rob
>
>



-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: Decomissioning a datacenter

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 1:52 PM, Juan Manuel Formoso wrote:

>
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_decomission_dc_t.html
>
> When it says "Change all keyspaces so they no longer reference the data
> center being removed.", does that mean setting my replication_strategy so
> that datacenter1:0,datacenter2:N ? (assuming I'm removing datacenter1)
>

I would presume it means remove datacenter1 entirely, not set it to 0
forever.
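For example, a sketch of dropping the DC from the replication options entirely
(the keyspace name and replica count are placeholders):

cqlsh> ALTER KEYSPACE "MyKeyspace"
   ... WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter2': 3};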

=Rob


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Mohit Anchlia
Can you run nodetool repair on all the nodes first and look at the keys?

On Thu, Sep 19, 2013 at 1:22 PM, Suruchi Deodhar <
suruchi.deod...@generalsentiment.com> wrote:

> Yes, the key distribution does vary across the nodes. For example, on the
> node with the highest data, Number of Keys (estimate) is 6527744 for a
> particular column family, whereas for the same column family on the node
> with least data, Number of Keys (estimate) = 3840.
>
> Is there a way to control this distribution by setting some parameter of
> Cassandra?
>
> I am using the Murmur3 partitioner with NetworkTopologyStrategy.
>
> Thanks,
> Suruchi
>
>
>
> On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia wrote:
>
>> Can you check cfstats to see number of keys per node?
>>
>>
>> On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar <
>> suruchi.deod...@generalsentiment.com> wrote:
>>
>>> Thanks for your replies. I wiped out my data from the cluster and also
>>> cleared the commitlog before restarting it with num_tokens=256. I then
>>> uploaded data using sstableloader.
>>>
>>> However, I am still not able to see a uniform distribution of data
>>> across nodes of the cluster.
>>>
>>> The output of the bin/nodetool -h localhost status command looks as
>>> follows. Some nodes have as little as 1.12 MB of data while some have as
>>> much as 912.57 MB.
>>>
>>> Datacenter: us-east
>>> ===
>>> Status=Up/Down
>>> |/ State=Normal/Leaving/Joining/Moving
>>> --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
>>> UN  10.238.133.174  856.66 MB  256     8.4%              e41d8863-ce37-4d5c-a428-bfacea432a35  1a
>>> UN  10.238.133.97   439.02 MB  256     7.7%              1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
>>> UN  10.151.86.146   1.05 GB    256     8.0%              8952645d-4a27-4670-afb2-65061c205734  1a
>>> UN  10.138.10.9     912.57 MB  256     8.6%              25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
>>> UN  10.87.87.240    70.85 MB   256     8.6%              ea066827-83bc-458c-83e8-bd15b7fc783c  1b
>>> UN  10.93.5.157     60.56 MB   256     7.6%              4ab9111c-39b4-4d15-9401-359d9d853c16  1b
>>> UN  10.92.231.170   866.73 MB  256     9.3%              a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
>>> UN  10.238.137.250  533.77 MB  256     7.8%              84301648-afff-4f06-aa0b-4be421e0d08f  1a
>>> UN  10.93.91.139    478.45 KB  256     8.1%              682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
>>> UN  10.138.2.20     1.12 MB    256     7.9%              a6d4672a-0915-4c64-ba47-9f190abbf951  1a
>>> UN  10.93.31.44     282.65 MB  256     7.8%              67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
>>> UN  10.236.138.169  223.66 MB  256     9.1%              cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
>>> UN  10.137.7.90     11.36 MB   256     7.4%              17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
>>> UN  10.93.77.166    837.64 MB  256     8.8%              9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
>>> UN  10.120.249.140  838.59 MB  256     9.4%              e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
>>> UN  10.90.246.128   216.75 MB  256     8.4%              054911ec-969d-43d9-aea1-db445706e4d2  1b
>>> UN  10.123.95.248   147.1 MB   256     7.2%              a17deca1-9644-4520-9e62-ac66fc6fef60  1b
>>> UN  10.136.11.40    4.24 MB    256     8.5%              66be1173-b822-40b5-b650-cb38ae3c7a51  1a
>>> UN  10.87.90.42     11.56 MB   256     8.0%              dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
>>> UN  10.87.75.147    549 MB     256     8.3%              ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
>>> UN  10.151.49.88    119.86 MB  256     8.9%              57043573-ab1b-4e3c-8044-58376f7ce08f  1a
>>> UN  10.87.83.107    484.3 MB   256     8.3%              0019439b-9f8a-4965-91b8-7108bbb55593  1b
>>> UN  10.137.20.183   137.67 MB  256     8.4%              15951592-8ab2-473d-920a-da6e9d99507d  1a
>>> UN  10.238.170.159  49.17 MB   256     9.4%              32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a
>>>
>>> Is there something else that I should be doing differently?
>>>
>>> Thanks for your help!
>>>
>>> Suruchi
>>>
>>>
>>>
>>> On Thu, Sep 19, 2013 at 3:20 PM, Richard Low wrote:
>>>
 The only thing you need to guarantee is that Cassandra doesn't start
 with num_tokens=1 (the default in 1.2.x) or, if it does, that you wipe all
 the data before starting it with higher num_tokens.


 On 19 September 2013 19:07, Robert Coli  wrote:

> On Thu, Sep 19, 2013 at 10:59 AM, Suruchi Deodhar <
> suruchi.deod...@generalsentiment.com> wrote:
>
>> Do you suggest I should try with some other installation mechanism?
>> Are there any known problems with the tar installation of cassandra 1.2.9
>> that I should be aware of?
>>
>
> I was asking in the context of this JIRA :
>
> https://issues.apache.org/jira/browse/CASSANDRA-2356
>
> Which does not seem to apply in your case!
>
> =Rob
>


>>>
>>
>


Decomissioning a datacenter

2013-09-19 Thread Juan Manuel Formoso
Quick question.
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/../../cassandra/operations/ops_decomission_dc_t.html

When it says "Change all keyspaces so they no longer reference the data
center being removed.", does that mean setting my replication_strategy so
that datacenter1:0,datacenter2:N ? (assuming I'm removing datacenter1)

-- 
*Juan Manuel Formoso
*Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-19 Thread srmore
I hit this issue again today and it looks like changing the -Xss option does
not work :(
I am on 1.0.11 (I know it's old, we are upgrading to 1.2.9 right now) and
have about 800-900 GB of data. I can see Cassandra is spending a lot of time
reading the data files before it quits with the "java.lang.OutOfMemoryError:
unable to create new native thread" error.

My hard and soft limits seem to be ok as well.
Datastax recommends [1]

* soft nofile 32768
* hard nofile 32768


and I have

hard    nofile    65536
soft    nofile    65536

My ulimit -u output is 515038 (which again should be sufficient)

complete output

ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515038
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 515038
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
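(Side note: the shell's ulimit -a is not necessarily what the running Cassandra process got. A hedged way to verify on Linux, assuming the process stays up long enough to inspect it:)

```
# Read the effective limits of the Cassandra JVM itself from /proc
CASS_PID=$(pgrep -f CassandraDaemon | head -n1)
grep -E 'open files|processes|stack' /proc/$CASS_PID/limits
```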




Has anyone run into this?

[1] http://www.datastax.com/docs/1.1/troubleshooting/index

On Wed, Sep 11, 2013 at 8:47 AM, srmore  wrote:

> Thanks Viktor,
>
>
> - check (cassandra-env.sh) -Xss size, you may need to increase it for your
> JVM;
>
> This seems to have done the trick !
>
> Thanks !
>
>
> On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov <
> viktor.jevdoki...@adform.com> wrote:
>
>>  For start:
>>
>> - check (cassandra-env.sh) -Xss size, you may need to increase it for
>> your JVM;
>>
>> - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase
>> it for your data load/bloom filter/index sizes.
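(For reference, a minimal sketch of where those knobs live in conf/cassandra-env.sh; the values below are placeholders, not recommendations:)

```
# conf/cassandra-env.sh
MAX_HEAP_SIZE="8G"            # becomes -Xms/-Xmx
HEAP_NEWSIZE="800M"           # becomes -Xmn
JVM_OPTS="$JVM_OPTS -Xss256k" # per-thread stack size
```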
>>
>>Best regards / Pagarbiai
>> *Viktor Jevdokimov*
>> Senior Developer
>>
>>
>> *Visit us at Dmexco: *Hall 6 Stand B-52
>> September 18-19 Cologne, Germany
>> Email: viktor.jevdoki...@adform.com
>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>> J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
>> Follow us on Twitter: @adforminsider
>> Take a ride with Adform's Rich Media Suite
>>
>>
>> *From:* srmore [mailto:comom...@gmail.com]
>> *Sent:* Tuesday, September 10, 2013 6:16 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to
>> create new native thread [heur]
>>
>>
>>
>> I have a 5 node cluster with a load of around 300GB each. A node went
>> down and does not come up. I can see the following exception in the logs.
>>
>> ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line
>> 139) Fatal exception in thread Thread[main,5,main]
>> java.lang.OutOfMemoryError: unable to create new native thread
>> at java.lang.Thread.start0(Native Method)
>> at java.lang.Thread.start(Thread.java:640)
>> at
>> java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
>> at
>> java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
>> at
>> org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.(JMXEnabledThreadPoolExecutor.java:77)
>> at
>> org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.(JMXEnabledThreadPoolExecutor.java:65)
>> at
>> org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.(JMXConfigurableThreadPoolExecutor.java:34)
>> at
>> org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
>> at
>> org.apache.cassandra.concurrent.StageManager.(StageManager.java:42)
>> at
>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344)
>> at
>> org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173)
>>
>> The *ulimit -u* output is
>> *515042*
>>
>> Which is far more than what is recommended [1] (10240) and I am skeptical
>> to set it to unlimited as recommended here [2]
>>
>> Any pointers as to wh

Re: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-19 Thread srmore
Was too fast on the send button, sorry.
The thing I wanted to add was the

pending signals (-i) 515038

line. That looks odd to me; could that be related?



On Thu, Sep 19, 2013 at 4:53 PM, srmore  wrote:

>
> I hit this issue again today and it looks like changing the -Xss option does
> not work :(
> I am on 1.0.11 (I know it's old, we are upgrading to 1.2.9 right now) and
> have about 800-900GB of data. I can see Cassandra is spending a lot of time
> reading the data files before it quits with the "java.lang.OutOfMemoryError:
> unable to create new native thread" error.
>
> My hard and soft limits seem to be OK as well.
> Datastax recommends [1]
>
> * soft nofile 32768
> * hard nofile 32768
>
>
> and I have
> hard    nofile    65536
> soft    nofile    65536
>
> My ulimit -u output is 515038 (which again should be sufficient)
>
> complete output
>
> ulimit -a
> core file size          (blocks, -c) 0
> data seg size           (kbytes, -d) unlimited
> scheduling priority             (-e) 0
> file size               (blocks, -f) unlimited
> pending signals                 (-i) 515038
> max locked memory       (kbytes, -l) 32
> max memory size         (kbytes, -m) unlimited
> open files                      (-n) 1024
> pipe size            (512 bytes, -p) 8
> POSIX message queues     (bytes, -q) 819200
> real-time priority              (-r) 0
> stack size              (kbytes, -s) 10240
> cpu time               (seconds, -t) unlimited
> max user processes              (-u) 515038
> virtual memory          (kbytes, -v) unlimited
> file locks              (-x) unlimited
>
>
>
>
> Has anyone run into this?
>
> [1] http://www.datastax.com/docs/1.1/troubleshooting/index
>
> On Wed, Sep 11, 2013 at 8:47 AM, srmore  wrote:
>
>> Thanks Viktor,
>>
>>
>> - check (cassandra-env.sh) -Xss size, you may need to increase it for
>> your JVM;
>>
>> This seems to have done the trick !
>>
>> Thanks !
>>
>>
>> On Tue, Sep 10, 2013 at 12:46 AM, Viktor Jevdokimov <
>> viktor.jevdoki...@adform.com> wrote:
>>
>>>  For start:
>>>
>>> - check (cassandra-env.sh) -Xss size, you may need to increase it for
>>> your JVM;
>>>
>>> - check (cassandra-env.sh) -Xms and -Xmx size, you may need to increase
>>> it for your data load/bloom filter/index sizes.
>>>
>>>Best regards / Pagarbiai
>>> *Viktor Jevdokimov*
>>> Senior Developer
>>>
>>>
>>> *Visit us at Dmexco: *Hall 6 Stand B-52
>>> September 18-19 Cologne, Germany
>>> Email: viktor.jevdoki...@adform.com
>>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>>> J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
>>> Follow us on Twitter: @adforminsider
>>> Take a ride with Adform's Rich Media 
>>> Suite
>>>
>>>
>>> *From:* srmore [mailto:comom...@gmail.com]
>>> *Sent:* Tuesday, September 10, 2013 6:16 AM
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Error during startup - java.lang.OutOfMemoryError: unable to
>>> create new native thread [heur]
>>>
>>>
>>>
>>> I have a 5 node cluster with a load of around 300GB each. A node went
>>> down and does not come up. I can see the following exception in the logs.
>>>
>>> ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line
>>> 139) Fatal exception in thread Thread[main,5,main]
>>> java.lang.OutOfMemoryError: unable to create new native thread
>>> at java.lang.Thread.start0(Native Method)
>>> at java.lang.Thread.start(Thread.java:640)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
>>> at
>>> org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.(JMXEnabledThreadPoolExecutor.java:77)
>>> at
>>> org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.(JMXEnabledThreadPoolExecutor.java:65)
>>> at
>>> org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.(JMXConfigurableThreadPoolExecutor.java:34)
>>> at
>>> org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
>>> at
>>> org.apache.cassandra.concurrent.StageManager.(StageManager.java:42)
>>> at
>>>

Re: Decommissioning a datacenter

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 2:43 PM, Juan Manuel Formoso wrote:

> Not forever, while I decommission the nodes I assume. What I don't
> understand is the wording "no longer reference"
>

Why does your replication strategy need to be aware of nodes which receive
zero replicas?

"No longer reference" almost certainly means just removing any reference to
that DC from the configuration of the replication strategy.
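In CQL terms that might look something like the following (keyspace name and replica count are placeholders):

```
-- omit the departing DC entirely, rather than listing it with 0 replicas
ALTER KEYSPACE my_ks
  WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter2': 3};
```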

=Rob


Re: I don't understand shuffle progress

2013-09-19 Thread Jeremiah D Jordan
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/configuration/configVnodesProduction_t.html

On Sep 18, 2013, at 9:41 AM, Chris Burroughs  wrote:

> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html
> 
> This is a basic outline.
> 
> 
> On 09/18/2013 10:32 AM, Juan Manuel Formoso wrote:
>> I really like this idea. I can create a new cluster and have it replicate
>> the old one, after it finishes I can remove the original.
>> 
>> Any good resource that explains how to add a new datacenter to a live
>> single dc cluster that anybody can recommend?
>> 
>> 
>> On Wed, Sep 18, 2013 at 9:58 AM, Chris Burroughs
>> wrote:
>> 
>>> On 09/17/2013 09:41 PM, Paulo Motta wrote:
>>> 
 So you're saying the only feasible way of enabling VNodes on an upgraded
 C*
 1.2 is by doing fork writes to a brand new cluster + bulk load of sstables
 from the old cluster? Or is it possible to succeed on shuffling, even if
 that means waiting some weeks for the shuffle to complete?
 
>>> 
>>> In a multi "DC" cluster situation you *should* be able to bring up a new
>>> DC with vnodes, bootstrap it, and then decommission the old cluster.
>>> 
>> 
>> 
>> 
> 



Re: Decommissioning a datacenter

2013-09-19 Thread Robert Coli
On Thu, Sep 19, 2013 at 3:03 PM, Juan Manuel Formoso wrote:

> Oh, so just "datacenter2:N" then.
>

Yes.


> Sorry, not a native English speaker, and also tired :)
>

NP! :D

=Rob


Re: Decommissioning a datacenter

2013-09-19 Thread Juan Manuel Formoso
Oh, so just "datacenter2:N" then.
Sorry, not a native English speaker, and also tired :)


On Thu, Sep 19, 2013 at 6:57 PM, Robert Coli  wrote:

> On Thu, Sep 19, 2013 at 2:43 PM, Juan Manuel Formoso 
> wrote:
>
>> Not forever, while I decommission the nodes I assume. What I don't
>> understand is the wording "no longer reference"
>>
>
> Why does your replication strategy need to be aware of nodes which receive
> zero replicas?
>
> "No longer reference" almost certainly means just removing any reference
> to that DC from the configuration of the replication strategy.
>
> =Rob
>
>



-- 
*Juan Manuel Formoso*
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


NetworkTopologyStrategy Error

2013-09-19 Thread Ashley Martens
I tried to split my cluster and ran into this error, which I did not see in
the tests I performed.

ERROR [pool-1-thread-52165] 2013-09-19 21:48:08,262 Cassandra.java (line
3250) Internal error processing describe_ring
java.lang.IllegalStateException: datacenter (DC103) has no more endpoints,
(3) replicas still needed
at
org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:118)
at
org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:101)
at
org.apache.cassandra.service.StorageService.constructRangeToEndpointMap(StorageService.java:604)
at
org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:579)
at
org.apache.cassandra.service.StorageService.getRangeToEndpointMap(StorageService.java:553)
at
org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:584)
at
org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.process(Cassandra.java:3246)
at
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)


-- 

Ashley  OpenPGP --> KeyID: 0x5B0D6ABB



Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Jayadev Jayaraman
We ran nodetool repair on all nodes for all keyspaces / CFs, restarted
Cassandra, and this is what we get for nodetool status:

bin/nodetool -h localhost status
Datacenter: us-east
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address Load   Tokens  Owns (effective)  Host ID
Rack
UN  10.238.133.174  885.36 MB  256     8.4%
 e41d8863-ce37-4d5c-a428-bfacea432a35  1a
UN  10.238.133.97   468.66 MB  256     7.7%
 1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
UN  10.151.86.146   1.08 GB    256     8.0%
 8952645d-4a27-4670-afb2-65061c205734  1a
UN  10.138.10.9     941.44 MB  256     8.6%
 25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
UN  10.87.87.240    99.69 MB   256     8.6%
 ea066827-83bc-458c-83e8-bd15b7fc783c  1b
UN  10.93.5.157     87.44 MB   256     7.6%
 4ab9111c-39b4-4d15-9401-359d9d853c16  1b
UN  10.238.137.250  561.42 MB  256     7.8%
 84301648-afff-4f06-aa0b-4be421e0d08f  1a
UN  10.92.231.170   893.75 MB  256     9.3%
 a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
UN  10.138.2.20     31.89 MB   256     7.9%
 a6d4672a-0915-4c64-ba47-9f190abbf951  1a
UN  10.93.31.44     312.52 MB  256     7.8%
 67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
UN  10.93.91.139    30.46 MB   256     8.1%
 682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
UN  10.236.138.169  260.15 MB  256     9.1%
 cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
UN  10.137.7.90     38.45 MB   256     7.4%
 17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
UN  10.93.77.166    867.15 MB  256     8.8%
 9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
UN  10.120.249.140  863.98 MB  256     9.4%
 e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
UN  10.90.246.128   242.63 MB  256     8.4%
 054911ec-969d-43d9-aea1-db445706e4d2  1b
UN  10.123.95.248   171.51 MB  256     7.2%
 a17deca1-9644-4520-9e62-ac66fc6fef60  1b
UN  10.136.11.40    33.8 MB    256     8.5%
 66be1173-b822-40b5-b650-cb38ae3c7a51  1a
UN  10.87.90.42     38.01 MB   256     8.0%
 dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
UN  10.87.75.147    579.29 MB  256     8.3%
 ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
UN  10.151.49.88    151.06 MB  256     8.9%
 57043573-ab1b-4e3c-8044-58376f7ce08f  1a
UN  10.87.83.107    512.91 MB  256     8.3%
 0019439b-9f8a-4965-91b8-7108bbb55593  1b
UN  10.238.170.159  85.04 MB   256     9.4%
 32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a
UN  10.137.20.183   167.41 MB  256     8.4%
 15951592-8ab2-473d-920a-da6e9d99507d  1a

It doesn't seem to have changed by much. The loads are still highly uneven.

As for the number of keys in each node's CFs: the largest node now
has 5589120 keys for the column family that had 6527744 keys before (load
is now 1.08 GB as compared to 1.05 GB before), while the smallest node now
has 71808 keys as compared to 3840 keys before (load is now 31.89 MB as
compared to 1.12 MB before).


On Thu, Sep 19, 2013 at 5:18 PM, Mohit Anchlia wrote:

> Can you run nodetool repair on all the nodes first and look at the keys?
>
>
> On Thu, Sep 19, 2013 at 1:22 PM, Suruchi Deodhar <
> suruchi.deod...@generalsentiment.com> wrote:
>
>> Yes, the key distribution does vary across the nodes. For example, on the
>> node with the highest data, Number of Keys (estimate) is 6527744 for a
>> particular column family, whereas for the same column family on the node
>> with least data, Number of Keys (estimate) = 3840.
>>
>> Is there a way to control this distribution by setting some parameter of
>> Cassandra?
>>
>> I am using the Murmur3 partitioner with NetworkTopologyStrategy.
>>
>> Thanks,
>> Suruchi
>>
>>
>>
>> On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia wrote:
>>
>>> Can you check cfstats to see number of keys per node?
>>>
>>>
>>> On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar <
>>> suruchi.deod...@generalsentiment.com> wrote:
>>>
 Thanks for your replies. I wiped out my data from the cluster and also
 cleared the commitlog before restarting it with num_tokens=256. I then
 uploaded data using sstableloader.

 However, I am still not able to see a uniform distribution of data
 across nodes of the clusters.

 The output of the bin/nodetool -h localhost status command looks as
 follows. Some nodes have data as low as 1.12 MB while others have as much as
 912.57 MB.

 Datacenter: us-east
 ===
 Status=Up/Down
 |/ State=Normal/Leaving/Joining/Moving
 --  Address Load   Tokens  Owns (effective)  Host
 ID   Rack
 UN  10.238.133.174  856.66 MB  256 8.4%
 e41d8863-ce37-4d5c-a428-bfacea432a35  1a
 UN  10.238.133.97   439.02 MB  256 7.7%
 1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
 UN  10.151.86.146   1.05 GB    256     8.0%
 8952645d-4a27-4670-afb2-65061c205734  1a
 UN  10.138.10.9     912.57 MB  256     8.6%
 25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
 UN  10.87.87.240    70.85 MB   256     8.6%
 ea066827-83bc-458c-83e8-bd15b7fc783c  1b
 UN  10.93.5.157 60.56 MB   256 7.6%
 4ab9

Re: Rebalancing vnodes cluster

2013-09-19 Thread Nimi Wariboko Jr
We had originally started with 3 nodes w/ 32GB RAM and 768GB SSDs. I pretty 
much Google'd my way into setting up Cassandra and set it up using tokens 
because I was following an older docco. We were using Cassandra 1.2.5; I 
learned about vnodes later on and regretted waking up that morning.

1.) I'm not sure if the shuffle was successful. We started shuffling on Jun 7th and 
killed it on the 17th. We let it run over 2 weekends (10 days) and the node 
shuffle tool didn't report any meaningful progress. I explained this over IRC 
and was told `node shuffle` takes a really long time and you shouldn't use it. 
At the time our ring looked "mostly" balanced so we just killed it. We were 
migrating from a MongoDB cluster and didn't want to pay for 2 clusters.
2.) During the shuffle we had upped our RF to 2, did not do a repair, and lost 
1/3rd of our data. Fortunately we could just use the sstable tool to reload the 
data, as it wasn't really deleted.
3.) We ran cleanup a couple days later
4.) Cassandra 1.2.5

After all this, we converted another mongo node we had into Cassandra (same 
specs) for a cluster of size 4. Now after 4 months, one node (the subject of 
this thread) is growing faster than the others (which is leading to hot 
spotting as well). I guess this has to do with the unfinished shuffle? Are 
there any remedies for this? 

On Thursday, September 19, 2013 at 9:50 AM, Robert Coli wrote:

> On Wed, Sep 18, 2013 at 4:26 PM, Nimi Wariboko Jr wrote:
> > When I started with cassandra I had originally set it up to use tokens. I
> > then migrated to vnodes (using shuffle), but my cluster isn't balanced 
> > (http://imgur.com/73eNhJ3). 
> 
> Are you saying that (other than the imbalance that is the subject of this 
> thread) you were able to use "shuffle" successfully on a cluster with ~150gb 
> per node?
> 
> 1) How long did it take?
> 2) Did you experience any difficulties while doing so?
> 3) Have you run cleanup yet?
> 4) What version of Cassandra?
> 
> =Rob
>  
> 
> 
> 
> 
> 




Re: NetworkTopologyStrategy Error

2013-09-19 Thread sankalp kohli
Do any of your keyspaces still reference this DC?
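(One quick way to check, assuming cassandra-cli is available; it prints each keyspace's replication strategy and options:)

```
echo "show keyspaces;" | bin/cassandra-cli -h localhost
```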


On Thu, Sep 19, 2013 at 3:03 PM, Ashley Martens wrote:

> I tried to split my cluster and ran into this error, which I did not see
> in the tests I performed.
>
> ERROR [pool-1-thread-52165] 2013-09-19 21:48:08,262 Cassandra.java (line
> 3250) Internal error processing describe_ring
> java.lang.IllegalStateException: datacenter (DC103) has no more endpoints,
> (3) replicas still needed
> at
> org.apache.cassandra.locator.NetworkTopologyStrategy.calculateNaturalEndpoints(NetworkTopologyStrategy.java:118)
>  at
> org.apache.cassandra.locator.AbstractReplicationStrategy.getNaturalEndpoints(AbstractReplicationStrategy.java:101)
> at
> org.apache.cassandra.service.StorageService.constructRangeToEndpointMap(StorageService.java:604)
>  at
> org.apache.cassandra.service.StorageService.getRangeToAddressMap(StorageService.java:579)
> at
> org.apache.cassandra.service.StorageService.getRangeToEndpointMap(StorageService.java:553)
>  at
> org.apache.cassandra.thrift.CassandraServer.describe_ring(CassandraServer.java:584)
> at
> org.apache.cassandra.thrift.Cassandra$Processor$describe_ring.process(Cassandra.java:3246)
>  at
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2555)
> at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:206)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>  at java.lang.Thread.run(Thread.java:662)
>
>
> --
>
> Ashley  OpenPGP --> KeyID: 0x5B0D6ABB 
> 
>
>


Re: Cassandra column family using Composite Columns

2013-09-19 Thread Raihan Jamal
Can anyone help me on this?

Any help will be appreciated. Thanks.





*Raihan Jamal*


On Tue, Sep 17, 2013 at 4:44 PM, Raihan Jamal  wrote:

>  I am designing the Column Family for our use case in Cassandra. I am
> planning to go with a dynamic column structure.
>
> Below is my requirement per our use case-
>
> user-id   column1123  (Column1-Value  Column1-SchemaName  LMD)
>
>  For each user-id, we will be storing column1 and its value, and that value
> will always store these three things:
>
> (Column1-Value   Column1-SchemaName LMD)
>
>  In my example above, I have shown only one column, but there might be more
> columns, and those columns will also follow the same concept.
>
> Now I am not sure how to always store these three things at the column-value
> level. Should I use composite columns? If yes, I am not sure how to create a
> column family like this in Cassandra.
>
> Column1-value will be in binary, Column1-SchemaName will be String, LMD will 
> be DateType.
>
>  This is what I have so far-
>
> create column family USER_DATA
> with key_validation_class = 'UTF8Type'
> and comparator = 'UTF8Type'
> and default_validation_class = 'UTF8Type'
> and gc_grace = 86400
> and column_metadata = [ {column_name : 'lmd', validation_class : DateType}];
>
>  Can anyone help me in designing the column family for this? Thanks.
>
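For what it's worth, a possible CQL3 sketch of this model: the dynamic column name becomes a clustering column, and the three values become ordinary columns. All names below are illustrative, not prescriptive.

```
CREATE TABLE user_data (
    user_id     text,
    column_name text,       -- e.g. 'column1123'
    value       blob,       -- Column1-Value (binary)
    schema_name text,       -- Column1-SchemaName
    lmd         timestamp,  -- last-modified date
    PRIMARY KEY (user_id, column_name)
);
```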


Re: Cassandra 1.2.9 cluster with vnodes is heavily unbalanced.

2013-09-19 Thread Mohit Anchlia
Another thing I noticed is that you are using multiple racks, and that might
be a contributing factor. However, I am not sure.

Can you paste the output of nodetool cfstats and ring? Is it possible to
run the same test but keeping all the nodes in one rack?

I think you should open a JIRA if you are able to reproduce this.

On Thu, Sep 19, 2013 at 4:41 PM, Jayadev Jayaraman wrote:

> We ran nodetool repair on all nodes for all keyspaces / CFs, restarted
> Cassandra, and this is what we get for nodetool status:
>
> bin/nodetool -h localhost status
> Datacenter: us-east
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address Load   Tokens  Owns (effective)  Host ID
> Rack
> UN  10.238.133.174  885.36 MB  256     8.4%
>  e41d8863-ce37-4d5c-a428-bfacea432a35  1a
> UN  10.238.133.97   468.66 MB  256     7.7%
>  1bf42b5e-4aed-4b06-bdb3-65a78823b547  1a
> UN  10.151.86.146   1.08 GB    256     8.0%
>  8952645d-4a27-4670-afb2-65061c205734  1a
> UN  10.138.10.9     941.44 MB  256     8.6%
>  25ccea82-49d2-43d9-830c-b9c9cee026ec  1a
> UN  10.87.87.240    99.69 MB   256     8.6%
>  ea066827-83bc-458c-83e8-bd15b7fc783c  1b
> UN  10.93.5.157     87.44 MB   256     7.6%
>  4ab9111c-39b4-4d15-9401-359d9d853c16  1b
> UN  10.238.137.250  561.42 MB  256     7.8%
>  84301648-afff-4f06-aa0b-4be421e0d08f  1a
> UN  10.92.231.170   893.75 MB  256     9.3%
>  a18ce761-88a0-4407-bbd1-c867c4fecd1f  1b
> UN  10.138.2.20     31.89 MB   256     7.9%
>  a6d4672a-0915-4c64-ba47-9f190abbf951  1a
> UN  10.93.31.44     312.52 MB  256     7.8%
>  67a6c0a6-e89f-4f3e-b996-cdded1b94faf  1b
> UN  10.93.91.139    30.46 MB   256     8.1%
>  682dd848-7c7f-4ddb-a960-119cf6491aa1  1b
> UN  10.236.138.169  260.15 MB  256     9.1%
>  cbbf27b0-b53a-4530-bfdf-3764730b89d8  1a
> UN  10.137.7.90     38.45 MB   256     7.4%
>  17b79aa7-64fc-4e16-b96a-955b0aae9bb4  1a
> UN  10.93.77.166    867.15 MB  256     8.8%
>  9a821d1e-40e5-445d-b6b7-3cdd58bdb8cb  1b
> UN  10.120.249.140  863.98 MB  256     9.4%
>  e1fb69b0-8e66-4deb-9e72-f901d7a14e8a  1b
> UN  10.90.246.128   242.63 MB  256     8.4%
>  054911ec-969d-43d9-aea1-db445706e4d2  1b
> UN  10.123.95.248   171.51 MB  256     7.2%
>  a17deca1-9644-4520-9e62-ac66fc6fef60  1b
> UN  10.136.11.40    33.8 MB    256     8.5%
>  66be1173-b822-40b5-b650-cb38ae3c7a51  1a
> UN  10.87.90.42     38.01 MB   256     8.0%
>  dac0c6ea-56c6-44da-a4ec-6388f39ecba1  1b
> UN  10.87.75.147    579.29 MB  256     8.3%
>  ac060edf-dc48-44cf-a1b5-83c7a465f3c8  1b
> UN  10.151.49.88    151.06 MB  256     8.9%
>  57043573-ab1b-4e3c-8044-58376f7ce08f  1a
> UN  10.87.83.107    512.91 MB  256     8.3%
>  0019439b-9f8a-4965-91b8-7108bbb55593  1b
> UN  10.238.170.159  85.04 MB   256     9.4%
>  32ce322e-4f7c-46c7-a8ce-bd73cdd54684  1a
> UN  10.137.20.183   167.41 MB  256     8.4%
>  15951592-8ab2-473d-920a-da6e9d99507d  1a
>
> It doesn't seem to have changed by much. The loads are still highly
> uneven.
>
> As for the number of keys in each node's CFs: the largest node now
> has 5589120 keys for the column family that had 6527744 keys before (load
> is now 1.08 GB as compared to 1.05 GB before), while the smallest node now
> has 71808 keys as compared to 3840 keys before (load is now 31.89 MB as
> compared to 1.12 MB before).
>
>
> On Thu, Sep 19, 2013 at 5:18 PM, Mohit Anchlia wrote:
>
>> Can you run nodetool repair on all the nodes first and look at the keys?
>>
>>
>> On Thu, Sep 19, 2013 at 1:22 PM, Suruchi Deodhar <
>> suruchi.deod...@generalsentiment.com> wrote:
>>
>>> Yes, the key distribution does vary across the nodes. For example, on
>>> the node with the highest data, Number of Keys (estimate) is 6527744 for a
>>> particular column family, whereas for the same column family on the node
>>> with least data, Number of Keys (estimate) = 3840.
>>>
>>> Is there a way to control this distribution by setting some parameter of
>>> Cassandra?
>>>
>>> I am using the Murmur3 partitioner with NetworkTopologyStrategy.
>>>
>>> Thanks,
>>> Suruchi
>>>
>>>
>>>
>>> On Thu, Sep 19, 2013 at 3:59 PM, Mohit Anchlia 
>>> wrote:
>>>
 Can you check cfstats to see number of keys per node?


 On Thu, Sep 19, 2013 at 12:36 PM, Suruchi Deodhar <
 suruchi.deod...@generalsentiment.com> wrote:

> Thanks for your replies. I wiped out my data from the cluster and also
> cleared the commitlog before restarting it with num_tokens=256. I then
> uploaded data using sstableloader.
>
> However, I am still not able to see a uniform distribution of data
> across nodes of the clusters.
>
> The output of the bin/nodetool -h localhost status command looks as
> follows. Some nodes have data as low as 1.12 MB while others have as much as
> 912.57 MB.
>
> Datacenter: us-east
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address Load   Tokens  Owns (effective)  Host
>

Re: I don't understand shuffle progress

2013-09-19 Thread Juan Manuel Formoso
Thanks. I did this and finished rebuilding the new cluster in about 8
hours... a much better option than shuffle (you have to have the hardware to
duplicate your environment, though).


On Thu, Sep 19, 2013 at 7:21 PM, Jeremiah D Jordan <
jeremiah.jor...@gmail.com> wrote:

>
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/configuration/configVnodesProduction_t.html
>
> On Sep 18, 2013, at 9:41 AM, Chris Burroughs 
> wrote:
>
> >
> http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html
> >
> > This is a basic outline.
> >
> >
> > On 09/18/2013 10:32 AM, Juan Manuel Formoso wrote:
> >> I really like this idea. I can create a new cluster and have it
> replicate
> >> the old one, after it finishes I can remove the original.
> >>
> >> Any good resource that explains how to add a new datacenter to a live
> >> single dc cluster that anybody can recommend?
> >>
> >>
> >> On Wed, Sep 18, 2013 at 9:58 AM, Chris Burroughs
> >> wrote:
> >>
> >>> On 09/17/2013 09:41 PM, Paulo Motta wrote:
> >>>
>  So you're saying the only feasible way of enabling VNodes on an
> upgraded
>  C*
>  1.2 is by doing fork writes to a brand new cluster + bulk load of
> sstables
>  from the old cluster? Or is it possible to succeed on shuffling, even
> if
>  that means waiting some weeks for the shuffle to complete?
> 
> >>>
> >>> In a multi "DC" cluster situation you *should* be able to bring up a
> new
> >>> DC with vnodes, bootstrap it, and then decommission the old cluster.
> >>>
> >>
> >>
> >>
> >
>
>


-- 
*Juan Manuel Formoso*
Senior Geek
http://twitter.com/juanformoso
http://seniorgeek.com.ar
LLAP


BigTable-like Versioned Cells, Importing PostgreSQL Data

2013-09-19 Thread Keith Bogs
I've been playing with Cassandra and have a few questions that I've been
stuck on for a while, and Googling around didn't seem to help much:

1. What's the quickest way to import a bunch of data from PostgreSQL? I
have ~20M rows with mostly text (some long text with newlines, and blob
files). I tried exporting to CSV but had issues with newlines and escaped
characters. I also tried writing an ETL tool in Go, but it was taking a
long time to go through the records.
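(On question 1, cqlsh's COPY ... FROM may be worth a try, assuming PostgreSQL can be made to emit a properly quoted CSV; table and column names below are placeholders:)

```
COPY pages (id, title, body) FROM 'pages.csv' WITH HEADER = true;
```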

2. How would I create a "versioned" schema with CQL? AFAIK Cassandra's cell
versions are only for conflict resolution.

I envision a wide row, with timestamps and keys representing fields of data
through time. For example, for a CF of web page contents (inspired by
Google's Bigtable paper):

Key  1379649588:body 1379649522:body 1379649123:title
a.com/1.html """A"
a.com/2.html """B"
b.com/1.html """""C"

But CQL doesn't seem to support this. (Yes, I've read
http://www.datastax.com/dev/blog/does-cql-support-dynamic-columns-wide-rows.)
Once upon a time, it seems, Thrift and supercolumns might have worked?

I'd want to efficiently iterate through the "history" of a particular row
(in other words, read all the columns for a row) or efficiently iterate
through all the latest values for the CF (not reading the entire row, just
a column slice). In the previous example, I'd want to return the latest
'body' entries with timestamps for every page ("row"/"key") in the database.

Some have talked of having two CFs, one for versioned data and one for
current values?
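For what it's worth, one hedged sketch of the wide-row idea in CQL3 (all names illustrative): make the timestamp a clustering column, so each partition holds that row's full history in descending time order.

```
CREATE TABLE page_versions (
    url   text,
    ts    timestamp,
    title text,
    body  blob,
    PRIMARY KEY (url, ts)
) WITH CLUSTERING ORDER BY (ts DESC);

-- full history of one page:
SELECT ts, title, body FROM page_versions WHERE url = 'a.com/1.html';
-- latest version of one page:
SELECT ts, title, body FROM page_versions WHERE url = 'a.com/1.html' LIMIT 1;
```

Getting the latest value for every page in a single query is the part CQL doesn't do cheaply, which is presumably why the two-CF approach (one versioned, one current) comes up.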

I've been struggling because most of the documentation revolves around
Java. I'm most comfortable with Ruby and (increasingly) Go.

I'd appreciate any insights, would really like to get Cassandra going for
real. It's been such a pleasure to set up vs. HBase and whatnot.

Keith


select count query not working at cassandra 2.0.0

2013-09-19 Thread Katsutoshi
I would like to use a select count query.
It worked on Cassandra 1.2.9, but there is a situation in which it does not
work on Cassandra 2.0.0: if a row has been deleted, the select count query
seems to return the wrong value.
Did anything change in Cassandra 2.0.0, or have I made a mistake?

My test procedure is as follows:

### At Cassandra 1.2.9

1) create table, and insert two rows

```
cqlsh:test> CREATE TABLE count_hash_test (key text, value text, PRIMARY KEY
(key));
cqlsh:test> INSERT INTO count_hash_test (key, value) VALUES ('key1',
'value');
cqlsh:test> INSERT INTO count_hash_test (key, value) VALUES ('key2',
'value');
```

2) do a select count query, it returns 2 which is expected

```
cqlsh:test> SELECT * FROM count_hash_test;

 key  | value
--+---
 key1 | value
 key2 | value

cqlsh:test> SELECT COUNT(*) FROM count_hash_test;

 count
---
 2
```

3) delete one row

```
cqlsh:test> DELETE FROM count_hash_test WHERE key='key1';
```

4) do a select count query, it returns 1 which is expected

```
cqlsh:test> SELECT * FROM count_hash_test;

 key  | value
--+---
 key2 | value

cqlsh:test> SELECT COUNT(*) FROM count_hash_test;

 count
---
 1
```

### At Cassandra 2.0.0

1) create table, and insert two rows

```
cqlsh:test> CREATE TABLE count_hash_test (key text, value text, PRIMARY KEY
(key));
cqlsh:test> INSERT INTO count_hash_test (key, value) VALUES ('key1',
'value');
cqlsh:test> INSERT INTO count_hash_test (key, value) VALUES ('key2',
'value');
```

2) do a select count query, it returns 2 which is expected

```
cqlsh:test> SELECT * FROM count_hash_test;

 key  | value
--+---
 key1 | value
 key2 | value

cqlsh:test> SELECT COUNT(*) FROM count_hash_test;

 count
---
 2
```

3) delete one row

```
cqlsh:test> DELETE FROM count_hash_test WHERE key='key1';
```

4) do a select count query, but it returns 0 which is NOT expected

```
cqlsh:test> SELECT * FROM count_hash_test;

 key  | value
--+---
 key2 | value

cqlsh:test> SELECT COUNT(*) FROM count_hash_test;

 count
---
 0
```

Could anyone help me with this? Thanks.

Katsutoshi