how to determine RF on the fly ?

2013-07-10 Thread Илья Шипицин
Hello!

is there an easy way to determine the current RF, for instance via mx4j ?

Cheers,
Ilya Shipitsin


Re: High performance hardware with lot of data per node - Global learning about configuration

2013-07-10 Thread Alain RODRIGUEZ
This comment and some testing were enough for us.

"Generally, a value between 128 and 512 here coupled with a large key cache
size on CFs results in the best trade offs.  This value is not often
changed, however if you have many very small rows (many to an OS page),
then increasing this will often lower memory usage without an impact on
performance."

And indeed, I started using this config on only one node without seeing any
performance degradation. Mean read latency was around 4 ms on all the
servers, including this one. And the heap no longer filled up: heap usage now
goes from 2.5 GB to 5.5 GB, increasing slowly, instead of getting stuck
between 5.0 GB and 6.5 GB (out of an 8 GB heap).

All the graphs I could compare while the two configurations (128/512) ran on
different servers were almost identical, except for the heap.

So 512 was a lot better in our case.

Hope this helps; that was also the purpose of this thread.

Alain
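The heap saving Alain describes has a simple back-of-the-envelope form: the index sample keeps roughly one row key in the heap per index_interval keys per sstable, so going from 128 to 512 divides that count by about four. A rough sketch (illustrative only, ignoring per-sstable overheads):

```python
def sampled_index_entries(row_count, index_interval):
    """Approximate number of row keys held in the in-heap index sample:
    one sampled entry per index_interval row keys."""
    return row_count // index_interval

# Going from 128 to 512 shrinks the in-heap sample roughly 4x:
for interval in (128, 512):
    print(interval, sampled_index_entries(100_000_000, interval))
```

The trade-off is that a sparser sample means a longer scan within each sampled interval to locate a key, which is why the docs pair a large interval with a large key cache.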






2013/7/9 Mike Heffner 

> I'm curious, because we are experimenting with a very similar
> configuration: what basis did you use for expanding index_interval to
> that value? Do you have before-and-after numbers, or was it simply the
> reduction of heap-pressure warnings that you looked for?
>
> thanks,
>
> Mike
>
>
> On Tue, Jul 9, 2013 at 10:11 AM, Alain RODRIGUEZ wrote:
>
>> Hi,
>>
>> Using C*1.2.2.
>>
>> We recently replaced our 18 m1.xlarge (4 CPU, 15 GB RAM, 4 RAID-0 disks)
>> servers with 3 hi1.4xlarge (16 CPU, 60 GB RAM, 2 RAID-0 SSD) servers,
>> for about the same price.
>>
>> We tried it after reading some benchmarks published by Netflix.
>>
>> It is awesome and I recommend it to anyone who is using more than 18
>> xlarge servers or who can afford these high-cost / high-performance EC2
>> instances. SSD gives very good throughput with awesome latency.
>>
>> That said, we went from about 200 GB of data per server to about 1 TB.
>>
>> To alleviate memory pressure inside the heap I had to reduce the index
>> sampling. I changed the index_interval value from 128 to 512, with no
>> visible impact on latency, but a great improvement inside the heap which
>> doesn't complain about any pressure anymore.
>>
>> Is there some more tuning I could use, more tricks that could be useful
>> while using big servers, with a lot of data per node and relatively high
>> throughput ?
>>
>> SSD are at 20-40 % of their throughput capacity (according to OpsCenter),
>> CPU almost never reach a bigger load than 5 or 6 (with 16 CPU), 15 GB RAM
>> used out of 60GB.
>>
>> At this point I have kept my previous configuration, which is almost the
>> default one from the Datastax community AMI. There is a part of it, you can
>> consider that any property that is not in here is configured as default :
>>
>> cassandra.yaml
>>
>> key_cache_size_in_mb: (empty) - so default - 100MB (hit rate between 88 %
>> and 92 %, good enough ?)
>> row_cache_size_in_mb: 0 (not usable in our use case, a lot of different
>> and random reads)
>> flush_largest_memtables_at: 0.80
>> reduce_cache_sizes_at: 0.90
>>
>> concurrent_reads: 32 (I am thinking of increasing this to 64 or more, since
>> I have just a few servers to handle more concurrency)
>> concurrent_writes: 32 (I am thinking of increasing this to 64 or more too)
>> memtable_total_space_in_mb: 1024 (to avoid having a full heap; should I
>> use a bigger value, and why?)
>>
>> rpc_server_type: sync (I tried hsha and got the "ERROR 12:02:18,971 Read
>> an invalid frame size of 0. Are you using TFramedTransport on the client
>> side?" error). No idea how to fix this, and I use 5 different clients for
>> different purposes (Hector, Cassie, phpCassa, Astyanax, Helenus)...
>>
>> multithreaded_compaction: false (Should I try enabling this since I now
>> use SSD ?)
>> compaction_throughput_mb_per_sec: 16 (I will definitely up this to 32 or
>> even more)
>>
>> cross_node_timeout: true
>> endpoint_snitch: Ec2MultiRegionSnitch
>>
>> index_interval: 512
>>
>> cassandra-env.sh
>>
>> I am not sure about how to tune the heap, so I mainly use defaults
>>
>> MAX_HEAP_SIZE="8G"
>> HEAP_NEWSIZE="400M" (I tried higher values, and they produced longer
>> GC times: 1600 ms instead of < 200 ms now with 400M)
>>
>> -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> -XX:+CMSParallelRemarkEnabled
>> -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1
>> -XX:CMSInitiatingOccupancyFraction=70
>> -XX:+UseCMSInitiatingOccupancyOnly
>>
>> Does this configuration seem coherent? Right now, performance is
>> acceptable, with latency < 5 ms almost all the time. What can I do to
>> handle more data per node and keep this performance, or even improve it?
>>
>> I know this is a long message, but if you have any comment or insight, even
>> on part of it, don't hesitate to share it. I guess this kind of comment on
>> configuration is useful to the entire community.
>>
>> Alain
>>
>>
>
>
> --
>
>   Mike Heffner 
>   Librato, Inc.
>
>


Re: Purpose of BLOB datatype

2013-07-10 Thread Ollif Lee
fine, thanks.


On Tue, Jul 9, 2013 at 11:24 PM, Pavel Kirienko <
pavel.kirienko.l...@gmail.com> wrote:

> > Do you know any direct ways in CQL to handle BLOB, just like DataStax
> Java driver?
>
> Well, the CQL3 specification explicitly says that there is no way to encode
> a blob into a CQL request other than as a HEX string:
> http://cassandra.apache.org/doc/cql3/CQL.html#constants
>
>
>
> On Tue, Jul 9, 2013 at 6:40 PM, Ollif Lee  wrote:
>
>> Thank you for your patience. That is what I have expected.
>> PS. Do you know any direct ways in CQL to handle BLOB, just like DataStax
>> Java driver?
>>
>>
>> On Tue, Jul 9, 2013 at 4:53 PM, Sylvain Lebresne wrote:
>>
>>> > Pls explain why and how.
>>>
>>> Why and how what?
>>>
>>> Not encoding blobs into strings is the "preferred way" because that's
>>> obviously more efficient (in speed and space), since you don't do any
>>> encoding pass.
>>>
>>> As for how, "use prepared statements" was the "how". The exact lines of
>>> code needed for prepared statements will depend on the client driver you
>>> use, and you should check your driver's documentation.
>>>
>>> But, to give you an example, if you use the DataStax Java driver
>>> (https://github.com/datastax/java-driver), this might look something
>>> like:
>>>
>>>   PreparedStatement st = session.prepare("INSERT INTO foo(myKey, myBlob)
>>> VALUES (?, ?)");
>>>   String myKey = ...;
>>>   ByteBuffer myBlob = ...;
>>>   session.execute(st.bind(myKey, myBlob));
>>>
>>>
>>> --
>>> Sylvain
>>>
>>
>>
>
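To illustrate Pavel's point about the constants grammar: when a blob has to be inlined in a CQL string rather than bound through a prepared statement, it is written as a hex constant prefixed with 0x. A minimal sketch of the encoding step (the helper name and the table/column names are hypothetical):

```python
def blob_literal(data):
    """Render raw bytes as a CQL blob constant: a 0x-prefixed hex string,
    for the cases where a prepared statement is not available."""
    return "0x" + data.hex()

# Embedding a small blob directly in a CQL statement string:
cql = "INSERT INTO foo (myKey, myBlob) VALUES ('k1', %s)" % blob_literal(b"\x01\x02")
```

For anything but tiny values, the prepared-statement route Sylvain shows is preferable, since it avoids doubling the payload size in hex.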


manually removing sstable

2013-07-10 Thread Theo Hultberg
Hi,

I think I remember reading that if you have sstables that you know contain
only data whose TTL has expired, it's safe to remove them manually by
stopping C*, removing the *-Data.db files and then starting up C* again. Is
this correct?

We have a cluster where everything is written with a TTL, and sometimes C*
needs to compact over 100 GB of sstables whose contents we know have all
expired; we'd rather just manually get rid of those.

T#


Re: General doubts about bootstrap

2013-07-10 Thread Eric Stevens
> => Adding a new node between other nodes would avoid running move, but
the ring would be unbalanced, right? Would this imply in having a node
(with bigger range, 1/2 of the range while other 2 nodes have 1/2 each,
supposing 3 nodes) overloaded? I'm refering
http://wiki.apache.org/cassandra/Operations#Load_balancing
>
>
>>
>> Yes, if you're using a single vnode per server, or are running an older
version of Cassandra.  For lowest impact, doubling the size of your cluster
is recommended so that you can avoid doing moves.  Or if you're on
Cassandra 1.2+, you can use vnodes, and you should not typically need to
rebalance after bringing a new server online.


On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <
rodrigofelixdealme...@gmail.com> wrote:

> Thank you very much for your response. My comments on your email follow.
>
> Att.
>
> *Rodrigo Felix de Almeida*
> LSBD - Universidade Federal do Ceará
> Project Manager
> MBA, CSM, CSPO, SCJP
>
>
> On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli  wrote:
>
>> On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <
>> rodrigofelixdealme...@gmail.com> wrote:
>>
>>>
>>>- Is it normal to take about 9 minutes to add a new node? Follows
>>>the log generated by a script to add a new node.
>>>
>>> Sure.  => OK
>>
>>>
>>>- Is there a way to reduce the time to start cassandra?
>>>
>>> Not usually. => OK
>>
>>>
>>>- Sometimes the cleanup operation takes many minutes (about 10). Is this
>>>normal, since the amount of data is small (1.7 GB at maximum / seed)?
>>>
>>> Compaction is throttled, and cleanup is a type of compaction. Bootstrap
>> is also throttled via the streaming throttle. => OK
>>
>>>
>>>- Considering that I have two seeds in the beginning, their tokens
>>>are 0 and 85070591730234615865843651857942052864. When I add a new 
>>> machine,
>>>do I need to execute move and cleanup on both seeds? Nowadays, I'm 
>>> running
>>>cleanup on seed 0, move + cleanup on the other seed and neither move nor
>>>cleanup on the just added node. Is this OK?
>>>
>>> Only nodes which have "lost" ranges need to run cleanup. In general you
>> should add new nodes "between" other nodes such that "move" is not required
>> at all.
>>
>
> => Adding a new node between other nodes would avoid running move, but the
> ring would be unbalanced, right? Would this imply in having a node (with
> bigger range, 1/2 of the range while other 2 nodes have 1/2 each, supposing
> 3 nodes) overloaded? I'm refering
> http://wiki.apache.org/cassandra/Operations#Load_balancing
>
>>
>>>- What if I do not run cleanup in any existing node when adding or
>>>removing a node? Is the data that was not "cleaned up" still available 
>>> if I
>>>send a scan, for instance, and the scan range is still in the node but it
>>>wouldn't be there if I had run cleanup? Data would be gather from other
>>>node, ie. the one that properly has the range specified in the scan 
>>> query?
>>>
>>> If data for range [x] is on node [a] but node [a] is no longer
>> considered an endpoint for range [x], it will never receive a request to
>> serve range [x]. => OK
>>
>>>
>>>- After decommissioning a node, is it advisable to run cleanup in
>>>the remaining nodes? The consequences of not to run are the same of not 
>>> to
>>>run when adding a node?
>>>
>>> Cleanup is only for the node which lost a range. In decommission case,
>> no live nodes lost a range, only some nodes gained one. => OK
>>
>> =Rob
>>
>
>


Re: manually removing sstable

2013-07-10 Thread Mike Heffner
Theo,

We have several CFs in which we TTL all columns, set gc_grace=0, and never
overwrite or delete records. We manually remove sstables from disk
during a rolling C* restart process. You'll also want to remove all
associated index/filter files with the sstable -- so if foo-hf-123-Data.db is >
TTL, ensure you remove all foo-hf-123-*.

I recommend taking a snapshot beforehand to be safe. ;-)

Mike


On Wed, Jul 10, 2013 at 8:09 AM, Theo Hultberg  wrote:

> Hi,
>
> I think I remember reading that if you have sstables that you know contain
> only data that whose ttl has expired, it's safe to remove them manually by
> stopping c*, removing the *-Data.db files and then starting up c* again. is
> this correct?
>
> we have a cluster where everything is written with a ttl, and sometimes c*
> needs to compact over a 100 gb of sstables where we know ever has expired,
> and we'd rather just manually get rid of those.
>
> T#
>



-- 

  Mike Heffner 
  Librato, Inc.
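The component cleanup Mike describes (removing every foo-hf-123-* file once foo-hf-123-Data.db is known to be fully past its TTL) can be sketched as follows. This is a hypothetical helper, only to be run while the node is stopped and after taking a snapshot, as he advises:

```python
import glob
import os
import tempfile

def remove_sstable_components(data_file):
    """Given a path like .../foo-hf-123-Data.db, remove every on-disk
    component of that sstable generation (Index, Filter, Statistics, ...),
    not just the Data.db file."""
    suffix = "Data.db"
    if not data_file.endswith(suffix):
        raise ValueError("expected a *-Data.db path")
    prefix = data_file[:-len(suffix)]   # e.g. .../foo-hf-123-
    removed = sorted(glob.glob(prefix + "*"))
    for component in removed:
        os.remove(component)
    return removed

# Demo on a scratch directory standing in for a data directory:
data_dir = tempfile.mkdtemp()
for name in ("foo-hf-123-Data.db", "foo-hf-123-Index.db", "foo-hf-124-Data.db"):
    open(os.path.join(data_dir, name), "w").close()
removed = remove_sstable_components(os.path.join(data_dir, "foo-hf-123-Data.db"))
leftover = os.listdir(data_dir)
```

Note the trailing dash in the prefix: it keeps generation 123 from also matching a hypothetical generation 1234.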


Re: manually removing sstable

2013-07-10 Thread Robert Coli
On Wed, Jul 10, 2013 at 5:09 AM, Theo Hultberg  wrote:

> I think I remember reading that if you have sstables that you know contain
> only data that whose ttl has expired, it's safe to remove them manually by
> stopping c*, removing the *-Data.db files and then starting up c* again. is
> this correct?
>

Yes.


> we have a cluster where everything is written with a ttl, and sometimes c*
> needs to compact over a 100 gb of sstables where we know ever has expired,
> and we'd rather just manually get rid of those.
>

Have you considered TRUNCATE-oriented approaches to this problem? I believe
that TRUNCATE-oriented approaches (with proper handling/purging of snapshots)
have potential for cases where 100% of the data in a given time
window becomes worthless.

=Rob


Quorum reads and response time

2013-07-10 Thread Baskar Duraikannu
I have a 3 node cluster with RF=3.  All nodes are running. I have a table
with 39 rows and ~44,000 columns evenly spread across the 39 rows.

When I do a range slice query on this table at consistency level ONE, it
returns the data in about 600 ms.  I tried the same from all of the
3 nodes; no matter which node I ran it from, queries were answered in 600 ms
at consistency level ONE.

But when I run the same query at consistency level QUORUM, it
takes ~2.3 seconds.  It feels as if the nodes are being queried in sequence.


Is this normal?

--
Regards,
Baskar Duraikannu


Re: Quorum reads and response time

2013-07-10 Thread Baskar Duraikannu
Just adding a few other details to my question.

- We are using RandomPartitioner
- 256 virtual nodes configured.


On Wed, Jul 10, 2013 at 12:54 PM, Baskar Duraikannu <
baskar.duraikannu...@gmail.com> wrote:

> I have a 3 node cluster with RF=3.  All nodes are running. I have a table
> with 39 rows and ~44,000 columns evenly spread across 39 rows.
>
> When I do range slice query on this table with consistency of one, it
> returns the data back in about  ~600 ms.  I tried the same from all of the
> 3 nodes,no matter which node I ran it from, queries were answered in 600 ms
> for consistency level of one.
>
> But when I run the same query with consistency level as Quorum, it is
> taking ~2.3 seconds.  It feels as if querying of the nodes are in sequence.
>
>
> Is this normal?
>
> --
> Regards,
> Baskar Duraikannu
>
>


Re: Node tokens / data move

2013-07-10 Thread Baskar Duraikannu
I copied the sstables and then ran a repair. It worked. In hindsight, export
and import might have been much faster, given that we had very little data.

Thanks everyone.




On Tue, Jul 9, 2013 at 1:34 PM, sankalp kohli wrote:

> Hi Aaron,
>  Can he not specify all 256 tokens in the YAML of the new
> cluster and then copy sstables?
> I know it is a bit ugly but should work.
>
> Sankalp
>
>
> On Tue, Jul 9, 2013 at 3:19 AM, Baskar Duraikannu <
> baskar.duraikannu...@gmail.com> wrote:
>
>> Thanks Aaron
>>
>> On 7/9/13, aaron morton  wrote:
>> >> Can I just copy data files for the required keyspaces, create schema
>> >> manually and run repair?
>> > If you have something like RF 3 and 3 nodes then yes, you can copy the
>> data
>> > from one node in the source cluster to all nodes in the dest cluster
>> and use
>> > cleanup to remove the unneeded data. Because each node in the source
>> cluster
>> > has a full copy of the data.
>> >
>> > If that's not the case you cannot copy the data files, even if they
>> have the
>> > same number of nodes, because the nodes in the dest cluster will have
>> > different tokens. AFAIK you need to export the full data set from the
>> source
>> > DC and then import it into the dest system.
>> >
>> > The Bulk Load utility may be of help
>> > http://www.datastax.com/docs/1.2/references/bulkloader . You could
>> copy the
>> > SSTables from every node in the source system and bulk load them into
>> the
>> > dest system. That process will ensure rows are sent to nodes that are
>> > replicas.
>> >
>> > Cheers
>> >
>> > -
>> > Aaron Morton
>> > Freelance Cassandra Consultant
>> > New Zealand
>> >
>> > @aaronmorton
>> > http://www.thelastpickle.com
>> >
>> > On 9/07/2013, at 12:45 PM, Baskar Duraikannu
>> >  wrote:
>> >
>> >> We have two clusters used by two different groups with vnodes enabled.
>> Now
>> >> there is a need to move some of the keyspaces from cluster 1 to
>> cluster 2.
>> >>
>> >>
>> >> Can I just copy data files for the required keyspaces, create schema
>> >> manually and run repair?
>> >>
>> >> Anything else required?  Please help.
>> >> --
>> >> Thanks,
>> >> Baskar Duraikannu
>> >
>> >
>>
>
>


JMX Latency stats

2013-07-10 Thread Christopher Wirt
I was wondering if anyone knows the difference between the JMX latency stats
and could enlighten me.

 

We've been looking at the column-family-specific stats and see really lovely <
3ms 99th percentile stats for all our column families.

org.apache.cassandra.metrics:type=ColumnFamily,keyspace=mykeyspace,scope=myc
olumnfamily,name=ReadLatency

 

Now, when we look at the overall client request read latency stats, we see a
far more inconsistent, jagged 99th percentile flying between 5ms - 80ms.

org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency

 

 

Thanks

 

Chris

 

 

 

 



Re: manually removing sstable

2013-07-10 Thread Marcus Eriksson
yep that works, you need to remove all components of the sstable though,
not just -Data.db

and, in 2.0 there is this:
https://issues.apache.org/jira/browse/CASSANDRA-5228

/Marcus


On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg  wrote:

> Hi,
>
> I think I remember reading that if you have sstables that you know contain
> only data that whose ttl has expired, it's safe to remove them manually by
> stopping c*, removing the *-Data.db files and then starting up c* again. is
> this correct?
>
> we have a cluster where everything is written with a ttl, and sometimes c*
> needs to compact over a 100 gb of sstables where we know ever has expired,
> and we'd rather just manually get rid of those.
>
> T#
>


Re: JMX Latency stats

2013-07-10 Thread Nick Bailey
The column-family-specific numbers report latencies local to the
node: a write/read that has reached the correct replica and just needs
to hit memory/disk.

The non-column-family-specific numbers report latencies from the
coordinator: the latency from the time the coordinator receives a
write/read request, contacts the right replica(s), receives an internal
response and responds to the client.


On Wed, Jul 10, 2013 at 12:27 PM, Christopher Wirt wrote:

> I was wondering if anyone knows the difference between the JMX latency
> stats and could enlighten me.
>
> We've been looking at the column family specific stats and see really lovely
> < 3ms 99th percentile stats for all our families.
>
> org.apache.cassandra.metrics:type=ColumnFamily,keyspace=mykeyspace,scope=mycolumnfamily,name=ReadLatency
>
> Now, when we look at the overall client request read latency stats we see
> a far more inconsistent jagged 99th percentile flying between 5ms – 80ms
>
> org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency
>
> Thanks
>
> Chris
>


Cassandra performance tuning...

2013-07-10 Thread Tony Anecito
Hi All,

I am trying to compare Cassandra to a relational database. I am getting 
around 2-3 ms response time using the DataStax driver with a 64-bit Java 
1.7.0_05 JRE, while the other database answers in under 500 microseconds for 
the JDBC prepared-statement execute. One of the major differences is that 
Cassandra uses text for the default primary key in the column family, while 
for the SQL table I use int, which is faster. Can the primary column family 
key's data type be changed to an int? I also know Cassandra uses varint for 
IntegerType and am not sure that is what I need, but I will try it if I can 
change the "key" column to that. If I try Int32Type for the primary key, I 
suspect I will need to reload the data after that change.

I have looked at the default Java options in the Cassandra bat file and they 
seem a good starting point, but I am only just starting to tune now that I 
can get column family caching to work.


Regards,
-Tony


Re: General doubts about bootstrap

2013-07-10 Thread Rodrigo Felix
Currently, I'm using Cassandra 1.1.5, but I'm considering updating to
1.2.x in order to make use of vnodes.
Doubling the cluster size is not possible for me, because I want to measure
the response while adding (or removing) single nodes.
Thank you guys. This helped me a lot to better understand how Cassandra works.

Att.

*Rodrigo Felix de Almeida*
LSBD - Universidade Federal do Ceará
Project Manager
MBA, CSM, CSPO, SCJP


On Wed, Jul 10, 2013 at 11:11 AM, Eric Stevens  wrote:

> > => Adding a new node between other nodes would avoid running move, but
> the ring would be unbalanced, right? Would this imply in having a node
> (with bigger range, 1/2 of the range while other 2 nodes have 1/2 each,
> supposing 3 nodes) overloaded? I'm refering
> http://wiki.apache.org/cassandra/Operations#Load_balancing
>>
>>
>>>
>>> Yes, if you're using a single vnode per server, or are running an older
> version of Cassandra.  For lowest impact, doubling the size of your cluster
> is recommended so that you can avoid doing moves.  Or if you're on
> Cassandra 1.2+, you can use vnodes, and you should not typically need to
> rebalance after bringing a new server online.
>
>
> On Tue, Jul 9, 2013 at 9:31 PM, Rodrigo Felix <
> rodrigofelixdealme...@gmail.com> wrote:
>
>> Thank you very much for you response. Follows my comments about your
>> email.
>>
>> Att.
>>
>> *Rodrigo Felix de Almeida*
>> LSBD - Universidade Federal do Ceará
>> Project Manager
>> MBA, CSM, CSPO, SCJP
>>
>>
>> On Mon, Jul 8, 2013 at 6:05 PM, Robert Coli  wrote:
>>
>>> On Sat, Jul 6, 2013 at 1:50 PM, Rodrigo Felix <
>>> rodrigofelixdealme...@gmail.com> wrote:
>>>

- Is it normal to take about 9 minutes to add a new node? Follows
the log generated by a script to add a new node.

 Sure.  => OK
>>>

- Is there a way to reduce the time to start cassandra?

 Not usually. => OK
>>>

- Sometimes cleanup operation takes make minutes (about 10). Is
this normal since the amount of data is small (1.7gb at maximum / seed)?

 Compaction is throttled, and cleanup is a type of compaction. Bootstrap
>>> is also throttled via the streaming throttle. => OK
>>>

- Considering that I have two seeds in the beginning, their tokens
are 0 and 85070591730234615865843651857942052864. When I add a new 
 machine,
do I need to execute move and cleanup on both seeds? Nowadays, I'm 
 running
cleanup on seed 0, move + cleanup on the other seed and neither move nor
cleanup on the just added node. Is this OK?

 Only nodes which have "lost" ranges need to run cleanup. In general you
>>> should add new nodes "between" other nodes such that "move" is not required
>>> at all.
>>>
>>
>> => Adding a new node between other nodes would avoid running move, but
>> the ring would be unbalanced, right? Would this imply in having a node
>> (with bigger range, 1/2 of the range while other 2 nodes have 1/2 each,
>> supposing 3 nodes) overloaded? I'm refering
>> http://wiki.apache.org/cassandra/Operations#Load_balancing
>>
>>>
- What if I do not run cleanup in any existing node when adding or
removing a node? Is the data that was not "cleaned up" still available 
 if I
send a scan, for instance, and the scan range is still in the node but 
 it
wouldn't be there if I had run cleanup? Data would be gather from other
node, ie. the one that properly has the range specified in the scan 
 query?

 If data for range [x] is on node [a] but node [a] is no longer
>>> considered an endpoint for range [x], it will never receive a request to
>>> serve range [x]. => OK
>>>

- After decommissioning a node, is it advisable to run cleanup in
the remaining nodes? The consequences of not to run are the same of not 
 to
run when adding a node?

 Cleanup is only for the node which lost a range. In decommission case,
>>> no live nodes lost a range, only some nodes gained one. => OK
>>>
>>> =Rob
>>>
>>
>>
>


Re: how to determine RF on the fly ?

2013-07-10 Thread Robert Coli
On Wed, Jul 10, 2013 at 12:58 AM, Илья Шипицин  wrote:

> is there easy way to determine current RF, for instance, via mx4j ?
>

The methods which show keyspace or schema (from CLI or cqlsh) show the
replication factor, as the replication factor is a keyspace property.

I don't believe it's available via JMX, but there's no reason it couldn't
be...

=Rob
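As Rob says, RF is a keyspace property, so one programmatic fallback (absent a dedicated JMX bean) is to read the system schema and parse the replication options. A sketch of just the parsing step, assuming the 1.2-era layout where system.schema_keyspaces stores strategy_options as a JSON string (verify the table and column names against your version; NetworkTopologyStrategy keyspaces instead carry one entry per DC):

```python
import json

def replication_factor(strategy_options):
    """Extract RF from a SimpleStrategy strategy_options value such as the
    JSON string stored in system.schema_keyspaces."""
    options = json.loads(strategy_options)
    return int(options["replication_factor"])

print(replication_factor('{"replication_factor":"3"}'))
```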


Re: Quorum reads and response time

2013-07-10 Thread sankalp kohli
The coordinator node has to merge the results from 2 nodes, and the requests
are done in parallel. I have seen a lot of GC pressure with range queries
because of tombstones.
Check the logs to see whether a lot of GC is going on, and also try enabling
GC logging.


On Wed, Jul 10, 2013 at 9:57 AM, Baskar Duraikannu <
baskar.duraikannu...@gmail.com> wrote:

> Just adding few other details to my question.
>
> - We are using RandomPartitioner
> - 256 virtual nodes configured.
>
>
> On Wed, Jul 10, 2013 at 12:54 PM, Baskar Duraikannu <
> baskar.duraikannu...@gmail.com> wrote:
>
>> I have a 3 node cluster with RF=3.  All nodes are running. I have a table
>> with 39 rows and ~44,000 columns evenly spread across 39 rows.
>>
>> When I do range slice query on this table with consistency of one, it
>> returns the data back in about  ~600 ms.  I tried the same from all of the
>> 3 nodes,no matter which node I ran it from, queries were answered in 600 ms
>> for consistency level of one.
>>
>> But when I run the same query with consistency level as Quorum, it is
>> taking ~2.3 seconds.  It feels as if querying of the nodes are in sequence.
>>
>>
>> Is this normal?
>>
>> --
>> Regards,
>> Baskar Duraikannu
>>
>>
>
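The merge step sankalp mentions can be pictured as a per-column, last-write-wins reconciliation across the replica responses: for each column name, the version with the highest timestamp wins. A toy sketch of that idea (not Cassandra's actual code):

```python
def merge_replica_responses(responses):
    """Reconcile column versions returned by several replicas: for each
    column name, keep the (value, timestamp) pair with the highest timestamp."""
    merged = {}
    for response in responses:
        for name, (value, timestamp) in response.items():
            if name not in merged or timestamp > merged[name][1]:
                merged[name] = (value, timestamp)
    return merged
```

At QUORUM the coordinator must wait for the slowest of the required replicas before this merge can complete, so tail latency tracks the worst replica rather than the best one, even though the requests go out in parallel.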


node tool ring displays 33.33% owns on 3 node cluster with replication

2013-07-10 Thread Jason Tyler
Hello,

I recently upgraded cassandra from 1.1.9 to 1.2.6 on a three node cluster with 
{replication_factor : 3}.

When I run nodetool's ring, I see 'Owns' now reports 33.33%.  Previously it 
reported 100.00% on each node.  The following snapshots are from two different 
clusters, so please ignore the Load diffs. I did verify {replication_factor : 
3} on both clusters.


1.1.9-xobni1 'nodetool -h 127.0.0.1 -p 8080 ring':

Address         DC  Rack  Status  State   Load       Effective-Ownership  Token
                                                                          170141183460469231731687303715884105728
Xxx.xx.xx.00    16  96    Up      Normal  225.03 GB  100.00%              56713727820156410577229101238628035242
Xxx.xx.xx.01    16  97    Up      Normal  226.43 GB  100.00%              113427455640312821154458202477256070484
Xxx.xx.xx.02    16  97    Up      Normal  231.76 GB  100.00%              170141183460469231731687303715884105728



1.2.6-xobni1 'nodetool -h 127.0.0.1 -p 8080 ring':

Address        Rack  Status  State   Load       Owns    Token
                                                        170141183460469231731687303715884105728
Xxx.xx.xx.00   97    Up      Normal  453.94 GB  33.33%  56713727820156410577229101238628035242
Xxx.xx.xx.01   97    Up      Normal  565.87 GB  33.33%  113427455640312821154458202477256070484
Xxx.xx.xx.02   96    Up      Normal  523.53 GB  33.33%  170141183460469231731687303715884105728



Is this simply a display issue, or have I lost replication?

Thanks for any info.


Cheers,

~Jason


Re: node tool ring displays 33.33% owns on 3 node cluster with replication

2013-07-10 Thread Robert Coli
On Wed, Jul 10, 2013 at 4:04 PM, Jason Tyler  wrote:

>  Is this simply a display issue, or have I lost replication?
>

Almost certainly just a display issue. Do "nodetool -h localhost
getendpoints   0", which will tell you the
endpoints for the non-transformed key "0." It should give you 3 endpoints.
You could also do this test with a known existing key and then go to those
nodes and verify that they have that data on disk via sstable2json.

(FWIW, it is an odd display issue/bug if it is one, because it has reverted
to pre-1.1 behavior...)

=Rob
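What getendpoints computes is essentially a walk of the ring: hash the key to a token, then take the first RF distinct nodes at or after that token. A much-simplified sketch for a single-token, SimpleStrategy, RandomPartitioner-style ring (illustrative only; the real token computation and replication strategies differ in detail):

```python
import bisect
import hashlib

def endpoints_for_key(key, ring, rf):
    """ring: list of (token, node) pairs sorted by token. Returns the rf
    distinct nodes owning the key, walking clockwise from the key's token."""
    tokens = [token for token, _ in ring]
    # RandomPartitioner-style token: md5 of the key, folded into [0, 2**127)
    key_token = int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 127)
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    endpoints = []
    i = start
    while len(endpoints) < min(rf, len(ring)):
        node = ring[i % len(ring)][1]
        if node not in endpoints:
            endpoints.append(node)
        i += 1
    return endpoints
```

With 3 nodes and RF=3, every key maps to all three nodes, which is why each node effectively owns everything even when 'Owns' prints 33.33%: the column reports range ownership, not replica placement.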


Re: Working with libcql

2013-07-10 Thread aaron morton
The highlighted line will read all the rows from the system table that lists 
the keyspaces in the cluster. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 9/07/2013, at 9:46 PM, Shubham Mittal  wrote:

> yeah I tried that and below is the output I get 
> 
> LOG: resolving remote host localhost:9160
> LOG: resolved remote host, attempting to connect
> LOG: connection successful to remote host
> LOG: sending message: 0x0105 {version: 0x01, flags: 0x00, stream: 
> 0x00, opcode: 0x05, length: 0} OPTIONS
> LOG: wrote to socket 8 bytes
> LOG: error reading header End of file
> 
> and I checked all the keyspaces in my cluster; it changed nothing in the 
> cluster.
> 
> I couldn't understand the code much. What is this code supposed to do anyway?
> 
> 
> On Tue, Jul 9, 2013 at 4:20 AM, aaron morton  wrote:
> Did you see the demo app ? 
> Seems to have a few examples of reading data. 
> 
> https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 9/07/2013, at 1:14 AM, Shubham Mittal  wrote:
> 
>> Hi,
>> 
>> I found out that there exist a C++ client libcql for cassandra but its 
>> github repository just provides the example on how to connect to cassandra. 
>> Is there anyone who has written some code using libcql to read and write 
>> data to a cassandra DB, kindly share it.
>> 
>> Thanks
> 
> 



temporarily running a cassandra side by side in production

2013-07-10 Thread Hiller, Dean
We have a 12 node production cluster and a 4 node QA cluster.  We are planning 
to run a second Cassandra instance side by side in production while we 
map/reduce from the existing Cassandra into the new instance.  We intend to do 
something like this:

Modify all ports in cassandra.yaml and the JMX port in cassandra-env.sh: 7000, 
7001, 9160, 9042, and 7199 in cassandra-env.

Can I assume that a Cassandra instance will not only bind to the new ports when 
I change these values, but will also talk to the other Cassandra nodes on those 
same ports, so that this Cassandra instance is completely independent of my 
other Cassandra instance?

Are there other gotchas that I have to be aware of?

(we are refactoring our model into a new, faster model that we tested in QA 
with live data, as well as moving from RandomPartitioner to Murmur3Partitioner)

Thanks,
Dean




Re: Decommissioned nodes not leaving and Hinted Handoff flood

2013-07-10 Thread aaron morton
Thanks for sharing, here is some more information…

> 1 - At first, one of my node came down 5 min and when it came back it get 
> flooded by Hinted Handoff so hard that it could not handle the real time 
> queries properly. I haven't find a way to prioritize app queries rather than 
> Hinted Handoff.
You can disable hint delivery with nodetool pausehandoff or reduce the hint 
throughput 
https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L50
 
> 2 - Nodes keep hints for a node that has been removed.
The hints are stored with a TTL equal to the gc_grace_seconds of the CF at the 
time the hint is written, so they will eventually be purged by compaction. 

You can also delete the hints using the Hinted Handoff bean 
https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/db/HintedHandOffManagerMBean.java#L30

> 3 - Nodes with 500MB to 3GB hints stored for a removed node can't be 
> decommissioned, they stuck after streaming their data.
The hint keyspace is defined using LocalStrategy and so is not replicated. 
Hints should not be involved in streaming. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 12:47 AM, Alain RODRIGUEZ  wrote:

> Hi,
> 
> C*1.2.2.
> 
> I have removed 4 nodes with "nodetool decommission". 2 of them left with 
> no issue, while the 2 other nodes remained "leaving" even after streaming 
> their data.
> 
> The only thing specific to these 2 nodes is that they had a lot of pending 
> hints. The hints were for a node that couldn't come back and that I removed 
> earlier (the heavy load induced by Hinted Handoff while coming back caused a 
> lot of latency in our app; since the node didn't manage to come back after 
> 10 minutes, I removed it).
> 
> So I faced 3 bugs (or problems):
> 
> 1 - At first, one of my nodes went down for 5 min and when it came back it got 
> flooded by Hinted Handoff so hard that it could not handle the real time 
> queries properly. I haven't found a way to prioritize app queries over 
> Hinted Handoff.
> 2 - Nodes keep hints for a node that has been removed.
> 3 - Nodes with 500MB to 3GB of hints stored for a removed node can't be 
> decommissioned; they get stuck after streaming their data.
> 
> 
> As solutions for this 3 issues I did the following:
> 
> Solution to 1 - I removed the down node (nodetool removenode)
> Solution to 2 - Stop the node and remove the system hints
> Solution to 3 - Stop the node and use removenode instead of decommission
> 
> Now I have no more issues, yet I felt I had to report this. Maybe my 
> experience can help users get out of tricky situations and committers 
> detect some issues, especially about hinted handoff.
> 
> Alain
> 
> 



Re: Trying to write when at cassandra capacity

2013-07-10 Thread aaron morton
> It hits an OOM.
To add a little more colour because I stepped through this with someone the 
other day.

When memtables are not removed from the memtable flush queue (because they have 
not been written to disk) the queue will fill up. When this happens the flush 
process will block trying to add to the queue, and will hold the internal 
switch lock used to synchronise around the commit log. This will prevent write 
threads from progressing. All the while writes will continue to be delivered to 
the node and the Mutation thread pool queue will fill. 

All of this results in extreme memory pressure; the JVM will spend a lot of 
time running GC to try and free some space. While all the GC is going on, 
chances are the other nodes will see the failing node as flapping as it fails 
to keep up with gossip. None of this will work and eventually the JVM will 
raise an OOM error that is normally trapped and results in the node trying to 
shut down. During the shutdown process it will try to disable the rpc / native 
transports and gossip.

It's a simple thing to test and a useful example to walk through (by looking at 
the logs) with an Ops team if they are just starting out. 
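One place this backlog shows up is in `nodetool tpstats`. The sketch below parses a made-up sample of its output (pool names match 1.2; the figures are invented, and the real header reads "All time blocked"):

```shell
# A made-up sample of `nodetool tpstats` output piped through awk to flag
# thread pools with a large pending queue or any all-time-blocked count --
# MutationStage and FlushWriter are the pools that back up in this scenario.
sample='Pool Name        Active  Pending  Completed  Blocked  AllTimeBlocked
MutationStage    32      18204    9357812    0        0
FlushWriter      1       5        2210       1        43'

report=$(echo "$sample" | awk 'NR > 1 && ($3 > 1000 || $6 > 0) {
  print $1 " looks backed up (pending=" $3 ", all-time-blocked=" $6 ")"
}')
echo "$report"
```

On a live node you would pipe the real `nodetool tpstats` output through the same filter; the thresholds here are arbitrary examples.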

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 5:36 AM, Robert Coli  wrote:

> On Mon, Jul 8, 2013 at 5:58 PM, Faraaz Sareshwala  
> wrote:
> What does cassandra do when it is at its data capacity (disk drives and 
> memtable
> is full) and writes continue to pour in? My intuition says that cassandra 
> won't
> be able to handle the new writes (they will either get silently dropped or
> cassandra will hit an OOM -- does anyone know which one?). The sstables on 
> disk
> won't magically disappear so cassandra won't be able to service the write
> requests.
> 
> It hits an OOM.
> 
> =Rob
>  



Re: Logging Cassandra Reads/Writes

2013-07-10 Thread aaron morton
Some info on request tracing 
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

> 1) Is it possible to log which node provides the real data in a read
> operation? 
It's available at the DEBUG level of logging. You probably just want to enable 
it on the org.apache.cassandra.db.StorageProxy class, see 
log4j-server.properties for info
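A sketch of what that could look like in log4j-server.properties (scoping DEBUG to the one class mentioned above keeps the log noise down):

```properties
# Enable DEBUG only for the read/write coordination path
log4j.logger.org.apache.cassandra.db.StorageProxy=DEBUG
```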

> 2) Also, is it possible to log the different delays involved in each
> operation-- for example, 0.1 seconds to get digests from all nodes, 1 second
> to transfer data, etc.? 
Not applicable: as you've seen, we send the request to all replicas at the same 
time. There is more logging that will show when the responses are processed; try 
turning DEBUG logging on for a small 3 node cluster and send one request. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 8:58 AM, Mohit Anchlia  wrote:

> There is a new tracing feature in Cassandra 1.2 that might help you with this.
> 
> On Tue, Jul 9, 2013 at 1:31 PM, Blair Zajac  wrote:
> No idea on the logging, I'm pretty new to Cassandra.
> 
> Regards,
> Blair
> 
> On Jul 9, 2013, at 12:50 PM, hajjat  wrote:
> 
> > Blair, thanks for the clarification! My friend actually just told me the
> > same..
> >
> > Any idea on how to do logging??
> >
> > Thanks!
> >
> >
> >
> > --
> > View this message in context: 
> > http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Logging-Cassandra-Reads-Writes-tp7588893p7588896.html
> > Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> > Nabble.com.
> >
> 



Re: Working with libcql

2013-07-10 Thread Shubham Mittal
So, if I want to create a keyspace, what do I need to change in that file?


On Thu, Jul 11, 2013 at 5:04 AM, aaron morton wrote:

> The highlighted line will read all the rows from the system table that
> lists the keyspaces in the cluster.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9/07/2013, at 9:46 PM, Shubham Mittal  wrote:
>
> yeah I tried that and below is the output I get
>
> LOG: resolving remote host localhost:9160
> LOG: resolved remote host, attempting to connect
> LOG: connection successful to remote host
> LOG: sending message: 0x0105 {version: 0x01, flags: 0x00,
> stream: 0x00, opcode: 0x05, length: 0} OPTIONS
> LOG: wrote to socket 8 bytes
> LOG: error reading header End of file
>
> and I checked all the keyspaces in my cluster, it changes nothing in the
> cluster.
>
> I couldn't understand the code much. What is this code supposed to do
> anyways?
>
>
> On Tue, Jul 9, 2013 at 4:20 AM, aaron morton wrote:
>
>> Did you see the demo app ?
>> Seems to have a few examples of reading data.
>>
>> https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85
>>
>> Cheers
>>
>>-
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 9/07/2013, at 1:14 AM, Shubham Mittal  wrote:
>>
>> Hi,
>>
>> I found out that there exists a C++ client, libcql, for Cassandra, but its
>> GitHub repository just provides an example of how to connect to Cassandra.
>> If anyone has written some code using libcql to read and write data to a
>> Cassandra DB, kindly share it.
>>
>> Thanks
>>
>>
>>
>
>


unsubscribe

2013-07-10 Thread crigano
user@cassandra.apache.org
 
 Subject:  unsubscribe  
 



Re: Working with libcql

2013-07-10 Thread Baskar Duraikannu
You can replace the "USE" statement with a CREATE statement, and then change 
use_callback to do whatever you want next.  
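As a sketch, the query string passed to the connection would then be something like this (the keyspace name and replication options are illustrative; CQL 3 syntax as of Cassandra 1.2):

```cql
CREATE KEYSPACE demo
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
```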

--
Thanks,
Baskar Duraikannu

Shubham Mittal  wrote:

>So, if I want to create a keyspace, what do I need to change in that file?
>
>On Thu, Jul 11, 2013 at 5:04 AM, aaron morton  wrote:
>
>The highlighted line will read all the rows from the system table that lists 
>the keyspaces in the cluster. 
>
>Cheers
>
>-
>Aaron Morton
>Freelance Cassandra Consultant
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 9/07/2013, at 9:46 PM, Shubham Mittal  wrote:
>
>yeah I tried that and below is the output I get 
>
>LOG: resolving remote host localhost:9160
>LOG: resolved remote host, attempting to connect
>LOG: connection successful to remote host
>LOG: sending message: 0x0105 {version: 0x01, flags: 0x00, stream: 
>0x00, opcode: 0x05, length: 0} OPTIONS
>LOG: wrote to socket 8 bytes
>LOG: error reading header End of file
>
>and I checked all the keyspaces in my cluster, it changes nothing in the 
>cluster.
>
>I couldn't understand the code much. What is this code supposed to do anyways?
>
>On Tue, Jul 9, 2013 at 4:20 AM, aaron morton  wrote:
>
>Did you see the demo app ? 
>Seems to have a few examples of reading data. 
>
>https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85
>
>Cheers
>
>-
>Aaron Morton
>Freelance Cassandra Consultant
>New Zealand
>
>@aaronmorton
>http://www.thelastpickle.com
>
>On 9/07/2013, at 1:14 AM, Shubham Mittal  wrote:
>
>Hi,
>
>I found out that there exist a C++ client libcql for cassandra but its github 
>repository just provides the example on how to connect to cassandra. Is there 
>anyone who has written some code using libcql to read and write data to a 
>cassandra DB, kindly share it.
>
>Thanks


Re: Working with libcql

2013-07-10 Thread Faraaz Sareshwala
On that note, is anyone using this library in production? Can anyone speak to
its stability and readiness for use? I only noticed it on the list of cassandra
clients a few days ago and haven't heard much talk about it elsewhere.

Faraaz

On Wed, Jul 10, 2013 at 05:55:55PM -0700, Baskar Duraikannu wrote:
> You can replace "USE" statement with create statement and then change
> use_callback with whatever you want to do next.
> 
> --
> Thanks,
> Baskar Duraikannu
> 
> Shubham Mittal  wrote:
> 
> So, if I want to create a keyspace, what do I need to change in that file?
> 
> 
> On Thu, Jul 11, 2013 at 5:04 AM, aaron morton  wrote:
> 
> The highlighted line will read all the rows from the system table that
> lists the keyspaces in the cluster. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9/07/2013, at 9:46 PM, Shubham Mittal  wrote:
> 
> 
> yeah I tried that and below is the output I get 
> 
> LOG: resolving remote host localhost:9160
> LOG: resolved remote host, attempting to connect
> LOG: connection successful to remote host
> LOG: sending message: 0x0105 {version: 0x01, flags: 0x00,
> stream: 0x00, opcode: 0x05, length: 0} OPTIONS
> LOG: wrote to socket 8 bytes
> LOG: error reading header End of file
> 
> and I checked all the keyspaces in my cluster, it changes nothing in
> the cluster.
> 
> I couldn't understand the code much. What is this code supposed to do
> anyways?
> 
> 
> On Tue, Jul 9, 2013 at 4:20 AM, aaron morton 
> wrote:
> 
> Did you see the demo app ? 
> Seems to have a few examples of reading data. 
> 
> https://github.com/mstump/libcql/blob/master/demo/main.cpp#L85
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
>
> On 9/07/2013, at 1:14 AM, Shubham Mittal 
> wrote:
> 
> 
> Hi,
> 
> I found out that there exist a C++ client libcql for cassandra
> but its github repository just provides the example on how to
> connect to cassandra. Is there anyone who has written some 
> code
> using libcql to read and write data to a cassandra DB, kindly
> share it.
> 
> Thanks
> 
> 
> 
> 
> 
> 
> 
> 


Re: manually removing sstable

2013-07-10 Thread Theo Hultberg
thanks a lot. I can confirm that it solved our problem too.

looks like the C* 2.0 feature is perfect for us.

T#


On Wed, Jul 10, 2013 at 7:28 PM, Marcus Eriksson  wrote:

> yep that works, you need to remove all components of the sstable though,
> not just -Data.db
>
> and, in 2.0 there is this:
> https://issues.apache.org/jira/browse/CASSANDRA-5228
>
> /Marcus
>
>
> On Wed, Jul 10, 2013 at 2:09 PM, Theo Hultberg  wrote:
>
>> Hi,
>>
>> I think I remember reading that if you have sstables that you know
>> contain only data whose TTL has expired, it's safe to remove them
>> manually by stopping C*, removing the *-Data.db files and then starting up
>> C* again. Is this correct?
>>
>> We have a cluster where everything is written with a TTL, and sometimes
>> C* needs to compact over 100 GB of sstables where we know everything has
>> expired, and we'd rather just manually get rid of those.
>>
>> T#
>>
>
>
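The removal can be sketched on dummy files (the paths, table names, and component list below are illustrative; the exact set of components varies by Cassandra version — the point is to glob the whole generation, with the node stopped, rather than deleting only -Data.db):

```shell
# Stand-in data directory with two sstable "generations"; file names are
# made up to mimic 1.2-era component naming.
dir=$(mktemp -d)
for comp in Data.db Index.db Filter.db Statistics.db Summary.db TOC.txt; do
  touch "$dir/ks-cf-ic-5-$comp"   # generation 5: known fully expired
  touch "$dir/ks-cf-ic-6-$comp"   # generation 6: still live
done

# With the node stopped, remove EVERY component of the expired generation.
rm "$dir"/ks-cf-ic-5-*

ls "$dir"   # only generation-6 components remain
```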


RE: unsubscribe

2013-07-10 Thread Romain HARDOUIN
http://wiki.apache.org/cassandra/FAQ#unsubscribe

 wrote on 11/07/2013 02:25:28:

> From: 
> To: user@cassandra.apache.org, 
> Date: 11/07/2013 02:26
> Subject: unsubscribe
> 
> user@cassandra.apache.org  
> 
>  Subject:  unsubscribe 
>