Re: Cassandra: Key-value or Column?

2017-03-28 Thread Andrey Ilinykh
Yes, Cassandra is a key-value store. You can think of it as wide-row storage: (row key, column key) -> value. On Tue, Mar 28, 2017 at 10:19 AM, Les Hartzman wrote: > I was doing some research on different NoSQL DBs and found this article at > Datastax, https://academy.datastax.com/planet-cassandra
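
The (row key, column key) -> value view from the reply above can be pictured with a small in-memory sketch. This is only a toy model for intuition, not how Cassandra stores data on disk; the class name `WideRowStore` and the sample keys are made up for illustration.

```python
from collections import defaultdict


class WideRowStore:
    """Toy model: each row key maps to its own dict of column key -> value."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column_key, value):
        self._rows[row_key][column_key] = value

    def get(self, row_key, column_key):
        return self._rows[row_key][column_key]

    def row(self, row_key):
        # Return the columns of one row, sorted by column key.
        return sorted(self._rows[row_key].items())


store = WideRowStore()
store.put("user:42", "name", "Les")
store.put("user:42", "email", "les@example.com")
print(store.get("user:42", "name"))
print(store.row("user:42"))
```

A row can hold any number of columns, which is what "wide row" refers to.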

Re: Splitting Cassandra Cluster between AWS availability zones

2017-03-07 Thread Andrey Ilinykh
I'd recommend three availability zones. In this case, if you lose one AZ you still have a quorum (assuming a replication factor of 3). Andrey On Tue, Mar 7, 2017 at 9:05 AM, Ney, Richard wrote: > We’ve collapsed our 2 DC – 3 node Cassandra clusters into a single 6 node > Cassandra cluster split be
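
The quorum arithmetic behind this recommendation can be checked with a short sketch. It assumes replicas are spread as evenly as possible across zones (roughly what a rack-aware placement aims for); the function names are illustrative only.

```python
def quorum(rf):
    """Number of replicas needed for a quorum at replication factor rf."""
    return rf // 2 + 1


def survives_zone_loss(rf, zones):
    """True if a quorum is still reachable after losing the fullest zone.

    Assumption: replicas are distributed as evenly as possible over zones.
    """
    worst_zone_replicas = -(-rf // zones)  # ceiling division
    return rf - worst_zone_replicas >= quorum(rf)


print(quorum(3))                 # 2
print(survives_zone_loss(3, 3))  # True: 2 of 3 replicas remain
print(survives_zone_loss(3, 2))  # False: the lost zone may hold 2 replicas
```

With two zones and a replication factor of 3, one zone necessarily holds two replicas of some data, so losing that zone breaks quorum.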

Re: Left Cassandra mailing list

2016-08-01 Thread Andrey Ilinykh
To remove your address from the list, send a message to: On Mon, Aug 1, 2016 at 11:29 PM, Mohammad Kermani <98kerm...@gmail.com> wrote: > How can I leave Cassandra mailing list? > > I get some emails every day and currently I do not have time for it > >

Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
est timestamp) wins and will be returned to the client." > > What do you think? > > " > > On Fri, Sep 4, 2015 at 6:41 PM, Andrey Ilinykh wrote: > >> Coordinator doesn't generate timestamp, it is generated by client. >> >> On Fri, Sep 4, 2015

Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
p 4, 2015 at 6:29 PM, Andrey Ilinykh wrote: > >> Your application. >> >> On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi < >> ibrahimsaba...@gmail.com> wrote: >> >>> Dear folks, >>> >>> When we hear about the notion of Last-Wr

Re: who does generate timestamp during the write?

2015-09-04 Thread Andrey Ilinykh
Your application. On Fri, Sep 4, 2015 at 10:26 AM, ibrahim El-sanosi wrote: > Dear folks, > > When we hear about the notion of Last-Write-Wins in Cassandra according to > timestamp, *who does generate this timestamp during the write, > coordinator or each individual replica in which the write is
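
A minimal sketch of last-write-wins with client-supplied timestamps, as described in this thread. It assumes microsecond timestamps taken from the client's clock and simplifies tie handling; the helper names are hypothetical and not part of any driver API.

```python
import time


def client_timestamp_micros():
    """Timestamp the client attaches to a write (microseconds since epoch)."""
    return int(time.time() * 1_000_000)


def reconcile(cell_a, cell_b):
    """Pick the winner between two (value, timestamp) versions of a column.

    Simplification: on equal timestamps this sketch keeps cell_a.
    """
    return cell_a if cell_a[1] >= cell_b[1] else cell_b


older = ("v1", 1_000)
newer = ("v2", 2_000)
print(reconcile(older, newer))  # ('v2', 2000)
```

Because the timestamp comes from the application side, clock skew between clients directly affects which write wins.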

Re: Question about incremental backup

2014-08-23 Thread Andrey Ilinykh
Keep in mind that backing up SSTables is not enough. To have a truly incremental backup you also have to store the commit logs. Thank you, Andrey On Sat, Aug 23, 2014 at 11:30 AM, Robert Coli wrote: > On Sat, Aug 23, 2014 at 8:06 AM, Jens Rantil wrote: > >> I am setting backup and restoration tooling

Re: Nodetool Repair questions

2014-08-12 Thread Andrey Ilinykh
On Tue, Aug 12, 2014 at 4:46 PM, Viswanathan Ramachandran < vish.ramachand...@gmail.com> wrote: > Andrey, QUORUM consistency and no deletes makes perfect sense. > I believe we could modify that to EACH_QUORUM or QUORUM consistency and no > deletes - isnt that right? > yes.

Re: Nodetool Repair questions

2014-08-12 Thread Andrey Ilinykh
1. You don't have to repair if you use QUORUM consistency and you don't delete data. 2. Performance depends on the amount of data each node holds. It's very difficult to predict. It may take days. Thank you, Andrey On Tue, Aug 12, 2014 at 2:06 PM, Viswanathan Ramachandran < vish.ramachand...@gmail.com>
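
The property that point 1 relies on is that any QUORUM write set and any QUORUM read set share at least one replica. The sketch below only verifies that overlap by brute force; whether it is enough to skip repair in a given deployment is the poster's claim, not something the code proves.

```python
from itertools import combinations


def always_overlap(rf, write_acks, read_acks):
    """True if every possible write set intersects every possible read set."""
    replicas = range(rf)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, write_acks)
        for rs in combinations(replicas, read_acks)
    )


print(always_overlap(3, 2, 2))  # True: QUORUM writes + QUORUM reads at RF=3
print(always_overlap(3, 1, 1))  # False: ONE + ONE can miss each other
```

This is the familiar R + W > RF condition, checked exhaustively for small replica counts.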

Re: too many open files

2014-08-08 Thread Andrey Ilinykh
You may have this problem if your client doesn't reuse connections but opens a new one every time. So, run netstat and check the number of established connections. This number should not be large. Thank you, Andrey On Fri, Aug 8, 2014 at 12:35 PM, Marcelo Elias Del Valle < marc...@s1mbi0se.com.br>
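
One way to do the netstat check is to count ESTABLISHED rows for the client port. The sketch assumes the column layout of Linux `netstat -tn` output and uses 9160 (the usual Thrift port of that era) as an example; the sample text is invented for illustration.

```python
def count_established(netstat_output, port):
    """Count ESTABLISHED TCP connections whose local address ends in :port."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, foreign, state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] == "ESTABLISHED" and fields[3].endswith(f":{port}"):
            count += 1
    return count


sample = """\
tcp        0      0 10.0.0.5:9160     10.0.0.9:51234    ESTABLISHED
tcp        0      0 10.0.0.5:9160     10.0.0.9:51235    ESTABLISHED
tcp        0      0 10.0.0.5:22       10.0.0.7:40000    ESTABLISHED
tcp        0      0 10.0.0.5:9160     10.0.0.9:51236    TIME_WAIT
"""
print(count_established(sample, 9160))  # 2
```

A count that keeps growing over time suggests the client is opening a new connection per request instead of pooling.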

Re: EC2 cassandra cluster node address problem

2014-06-25 Thread Andrey Ilinykh
ne way is to translate the ips myself. > > > On Tue, Jun 24, 2014 at 10:40 PM, Andrey Ilinykh > wrote: > >> you can set rpc_address to 0.0.0.0, then it will listen on all >> interfaces. Also you have to modify security group settings to allow >> incoming connection

Re: EC2 cassandra cluster node address problem

2014-06-24 Thread Andrey Ilinykh
at 10:01 PM, Huiliang Zhang wrote: > Thanks. Is there a way to configure Cassandra to use elastic ip instead of > private ip? > > > On Tue, Jun 24, 2014 at 9:29 PM, Andrey Ilinykh > wrote: > >> Cassandra knows nothing about elastic ip. You have to use ssh tunnel o

Re: EC2 cassandra cluster node address problem

2014-06-24 Thread Andrey Ilinykh
Cassandra knows nothing about elastic ip. You have to use ssh tunnel or run your client on ec2 instance. Thank you, Andrey On Tue, Jun 24, 2014 at 8:55 PM, Huiliang Zhang wrote: > Hi, > > I am using Cassandra on EC2 instances. My cassandra always returns private > ips of the instances to the

Re: Cass 1.2.11: Replacing a node procedure

2014-02-13 Thread Andrey Ilinykh
decommission http://www.datastax.com/docs/1.1/cluster_management#replacing-a-dead-node On Thu, Feb 13, 2014 at 2:28 PM, Oleg Dulin wrote: > Here is what I am thinking. > > 1) Add the new node with token-1 of the old one and let it bootstrap. > 2) Once it bootstrapped, remove the old node from t

Re: in AWS is it worth trying to talk to a server in the same zone as your client?

2014-02-12 Thread Andrey Ilinykh
> money? The amount of data transferred from the AWS server to the client >> should be same no matter where the client is connected? >> >> >> >> On Wed, Feb 12, 2014 at 10:33 AM, Andrey Ilinykh wrote: >> >>> yes, sure. Taking data from the same

Re: in AWS is it worth trying to talk to a server in the same zone as your client?

2014-02-12 Thread Andrey Ilinykh
yes, sure. Taking data from the same zone will reduce latency and save you some money. On Wed, Feb 12, 2014 at 10:13 AM, Brian Tarbox wrote: > We're running a C* cluster with 6 servers spread across the four us-east1 > zones. > > We also spread our clients (hundreds of them) across the four zone

Re: Clarification on how multi-DC replication works

2014-02-11 Thread Andrey Ilinykh
On Tue, Feb 11, 2014 at 10:14 AM, Mullen, Robert wrote: > Thanks for the feedback. > > The picture shows a sample request, which is why the coordinator points to > two specific nodes. What I was trying to convey that the coordinator node > would ensure that 2 of the 3 nodes were written to before

Re: Clarification on how multi-DC replication works

2014-02-11 Thread Andrey Ilinykh
1. reply part is missing. 2. It is confusing a little bit. I would not use term "synchronous". Everything is asynchronous here. Coordinator writes data to all local nodes and waits for response from ANY two of them (in case of quorum). In your picture it looks like the coordinator first makes deci
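
The behavior described in point 2 (send the write to all replicas at once, then wait for acknowledgements from any two) can be sketched with threads. The replica names and latencies below are invented, and the sleep stands in for network and disk time; this is an illustration of the pattern, not Cassandra's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def write_to_replica(name, delay):
    time.sleep(delay)  # simulated network + disk latency
    return name


def coordinate_write(replicas, required_acks):
    """Send the write to every replica, return once enough have acked."""
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    futures = [pool.submit(write_to_replica, n, d) for n, d in replicas.items()]
    acked = []
    for future in as_completed(futures):
        acked.append(future.result())
        if len(acked) >= required_acks:
            break  # the slow replica still finishes in the background
    pool.shutdown(wait=False)
    return acked


acks = coordinate_write({"n1": 0.05, "n2": 0.01, "n3": 0.5}, required_acks=2)
print(acks)
```

The coordinator does not pick two nodes up front; whichever two respond first satisfy the quorum.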

Re: Adding datacenter for move to vnodes

2014-02-06 Thread Andrey Ilinykh
My understanding is you can't mix vnodes and regular nodes in the same DC. Is it correct? On Thu, Feb 6, 2014 at 2:16 PM, Vasileios Vlachos < vasileiosvlac...@gmail.com> wrote: > Hello, > > My question is why would you need another DC to migrate to Vnodes? How > about decommissioning each node i

Re: Question 1: JMX binding, Question 2: Logging

2014-02-04 Thread Andrey Ilinykh
JMX stuff is in /conf/cassandra-env.sh On Tue, Feb 4, 2014 at 2:25 PM, Kyle Crumpton (kcrumpto) wrote: > Hi all, > > I'm fairly new to Cassandra. I'm deploying it to a PaaS. One thing this > entails is that it must be able to have more than one instance on a single > node. I'm running into th

Re: no more zookeeper?

2014-01-28 Thread Andrey Ilinykh
Why would cassandra use zookeeper? On Tue, Jan 28, 2014 at 7:18 AM, S Ahmed wrote: > Does C* no long use zookeeper? > > I don't see a reference to it in the > https://github.com/apache/cassandra/blob/trunk/build.xml > > If not, what replaced it? >

Re: token for agent

2014-01-22 Thread Andrey Ilinykh
No. There is nothing special about the value 0. On Wed, Jan 22, 2014 at 1:30 PM, Daniel Curry wrote: > I was wondering how important to have a cluster that has a node with a > token that begin with a zero for a three node cluster? > > > > 3 NODES > --- >0 <-- No

Re: Cassandra ring not behaving like a ring

2014-01-15 Thread Andrey Ilinykh
what is the RF? What does nodetool ring show? On Wed, Jan 15, 2014 at 1:03 PM, Narendra Sharma wrote: > Sorry for the odd subject but something is wrong with our cassandra ring. > We have a 9 node ring as below. > > N1 - UP/NORMAL > N2 - UP/NORMAL > N3 - UP/NORMAL > N4 - UP/NORMAL > N5 - UP/NORM

Re: Read/Write consistency issue

2014-01-10 Thread Andrey Ilinykh
For a single thread with consistency ALL it should work. I believe you are doing something different. What are these three numbers exactly? old=60616 val =19 new =60635 On Fri, Jan 10, 2014 at 1:50 PM, Manoj Khangaonkar wrote: > Hi > > Using Cassandra 2.0.0. > 3 node cluster > Replication 2. > Using consiste

Re: Restore with archive commitlog

2013-12-13 Thread Andrey Ilinykh
As someone told you, this feature was added by Netflix to work with Priam (a Cassandra management tool). Priam itself has used it for only a few months, so I doubt anybody uses this feature in production. Anyway, you can ping the guys working on Priam. This is your best bet. https://github.com/Netflix/P

Re: efficient way to store 8-bit or 16-bit value?

2013-12-11 Thread Andrey Ilinykh
Column metadata is about 20 bytes. So, there is no big difference if you save 1 or 4 bytes. Thank you, Andrey On Wed, Dec 11, 2013 at 2:42 PM, onlinespending wrote: > What do people recommend I do to store a small binary value in a column? > I’d rather not simply use a 32-bit int for a single
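
A quick back-of-the-envelope check of this point, taking the roughly 20 bytes of per-column overhead quoted in the reply as given (the exact figure depends on version and column name length).

```python
COLUMN_OVERHEAD_BYTES = 20  # approximate figure quoted in the reply


def total_bytes(value_bytes, columns):
    """Total size of `columns` columns, each with a value of value_bytes."""
    return (COLUMN_OVERHEAD_BYTES + value_bytes) * columns


one_byte = total_bytes(1, 1_000_000)    # 21,000,000 bytes
four_bytes = total_bytes(4, 1_000_000)  # 24,000,000 bytes
savings = 1 - one_byte / four_bytes
print(one_byte, four_bytes, f"{savings:.1%}")  # savings is 12.5%
```

Shrinking the value from 4 bytes to 1 cuts the value payload by 75% but the stored size by only about an eighth, because the overhead dominates.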

Re: vnodes on aws

2013-12-05 Thread Andrey Ilinykh
Andrey > On Dec 5, 2013 2:32 PM, "Andrey Ilinykh" wrote: > >> Hello everybody! >> We run cassandra 1.1 on ec2 instances. We use three availability zones, >> the replication factor is 3 also. NetworkTopologyStrategy guarantees each >> row is replicated i

vnodes on aws

2013-12-05 Thread Andrey Ilinykh
Hello everybody! We run cassandra 1.1 on ec2 instances. We use three availability zones, the replication factor is 3 also. NetworkTopologyStrategy guarantees each row is replicated in all availability zones. So, if we lose one zone, quorum operations still work. We are thinking about upgrading to 1.2. Vir

Re: Minimum row size / minimum data point size

2013-10-03 Thread Andrey Ilinykh
It may help. https://docs.google.com/spreadsheet/ccc?key=0Atatq_AL3AJwdElwYVhTRk9KZF9WVmtDTDVhY0xPSmc#gid=0 On Thu, Oct 3, 2013 at 1:31 PM, Robert Važan wrote: > I need to store one trillion data points. The data is highly compressible > down to 1 byte per data point using simple custom compres

Re: Why Solandra stores Solr data in Cassandra ? Isn't solr complete solution ?

2013-09-30 Thread Andrey Ilinykh
> Also, be aware that while Cassandra has knobs to allow you to get > consistent read results (CL=QUORUM), DSE Search does not. If a node drops > messages for whatever reason, outtage, mutation, etc. its solr indexes will > be inconsistent with other nodes in its replication group. > > Will repair

Re: Commit log and data separation on SSD

2013-09-23 Thread Andrey Ilinykh
Actually, many SSD drives show much better performance for sequential writes than for random writes, so you may benefit from a separate drive for commit logs. On Mon, Sep 23, 2013 at 11:21 AM, Robert Coli wrote: > On Sun, Sep 22, 2013 at 4:02 PM, Shahryar Sedghi wrote: > >> This my first SSD experi

Re: Recomended storage choice for Cassandra on Amazon m1.xlarge instance

2013-09-03 Thread Andrey Ilinykh
You benefit from putting the commit log on a separate drive only if this drive is an isolated spinning device. EC2 ephemeral is a virtual device, so I don't think it makes sense to put the commit log on a separate drive. I would build raid0 from 4 drives and put everything there. But it would be interesting

Re: Truncate question

2013-08-29 Thread Andrey Ilinykh
No. Andrey On Thu, Aug 29, 2013 at 3:48 PM, S C wrote: > Do we have to run "nodetool repair" or "nodetool cleanup" after Truncating > a Column Family? > > Thanks, > SC >

Re: Setting up a multi-node cluster

2013-08-28 Thread Andrey Ilinykh
To be sure ports are open try to connect from one node to another: telnet 7000 try all ports. Andrey On Wed, Aug 28, 2013 at 10:41 PM, Dinesh wrote: > Hi John, > > I had my firewall disabled in both the nodes > To make sure. I checked it > # rcSuSEfirewall2 status > Checking the status of
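
The same check that telnet performs can be scripted. This is a small sketch using only the standard library; the host passed in and the port list are up to the caller (7000 is the inter-node port mentioned above, and 9160 and 7199 were the usual Thrift and JMX defaults at the time).

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports=(7000, 9160, 7199)):
    """Map each port to whether it accepted a connection from this machine."""
    return {port: port_open(host, port) for port in ports}
```

Running `check_ports("<other node's address>")` from each node in turn shows which direction, if any, is blocked.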

Re: configuring read write quorums in cassandra

2013-08-28 Thread Andrey Ilinykh
> > Thanks > Muntasir. > > On Wed, Aug 28, 2013 at 3:39 PM, Andrey Ilinykh > wrote: > > What do you mean "to change the cassandra read and write quorums"? > Quorum is > > quorum, 2 of 3 for example. What do you want to change? > > > >

Re: configuring read write quorums in cassandra

2013-08-28 Thread Andrey Ilinykh
What do you mean "to change the cassandra read and write quorums"? Quorum is quorum, 2 of 3 for example. What do you want to change? Andrey On Wed, Aug 28, 2013 at 1:33 PM, Muntasir Raihan Rahman < muntasir.rai...@gmail.com> wrote: > Hello, > > Are there any tools (e.g like nodetool) that could

Re: Stable Priam version with Cassandra 1.2.5

2013-08-20 Thread Andrey Ilinykh
latest versions of Priam use default properties defined in this file https://github.com/Netflix/Priam/blob/master/priam/src/main/resources/Priam.properties you can override all of them in SDB. I had the problem with priam.cass.startscript which points to /mnt/cassandra. Also check tomcat process p

Re: Cassandra nodetool repair question

2013-08-08 Thread Andrey Ilinykh
nodetool repair just triggers repair procedure. You can kill nodetool after start, it doesn't change anything. To stop repair you have to use nodetool stop VALIDATION|COMPACTION Thank you, Andrey On Thu, Aug 8, 2013 at 1:00 PM, Andy Losey wrote: > Afternoon, > > We are noticing nodetool

Re: lots of small nodes vs fewer big nodes

2013-08-07 Thread Andrey Ilinykh
You still have the same amount of RAM, so you cache the same amount of data. I don't think you gain much here. On the other side, maintenance procedures (compaction, repair) may hit your 2CPU box. I wouldn't do it. Thank you, Andrey On Wed, Aug 7, 2013 at 10:24 AM, Paul Ingalls wrote: > Quic

Re: How often to run `nodetool repair`

2013-08-01 Thread Andrey Ilinykh
On Thu, Aug 1, 2013 at 12:26 PM, Robert Coli wrote: > On Thu, Aug 1, 2013 at 9:35 AM, Carl Lerche wrote: > >> I read in the docs that `nodetool repair` should be regularly run unless >> no delete is ever performed. In my app, I never delete, but I heavily use >> the ttl feature. Should repair st

Re: Recommended data size for Reads/Writes in Cassandra

2013-07-18 Thread Andrey Ilinykh
there is a limit of thrift message ( thrift_max_message_length_in_mb), by default it is 64m if I'm not mistaken. This is your limit. On Thu, Jul 18, 2013 at 2:03 PM, hajjat wrote: > Hi, > > Is there a recommended data size for Reads/Writes in Cassandra? I tried > inserting 10 MB objects and the

Re: what happen if coordinator node fails during write

2013-06-25 Thread Andrey Ilinykh
It depends on the cassandra version. As far as I know, in 1.2 the coordinator logs the request before it updates replicas. If it fails it will replay the log on startup. In 1.1 you may have an inconsistent state, because only part of your request is propagated to replicas. Thank you, Andrey On Tue, Jun 25, 2013 a

Re: Gossiper in Cassandra using unicast/broadcast/multicast ?

2013-06-20 Thread Andrey Ilinykh
Cassandra works very well in EC2 environment. EC2 doesn't support broadcast/multicast. So, you should be fine. Thank you, Andrey On Thu, Jun 20, 2013 at 7:22 PM, Jason Tang wrote: > Hi > >We are considering using Cassandra in virtualization environment. I > wonder is Cassandra using unic

Re: Cleanup understanding

2013-05-28 Thread Andrey Ilinykh
cleanup removes data which doesn't belong to the current node. You have to run it only if you move (or add new) nodes. In your case there is no reason to do it. On Tue, May 28, 2013 at 7:39 AM, Víctor Hugo Oliveira Molinar < vhmoli...@gmail.com> wrote: > Hello everyone. > I have a daily main

Re: Usage of getKeyRange method

2013-05-24 Thread Andrey Ilinykh
You can specify startKey/endKey only if you use ByteOrderedPartitioner. In this case startToken/endToken are null. I guess (but I'm not sure) that with RandomPartitioner you have to specify startToken/endToken, and the keys are null then. Thank you, Andrey On Fri, May 24, 2013 at 6:53 AM, Renato Marroquín Mog

Re: cfhistograms

2013-03-25 Thread Andrey Ilinykh
What I don't understand here is the "Row Size" column. Why is it always 0? Thank you, Andrey On Mon, Mar 25, 2013 at 9:36 AM, Brian Tarbox wrote: > I think we all go through this learning curve. Here is the answer I gave > last time this question was asked: > > The output of this command seems

Re: Can't replace dead node

2013-03-15 Thread Andrey Ilinykh
, Andrey On Fri, Mar 15, 2013 at 11:39 AM, Andrey Ilinykh wrote: > I removed Priam and get the same picture. > > > What I do is- I added to cassandra-env.sh two lines and start cassandra. > > JVM_OPTS="$JVM_OPTS > -Dcassandra.initial_token=a

Re: Can't replace dead node

2013-03-15 Thread Andrey Ilinykh
lix/Priam > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > New Zealand > > @aaronmorton > http://www.thelastpickle.com > > On 7/03/2013, at 11:11 AM, Andrey Ilinykh wrote: > > Hello everybody! > > I used to run cassandra

Can't replace dead node

2013-03-07 Thread Andrey Ilinykh
Hello everybody! I used to run cassandra 1.1.5 with Priam. To replace dead node priam launches cassandra with cassandra.replace_token property. It works smoothly with 1.1.5. Couple days ago I moved to 1.1.10 and have a problem now. New cassandra successfully starts, joins the ring but it doesn't s

Re: what addresses to use in EC2 cluster (whenever an instance restarts it gets a new private ip)?

2013-02-11 Thread Andrey Ilinykh
You have to use private IPs, but if an instance dies you have to bootstrap it with replace token flag. If you use EC2 I'd recommend Netflix's Priam tool. It manages all that stuff, plus you have S3 backup. Andrey On Mon, Feb 11, 2013 at 11:35 AM, Brian Tarbox wrote: > How do I configure my clu

Re: Clarification on num_tokens setting

2013-02-05 Thread Andrey Ilinykh
On Tue, Feb 5, 2013 at 12:42 PM, aaron morton wrote: > With N nodes, the ring is divided into N*num_tokens. Correct? > > There is always num_tokens tokens in the ring. > Each node has (num_tokens / N) * RF ranges on it. > > That means every node should have the same num_token parameter? In other

Re: astyanax connection ring describe discovery

2013-01-25 Thread Andrey Ilinykh
I use astyanax 1.56.18 with cassandra 1.1.5. Everything works as supposed to. What does ThreadPoolMonitor report? Andrey On Fri, Jan 25, 2013 at 10:26 AM, Hiller, Dean wrote: > IS anyone using astyanax with their cassandra along with TOKEN AWARE as in > here (cassandra version 1.1.4) > > (see

Re: Cassandra at Amazon AWS

2013-01-17 Thread Andrey Ilinykh
I'd recommend Priam. http://techblog.netflix.com/2012/02/announcing-priam.html Andrey On Thu, Jan 17, 2013 at 5:44 AM, Adam Venturella wrote: > Jared, how do you guys handle data backups for your ephemeral based > cluster? > > I'm trying to move to ephemeral drives myself, and that was my last

Re: LCS not removing rows with all TTL expired columns

2013-01-16 Thread Andrey Ilinykh
For a column to be removed, two requirements have to be met: 1. the column has expired; 2. after that, the CF gets compacted. I guess your expired columns have propagated to a high-tier CF, which gets compacted rarely. So you have to wait until the high-tier CF gets compacted. Andrey On Wed, Jan 16, 2013 at 11

Re: Cassandra at Amazon AWS

2013-01-16 Thread Andrey Ilinykh
Storage size is not a problem, you can always add more nodes. Anyway, it is not recommended to have nodes with more than 500G (compaction, repair take forever). EC2 m1.large has 800G of ephemeral storage, EC2 m1.xlarge 1.6T. I'd recommend xlarge, it has 4 CPUs, so maintenance procedures don't affec

Re: Last Modified Time Series in cassandra

2012-12-21 Thread Andrey Ilinykh
You can select a column slice (specify a time range which for sure contains the latest data), but ask cassandra to return only one column. It is the latest one. To have the best performance use reversed sorting order. Andrey On Fri, Dec 21, 2012 at 6:40 AM, Ravikumar Govindarajan < ravikumar.govindara...@gmail.co
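
The idea of "slice a time range, reversed, limit one" can be pictured on a sorted list of columns. This is a plain in-memory sketch of the access pattern, not a client API call; the function name and sample data are illustrative.

```python
from bisect import bisect_right


def latest_column(columns, start, end):
    """Return the newest (timestamp, value) with start <= timestamp <= end.

    columns must be sorted ascending by timestamp, like columns in a row.
    """
    timestamps = [ts for ts, _ in columns]
    hi = bisect_right(timestamps, end)  # first position past the range end
    if hi == 0 or columns[hi - 1][0] < start:
        return None
    return columns[hi - 1]


row = [(10, "a"), (20, "b"), (30, "c")]
print(latest_column(row, 15, 100))  # (30, 'c')
```

Reading from the end of the range and stopping after one column mirrors a reversed slice with a count of 1.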

Re: Too Many Open files error

2012-12-20 Thread Andrey Ilinykh
1.4 to 1.1.5 is straightforward- install 1.1.5, stop 1.1.4 (nodetool drain), start 1.1.5 http://www.datastax.com/docs/1.0/install/upgrading#completing-upgrade Andrey > > Thanks > Santi > > > > > On Thu, Dec 20, 2012 at 1:44 PM, Andrey Ilinykh wrote: > >> This bug is

Re: Too Many Open files error

2012-12-20 Thread Andrey Ilinykh
This bug is fixed in 1.1.5 Andrey On Thu, Dec 20, 2012 at 12:01 AM, santi kumar wrote: > While running the nodetool repair , we are running into > FileNotFoundException with too many open files error. We increased the > ulimit value to 32768, and still we have seen this issue. > > THe number o

Re: Partition maintenance

2012-12-18 Thread Andrey Ilinykh
Just make month time stamp a part of row key. Then once a month select old data, move it and delete. Andrey On Tue, Dec 18, 2012 at 8:08 AM, wrote: > Hi folks. Still working through the details of building out a Cassandra > solution and I have an interesting requirement that I’m not sure how
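
A sketch of the month-in-the-row-key idea. The key layout (`<id>:<YYYYMM>`) and the retention check are assumptions made for illustration; the reply does not prescribe a specific format.

```python
from datetime import datetime


def row_key(entity_id, ts):
    """Build a row key that embeds the month bucket, e.g. 'sensor-7:201212'."""
    return f"{entity_id}:{ts:%Y%m}"


def is_older_than(bucket, now, keep_months):
    """True if a YYYYMM bucket is at least keep_months behind `now`."""
    year, month = int(bucket[:4]), int(bucket[4:])
    age = (now.year - year) * 12 + (now.month - month)
    return age >= keep_months


now = datetime(2012, 12, 18)
print(row_key("sensor-7", now))          # sensor-7:201212
print(is_older_than("201209", now, 3))   # True
print(is_older_than("201211", now, 3))   # False
```

A monthly job can then enumerate the buckets that pass `is_older_than`, copy those rows elsewhere, and delete them.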

Re: Cassandra on EC2 - describe_ring() is giving private IPs

2012-12-12 Thread Andrey Ilinykh
It makes sense. rpc_address is the interface to listen on. Try setting broadcast_address to the public IP. Andrey On Wed, Dec 12, 2012 at 9:33 AM, santi kumar wrote: > When I configured rpc_address with public IP, cassandra is not starting > up. It's throwing 'unable to create thrift socket on . When I

Re: Selecting rows efficiently from a Cassandra CF containing time series data

2012-12-11 Thread Andrey Ilinykh
I would consider using wide rows. If you add a timestamp to your column name you have naturally sorted data. You can easily select any time range without any indexes. Thank you, Andrey On Tue, Dec 11, 2012 at 6:23 AM, Chin Ko wrote: > I would like to get some opinions on how to select an incr

Re: how to take consistant snapshot?

2012-12-07 Thread Andrey Ilinykh
Agreed. On Fri, Dec 7, 2012 at 12:38 PM, Tyler Hobbs wrote: > Right. I don't personally think incremental backup is useful beyond > restoring individual nodes unless none of your data happens to reference > any other rows. > > > On Fri, Dec 7, 2012 at 11:37 A

Re: how to take consistant snapshot?

2012-12-07 Thread Andrey Ilinykh
On Fri, Dec 7, 2012 at 9:28 AM, Tyler Hobbs wrote: > Snapshots trigger a flush first, so data that's currently in the commit > log will be covered by the snapshot. > > > On Thu, Dec 6, 2012 at 11:52 PM, Andrey Ilinykh wrote: > >> >> >> >> On Thu, Dec

Re: Batch mutation streaming

2012-12-07 Thread Andrey Ilinykh
Cassandra uses thrift messages to pass data to and from server. A batch is just a convenient way to create such message. Nothing happens until you send this message. Probably, this is what you call "close the batch". Thank you, Andrey On Fri, Dec 7, 2012 at 5:34 AM, Ben Hood <0x6e6...@gmail.co

Re: how to take consistant snapshot?

2012-12-06 Thread Andrey Ilinykh
On Thu, Dec 6, 2012 at 7:34 PM, aaron morton wrote: > For background > > > http://wiki.apache.org/cassandra/Operations?highlight=%28snapshot%29#Consistent_backups > > If you it for a single node then yes there is

how to take consistant snapshot?

2012-12-05 Thread Andrey Ilinykh
Hello, everybody! I have production cluster with incremental backup on and I want to clone it (create test one). I don't understand one thing- each column family gets flushed (and copied to backup storage) independently. Which means the total snapshot is inconsistent. If I restore from such snapsho

Re: [BETA RELEASE] Apache Cassandra 1.2.0-beta3 released

2012-12-05 Thread Andrey Ilinykh
Hello, everybody! I have read the blog post about atomic batches in 1.2 http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2 It mentions that batches are atomic by default starting with 1.2. It also says CQL allows switching this off. How can I manipulate this setting using the thrift API? Thank you, A

Re: splitting large sstables

2012-12-03 Thread Andrey Ilinykh
Could you provide more details how to use it? Let's say I already have a huge sstable. What am i supposed to do to split it? Thank you, Andrey On Sat, Dec 1, 2012 at 11:29 AM, Radim Kolar wrote: > from time to time people ask here for splitting large sstables, here is > patch doing that > >

Re: Java high-level client

2012-11-28 Thread Andrey Ilinykh
hat is the major advantage of astyanax over > Hector? > > Thanks. > -Wei > > ------ > *From:* Andrey Ilinykh > *To:* user@cassandra.apache.org > *Sent:* Wednesday, November 28, 2012 9:37 AM > > *Subject:* Re: Java high-level client > >

Re: Java high-level client

2012-11-28 Thread Andrey Ilinykh
+1 On Tue, Nov 27, 2012 at 10:10 AM, Michael Kjellman wrote: > Netflix has a great client > > https://github.com/Netflix/astyanax > >

Re: Strange delay in query

2012-11-08 Thread Andrey Ilinykh
What is the size of columns? Probably those two are huge. On Thu, Nov 8, 2012 at 4:01 AM, André Cruz wrote: > On Nov 7, 2012, at 12:15 PM, André Cruz wrote: > > > This error also happens on my application that uses pycassa, so I don't > think this is the same bug. > > I have narrowed it down t

Re: problem encrypting keys and data

2012-11-07 Thread Andrey Ilinykh
Honestly, I don't understand what encoding you are talking about. Just write/read data as a byte array. You will read back exactly what you wrote. Thank you, Andrey On Wed, Nov 7, 2012 at 1:43 PM, Brian Tarbox wrote: > We have a requirement to store our data encrypted. > Our encryption system turn

Re: Replication factor and performance questions

2012-11-05 Thread Andrey Ilinykh
You will have one extra hop. Not a big deal, actually. And many client libraries (astyanax for example) are token aware, so they are smart enough to call the right node. On Mon, Nov 5, 2012 at 9:12 AM, Oleg Dulin wrote: > Should be all under 400Gig on each. > > My question is -- is there additional

Re: Benefits by adding nodes to the cluster

2012-10-29 Thread Andrey Ilinykh
This is how cassandra scales. More nodes means better performance. thank you, Andrey On Mon, Oct 29, 2012 at 2:57 PM, Roshan wrote: > Hi All > > This may be a silly question, but what kind of benefits we can get by adding > new nodes to the cluster? > > Some may be high availability. Any other

Re: why does my "Effective-Ownership" and "Load" from ring give such different answers?

2012-10-19 Thread Andrey Ilinykh
Did you run cleanup? Andrey On Fri, Oct 19, 2012 at 10:23 AM, Brian Tarbox wrote: > I had a two node cluster that I expanded to four nodes. I ran the token > generation script and moved all the nodes so that when I run "nodetool ring" > each node reports 25% Effective-Ownership. > > However, my

Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 2:31 PM, Jeremy Hanna wrote: > > On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh wrote: > >> On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman >> wrote: >>> Not sure I understand your question (if there is one..) >>> >>> You

Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman wrote: > Not sure I understand your question (if there is one..) > > You are more than welcome to do CL ONE and assuming you have hadoop nodes > in the right places on your ring things could work out very nicely. If you > need to guarantee that you

Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 1:24 PM, Michael Kjellman wrote: > Well there is *some* data locality, it's just not guaranteed. My > understanding (and someone correct me if I'm wrong) is that > ColumnFamilyInputFormat implements InputSplit and the getLocations() > method. > > http://hadoop.apache.org/do

Re: hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
On Thu, Oct 18, 2012 at 12:00 PM, Michael Kjellman wrote: > Unless you have Brisk (however as far as I know there was one fork that got > it working on 1.0 but nothing for 1.1 and is not being actively maintained > by Datastax) or go with CFS (which comes with DSE) you are not guaranteed > all dat

hadoop consistency level

2012-10-18 Thread Andrey Ilinykh
Hello, everybody! I'm thinking about running hadoop jobs on the top of the cassandra cluster. My understanding is - hadoop jobs read data from local nodes only. Does it mean the consistency level is always ONE? Thank you, Andrey

Re: Cassandra nodes loaded unequally

2012-10-17 Thread Andrey Ilinykh
65843651857942052864 > > I have no idea how to fix it. > > Alain > > 2012/10/17 Ben Kaehne >> >> Nothing unusual. >> >> All servers are exactly the same. Nothing unusual in the log files. Is >> there any level of logging that I should be tu

Re: run repair on each node or every R nodes?

2012-10-17 Thread Andrey Ilinykh
> > In my mind it does make sense, and what you're saying is correct. But I read > that it was better to run repair in each node with a "-pr" option. > > Alain > Yes, it's correct. Running repair -pr on each node you repair whole cluster without job duplication. Andrey
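
Why `repair -pr` on every node covers the whole ring exactly once can be seen in a small model. The sketch assumes one token per node and a simple placement where a range is replicated on its owner and the next RF-1 nodes; real placement depends on the replication strategy.

```python
from collections import Counter


def replicas_for_range(owner, n_nodes, rf):
    """Nodes holding the range owned by `owner` (assumed simple placement)."""
    return [(owner + i) % n_nodes for i in range(rf)]


def ranges_repaired(node, n_nodes, rf, primary_only):
    """Ranges touched when repair is started on `node`."""
    if primary_only:
        return [node]  # with -pr, only the node's own primary range
    return [r for r in range(n_nodes)
            if node in replicas_for_range(r, n_nodes, rf)]


def coverage(n_nodes, rf, primary_only):
    """How many times each range is repaired if every node runs repair."""
    counts = Counter()
    for node in range(n_nodes):
        counts.update(ranges_repaired(node, n_nodes, rf, primary_only))
    return counts


print(set(coverage(9, 3, primary_only=True).values()))   # {1}
print(set(coverage(9, 3, primary_only=False).values()))  # {3}
```

In this model, repairing the primary range of one node still involves all three replicas of that range, which is why neighbouring nodes show activity too.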

Re: Cassandra nodes loaded unequally

2012-10-16 Thread Andrey Ilinykh
With your environment (3 nodes, RF=3) it is very difficult to get uneven load. Each node receives the same number of read/write requests. Probably something is wrong on low level, OS or VM. Do you see anything unusual in log files? Andrey On Tue, Oct 16, 2012 at 3:40 PM, Ben Kaehne wrote: > Not

Re: Is Anti Entropy repair idempotent with respect to transferred data?

2012-10-16 Thread Andrey Ilinykh
> In my experience running repair on some counter data, the size of > streamed data is much bigger than the cluster could possibly have lost > messages or would be due to snapshotting at different times. > > I know the data will eventually be in sync on every repair, but I'm > more interested in wh

Re: what happens while node is bootstrapping?

2012-10-16 Thread Andrey Ilinykh
> > > No. The bootstrapping node will writes for its new range while > bootstrapping as consistency optimization (more or less), but does not > contribute to the replication factor or consistency level; all of the > original replicas for that range still receive writes, serve reads, and are > the

Re: run repair on each node or every R nodes?

2012-10-15 Thread Andrey Ilinykh
Only one region (the one node-00 is responsible for) will get repaired, on all three nodes. Andrey On Mon, Oct 15, 2012 at 11:56 AM, Alexis Midon wrote: > > Hi all, > > I have a 9-node cluster with a replication factor R=3. When I run repair -pr > on node-00, I see the exact same load and activity on node-

Re: what happens while node is bootstrapping?

2012-10-15 Thread Andrey Ilinykh
mplete. > > JLewis > > On Oct 13, 2012, at 11:19 PM, Andrey Ilinykh wrote: > >> Hello, everybody! >> I'd like to clarify a bootstrapping process. As far as I understand, >> bootstrapping node starts to accept writes immediately. What about >> reads? >>

what happens while node is bootstrapping?

2012-10-13 Thread Andrey Ilinykh
Hello, everybody! I'd like to clarify a bootstrapping process. As far as I understand, bootstrapping node starts to accept writes immediately. What about reads? Bootstrapping node doesn't have all information, only replica nodes have. Does it mean read operations with CL ALL may fail during bootst

Re: Why data is not even distributed.

2012-10-08 Thread Andrey Ilinykh
The problem was - I calculated 3 tokens for random partitioner but used them with BOP, so nodes were not supposed to be loaded evenly. That's ok, I got it. But what I don't understand is why nodetool ring shows equal ownership. This is an example: I created a small cluster with BOP and three tokens 00
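
For reference, evenly spaced tokens for RandomPartitioner are usually computed over its 0 to 2**127 token space, as in the sketch below. These integers split the hashed key space evenly; with a byte-ordered partitioner the keys are compared as raw bytes, so the same values say nothing about how actual keys are distributed.

```python
def random_partitioner_tokens(n_nodes):
    """Evenly spaced initial tokens over the 0..2**127 token space."""
    return [i * (2 ** 127) // n_nodes for i in range(n_nodes)]


for token in random_partitioner_tokens(3):
    print(token)
```

With UUID row keys under BOP, the load on each node depends on where the key bytes fall relative to the tokens, not on the spacing of the tokens themselves.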

Re: what's the most 1.1 stable version?

2012-10-05 Thread Andrey Ilinykh
In 1.1.5 file descriptor leak was fixed. In my case it was critical. Nodes went down every several days. But not everyone had this problem. Thank you, Andrey On Fri, Oct 5, 2012 at 7:42 AM, Alexandru Sicoe wrote: > Hello, > We are planning to upgrade from version 1.0.7 to the 1.1 branch. Whic

Re: Why data is not even distributed.

2012-10-04 Thread Andrey Ilinykh
ple, if you were using the UUID class in Java, these would be > composed from several components (related to dimensions such as time and > version), so you can not expect a random distribution over the whole space. > > > Cheers > Tom > > > > > On Wed, Oct 3, 2012 at 5

Why data is not even distributed.

2012-10-03 Thread Andrey Ilinykh
Hello, everybody! I'm observing very strange behavior. I have a 3-node cluster with ByteOrderedPartitioner (I run 1.1.5). I created a key space with replication factor of 1. Then I created one column family and populated it with random data. I use UUID as a row key, and Integer as a column name. Row k

Re: Why data tripled in size after repair?

2012-10-02 Thread Andrey Ilinykh
On Tue, Oct 2, 2012 at 12:05 AM, Sylvain Lebresne wrote: >> It's in the 1.1 branch; I don't remember if it went into a release >> yet. If not, it'll be in the next 1.1.x release. > > As the ticket says, this is in since 1.1.1. I don't pretend this is > well documented, but it's in. > Nope. It is i

Re: Why data tripled in size after repair?

2012-09-27 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 12:36 PM, Peter Schuller wrote: >> What is strange every time I run repair data takes almost 3 times more >> - 270G, then I run compaction and get 100G back. > > https://issues.apache.org/jira/browse/CASSANDRA-2699 outlines the > main issues with repair. In short - in your

Re: Why data tripled in size after repair?

2012-09-27 Thread Andrey Ilinykh
On Thu, Sep 27, 2012 at 9:52 AM, Sylvain Lebresne wrote: >> I don't understand why it copied data twice. In worst case scenario it >> should copy everything (~90G) > > Sadly no, repair is currently peer-to-peer based (there is a ticket to > fix it: https://issues.apache.org/jira/browse/CASSANDRA-3

Re: Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
On Wed, Sep 26, 2012 at 11:07 AM, Rob Coli wrote: > On Wed, Sep 26, 2012 at 9:30 AM, Andrey Ilinykh wrote: >> [ repair ballooned my data size ] >> 1. Why repair almost triples data size? > > You didn't mention what version of cassandra you're running. In some >

Why data tripled in size after repair?

2012-09-26 Thread Andrey Ilinykh
Hello everybody! I have 3 node cluster with replication factor of 3. each node has 800G disk and it used to have 100G of data. What is strange every time I run repair data takes almost 3 times more - 270G, then I run compaction and get 100G back. Unfortunately, yesterday I forget to compact and run