BloomFilter

2013-02-02 Thread Kanwar Sangha
Hi - Couple of questions - 1) What is the ratio of the sstable file size to the bloom filter size? If I have a 1 GB sstable, what is the approximate bloom filter size, assuming the default value of 0.000744 is configured? 2) The bloom filters are stored in RAM but not in heap from 1.2 onwards? 3)
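A rough way to approach question 1, as a sketch: the textbook Bloom filter formula sizes the filter by the number of keys, not by the SSTable's size on disk, at about -ln(p)/(ln 2)^2 bits per key. Cassandra's own implementation rounds to a whole number of hash buckets per key, so real sizes will differ somewhat. The 1 million keys below (a 1 GB SSTable of roughly 1 KB rows) is a hypothetical figure.

```python
import math


def bloom_bits_per_key(fp_chance: float) -> float:
    """Textbook optimal Bloom filter size, in bits per key, for a target false-positive rate."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_size_bytes(num_keys: int, fp_chance: float) -> int:
    """Approximate filter size in bytes: it scales with the key count, not the data size."""
    return math.ceil(num_keys * bloom_bits_per_key(fp_chance) / 8)


# Hypothetical SSTable: 1 GB of data holding about 1 million rows of ~1 KB each.
num_keys = 1_000_000
fp_chance = 0.000744
print(f"{bloom_bits_per_key(fp_chance):.1f} bits per key")
print(f"{bloom_size_bytes(num_keys, fp_chance) / 2**20:.2f} MiB for {num_keys:,} keys")
```

At 0.000744 this works out to roughly 15 bits per key, so the same 1 GB SSTable would need a much larger filter if it held many small rows and a much smaller one if it held a few large rows.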

Index file

2013-02-02 Thread Kanwar Sangha
Hi - The index files created for the SSTables: do they contain a sampling or the complete index? Cassandra loads these files on startup based on the sampling rate in cassandra.yaml, right?

DataModel Question

2013-02-05 Thread Kanwar Sangha
Hi - We are designing Cassandra-based storage for the following use cases: *Store SMS messages *Store MMS messages *Store chat history. What would be the ideal way to design the data model for this kind of application? I am thinking along these lines .. Row-Key : Com

RE: DataModel Question

2013-02-06 Thread Kanwar Sangha
On Wed, Feb 6, 2013 at 8:23 AM, Vivek Mishra <mishra.v...@gmail.com> wrote: Avoid super columns. If you need sorted, wide rows then go for composite columns. -Vivek On Wed, Feb 6, 2013 at 7:09

RE: DataModel Question

2013-02-06 Thread Kanwar Sangha
oper New Zealand @aaronmorton http://www.thelastpickle.com On 7/02/2013, at 1:47 AM, Kanwar Sangha <kan...@mavenir.com> wrote: 1) Version is 1.2 2) DynamicComposites: I read somewhere that they are not recommended? 3) Good point. I need to think about that one. From: Tamar

Cassandra benchmark

2013-02-11 Thread Kanwar Sangha
Hi - I am trying to run a benchmark using the Cassandra-stress tool. The docs give an example that inserts data across 2 nodes: '/tools/stress/bin/stress -d 192.168.1.101,192.168.1.102 -n 1000'. But when I run this across my 2-node cluster, I see the same keys on both nodes. Replication is not en

Mutation dropped

2013-02-14 Thread Kanwar Sangha
Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a lot of mutation dropped messages. I understand that this is due to the replica not being written to the other node? RF = 2, CL = 1. From the wiki - For MUTATION messages this means that the mutation was not applied

RE: Mutation dropped

2013-02-14 Thread Kanwar Sangha
dropped messages. But there are no failures on the client. Does that mean the other node is not able to persist the replicated data? Is there some timeout associated with replicated data persistence? Thanks, Kanwar From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 14 February 2013 09

RE: Mutation dropped

2013-02-18 Thread Kanwar Sangha
nning in prod) RF3 and CL QUORUM is a more real-world scenario. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 15/02/2013, at 9:42 AM, Kanwar Sangha <kan...@mavenir.com> wrote: Hi - Is there a paramete

Cassandra backup

2013-02-18 Thread Kanwar Sangha
Hi - We have a requirement to store around 90 days of data per user. The last 7 days of data is going to be accessed frequently. Is there a way we can have the recent data (7 days) on SSD and the rest of the data on HDD? Do we take a snapshot every 7 days and use a separate 'archive' cluster to serve t

RE: Cassandra backup

2013-02-18 Thread Kanwar Sangha
@cassandra.apache.org Subject: Re: Cassandra backup There is this: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement But you'll need to design your data model around the fact that this is only as granular as 1 column family Best, michael From: Kanwar S

SSTable Num

2013-02-20 Thread Kanwar Sangha
Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables. There is no compaction job running in the background. Is there a limit on the size per sstable ? Or will the sstable compaction continue and eventually we will have 1 file ? Thanks, Kanwar

File Store

2013-02-20 Thread Kanwar Sangha
Hi - I am looking for some input on file storage in Cassandra. Each file can range from 200 KB to 3 MB in size. I don't see any limitation on the column size. But would it be a good idea to store these files as binary in the columns? Thanks, Kanwar

Read IO

2013-02-20 Thread Kanwar Sangha
Hi - Can someone explain the worst-case IOPS for a read? No key cache, no row cache, sampling rate say 512. 1) Bloom filter will be checked to see existence of key (in RAM) 2) Index file sample (in RAM) will be checked to find approx. location in index file on disk 3) 1 IOPS
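The worst-case path described above can be turned into a small cost model. This is a sketch under stated assumptions: no key or row cache, the index-sample lookup lands close enough to the key that scanning the index costs a single seek, and a Bloom filter false positive costs one index seek (the key is then not found, so the data file is not touched). The function name is hypothetical.

```python
def expected_read_seeks(sstables_with_key: int,
                        sstables_without_key: int,
                        fp_chance: float) -> float:
    """Expected disk seeks for one key lookup with no key cache or row cache.

    - SSTables that really contain the key: one seek into the index file,
      then one seek into the data file.
    - SSTables that do not contain the key: only reached on a Bloom filter
      false positive, which costs one wasted index seek.
    """
    true_hits = 2 * sstables_with_key
    false_positives = fp_chance * sstables_without_key
    return true_hits + false_positives


# Key present in a single SSTable out of 32, with the fp_chance mentioned in this thread.
print(expected_read_seeks(sstables_with_key=1, sstables_without_key=31, fp_chance=0.000744))
```

Under this model the cost is dominated by how many SSTables actually hold a fragment of the row, which is why compaction strategy matters for read IOPS.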

key cache size

2013-02-21 Thread Kanwar Sangha
Hi - What is the approximate overhead of the key cache ? Say each key is 50 bytes. What would be the overhead for this key in the key cache ? Thanks, Kanwar

RE: Read IO

2013-02-21 Thread Kanwar Sangha
Ok.. Is the Cassandra default block size 256k? Now say my data in the column is 4 MB, and the disk gives me 4k-block random reads at 100 IOPS. I can read max 400k in one seek? Does that mean I would need multiple seeks to get the complete data? -Original Message- From: sc...@sc
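The arithmetic behind this question can be sketched with two extreme cases. The worst case assumes every 4 KB block of the 4 MB value costs its own random I/O; the best case assumes the value is stored contiguously, so one seek is followed by a sequential read. The 100 MB/s sequential rate is a hypothetical figure, not something stated in the thread.

```python
import math


def all_random_reads_s(value_bytes: int, io_bytes: int, iops: float) -> float:
    """Worst case: every io_bytes piece of the value needs its own random I/O."""
    return math.ceil(value_bytes / io_bytes) / iops


def seek_then_stream_s(value_bytes: int, iops: float, seq_bytes_per_s: float) -> float:
    """Contiguous value: pay one seek, then transfer the rest sequentially."""
    return 1 / iops + value_bytes / seq_bytes_per_s


value = 4 * 1024 * 1024  # the 4 MB column value from the question
print(all_random_reads_s(value, 4 * 1024, 100))  # 1024 random I/Os at 100 IOPS
print(seek_then_stream_s(value, 100, 100 * 1024 * 1024))  # hypothetical 100 MB/s disk
```

Note that 100 IOPS of 4 KB is a rate (400 KB per second), not an amount per seek; how many seeks a 4 MB value really costs depends on how contiguously it sits on disk.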

RE: SSTable Num

2013-02-21 Thread Kanwar Sangha
Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 21/02/2013, at 3:47 AM, Kanwar Sangha <kan...@mavenir.com> wrote: Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables. There is no compaction job running in the background. Is there a

RE: cassandra vs. mongodb quick question(good additional info)

2013-02-21 Thread Kanwar Sangha
“The limiting factors are the time it takes to repair, the time it takes to replace a node, the memory considerations for 100's of millions of rows. If the performance of those operations is acceptable to you, then go crazy” If I have a node which is attached to a RAID and the node crashes

Cassandra with SAN

2013-02-21 Thread Kanwar Sangha
Hi - Is it a good idea to use Cassandra with a SAN? Say a SAN which provides 8 petabytes of storage. Would I not be I/O bound irrespective of the number of Cassandra machines, so that scaling by adding machines won't help? Thanks Kanwar

RE: Cassandra with SAN

2013-02-21 Thread Kanwar Sangha
need to have a large expensive SAN. Don't be tempted by the shiny expensive SAN. :) If money is no object, instead throw SSDs in your nodes and run 10G between racks From: Kanwar Sangha <kan...@mavenir.com> Reply-To: "user@cassandra.apache.org"

Read Perf

2013-02-25 Thread Kanwar Sangha
Hi - I am doing a performance run using modified YCSB client and was able to populate 8TB on a node and then ran some read workloads. I am seeing an average TPS of 930 ops/sec for random reads. There is no key cache/row cache. Question - Will the read TPS degrade if the data size increases to sa

RE: Read Perf

2013-02-26 Thread Kanwar Sangha
on data size but not sure what that is. I know the column limit on a row is in the millions, somewhere lower than 10 million). Later, Dean From: Kanwar Sangha <kan...@mavenir.com> Reply-To: "user@cassandra.apache.org"

RE: Read Perf

2013-02-26 Thread Kanwar Sangha
t the limit as I'm pretty sure it can't go above 10 million. (from previous posts on this list). Dean On 2/26/13 8:23 AM, "Kanwar Sangha" wrote: Thanks. For our case, the number of rows will more or less be the same. The only thing which changes is the columns and they keep getti

NetworkTopology

2013-02-28 Thread Kanwar Sangha
Hi - Quick question. When specifying the replication across 2 DCs, can we have 1 replication factor across 2 Data centres ? Does the below mean that there will be 2 copies of the data , 1 in DC1 and 1 in DC2 ? [default@unknown] CREATE KEYSPACE test WITH placement_strategy = 'NetworkTopolog

Storage question

2013-03-04 Thread Kanwar Sangha
Hi - Can someone suggest the optimal way to store files / images ? We are planning to use cassandra for meta-data for these files. HDFS is not good for small file size .. can we look at something else ? Thanks, Kanwar

RE: Storage question

2013-03-04 Thread Kanwar Sangha
n the cluster so you could check that out. Out of curiosity, why is HDFS not good for a small file size? For reading, it should be the bomb with RF=3 since you can read from multiple nodes and such. Writes might be a little slower but still shouldn't be too bad. Later, Dean From:

Replication Question

2013-03-04 Thread Kanwar Sangha
Hi - If I configure an RF across 2 data centres as below, assuming 3 nodes per data centre: DC1: 2, DC2: 2. I do a write with consistency level local_quorum, which ensures that there is no inter-DC latency. Now say 2 nodes in DC1 crash and I am doing a read with CL = One. Will it return fail

RE: Replication Question

2013-03-04 Thread Kanwar Sangha
Reads also ? From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 04 March 2013 14:54 To: user@cassandra.apache.org Subject: Replication Question Hi - If I configure a RF across 2 Data centres as below and assuming 3 nodes per Data centre. DC1: 2, DC2:2 I do a write with consistency level

Hinted handoff

2013-03-06 Thread Kanwar Sangha
Hi - Is there a way to increase the hinted handoff throughput ? I am seeing around 8Mb/s (bits). Thanks, Kanwar

RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
Got the param. thanks From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 06 March 2013 13:50 To: user@cassandra.apache.org Subject: Hinted handoff Hi - Is there a way to increase the hinted handoff throughput ? I am seeing around 8Mb/s (bits). Thanks, Kanwar

RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
After bumping "hinted_handoff_throttle_in_kb" up to 1 Gb per second, it still does not go above 25 Mb/s. Is there a limitation? From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 06 March 2013 14:41 To: user@cassandra.apache.org Subject: RE: Hinted handoff Got the par

RE: Hinted handoff

2013-03-06 Thread Kanwar Sangha
Is this correct? I have a RAID 0 setup of 16 TB across 8 disks. Each disk is 7.2k RPM with 80 IOPS per disk. Data is ~9.5 TB. So 4K * 80 * 9.5 = 3040 KB/s ~ 23.75 Mb/s. So basically I am limited by the disk rather than the n/w From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 06 March
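For comparison, a common back-of-the-envelope estimate multiplies the number of spindles, the random IOPS per spindle and the I/O size; the amount of data stored does not enter the formula. A minimal sketch, assuming 4 KB random reads and that RAID 0 spreads concurrent I/O evenly over all 8 disks:

```python
def raid0_random_read_mbit_s(disks: int, iops_per_disk: float, io_kb: float) -> float:
    """Aggregate random-read throughput in megabits per second (1 Mb = 1024 Kb here)."""
    kb_per_s = disks * iops_per_disk * io_kb
    return kb_per_s * 8 / 1024


# 8 x 7.2k RPM disks at ~80 IOPS each, 4 KB per I/O: 2560 KB/s in total.
print(raid0_random_read_mbit_s(disks=8, iops_per_disk=80, io_kb=4))
```

This gives 20 Mb/s, the same order of magnitude as the ~25 Mb/s ceiling reported earlier in the thread, which is consistent with the disks rather than the network being the bottleneck.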

RE: Hinted handoff

2013-03-07 Thread Kanwar Sangha
ron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 6/03/2013, at 1:22 PM, Kanwar Sangha <kan...@mavenir.com> wrote: Is this correct ? I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM with IOPS of 80 per disk. Data i

VNodes and nodetool repair

2013-03-07 Thread Kanwar Sangha
Hi Guys - I have a question on vnodes and nodetool repair. If I have configured the nodes as vnodes, say for example 2 nodes with RF=2. Questions - *There are some columns set with TTL as X. After X, Cassandra will mark them as tombstones. Is there still a probability of running into

leveled compaction

2013-03-08 Thread Kanwar Sangha
Hi - Can someone explain the meaning of the leveled compaction output in cfstats - SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0] SSTables in each level: [61/4, 9, 92, 945, 8146, 0, 0, 0] SSTables in each level: [34/4, 1000/10, 100, 953, 8184, 0, 0, 0] Thanks, Kanwar

RE: leveled compaction

2013-03-08 Thread Kanwar Sangha
So you have 40 SSTables in L0, 442 in L1, 97 in L2 and so forth. '40/4' and '442/10' have numbers after the slash; those are the expected maximum number of SSTables in that level, and they are only displayed when you have more than that threshold. On Friday, March 8, 2013 at 3:24 PM, Kanw
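The reading described above can be automated. The sketch below parses the per-level list from cfstats and flags levels that hold more SSTables than expected. It assumes the usual leveled-compaction fan-out of 10 (level N >= 1 targets about 10^N SSTables) and takes the L0 threshold of 4 from the output itself; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def parse_levels(text: str) -> List[Tuple[int, Optional[int]]]:
    """Turn '[40/4, 442/10, 97, ...]' into (count, displayed_max) pairs per level."""
    levels = []
    for token in text.strip().strip("[]").split(","):
        token = token.strip()
        if not token:
            continue
        count, _, limit = token.partition("/")
        levels.append((int(count), int(limit) if limit else None))
    return levels


def expected_max(level: int) -> int:
    """Assumed target size per level: 4 for L0, 10**N for level N >= 1."""
    return 4 if level == 0 else 10 ** level


def overfull_levels(levels: List[Tuple[int, Optional[int]]]) -> List[int]:
    """Levels whose SSTable count exceeds the assumed target."""
    return [i for i, (count, _) in enumerate(levels) if count > expected_max(i)]


for line in ("[40/4, 442/10, 97, 967, 7691, 0, 0, 0]",
             "[61/4, 9, 92, 945, 8146, 0, 0, 0]"):
    levels = parse_levels(line)
    print(levels[:5], "over target at levels:", overfull_levels(levels))
```

With these inputs the flagged levels are exactly the ones printed with a slash in the cfstats output.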

chunk length

2013-03-09 Thread Kanwar Sangha
Hi - Can someone help explain this parameter: chunk_length_kb? If we increase it from the default 64k to 128k, does it mean that the sstable will be compressed in blocks of 128k? Does that mean that if we are reading and writing data of 128k, it will give better read/write performance? Thanks, Kanw
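As a sketch of why chunk size matters: with SSTable compression, data is compressed in independent chunks of chunk_length_kb (uncompressed size), and reading any byte means decompressing the whole chunk that contains it. The helpers below are a hypothetical illustration that counts how much data must be decompressed to serve a read at a given offset.

```python
from typing import Tuple


def chunk_span(offset: int, length: int, chunk_kb: int) -> Tuple[int, int]:
    """First and last chunk index touched by a read of `length` bytes (length >= 1)."""
    chunk = chunk_kb * 1024
    return offset // chunk, (offset + length - 1) // chunk


def bytes_decompressed(offset: int, length: int, chunk_kb: int) -> int:
    """Uncompressed bytes that must be inflated to serve the read."""
    first, last = chunk_span(offset, length, chunk_kb)
    return (last - first + 1) * chunk_kb * 1024


# A small 4 KB read decompresses one full chunk either way.
print(bytes_decompressed(0, 4 * 1024, 64), bytes_decompressed(0, 4 * 1024, 128))
# An aligned 128 KB read decompresses the same amount with 64 KB or 128 KB chunks.
print(bytes_decompressed(0, 128 * 1024, 64), bytes_decompressed(0, 128 * 1024, 128))
```

Under this model, larger chunks tend to compress better and cost little extra for large sequential reads, while small random reads end up decompressing more data than they need.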

RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
Are your keys spread across all SSTables? That will cause every sstable to be read, which will increase the I/O. What compaction are you using? From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon Scarborough Sent: 21 March 2013 23:00 To: user@cassandra.apache.org Subject: Hig

RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
o not an extreme write load on 6 nodes as well though these posts causes read to check authorization and such of our system. Dean From: Kanwar Sangha <kan...@mavenir.com> Reply-To: "user@cassandra.apache.org"

RE: High disk I/O during reads

2013-03-22 Thread Kanwar Sangha
e never ran a major compaction but after we switched to LCS, we went from 300G to some 120G or something like that which was nice. We only have 300 data point posts / second so not an extreme write load on 6 nodes as well though these posts causes read to check authorization and such of

cfhistograms

2013-03-25 Thread Kanwar Sangha
Can someone explain how to read the cfhistograms o/p ? [root@db4 ~]# nodetool cfhistograms usertable data usertable/data histograms Offset SSTables Write Latency Read Latency Row Size Column Count 12857444 4051 0
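One way to read that output, as a sketch: each row's Offset is a bucket boundary (microseconds for the latency columns, a number of SSTables for the SSTables column, bytes for Row Size, a number of columns for Column Count), and each number under a column is how many operations or rows fell into that bucket. Under that reading, a percentile is the first offset at which the running count reaches the target fraction. The bucket values below are hypothetical, not taken from the output above.

```python
from typing import List, Tuple


def percentile_from_buckets(buckets: List[Tuple[int, int]], pct: float) -> int:
    """Smallest bucket offset whose cumulative count reaches pct percent of all samples.

    `buckets` is a list of (offset, count) pairs sorted by offset.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("histogram is empty")
    threshold = total * pct / 100
    running = 0
    for offset, count in buckets:
        running += count
        if running >= threshold:
            return offset
    return buckets[-1][0]


# Hypothetical Read Latency column: (offset in microseconds, number of reads).
read_latency = [(1000, 10), (1200, 50), (1500, 30), (10000, 10)]
for p in (50, 90, 99):
    print(f"p{p}: <= {percentile_from_buckets(read_latency, p)} us")
```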

Hinted Handoff

2013-03-25 Thread Kanwar Sangha
Hi - Quick question. Do hints contain the actual data or the data is read from the SStables and then sent to the other node when it comes up ? Thanks, Kanwar

Timeseries data

2013-03-27 Thread Kanwar Sangha
Hi - I have a query on reads with Cassandra. We are planning to have a dynamic column family where each column is based on a timeseries. Inserting data - key => 'xxx', {column_name => TimeUUID(now), :column_value => 'value' }, {column_name => TimeUUID(now), :column_value => 'value' },..
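A self-contained sketch of the proposed column layout: column names are version-1 (time-based) UUIDs, and a read for a time range keeps the columns whose embedded timestamp falls inside it, oldest first. Cassandra's TimeUUIDType orders columns by the embedded timestamp, which is what the sort key below mimics; the helper names, the fixed node id and the millisecond granularity are assumptions made for the example.

```python
import uuid
from typing import Dict, List

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def make_timeuuid(unix_ms: int, clock_seq: int = 0, node: int = 0x1234567890AB) -> uuid.UUID:
    """Build a deterministic version-1 UUID for a given Unix time in milliseconds."""
    t = unix_ms * 10_000 + UUID_EPOCH_OFFSET
    return uuid.UUID(fields=(
        t & 0xFFFFFFFF,                    # time_low
        (t >> 32) & 0xFFFF,                # time_mid
        ((t >> 48) & 0x0FFF) | 0x1000,     # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,  # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                  # clock_seq_low
        node,                              # 48-bit node id (hypothetical)
    ))


def unix_ms(u: uuid.UUID) -> int:
    """Recover the Unix time in milliseconds embedded in a version-1 UUID."""
    return (u.time - UUID_EPOCH_OFFSET) // 10_000


def time_slice(row: Dict[uuid.UUID, str], start_ms: int, end_ms: int) -> List[str]:
    """Values whose column name falls in [start_ms, end_ms], ordered by time."""
    ordered = sorted(row, key=lambda u: (u.time, u.bytes))
    return [row[u] for u in ordered if start_ms <= unix_ms(u) <= end_ms]


row = {make_timeuuid(3000): "third", make_timeuuid(1000): "first", make_timeuuid(2000): "second"}
print(time_slice(row, 1000, 2000))
```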

Client lib

2013-04-18 Thread Kanwar Sangha
Hi - We are planning to develop a custom client using the Thrift API for Cassandra. Are these available from JMX? - Can Cassandra provide info about node status? - DC failover detection (data center down vs. some nodes down) - How to get load info from each node? Thanks, Kanwar

RE: How to make compaction run faster?

2013-04-18 Thread Kanwar Sangha
Use the community edition and try it out. Compaction is mostly limited by raw disk speed rather than CPU. What kind of disks do you have? 7.2k, 10k, 15k RPM? Are your keys unique, or are you doing updates? If the writes are unique, I would not worry about compaction too much and let it run fas

index filter

2013-04-19 Thread Kanwar Sangha
Guys - Quick question. The index filter file created for a sstable contains all keys/index offset for a sstable ? I know that when we bring up the node, it reads a sample of the keys from this file. So this file contains all keys and a sample is read on startup ? Thanks, Kanwar

Re: index filter

2013-04-19 Thread Kanwar Sangha
Let me rephrase. I am talking about the index file on disk created per sstable. Does that contain all key indexes? Robert Coli wrote: On Fri, Apr 19, 2013 at 10:38 AM, Kanwar Sangha wrote: Guys - Quick question. The index filter file created for a sstable conta
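The relationship being asked about can be sketched like this: the on-disk index file holds an entry for every key in the SSTable, while the in-memory summary keeps only every Nth entry (controlled by index_interval in cassandra.yaml). A lookup binary-searches the summary, then scans at most N entries of the full index. This is a simplified model: real SSTables order keys by partitioner token rather than by raw key, and the helper names are hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

IndexEntry = Tuple[str, int]  # (key, offset of the row in the data file)


def build_summary(index: List[IndexEntry], interval: int) -> List[Tuple[str, int]]:
    """Sample every `interval`-th index entry, remembering its position in the full index."""
    return [(index[i][0], i) for i in range(0, len(index), interval)]


def lookup(index: List[IndexEntry], summary: List[Tuple[str, int]],
           interval: int, key: str) -> Optional[int]:
    """Return the data-file offset for `key`, or None if it is not in this SSTable."""
    sampled_keys = [k for k, _ in summary]
    pos = bisect.bisect_right(sampled_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first sampled key
    start = summary[pos][1]
    # In the real system this scan is the disk read of the index file.
    for k, data_offset in index[start:start + interval]:
        if k == key:
            return data_offset
    return None


# Full index: every key, sorted; hypothetical 100-byte rows.
index = [(f"key{i:04d}", i * 100) for i in range(1000)]
summary = build_summary(index, 128)
print(len(index), "index entries,", len(summary), "sampled in memory")
print(lookup(index, summary, 128, "key0500"))
```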

Networking

2013-04-24 Thread Kanwar Sangha
Hi - Is there a way we can separate the replication n/w and the interconnect n/w between the Cassandra nodes ? or does all data go over the same n/w interface ? What about a geo-link ? Can that be separated out ? Thanks, Kanwar

RE: Networking

2013-04-24 Thread Kanwar Sangha
: 192.168.1.1 Or perhaps this machine has a second NIC with ip 10.140.179.1 and so you split the traffic for the intra-cluster network traffic from the thrift traffic for better performance: rpc_address: 10.140.179.1 From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 24 April 2013 10:11 To: user

RE: Networking

2013-04-24 Thread Kanwar Sangha
I mean across 2 Data centres. -Original Message- From: Robert Coli [mailto:rc...@eventbrite.com] Sent: 24 April 2013 14:56 To: user@cassandra.apache.org Subject: Re: Networking On Wed, Apr 24, 2013 at 8:11 AM, Kanwar Sangha wrote: > What about a geo-link ? Can that be separated

local_quorum

2013-05-03 Thread Kanwar Sangha
Hi - I have 2 data centres (DC1 and DC2) and I have local_quorum set as the CL for reads. Say RF = 2 in each DC (so 2 copies in each DC). If both nodes which own the data in DC1 are down and I do a read with CL as "local_quorum", will I get an error back to the application? or will Ca
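The availability rule behind this question fits in a few lines. A quorum of RF replicas is floor(RF/2) + 1, and LOCAL_QUORUM counts only live replicas in the coordinator's own data centre; it does not fall back to the other DC. A minimal sketch (the function names are hypothetical):

```python
def quorum(rf: int) -> int:
    """Replicas needed for a quorum out of rf copies."""
    return rf // 2 + 1


def local_quorum_available(local_rf: int, live_local_replicas: int) -> bool:
    """True if a LOCAL_QUORUM request can be served from the local DC alone."""
    return live_local_replicas >= quorum(local_rf)


# RF = 2 in each DC, as in the question.
for live in (2, 1, 0):
    print(f"{live} live replicas in DC1 -> LOCAL_QUORUM possible: {local_quorum_available(2, live)}")
```

With 2 replicas per DC the local quorum is 2, so losing even one of the two DC1 replicas makes LOCAL_QUORUM requests coordinated in DC1 fail with an unavailable error rather than being served from DC2; an RF of 3 per DC tolerates one local node down.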

RE: local_quorum

2013-05-05 Thread Kanwar Sangha
Anyone ? From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 03 May 2013 08:59 To: user@cassandra.apache.org Subject: local_quorum Hi - I have 2 data centres (DC1 and DC2) and I have local_quorum set as the CL for reads. Say there is a RF factor = 2. (so 2 copies each in DC). If both nodes

HintedHandoff

2013-05-07 Thread Kanwar Sangha
Hi -I had a question on hinted-handoff. We have 2 DCs configured with overall RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total - 8 nodes across 2 DCs) Now we do a write with CL = ONE and Hinted Handoff enabled. *If node 'X ' in DC1 which is a 'replica' node is down and a write co

backup strategy

2013-05-07 Thread Kanwar Sangha
Hi - If we have RF=2 in a 4-node cluster, how do we ensure that the backup taken is only for 1 copy of the data? In other words, is it possible for us to take a backup from only 2 nodes, not all 4, and still have at least 1 copy of the data? Thanks, Kanwar
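Whether a subset of nodes holds at least one copy of everything depends on replica placement. A sketch under explicit assumptions: SimpleStrategy, no vnodes, one token per node with equal ranges, and range r replicated to node r plus the next RF - 1 nodes clockwise. Under those assumptions a brute-force search finds the smallest covering set; with vnodes the ranges are scattered and the result no longer holds. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def replicas_for_range(r: int, n_nodes: int, rf: int) -> Set[int]:
    """Range r lives on node r and the next rf - 1 nodes clockwise (no vnodes)."""
    return {(r + k) % n_nodes for k in range(rf)}


def ranges_held(node: int, n_nodes: int, rf: int) -> Set[int]:
    """All ranges for which `node` stores a replica."""
    return {r for r in range(n_nodes) if node in replicas_for_range(r, n_nodes, rf)}


def minimal_backup_set(n_nodes: int, rf: int) -> List[int]:
    """Smallest set of nodes whose data together covers every range at least once."""
    all_ranges = set(range(n_nodes))
    for size in range(1, n_nodes + 1):
        for nodes in combinations(range(n_nodes), size):
            covered = set().union(*(ranges_held(n, n_nodes, rf) for n in nodes))
            if covered == all_ranges:
                return list(nodes)
    return list(range(n_nodes))


print(minimal_backup_set(4, 2))
```

For 4 nodes and RF = 2 this picks two alternating nodes, 0 and 2, which matches the intuition that every other node on the ring holds a complete copy between them.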

Replica info

2013-05-08 Thread Kanwar Sangha
Is there a way in Cassandra that we can know which node has the replica for the data ? if we have 4 nodes and RF = 2, is there a way we can find which 2 nodes have the same data ? Thanks, Kanwar

RE: HintedHandoff

2013-05-08 Thread Kanwar Sangha
Is this correct guys ? From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 07 May 2013 14:07 To: user@cassandra.apache.org Subject: HintedHandoff Hi -I had a question on hinted-handoff. We have 2 DCs configured with overall RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total - 8 nodes

RE: Replica info

2013-05-09 Thread Kanwar Sangha
info http://www.datastax.com/docs/1.1/references/nodetool#nodetool-getendpoints This tells you where a key lives. (you need to hex encode the key) On Wed, May 8, 2013 at 5:14 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote: nodetool describering {keyspace} From: Kanwar Sangha
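What `nodetool getendpoints` reports can be approximated offline with a toy ring. The sketch hashes a key to a token with MD5 (similar in spirit to RandomPartitioner; the Murmur3Partitioner that is the default for new clusters in 1.2 uses a different hash and token range), finds the first node whose token is greater than or equal to it, and walks clockwise for RF nodes as SimpleStrategy does. Node names and tokens are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple

RING_SIZE = 2 ** 127


def token_for(key: bytes) -> int:
    """Illustrative MD5-based token; not byte-for-byte what Cassandra computes."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % RING_SIZE


def endpoints_for_token(token: int, ring: List[Tuple[int, str]], rf: int) -> List[str]:
    """ring is [(token, node)] sorted by token; a node owns (previous token, its token]."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


def endpoints_for_key(key: bytes, ring: List[Tuple[int, str]], rf: int) -> List[str]:
    return endpoints_for_token(token_for(key), ring, rf)


ring = [(0, "node1"), (RING_SIZE // 4, "node2"),
        (RING_SIZE // 2, "node3"), (3 * RING_SIZE // 4, "node4")]
print(endpoints_for_key(b"user42", ring, rf=2))
```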

slice query

2013-05-30 Thread Kanwar Sangha
Hi - We have a dynamic CF which has a key and multiple columns which get added dynamically. For example - Key_1 , Column1, Column2, Column3,... Key_2 , Column1, Column2, Column3,. Now I want to get all columns after Column3... how do we query that? The ColumnSliceIterator in hector al
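A common approach, given that a slice start is inclusive (as in Thrift's SliceRange), is to start the slice at Column3 and drop Column3 itself from the result. The sketch below models a wide row as a sorted list of column names to show that logic; it is not Hector's API, and the helper name is hypothetical. Names compare lexically here, so Column10 would sort before Column2 unless names are zero-padded or a numeric comparator is used.

```python
import bisect
from typing import List, Optional


def slice_after(sorted_columns: List[str], start: str, count: Optional[int] = None) -> List[str]:
    """Column names strictly after `start`: an inclusive slice from `start`, minus `start`."""
    i = bisect.bisect_right(sorted_columns, start)
    result = sorted_columns[i:]
    return result if count is None else result[:count]


row = ["Column1", "Column2", "Column3", "Column4", "Column5"]
print(slice_after(row, "Column3"))
print(slice_after(row, "Column3", count=1))
```

For paging, the last column name of one page becomes the `start` of the next request.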

block size

2013-06-20 Thread Kanwar Sangha
Hi - What is the block size for Cassandra ? is it taken from the OS defaults ?

RE: block size

2013-06-20 Thread Kanwar Sangha
Subject: Re: block size Have you seen this? http://www.datastax.com/dev/blog/cassandra-file-system-design Regards, Shahab On Thu, Jun 20, 2013 at 3:17 PM, Kanwar Sangha <kan...@mavenir.com> wrote: Hi - What is the block size for Cassandra ? is it taken from the OS defaults ?

RE: is there a key to sstable index file?

2013-07-17 Thread Kanwar Sangha
Yes. Multiple SSTables can have the same key, and only after compaction are the keys merged to reflect the latest value. From: S Ahmed [mailto:sahmed1...@gmail.com] Sent: 17 July 2013 15:54 To: cassandra-u...@incubator.apache.org Subject: is there a key to sstable index file? Since SSTables are mutable

MailBox Impl

2013-07-18 Thread Kanwar Sangha
Hi - We are planning on using Cassandra for an IMAP based implementation. There are some questions that we are stuck with - 1) Each user will have a pre-defined mailbox size (say 10 MB). We need to maintain a field to check if the mail-box size exceeds the predefined size. Will using the

CPU Bound Writes

2013-07-19 Thread Kanwar Sangha
"Insert-heavy workloads will actually be CPU-bound in Cassandra before being memory-bound" Can someone explain why the internals of why writes are CPU bound ?

RE: maximum storage per node

2013-07-25 Thread Kanwar Sangha
Issues with large data nodes would be - * Nodetool repair will be impossible to run * Your read i/o will suffer since you will almost always go to disk (each read will take 3 IOPS worst case) * Bootstrapping the node in case of failures will take days/weeks From: Pru

Cassandra Counter Family

2013-08-01 Thread Kanwar Sangha
Hi - We are struggling to understand how the counter family maintains consistency in Cassandra. Say Counter1 value is "1" and it is read by 2 clients at the same time who want to update the value. After both write, it will become "3" ?
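The difference between the two update models can be shown with a small simulation. If clients read the value and write back value + 1 (read-modify-write), concurrent updates are lost. Cassandra counter columns are instead updated by sending increments, which are added together rather than overwritten, so two concurrent +1 operations on a value of 1 are expected to give 3. This sketch models only that arithmetic, not how Cassandra replicates counter shards; the function names are hypothetical.

```python
from typing import List


def read_modify_write(initial: int, deltas: List[int]) -> int:
    """All clients read the same snapshot, then each writes snapshot + delta; last write wins."""
    snapshot = initial
    value = initial
    for delta in deltas:
        value = snapshot + delta
    return value


def increment_model(initial: int, deltas: List[int]) -> int:
    """Each client sends only its delta; the deltas are added together."""
    return initial + sum(deltas)


print(read_modify_write(1, [1, 1]))  # one update is lost
print(increment_model(1, [1, 1]))    # both increments are applied
```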

RE: Cassandra HANGS after some writes

2013-08-13 Thread Kanwar Sangha
Cassandra on windows ? Please install Linux ! From: Romain HARDOUIN [mailto:romain.hardo...@urssaf.fr] Sent: 13 August 2013 10:17 To: user@cassandra.apache.org Subject: Re: Cassandra HANGS after some writes Naresh, My two cents is that you should run Cassandra on a Linux VM. Issues are more eas

Secondary Index Question

2013-08-20 Thread Kanwar Sangha
Hi - I was reading some blogs on implementation of secondary indexes in Cassandra and they say that "the read requests are sent sequentially to all the nodes" ? So if I have a query to fetch ALL records with the secondary index filter, will the co-ordinator node send the requests to nodes one b

RE: Secondary Index Question

2013-08-21 Thread Kanwar Sangha
r own) Later, Dean From: Kanwar Sangha <kan...@mavenir.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Tuesday, August 20, 2013 6:57 PM To: "user@cassandra.apache.org"

nodetool tpstats

2013-09-18 Thread Kanwar Sangha
Hi - During a write heavy load, the tpstats show the following -
Message type      Dropped
RANGE_SLICE       0
READ_REPAIR       0
BINARY            0
READ              0
MUTATION          65570
_TRACE            0
REQUEST_RESPONSE

RE: [Cassandra] Initial Setup - VMs for Research

2013-09-25 Thread Kanwar Sangha
What help are you looking for? http://www.datastax.com/docs/datastax_enterprise3.1/install/install_deb_pkg -Original Message- From: shath...@e-z.net [mailto:shath...@e-z.net] Sent: 25 September 2013 15:27 To: user@cassandra.apache.org Subject: [Cassandra] Initial Setup - VMs for Research