Hi - Couple of questions -
1) What is the ratio of the SSTable file size to the bloom filter size? If I
have an SSTable of 1 GB, what is the approximate bloom filter size, assuming
the default value of 0.000744 is configured?
2) The bloom filters are stored in RAM, but not on the heap from 1.2 onwards?
3)
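For question 1, a rough way to reason about it: the filter size follows the number of row keys in the SSTable rather than its size on disk. The sketch below uses the textbook Bloom filter formula, not Cassandra's exact allocation logic, and the 1 KB average row size is purely an assumption for illustration.

```python
import math


def bloom_filter_bits_per_key(fp_chance: float) -> float:
    """Bits per key for an optimally sized Bloom filter: -ln(p) / (ln 2)^2."""
    return -math.log(fp_chance) / (math.log(2) ** 2)


def bloom_filter_size_bytes(num_keys: int, fp_chance: float) -> float:
    """Approximate filter size in bytes for num_keys row keys."""
    return num_keys * bloom_filter_bits_per_key(fp_chance) / 8


# Assumption: a 1 GB SSTable with ~1 KB per row holds about 1,048,576 keys.
assumed_keys = (1024 ** 3) // 1024
print(round(bloom_filter_bits_per_key(0.000744), 2), "bits per key")
print(round(bloom_filter_size_bytes(assumed_keys, 0.000744) / 1024 ** 2, 2), "MB")
```

With many small rows the filter grows in proportion; with a few very wide rows it stays tiny, whatever the file size.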
Hi - The index files created for the SSTables: do they contain a sampling or
the complete index? Cassandra on startup loads these files based on the
sampling rate in cassandra.yaml, right?
Hi - We are designing a Cassandra-based storage for the following use cases -
* Store SMS messages
* Store MMS messages
* Store chat history
What would be the ideal way to design the data model for this kind of
application? I am thinking along these lines...
Row-Key : Com
a...@tok-media.com>
Tel: +972 2 6409736
Mob: +972 54 8356490
Fax: +972 2 5612956
On Wed, Feb 6, 2013 at 8:23 AM, Vivek Mishra
<mishra.v...@gmail.com> wrote:
Avoid super columns. If you need sorted, wide rows, then go for composite
columns.
-Vivek
On Wed, Feb 6, 2013 at 7:09
oper
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 7/02/2013, at 1:47 AM, Kanwar Sangha
<kan...@mavenir.com> wrote:
1) Version is 1.2
2) DynamicComposites : I read somewhere that they are not recommended ?
3) Good point. I need to think about that one.
From: Tamar
Hi - I am trying to run a benchmark using the cassandra-stress tool. They have
given an example to insert data across 2 nodes -
/tools/stress/bin/stress -d 192.168.1.101,192.168.1.102 -n 1000
But when I run this across my 2 node cluster, I see the same keys in both
nodes. Replication is not en
Hi - I am doing a load test using YCSB across 2 nodes in a cluster and seeing a
lot of dropped mutation messages. I understand that this is due to the replica
not being written to the other node? RF = 2, CL = 1.
From the wiki -
For MUTATION messages this means that the mutation was not applied
dropped messages. But there are no
failures on the client. Does that mean the other node is not able to persist
the replicated data? Is there some timeout associated with replicated data
persistence?
Thanks,
Kanwar
From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 14 February 2013 09
nning in prod) RF3 and CL QUORUM is a more real world
scenario.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 15/02/2013, at 9:42 AM, Kanwar Sangha
<kan...@mavenir.com> wrote:
Hi - Is there a paramete
Hi - We have a requirement to store around 90 days of data per user. The last
7 days of data is going to be accessed frequently. Is there a way we can keep
the recent data (7 days) on SSD and the rest of the data on HDD? Do we take a
snapshot every 7 days and use a separate 'archive' cluster
to serve t
@cassandra.apache.org
Subject: Re: Cassandra backup
There is this:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-flexible-data-file-placement
But you'll need to design your data model around the fact that this is only as
granular as 1 column family
Best,
michael
From: Kanwar S
Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables.
There is no compaction job running in the background. Is there a limit on the
size per sstable ? Or will the sstable compaction continue and eventually we
will have 1 file ?
Thanks,
Kanwar
Hi - I am looking for some input on file storage in Cassandra. Each file
can range in size from 200 KB to 3 MB. I don't see any limitation on the column
size. But would it be a good idea to store these files as binary in the
columns?
Thanks,
Kanwar
Hi - Can someone explain the worst-case IOPS for a read? No key cache, no row
cache, sampling rate say 512.
1) Bloom filter will be checked to see existence of key (in RAM)
2) Index file sample (in RAM) will be checked to find approx. location in
index file on disk
3) 1 IOPS
Hi - What is the approximate overhead of the key cache ? Say each key is 50
bytes. What would be the overhead for this key in the key cache ?
Thanks,
Kanwar
OK. Cassandra's default block size is 256k? Now say my data in the column is
4 MB, and the disk is giving me 4k block size random reads @ 100 IOPS. I can
read max 400k in one seek? Does that mean I would need multiple seeks to get
the complete data?
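To make the arithmetic in this question concrete, here is a minimal worked sketch. It assumes the pessimistic case where every 4 KB block of the value costs one random read; that is an assumption for illustration, since data laid out contiguously on disk is normally read sequentially after the first seek.

```python
def worst_case_read_seconds(value_bytes: int, block_bytes: int, iops: int) -> float:
    """Time to read a value if each block needs its own random I/O."""
    blocks = -(-value_bytes // block_bytes)  # ceiling division
    return blocks / iops


value = 4 * 1024 * 1024   # 4 MB column value
block = 4 * 1024          # 4 KB random-read block size
print(block * 100 // 1024, "KB/s at 100 IOPS")       # throughput per second, not per seek
print(worst_case_read_seconds(value, block, 100), "seconds worst case")
```

So 4k * 100 IOPS is roughly 400 KB per second rather than 400k per seek, and the fully random worst case for 4 MB would be 1024 reads.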
-Original Message-
From: sc...@sc
Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 21/02/2013, at 3:47 AM, Kanwar Sangha
<kan...@mavenir.com> wrote:
Hi - I have around 6TB of data on 1 node and the cfstats show 32 sstables.
There is no compaction job running in the background. Is there a
“The limiting factors are the time it takes to repair, the time it takes to
replace a node, the memory considerations for 100's of millions of rows. If
the performance of those operations is acceptable to you, then go crazy”
If I have a node which is attached to a RAID and the node crashes
Hi - Is it a good idea to use Cassandra with a SAN? Say a SAN which provides me
8 petabytes of storage. Would I not be I/O bound irrespective of the number of
Cassandra machines, so that scaling by adding machines won't help?
Thanks
Kanwar
need to have a large expensive SAN.
Don't be tempted by the shiny expensive SAN. :)
If money is no object, instead throw SSDs in your nodes and run 10G between
racks
From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org<mailto:user@cassan
Hi - I am doing a performance run using a modified YCSB client and was able to
populate 8 TB on a node, and then ran some read workloads. I am seeing an average
TPS of 930 ops/sec for random reads. There is no key cache/row cache. Question -
Will the read TPS degrade if the data size increases to sa
on data size
but not sure what that is. I know the column limit on a row is in the
millions, somewhere lower than 10 million).
Later,
Dean
From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
mailto:use
t the
limit as I'm pretty sure it can't go above 10 million (from previous posts on
this list).
Dean
On 2/26/13 8:23 AM, "Kanwar Sangha" wrote:
>Thanks. For our case, the no of rows will more or less be the same. The
>only thing which changes is the columns and they keep getti
Hi - Quick question. When specifying the replication across 2 DCs, can we have
a replication factor of 1 across 2 data centres? Does the below mean that there
will be 2 copies of the data, 1 in DC1 and 1 in DC2?
[default@unknown] CREATE KEYSPACE test
WITH placement_strategy = 'NetworkTopolog
Hi - Can someone suggest the optimal way to store files / images? We are
planning to use Cassandra for the metadata of these files. HDFS is not good for
small file sizes. Can we look at something else?
Thanks,
Kanwar
n the cluster so you could check that out.
Out of curiosity, why is HDFS not good for a small file size? For reading, it
should be the bomb with RF=3 since you can read from multiple nodes and such.
Writes might be a little slower but still shouldn't be too bad.
Later,
Dean
From:
Hi - If I configure a RF across 2 Data centres as below and assuming 3 nodes
per Data centre.
DC1: 2, DC2:2
I do a write with consistency level - local_quorum which ensures that there is
no inter DC latency. Now say 2 nodes in DC1 crash and I am doing a read with CL
= One. Will it return fail
Reads also ?
From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 04 March 2013 14:54
To: user@cassandra.apache.org
Subject: Replication Question
Hi - If I configure a RF across 2 Data centres as below and assuming 3 nodes
per Data centre.
DC1: 2, DC2:2
I do a write with consistency level
Hi - Is there a way to increase the hinted handoff throughput ? I am seeing
around 8Mb/s (bits).
Thanks,
Kanwar
Got the param. thanks
From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 13:50
To: user@cassandra.apache.org
Subject: Hinted handoff
Hi - Is there a way to increase the hinted handoff throughput ? I am seeing
around 8Mb/s (bits).
Thanks,
Kanwar
After trying to bump up the "hinted_handoff_throttle_in_kb" to 1G/b per sec, it
still does not go above 25 Mb/s. Is there a limitation?
From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March 2013 14:41
To: user@cassandra.apache.org
Subject: RE: Hinted handoff
Got the par
Is this correct ?
I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM with IOPS of
80 per disk. Data is ~9.5 TB
So 4K * 80 * 9.5 = 3040 KB/s ~ 23.75 Mb/s.
So basically I am limited at the disk rather than the n/w
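As a sanity check on the figures above, here is a small sketch of the same calculation. The first call reproduces the numbers exactly as posted, where 9.5 is the data size in TB. The second call is only an assumption about what may have been intended: it multiplies by the 8 disks in the RAID 0 stripe instead, since aggregate random IOPS would normally scale with spindles rather than with data volume.

```python
def random_read_throughput_mbit(block_kb: float, iops_per_disk: float, factor: float) -> float:
    """Random-read throughput in Mb/s for block_kb-sized reads at the given IOPS."""
    kb_per_second = block_kb * iops_per_disk * factor
    return kb_per_second * 8 / 1024  # KB/s -> Kb/s -> Mb/s


# As posted: 4 KB * 80 IOPS * 9.5 (the data size in TB)
print(random_read_throughput_mbit(4, 80, 9.5))
# Assumed alternative: 4 KB * 80 IOPS * 8 disks
print(random_read_throughput_mbit(4, 80, 8))
```

Either way the result lands in the low tens of Mb/s, which is consistent with the ~25 Mb/s ceiling observed and with the disks being the bottleneck rather than the network.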
From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 06 March
ron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com
On 6/03/2013, at 1:22 PM, Kanwar Sangha
<kan...@mavenir.com> wrote:
Is this correct ?
I have Raid 0 setup for 16 TB across 8 disks. Each disk is 7.2kRPM with IOPS of
80 per disk. Data i
Hi Guys - I have a question on Vnodes and nodetool repair. If I have
configured the nodes as vnodes, say for example 2 nodes with RF=2.
Questions -
* There are some columns set with TTL as X. After X, Cassandra will mark
them as tombstones. Is there still a probability of running into
Hi -
Can someone explain the meaning for the levelled compaction in cfstats -
SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]
SSTables in each level: [61/4, 9, 92, 945, 8146, 0, 0, 0]
SSTables in each level: [34/4, 1000/10, 100, 953, 8184, 0, 0, 0]
Thanks,
Kanwar
]
So you have 40 SSTables in L0, 442 in L1, 97 in L2 and so forth.
'40/4' and '442/10' have numbers after the slash; those are the expected
maximum number of SSTables in that level, and they are only displayed when you
have more than that threshold.
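The reading described above can be sketched as a small parser. The function name and the tuple layout are made up for illustration; only the line format comes from the cfstats output quoted in this thread.

```python
def parse_levels(line: str):
    """Parse 'SSTables in each level: [40/4, 442/10, 97, ...]' into
    (count, expected_max) tuples; expected_max is None when not shown."""
    body = line.split("[", 1)[1].rstrip().rstrip("]")
    levels = []
    for entry in body.split(","):
        entry = entry.strip()
        if not entry:
            continue
        count, _, limit = entry.partition("/")
        levels.append((int(count), int(limit) if limit else None))
    return levels


levels = parse_levels("SSTables in each level: [40/4, 442/10, 97, 967, 7691, 0, 0, 0]")
for level, (count, limit) in enumerate(levels):
    note = f" (over the expected maximum of {limit})" if limit is not None else ""
    print(f"L{level}: {count} SSTables{note}")
```

Levels that carry a slash are the ones where leveled compaction is behind.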
On Friday, March 8, 2013 at 3:24 PM, Kanw
Hi - Can someone help explain this parameter?
chunk_length_kb
If we increase it from the default 64k to 128k, does it mean that the SSTable
will be compressed in blocks of 128k? Does that mean that if we are reading and
writing data of 128k, it will give better read/write performance?
Thanks,
Kanw
Are your keys spread across all SSTables? That will cause every SSTable to be
read, which will increase the I/O.
What compaction are you using ?
From: zod...@fifth-aeon.net [mailto:zod...@fifth-aeon.net] On Behalf Of Jon
Scarborough
Sent: 21 March 2013 23:00
To: user@cassandra.apache.org
Subject: Hig
o not an extreme write load on 6
nodes as well though these posts causes read to check authorization and such of
our system.
Dean
From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org<mailto:user@cassandra.apache.org>"
mailto:user@cassandra.
e never ran a major compaction but after we switched
to LCS, we went from 300G to some 120G or something like that which was nice.
We only have 300 data point posts / second so not an extreme write load on 6
nodes as well though these posts causes read to check authorization and such of
Can someone explain how to read the cfhistograms o/p ?
[root@db4 ~]# nodetool cfhistograms usertable data
usertable/data histograms
Offset SSTables Write Latency Read Latency Row Size Column Count
12857444 4051 0
Hi - Quick question. Do hints contain the actual data or the data is read from
the SStables and then sent to the other node when it comes up ?
Thanks,
Kanwar
Hi - I have a query on reads with Cassandra. We are planning to have a dynamic
column family where each column is based on a time series.
Inserting data - key => 'xxx', {column_name => TimeUUID(now), :column_value
=> 'value' }, {column_name => TimeUUID(now), :column_value => 'value' },..
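A minimal in-memory sketch of that layout, assuming Python's standard `uuid.uuid1()` as a stand-in for TimeUUID(now). The `insert` and `newest` helpers are hypothetical names, not part of any Cassandra client. Columns are ordered by the embedded timestamp, which is how time-based UUID column names are meant to sort.

```python
import uuid


def insert(row: dict, value: str) -> uuid.UUID:
    """Add one column whose name is a version 1 (time-based) UUID."""
    name = uuid.uuid1()
    row[name] = value
    return name


def newest(row: dict, limit: int):
    """Return up to `limit` columns, most recent first, by UUID timestamp."""
    ordered = sorted(row.items(), key=lambda item: item[0].time, reverse=True)
    return ordered[:limit]


row = {}  # one wide row, keyed by time-based column name
for reading in ("value-a", "value-b", "value-c"):
    insert(row, reading)
for name, value in newest(row, 2):
    print(name, value)
```

Reading the latest N entries then becomes a reversed slice over one row instead of a scan.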
Hi - We are planning to develop a custom client using the Thrift API for
Cassandra. Are the following available from JMX?
- Can Cassandra provide info about node status?
- DC failover detection (data center down vs. some nodes down)
- How to get load info from each node?
Thanks,
Kanwar
Use the community edition and try it out. Compaction has nothing to do with the
CPU. It's all on raw disk speed. What kind of disks do you have? 7.2k, 10k,
15k RPM?
Are your keys unique, or are you doing updates? If unique writes, I would not
worry about compaction too much and let it run fas
Guys - Quick question. The index filter file created for a sstable contains all
keys/index offset for a sstable ? I know that when we bring up the node, it
reads a sample of the keys from this file. So this file contains all keys and a
sample is read on startup ?
Thanks,
Kanwar
Let me rephrase. I am talking about the index file on disk created per sstable.
Does that contain all key indexes?
Robert Coli wrote:
On Fri, Apr 19, 2013 at 10:38 AM, Kanwar Sangha wrote:
> Guys – Quick question. The index filter file created for a sstable conta
Hi - Is there a way we can separate the replication n/w and the interconnect
n/w between the Cassandra nodes ? or does all data go over the same n/w
interface ?
What about a geo-link ? Can that be separated out ?
Thanks,
Kanwar
: 192.168.1.1
Or perhaps this machine has a second NIC with ip 10.140.179.1 and so you split
the traffic for the intra-cluster network traffic from the thrift traffic for
better performance:
rpc_address: 10.140.179.1
From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 24 April 2013 10:11
To: user
I mean across 2 Data centres.
-Original Message-
From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: 24 April 2013 14:56
To: user@cassandra.apache.org
Subject: Re: Networking
On Wed, Apr 24, 2013 at 8:11 AM, Kanwar Sangha wrote:
> What about a geo-link ? Can that be separated
Hi - I have 2 data centres (DC1 and DC2) and I have local_quorum set as the CL
for reads. Say RF = 2 (so 2 copies in each DC).
If both nodes which own the data in DC1 are down and I do a read with CL as
"local_quorum" , will I get an error back to the application ? or will
Ca
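For reference, the quorum arithmetic behind this question can be sketched as below. It assumes the usual definition of quorum as floor(RF / 2) + 1 and that a local_quorum read only counts replicas in the coordinator's own data centre; the function names are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def local_quorum_read_possible(local_rf: int, live_local_replicas: int) -> bool:
    """True if enough replicas in the local DC are up to satisfy local_quorum."""
    return live_local_replicas >= quorum(local_rf)


print(quorum(2))                          # replicas needed with RF = 2 per DC
print(local_quorum_read_possible(2, 0))   # both DC1 replicas down
print(local_quorum_read_possible(2, 2))   # both DC1 replicas up
```

Note that with RF = 2 a quorum is 2, so even a single replica down in the local DC is enough to make local_quorum unavailable there.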
Anyone ?
From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 03 May 2013 08:59
To: user@cassandra.apache.org
Subject: local_quorum
Hi - I have 2 data centres (DC1 and DC2) and I have local_quorum set as the CL
for reads. Say there is a RF factor = 2. (so 2 copies each in DC).
If both nodes
Hi - I had a question on hinted handoff. We have 2 DCs configured with overall
RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total - 8 nodes across 2 DCs).
Now we do a write with CL = ONE and hinted handoff enabled.
* If node 'X' in DC1, which is a 'replica' node, is down and a write
co
Hi - If we have RF=2 in a 4-node cluster, how do we ensure that the backup
taken is only for 1 copy of the data? In other words, is it possible for us to
take a backup from only 2 nodes and not all 4, and still have at least 1 copy
of the data?
Thanks,
Kanwar
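One way to think about this, as a rough sketch only: it assumes SimpleStrategy-style placement where each range is replicated on its primary node and the next RF - 1 nodes around the ring, one token per node (no vnodes), and replicas that are already consistent with each other. Node numbers and function names are hypothetical.

```python
def replicas_for_range(primary: int, num_nodes: int, rf: int):
    """Nodes holding the range whose primary owner is `primary`."""
    return [(primary + step) % num_nodes for step in range(rf)]


def covers_all_ranges(backup_nodes, num_nodes: int, rf: int) -> bool:
    """True if every range has at least one replica among backup_nodes."""
    chosen = set(backup_nodes)
    return all(
        chosen & set(replicas_for_range(primary, num_nodes, rf))
        for primary in range(num_nodes)
    )


print(covers_all_ranges([0, 2], num_nodes=4, rf=2))  # alternate nodes on the ring
print(covers_all_ranges([0, 1], num_nodes=4, rf=2))  # adjacent nodes on the ring
```

Under these assumptions, backing up every other node on the ring captures one copy of each range, while two adjacent nodes miss some ranges.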
Is there a way in Cassandra that we can know which node has the replica for the
data ? if we have 4 nodes and RF = 2, is there a way we can find which 2 nodes
have the same data ?
Thanks,
Kanwar
Is this correct guys ?
From: Kanwar Sangha [mailto:kan...@mavenir.com]
Sent: 07 May 2013 14:07
To: user@cassandra.apache.org
Subject: HintedHandoff
Hi -I had a question on hinted-handoff. We have 2 DCs configured with overall
RF = 2 (DC1:1, DC2:1) and 4 nodes in each DC (total - 8 nodes
info
http://www.datastax.com/docs/1.1/references/nodetool#nodetool-getendpoints
This tells you where a key lives. (you need to hex encode the key)
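If the key does need to be hex encoded as mentioned, a minimal sketch of that step follows. It assumes the row key is stored as UTF-8 text; the key `user42` is a made-up example.

```python
def hex_encode_key(key: str) -> str:
    """Hex-encode a row key, assuming it is stored as UTF-8 bytes."""
    return key.encode("utf-8").hex()


# The result is what would be passed as the key argument to getendpoints.
print(hex_encode_key("user42"))
```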
On Wed, May 8, 2013 at 5:14 PM, Hiller, Dean
<dean.hil...@nrel.gov> wrote:
nodetool describering {keyspace}
From: Kanwar Sangha
mail
Hi - We have a dynamic CF which has a key and multiple columns which get added
dynamically. For example -
Key_1 , Column1, Column2, Column3,...
Key_2 , Column1, Column2, Column3,...
Now I want to get all columns after Column3. How do we query that? The
ColumnSliceIterator in hector al
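Conceptually this is a column slice with a start column and an open-ended finish. The sketch below models it in memory with `bisect` over sorted column names; it is not the Hector API, and `slice_after` is a hypothetical helper.

```python
import bisect


def slice_after(sorted_names, start, inclusive=False):
    """Return the column names that sort after `start` (optionally including it)."""
    if inclusive:
        index = bisect.bisect_left(sorted_names, start)
    else:
        index = bisect.bisect_right(sorted_names, start)
    return sorted_names[index:]


columns = ["Column1", "Column2", "Column3", "Column4", "Column5"]
print(slice_after(columns, "Column3"))
print(slice_after(columns, "Column3", inclusive=True))
```

Because columns within a row are kept sorted by the comparator, a slice like this is a contiguous read rather than a filter over the whole row.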
Hi - What is the block size for Cassandra? Is it taken from the OS defaults?
Subject: Re: block size
Have you seen this?
http://www.datastax.com/dev/blog/cassandra-file-system-design
Regards,
Shahab
On Thu, Jun 20, 2013 at 3:17 PM, Kanwar Sangha
mailto:kan...@mavenir.com>> wrote:
Hi - What is the block size for Cassandra ? is it taken from the OS defaults ?
Yes. Multiple SSTables can have the same key, and only after compaction are the
keys merged to reflect the latest value.
From: S Ahmed [mailto:sahmed1...@gmail.com]
Sent: 17 July 2013 15:54
To: cassandra-u...@incubator.apache.org
Subject: is there a key to sstable index file?
Since SSTables are mutable
Hi - We are planning on using Cassandra for an IMAP based implementation.
There are some questions that we are stuck with -
1) Each user will have a pre-defined mailbox size (say 10 MB). We need to
maintain a field to check if the mail-box size exceeds the predefined size.
Will using the
"Insert-heavy workloads will actually be CPU-bound in Cassandra before being
memory-bound"
Can someone explain the internals of why writes are CPU bound?
Issues with large data nodes would be -
* Nodetool repair will be impossible to run
* Your read I/O will suffer since you will almost always go to disk
(each read will take 3 IOPS worst case)
* Bootstrapping the node in case of failures will take days/weeks
From: Pru
Hi - We are struggling to understand how the counter family maintains
consistency in Cassandra.
Say Counter1 value is "1" and it is read by 2 clients at the same time who want
to update the value. After both write, it will become "3" ?
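A toy model of the difference, ignoring replication entirely: if each client sends an increment (a delta), both are applied and the result is 3; if each client instead read the value and wrote back an absolute number, one update would be lost. The class below is illustrative only and is not how Cassandra stores counters internally.

```python
class DeltaCounter:
    """Counter that only accepts increments, never absolute values."""

    def __init__(self, value: int = 0):
        self.value = value

    def increment(self, delta: int = 1) -> None:
        self.value += delta


# Two clients each send "+1" against a counter currently at 1.
counter = DeltaCounter(1)
counter.increment(1)  # client A
counter.increment(1)  # client B
print(counter.value)

# Read-modify-write with plain values: both clients read 1, both write 2.
seen_by_a = seen_by_b = 1
plain_value = seen_by_a + 1
plain_value = seen_by_b + 1
print(plain_value)
```

Since increments commute, the order in which the two deltas arrive does not change the final value.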
Cassandra on Windows? Please install Linux!
From: Romain HARDOUIN [mailto:romain.hardo...@urssaf.fr]
Sent: 13 August 2013 10:17
To: user@cassandra.apache.org
Subject: Re: Cassandra HANGS after some writes
Naresh,
My two cents is that you should run Cassandra on a Linux VM.
Issues are more eas
Hi - I was reading some blogs on implementation of secondary indexes in
Cassandra and they say that "the read requests are sent sequentially to all the
nodes" ?
So if I have a query to fetch ALL records with the secondary index filter, will
the co-ordinator node send the requests to nodes one b
r own)
Later,
Dean
From: Kanwar Sangha <kan...@mavenir.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Tuesday, August 20, 2013 6:57 PM
To: "user@cassandra.apache.org<mailto
Hi - During a write heavy load, the tpstats show the following -
Message type Dropped
RANGE_SLICE 0
READ_REPAIR 0
BINARY 0
READ 0
MUTATION 65570
_TRACE 0
REQUEST_RESPONSE
What help are you looking for?
http://www.datastax.com/docs/datastax_enterprise3.1/install/install_deb_pkg
-Original Message-
From: shath...@e-z.net [mailto:shath...@e-z.net]
Sent: 25 September 2013 15:27
To: user@cassandra.apache.org
Subject: [Cassandra] Initial Setup - VMs for Research