Hi Aaron,
Thank you for your reply. We tried increasing the PHI threshold but still hit
the same issue. We used Ec2Snitch and PropertyFileSnitch instead, and they work
without this problem. It seems to happen only with the Ec2MultiRegionSnitch
config, although we can work around it by using PropertyFileSnitch.
> > nodetool -h localhost flush didn't do much good.
Do you have 100's of millions of rows?
If so, see the recent discussions about reducing bloom_filter_fp_chance and
index_sampling.
If this is an old schema you may be using the very old setting of 0.000744,
which creates very large bloom filters.
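For reference, on CQL3 the false-positive chance can be raised per table; a sketch with an illustrative keyspace/table name (the new value only applies to sstables written or rebuilt afterwards, e.g. via nodetool upgradesstables):

```sql
-- Illustrative names, not from the thread. Raises the false-positive
-- chance from the very old 0.000744 setting to something that uses
-- far less memory per key.
ALTER TABLE my_keyspace.my_table
  WITH bloom_filter_fp_chance = 0.01;
```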
> > Question 2: is this a sane strategy?
>
> On its face my answer is "not... really"?
I'd go with a solid no.
Just because the three independent clusters have a schema that looks the
same does not make them the same. The schema is a versioned document; you will
not be able to merge them.
> On some of my nodes, I'm getting the following exception when cassandra starts
How many nodes?
Is this a new node, or an old one where the problem just started?
What version are you on?
Do you have this error from system.log? It includes the thread name, which is
handy for debugging.
> I am trying to see whether there will be any performance difference between
> Cassandra 1.0.8 vs Cassandra 1.2.2 for reading the data mainly?
1.0 has key and row caches defined per CF; 1.1 has global ones, which are better
utilised and easier to manage.
1.2 moves bloom filters and compression metadata off-heap.
> > 2) Second (in which I am more interested in) is for performance
> > (stress/load) testing.
Sometimes you can get cassandra-stress (shipped in the bin distro) to
approximate the expected workload. It's then pretty easy to benchmark and
test your configuration changes.
Cheers
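A hypothetical invocation of the old stress tool; hostnames and flags below are illustrative from memory of the 1.x tool, so check `cassandra-stress -h` for your version before relying on them:

```shell
# Write a million keys against two nodes with 50 client threads.
cassandra-stress -d 10.0.0.1,10.0.0.2 -o insert -n 1000000 -t 50

# Then read the same keys back to benchmark the read path.
cassandra-stress -d 10.0.0.1,10.0.0.2 -o read -n 1000000 -t 50
```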
You should split the large blobs into multiple rows; I would use 10MB per
row as a good rule of thumb.
See http://www.datastax.com/dev/blog/cassandra-file-system-design for a
description of a blob store in Cassandra.
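A minimal sketch of that chunking scheme, assuming the blob is available as bytes; the key format and function names are illustrative, not from any client library:

```python
# Sketch: split a large blob into ~10MB chunks, one chunk per row.
# Row keys like "<blob_id>:<chunk_index>" are a common convention;
# the names here are made up for illustration.

CHUNK_SIZE = 10 * 1024 * 1024  # 10MB per row, per the rule of thumb

def split_blob(blob_id, data, chunk_size=CHUNK_SIZE):
    """Yield (row_key, chunk) pairs for a large blob."""
    for i in range(0, len(data), chunk_size):
        yield ("%s:%d" % (blob_id, i // chunk_size), data[i:i + chunk_size])

def join_chunks(chunks):
    """Reassemble chunks (already in index order) into the original blob."""
    return b"".join(chunks)
```

Reading the blob back is then a multi-get of `blob_id:0 … blob_id:n` followed by concatenation.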
Cheers
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand
@aaronmorton
http://www.thelastpickle.com
> Do you think it's worth posting an issue, or not enough traceable evidence ?
If you can reproduce it then certainly file a bug.
Cheers
On 20/06/2013, at 9:41 PM, Franc Carter
That looks like it may be a bug, can you raise a ticket at
https://issues.apache.org/jira/browse/CASSANDRA
Cheers
On 21/06/2013, at 1:56 AM, hiroshi.kise...@hitachi.com wrote:
>
> If I have data in a column of size 500KB,
>
There is also some information here:
http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/
The data files are memory mapped, so it's somewhat OS dependent.
On Fri, Jun 21, 2013 at 6:16 PM, aaron morton wrote:
> Do you think it's worth posting an issue, or not enough traceable evidence
> ?
>
> If you can reproduce it then certainly file a bug.
>
I'll keep an eye on it to see if it happens again and whether there is a pattern.
Cheers
Ok. Thank you, everyone.
Best regards,
*Rodrigo Felix de Almeida*
LSBD - Universidade Federal do Ceará
Project Manager
MBA, CSM, CSPO, SCJP
On Wed, Jun 19, 2013 at 2:26 PM, Robert Coli wrote:
> On Wed, Jun 19, 2013 at 5:47 AM, Michal Michalski
> wrote:
> > You can also perform a major compaction v
Is there a way to replace a failed server using vnodes? I only had
occasion to do this once, on a relatively small cluster. At the time I
just needed to get the new server online and wasn't concerned about the
performance implications, so I just removed the failed server from the
cluster and boot
It's my understanding that if the first part of the primary key has low
cardinality, you will struggle with cluster balance: unless you use WITH
COMPACT STORAGE, the first entry of the primary key equates to the row key
from the traditional interface, thus all entries related to a
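To illustrate the point, a hypothetical CQL3 table (all names are made up):

```sql
-- 'sensor_id', the first part of the primary key, is the partition
-- key, i.e. the old-style row key. If sensor_id has only a handful
-- of distinct values, all writes land on a handful of partitions no
-- matter how many logical rows there appear to be.
CREATE TABLE readings (
    sensor_id text,
    reading_time timestamp,
    value double,
    PRIMARY KEY (sensor_id, reading_time)
);
```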
On Fri, Jun 21, 2013 at 2:53 AM, aaron morton wrote:
> > nodetool -h localhost flush didn't do much good.
>
> Do you have 100's of millions of rows ?
> If so see recent discussions about reducing the bloom_filter_fp_chance and
> index_sampling.
>
Yes, I have 100's of millions of rows.
>
> If thi
Yes. The problem is that I can't use "counter" as the partition key; otherwise
I'd wind up with hot spots in my cluster, where the majority of the data is
written to a single node. The only real way around this problem with Cassandra
is to follow along with what this blog does:
h
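The usual approach such posts describe is sharding the hot partition key with a bucket suffix; a hedged Python sketch with made-up names, not from any specific client library:

```python
# Sketch: spread writes for one logical key ("counter") across N
# buckets so no single node owns all of its data. A deterministic
# hash (md5 here) is used so every client derives the same bucket.
import hashlib

NUM_BUCKETS = 16

def sharded_key(logical_key, item_id, num_buckets=NUM_BUCKETS):
    """Derive a partition key like 'counter:7' from a stable item id."""
    digest = hashlib.md5(item_id.encode("utf-8")).digest()
    return "%s:%d" % (logical_key, digest[0] % num_buckets)

def all_shards(logical_key, num_buckets=NUM_BUCKETS):
    """Keys to read back and merge when querying the logical key."""
    return ["%s:%d" % (logical_key, b) for b in range(num_buckets)]
```

The trade-off is that reads for the logical key now fan out across all buckets and merge the results.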
NREL has released their open source databus. They spin it as energy data (and
a system for campus energy/building energy) but it is very general right now
and probably will stay pretty general. More information can be found here
http://www.nrel.gov/analysis/databus/
The source code can be fou
We have a 3-node cassandra cluster on AWS. These nodes are running cassandra
1.2.2 and have 8GB memory. We didn't change any of the default heap or GC
settings. So each node is allocating 1.8GB of heap space. The rows are wide;
each row stores around 260,000 columns. We are reading the data usin
Hello Mohammed,
You should increase the heap space. You should also tune garbage collection
so that young-generation objects are collected faster, relieving pressure on
the heap. We have been using JDK 7 with the G1 collector; it does a better job
than my attempts to optimise the JDK 6 G
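For 1.2, the heap is usually raised by overriding these in conf/cassandra-env.sh; the values below are illustrative for an 8GB box, not a recommendation:

```shell
# conf/cassandra-env.sh -- illustrative values for an 8GB machine.
# By default cassandra-env.sh computes the heap from system memory;
# setting both overrides disables that calculation (they must be set
# together).
MAX_HEAP_SIZE="4G"
HEAP_NEWSIZE="400M"
```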
Hi All,
I am using the JDBC driver and noticed that if I run the same query twice, the
second time it is much faster.
I set up the row cache and column family cache and it did not seem to make a
difference.
I am wondering how to set up Cassandra so that the first query is always as
fast as the second on
Hello Tony,
I would guess that the first query's data is put into the row cache and
the filesystem cache. The second query gets the data from the row cache
and/or the filesystem cache, so it's faster.
If you want to make it consistently faster, having a key cache will
definitely help. The foll
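In the 1.1/1.2 era the key cache is controlled per table via the caching property; a sketch with an illustrative table name (keys_only is, as far as I remember, already the default, so this mainly matters if it was turned off):

```sql
-- 1.2-era values: 'all', 'keys_only', 'rows_only', 'none'.
ALTER TABLE my_keyspace.my_table WITH caching = 'keys_only';
```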
Please note that I am currently using version 1.2.2 of Cassandra. Also we are
using virtual nodes.
My question mainly stems from the fact that the nodes appear to be aware that
the node UUID changed for the IP (from reading the logs), so I am just
wondering whether this means the hinted handoffs ar
Hi,
I am experimenting with Cassandra 1.2.4 and got a crash while running
repair. The nodes have 24GB of RAM with an 8GB heap. Any ideas on what I may
have missed in the config? The log is below.
ERROR [Thread-136019] 2013-06-22 06:30:05,861 CassandraDaemon.java (line
174) Exception in thread Thread[Th
bloom_filter_fp_chance = 0.7 is way too large to be effective; you'll probably
have issues compacting deleted rows and get poor read performance with a value
that high. I'd guess that anything larger than 0.1 might as well be 1.0.
-Bryan
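That guess lines up with the standard bloom filter sizing formula, bits per key = -ln(p) / (ln 2)²; a quick sketch:

```python
import math

def bloom_bits_per_key(fp_chance):
    """Optimal bloom filter size in bits per key for a target
    false-positive chance, from the standard formula
    m/n = -ln(p) / (ln 2)^2."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

# fp_chance = 0.7 buys well under one bit per key, i.e. almost
# nothing, while the very old 0.000744 setting costs ~15 bits per key.
```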
On Fri, Jun 21, 2013 at 5:58 AM, srmore wro
We're potentially considering increasing the size of our sstables for some
column families from 10MB to something larger.
In test, we've been trying to verify that the sstable file sizes change, and
then doing a bit of benchmarking. However, when we alter the column family
and then run "nodetool
On Fri, Jun 21, 2013 at 4:40 PM, Andrew Bialecki
wrote:
> However when we run alter the column
> family and then run "nodetool upgradesstables -a keyspace columnfamily," the
> files in the data directory have been re-written, but the file sizes are the
> same.
>
> Is this the expected behavior? If
I think the new SSTables will be written at the new size. To get there, you
need to trigger a compaction so that new SSTables are generated; for LCS there
is no major compaction, though. You can run a nodetool repair, and hopefully
that will bring in some new SSTables and compactions will kick
I think you can remove the json file which stores the mapping of which
sstable is in which level. Cassandra will then treat all sstables as being
in level 0, which will trigger a compaction. But if you have a lot of data,
it will be very slow, as you will keep compacting data between L1 and
L0.
This
I will take a heap dump and see what's in there rather than guessing.
On Fri, Jun 21, 2013 at 4:12 PM, Bryan Talbot wrote:
> bloom_filter_fp_chance = 0.7 is probably way too large to be effective and
> you'll probably have issues compacting deleted rows and get poor read
> performance with a valu
Looks like a memory map failed. On a 64-bit system you should have effectively
unlimited virtual memory, but Linux has a limit on the number of maps a process
can hold. Look at these two places:
http://stackoverflow.com/questions/8892143/error-when-opening-a-lucene-index-map-failed
https://blog.kumina.nl/2011/04/cassandra-java-i
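If it is the map count limit from those links, the usual fix on Linux is raising vm.max_map_count (the value below is just a commonly used one, not a tuned recommendation):

```shell
# Check the current limit on memory-mapped regions per process.
sysctl vm.max_map_count

# Raise it for the running system.
sudo sysctl -w vm.max_map_count=1048575

# Persist the change across reboots.
echo "vm.max_map_count = 1048575" | sudo tee -a /etc/sysctl.conf
```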
Looks like you are putting a lot of pressure on the heap by doing a slice
query on a large row.
Do you have a lot of deletes/tombstones on the rows? That might be causing the
problem.
Also, why are you returning so many columns at once? You can use the
auto-paginate feature in Astyanax.
Also, do you see a lot of
Thanks Jabbar,
I ran nodetool as suggested and it reported 0 latency for the row count I have.
I also ran the cli list command for the table hit by my JDBC preparedStatement,
and it was slow: about 121ms the first time I ran it and 40ms the second time,
versus the JDBC call of 38ms to start w
Hi Jabbar,
I think I know what is going on. I happened across a change mentioned by the
jdbc driver developers regarding metadata caching. It seems the metadata caching
was moved from the connection object to the preparedStatement object, so I am
wondering if the time difference I am seeing on t