Re: Cassandra 2.0.7 keeps reporting errors due to no space left on device

2014-05-14 Thread DuyHai Doan
Thanks for the report back.

If LCS falls back to SizeTiered it means that you have a workload with
intensive write bursts. Addressing those bursts directly might be better
than hard-tweaking the LCS code.
On 12 May 2014 at 17:19, "Yatong Zhang"  wrote:

Well, I finally resolved this issue by modifying Cassandra to ignore
SSTables larger than a threshold.

Leveled compaction falls back to size-tiered compaction in some situations,
and that's why I kept seeing some old, huge SSTables being compacted.
More details can be found in 'LeveledManifest.java', in the
'getCompactionCandidates' function. I modified the 'mostInterestingBucket'
method of 'SizeTieredCompactionStrategy.java' and added a filter before the
method returns:

// Filter out any compaction candidate larger than 1 TiB (1099511627776 bytes)
Iterator<SSTableReader> iter = hottest.left.iterator();
while (iter.hasNext()) {
    SSTableReader mysstable = iter.next();
    if (mysstable.onDiskLength() > 1099511627776L) {
        logger.info("Removed candidate {}", mysstable.toString());
        iter.remove();
    }
}

I haven't had time to research whether this has side effects, but it is a
working solution for me. I hope this is useful to those who have run into
similar issues.


On Sun, May 4, 2014 at 5:10 PM, Yatong Zhang  wrote:

> I am using the latest 2.0.7. The output of 'nodetool tpstats' is:
>
> [root@storage5 bin]# ./nodetool tpstats
>> Pool Name                  Active   Pending   Completed   Blocked   All time blocked
>> ReadStage                       0         0      628220         0                  0
>> RequestResponseStage            0         0     3342234         0                  0
>> MutationStage                   0         0     3172116         0                  0
>> ReadRepairStage                 0         0       47666         0                  0
>> ReplicateOnWriteStage           0         0           0         0                  0
>> GossipStage                     0         0      756024         0                  0
>> AntiEntropyStage                0         0           0         0                  0
>> MigrationStage                  0         0           0         0                  0
>> MemoryMeter                     0         0        6652         0                  0
>> MemtablePostFlusher             0         0        7042         0                  0
>> FlushWriter                     0         0        4023         0                  0
>> MiscStage                       0         0           0         0                  0
>> PendingRangeCalculator          0         0          27         0                  0
>> commitlog_archiver              0         0           0         0                  0
>> InternalResponseStage           0         0           0         0                  0
>> HintedHandoff                   0         0          28         0                  0
>>
>> Message type           Dropped
>> RANGE_SLICE                  0
>> READ_REPAIR                  0
>> PAGED_RANGE                  0
>> BINARY                       0
>> READ                         0
>> MUTATION                     0
>> _TRACE                       0
>> REQUEST_RESPONSE             0
>> COUNTER_MUTATION             0
>
>  And here is another type of error, and these errors seem to occur after
> 'disk is full'
>
>> ERROR [SSTableBatchOpen:2] 2014-04-30 13:47:48,348 CassandraDaemon.java (line 198) Exception in thread Thread[SSTableBatchOpen:2,5,main]
>> org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
>>     at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:110)
>>     at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:64)
>>     at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
>>     at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:458)
>>     at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:422)
>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:203)
>>     at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:184)
>>     at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:264)
>>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:744)
>> Caused by: java.io.EOFException
>>     at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
>>     at java.io.DataInputStream.readUTF(DataInputStream.java:589)

Re: Bootstrap failure on C* 1.2.13

2014-05-14 Thread Paulo Ricardo Motta Gomes
Hello,

After about 3 months I was able to solve this issue, which happened again
after another node died.

The problem is that the DataStax 1.2 node replacement docs [1] said that "This
procedure applies to clusters using vnodes. If not using vnodes, use the
instructions in the Cassandra 1.1 documentation".

However, the 1.1 docs did not mention the property
"-Dcassandra.replace_address=address_of_dead_node", which was only
introduced in 1.2. So, what happens without this flag is that the
replacement node tries to stream data from the dead node, failing the
bootstrap process. Adding this flag solves the problem.
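For reference, a minimal sketch of passing the flag on the replacement node via
cassandra-env.sh (the IP is a placeholder for the dead node's address):

    # cassandra-env.sh on the replacement node
    JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"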

Big thanks to driftx from #cassandra who helped troubleshoot the issue. The
docs have already been updated to mention the property even for non-vnode
clusters.

[1]
http://www.datastax.com/documentation/cassandra/1.2/cassandra/operations/ops_replace_node_t.html

Cheers,

On Sat, Feb 15, 2014 at 3:31 PM, Alain RODRIGUEZ  wrote:

> Hi Rob,
>
> I don't understand how setting those "initial_token" values might solve this
> issue, even more so since we cannot set them before bootstrapping...
>
> Plus, once those tokens are set, we would have to modify them after any new
> bootstrap / decommission, which would also imply running a rolling restart
> for the new configuration (cassandra.yaml) to be taken into account. This
> is quite a heavy process to perform a "NOOP"...
>
> What did I miss?
>
> Thanks for getting involved and trying to help anyway :).
>
> Alain
>
>
> 2014-02-15 1:13 GMT+01:00 Robert Coli :
>
> On Fri, Feb 14, 2014 at 10:08 AM, Paulo Ricardo Motta Gomes <
>> paulo.mo...@chaordicsystems.com> wrote:
>>
>>> But in our case, our cluster was not using VNodes, so this workaround
>>> will probably not work with VNodes, since you cannot specify the 256 tokens
>>> from the old node.
>>>
>>
>> Sure you can, in a comma delimited list. I plan to write a short blog
>> post about this, but...
>>
>> I recommend that anyone using Cassandra, vnodes or not, always explicitly
>> populate their initial_token line in cassandra.yaml. There are a number of
>> cases where you will lose if you do not do so, and AFAICT no cases where
>> you lose by doing so.
>>
>> If one is using vnodes and wants to do this, the process goes like :
>>
>> 1) set num_tokens to the desired number of vnodes
>> 2) start node/bootstrap
>> 3) use a one-liner like jeffj's:
>>      nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e 's/,$/\n/'
>>    to get a comma-delimited list of the vnode tokens
>> 4) insert this comma delimited list in initial_token, and comment out
>> num_tokens (though it is a NOOP)
>>
>> =Rob
>>
>
>
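As a concrete illustration of step 4 in Rob's process above, the resulting
cassandra.yaml on that node would look roughly like this (the token values are
placeholders; paste the real comma-delimited list from step 3):

    initial_token: -9151314442816847873,-3074457345618258603,3074457345618258601
    # num_tokens: 256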


-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br *
+55 48 3232.3200


Backup Solution

2014-05-14 Thread ng
I want to discuss the question asked by Rene last year again.


http://www.mail-archive.com/user%40cassandra.apache.org/msg28465.html

Is the following a good backup solution?
Create two data-centers:
- A live data-center with multiple nodes (commodity hardware) (6 nodes with
replication factor of 3). Clients
connect to this cluster with LOCAL_QUORUM.
- A backup data-center with 1 node (with fast SSDs). Clients do not connect
to this cluster. Cluster only used for creating and storing snapshots.
Advantages:
- No snapshots and bulk network I/O (transfer snapshots) needed on the live
cluster. Also no need to take snapshot on each node.
- Clients are not slowed down because writes to the backup data-center are
async.
- On the backup cluster snapshots are made on a regular basis. This again
does not affect the live cluster.
- The backup cluster does not need to process client requests/reads, so we
need fewer machines for the backup cluster than for the live cluster.
Are there any disadvantages with this approach?

I don't see any issue with it. It is a backup solution... not a replication
solution. Both DCs can be in physically the same location/network. Copies of
the snapshots can be placed in a separate shared location on a daily basis
from the backup DC node.
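For concreteness, the keyspace for such a topology would be declared with
something like the following (keyspace and data-center names are placeholders):

    CREATE KEYSPACE example_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'live': 3, 'backup': 1};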

I must be missing something... please advise.


Cassandra hadoop job fails if any node is DOWN

2014-05-14 Thread Paulo Ricardo Motta Gomes
Hello,

One of the nodes of our Analytics DC is dead, but ColumnFamilyInputFormat
(CFIF) still assigns Hadoop input splits to it. This leads to many failed
tasks and consequently a failed job.

* Tasks fail with: java.lang.RuntimeException:
org.apache.thrift.transport.TTransportException: Failed to open a transport
to XX.75:9160. (obviously, the node is dead)

* Job fails with: Job Failed: # of failed Map Tasks exceeded allowed limit.
FailedCount: 1. LastFailedTask: task_201404180250_4207_m_79

We use RF=2 and CL=LOCAL_ONE for hadoop jobs, C* 1.2.16. Is this expected
behavior?

I checked the CFIF code, and it always assigns input splits to all ring
nodes, whether a node is dead or alive. What we do to fix it is patch CFIF to
blacklist the dead node, but this is not a very automatic procedure. Am I
missing something here?
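A hypothetical sketch of what such a blacklist patch can look like, wrapping
CFIF rather than modifying it; the class name and dead-host address are
illustrative only, not our actual patch:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;

    public class BlacklistingColumnFamilyInputFormat extends ColumnFamilyInputFormat
    {
        // Hosts known to be down; in practice this would come from job configuration.
        private static final Set<String> DEAD_NODES =
                new HashSet<String>(Arrays.asList("10.0.0.75"));

        @Override
        public List<InputSplit> getSplits(JobContext context) throws IOException
        {
            List<InputSplit> live = new ArrayList<InputSplit>();
            for (InputSplit split : super.getSplits(context))
            {
                try
                {
                    boolean hasLiveReplica = false;
                    for (String host : split.getLocations())
                        if (!DEAD_NODES.contains(host))
                            hasLiveReplica = true;
                    // keep only splits that can still be served by a live replica
                    if (hasLiveReplica)
                        live.add(split);
                }
                catch (InterruptedException e)
                {
                    throw new IOException(e);
                }
            }
            return live;
        }
    }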

Cheers,

-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br *
+55 48 3232.3200


Re: Efficient bulk range deletions without compactions by dropping SSTables.

2014-05-14 Thread Jeremy Powell
Hi Kevin,

C* version: 1.2.xx
Astyanax: 1.56.xx

We basically do this same thing in one of our production clusters, but
rather than dropping SSTables, we drop Column Families. We time-bucket our
CFs, and when a CF has passed some time threshold (metadata or embedded in
CF name), it is dropped. This means there is a home-grown system that is
doing the bookkeeping/maintenance rather than relying on C*'s inner
workings. It is unfortunate that we have to maintain a system which
maintains CFs, but we've been in a pretty good state for the last 12 months
using this method.
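A rough illustration of the bucketing idea (the CQL below is a placeholder
schema, not our actual model; the bookkeeping job just creates the next bucket
ahead of time and drops expired ones):

    -- create the upcoming monthly bucket before it is needed
    CREATE TABLE events_2014_05 (
        key   blob,
        col   timeuuid,
        value blob,
        PRIMARY KEY (key, col)
    );

    -- drop a bucket once it has aged past the retention threshold
    DROP TABLE events_2014_04;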

Some caveats:

By default, C* makes snapshots of your data when a table is dropped. You
can leave that and have something else clear up the snapshots, or if you're
less paranoid, set auto_snapshot: false in the cassandra.yaml file.
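The relevant setting is a single line in cassandra.yaml:

    # skip the automatic snapshot normally taken on TRUNCATE and DROP
    auto_snapshot: false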

Cassandra does not handle 'quick' schema changes very well, and we found
that only one node should be used for these changes. When adding or
removing column families, we have a single, property-defined C* node that
is designated as the schema node. After making a schema change, we had to
throw in an artificial delay to ensure that the change propagated through
the cluster before making the next one. And of course, relying on a single
node being up for schema changes is less than ideal, so handling failover
to a new node is important.

The final, and hardest, problem is that C* can't really handle schema
changes while a node is being bootstrapped (new nodes, replacing a dead
node). If a column family is dropped, but the new node has not yet received
that data from its replica, the node will fail to bootstrap when it finally
begins to receive that data - there is no column family for the data to be
written to, so that node will be stuck in the joining state, and its
system keyspace needs to be wiped and re-synced to attempt to get back to a
happy state. This unfortunately means we have to stop schema changes when a
node needs to be replaced, but we have this flow down pretty well.

Hope this helps,
Jeremy Powell


On Mon, May 12, 2014 at 5:53 PM, Kevin Burton  wrote:

> We have a log only data structure… everything is appended and nothing is
> ever updated.
>
> We should be totally fine with having lots of SSTables sitting on disk
> because even if we did a major compaction the data would still look the
> same.
>
> By 'lots' I mean maybe 1000 max.  Maybe 1GB each.
>
> However, I would like a way to delete older data.
>
> One way to solve this could be to just drop an entire SSTable if all the
> records inside have tombstones.
>
> Is this possible, to just drop a specific SSTable?
>
> --
>
> Founder/CEO Spinn3r.com
> Location: *San Francisco, CA*
> Skype: *burtonator*
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are
> people.
>
>


Re: Disable reads during node rebuild

2014-05-14 Thread Paulo Ricardo Motta Gomes
That's a nice workaround, will be really helpful in emergency situations
like this.
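For anyone else trying this, setting the severity via jmxterm (as Aaron
describes below) looks roughly like the session sketched here; the jar name and
MBean name are from memory and may differ between versions:

    $ java -jar jmxterm-1.0-uber.jar -l localhost:7199
    $> bean org.apache.cassandra.db:type=DynamicEndpointSnitch
    $> set Severity 8.0
    (run the rebuild, then reset)
    $> set Severity 0.0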

Thanks,


On Mon, May 12, 2014 at 6:58 PM, Aaron Morton wrote:

> I'm not able to replace a dead node using the ordinary procedure
> (bootstrap+join), and would like to rebuild the replacement node from
> another DC.
>
> Normally when you want to add a new DC to the cluster, the command to use
> is nodetool rebuild $DC_NAME (with auto_bootstrap: false). That will get
> the node to stream data from $DC_NAME.
>
> The problem is that if I start a node with auto_bootstrap=false to perform
> the rebuild, it automatically starts serving empty reads (CL=LOCAL_ONE).
>
> When adding a new DC the nodes wont be processing reads, that is not the
> case for you.
>
> You should disable the client APIs to prevent the clients from calling
> the new node: use -Dcassandra.start_rpc=false and
> -Dcassandra.start_native_transport=false in cassandra-env.sh, or the
> appropriate settings in cassandra.yaml.
>
> Disabling reads from other nodes will be harder. IIRC during bootstrap a
> different timeout (based on ring_delay) is used to detect if the
> bootstrapping node is down. However if the node is running and you use
> nodetool rebuild, I'm pretty sure the normal gossip failure detectors will
> kick in. Which means you cannot disable gossip to prevent reads. Also we
> would want the node to be up for writes.
>
> But what you can do is artificially set the severity of the node high so
> the dynamic snitch will route around it. See
> https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/locator/DynamicEndpointSnitchMBean.java#L37
>
>
> * Set the value to something high on the node you will be rebuilding; the
> number of cores on the system should do. (jmxterm is handy for this:
> http://wiki.cyclopsgroup.org/jmxterm)
> * Check nodetool gossipinfo on the other nodes to see the SEVERITY app
> state has propagated.
> * Watch completed ReadStage tasks on the node you want to rebuild. If you
> have read repair enabled it will still get some traffic.
> * Do rebuild
> * Reset severity to 0
>
> Hope that helps.
> Aaron
>
> -
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 13/05/2014, at 5:18 am, Paulo Ricardo Motta Gomes <
> paulo.mo...@chaordicsystems.com> wrote:
>
> Hello,
>
> I'm not able to replace a dead node using the ordinary procedure
> (bootstrap+join), and would like to rebuild the replacement node from
> another DC. The problem is that if I start a node with auto_bootstrap=false
> to perform the rebuild, it automatically starts serving empty reads
> (CL=LOCAL_ONE).
>
> Is there a way to disable reads from a node while performing rebuild from
> another datacenter? I tried starting the node in write survey mode, but
> the nodetool rebuild command does not work in this mode.
>
> Thanks,
>
> --
> *Paulo Motta*
>
> Chaordic | *Platform*
> *www.chaordic.com.br *
> +55 48 3232.3200
>
>
>


-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br *
+55 48 3232.3200


RE: Datacenter understanding question

2014-05-14 Thread Mark Farnan
Yes, they will.
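For example, with NetworkTopologyStrategy and one replica in each DC, a
keyspace defined roughly like the sketch below (keyspace and DC names are
placeholders) ends up with a full copy of the data in each data center:

    CREATE KEYSPACE example_ks
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 1};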

 

From: ng [mailto:pipeli...@gmail.com] 
Sent: Tuesday, May 13, 2014 11:07 PM
To: user@cassandra.apache.org
Subject: Datacenter understanding question

 

If I have a configuration of two data centers with one node each, and the
replication factor is also 1, will these 2 nodes be mirrored/replicated?



Re: NTS, vnodes and 0% chance of data loss

2014-05-14 Thread William Oberman
After sleeping on this, I'm sure my original conclusions are wrong.  In all
of the referenced cases/threads, I internalized "rack awareness" and
"hotspots" to mean something different and wrong.  A hotspot didn't mean
multiple replicas in the same rack (as I had been thinking), it meant the
process of finding replica placement might hit the same vnode
proportionally wrong due to the random association of vnodes <-> {dc,rack}.

To not lead people astray, I think everything in my email below is correct
until: "Which means a rack failure (3 nodes) has a non-zero chance of data
failure (right?)."  And again, my flaw was thinking that when Cassandra
selected replicas for token "X" in a vnode world, that it would possibly
pick vnodes that happened to be on the same rack due to random placements
of the tokens.  That is wrong (looking at the source for NTS), as NTS does
skip over the same rack (though, it will allow multiple in the same rack if
you "fill up"... I guess if someone did DC:4 with 3 racks they'll always
get one rack with two copies of the data, for example).

will

On Tue, May 13, 2014 at 1:41 PM, William Oberman
wrote:

> I found this:
>
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201404.mbox/%3ccaeduwd1erq-1m-kfj6ubzsbeser8dwh+g-kgdpstnbgqsqc...@mail.gmail.com%3E
>
> I read the three referenced cases.  In addition, case 4123 references:
> http://www.mail-archive.com/dev@cassandra.apache.org/msg03844.html
>
> And even though I *think* I understand all of the issues now, I still want
> to double check...
>
> Assumptions:
> -A cluster using NTS with options [DC:3]
> -Physical layout = In DC, 3 nodes/rack for a total of 9 nodes
>
> No vnodes: I could do token selection using ideas from case 3810 such that
> each rack has one replica.  At this point, my "0% chance of data loss"
> scenarios are:
> 1.) Failure of two nodes at random
> 2.) Failure of 2 racks (6 nodes!)
>
> Vnodes: my "0% chance of data loss" scenarios are:
> 1.) Failure of two nodes at random
> Which means a rack failure (3 nodes) has a non-zero chance of data failure
> (right?).
>
> To get specific, I'm in AWS, so racks ~= "availability zones".  In the
> years I've been in AWS, I've seen several occasions of "single zone
> downtimes", and one time of "single zone catastrophic loss".  E.g. for AWS
> I feel like you *have* to plan for a single zone failure, and in terms of
> "safety first" you *should* plan for two zone failures.
>
> To mitigate this data loss risk seems rough for vnodes, again if I'm
> understanding everything correctly:
> -To ensure 0% data loss for one zone => I need RF=4
> -To ensure 0% data loss for two zones => I need RF=7
>
> I'd really like to use vnodes, but RF=7 is crazy.
>
> To reiterate what I think is the core idea of this message:
> 1.) for vnodes 0% data loss => RF=(# of allowed failures at once)+1
> 2.) racks don't change the above equation at all
>
> will
>


Re: Really need some advices on large data considerations

2014-05-14 Thread Yatong Zhang
Thank you Aaron, but we're planning about 20T per node, is that feasible?


On Mon, May 12, 2014 at 4:33 PM, Aaron Morton wrote:

> We've learned that compaction strategy is an important point, because
> we've run into 'no space' trouble with the 'size tiered' compaction
> strategy.
>
> If you want to get the most out of the raw disk space LCS is the way to
> go, remember it uses approximately twice the disk IO.
>
> From our experience, changing any settings/schema while a large cluster is
> online and has been running for some time is really, really a pain.
>
> Which parts in particular ?
>
> Updating the schema or config? OpsCentre has a rolling restart feature
> which can be handy when chef / puppet is deploying the config changes.
> Schema / gossip can take a little while to propagate with a high number of nodes.
>
> On a modern version you should be able to run 2 to 3 TB per node, maybe
> higher. The biggest concerns are going to be repair (the changes in 2.1
> will help) and bootstrapping. I'd recommend testing a smaller cluster, say
> 12 nodes, with a high load per node, say 3 TB.
>
> cheers
> Aaron
>
> -
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 9/05/2014, at 12:09 pm, Yatong Zhang  wrote:
>
> Hi,
>
> We're going to deploy a large Cassandra cluster in PB level. Our scenario
> would be:
>
> 1. Lots of writes, about 150 writes/second at average, and about 300K size
> per write.
> 2. Relatively very small reads
> 3. Our data will be never updated
> 4. But we will delete old data periodically to free space for new data
>
> We've learned that compaction strategy is an important point, because
> we've run into 'no space' trouble with the 'size tiered' compaction
> strategy.
>
> We've read http://wiki.apache.org/cassandra/LargeDataSetConsiderations, but is
> this enough and up to date? From our experience, changing any
> settings/schema while a large cluster is online and has been running for
> some time is really, really a pain. So we're gathering more info and
> expecting some more practical suggestions before we set up  the cassandra
> cluster.
>
> Thanks and any help is of great appreciation
>
>
>
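For reference, switching an existing table to LCS as suggested above is a
one-line schema change along these lines (keyspace, table, and SSTable size
are placeholders):

    ALTER TABLE example_ks.example_table
      WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};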


Cassandra token range support for Hadoop (ColumnFamilyInputFormat)

2014-05-14 Thread Anton Brazhnyk
Greetings,

I'm reading data from C* with Spark (via ColumnFamilyInputFormat) and I'd like
to read just part of it - something like Spark's sample() function.
Cassandra's API seems to allow this with its
ConfigHelper.setInputRange(jobConfiguration, startToken, endToken) method, but
it doesn't work.
The limit is just ignored and the entire column family is scanned. It seems
this kind of feature is simply not supported, and the source of
AbstractColumnFamilyInputFormat.getSplits confirms that (IMO).
Questions:
1. Am I right that there is no way to get data limited by token range with
ColumnFamilyInputFormat?
2. Is there another way to limit the amount of data read from Cassandra with
Spark and ColumnFamilyInputFormat, so that this amount is predictable (like
5% of the entire dataset)?


WBR,
Anton




RE: Question about READS in a multi DC environment.

2014-05-14 Thread Mark Farnan
Perfect, 

 

Thanks, that solved it. 
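For anyone following along, the change Aaron describes below boils down to a
table-level setting, roughly (using the table from the trace):

    ALTER TABLE kairosdb.data_points
      WITH read_repair_chance = 0.0
      AND dclocal_read_repair_chance = 0.1;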

 

 

Regards

 

Mark. 

 

From: Aaron Morton [mailto:aa...@thelastpickle.com] 
Sent: Monday, May 12, 2014 2:21 PM
To: Cassandra User
Subject: Re: Question about READS in a multi DC environment.

 

>  read_repair_chance=1.00 AND

 

There’s your problem. 

 

When read repair is active for a read request, the coordinator will over-read
from all UP replicas. Your client request will only block waiting for the one
request it needs (the data request); the rest of the repair happens in the
background. Setting this to 1.0 means it's active across the entire cluster
for every read.

 

Change read_repair_chance to 0 and set dclocal_read_repair_chance to 0.1 so
that read repair will only happen local to the DC you are connected to.

 

Hope that helps. 

A

 

 

-

Aaron Morton

New Zealand

@aaronmorton

 

Co-Founder & Principal Consultant

Apache Cassandra Consulting

http://www.thelastpickle.com

 

On 12/05/2014, at 5:37 pm, DuyHai Doan <doanduy...@gmail.com> wrote:





Isn't read repair supposed to be done asynchronously in the background?

 

On Mon, May 12, 2014 at 2:07 AM, graham sanderson <gra...@vast.com> wrote:



You have a read_repair_chance of 1.0 which is probably why your query is
hitting all data centers.


On May 11, 2014, at 3:44 PM, Mark Farnan <devm...@petrolink.com> wrote:

> I'm trying to understand READ load in Cassandra across a multi-datacenter
cluster (specifically why it seems to be hitting more than one DC) and hope
someone can help.
>
> From what I'm seeing here, a READ with consistency LOCAL_ONE seems to be
hitting all 3 datacenters, rather than just the one I'm connected to. I see
'Read 101 live and 0 tombstoned cells' from EACH of the 3 DCs in the trace,
which seems wrong.
> I have tried every consistency level, same result. This is also the same
from my C# code via the DataStax driver (where I first noticed the issue).
>
> Can someone please shed some light on what is occurring? Specifically, I
don't want a query on one DC going anywhere near the other 2 as a rule, as in
production these DCs will be across slower links.
>
>
> Query: (NOTE: whilst this uses a kairosdb table, I'm just playing with
queries against it as it has 100k columns in this key, for testing).
>
> cqlsh:kairosdb> consistency local_one
> Consistency level set to LOCAL_ONE.
>
> cqlsh:kairosdb> select * from data_points where key = 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a limit 1000;
>
> ... Some return data  rows listed here which I've removed 
>

> 
> Query Response Trace:
>
> activity                                                                   | timestamp    | source         | source_elapsed
> ---------------------------------------------------------------------------+--------------+----------------+---------------
> execute_cql3_query                                                         | 07:18:12,692 | 192.168.25.111 |              0
> Message received from /192.168.25.111                                      | 07:18:00,706 | 192.168.25.131 |             50
> Executing single-partition query on data_points                            | 07:18:00,707 | 192.168.25.131 |            760
> Acquiring sstable references                                               | 07:18:00,707 | 192.168.25.131 |            814
> Merging memtable tombstones                                                | 07:18:00,707 | 192.168.25.131 |            924
> Bloom filter allows skipping sstable 191                                   | 07:18:00,707 | 192.168.25.131 |           1050
> Bloom filter allows skipping sstable 190                                   | 07:18:00,707 | 192.168.25.131 |           1166
> Key cache hit for sstable 189                                              | 07:18:00,707 | 192.168.25.131 |           1275
> Seeking to partition beginning in data file                                | 07:18:00,707 | 192.168.25.131 |           1293
> Skipped 0/3 non-slice-intersecting sstables, included 0 due to tombstones  | 07:18:00,708 | 192.168.25.131 |           2173
> Merging data from memtables and 1 sstables                                 | 07:18:00,708 | 192.168.25.131 |           2195
> Read 1001 live and 0 tombstoned cells                                      | 07:18:00,709 | 192.168.25.131 |           3259
> Enqueuing response to /192.168.25.111                                      | 07:18:00,710 | 192.168.25.131 |           4006
> Sending message to /192.168.25.111                                         | 07:18:00,710 | 192.168.25.131 |           4210
> Parsing select * from data_points where key = 0x6d61726c796e2e746573742e74656d7034000145b514a400726f6f6d3d6f6963653a limit 1000; | 07:18:12,692 | 192.168.25.111 | 52
> Preparing statement                                                        | 07:18:12,692 | 192.168.25.111 |            257
> Sending message to /192.168.25.121                                         | 07:18:12,693 | 192.168.25.111 |           1099
> Sending message to /192.168.25.131                                         | 07:18:12,693 | 192.168.25.111 |           1254
> Executing single-partition query on data_points                            | 07:18:12,693 | 192.168.25.111 |           1269
> Acquiring sstable references                                               | 07:18:12,693 | 192.168.25.111 |           1284
> Merging memtable tombstones                                                | 07:18:12,694 | 192.168.25.111 |           1315
> Key cache hit

Re: row caching for frequently updated column

2014-05-14 Thread Chris Burroughs

You are close.

On 04/30/2014 12:41 AM, Jimmy Lin wrote:

thanks all for the pointers.

Let me see if I can put the sequence of events together...

1.2
People misunderstand/misuse the row cache: Cassandra caches the entire
row of data even if you are only looking for a small subset of it. E.g.,
select single_column from a_wide_row_table
will result in the entire row being cached even if you are only interested
in one single column of the row.



Yep!


2.0
And because of the potential misuse of heap memory, Cassandra 2.0 removed the
heap cache and only supports the off-heap cache, which has the side effect
that a write will invalidate the row cache (my original question).



"off-heap" is a common but misleading name for the 
SerializingCacheProvider.  It still stores several objects on heap per 
cached item and has to deser on read.



2.1
The coming 2.1 Cassandra will offer a true cache-by-query, so the cached data
will be much more efficient even for wide rows (it caches what it needs).

Do I get it right?
For the new 2.1 row caching, is it still true that a write or update to the
row will invalidate the cached row?



I don't think "true cache by query" is an accurate description of 
CASSANDRA-5357.  I think it's more like a "head of the row" cache.
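For reference, a sketch of how that per-partition limit is expressed in 2.1's
table options (table name and row count are placeholders):

    ALTER TABLE example_ks.example_table
      WITH caching = {'keys': 'ALL', 'rows_per_partition': '100'};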