Creating a copy of a C* cluster

2017-08-07 Thread Robert Wille
We need to make a copy of a cluster. We’re going to do some testing against the copy and then discard it. What’s the best way of doing that? I created another datacenter, and then have tried to divorce it from the original datacenter, but have had troubles doing so. Suggestions? Thanks in adva

Re: Why does `now()` produce different times within the same query?

2016-11-30 Thread Robert Wille
In my opinion, this is not broken and “fixing” it would break existing code. Consider a batch that includes multiple inserts, each of which inserts the value returned by now(). Getting the same UUID for each insert would be a major problem. Cheers Robert On Nov 30, 2016, at 4:46 PM, Todd Fast

Re: Cannot mix counter and non counter columns in the same table

2016-11-01 Thread Robert Wille
I used to think it was terrible as well. But it really isn’t. Just put your non-counter columns in a separate table with the same primary key. If you want to query both counter and non-counter columns at the same time, just query both tables at the same time with asynchronous queries. On Nov 1,
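Editor's sketch of the two-table approach described above, assuming the non-counter and counter columns live in two tables that share a primary key. The maps and the `fetchTitle`/`fetchViews` helpers are hypothetical stand-ins for asynchronous driver calls, not part of any Cassandra API; the point is only that both reads are started before either result is awaited.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

public class SplitCounterLookup {
    // Hypothetical stand-ins for two tables that share the same primary key.
    static final Map<String, String> DOC_INFO = Map.of("doc1", "Annual report");
    static final Map<String, Long> DOC_COUNTERS = Map.of("doc1", 42L);

    // Hypothetical async reader for the non-counter table.
    static CompletableFuture<String> fetchTitle(String id) {
        return CompletableFuture.supplyAsync(() -> DOC_INFO.get(id));
    }

    // Hypothetical async reader for the counter table.
    static CompletableFuture<Long> fetchViews(String id) {
        return CompletableFuture.supplyAsync(() -> DOC_COUNTERS.get(id));
    }

    // Start both reads first, then combine the two results once both complete.
    static String describe(String id) {
        CompletableFuture<String> title = fetchTitle(id);
        CompletableFuture<Long> views = fetchViews(id);
        return title.thenCombine(views, (t, v) -> t + " (" + v + " views)").join();
    }

    public static void main(String[] args) {
        System.out.println(describe("doc1"));
    }
}
```

With the sample data, `describe("doc1")` returns `Annual report (42 views)`.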

Re: Transaction failed because of timeout, retry failed because of the first try actually succeeded.

2016-06-30 Thread Robert Wille
I had this problem, and it was caused by my retry policy. For reasons I don’t remember (but which are documented in a C* Jira ticket), when onWriteTimeout() is called, you cannot call RetryDecision.retry(cl), as it will be a CL that is incompatible with LWT. After the fix (2.1.?), you can pass null, an

Re: Intermittent CAS error

2016-05-19 Thread Robert Wille
have found a similar bug. The Java driver mailing list is the best place to follow up on this. It can be found at https://groups.google.com/a/lists.datastax.com/forum/#!forum/java-driver-user. On Thu, May 19, 2016 at 12:11 AM, Robert Wille <rwi...@fold3.com> wrote: When executing b

Intermittent CAS error

2016-05-18 Thread Robert Wille
When executing bulk CAS queries, I intermittently get the following error: SERIAL is not supported as conditional update commit consistency. Use ANY if you mean "make sure it is accepted but I don't care how many replicas commit it for non-SERIAL reads" This doesn’t make any sense. Obviously,

Re: Large primary keys

2016-04-14 Thread Robert Wille
to the document given the document text. -- Jack Krupansky On Mon, Apr 11, 2016 at 7:12 PM, James Carman <ja...@carmanconsulting.com> wrote: S3 maybe? On Mon, Apr 11, 2016 at 7:05 PM Robert Wille <rwi...@fold3.com> wrote: I do realize it’s kind of a weird use case, but it is

Re: Large primary keys

2016-04-11 Thread Robert Wille
eperate > table per day / hour or something like that, so you can quickly get all keys > for a time range. A query without the partition key may be very slow. > > Jan > > Am 11.04.2016 um 23:43 schrieb Robert Wille: >> I have a need to be able to use the text of a document

Large primary keys

2016-04-11 Thread Robert Wille
I have a need to be able to use the text of a document as the primary key in a table. These texts are usually less than 1K, but can sometimes be 10’s of K’s in size. Would it be better to use a digest of the text as the key? I have a background process that will occasionally need to do a full ta

Re: disable compaction if all data are read-only?

2016-04-08 Thread Robert Wille
You still need compaction. Compaction is what organizes your data into levels. Without compaction, every query would have to look at every SSTable. Also, due to commit log rotation, your memtable may get flushed from time to time before it is full, resulting in small SSTables that would benefit

Re: Practical limit on number of column families

2016-02-29 Thread Robert Wille
Yes, there is memory overhead for each column family, effectively limiting the number of column families. The general wisdom is that you should limit yourself to a few hundred. Robert On Feb 29, 2016, at 10:30 AM, Fernando Jimenez <fernando.jime...@wealth-port.com> wrote: Hi all I ha

Re: Duplicated key with an IN statement

2016-02-04 Thread Robert Wille
You shouldn’t be using IN anyway. It is better to issue multiple queries, each for a single key, and issue them in parallel. Better performance. Less GC pressure. On Feb 4, 2016, at 7:54 AM, Sylvain Lebresne <sylv...@datastax.com> wrote: That behavior has been changed in 2.2 and upwards

Re: Cassandra Performance on a Single Machine

2016-01-14 Thread Robert Wille
I disagree. I think that you can extrapolate very little information about RF>1 and CL>1 by benchmarking with RF=1 and CL=1. On Jan 13, 2016, at 8:41 PM, Anurag Khandelwal <anur...@berkeley.edu> wrote: Hi John, Thanks for responding! The aim of this benchmark was not to benchmark Cassa

Re: Write/read heavy usecase in one cluster

2015-12-23 Thread Robert Wille
I would personally classify both of those use cases as light, and I wouldn’t have any qualms about using a single cluster for both of those. On Dec 23, 2015, at 3:06 PM, cass savy wrote: > How do you determine if we can share cluster in prod for 2 different > applications > > 1. Has anybody

Re: lots of tombstone after compaction

2015-12-07 Thread Robert Wille
The nulls in the original data created the tombstones. They won’t go away until gc_grace_seconds have passed (default is 10 days). On Dec 7, 2015, at 4:46 PM, Kai Wang wrote: > I bulkloaded a few tables using CQLSStableWrite/sstableloader. The data are > large amount of wide rows with lots of
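Editor's sketch of the arithmetic behind the figure above: the 10-day default corresponds to a `gc_grace_seconds` value of 864000, and the 3456000 that appears in the `roll` schemas elsewhere in this listing works out to 40 days. The class and method names are illustrative only.

```java
import java.time.Duration;

public class GcGrace {
    // Convert a gc_grace_seconds setting into whole days.
    static long toDays(long gcGraceSeconds) {
        return Duration.ofSeconds(gcGraceSeconds).toDays();
    }

    public static void main(String[] args) {
        System.out.println(toDays(864_000));   // the 10-day default
        System.out.println(toDays(3_456_000)); // value used in the roll table schemas
    }
}
```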

Re: Behavior difference between 2.0 and 2.1

2015-12-04 Thread Robert Wille
her if you have TTL on all non static (clustering and data) columns, you don’t (necessarily) want the static data to disappear when the other cells do - though you can achieve this with statement wide TTL-ing on insertion of the static data. On Dec 3, 2015, at 6:31 PM, Robert Wille mailto:rwi...

Behavior difference between 2.0 and 2.1

2015-12-03 Thread Robert Wille
With this schema: CREATE TABLE roll ( id INT, image BIGINT, data VARCHAR static, PRIMARY KEY ((id), image) ) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 }; if I run SELECT image FROM roll WHERE id = X on 2.0, where partitio

Re: Upgrade instructions don't make sense

2015-11-23 Thread Robert Wille
than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay. On Mon, Nov 23, 2015 at 5:55 PM, Robert Wille <rwi...@fold3.com> wrote: I’m wanting to upgrade from 2.

Upgrade instructions don't make sense

2015-11-23 Thread Robert Wille
I’m wanting to upgrade from 2.0 to 2.1. The upgrade instructions at http://docs.datastax.com/en/upgrade/doc/upgrade/cassandra/upgradeCassandraDetails.html have the following, which leaves me with more questions than it answers: If your cluster does not use vnodes, disable vnodes in each new cassa

2.1 counters and CL=ONE

2015-10-27 Thread Robert Wille
I’m planning an upgrade from 2.0 to 2.1, and was reading about counters, and ended up with a question. I read that in 2.0, counters are implemented by storing deltas, and in 2.1, read-before-write is used to store totals instead. What does this mean for the following scenario? Suppose we have a

Anything special about upgrading from 2.0 to 2.1

2015-10-22 Thread Robert Wille
I’m on 2.0.16 and want to upgrade to the latest 2.1.x. I’ve seen some comments about issues with counters not migrating properly. I have a lot of counters. Any concerns there? Do I need to run nodetool upgradesstables? Any other gotchas? Thanks Robert

Re: Duplicate records returned

2015-10-08 Thread Robert Wille
, 2015, at 2:33 PM, Robert Wille <rwi...@fold3.com> wrote: It's a paging bug. I ALWAYS get a duplicated record every fetchSize records. Easily duplicated 100% of the time. I’ve logged a bug: https://issues.apache.org/jira/browse/CASSANDRA-10442 Robert On Oct 3, 2015, at 10:59

Node won't go away

2015-10-08 Thread Robert Wille
We had some problems with a node, so we decided to rebootstrap it. My IT guy screwed up, and when he added -Dcassandra.replace_address to cassandra-env.sh, he forgot the closing quote. The node bootstrapped, and then refused to join the cluster. We shut it down, and then noticed that nodetool st

Re: Duplicate records returned

2015-10-03 Thread Robert Wille
It's a paging bug. I ALWAYS get a duplicated record every fetchSize records. Easily duplicated 100% of the time. I’ve logged a bug: https://issues.apache.org/jira/browse/CASSANDRA-10442 Robert On Oct 3, 2015, at 10:59 AM, Robert Wille <rwi...@fold3.com> wrote: Oops, I was t

Re: Duplicate records returned

2015-10-03 Thread Robert Wille
(imageId == lastImageId) { logger.warn("Cassandra duplicated " + imageId); continue; } total++; lastImageId = imageId; } On Oct 3, 2015, at 10:54 AM, Robert Wille <rwi...@fold3.com> wrote: I don’t think it’s an application problem. The following simple snippets produce
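Editor's sketch: the fragment quoted above is cut off, so here is a self-contained version of the same consecutive-duplicate check, run over an in-memory list instead of a driver result set. The class name, the `Long.MIN_VALUE` sentinel, and the use of `System.err` in place of the original `logger.warn` call are assumptions made for illustration. The check relies on rows arriving in clustering order, so a repeated row is always adjacent to its twin.

```java
import java.util.List;

public class DuplicateSkipper {
    // Count rows, skipping any id that equals the id immediately before it.
    static int countSkippingRepeats(List<Long> imageIds) {
        int total = 0;
        long lastImageId = Long.MIN_VALUE; // assumed sentinel: no real id has this value
        for (long imageId : imageIds) {
            if (imageId == lastImageId) {
                System.err.println("Cassandra duplicated " + imageId);
                continue;
            }
            total++;
            lastImageId = imageId;
        }
        return total;
    }

    public static void main(String[] args) {
        // The id 2 appears twice in a row, as it would across a page boundary.
        System.out.println(countSkippingRepeats(List.of(1L, 2L, 2L, 3L)));
    }
}
```

For the sample list the method counts 3 rows and reports one duplicate.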

Re: Duplicate records returned

2015-10-03 Thread Robert Wille
deviation. Especially since you don't see the duplicates in cqlsh, I have a hunch this is an application bug. On Fri, Oct 2, 2015 at 4:58 PM Robert Wille <rwi...@fold3.com> wrote: When I run the query "SELECT image FROM roll WHERE roll = :roll" against this table CRE

Duplicate records returned

2015-10-02 Thread Robert Wille
When I run the query "SELECT image FROM roll WHERE roll = :roll" against this table CREATE TABLE roll ( roll INT, image BIGINT, data VARCHAR static, mid VARCHAR, imp_st VARCHAR, PRIMARY KEY ((roll), image) ) WITH gc_grace_seconds = 3456000 AND compaction = { 'class' : 'LeveledCompactionStrategy'

Re: Compaction not happening

2015-09-29 Thread Robert Wille
kes up into the thousands. In my case it's a LCS table with fluctuating-but-sometimes-pretty-high write load and lots of (intentional) overwrite, infrequent deletes. C* 2.1.7. On Thu, Sep 24, 2015 at 12:59 PM, Robert Wille <rwi...@fold3.com> wrote: I have some tables that have

Re: High CPU usage on some of nodes

2015-09-10 Thread Robert Wille
It sounds like it’s probably GC. Grep for GC in system.log to verify. If it is GC, there are a myriad of issues that could cause it, but at least you’ve narrowed it down. On Sep 9, 2015, at 11:05 PM, Roman Tkachenko wrote: > Hey guys, > > We've been having issues in the past couple of days wit

Re: Order By limitation or bug?

2015-09-07 Thread Robert Wille
ESC; Essentially the order by clause has to specify the clustering columns in order in full. It doesn’t by default know that you have already essentially filtered by type. Alec Collier | Workplace Service Design Corporate Operations Group - Technology | Macquarie Group Limited • From: Robert Wil

Re: Order By limitation or bug?

2015-09-03 Thread Robert Wille
d. So to do an order by Id, C* will need to perform an in-memory re-ordering, not sure how bad it is for performance. In any case currently it's not possible, maybe you should create a JIRA to ask for lifting the limitation. On Thu, Sep 3, 2015 at 10:27 PM, Robert Wille mailto:rwi..

Order By limitation or bug?

2015-09-03 Thread Robert Wille
Given this table: CREATE TABLE import_file ( roll int, type text, id timeuuid, data text, PRIMARY KEY ((roll), type, id) ) This should be possible: SELECT data FROM import_file WHERE roll = 1 AND type = 'foo' ORDER BY id DESC; but it results in the following error: Bad Request: Order

Re: Written data is lost and no exception thrown back to the client

2015-08-21 Thread Robert Wille
But it shouldn’t matter. I have missing data, and no errors, which shouldn’t be possible except with CL=ANY. FWIW, I’m working on some sample code so I can post a Jira. Robert On Aug 21, 2015, at 5:04 AM, Robert Wille <rwi...@fold3.com> wrote: RF=1 with QUORUM consistency.

Re: Written data is lost and no exception thrown back to the client

2015-08-21 Thread Robert Wille
wrote: What consistency level were the writes? From: Robert Wille <rwi...@fold3.com> Sent: 8/20/2015 18:25 To: user@cassandra.apache.org Subject: Written data is lost and no exception thrown back to the clie

Written data is lost and no exception thrown back to the client

2015-08-20 Thread Robert Wille
I wrote a data migration application which I was testing, and I pushed it too hard and the FlushWriter thread pool blocked, and I ended up with dropped mutation messages. I compared the source data against what is in my cluster, and as expected I have missing records. The strange thing is that m

Re: Schema questions for data structures with recently-modified access patterns

2015-07-24 Thread Robert Wille
he new materialized view feature of Cassandra 3.0 would make it an even easier fit. -- Jack Krupansky On Thu, Jul 23, 2015 at 6:30 PM, Robert Wille <rwi...@fold3.com> wrote: I obviously worded my original email poorly. I guess that’s what happens when you post at the end of the day just bef

Compaction no longer working properly

2015-07-24 Thread Robert Wille
I have a database which has a fair amount of churn. When I need to update a data structure, I create a new one, and when it is complete, I delete the old one. I have gc_grace_seconds=0, so the space for the old data structures should be reclaimed on the next compaction. This has been working fin

Re: Schema questions for data structures with recently-modified access patterns

2015-07-23 Thread Robert Wille
Krupansky <jack.krupan...@gmail.com> wrote: Maybe you could explain in more detail what you mean by recently modified documents, since that is precisely what I thought I suggested with descending ordering. -- Jack Krupansky On Thu, Jul 23, 2015 at 3:40 PM, Robert Wille mailto:rwi...

Re: Schema questions for data structures with recently-modified access patterns

2015-07-23 Thread Robert Wille
ng taking care of the delete, automatically. -- Jack Krupansky On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille <rwi...@fold3.com> wrote: The time series doesn’t provide the access pattern I’m looking for. No way to query recently-modified documents. On Jul 21, 2015, at 9:13

Re: Best Practise for Updating Index and Reporting Tables

2015-07-23 Thread Robert Wille
My guess is that you don’t understand what an atomic batch is, given that you used the phrase “updated synchronously”. Atomic batches do not provide isolation, and do not guarantee immediate consistency. The only thing an atomic batch guarantees is that all of the statements in the batch will eve

Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Robert Wille
se as, due to the specified clustering order, the latest modification will always be first record in the row. Hope it helps. Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso> On 21 July 2015 at 05:59, Robert Wille <rwi...@fold3.com> wrote: Data structures

Re: Schema questions for data structures with recently-modified access patterns

2015-07-21 Thread Robert Wille
helps. Carlos Alonso | Software Engineer | @calonso<https://twitter.com/calonso> On 21 July 2015 at 05:59, Robert Wille <rwi...@fold3.com> wrote: Data structures that have a recently-modified access pattern seem to be a poor fit for Cassandra. I’m wondering if any of you smar

Schema questions for data structures with recently-modified access patterns

2015-07-20 Thread Robert Wille
Data structures that have a recently-modified access pattern seem to be a poor fit for Cassandra. I’m wondering if any of you smart guys can provide suggestions. For the sake of discussion, let’s assume I have the following tables: CREATE TABLE document ( docId UUID, doc TEXT,

Truncate really slow

2015-07-01 Thread Robert Wille
I have two test clusters, both 2.0.15. One has a single node and one has three nodes. Truncate on the three node cluster is really slow, but is quite fast on the single-node cluster. My test cases truncate tables before each test, and > 95% of the time in my test cases is spent truncating tables

Re: Missing data

2015-06-15 Thread Robert Wille
You can get tombstones from inserting null values. Not sure if that’s the problem, but it is another way of getting tombstones in your data. On Jun 15, 2015, at 10:50 AM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote: Dear all, I identified a bit more closely the root cause o

Re: Dropped mutation messages

2015-06-13 Thread Robert Wille
Internode messages which are received by a node, but do not get to be processed within rpc_timeout, are dropped rather than processed, as the coordinator node will no longer be waiting for a response. If the Coordinator node does not receive Consistency Level responses before the rpc_timeout

Re: Dropped mutation messages

2015-06-12 Thread Robert Wille
I meant to say I’m *not* overloading my cluster. On Jun 12, 2015, at 6:52 PM, Robert Wille wrote: > I am preparing to migrate a large amount of data to Cassandra. In order to > test my migration code, I’ve been doing some dry runs to a test cluster. My > test cluster is 2.0.15, 3 no

Dropped mutation messages

2015-06-12 Thread Robert Wille
I am preparing to migrate a large amount of data to Cassandra. In order to test my migration code, I’ve been doing some dry runs to a test cluster. My test cluster is 2.0.15, 3 nodes, RF=1 and CL=QUORUM. I know RF=1 and CL=QUORUM is a weird combination, but my production cluster that will eventu

Coordination of expired TTLs compared to tombstones

2015-05-29 Thread Robert Wille
I was wondering something about Cassandra’s internals. Suppose I have CL > 1 and I read a partition with a bunch of tombstones. Those tombstones have to be sent to the coordinator for consistency reasons so that if another replica produces non-tombstone data that is older than the tombstone, it

Re: After running nodetool clean up, the used disk space was increased

2015-05-15 Thread Robert Wille
Have you cleared snapshots? On May 15, 2015, at 2:24 PM, Analia Lorenzatto <analialorenza...@gmail.com> wrote: The Replication Factor = 2. The RP is the default, but not sure how to check it. I am attaching the output of: nodetool ring Thanks a lot! On Fri, May 15, 2015 at 4:17 PM, Ki

Re: Consistency Issues

2015-05-13 Thread Robert Wille
Timestamps have millisecond granularity. If you make multiple writes within the same millisecond, then the outcome is not deterministic. Also, make sure you are running ntp. Clock skew will manifest itself similarly. On May 13, 2015, at 3:47 AM, Jared Rodriguez <jrodrig...@kitedesk.com>

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Robert Wille
batch update? On Wed, May 13, 2015 at 5:48 PM, Ali Akhtar <ali.rac...@gmail.com> wrote: The 6k is only the starting value, it's expected to scale up to ~200 million records. On Wed, May 13, 2015 at 5:44 PM, Robert Wille <rwi...@fold3.com> wrote: You could use lightweig

Re: Updating only modified records (where lastModified < current date)

2015-05-13 Thread Robert Wille
You could use lightweight transactions to update only if the record is newer. It doesn’t avoid the read, it just happens under the covers, so it’s not really going to be faster compared to a read-before-write pattern (which is an anti-pattern, BTW). It is probably the easiest way to avoid gettin

Re: query contains IN on the partition key and an ORDER BY

2015-05-02 Thread Robert Wille
Bag the IN clause and execute multiple parallel queries instead. It’s more performant anyway. On May 2, 2015, at 11:46 AM, Abhishek Singh Bailoo <abhishek.singh.bai...@gmail.com> wrote: Hi I have run into the following issue https://issues.apache.org/jira/browse/CASSANDRA-6722 when run

Re: Inserting null values

2015-04-29 Thread Robert Wille
I’ve come across the same thing. I have a table with at least half a dozen columns that could be null, in any combination. Having a prepared statement for each permutation of null columns just isn’t going to happen. I don’t want to build custom queries each time because I have a really cool syst

Re: OperationTimedOut in selerct count statement in cqlsh

2015-04-22 Thread Robert Wille
s nor their employees accept any responsibility. From: Robert Wille [mailto:rwi...@fold3.com] Sent: 22 April 2015 15:00 To: user@cassandra.apache.org Subject: Re: OperationTimedOut in selerct count statement in cqlsh I should have been more clear. What I meant was t

Re: OperationTimedOut in selerct count statement in cqlsh

2015-04-22 Thread Robert Wille
ir employees accept any responsibility. From: Robert Wille [mailto:rwi...@fold3.com] Sent: 22 April 2015 14:44 To: user@cassandra.apache.org Subject: Re: OperationTimedOut in selerct count statement in cqlsh Keep in mind that "select count(l)" and "

Re: OperationTimedOut in selerct count statement in cqlsh

2015-04-22 Thread Robert Wille
or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility. From: Robert Wille [mailto:rwi...@fold3.com] Sent: 22 April 2015

Re: OperationTimedOut in selerct count statement in cqlsh

2015-04-22 Thread Robert Wille
Keep in mind that "select count(l)" and "select l" amount to essentially the same thing. On Apr 22, 2015, at 3:41 AM, Tommy Stendahl <tommy.stend...@ericsson.com> wrote: Hi, Check out CASSANDRA-8899, my guess is that you have to increase the timeout in cqlsh. /Tommy On 2015-04-22 11:1

Re: Reading hundreds of thousands of rows at once?

2015-04-22 Thread Robert Wille
Add more nodes to your cluster. On Apr 22, 2015, at 1:39 AM, John Anderson <son...@gmail.com> wrote: Hey, I'm looking at querying around 500,000 rows that I need to pull into a Pandas data frame for processing. Currently testing this on a single cassandra node it takes around 21 seconds

Re: Delete-only work loads crash Cassandra

2015-04-15 Thread Robert Wille
I can readily reproduce the bug, and filed a JIRA ticket: https://issues.apache.org/jira/browse/CASSANDRA-9194 I’m posting for posterity. On Apr 13, 2015, at 11:59 AM, Robert Wille <rwi...@fold3.com> wrote: Unfortunately, I’ve switched email systems and don’t have my emails fro

Re: Delete-only work loads crash Cassandra

2015-04-13 Thread Robert Wille
lear. If so, what was the JIRA #? Have you filed a JIRA for the new problem? On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille <rwi...@fold3.com> wrote: Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I did lots of deletes and no upserts, Cassandra woul

Delete-only work loads crash Cassandra

2015-04-13 Thread Robert Wille
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I did lots of deletes and no upserts, Cassandra would report that the memtable was 0 bytes because of an accounting error. The memtable would never flush and Cassandra would eventually die. Someone was kind enough to create

Help understanding aftermath of death by GC

2015-03-31 Thread Robert Wille
I moved my site over to Cassandra a few months ago, and everything has been just peachy until a few hours ago (yes, it would be in the middle of the night) when my entire cluster suffered death by GC. By death by GC, I mean this: [rwille@cas031 cassandra]$ grep GC system.log | head -5 INFO [Sch

Re: Arbitrary nested tree hierarchy data model

2015-03-28 Thread Robert Wille
Ben Bromhead sent an email to me directly and expressed an interest in seeing some of my queries. I may as well post them for everyone. Here are my queries for the part of my code that reads and cleans up browse trees. @NamedCqlQueries({ @NamedCqlQuery( name = DocumentBrowseDaoImpl.Q_CHECK_TREE_

Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Robert Wille
me to dig a little deeper, I’d be happy to. Just email me. Robert On Mar 27, 2015, at 5:35 PM, Ben Bromhead <b...@instaclustr.com> wrote: +1 would love to see how you do it On 27 March 2015 at 07:18, Jonathan Haddad <j...@jonhaddad.com> wrote: I'd be interested

Re: Arbitrary nested tree hierarchy data model

2015-03-26 Thread Robert Wille
I have a cluster which stores tree structures. I keep several hundred unrelated trees. The largest has about 180 million nodes, and the smallest has 1 node. The largest fanout is almost 400K. Depth is arbitrary, but in practice is probably less than 10. I am able to page through children and sib

Re: using or in select query in cassandra

2015-03-02 Thread Robert Wille
I would also like to add that if you avoid IN and use async queries instead, it is pretty trivial to use a semaphore or some other limiting mechanism to put a ceiling on the amount of concurrent work you are sending to the cluster. If you use a query with an IN clause with a thousand things, you
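Editor's sketch of the semaphore idea described above, using only the Java standard library. `fakeQuery` is a hypothetical stand-in for an asynchronous single-key query (a real application would call its driver's async execute method instead), and the `inFlight`/`peak` counters exist only to make the concurrency ceiling observable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class ThrottledFetcher {
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Issue one async query per key, never allowing more than maxInFlight outstanding.
    static <K, V> List<V> fetchAll(List<K> keys, int maxInFlight,
                                   Function<K, CompletableFuture<V>> query) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<V>> futures = new ArrayList<>();
        for (K key : keys) {
            permits.acquireUninterruptibly(); // blocks while maxInFlight queries are pending
            CompletableFuture<V> future;
            try {
                future = query.apply(key);
            } catch (RuntimeException e) {
                permits.release(); // do not leak the permit if the query fails to start
                throw e;
            }
            // Return the permit when the query finishes, whether it succeeded or failed.
            futures.add(future.whenComplete((value, error) -> permits.release()));
        }
        List<V> results = new ArrayList<>();
        for (CompletableFuture<V> future : futures) {
            results.add(future.join()); // results come back in key order
        }
        return results;
    }

    // Hypothetical single-key query that also records how many calls overlap.
    static CompletableFuture<String> fakeQuery(int key) {
        return CompletableFuture.supplyAsync(() -> {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inFlight.decrementAndGet();
            return "row-" + key;
        });
    }

    public static void main(String[] args) {
        List<String> rows = fetchAll(List.of(1, 2, 3, 4, 5, 6, 7, 8), 3, ThrottledFetcher::fakeQuery);
        System.out.println(rows.size() + " rows, peak concurrency " + peak.get());
    }
}
```

The ceiling passed as `maxInFlight` is a tuning knob; the right value depends on the cluster and is not something this sketch can suggest.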

Unexplained query slowness

2015-02-25 Thread Robert Wille
Our Cassandra database just rolled to live last night. I’m looking at our query performance, and overall it is very good, but perhaps 1 in 10,000 queries takes several hundred milliseconds (up to a full second). I’ve grepped for GC in the system.log on all nodes, and there aren’t any recent GC e

nodetool repair options

2015-01-23 Thread Robert Wille
nodetool repair has some options that I don’t understand. Reading the documentation doesn’t exactly make things more clear. I’m running a 2.0.11 cluster with vnodes and a single data center. The docs say "Use -pr to repair only the first range returned by the partitioner". What does this mean?

Why does C* repeatedly compact the same tables over and over?

2015-01-08 Thread Robert Wille
After bootstrapping a node, the node repeatedly compacts the same tables over and over, even though my cluster is completely idle. I’ve noticed the same behavior after extended periods of heavy writes. I realize that during bootstrapping (or extended periods of heavy writes) that compaction coul

Re: Hinted handoff not working

2014-12-16 Thread Robert Wille
4, 2014, at 11:44 PM, Jens Rantil <jens.ran...@tink.se> wrote: Hi Robert, Maybe you need to flush your memtables to actually see the disk usage increase? This applies to both hosts. Cheers, Jens On Sun, Dec 14, 2014 at 3:52 PM, Robert Wille <rwi...@fold3.com> wr

Re: does consistency=ALL for deletes obviate the need for tombstones?

2014-12-16 Thread Robert Wille
Tombstones have to be created. The SSTables are immutable, so the data cannot be deleted. Therefore, a tombstone is required. The value you deleted will be physically removed during compaction. My workload sounds similar to yours in some respects, and I was able to get C* working for me. I have
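Editor's sketch: a toy model of why a delete has to be written as a tombstone when the existing files cannot be modified. Each map stands for an immutable SSTable, an empty `Optional` stands for a tombstone, and `compact` merges oldest to newest. This is a deliberate simplification and not Cassandra's actual implementation; in particular it drops tombstones immediately, whereas Cassandra retains them until gc_grace_seconds has elapsed.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

public class ToyCompaction {
    // Merge immutable tables from oldest to newest; an empty Optional marks a delete.
    static Map<String, String> compact(List<Map<String, Optional<String>>> oldestToNewest) {
        Map<String, Optional<String>> merged = new TreeMap<>();
        for (Map<String, Optional<String>> table : oldestToNewest) {
            merged.putAll(table); // a newer entry (value or tombstone) shadows an older one
        }
        Map<String, String> live = new TreeMap<>();
        // Only now are deleted values physically dropped from the output.
        merged.forEach((key, value) -> value.ifPresent(v -> live.put(key, v)));
        return live;
    }

    public static void main(String[] args) {
        Map<String, Optional<String>> older = Map.of("a", Optional.of("1"), "b", Optional.of("2"));
        Map<String, Optional<String>> newer = Map.of("a", Optional.<String>empty()); // delete of "a"
        System.out.println(compact(List.of(older, newer)));
    }
}
```

In this model the older table still contains `a` after the delete is written; only the compacted output omits it.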

Re: Hinted handoff not working

2014-12-14 Thread Robert Wille
cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__hinted_handoff_enabled > > Rahul > >> On Dec 14, 2014, at 9:46 AM, Robert Wille wrote: >> >> I have a cluster with RF=3. If I shut down one node, add a bunch of data to

Hinted handoff not working

2014-12-14 Thread Robert Wille
I have a cluster with RF=3. If I shut down one node, add a bunch of data to the cluster, I don’t see a bunch of records added to system.hints. Also, du of /var/lib/cassandra/data/system/hints of the nodes that are up shows that hints aren’t being stored. When I start the down node, its data does

Observations/concerns with repair and hinted handoff

2014-12-09 Thread Robert Wille
I have spent a lot of time working with single-node, RF=1 clusters in my development. Before I deploy a cluster to our live environment, I have spent some time learning how to work with a multi-node cluster with RF=3. There were some surprises. I’m wondering if people here can enlighten me. I do

Pros and cons of lots of very small partitions versus fewer larger partitions

2014-12-05 Thread Robert Wille
At the data modeling class at the Cassandra Summit, the instructor said that lots of small partitions are just fine. I’ve heard on this list that that is not true, and that it’s better to cluster small partitions into fewer, larger partitions. Due to conflicting information on this issue, I’d be

Repair taking many snapshots per minute

2014-12-04 Thread Robert Wille
This is a follow-up to my previous post “Cassandra taking snapshots automatically?”. I’ve renamed the thread to better describe the new information I’ve discovered. I have a four node, RF=3, 2.0.11 cluster that was producing snapshots at a prodigious rate. I let the cluster sit idle overnight t

Re: Cassandra taking snapshots automatically?

2014-12-03 Thread Robert Wille
14 at 10:25:12 AM Robert Wille <rwi...@fold3.com> wrote: I built my first multi-node cluster and populated it with a bunch of data, and ran out of space far more quickly than I expected. On one node, I ended up with 76 snapshots, consuming a total of 220 GB of space. I only have 4

Re: Recommissioned node is much smaller

2014-12-03 Thread Robert Wille
really bad shuffle. Did you run removenode on the old host after you took it down (I assume so since all nodes are in UN status)? Is the test node in its own seeds list? On Tue Dec 02 2014 at 4:10:10 PM Robert Wille <rwi...@fold3.com> wrote: I didn’t do anything except kill t

Re: Cassandra taking snapshots automatically?

2014-12-03 Thread Robert Wille
r.html#reference_ds_qfg_n1r_1k__snapshot_before_compaction On Wed Dec 03 2014 at 10:25:12 AM Robert Wille <rwi...@fold3.com> wrote: I built my first multi-node cluster and populated it with a bunch of data, and ran out of space far more quickly than I expected. On one node, I ended

Cassandra taking snapshots automatically?

2014-12-03 Thread Robert Wille
I built my first multi-node cluster and populated it with a bunch of data, and ran out of space far more quickly than I expected. On one node, I ended up with 76 snapshots, consuming a total of 220 GB of space. I only have 40 GB of data. It took several snapshots per hour, sometimes within a min

Re: Recommissioned node is much smaller

2014-12-02 Thread Robert Wille
3:38 PM, Tyler Hobbs <ty...@datastax.com> wrote: On Tue, Dec 2, 2014 at 2:21 PM, Robert Wille <rwi...@fold3.com> wrote: As a test, I took down a node, deleted /var/lib/cassandra and restarted it. Did you decommission or removenode it when you took it down? If you

Re: Recommissioned node is much smaller

2014-12-02 Thread Robert Wille
at 12:21 PM, Robert Wille <rwi...@fold3.com> wrote: As a test, I took down a node, deleted /var/lib/cassandra and restarted it. After it joined the cluster, it’s about 75% the size of its neighbors (both in terms of bytes and numbers of keys). Prior to my test it was approximate

Recommissioned node is much smaller

2014-12-02 Thread Robert Wille
As a test, I took down a node, deleted /var/lib/cassandra and restarted it. After it joined the cluster, it’s about 75% the size of its neighbors (both in terms of bytes and numbers of keys). Prior to my test it was approximately the same size. I have no explanation for why that node would shr

Re: Column family ID mismatch-Error on concurrent schema modifications

2014-11-28 Thread Robert Wille
I would suggest that dynamic table creation is, in general, not a great idea, regardless of the database. I would seriously consider altering your approach to use a fixed set of tables. On Nov 28, 2014, at 1:53 AM, Marcus Olsson <marcus.ols...@ericsson.com> wrote: Hi, We encountered th

Partial replication to a DC

2014-11-25 Thread Robert Wille
Is it possible to replicate a subset of the keyspaces to a data center? For example, if I want to run reports without impacting my production nodes, can I put the relevant column families in a keyspace and create a DC for reporting that replicates only that keyspace? Robert

Rule of thumb for concurrent asynchronous queries?

2014-11-25 Thread Robert Wille
Suppose I have the primary keys for 10,000 rows and I want them all. Is there a rule of thumb for the maximum number of concurrent asynchronous queries I should execute?
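One common way to frame this question is to cap the number of queries in flight rather than launch all 10,000 at once. The sketch below shows that pattern with a semaphore. `fetch_row` is a hypothetical stand-in for a driver's asynchronous query call, and the limit of 128 is an arbitrary placeholder, not a recommended value.

```python
import asyncio


async def fetch_row(key):
    # Hypothetical stand-in for an asynchronous single-row query.
    await asyncio.sleep(0)
    return {"id": key}


async def fetch_all(keys, max_in_flight=128):
    # max_in_flight is an assumed tuning knob; the right value depends on
    # the cluster and has to be found by measurement.
    gate = asyncio.Semaphore(max_in_flight)

    async def bounded(key):
        # At most max_in_flight queries run concurrently.
        async with gate:
            return await fetch_row(key)

    # gather returns results in the same order as the input keys.
    return await asyncio.gather(*(bounded(k) for k in keys))


rows = asyncio.run(fetch_all(range(10_000)))
print(len(rows))
```

Keeping the limit as a parameter makes it easy to try several values against a real cluster and compare latency and server load.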

Re: Getting the counters with the highest values

2014-11-24 Thread Robert Wille
and aggregate at read time), or you can make each row a rolling 24 hours (aggregating at write time), depending on which use case fits your needs better. On Sun Nov 23 2014 at 8:42:11 AM Robert Wille <rwi...@fold3.com> wrote: I’m working on moving a bunch of counters out of our relati

Getting the counters with the highest values

2014-11-23 Thread Robert Wille
I’m working on moving a bunch of counters out of our relational database to Cassandra. For the most part, Cassandra is a very nice fit, except for one feature on our website. We manage a time series of view counts for each document, and display a list of the most popular documents in the last se

LOCAL_* consistency levels

2014-10-14 Thread Robert Wille
I’m wondering if there’s a best practice for an annoyance I’ve come across. Currently all my environments (dev, staging and live) have a single DC. In the future my live environment will most likely have a second DC. When that happens, I’ll want to use LOCAL_* consistency levels. However, if I w

Deleting counters

2014-10-09 Thread Robert Wille
At the Cassandra Summit I became aware that there are issues with deleting counters. I have a few questions about that. What is the bad thing that happens (or can possibly happen) when a counter is deleted? Is it safe to delete an entire row of counters? Is there any 2.0.x version of Cassandr

Re: IN versus multiple asynchronous queries

2014-10-06 Thread Robert Wille
he coordinator memory. On Sat, Oct 4, 2014 at 3:09 PM, Robert Wille <rwi...@fold3.com> wrote: I have a table of small documents (less than 1K) that are often accessed together as a group. The group size is always less than 50. Which produces less load on the server, one query us

Cassandra + Solr

2014-10-04 Thread Robert Wille
I am architecting a solution for moving a large number of documents out of our MySQL database to C*. We use Solr to index these documents. I’ve recently become aware of a few different packages that integrate C* and Solr. At first blush, this seems like the perfect fit, as it would eliminate a c

IN versus multiple asynchronous queries

2014-10-04 Thread Robert Wille
I have a table of small documents (less than 1K) that are often accessed together as a group. The group size is always less than 50. Which produces less load on the server, one query using an IN clause to get all 50 back together, or 50 concurrent queries? Which one is fastest? Thanks Robert

2.0 or 2.1?

2014-09-12 Thread Robert Wille
I’m in a fairly unique position. Almost a year ago I developed code to migrate part of our MySQL database to Cassandra. Shortly after 2.0.6 was released, I was on the verge of rolling it to live when my project got shelved, and my team got put on a completely different product. In a month or two

Re: Manually deleting sstables

2014-08-21 Thread Robert Wille
> > 2) Are there any other recommended procedures for this? 0) stop writes to columnfamily 1) TRUNCATE columnfamily; 2) nodetool clearsnapshot # on the snapshot that results 3) DROP columnfamily; My two cents here is that this process is extremely difficult to automate, making testing that i
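The quoted procedure (stop writes, TRUNCATE, clear the resulting snapshot, DROP) can be written down as an ordered list of commands, which is one starting point for scripting it. This is only a sketch: the keyspace and table names are hypothetical, nothing is executed, and stopping writes (step 0) is assumed to happen in the application.

```python
def drop_table_steps(keyspace, table):
    """Return the quoted procedure as (tool, command) pairs, in order."""
    qualified = keyspace + "." + table
    return [
        # 1) Remove the data; Cassandra may take a snapshot when truncating.
        ("cqlsh", "TRUNCATE " + qualified + ";"),
        # 2) Run on each node. Given only a keyspace, this clears every
        #    snapshot in that keyspace; -t <tag> would narrow it to one.
        ("shell", "nodetool clearsnapshot " + keyspace),
        # 3) Remove the table definition itself.
        ("cqlsh", "DROP TABLE " + qualified + ";"),
    ]


steps = drop_table_steps("test_ks", "documents")
for tool, command in steps:
    print(tool + ": " + command)
```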

Re: Securing Cassandra database

2014-04-05 Thread Robert Wille
Password protection doesn’t protect against an engineer accidentally running test cases using the live config file instead of the test config file. To protect against that, our RDBMS system will only accept connections from certain IP addresses. Is there an equivalent thing in Cassandra, or should

Re: Flushing after dropping a column family

2014-02-26 Thread Robert Wille
I use truncate between my test cases. Never had a problem with one test case inheriting the data from the previous one. I’m using a single node, so that may be why. On 2/26/14, 9:27 AM, "Ben Hood" <0x6e6...@gmail.com> wrote: >On Wed, Feb 26, 2014 at 3:58 PM, DuyHai Doan wrote: >> Try truncate fo

Re: [OT]: Can I have a non-delivering subscription?

2014-02-22 Thread Robert Wille
Yeah, it’s called a rule. Set one up to delete everything from user@cassandra.apache.org. On 2/22/14, 10:32 AM, "Paul "LeoNerd" Evans" wrote: >A question about the mailing list itself, rather than Cassandra. > >I've re-subscribed simply because I have to be subscribed in order to >send to the li
