Re: Are Triggers in Cassandra 2.1.2 performace Hog??

2015-01-07 Thread Jonathan Haddad
+1. Don't use triggers. On Wed, Jan 7, 2015 at 10:49 AM, Robert Coli wrote: > On Wed, Jan 7, 2015 at 5:40 AM, Asit KAUSHIK > wrote: >> >> We are trying to integrate elasticsearch with Cassandra and as the river >> plugin uses select * from any table it seems to be bad performance choice. >> So

Re: Storing PDF data on Cassandra db

2015-01-13 Thread Jonathan Haddad
For a new user, there's no point in learning Thrift if that user intends on upgrading past the version that they start with. Thrift is a deprecated protocol and there's no new functionality going into it. In 3.0 the sstable format is being upgraded to work primarily with native CQL partitions / r

Re: number of replicas per data center?

2015-01-18 Thread Jonathan Haddad
Personally I wouldn't go < 3 unless you have a good reason. On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton wrote: > How do people normally setup multiple data center replication in terms of > number of *local* replicas? > > So say you have two data centers, do you have 2 local replicas, for a > t

Re: number of replicas per data center?

2015-01-18 Thread Jonathan Haddad
local > replicas per datacenter? > > On Sun, Jan 18, 2015 at 7:53 PM, Jonathan Haddad > wrote: > >> Personally I wouldn't go < 3 unless you have a good reason. >> >> >> On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton wrote: >> >>> How d

Re: Disabling the write ahead log with 2 data centers?

2015-01-23 Thread Jonathan Haddad
Well... it depends. Are you saying whenever a machine dies, for any reason, you'd bootstrap a new one in its place? Or do you just not care about the data? There are cases where it might be ok (if you're using Cassandra as a cache) but if it's your source of truth I think this is likely to bite

Re: Performance difference between Regular Statement Vs PreparedStatement

2015-01-29 Thread Jonathan Haddad
+1 to everything Eric said. The penalty of not using token aware routing increases as you add nodes, load, and network overhead. This is kind of like batch statements. People use them in dev, with 1 node, and think they're great to help with performance. But when you put them in production... n

Re: Upgrading from Cassandra 1.2.14 to Cassandra 2.10

2015-01-29 Thread Jonathan Haddad
> Once they have fully joined the cluster I would like to decommission a single Cassandra 1.2.14 instance, and repeat. Do not do this. Upgrade your nodes in place. On Thu Jan 29 2015 at 6:17:26 AM Sibbald, Charles wrote: > Hi All, > > I am looking into the possibility of upgrading from Cassa

Re: Writing the same column frequently - anti pattern?

2015-02-05 Thread Jonathan Haddad
Well... this is actually only true if your server times are perfectly in sync. The reality is if 1 server is 50ms ahead and 1 is 50ms behind, you will actually end up with unpredictable results. On Thu Feb 05 2015 at 4:22:43 PM Philip Thompson < philip.thomp...@datastax.com> wrote: > You are corr
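To make the clock-skew point concrete, here is a minimal sketch of last-write-wins resolution with two skewed server clocks. It is a toy model, not Cassandra code: the `Write` class, the millisecond units, and the offsets are assumptions chosen to match the 50ms example in the message.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    true_time_ms: int     # when the write really happened (hypothetical)
    clock_offset_ms: int  # skew of the server that stamped the write

    @property
    def timestamp_ms(self) -> int:
        # The stored timestamp comes from the skewed server clock.
        return self.true_time_ms + self.clock_offset_ms


def last_write_wins(writes):
    """Return the write with the highest stored timestamp."""
    return max(writes, key=lambda w: w.timestamp_ms)


# One server runs 50 ms ahead, the other 50 ms behind.
first = Write("old", true_time_ms=1000, clock_offset_ms=+50)   # stamped 1050
second = Write("new", true_time_ms=1040, clock_offset_ms=-50)  # stamped 990

winner = last_write_wins([first, second])
print(winner.value)  # prints "old": the write that happened first wins
```

The second write happened 40 ms later in real time, yet it loses because its stamped timestamp is lower.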

Re: Recommissioned a node

2015-02-11 Thread Jonathan Haddad
It could, because the tombstones that mark data deleted may have been removed. There would be nothing that says "this data is gone". If you're worried about it, turn up your gc grace seconds. Also, don't revive nodes back into a cluster with old data sitting on them. On Wed Feb 11 2015 at 11:18
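A small simulation of why purged tombstones let deleted data come back. This is a simplified model, assuming one cell per key, integer times, and a `repair` that simply keeps the newest cell; all names here are hypothetical and none of this is Cassandra's implementation.

```python
TOMBSTONE = object()  # marker for a deleted value


def compact(replica, now, gc_grace):
    """Drop tombstones older than gc_grace, as compaction eventually may."""
    return {
        key: (ts, val)
        for key, (ts, val) in replica.items()
        if not (val is TOMBSTONE and now - ts > gc_grace)
    }


def repair(a, b):
    """Merge two replicas, keeping the newest cell per key."""
    merged = dict(a)
    for key, cell in b.items():
        if key not in merged or cell[0] > merged[key][0]:
            merged[key] = cell
    return merged


def read(replica, key):
    cell = replica.get(key)
    return None if cell is None or cell[1] is TOMBSTONE else cell[1]


GC_GRACE = 10

# t=0: both nodes hold the row, then node_b goes down.
node_a = {"k": (0, "v1")}
node_b = {"k": (0, "v1")}

# t=5: the row is deleted; only node_a records the tombstone.
node_a["k"] = (5, TOMBSTONE)

# Case 1: node_b returns within gc_grace, so the tombstone still wins.
in_time = repair(compact(node_a, now=8, gc_grace=GC_GRACE), node_b)
print(read(in_time, "k"))  # None

# Case 2: node_b returns after the tombstone was purged; the old value reappears.
too_late = repair(compact(node_a, now=20, gc_grace=GC_GRACE), node_b)
print(read(too_late, "k"))  # v1
```

In the second case nothing is left that says "this data is gone", so the stale copy on the revived node is treated as live data.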

Re: Recommissioned a node

2015-02-11 Thread Jonathan Haddad
And after decreasing your RF (rare but happens) On Wed Feb 11 2015 at 11:31:38 AM Robert Coli wrote: > On Wed, Feb 11, 2015 at 11:20 AM, Jonathan Haddad > wrote: > >> It could, because the tombstones that mark data deleted may have been >> removed. There would be nothing

Re: PySpark and Cassandra integration

2015-02-20 Thread Jonathan Haddad
Awesome! On Fri Feb 20 2015 at 10:23:54 AM Marcelo Valle (BLOOMBERG/ LONDON) < mvallemil...@bloomberg.net> wrote: > I will try it for sure Frens, very nice! > Thanks for sharing! > > From: user@cassandra.apache.org > Subject: Re:PySpark and Cassandra integration > > Hi all, > > Wanted to let you

Re: One node taking more resources than others in the ring

2015-02-23 Thread Jonathan Haddad
If you're not using prepared statements you won't get any token aware routing. That's an even better option than round robin since it reduces the number of nodes involved. On Mon, Feb 23, 2015 at 4:48 PM Robert Coli wrote: > On Mon, Feb 23, 2015 at 3:42 PM, Jaydeep Chovatia < > chovatia.jayd...@g

Re: Node stuck in joining the ring

2015-02-26 Thread Jonathan Haddad
I've seen this before, when I tried to be clever and add nodes of a different major version into a cluster. Any chance that's what's happening here? > On Feb 25, 2015, at 4:52 PM, Robert Coli wrote: > >> On Wed, Feb 25, 2015 at 3:38 PM, Batranut Bogdan wrote: >> I have a new node that I want

Re: using or in select query in cassandra

2015-03-02 Thread Jonathan Haddad
I'd like to add that in() is usually a bad idea. It is convenient, but not really what you want in production. Go with Jens' original suggestion of multiple queries. I recommend reading Ryan Svihla's post on why in() is generally a bad thing: http://lostechies.com/ryansvihla/2014/09/22/cassandra

Re: Running Cassandra on mixed OS

2015-03-02 Thread Jonathan Haddad
I would really not recommend this. There's enough issues that can come up with a distributed database that can make it hard to pinpoint problems. In an ideal world, every machine would be completely identical. Don't set yourself up for fail. Pin the OS & all packages to specific versions. On M

Re: Documentation of batch statements

2015-03-03 Thread Jonathan Haddad
Actually, that's not true either. It's technically possible for a batch to be partially applied in the current implementation, even with logged batches. "atomic" is used incorrectly here, imo, since more than 2 states can be visible, unapplied & applied. On Tue, Mar 3, 2015 at 9:26 AM Michael Dy

Re: Documentation of batch statements

2015-03-03 Thread Jonathan Haddad
if no intermediate states can be observed. It seems to jump directly from the initial state to the result state." - Concepts, Techniques, and Models of Computer Programming By Peter Van-Roy, Seif Haridi On Tue, Mar 3, 2015 at 2:30 PM Tyler Hobbs wrote: > > On Tue, Mar 3, 2015 at 2:39 PM,

Re: CQL 3.x Update ...USING TIMESTAMP...

2015-03-12 Thread Jonathan Haddad
In most datacenters you're going to see significant variance in your server times. Likely > 20ms between servers in the same rack. Even google, using atomic clocks, has 1-7ms variance. [1] I would +1 Tyler's advice here, as using the clocks is only valid if clocks are perfectly sync'ed, which t

Re: nodetool help

2015-03-16 Thread Jonathan Haddad
Be careful w/ that script if you're looking to upgrade, it nukes your data directory. sudo rm -rf /var/lib/cassandra/data/system/* On Mon, Mar 16, 2015 at 1:41 PM, Ali Akhtar wrote: > https://gist.github.com/aliakhtar/3649e412787034156cbb > > Best run from a fresh ubuntu server. > > On Tue, Mar

Re: upgrade from 1.0.12 to 1.1.12

2015-03-24 Thread Jonathan Haddad
Streaming is repair, adding & removing nodes. In general it's a bad idea to do any streaming op when you've got an upgrade in progress. On Tue, Mar 24, 2015 at 3:14 AM Jason Wee wrote: > Hello, > > Reading this documentation http://www.datastax.com/docs/ > 1.1/install/upgrading > > If you are u

Re: upgrade from 1.0.12 to 1.1.12

2015-03-25 Thread Jonathan Haddad
need to run nodetool upgradesstables as > stipulated in version 1.1.3 ? > > > > jason > > > > On Wed, Mar 25, 2015 at 1:04 AM, Jonathan Haddad > wrote: > >> Streaming is repair, adding & removing nodes. In general it's a bad > >> idea to do any

Re: Arbitrary nested tree hierarchy data model

2015-03-27 Thread Jonathan Haddad
I'd be interested to see that data model. I think the entire list would benefit! On Thu, Mar 26, 2015 at 8:16 PM Robert Wille wrote: > I have a cluster which stores tree structures. I keep several hundred > unrelated trees. The largest has about 180 million nodes, and the smallest > has 1 node. T

Re: upgrade from 1.0.12 to 1.1.12

2015-03-27 Thread Jonathan Haddad
Running upgrade is a noop if the tables don't need to be upgraded. I consider the cost of this to be less than the cost of missing an upgrade. On Thu, Mar 26, 2015 at 4:23 PM Robert Coli wrote: > On Wed, Mar 25, 2015 at 7:16 PM, Jonathan Haddad > wrote: > >> There'

Re: ('Unable to complete the operation against any hosts', {})

2015-03-28 Thread Jonathan Haddad
Don't use batches for this. Use a lot of async queries. https://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/ Jon > On Mar 27, 2015, at 5:24 AM, Rahul Bhardwaj > wrote: > > Hi All, > > > > We are using cassandra version 2.1.2 with cqlsh 5.0.1 (cl

Re: Column value not getting updated

2015-03-31 Thread Jonathan Haddad
It's not enough to set up ntp, you're going to need to force the time to sync. ntp is only meant to correct for drift. You can either use ntpdate or I think there's a flag for ntpd (that I can't remember and am in a rush out the door) that you can use to force it to adjust to the correct time. O

Re: Frequent timeout issues

2015-04-02 Thread Jonathan Haddad
@Daemeon you may want to read through https://issues.apache.org/jira/browse/CASSANDRA-8150, there are perfectly valid cases for heap > 16gb. On Thu, Apr 2, 2015 at 10:07 AM daemeon reiydelle wrote: > May not be relevant, but what is the "default" heap size you have > deployed. Should be no more

Re: What's to think of when increasing disk size on Cassandra nodes?

2015-04-08 Thread Jonathan Haddad
Agreed with Jack. Cassandra is a database meant to scale horizontally by adding nodes, and what you're describing is vertical scale. Aside from the vertical scale issue, unless you're running a very specific workload (time series data w/ Date Tiered Compaction) and you REALLY know what you're doi

Re: Moving SSTables from one disk to another

2015-04-10 Thread Jonathan Haddad
I had submitted this issue which could have had (in theory) some serious performance benefit when using JBOD: https://issues.apache.org/jira/browse/CASSANDRA-8868 However, it was pointed out to me that https://issues.apache.org/jira/browse/CASSANDRA-6696 will be a better solution in a lot of cases

Re: rpc_timeout error

2015-04-11 Thread Jonathan Haddad
Maybe you're inserting nulls? If you're inserting nulls, those show up as tombstones. On Sat, Apr 11, 2015 at 1:32 PM Amlan Roy wrote: > Hi, > > I am trying to query a table from cqlsh and I get the following error: > *Request did not complete within rpc_timeout.* > > I found the following mess

Re: rpc_timeout error

2015-04-11 Thread Jonathan Haddad
> Average tombstones per slice (last five minutes): 40169.0 > > Regards. > On 12-Apr-2015, at 2:38 am, Jonathan Haddad wrote: > > Maybe you're inserting nulls? If you're inserting nulls, those show up as > tombstones. > > On Sat, Apr 11, 2015 at 1:32 PM Amlan Roy wr

Re: Connecting to Cassandra cluster in AWS from local network

2015-04-20 Thread Jonathan Haddad
Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson wrote: > Hi all, > > > > I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, > just as a POC. Cassandra servers con

Re: Creating 'Put' requests

2015-04-24 Thread Jonathan Haddad
There's also Achilles: https://github.com/doanduyhai/Achilles On Fri, Apr 24, 2015 at 1:21 PM Jens Rantil wrote: > Matthew, > > Maybe this could also be of interest: > http://projects.spring.io/spring-data-cassandra/ > > Cheers, > Jens > > On Fri, Apr 24, 2015 at 12:50 PM, Phil Yang wrote: > >>

Re: Creating 'Put' requests

2015-04-24 Thread Jonathan Haddad
To add to Phil's point, there's no circumstance in which I would use an unlogged batch, under load I have yet to hear it do anything other than increase GC pauses. On Fri, Apr 24, 2015 at 11:50 AM Phil Yang wrote: > 2015-04-23 22:16 GMT+08:00 Matthew Johnson : >> >> In HBase, we do something lik

Re: Inserting null values

2015-04-29 Thread Jonathan Haddad
Enough tombstones can inflate the size of an SSTable causing issues during compaction (imagine a multi tb sstable w/ 99% tombstones) even if there's no clustering key defined. Perhaps an edge case, but worth considering. On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens wrote: > Correct me if I'm wr

Re: Consistency

2015-04-29 Thread Jonathan Haddad
There's a lot going on, reading through some docs is probably your best bet: http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html On Wed, Apr 29, 2015 at 8:57 AM Nikolay Tikhonov wrote: > Hi, > > I try to understand how to Cassandra supports data consistency and

Re: Consistency

2015-04-30 Thread Jonathan Haddad
You can connect to any node in the cluster to issue a query. For that request, it's called the coordinator. The coordinator will figure out which node to talk to. The DataStax native drivers can use what's called token aware queries, in that they'll connect to one of the nodes that owns the data
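A rough sketch of what "token aware" means: given a partition's token, the client can work out which nodes own it and send the query straight to one of them. This is a toy ring, assuming one token per node and simple clockwise replica placement; the `Ring` class and the token values are hypothetical, and real drivers hash the partition key (Murmur3 by default) to obtain the token.

```python
import bisect


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: mapping of node name -> token owned by that node
        self.entries = sorted((tok, node) for node, tok in node_tokens.items())
        self.tokens = [tok for tok, _ in self.entries]

    def replicas_for_token(self, token, rf):
        """Return the rf nodes that own the token, walking the ring clockwise."""
        # The first node whose token is >= the key's token owns it; wrap at the end.
        start = bisect.bisect_left(self.tokens, token) % len(self.entries)
        return [
            self.entries[(start + i) % len(self.entries)][1]
            for i in range(rf)
        ]


ring = Ring({"a": 0, "b": 100, "c": 200})

print(ring.replicas_for_token(150, rf=2))  # ['c', 'a']
print(ring.replicas_for_token(250, rf=2))  # ['a', 'b'] (wraps around)
```

A client that is not token aware picks any node as coordinator, and that coordinator then has to forward the request to one of the owners.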

Re: DateTieredCompactionStrategy and static columns

2015-04-30 Thread Jonathan Haddad
I suspect this will kill the benefit of DTCS, but haven't tested it to be 100% here. The benefit of DTCS is that sstables are selected for compaction based on the age of the data, not their size. When you mix TTL'ed data and non TTL'ed data, you end up screwing with the "drop the entire SSTable"

Re: Hive support on Cassandra

2015-05-07 Thread Jonathan Haddad
You may find Spark to be useful. You can do SQL, but also use Python, Scala or Java. I wrote a post last week on getting started with DataFrames & Spark, which you can register as tables & query using Hive compatible SQL: http://rustyrazorblade.com/2015/05/on-the-bleeding-edge-pyspark-dataframes-

Re: Viewing Cassandra's Internal table Structure in a CQL world

2015-05-13 Thread Jonathan Haddad
In Cassandra 3.0 there will be a massive rewrite of what an sstable even is, and the cli will be totally useless to inspect it. there won't be "column names" anymore, timestamps will be stored once per row (assuming they're the same) and a whole slew of other optimizations. If you want to look at

Re: Does Cassandra CQL supports 'Create Table as Select'?

2015-05-19 Thread Jonathan Haddad
It's not built into Cassandra. You'll probably want to take a look at Apache Spark & the DataStax connector. https://github.com/datastax/spark-cassandra-connector Jon On Tue, May 19, 2015 at 10:29 PM amit tewari wrote: > Hi > > We would like to have the ability of being able to create new tab

Re: Does Cassandra CQL supports 'Create Table as Select'?

2015-05-19 Thread Jonathan Haddad
Here's a simple example I did a little while ago that might be helpful: https://github.com/rustyrazorblade/spark-data-migration On Tue, May 19, 2015 at 10:53 PM Jonathan Haddad wrote: > It's not built into Cassandra. You'll probably want to take a look at > Apache

Re: Multiple cassandra instances per physical node

2015-05-21 Thread Jonathan Haddad
If you run it in a container with dedicated IPs it'll work just fine. Just be sure you aren't using the same machine to replicate its own data. On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar wrote: > +1. > > I agree we need to be able to run multiple server instances on one > physical mach

Re: Does Cassandra support "delayed" cross-datacenter replication?

2015-05-21 Thread Jonathan Haddad
No. On Thu, May 21, 2015 at 7:07 AM Eax Melanhovich wrote: > Say I would like to have a replica cluster, which state is a state of > real cluster 12 hours ago. Does Cassandra support such feature? > > -- > Best regards, > Eax Melanhovich > http://eax.me/ >

Re: Multiple cassandra instances per physical node

2015-05-21 Thread Jonathan Haddad
> > Also ... wrt the container talk, is that a Docker container you're > talking about? > > > > On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad > wrote: > >> If you run it in a container with dedicated IPs it'll work just fine. >> Just be sure you

Re: Multiple cassandra instances per physical node

2015-05-21 Thread Jonathan Haddad
their own IPs and bind to default > ports. > > @Jonathan Haddad thanks for the blog post. To ensure the same host does > not replicate its own data, would I basically need the nodes on a single > host to be labeled as one rack? (Assuming I use vnodes) > > On Thu, May 21, 2015

Re: Multiple cassandra instances per physical node

2015-05-24 Thread Jonathan Haddad
What impact would vnodes have on strong consistency? I think the problem you're describing exists with or without them. On Sat, May 23, 2015 at 2:30 PM Nate McCall wrote: > >> So my question is: suppose I take a 12 disk JBOD and run 2 Cassandra >> nodes (each with 5 data disks, 1 commit log dis

Re: 10000+ CF support from Cassandra

2015-05-28 Thread Jonathan Haddad
While Graham's suggestion will let you collapse a bunch of tables into a single one, it'll likely result in so many other problems it won't be worth the effort. I strongly advise against this approach. First off, different workloads need different tuning. Compaction strategies, gc_grace_seconds,

Re: 10000+ CF support from Cassandra

2015-06-01 Thread Jonathan Haddad
> Sorry for this naive question but how important is this tuning? Can this have a huge impact in production? Massive. Here's a graph of when we did some JVM tuning at my previous company: http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png About an or

Re: Reading too many tombstones

2015-06-04 Thread Jonathan Haddad
DateTiered is fantastic if you've got time series, TTLed data. That means no updates to old data. On Thu, Jun 4, 2015 at 10:58 AM Aiman Parvaiz wrote: > Hi everyone, > We are running a 10 node Cassandra 2.0.9 without vnode cluster. We are > running in to a issue where we are reading too many to

Re: Atomic behavior and efficiency of a DELETE query with an IN clause

2015-06-10 Thread Jonathan Haddad
Batches don't work like that. It's possible for some to succeed, and later, the rest will. Atomic is the incorrect word to use, it's more like "eventually they will all go through". Do not use IN(), use a whole bunch of prepared statements asynchronously. On Wed, Jun 10, 2015 at 9:26 AM Sotirio
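One possible shape for "a whole bunch of prepared statements asynchronously", as a sketch. It assumes `session` behaves like a DataStax Python driver `Session` (`prepare`, `execute_async`, and futures with `result()`); the `delete_many` helper and its `max_in_flight` cap are hypothetical, and the table and column names are taken from the question in this thread.

```python
def delete_many(session, ids, max_in_flight=100):
    """Delete each id with its own prepared statement, executed asynchronously."""
    prepared = session.prepare(
        "DELETE FROM MastersOfTheUniverse WHERE mastersID = ?"
    )
    pending = []
    sent = 0
    for master_id in ids:
        pending.append(session.execute_async(prepared, (master_id,)))
        sent += 1
        if len(pending) >= max_in_flight:
            # Wait for the in-flight requests before sending more.
            for future in pending:
                future.result()
            pending = []
    # Wait for whatever is left.
    for future in pending:
        future.result()
    return sent
```

Each delete is an independent request, so with a token-aware driver it can go directly to a replica for that key instead of piling all the work on a single coordinator.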

Re: Atomic behavior and efficiency of a DELETE query with an IN clause

2015-06-12 Thread Jonathan Haddad
> DELETE FROM MastersOfTheUniverse WHERE mastersID = ?; > > and execute it asynchronously 3000 times or add 3000 of these DELETE (bound) > prepared statements to a BATCH statement executed asynchronously? > > > > > > > On Wednesday, June 10, 2015 9:51 AM, Jonat

Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Jonathan Haddad
How much memory do you have? Recently people have been seeing really great performance using G1GC with heaps > 8GB and offheap memtable objects. On Thu, Jun 18, 2015 at 1:31 AM Jason Wee wrote: > okay, iirc memtable has been removed off heap, google and got this > http://www.datastax.com/dev/bl

Re: nodetool repair

2015-06-18 Thread Jonathan Haddad
If you're using DSE, you can schedule it automatically using the repair service. If you're open source, check out Spotify cassandra reaper, it'll manage it for you. https://github.com/spotify/cassandra-reaper On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay < jean.tremb...@zen-innovations.com> w

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Jonathan Haddad
So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently using the thrift api to execute CQL until the native driver is out of beta. I'm a little biased in recommending it, since I'm one of the primary authors. If you've got cqlengine specific questions, head to the mailing lis

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Jonathan Haddad
one > > > On Tue, Nov 26, 2013 at 1:19 PM, Jonathan Haddad wrote: > >> So, for cqlengine (https://github.com/cqlengine/cqlengine), we're >> currently using the thrift api to execute CQL until the native driver is >> out of beta. I'm a little biased in r

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Jonathan Haddad
and have contributed some to its development. >>>>>> >>>>>> I have been careful to not push too fast on features until we need >>>>>> them. For example, we have just started using prepared statements - >>>>>> working >

Re: Choosing python client lib for Cassandra

2013-11-26 Thread Jonathan Haddad
No, 2.7 only. On Tue, Nov 26, 2013 at 3:04 PM, Kumar Ranjan wrote: > Hi Jonathan - Does cqlengine have support for python 2.6 ? > > > On Tue, Nov 26, 2013 at 4:17 PM, Jonathan Haddad wrote: > >> cqlengine supports batch queries, see the docs here: >> http://cqlengin

Re: cassandra performance problems

2013-12-05 Thread Jonathan Haddad
Do you mean high CPU usage or high load avg? (20 indicates load avg to me). High load avg means the CPU is waiting on something. Check "iostat -dmx 1 100" to check your disk stats, you'll see the columns that indicate mb/s read & write as well as % utilization. Once you understand the bottlenec

new project - Under Siege

2013-12-05 Thread Jonathan Haddad
I've recently pushed up a new project to github, which we've named Under Siege. It's a java agent for reporting Cassandra metrics to statsd. We're in the process of deploying it to our production clusters. Tested against Cassandra 1.2.11. The metrics library seems to change on every release of

Re: cassandra backup

2013-12-06 Thread Jonathan Haddad
I believe SSTables are written to a temporary file then moved. If I remember correctly, tools like tablesnap listen for the inotify event IN_MOVED_TO. This should handle the "try to back up sstable while in mid-write" issue. On Fri, Dec 6, 2013 at 5:39 AM, Michael Theroux wrote: > Hi Marcelo,
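The write-to-a-temporary-file-then-move pattern described above can be sketched generically as follows. This illustrates the pattern only and is not Cassandra's actual code; `write_then_move` and the `.tmp` suffix are hypothetical names.

```python
import os
import tempfile


def write_then_move(directory, final_name, data):
    """Write data to a temporary file, then rename it into place."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
        tmp.flush()
        os.fsync(tmp.fileno())  # make sure the bytes are on disk first
    final_path = os.path.join(directory, final_name)
    # Renaming within one directory makes the complete file appear at once.
    os.replace(tmp_path, final_path)
    return final_path


with tempfile.TemporaryDirectory() as d:
    path = write_then_move(d, "data.db", b"payload")
    print(os.listdir(d))  # ['data.db']
```

A watcher on the directory that reacts only to the move event (IN_MOVED_TO on Linux) never sees the file while it is still being written.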

Re: Cassandra ring not behaving like a ring

2014-01-16 Thread Jonathan Haddad
Please include the output of "nodetool ring", otherwise no one can help you. On Thu, Jan 16, 2014 at 12:45 PM, Narendra Sharma wrote: > Any pointers? I am planning to do rolling restart of the cluster nodes to > see if it will help. > On Jan 15, 2014 2:59 PM, "Narendra Sharma" > wrote: > >> RF

Re: Cassandra ring not behaving like a ring

2014-01-16 Thread Jonathan Haddad
> I am new to Cassandra Environment, does the order of the ring matter, as > long as the member joins the group? > > Yogi > > > On Thu, Jan 16, 2014 at 12:49 PM, Jonathan Haddad wrote: > >> Please include the output of "nodetool ring", otherwise no one can hel

Re: Recommended OS

2014-02-12 Thread Jonathan Haddad
I just would advise against it because it's going to be difficult to narrow down what's causing problems. For instance, if you have "Node A" which is performing GC, it will affect query times on "Node B" which is trying to satisfy a quorum read. "Node B" might actually have very low load, and it

abusing cassandra's multi DC abilities

2014-02-21 Thread Jonathan Haddad
Upfront TLDR: We want to do stuff (reindex documents, bust cache) when changed data from DC1 shows up in DC2. Full Story: We're planning on adding data centers throughout the US. Our platform is used for business communications. Each DC currently utilizes elastic search and redis. A message can

abusing cassandra's multi DC abilities

2014-02-22 Thread Jonathan Haddad
Upfront TLDR: We want to do stuff (reindex documents, bust cache) when changed data from DC1 shows up in DC2. Full Story: We're planning on adding data centers throughout the US. Our platform is used for business communications. Each DC currently utilizes elastic search and redis. A message can

Re: abusing cassandra's multi DC abilities

2014-02-24 Thread Jonathan Haddad
'd urge you > to really consider another approach. > > Best, > Todd > > > On Saturday, February 22, 2014, Jonathan Haddad wrote: > >> Upfront TLDR: We want to do stuff (reindex documents, bust cache) when >> changed data from DC1 shows up in DC2. >> >

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
I have a nagging memory of reading about issues with virtualization and not actually having durable versions of your data even after an fsync (within the VM). Googling around led me to this post: http://petercai.com/virtualization-is-bad-for-database-integrity/ It's possible you're hitting this

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
d. > > Does Cassandra quiesce the file system after a snapshot using fsfreeze or > xfs_freeze? Somehow I doubt it... > > > On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad wrote: > >> I have a nagging memory of reading about issues with virtualization and >> not a

Re: Cassandra Snapshots giving me corrupted SSTables in the logs

2014-03-28 Thread Jonathan Haddad
Fri, Mar 28, 2014 at 1:32 PM, Laing, Michael wrote: > +1 for tablesnap > > > On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad wrote: > >> I will +1 the recommendation on using tablesnap over EBS. S3 is at least >> predictable. >> >> Additionally, from a pra

Re: Tune cache MB settings per table.

2014-06-01 Thread Jonathan Haddad
I think of all the areas you could spend your time, this will have the least returns. The OS will keep the most frequently used data in memory. There's no reason to require cassandra to do it. If you're curious as to what's been loaded into ram, try Al Tobey's pcstat utility. https://github.com

Re: Customized Compaction Strategy: Dev Questions

2014-06-04 Thread Jonathan Haddad
I'd suggest creating 1 table per day, and dropping the tables you don't need once you're done. On Wed, Jun 4, 2014 at 10:44 AM, Redmumba wrote: > Sorry, yes, that is what I was looking to do--i.e., create a > "TopologicalCompactionStrategy" or similar. > > > On Wed, Jun 4, 2014 at 10:40 AM, Rus

Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)

2014-06-05 Thread Jonathan Haddad
You should read through the token docs, it has examples and specifications: http://cassandra.apache.org/doc/cql3/CQL.html#tokenFun On Thu, Jun 5, 2014 at 10:22 PM, Kevin Burton wrote: > I'm building a new schema which I need to read externally by paging > through the result set. > > My understa

Re: Bad Request: Type error: cannot assign result of function token (type bigint) to id (type int)

2014-06-05 Thread Jonathan Haddad
Sorry, the datastax docs are actually a bit better: http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html Jon On Thu, Jun 5, 2014 at 10:46 PM, Jonathan Haddad wrote: > You should read through the token docs, it has examples and > specifications: http://cassandra.apac

Re: VPC AWS

2014-06-06 Thread Jonathan Haddad
This may not help you with the migration, but it may with maintenance & management. I just put up a blog post on managing VPC security groups with a tool I open sourced at my previous company. If you're going to have different VPCs (staging / prod), it might help with managing security groups. h

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
Your other option is to fire off async queries. It's pretty straightforward w/ the java or python drivers. On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle wrote: > I was taking a look at Cassandra anti-patterns list: > > http://www.datastax.com/documentation/cassandra/2.0/cassandra/arch

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
> in (0, 1, 2)), I would do several, one for each value of "values" array > above. > In my head, this would mean more connections to Cassandra and the same > amount of work, right? What would be the advantage? > > []s > > > > > 2014-06-19 22:01 GMT-03:0

Re: Best way to do a multi_get using CQL

2014-06-19 Thread Jonathan Haddad
it be a problem? > Or when you use async the driver reuses the connection? > > []s > > > 2014-06-19 22:16 GMT-03:00 Jonathan Haddad : > >> If you use async and your driver is token aware, it will go to the >> proper node, rather than requiring the coordinator to do so.

Re: Best way to do a multi_get using CQL

2014-06-20 Thread Jonathan Haddad
>>>>> network >>>>>>> round trips. >>>>>>> >>>>>>> With large numbers of queries you will still want to make sure you >>>>>>> split them into manageable batches before sending them, to control >>>

Re: Adding large text blob causes read timeout...

2014-06-24 Thread Jonathan Haddad
Can you run your query in the cli after setting "tracing on"? On Mon, Jun 23, 2014 at 11:32 PM, DuyHai Doan wrote: > Yes but adding the extra one ends up by * 1000. The limit in CQL3 > specifies the number of logical rows, not the number of physical columns in > the storage engine > Le 24 juin 20

Re: Triggers and their use in data indexing

2014-07-03 Thread Jonathan Haddad
Triggers only execute on the local coordinator. I would also not recommend using them. On Thu, Jul 3, 2014 at 9:58 AM, Robert Coli wrote: > On Thu, Jul 3, 2014 at 4:41 AM, Bèrto ëd Sèra > wrote: >> >> Now the question: is there any way to use triggers so that they will >> locally index data fro

Re: Triggers and their use in data indexing

2014-07-03 Thread Jonathan Haddad
4 at 10:04 AM, Jonathan Haddad wrote: > Triggers only execute on the local coordinator. I would also not > recommend using them. > > On Thu, Jul 3, 2014 at 9:58 AM, Robert Coli wrote: >> On Thu, Jul 3, 2014 at 4:41 AM, Bèrto ëd Sèra >> wrote: >>> >>> N

Re: Write Inconsistency to update a row

2014-07-03 Thread Jonathan Haddad
Did you make sure all the nodes are on the same time? If they're not, you'll get some weird results. On Thu, Jul 3, 2014 at 10:30 AM, Sávio S. Teles de Oliveira wrote: >> Are you sure all the nodes are working at that time? > > > Yes. They are working. > >> I would suggest increasing the replica

Re: Write Inconsistency to update a row

2014-07-03 Thread Jonathan Haddad
Make sure you've got ntpd running, otherwise this will be an ongoing nightmare. On Thu, Jul 3, 2014 at 5:00 PM, Sávio S. Teles de Oliveira wrote: > I have synchronized the clocks and works! > > > 2014-07-03 20:58 GMT-03:00 Sávio S. Teles de Oliveira > : > >>> Did you make sure all the nodes are o

Re: Cassandra use cases/Strengths/Weakness

2014-07-08 Thread Jonathan Haddad
I've used various databases in production for over 10 years. Each has strengths and weaknesses. I ran Cassandra for just shy of 2 years in production as part of both development teams and operations, and I only hit 1 serious problem that Rob mentioned. Ideally C* would have guarded against it, b

Re: horizontal query scaling issues follow on

2014-07-17 Thread Jonathan Haddad
The problem with starting without vnodes is moving to them is a bit hairy. In particular, nodetool shuffle has been reported to take an extremely long time (days, weeks). I would start with vnodes if you have any intent on using them. On Thu, Jul 17, 2014 at 6:03 PM, Robert Coli wrote: > On Thu

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
Hey Marcelo, You should check out spark. It intelligently deals with a lot of the issues you're mentioning. Al Tobey did a walkthrough of how to set up the OSS side of things here: http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html It'll be less work than writing a M/

Re: map reduce for Cassandra

2014-07-21 Thread Jonathan Haddad
python + Cassandra > will be supported just in the next version, but I would like to be wrong... > > Best regards, > Marcelo Valle. > > > > 2014-07-21 13:06 GMT-03:00 Jonathan Haddad : > >> Hey Marcelo, >> >> You should check out spark. It intelligently deal

Re: cluster rebalancing…

2014-07-22 Thread Jonathan Haddad
You don't need to specify tokens. The new node gets them automatically. > On Jul 22, 2014, at 7:03 PM, Kevin Burton wrote: > > So , shouldn't it be easy to rebalance a cluster? > > I'm not super excited to type out 200 commands to move around individual > tokens. > > I realize that this isn'

Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
This is incorrect. Network Topology w/ Vnodes will be fine, assuming you've got RF= # of racks. For each token, replicas are chosen based on the strategy. Essentially, you could have a wild imbalance in token ownership, but it wouldn't matter because the replicas would be distributed across the
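
To make the rack argument above concrete, here is a minimal sketch in Python. It is not Cassandra's NetworkTopologyStrategy code: it assumes a single data center, and the ring layout, node names, and `pick_replicas` helper are all hypothetical. It walks the ring clockwise from a token and takes the first node found in each rack that has not been used yet.

```python
from bisect import bisect_left


def pick_replicas(ring, token, rf):
    """Pick rf replica nodes for a token, preferring distinct racks.

    ring is a list of (token, node, rack) tuples sorted by token.
    Simplified, hypothetical model of rack-aware placement in one data center.
    """
    tokens = [t for t, _, _ in ring]
    # First vnode at or after the token, wrapping around the ring.
    start = bisect_left(tokens, token) % len(ring)
    replicas, seen_racks, skipped = [], set(), []
    for i in range(len(ring)):
        _, node, rack = ring[(start + i) % len(ring)]
        if node in replicas or node in skipped:
            continue  # another vnode of a node we already looked at
        if rack in seen_racks:
            skipped.append(node)  # keep as a fallback if racks run out
        else:
            replicas.append(node)
            seen_racks.add(rack)
        if len(replicas) == rf:
            return replicas
    # Fewer racks than rf: fill up from the skipped nodes in ring order.
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


# Deliberately lopsided ownership: n1 holds four of the six vnodes.
RING = [
    (0, "n1", "r1"), (10, "n1", "r1"), (20, "n1", "r1"),
    (30, "n2", "r2"), (40, "n3", "r3"), (50, "n1", "r1"),
]
```

In this toy ring, with rf equal to the number of racks, `pick_replicas(RING, t, 3)` lands one replica in each rack for any token `t`, even though n1 owns most of the vnodes.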

Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
* When I say wild imbalance, I do not mean all tokens on 1 node in the cluster, I really should have said slightly imbalanced On Tue, Aug 5, 2014 at 8:43 AM, Jonathan Haddad wrote: > This is incorrect. Network Topology w/ Vnodes will be fine, assuming > you've got RF= # of racks

Re: vnode and NetworkTopologyStrategy: not playing well together ?

2014-08-05 Thread Jonathan Haddad
s and > NetworkTopologyStrategy, it's better to define a single (logical) rack // due > to "carefully chosen tokens" vs "randomly-generated token" clash. > > I don't see other options left. > Do you see other ones ? > > Regards, > Dominique &

Re: too many open files

2014-08-09 Thread Jonathan Haddad
It really doesn't need to be this complicated. You only need 1 session per application. It's thread safe and manages the connection pool for you. http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Session.html On Sat, Aug 9, 2014 at 1:29 PM, Kevin Burton wrote: > Another idea

Re: Migrating data from 2 node cluster to a 3 node cluster

2013-07-04 Thread Jonathan Haddad
You should run a nodetool repair after you copy the data over. You could also use the sstable loader, which would stream the data to the proper node. On Thu, Jul 4, 2013 at 10:03 AM, srmore wrote: > We are planning to move data from a 2 node cluster to a 3 node cluster. We > are planning to co

Re: too many open files

2013-07-14 Thread Jonathan Haddad
Are you using leveled compaction? If so, what do you have the file size set at? If you're using the defaults, you'll have a ton of really small files. I believe Albert Tobey recommended using 256MB for the table sstable_size_in_mb to avoid this problem. On Sun, Jul 14, 2013 at 5:10 PM, Paul In
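
A rough back-of-the-envelope for why the SSTable size matters, as a Python sketch. The helper name and the 100 GB data size are made up for illustration, and the 5 MB figure is only an assumed example of a small setting, not a value taken from this thread. It also assumes every SSTable is written at exactly the target size, which real compaction won't quite do.

```python
import math


def approx_sstable_count(data_mb, sstable_size_in_mb):
    """Rough SSTable count if every SSTable hits the target size exactly."""
    if sstable_size_in_mb <= 0:
        raise ValueError("sstable_size_in_mb must be positive")
    return math.ceil(data_mb / sstable_size_in_mb)


DATA_MB = 100 * 1024  # hypothetical 100 GB of data on one node

small_files = approx_sstable_count(DATA_MB, 5)    # assumed small setting
large_files = approx_sstable_count(DATA_MB, 256)  # the 256MB suggestion
```

Under these assumptions that works out to 20480 SSTables at 5 MB versus 400 at 256 MB. Each SSTable is several files on disk (data, index, filter, and so on), so the open file count grows faster than the SSTable count.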

Re: CPU Bound Writes

2013-07-20 Thread Jonathan Haddad
Everything is written to the commit log. In the case of a crash, cassandra recovers by replaying the log. On Sat, Jul 20, 2013 at 9:03 AM, Mohammad Hajjat wrote: > Patricia, > Thanks for the info. So are you saying that the *whole* data is being > written on disk in the commit log, not just som
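
The log-then-replay idea can be shown with a toy key-value store in Python. This is a sketch of the general write-ahead-log pattern only, not how Cassandra implements its commit log: the `TinyStore` class, the JSON line format, and the fsync on every write are all simplifying assumptions.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy store: every write is appended to a log, then applied in memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._replay()

    def _replay(self):
        # Rebuild in-memory state by re-applying every logged write in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]

    def write(self, key, value):
        # The whole mutation goes to the log before memory is updated.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.memtable[key] = value


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "commit.log")
        store = TinyStore(path)
        store.write("a", 1)
        store.write("a", 2)
        store.write("b", 3)
        # Simulate a crash: the in-memory state is lost, the log file remains.
        del store
        recovered = TinyStore(path)
        return recovered.memtable
```

Calling `demo()` writes three mutations, throws the in-memory state away, and rebuilds it from the log alone. The log has to hold the full data of each write for that to work, not just a pointer to it.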

Re: VM dimensions for running Cassandra and Hadoop

2013-07-30 Thread Jonathan Haddad
Having just enough RAM to hold the JVM's heap generally isn't a good idea unless you're not planning on doing much with the machine. Any memory not allocated to a process will generally be put to good use serving as page cache. See here: http://en.wikipedia.org/wiki/Page_cache Jon On Tue, Jul 3

Re: CQL and undefined columns

2013-07-31 Thread Jonathan Haddad
It's advised you do not use compact storage, as it's primarily for backwards compatibility. The first of these options is COMPACT STORAGE. This option is mainly targeted towards backward compatibility with some table definitions created before CQL3. But it also provides a slightly more compact layou

Re: Adding my first node to another one...

2013-08-01 Thread Jonathan Haddad
I recommend you do not add 1.2 nodes to a 1.1 cluster. We tried this, and ran into many issues. Specifically, the data will not correctly stream from the 1.1 nodes to the 1.2, and it will never bootstrap correctly. On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis wrote: > Hi Arthur, > > Thank

Re: CQL and undefined columns

2013-08-05 Thread Jonathan Haddad
> On Wed, Jul 31, 2013 at 3:10 PM, Jonathan Haddad wrote: > >> It's advised you do not use compact storage, as it's primarily for >> backwards compatibility. >> > > Many Apache Cassandra experts do not advise against using COMPACT STORAGE. > [1] Use CQL3 n

Re: CQL and undefined columns

2013-08-05 Thread Jonathan Haddad
't do with them". Frankly it makes me > nuts because... > > This little known web company named google produced a white paper about > what a ColumnFamily data model could do > http://en.wikipedia.org/wiki/BigTable . Cassandra was built on the > BigTable/ColumnFamily data model. There was al
