+1. Don't use triggers.
On Wed, Jan 7, 2015 at 10:49 AM, Robert Coli wrote:
> On Wed, Jan 7, 2015 at 5:40 AM, Asit KAUSHIK
> wrote:
>>
>> We are trying to integrate elasticsearch with Cassandra, and as the river
>> plugin uses select * from any table it seems to be a bad performance choice.
>> So
For a new user, there's no point in learning Thrift if that user intends to
upgrade past the version they start with. Thrift is a deprecated
protocol and there's no new functionality going into it. In 3.0 the
sstable format is being upgraded to work primarily with native CQL
partitions / r
Personally I wouldn't go < 3 unless you have a good reason.
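For reference, 3 replicas in each datacenter looks roughly like this; a minimal sketch with the DataStax Python driver (the keyspace name, DC names, and contact point are hypothetical):

    from cassandra.cluster import Cluster

    cluster = Cluster(['10.0.0.1'])   # hypothetical contact point
    session = cluster.connect()

    # 3 replicas in each datacenter; DC names must match what your snitch reports.
    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS app_data
        WITH replication = {
            'class': 'NetworkTopologyStrategy',
            'dc_east': 3,
            'dc_west': 3
        }
    """)
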
On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton wrote:
> How do people normally setup multiple data center replication in terms of
> number of *local* replicas?
>
> So say you have two data centers, do you have 2 local replicas, for a
> t
> local
> replicas per datacenter?
>
> On Sun, Jan 18, 2015 at 7:53 PM, Jonathan Haddad
> wrote:
>
>> Personally I wouldn't go < 3 unless you have a good reason.
>>
>>
>> On Sun Jan 18 2015 at 7:52:10 PM Kevin Burton wrote:
>>
>>> How d
Well... it depends. Are you saying that whenever a machine dies, for any
reason, you'd bootstrap a new one in its place? Or do you just not care
about the data?
There are cases where it might be ok (if you're using Cassandra as a cache),
but if it's your source of truth, I think this is likely to bite
+1 to everything Eric said.
The penalty of not using token aware routing increases as you add nodes,
load, and network overhead. This is kind of like batch statements. People
use them in dev, with 1 node, and think they're great to help with
performance. But when you put them in production... n
> Once they have fully joined the cluster I would like to decommission a
> single Cassandra 1.2.14 instance, and repeat.
Do not do this. Upgrade your nodes in place.
On Thu Jan 29 2015 at 6:17:26 AM Sibbald, Charles
wrote:
> Hi All,
>
> I am looking into the possibility of upgrading from Cassa
Well... this is actually only true if your server times are perfectly in
sync. The reality is that if one server is 50ms ahead and another is 50ms
behind, you will actually end up with unpredictable results.
On Thu Feb 05 2015 at 4:22:43 PM Philip Thompson <
philip.thomp...@datastax.com> wrote:
> You are corr
It could, because the tombstones that mark data deleted may have been
removed. There would be nothing that says "this data is gone".
If you're worried about it, turn up your gc grace seconds. Also, don't
revive nodes back into a cluster with old data sitting on them.
On Wed Feb 11 2015 at 11:18
And after decreasing your RF (rare but happens)
On Wed Feb 11 2015 at 11:31:38 AM Robert Coli wrote:
> On Wed, Feb 11, 2015 at 11:20 AM, Jonathan Haddad
> wrote:
>
>> It could, because the tombstones that mark data deleted may have been
>> removed. There would be nothing
Awesome!
On Fri Feb 20 2015 at 10:23:54 AM Marcelo Valle (BLOOMBERG/ LONDON) <
mvallemil...@bloomberg.net> wrote:
> I will try it for sure Frens, very nice!
> Thanks for sharing!
>
> From: user@cassandra.apache.org
> Subject: Re:PySpark and Cassandra integration
>
> Hi all,
>
> Wanted to let you
If you're not using prepared statements you won't get any token aware
routing. That's an even better option than round robin since it reduces the
number of nodes involved.
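As a sketch of what that looks like with the Python driver (table, keyspace, DC name, and address are hypothetical; newer driver versions may configure this through execution profiles instead). Token aware routing only kicks in once the statement is prepared, since that's how the driver learns the partition key:

    import uuid
    from cassandra.cluster import Cluster
    from cassandra.policies import TokenAwarePolicy, DCAwareRoundRobinPolicy

    cluster = Cluster(
        ['10.0.0.1'],
        load_balancing_policy=TokenAwarePolicy(
            DCAwareRoundRobinPolicy(local_dc='dc_east')),
    )
    session = cluster.connect('app_data')

    # Prepared once, reused for every request; the driver hashes the bound
    # partition key and sends the request straight to a node that owns it.
    get_user = session.prepare("SELECT * FROM users WHERE user_id = ?")
    rows = session.execute(get_user, [uuid.uuid4()])   # hypothetical lookup key
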
On Mon, Feb 23, 2015 at 4:48 PM Robert Coli wrote:
> On Mon, Feb 23, 2015 at 3:42 PM, Jaydeep Chovatia <
> chovatia.jayd...@g
I've seen this before, when I tried to be clever and add nodes of a different
major version into a cluster. Any chance that's what's happening here?
> On Feb 25, 2015, at 4:52 PM, Robert Coli wrote:
>
>> On Wed, Feb 25, 2015 at 3:38 PM, Batranut Bogdan wrote:
>> I have a new node that I want
I'd like to add that in() is usually a bad idea. It is convenient, but not
really what you want in production. Go with Jens' original suggestion of
multiple queries.
I recommend reading Ryan Svihla's post on why in() is generally a bad
thing:
http://lostechies.com/ryansvihla/2014/09/22/cassandra
I would really not recommend this. There are enough issues that can come up
with a distributed database to make it hard to pinpoint problems.
In an ideal world, every machine would be completely identical. Don't set
yourself up for failure. Pin the OS & all packages to specific versions.
On M
Actually, that's not true either. It's technically possible for a batch to
be partially applied in the current implementation, even with logged
batches. "Atomic" is used incorrectly here, imo, since more states than just
unapplied & applied can be visible.
On Tue, Mar 3, 2015 at 9:26 AM Michael Dy
if no intermediate states can be observed. It seems
to jump directly from the initial state to the result state."
- Concepts, Techniques, and Models of Computer Programming By Peter
Van-Roy, Seif Haridi
On Tue, Mar 3, 2015 at 2:30 PM Tyler Hobbs wrote:
>
> On Tue, Mar 3, 2015 at 2:39 PM,
In most datacenters you're going to see significant variance in your server
times. Likely > 20ms between servers in the same rack. Even google, using
atomic clocks, has 1-7ms variance. [1]
I would +1 Tyler's advice here, as using the clocks is only valid if clocks
are perfectly sync'ed, which t
Be careful w/ that script if you're looking to upgrade, it nukes your data
directory.
sudo rm -rf /var/lib/cassandra/data/system/*
On Mon, Mar 16, 2015 at 1:41 PM, Ali Akhtar wrote:
> https://gist.github.com/aliakhtar/3649e412787034156cbb
>
> Best run from a fresh ubuntu server.
>
> On Tue, Mar
Streaming happens during repair and when adding & removing nodes. In general
it's a bad idea to do any streaming op when you've got an upgrade in progress.
On Tue, Mar 24, 2015 at 3:14 AM Jason Wee wrote:
> Hello,
>
> Reading this documentation http://www.datastax.com/docs/
> 1.1/install/upgrading
>
> If you are u
> need to run nodetool upgradesstables as
> stipulated in version 1.1.3 ?
> >
> > jason
> >
> > On Wed, Mar 25, 2015 at 1:04 AM, Jonathan Haddad
> wrote:
> >> Streaming is repair, adding & removing nodes. In general it's a bad
> >> idea to do any
I'd be interested to see that data model. I think the entire list would
benefit!
On Thu, Mar 26, 2015 at 8:16 PM Robert Wille wrote:
> I have a cluster which stores tree structures. I keep several hundred
> unrelated trees. The largest has about 180 million nodes, and the smallest
> has 1 node. T
Running upgradesstables is a noop if the tables don't need to be upgraded. I
consider the cost of this to be less than the cost of missing an upgrade.
On Thu, Mar 26, 2015 at 4:23 PM Robert Coli wrote:
> On Wed, Mar 25, 2015 at 7:16 PM, Jonathan Haddad
> wrote:
>
>> There'
Don't use batches for this. Use a lot of async queries.
https://lostechies.com/ryansvihla/2014/08/28/cassandra-batch-loading-without-the-batch-keyword/
Jon
> On Mar 27, 2015, at 5:24 AM, Rahul Bhardwaj
> wrote:
>
> Hi All,
>
>
>
> We are using cassandra version 2.1.2 with cqlsh 5.0.1 (cl
It's not enough to set up ntp; you're going to need to force the time to
sync, since ntp is only meant to correct for drift.
You can either use ntpdate or I think there's a flag for ntpd (that I can't
remember and am in a rush out the door) that you can use to force it to
adjust to the correct time.
O
@Daemeon you may want to read through
https://issues.apache.org/jira/browse/CASSANDRA-8150, there are perfectly
valid cases for heap > 16gb.
On Thu, Apr 2, 2015 at 10:07 AM daemeon reiydelle
wrote:
> May not be relevant, but what is the "default" heap size you have
> deployed. Should be no more
Agreed with Jack. Cassandra is a database meant to scale horizontally by
adding nodes, and what you're describing is vertical scale.
Aside from the vertical scale issue, unless you're running a very specific
workload (time series data w/ Date Tiered Compaction) and you REALLY know
what you're doi
I had submitted this issue which could have had (in theory) some
serious performance benefit when using JBOD:
https://issues.apache.org/jira/browse/CASSANDRA-8868
However, it was pointed out to me that
https://issues.apache.org/jira/browse/CASSANDRA-6696 will be a better
solution in a lot of cases
Maybe you're inserting nulls? If you're inserting nulls, those show up as
tombstones.
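For example, with the Python driver (hypothetical table and cluster; binding None is what produces the tombstone):

    import uuid
    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1']).connect('app_data')   # hypothetical
    event_id, payload = uuid.uuid4(), 'hello'

    # Binding None writes a null, i.e. a tombstone, for that column.
    insert_all = session.prepare(
        "INSERT INTO events (id, payload, note) VALUES (?, ?, ?)")
    session.execute(insert_all, [event_id, payload, None])   # tombstone on 'note'

    # To avoid it, only insert the columns you actually have values for.
    insert_some = session.prepare("INSERT INTO events (id, payload) VALUES (?, ?)")
    session.execute(insert_some, [event_id, payload])        # no tombstone
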
On Sat, Apr 11, 2015 at 1:32 PM Amlan Roy wrote:
> Hi,
>
> I am trying to query a table from cqlsh and I get the following error:
> *Request did not complete within rpc_timeout.*
>
> I found the following mess
> Average tombstones per slice (last five minutes): 40169.0
>
> Regards.
> On 12-Apr-2015, at 2:38 am, Jonathan Haddad wrote:
>
> Maybe you're inserting nulls? If you're inserting nulls, those show up as
> tombstones.
>
> On Sat, Apr 11, 2015 at 1:32 PM Amlan Roy wr
Ideally you'll be on the same network, but if you can't be, you'll need to
use the public ip in listen_address.
On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson
wrote:
> Hi all,
>
>
>
> I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes,
> just as a POC. Cassandra servers con
There's also Achilles: https://github.com/doanduyhai/Achilles
On Fri, Apr 24, 2015 at 1:21 PM Jens Rantil wrote:
> Matthew,
>
> Maybe this could also be of interest:
> http://projects.spring.io/spring-data-cassandra/
>
> Cheers,
> Jens
>
> On Fri, Apr 24, 2015 at 12:50 PM, Phil Yang wrote:
>
>>
To add to Phil's point, there's no circumstance in which I would use an
unlogged batch; under load, I have yet to hear of it doing anything other
than increasing GC pauses.
On Fri, Apr 24, 2015 at 11:50 AM Phil Yang wrote:
> 2015-04-23 22:16 GMT+08:00 Matthew Johnson :
>>
>> In HBase, we do something lik
Enough tombstones can inflate the size of an SSTable, causing issues during
compaction (imagine a multi-TB sstable w/ 99% tombstones), even if there's
no clustering key defined.
Perhaps an edge case, but worth considering.
On Wed, Apr 29, 2015 at 9:17 AM Eric Stevens wrote:
> Correct me if I'm wr
There's a lot going on, reading through some docs is probably your best
bet:
http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html
On Wed, Apr 29, 2015 at 8:57 AM Nikolay Tikhonov
wrote:
> Hi,
>
> I try to understand how to Cassandra supports data consistency and
You can connect to any node in the cluster to issue a query. For that
request, that node is called the coordinator. The coordinator will figure out
which node to talk to. The DataStax native drivers can use what's called
token aware queries, in that they'll connect to one of the nodes that owns
the data
I suspect this will kill the benefit of DTCS, but I haven't tested it to be
100% sure here.
The benefit of DTCS is that sstables are selected for compaction based on
the age of the data, not their size. When you mix TTL'ed data and non
TTL'ed data, you end up screwing with the "drop the entire SSTable" optimization.
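If you do go DTCS, the usual shape is to keep the TTL'ed time series in its own table so whole sstables expire together. A sketch (hypothetical schema and cluster; option names as they were in the 2.0/2.1 era):

    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1']).connect('app_data')   # hypothetical

    # Every cell gets the same TTL, so entire sstables age out together and
    # DTCS can drop them without rewriting data.
    session.execute("""
        CREATE TABLE IF NOT EXISTS sensor_readings (
            sensor_id uuid,
            ts timestamp,
            value double,
            PRIMARY KEY (sensor_id, ts)
        )
        WITH compaction = {'class': 'DateTieredCompactionStrategy'}
        AND default_time_to_live = 604800
    """)
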
You may find Spark to be useful. You can do SQL, but also use Python,
Scala or Java.
I wrote a post last week on getting started with DataFrames & Spark, which
you can register as tables & query using Hive compatible SQL:
http://rustyrazorblade.com/2015/05/on-the-bleeding-edge-pyspark-dataframes-
In Cassandra 3.0 there will be a massive rewrite of what an sstable
even is, and the cli will be totally useless to inspect it. There
won't be "column names" anymore, timestamps will be stored once per
row (assuming they're the same) and a whole slew of other
optimizations. If you want to look at
It's not built into Cassandra. You'll probably want to take a look at
Apache Spark & the DataStax connector.
https://github.com/datastax/spark-cassandra-connector
Jon
On Tue, May 19, 2015 at 10:29 PM amit tewari wrote:
> Hi
>
> We would like to have the ability of being able to create new tab
Here's a simple example I did a little while ago that might be helpful:
https://github.com/rustyrazorblade/spark-data-migration
On Tue, May 19, 2015 at 10:53 PM Jonathan Haddad wrote:
> It's not built into Cassandra. You'll probably want to take a look at
> Apache
If you run it in a container with dedicated IPs it'll work just fine. Just
be sure you aren't using the same machine to replicate its own data.
On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar
wrote:
> +1.
>
> I agree we need to be able to run multiple server instances on one
> physical mach
No.
On Thu, May 21, 2015 at 7:07 AM Eax Melanhovich wrote:
> Say I would like to have a replica cluster, which state is a state of
> real cluster 12 hours ago. Does Cassandra support such feature?
>
> --
> Best regards,
> Eax Melanhovich
> http://eax.me/
>
>
> Also ... wrt the container talk, is that a Docker container you're
> talking about?
>
>
>
> On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad
> wrote:
>
>> If you run it in a container with dedicated IPs it'll work just fine.
>> Just be sure you
> their own IPs and bind to default
> ports.
>
> @Jonathan Haddad thanks for the blog post. To ensure the same host does
> not replicate its own data, would I basically need the nodes on a single
> host to be labeled as one rack? (Assuming I use vnodes)
>
> On Thu, May 21, 2015
What impact would vnodes have on strong consistency? I think the problem
you're describing exists with or without them.
On Sat, May 23, 2015 at 2:30 PM Nate McCall wrote:
>
>> So my question is: suppose I take a 12 disk JBOD and run 2 Cassandra
>> nodes (each with 5 data disks, 1 commit log dis
While Graham's suggestion will let you collapse a bunch of tables into a
single one, it'll likely result in so many other problems it won't be worth
the effort. I strongly advise against this approach.
First off, different workloads need different tuning. Compaction
strategies, gc_grace_seconds,
> Sorry for this naive question but how important is this tuning? Can this
have a huge impact in production?
Massive. Here's a graph of when we did some JVM tuning at my previous
company:
http://33.media.tumblr.com/5d0efca7288dc969c1ac4fc3d36e0151/tumblr_inline_mzvj254quj1rd24f4.png
About an or
DateTiered is fantastic if you've got time series, TTLed data. That means
no updates to old data.
On Thu, Jun 4, 2015 at 10:58 AM Aiman Parvaiz wrote:
> Hi everyone,
> We are running a 10 node Cassandra 2.0.9 cluster without vnodes. We are
> running into an issue where we are reading too many to
Batches don't work like that. It's possible for some statements to succeed
now and for the rest to be applied later. Atomic is the incorrect word to
use; it's more like "eventually they will all go through".
Do not use IN(), use a whole bunch of prepared statements asynchronously.
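Roughly what that looks like with the Python driver (a sketch; the ids and cluster address are hypothetical, and execute_concurrent_with_args caps how many requests are in flight at once):

    from cassandra.cluster import Cluster
    from cassandra.concurrent import execute_concurrent_with_args

    session = Cluster(['10.0.0.1']).connect('app_data')   # hypothetical

    delete_master = session.prepare(
        "DELETE FROM MastersOfTheUniverse WHERE mastersID = ?")

    master_ids = [1, 2, 3]   # stand-in for the ~3000 ids (type is hypothetical)

    # Runs the deletes asynchronously, at most 50 outstanding at a time.
    results = execute_concurrent_with_args(
        session, delete_master, [(mid,) for mid in master_ids],
        concurrency=50, raise_on_first_error=False)

    for success, result_or_exc in results:
        if not success:
            print(result_or_exc)   # failed deletes can be retried individually
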
On Wed, Jun 10, 2015 at 9:26 AM Sotirio
> DELETE FROM MastersOfTheUniverse WHERE mastersID = ?;
>
> and execute it asynchronously 3000 times or add 3000 of these DELETE (bound)
> prepared statements to a BATCH statement executed asynchronously?
>
>
>
>
>
>
> On Wednesday, June 10, 2015 9:51 AM, Jonat
How much memory do you have? Recently people have been seeing really great
performance using G1GC with heaps > 8GB and offheap memtable objects.
On Thu, Jun 18, 2015 at 1:31 AM Jason Wee wrote:
> okay, iirc memtable has been removed off heap, google and got this
> http://www.datastax.com/dev/bl
If you're using DSE, you can schedule it automatically using the repair
service. If you're open source, check out Spotify's Cassandra Reaper; it'll
manage it for you.
https://github.com/spotify/cassandra-reaper
On Thu, Jun 18, 2015 at 12:36 PM Jean Tremblay <
jean.tremb...@zen-innovations.com> w
So, for cqlengine (https://github.com/cqlengine/cqlengine), we're currently
using the thrift api to execute CQL until the native driver is out of beta.
I'm a little biased in recommending it, since I'm one of the primary
authors. If you've got cqlengine specific questions, head to the mailing
lis
one
>
>
> On Tue, Nov 26, 2013 at 1:19 PM, Jonathan Haddad wrote:
>
>> So, for cqlengine (https://github.com/cqlengine/cqlengine), we're
>> currently using the thrift api to execute CQL until the native driver is
>> out of beta. I'm a little biased in r
and have contributed some to its development.
>>>>>>
>>>>>> I have been careful to not push too fast on features until we need
>>>>>> them. For example, we have just started using prepared statements -
>>>>>> working
>
No, 2.7 only.
On Tue, Nov 26, 2013 at 3:04 PM, Kumar Ranjan wrote:
> Hi Jonathan - Does cqlengine have support for python 2.6 ?
>
>
> On Tue, Nov 26, 2013 at 4:17 PM, Jonathan Haddad wrote:
>
>> cqlengine supports batch queries, see the docs here:
>> http://cqlengin
Do you mean high CPU usage or high load avg? (20 indicates load avg to
me). High load avg means the CPU is waiting on something.
Check "iostat -dmx 1 100" to check your disk stats, you'll see the columns
that indicate mb/s read & write as well as % utilization.
Once you understand the bottlenec
I've recently pushed up a new project to github, which we've named Under
Siege. It's a java agent for reporting Cassandra metrics to statsd. We're
in the process of deploying it to our production clusters. Tested against
Cassandra 1.2.11. The metrics library seems to change on every release of
I believe SSTables are written to a temporary file then moved. If I
remember correctly, tools like tablesnap listen for the inotify event
IN_MOVED_TO. This should handle the "try to back up sstable while in
mid-write" issue.
On Fri, Dec 6, 2013 at 5:39 AM, Michael Theroux wrote:
> Hi Marcelo,
Please include the output of "nodetool ring", otherwise no one can help you.
On Thu, Jan 16, 2014 at 12:45 PM, Narendra Sharma wrote:
> Any pointers? I am planning to do rolling restart of the cluster nodes to
> see if it will help.
> On Jan 15, 2014 2:59 PM, "Narendra Sharma"
> wrote:
>
>> RF
> I am new to Cassandra Environment, does the order of the ring matter, as
> long as the member joins the group?
>
> Yogi
>
>
> On Thu, Jan 16, 2014 at 12:49 PM, Jonathan Haddad wrote:
>
>> Please include the output of "nodetool ring", otherwise no one can hel
I would just advise against it because it's going to be difficult to narrow
down what's causing problems. For instance, if you have "Node A" which is
performing GC, it will affect query times on "Node B" which is trying to
satisfy a quorum read. "Node B" might actually have very low load, and it
Upfront TLDR: We want to do stuff (reindex documents, bust cache) when
changed data from DC1 shows up in DC2.
Full Story:
We're planning on adding data centers throughout the US. Our platform is
used for business communications. Each DC currently utilizes elastic
search and redis. A message can
'd urge you
> to really consider another approach.
>
> Best,
> Todd
>
>
> On Saturday, February 22, 2014, Jonathan Haddad wrote:
>
>> Upfront TLDR: We want to do stuff (reindex documents, bust cache) when
>> changed data from DC1 shows up in DC2.
>>
>
I have a nagging memory of reading about issues with virtualization and not
actually having durable versions of your data even after an fsync (within
the VM). Googling around led me to this post:
http://petercai.com/virtualization-is-bad-for-database-integrity/
It's possible you're hitting this
d.
>
> Does Cassandra quiesce the file system after a snapshot using fsfreeze or
> xfs_freeze? Somehow I doubt it...
>
>
> On Fri, Mar 28, 2014 at 4:17 PM, Jonathan Haddad wrote:
>
>> I have a nagging memory of reading about issues with virtualization and
>> not a
Fri, Mar 28, 2014 at 1:32 PM, Laing, Michael
wrote:
> +1 for tablesnap
>
>
> On Fri, Mar 28, 2014 at 4:28 PM, Jonathan Haddad wrote:
>
>> I will +1 the recommendation on using tablesnap over EBS. S3 is at least
>> predictable.
>>
>> Additionally, from a pra
I think that of all the areas you could spend your time on, this will have
the least return. The OS will keep the most frequently used data in memory.
There's no reason to require cassandra to do it.
If you're curious as to what's been loaded into ram, try Al Tobey's pcstat
utility. https://github.com
I'd suggest creating 1 table per day, and dropping the tables you don't
need once you're done.
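A rough sketch of the pattern with the Python driver (table naming scheme, schema, and retention period are all hypothetical):

    from datetime import date, timedelta
    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1']).connect('app_data')   # hypothetical

    def table_for(day):
        return 'audit_%s' % day.strftime('%Y%m%d')   # e.g. audit_20140604

    # Create today's table up front...
    session.execute("""
        CREATE TABLE IF NOT EXISTS %s (
            id uuid PRIMARY KEY,
            body text
        )""" % table_for(date.today()))

    # ...and expiring a whole day later is one cheap DROP instead of
    # millions of tombstones from individual DELETEs.
    session.execute("DROP TABLE IF EXISTS %s"
                    % table_for(date.today() - timedelta(days=30)))
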
On Wed, Jun 4, 2014 at 10:44 AM, Redmumba wrote:
> Sorry, yes, that is what I was looking to do--i.e., create a
> "TopologicalCompactionStrategy" or similar.
>
>
> On Wed, Jun 4, 2014 at 10:40 AM, Rus
You should read through the token docs, it has examples and specifications:
http://cassandra.apache.org/doc/cql3/CQL.html#tokenFun
On Thu, Jun 5, 2014 at 10:22 PM, Kevin Burton wrote:
> I'm building a new schema which I need to read externally by paging
> through the result set.
>
> My understa
Sorry, the datastax docs are actually a bit better:
http://www.datastax.com/documentation/cql/3.0/cql/cql_using/paging_c.html
Jon
On Thu, Jun 5, 2014 at 10:46 PM, Jonathan Haddad wrote:
> You should read through the token docs, it has examples and
> specifications: http://cassandra.apac
This may not help you with the migration, but it may with maintenance &
management. I just put up a blog post on managing VPC security groups with
a tool I open sourced at my previous company. If you're going to have
different VPCs (staging / prod), it might help with managing security
groups.
h
Your other option is to fire off async queries. It's pretty
straightforward w/ the java or python drivers.
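A sketch with the Python driver (hypothetical query, ids, and cluster; each execute_async returns a future you can block on later or attach callbacks to):

    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1']).connect('app_data')   # hypothetical

    select_user = session.prepare("SELECT * FROM users WHERE user_id = ?")
    user_ids = []   # hypothetical list of keys to look up

    # Fire all the requests without waiting on each round trip...
    futures = [session.execute_async(select_user, [uid]) for uid in user_ids]

    # ...then collect the results; result() blocks and re-raises any
    # per-query error.
    rows = [future.result() for future in futures]

If the list is huge, cap how many futures are outstanding at a time (or use execute_concurrent from cassandra.concurrent) rather than firing them all at once.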
On Thu, Jun 19, 2014 at 5:56 PM, Marcelo Elias Del Valle
wrote:
> I was taking a look at Cassandra anti-patterns list:
>
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/arch
> in (0, 1, 2)), I would do several, one for each value of "values" array
> above.
> In my head, this would mean more connections to Cassandra and the same
> amount of work, right? What would be the advantage?
>
> []s
>
>
>
>
> 2014-06-19 22:01 GMT-03:0
it be a problem?
> Or when you use async the driver reuses the connection?
>
> []s
>
>
> 2014-06-19 22:16 GMT-03:00 Jonathan Haddad :
>
>> If you use async and your driver is token aware, it will go to the
>> proper node, rather than requiring the coordinator to do so.
>>>>>> network
>>>>>>> round trips.
>>>>>>>
>>>>>>> With large numbers of queries you will still want to make sure you
>>>>>>> split them into manageable batches before sending them, to control
>>>
Can you run your query in the cli after setting "tracing on"?
On Mon, Jun 23, 2014 at 11:32 PM, DuyHai Doan wrote:
> Yes but adding the extra one ends up by * 1000. The limit in CQL3
> specifies the number of logical rows, not the number of physical columns in
> the storage engine
> Le 24 juin 20
Triggers only execute on the local coordinator. I would also not
recommend using them.
On Thu, Jul 3, 2014 at 9:58 AM, Robert Coli wrote:
> On Thu, Jul 3, 2014 at 4:41 AM, Bèrto ëd Sèra
> wrote:
>>
>> Now the question: is there any way to use triggers so that they will
>> locally index data fro
4 at 10:04 AM, Jonathan Haddad wrote:
> Triggers only execute on the local coordinator. I would also not
> recommend using them.
>
> On Thu, Jul 3, 2014 at 9:58 AM, Robert Coli wrote:
>> On Thu, Jul 3, 2014 at 4:41 AM, Bèrto ëd Sèra
>> wrote:
>>>
>>> N
Did you make sure all the nodes are on the same time? If they're not,
you'll get some weird results.
On Thu, Jul 3, 2014 at 10:30 AM, Sávio S. Teles de Oliveira
wrote:
>> Are you sure all the nodes are working at that time?
>
>
> Yes. They are working.
>
>> I would suggest increasing the replica
Make sure you've got ntpd running; otherwise this will be an ongoing nightmare.
On Thu, Jul 3, 2014 at 5:00 PM, Sávio S. Teles de Oliveira
wrote:
> I have synchronized the clocks and works!
>
>
> 2014-07-03 20:58 GMT-03:00 Sávio S. Teles de Oliveira
> :
>
>>> Did you make sure all the nodes are o
I've used various databases in production for over 10 years. Each has
strengths and weaknesses.
I ran Cassandra for just shy of 2 years in production as part of both
development teams and operations, and I only hit the one serious problem
that Rob mentioned. Ideally C* would have guarded against it, b
The problem with starting without vnodes is that moving to them is a bit
hairy. In particular, nodetool shuffle has been reported to take an
extremely long time (days, weeks). I would start with vnodes if you
have any intent on using them.
On Thu, Jul 17, 2014 at 6:03 PM, Robert Coli wrote:
> On Thu
Hey Marcelo,
You should check out spark. It intelligently deals with a lot of the
issues you're mentioning. Al Tobey did a walkthrough of how to set up
the OSS side of things here:
http://tobert.github.io/post/2014-07-15-installing-cassandra-spark-stack.html
It'll be less work than writing a M/
python + Cassandra
> will be supported just in the next version, but I would like to be wrong...
>
> Best regards,
> Marcelo Valle.
>
>
>
> 2014-07-21 13:06 GMT-03:00 Jonathan Haddad :
>
>> Hey Marcelo,
>>
>> You should check out spark. It intelligently deal
You don't need to specify tokens. The new node gets them automatically.
> On Jul 22, 2014, at 7:03 PM, Kevin Burton wrote:
>
> So , shouldn't it be easy to rebalance a cluster?
>
> I'm not super excited to type out 200 commands to move around individual
> tokens.
>
> I realize that this isn'
This is incorrect. Network Topology w/ Vnodes will be fine, assuming
you've got RF= # of racks. For each token, replicas are chosen based
on the strategy. Essentially, you could have a wild imbalance in
token ownership, but it wouldn't matter because the replicas would be
distributed across the
* When I say wild imbalance, I do not mean all tokens on 1 node in the
cluster; I really should have said slightly imbalanced.
On Tue, Aug 5, 2014 at 8:43 AM, Jonathan Haddad wrote:
> This is incorrect. Network Topology w/ Vnodes will be fine, assuming
> you've got RF= # of racks
s and
> NetworkTopologyStrategy, it's better to define a single (logical) rack // due
> to "carefully chosen tokens" vs "randomly-generated token" clash.
>
> I don't see other options left.
> Do you see other ones ?
>
> Regards,
> Dominique
>
It really doesn't need to be this complicated. You only need 1
session per application. It's thread safe and manages the connection
pool for you.
http://www.datastax.com/drivers/java/2.0/com/datastax/driver/core/Session.html
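The same pattern in the Python driver, as a sketch (hypothetical module and addresses): build the Cluster and Session once at startup and share them everywhere.

    # connection.py (hypothetical module): import session from everywhere else.
    from cassandra.cluster import Cluster

    # One Cluster and one Session for the whole application; both are thread
    # safe and maintain the connection pool internally.
    cluster = Cluster(['10.0.0.1', '10.0.0.2'])
    session = cluster.connect('app_data')
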
On Sat, Aug 9, 2014 at 1:29 PM, Kevin Burton wrote:
> Another idea
You should run a nodetool repair after you copy the data over. You could
also use the sstable loader, which would stream the data to the proper node.
On Thu, Jul 4, 2013 at 10:03 AM, srmore wrote:
> We are planning to move data from a 2 node cluster to a 3 node cluster. We
> are planning to co
Are you using leveled compaction? If so, what do you have the file size
set at? If you're using the defaults, you'll have a ton of really small
files. I believe Albert Tobey recommended setting the table's
sstable_size_in_mb to 256MB to avoid this problem.
On Sun, Jul 14, 2013 at 5:10 PM, Paul In
Everything is written to the commit log. In the case of a crash, cassandra
recovers by replaying the log.
On Sat, Jul 20, 2013 at 9:03 AM, Mohammad Hajjat
wrote:
> Patricia,
> Thanks for the info. So are you saying that the *whole* data is being
> written on disk in the commit log, not just som
Having just enough RAM to hold the JVM's heap generally isn't a good idea
unless you're not planning on doing much with the machine.
Any memory not allocated to a process will generally be put to good use
serving as page cache. See here: http://en.wikipedia.org/wiki/Page_cache
Jon
On Tue, Jul 3
It's advised you do not use compact storage, as it's primarily for
backwards compatibility.
The first of these options is COMPACT STORAGE. This option is mainly
targeted towards backward compatibility with table definitions created
before CQL3. But it also provides a slightly more compact layou
I recommend you do not add 1.2 nodes to a 1.1 cluster. We tried this, and
ran into many issues. Specifically, the data will not correctly stream
from the 1.1 nodes to the 1.2, and it will never bootstrap correctly.
On Thu, Aug 1, 2013 at 2:07 PM, Morgan Segalis wrote:
> Hi Arthur,
>
> Thank
> On Wed, Jul 31, 2013 at 3:10 PM, Jonathan Haddad wrote:
>
>> It's advised you do not use compact storage, as it's primarily for
>> backwards compatibility.
>>
>
> Many Apache Cassandra experts do not advise against using COMPACT STORAGE.
> [1] Use CQL3 n
't do with them". Frankly it makes me
> nuts because...
>
> This little known web company named google produced a white paper about
> what a ColumnFamily data model could do
> http://en.wikipedia.org/wiki/BigTable . Cassandra was built on the
> BigTable/ColumnFamily data model. There was al