I would use something other than the page itself as the key. Maybe a
filename, something smaller.
Then you could use a LongType comparator for the columns and use the page
number for the column name, the value being the contents of the files.
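A rough sketch of that layout, in plain Python rather than against a real client library: the row key is a short filename, each column name is an integer page number, and the value is the page contents. The names `row_store`, `put_page`, and `pages_in_order` are made up for illustration. The point is only that comparing column names as longs keeps page 10 after page 2, which a text comparison would not.

```python
# Hypothetical in-memory model of one wide row per file (not a Cassandra API).
row_store = {}

def put_page(filename, page_number, contents):
    """Store one page as a column: name = page number (int), value = contents."""
    row_store.setdefault(filename, {})[int(page_number)] = contents

def pages_in_order(filename):
    """Return page contents sorted by numeric page number, like a long comparator."""
    row = row_store.get(filename, {})
    return [row[page] for page in sorted(row)]

put_page("report.pdf", 10, "page ten")
put_page("report.pdf", 2, "page two")
put_page("report.pdf", 1, "page one")
print(pages_in_order("report.pdf"))
```

Sorting the same page numbers as strings would put "10" before "2", which is the reason for choosing a numeric comparator here.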
On Wed, Jul 11, 2012 at 1:34 PM, Tomek Hankus wrote:
Was reading up on secondary indexes and on the Datastax post about them, it
mentions the additional management overhead, and also that if you alter an
existing column family, that data will be updated in the background. But
how do secondary indexes affect write performance?
If the answe
and another to the index column family, where in this index
> column family the key is the value of the secondary column, and the value
> is the key of the original row.
> On 08/04/2012 11:40 AM, David McNelis wrote:
>> Morning,
>> Was reading up on se
In using CQL (the python library, at least), I didn't see a way to pass in
multiple nodes as hosts. With other libraries (like Hector and Pycassa) I
can set multiple hosts and my app will work with any one on that list. Is
there something similar going on in the background with CQL?
If not, then
I am currently running a cluster with 1.2.8. One of my larger column
families on one of my nodes has keyspace-tablename-ic--Data.db with a
modify date in August.
Since August we have added several nodes (with vnodes), with the same
number of vnodes as all the existing nodes.
As a result, (we
on Morton
> New Zealand
> @aaronmorton
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> On 30/12/2013, at 1:28 pm, David McNelis wrote:
> I am currently running a cluster with 1.2.8. One of my larger column
Not knowing anything about your data structure (to expand on what Edward
said), you could be running into a situation where you've got some hot keys
that are getting the majority of writes during those heavy loads. More
specifically, I might look for a single key that you're writing, since
A general thought, if you're using AWS for this, I'd strongly recommend you
consider using OpsWorks and custom Chef recipes for your node deployment if
it's an option for you. The ease of provisioning new nodes without the need
for snapshotting is certainly worth the hassle, and there are already
I thought, from the documentation, that
both of my nodes would show up in the ring if I ran 'ring' in nodetool.
This is a new cluster.
*David McNelis*
Lead Software Engineer
Agentis Energy
o: 630.359.6395
c: 219.384.5143
*A Smart Grid technology compan
I changed my seed node to use its routable IP address as its own seed
instead of [...]. I still, however, see the same results when
running nodetool.
On Fri, Jun 3, 2011 at 11:37 AM, Edward Capriolo wrote:
> On Fri, Jun 3, 2011 at 12:21 PM, David McNelis > w
set the token values for your nodes? I remember having similar
> symptoms when I had a token conflict.
> ------
> *From: *"David McNelis"
> *To: *user@cassandra.apache.org
> *Sent: *Friday, June 3, 2011 5:06:10 PM
> *Subject: *Re: Setting up
Thanks, Jonathan. Both machines do have the exact same seed list.
On Fri, Jun 3, 2011 at 1:39 PM, Jonathan Ellis wrote:
> On Fri, Jun 3, 2011 at 11:21 AM, David McNelis
> wrote:
> > I want to make sure I'm not seeing things from a weird perspective. I
> have
Just to close this out, in case anyone was interested... my problem was
firewall related, in that I didn't have my messaging/data port (7000) open
on my seed node. Allowing traffic on this port resolved my issues.
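For anyone chasing the same symptom, a quick reachability check from the second node can rule the firewall in or out. This is a minimal sketch using only the Python standard library. Port 7000 comes from the message above, while `port_is_open` and the host name `seed.example.com` are placeholders, not anything from the original setup.

```python
import socket

def port_is_open(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

if __name__ == "__main__":
    # Placeholder host name; substitute the seed node's address.
    print(port_is_open("seed.example.com", 7000))
```

A False result only says that a TCP connection could not be made from this machine. It does not say whether a firewall, a wrong address, or a stopped process is the cause.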
On Fri, Jun 3, 2011 at 1:43 PM, David McNelis wrote:
> Thanks, Jonathan
ld me that the nodes were already
a part of the ring.
I can't imagine this is how it *should* be behaving... is there a piece I'm
missing in terms of getting one node to recognize the other as being Up?
> node and how it relates to the seed node?
> On Fri, Jun 24, 2011 at 2:49 PM, David McNelis
> wrote:
> > I am running 0.8.0 on CentOS. I have 2 nodes in my cluster, one is a
> > seed, the other is autobootstrapped.
> > After having an unexpected shutdown of both of
hine X from machine Y?
> On Fri, Jun 24, 2011 at 8:20 AM, David McNelis
> wrote:
> > Running on Centos.
> > We had a massive power failure and our UPS wasn't up to 48 hours without
> > power...
> > In this situation the IP addresses have all stayed the same.
sufficient throughput.
Anyone have any thoughts on a Blade v. Rackable solution for spinning up a
cassandra cluster?
manage our tokens to avoid getting into an unbalanced situation?
unts of rows? have you
> run cleanup and compact to make sure it's not unused data / obsolete
> replicas taking up the space?
> On Tue, Aug 16, 2011 at 1:41 PM, David McNelis
> wrote:
> > We are currently running a three node cluster where we assigned the
> initial
the tokens set correctly each would own 33.33%.
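For reference, evenly spaced initial tokens can be computed along these lines. This is a minimal sketch that assumes RandomPartitioner and its 0 to 2**127 token space. It is not the wiki script itself, and `balanced_tokens` and `ownership` are illustrative names.

```python
from fractions import Fraction

RING_SIZE = 2 ** 127  # assumed token space (RandomPartitioner)

def balanced_tokens(node_count):
    """Evenly spaced initial tokens for node_count nodes."""
    return [(i * RING_SIZE) // node_count for i in range(node_count)]

def ownership(tokens):
    """Fraction of the ring owned by each token, in sorted token order.

    Each node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around the ring.
    """
    ordered = sorted(tokens)
    if len(ordered) == 1:
        return [Fraction(1)]
    return [
        Fraction((tok - ordered[i - 1]) % RING_SIZE, RING_SIZE)
        for i, tok in enumerate(ordered)
    ]

if __name__ == "__main__":
    tokens = balanced_tokens(3)
    for tok, share in zip(tokens, ownership(tokens)):
        print(tok, f"{float(share):.2%}")
```

Feeding the tokens actually assigned to a cluster into `ownership` gives a quick way to compare them against the balanced case.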
On Tue, Aug 16, 2011 at 3:33 PM, Jonathan Ellis wrote:
> Yes, that looks about right.
> Totally baffled how the wiki script could spit out those tokens for a
> 3-node cluster.
> On Tue, Aug 16, 2011 at 2:0
We have a node that is almost full and need to move it so that we can shift
its load, but it already has a cleanup process running which, instead of
causing less data usage as expected, is actually growing the amount of
space taken at a pretty fast rate.
some streaming on the thought
that something may have failed, but that didn't yield any appreciable
Are we seeing completely abnormal behavior? Should I consider making the
token for the fourth node considerably smaller? We calculated the node's
tokens using the standard python scri
> Looks kind of like the 4th node was added to the cluster w/o bootstrapping.
> On Mon, Sep 12, 2011 at 3:59 PM, David McNelis
> wrote:
> > We are running the datastax .8 rpm distro. We have a situation where we
> > have 4 nodes and each owns 25% of the keys.
> Cheers
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 13/09/2011, at 9:32 AM, David McNelis wrote:
> Auto-bootstrapping is turned on and the node had been started several
> hours a
So we tried decommissioning the 100.5 node, then re-added it to the ring.
It now appears to be streaming data properly to that node.
On Tue, Sep 13, 2011 at 6:56 AM, David McNelis
> I ran a repair on 100.5. It returned back almost immediately and netstats
> and tpstats don'
uld look like.
d cut our storage needs consistently.
would need 10s
of gigs of ram on each node just to handle that overhead...at least as of
On Fri, Oct 7, 2011 at 9:40 AM, Jonathan Ellis wrote:
> On Fri, Oct 7, 2011 at 9:36 AM, David McNelis
> wrote:
> > In some documentation I've read it says that
> > keyspace'
Is it ok to use such keys if I want my data to be evenly distributed
> across my nodes or do I have to "do something" ?
> Thanks in advance.
> L. Aufrechter
itoring, trend analysis, etc.?
> JConsole is useful for single node monitoring/etc but not scalable &
> data obviously doesn't persist between sessions...
> Many thanks,
> Brian
t index. This implies that we
> would need a possibility to insert values at defined positions. We know that
> this could lead to problems with concurrent inserts in a distributed
> environment, but this is handled by our application logic.
> What are your ideas on that?
/reduce with Cassandra? how agile is that? (for
> example, can you run map/reduce _very_ frequently?)
> Thanks!
> --
> Dotan, @jondot <http://twitter.com/jondot>
I am disinclined to think it's an issue with not being able to connect to
JMX in general.
om the CLI and
> you should see the MBean afterwards.
> This also means your monitoring application should handle this error
> in the case of nodes restarting.
> On Tue, Nov 22, 2011 at 7:51 AM, David McNelis
> wrote:
> > Good morning,
> > I'm trying to set u
serve that
> purpose.
> Initialized, RPCServerRunning, OperationMode, Joined, and perhaps others
> Note that some of those may not exist depending on your version of
> cassandra, pick one appropriate for your version.
> On Tue, Nov 22, 2011 at 1:02 PM, David McNelis
In that case, I think that the documentation is incorrect, as it has
Service listed as the package related to the StorageService.
I apologize for the lack of the rest of the thread, everything is getting
bounced when I try to send it for some reason.
it to v1 once
we migrate there (I don't know what JMX calls have changed at this
point)...if someone wants to send me a list of updates to the JMX calls,
I'll add them in and update it to handle multiple versions.
> [ERROR] No plugin found for prefix 'assemble' in the current project and
> in the
> plugin groups [org.apache.maven.plugins, org.codehaus.mojo] available from
> the r
of data from our cluster we'd certainly want to run it, or after adding a
new node and adjusting the tokens.
So I want to make sure I'm not missing something here and that there would
be other reasons to run cleanup regularly?
> }
> }
> }
> is there anything less than ideal about doing it this way versus creating a
> separate CF per sector? how do you create a Super CF inside of a Super CF
> via the CLI?
> thanks,
> deno
value that have a timeuuid of later than x minutes? i need to
> be able to find all symbols that have not been fetched in x minutes by
> sector. i know i get list of symbol by sector from my sector CF.
> thanks,
> deno
> On 11/30/2011 1:07 PM, David McNelis wrote:
moving nodes. It removes data
> that does not belong on the node anymore (in older versions it removed
> hints as well)
> Your debate is needing to run compaction. In a write-only workload you
> should let Cassandra do its normal compaction (in most cases).
> On Wednes
Is anyone familiar with any tools that are already available to allow for
configurable synchronization of different clusters?
Specifically for purposes of development, i.e. Dev, staging, test, and
production cassandra environments, so that you can easily plug in the
information that you want to fi
You can see how to do this basic sort of thing on the Wiki's operations
page ( http://wiki.apache.org/cassandra/Operations )
In short, you'll want to run:
nodetool -h hostname move newtoken
Then, once you've updated each of the tokens that you want to move, you'll
want to run
nodetool -h
> Data Stax make their chef cook books available here
> https://github.com/riptano/chef
> Cheers
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 11/01/2012, at 9:53 AM, David McNelis wrote:
> Is
for deployment and
> disaster recovery.
> Cheers
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> On 12/01/2012, at 8:47 AM, David McNelis wrote:
> Not currently using any of those tools (though ce
The documentation for that section needs to be updated...
What happens is that if you just autobootstrap without setting a token it
will by default bisect the range of the largest node.
So if you go through several iterations of adding nodes, then this is what
you would see:
Gen 1:
Node A: 100%
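The bisecting behavior described above can be played forward with a small simulation. This is only a sketch of the arithmetic, assuming "largest" is decided purely by share of the ring and that ties go to the first node. `add_node` is an illustrative helper, not anything in Cassandra.

```python
from fractions import Fraction

def add_node(ownership):
    """Return ownership after a new node takes half of the largest range."""
    # max() returns the first index on ties, so the earliest large node is split.
    idx = max(range(len(ownership)), key=lambda i: ownership[i])
    half = ownership[idx] / 2
    return ownership[:idx] + [half, half] + ownership[idx + 1:]

if __name__ == "__main__":
    ring = [Fraction(1)]  # Gen 1: a single node owns 100%
    for generation in range(1, 6):
        shares = ", ".join(f"{float(s):.1%}" for s in ring)
        print(f"Gen {generation}: {shares}")
        ring = add_node(ring)
```

Under these assumptions the ring is only balanced when the node count is a power of two. At every other size some nodes own twice as much as others, which is the usual argument for assigning tokens explicitly.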
That article is a good starting point. To make your life a bit easier,
consider checking out CassandraUnit that provides facilities to load
example data in a variety of ways.
Then you just need to be able to pass in which cassandra instance to
I'm running 1.2.3 and have both CQL3 tables and old-school style CFs in my
cluster.
I'd had a large insert job running the last several days which just
ended; it had been inserting using CQL3 insert statements in a cql3
Now, I show no compactions going on in my cluster but for some reas
Appears that restarting a node makes CQL available on that node again, but
only that node.
Looks like I'll be doing a rolling restart.
On Fri, Mar 29, 2013 at 10:26 AM, David McNelis wrote:
> I'm running 1.2.3 and have both CQL3 tables and old school style CFs in my
> cluste
erwise, if you wanted
to change from sync to hsha in a cluster you'd have to entirely restart the
cluster (not a big deal), but CQL would apparently not work at all until
all of your nodes had been restarted.
On Fri, Mar 29, 2013 at 10:35 AM, David McNelis wrote:
> Appears that restartin
I had a situation earlier where my shuffle failed after a hard disk drive
filled up. I went through and disabled shuffle on the machines while
trying to get the situation resolved. Now, while I can re-enable shuffle
on the machines, when trying to do an ls, I get a timeout.
Looking at the cassan
In order to do a query like that you'll need to have a timestamp/date as
the second portion of the primary key.
You'll only be able to do queries where you already know the key. Unless
you're using an OrderPreservingPartitioner, there is no way to get a
continuous set of information back based on
Was trying to do a test of writing SSTs for a CQL3 table. So I created the
following table:
CREATE TABLE test_sst_load (
mykey1 ascii,
mykey2 ascii,
value1 ascii,
PRIMARY KEY (mykey1, mykey2)
I then set up my writer like so: (moved to gist:
https://gist.github.com/dmcnelis/5424756 )
> The simple thing to do is use COMPACT STORAGE but that may not suit all
> use cases http://www.datastax.com/docs/1.2/cql_cli/cql/CREATE_TABLE
> Cheers
> -
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> @aaronmorton
> h
So, I had 7 nodes that I set up using vnodes, 256 tokens each, no problem.
I added two 512 token nodes, no problem, things seemed to balance.
The next 3 nodes I added, all at 256 tokens, and they have a cumulative
load of 116 MB (whereas the other nodes are at ~100GB and ~200GB (256 and
512 respe
'll need to decommission those 3 nodes, remove all data
> from them, then bootstrap them in again with the correct configuration from
> the start.
> Sam
> On 26 April 2013 06:07, David McNelis wrote:
>> So, I had 7 nodes that I set up using vnodes, 256
Another thing to keep in mind when doing this with CQL is to take into
account the ordering partitioner you may or may not be using. If you're
using one you'll need to make sure that if you have a larger number of rows
for the partitioner key than your query limit, then you can end up in a
I have a node in my ring (1.2.5) that when it was set up, had the wrong
number of vnodes assigned (double the amount it should have had).
As a result, and because we can't reduce the number of vnodes on a machine
(at least at this point), I need to decommission the node.
The problem is that we'v
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com
> On 18/06/2013, at 2:59 PM, David McNelis wrote:
> I have a node in my ring (1.2.5) that when it was set up, had the wrong
> number of vnodes
I think you'd just be better served with a slightly different primary key.
If your primary key was (user_id, created_at) or (user_id, created_at,
question_id), then you'd be able to run the above query without a problem.
This will mean that the entire pantheon of a specific user_id will be
hich point Cassandra would basically shit the bed.
> Thanks for the help.
> On Wed, Jun 19, 2013 at 12:26 PM, David McNelis wrote:
>> I think you'd just be better served with just a little different primary
>> key.
>> If your primary key was (
Unfortunately if you've got a non-vnode cluster and are trying to convert,
you are likely going to at least want, if not have to, run shuffle. It
isn't a pleasant situation when you run into that because in order for the
shuffle to execute safely and successfully you need to have essentiall
I second Romain, do the upgrade and make sure the health is good first.
If you have or plan to have a large number of nodes, you might consider
using fewer than 256 as your initial vnodes amount. I think that number is
inflated from reasonable in the docs, as we've had some people talk about
Hey folks,
Because 1.2.8 hasn't been pushed to the repo yet, I see that I can pick
up the package at http://people.apache.org/~eevans/ and install it
manually. This is great. I'm wondering though, is there a place where
I can pick up Debian packages for older releases? I definitely prefer
the p
Thanks, fwiw, did I just blatantly miss some documentation saying those
existed there?
On Thu, Aug 1, 2013 at 3:32 PM, Blair Zajac wrote:
> On 08/01/2013 12:27 PM, David McNelis wrote:
>> Hey folks,
>> Because 1.2.8 hasn't been pushed to the repo yet, I
Morning folks,
For the last couple of days all of my nodes (17, all running 1.2.8) have
been stuck at various percentages of completion for compacting
system.hints. I've tried restarting the nodes (including a full rolling
restart of the cluster) to no avail.
When I turn on Debugging I am seeing
at preceded this happening?
> As for the thrift stuff, which rpc_server_type are you using?
> On Wed, Aug 7, 2013 at 6:14 AM, David McNelis wrote:
> > Morning folks,
> >
> > For the last couple of days all of my nodes (17, all running 1.2.8) have
> > been
> echo -n 'm' | nc localhost 9160
> On Wed, Aug 7, 2013 at 11:11 AM, David McNelis wrote:
> > Nate,
> >
> > We had a node that was flaking on us last week and had a lot of handoffs
> > fail to that node. We ended up decommissioning that node entirely
Is DevCenter a project that might end up open sourced? The original blog
post calls it free, and if it's destined to stay so, I'd think that would be
a benefit (being OSS) to the community at large.
On Fri, Aug 9, 2013 at 1:02 PM, Rahul Gupta wrote:
> Hello,
> Is there backward
Completely understandable. Thank you for all this work, Alex, et al.
On Fri, Aug 9, 2013 at 3:27 PM, Alex Popescu wrote:
> On Fri, Aug 9, 2013 at 10:12 AM, David McNelis wrote:
>> Is DevCenter a project that might end up open sourced? The original blog
>> post call
You would, however, want to clear the snapshot folder afterward, right? I
thought that truncate, like drop table, created a snapshot (unless that
feature had been disabled in your yaml).
On Thu, Aug 29, 2013 at 6:51 PM, Robert Coli wrote:
> On Thu, Aug 29, 2013 at 3:48 PM, S C wrote:
>> Do w
I'm getting the following error (21 node cluster running 1.2.8)
FSReadError in
Looks to be the case, getting an IO error when trying to cp the file. That
is unfortunate. On the bright side, now we at least have a more narrow
scope of the problem's source.
On Mon, Sep 9, 2013 at 12:54 PM, Robert Coli wrote:
> On Mon, Sep 9, 2013 at 6:15 AM, David McNeli
sstableloader is the way to go to load up the new cluster.
On Tuesday, September 17, 2013, Juan Manuel Formoso wrote:
> > If your shuffle succeeds, you will be the first reported case of
> shuffle succeeding on a non-test cluster.
> Awesome! :O
> I'll try to migrate to a new cluster then.
As Rob mentioned, no one (myself included) has successfully used shuffle in
the wild (that I've heard of).
Shuffle is *supposed* to be a transparent background process... and is
designed, in theory, to take a long time to run (weeks is the right way to
think of it).
Be sure to keep an eye on your
It is a little more involved than just changing the heap size. Every
cluster is different, so there isn't much of a set formula. Some areas to
look into, though:
**Caveat, we're still running in the 1.2 branch and 2.0 has some
differences in what is on versus off heap memory usage, but the basic
Silly question, is M thousand or million? In print, thousand is M, fwiw
Sent from my Droid
On Jan 23, 2011 7:26 PM, "Maxim Potekhin" wrote:
> Aaron -- thanks!
> I don't have examples like Timo.
> But,
> I'm keen to use multiple indices over a database
> of 300M rows.
> Maxim
> On 1
y system generates 1 million (large) records every three days,
> Cheers,
> Maxim
> On 1/23/2011 8:35 PM, David McNelis wrote:
>> Silly question, is M thousand or million? In print, thousand is M, fwiw
>> Sent from my Droid
>> On
hour (hour)
> Validation Class: org.apache.cassandra.db.marshal.IntegerType
> Index Type: KEYS
> Column Name: day (day)
> Validation Class: org.apache.cassandra.db.marshal.IntegerType
> Index Type: KEYS
at if start (or end) columns don't exist? I'm
> guessing it's smart enough to get the columns in that range.
> Thanks!
> Bill-
> On Wed, Jan 26, 2011 at 4:12 PM, David McNelis
> wrote:
> > I would say in that case you might want to try a singl
listen_address to my server's
IP address instead of [...], and that caused me not to be able to connect
in general. My rpc_address is [...].
Has anyone else experienced this or have an inclination where I'm going
nd it came down to php not being
> able to resolve the target ... the solution was to add an entry to the
> hosts file ... of course, if there is a firewall blocking ... that's
> your problem. can you telnet from remote server to cassandra server
> on port 9160?
> O
t in short,
it looks like our main problem is with the Thrift PHP module.
On Wed, Mar 2, 2011 at 9:04 AM, David McNelis wrote:
> We are able to telnet to port 9160, and didn't have any issues resolving
> the target along those lines. So at this point, I don't think we're lo
In case anyone is interested. Our problem revolved around one machine
having the phpcassa thrift patch, and the other did not. It's resolved now.
On Wed, Mar 2, 2011 at 10:25 AM, David McNelis
> It looks like we are having an issue with the Thrift installation on the
> 'oth
>> is there an easy way to 'un-mess' things when the ip of a server is
> changed? updating the cassandra.yaml didn't help. when the member with the
> changed ip comes up, it's fine ... but other members in the ring don't see
> it and keep the old ip address re
That kind of aggregation is certainly possible today, programmatically...
but if you want to do it in cassandra only, you are out of luck, today.
But it sounds like the project DataStax just announced might help quite a
bit with a use case like that.
Sent from my Droid
On Mar 25, 2011 3:58
g in my attempt to
create the connection? Or do I likely have something mis-configured in
my cassandra instance (which is stock, outside of having data upgraded from
at CqlTest.main(CqlTest.java:25)
On Wed, Apr 27, 2011 at 4:27 PM, Jonathan Ellis wrote:
> What's the stacktrace?
> On Wed, Apr 27, 2011 at 9:45 AM, David McNelis
> wrote:
> > I have a feeling that I'm likely doing something dumb. I have the
> > f
Are you sure the old Cassandra jar is no
> longer on your classpath?
> On Wed, Apr 27, 2011 at 4:29 PM, David McNelis
> wrote:
> > Attached:
> > 21 [main] INFO org.apache.cassandra.cql.jdbc.Connection - Connected to
> > localhost:9160
> > Exception in thread "m