Re: apache cassandra development process and future

2018-07-24 Thread Jeremy Hanna
For full disclosure, I've been in the Apache Cassandra community since 2010 and at DataStax since 2012. So DataStax moved on to focus on things for their customers, effectively putting most development effort into DataStax Enterprise. However, there have been a lot of fixes and improvements co

Re: Partition size

2016-09-12 Thread Jeremy Hanna
Generally if you foresee the partitions getting out of control in terms of size, a method often employed is to bucket according to some criteria. For example, if I have a time series use case, I might bucket by month or week. That presumes you can foresee it though. As far as limiting that ca
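
The bucketing idea above can be sketched as follows. This is only an illustration under stated assumptions: the names `month_bucket`, `week_bucket`, `partition_key` and the `sensor-42` id are hypothetical and do not come from the thread. The point is that the bucket becomes part of the partition key, so no single partition grows without bound.

```python
from datetime import datetime, timezone


def month_bucket(ts):
    """Return a month-sized bucket label, e.g. '2016-09'."""
    return ts.strftime("%Y-%m")


def week_bucket(ts):
    """Return an ISO-week bucket label, e.g. '2016-W37'."""
    iso_year, iso_week, _ = ts.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def partition_key(source_id, ts, by="month"):
    """Build a composite partition key of (source id, time bucket).

    All readings for one source within one bucket share a partition;
    the next month (or week) starts a new partition automatically.
    """
    bucket = month_bucket(ts) if by == "month" else week_bucket(ts)
    return (source_id, bucket)


reading_time = datetime(2016, 9, 12, 8, 30, tzinfo=timezone.utc)
print(partition_key("sensor-42", reading_time))
print(partition_key("sensor-42", reading_time, by="week"))
```

Choosing the bucket size is a trade-off: smaller buckets keep partitions small but mean a range query has to touch more partitions.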

Re: Index build status

2012-08-20 Thread Jeremy Hanna
For an individual node, you can check the status of building indexes using nodetool compactionstats. And similarly, if you want to speed up building the indexes (and you have the extra IO) you can increase or unthrottle your compaction throughput temporarily - nodetool setcompactionthroughput 0 to

Re: Cassandra 1.1.1 on Java 7

2012-09-09 Thread Jeremy Hanna
Starting with Java 1.6.0_34, you'll need -Xss set to 180k. It's updated with the forthcoming 1.1.5 as well as the next minor rev of 1.0.x (1.0.12). https://issues.apache.org/jira/browse/CASSANDRA-4631 See also the comments on https://issues.apache.org/jira/browse/CASSANDRA-4602 for the reference to wh

Re: Differences in row iteration behavior

2012-09-14 Thread Jeremy Hanna
Are there any deletions in your data? The Hadoop support doesn't filter out tombstones, though you may not be filtering them out in your code either. I've used the hadoop support for doing a lot of data validation in the past and as long as you're sure that the code is sound, I'm pretty confid

Re: cassandra/hadoop BulkOutputFormat failures

2012-09-14 Thread Jeremy Hanna
A couple of guesses: - are you mixing versions of Cassandra? Streaming differences between versions might throw this error. That is, are you bulk loading with one version of Cassandra into a cluster that's a different version? - (shot in the dark) is your cluster overwhelmed for some reason? I

Re: is multithreaded_compaction stable?

2012-09-15 Thread Jeremy Hanna
Generally the main knob for compaction performance is compaction_throughput_mb_per_sec in cassandra.yaml. It defaults to 16. You can use 'nodetool setcompactionthroughput' to set it on a running server. The next time the Cassandra server starts it will use what's in the yaml again. You might try usin

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
Another option that may or may not work for you is the support in Cassandra 1.1+ to use a secondary index as an input to your mapreduce job. What you might do is add a field to the column family that represents which virtual column family that it is part of. Then when doing mapreduce jobs, you

Re: 1000's of column families

2012-10-02 Thread Jeremy Hanna
It's always had data locality (since hadoop support was added in 0.6). You don't need to specify a partition, you specify the input predicate with ConfigHelper or the cassandra.input.predicate property. On Oct 2, 2012, at 2:26 PM, "Hiller, Dean" wrote: > So you're saying that you can access th

Re: cassandra + pig

2012-10-11 Thread Jeremy Hanna
The Dachis Group (where I just came from, now at DataStax) uses pig with cassandra for a lot of things. However, we weren't using the widerow implementation yet since wide row support is new to 1.1.x and we were on 0.7, then 0.8, then 1.0.x. I think since it's new to 1.1's hadoop support, it s

Re: cassandra + pig

2012-10-11 Thread Jeremy Hanna
> On Thu, Oct 11, 2012 at 11:25 AM, Jeremy Hanna > wrote: > The Dachis Group (where I just came from, now at DataStax) uses pig with > cassandra for a lot of things. However, we weren't using the widerow > implementation yet since wide row support is new to 1.1.x and we were on 0

Re: hadoop consistency level

2012-10-18 Thread Jeremy Hanna
On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh wrote: > On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman > wrote: >> Not sure I understand your question (if there is one..) >> >> You are more than welcome to do CL ONE and assuming you have hadoop nodes >> in the right places on your ring things

Re: leveled compaction and tombstoned data

2012-11-08 Thread Jeremy Hanna
LCS works well in specific circumstances, this blog post gives some good considerations: http://www.datastax.com/dev/blog/when-to-use-leveled-compaction On Nov 8, 2012, at 1:33 PM, Aaron Turner wrote: > "kill performance" is relative. Leveled Compaction basically costs 2x disk > IO. Look at

Re: progress of cleanup operations

2012-11-29 Thread Jeremy Hanna
You can check nodetool compactionstats to see progress for current cleanup operations. Cleanup essentially traverses all of your sstables and removes data that the node isn't responsible for. That's the overall operation, so you would estimate in terms of how long it would take to go through

Re: Hybrid Hadoop Cassandra Cluster

2013-01-18 Thread Jeremy Hanna
Hi Naveen, You can start with http://wiki.apache.org/cassandra/HadoopSupport but there's also a commercial product that you can use, DataStax Enterprise: http://www.datastax.com/docs/datastax_enterprise2.2/solutions/hadoop_index which makes things more streamlined, but it's a commercial product

Re: Start token sorts after end token

2013-02-01 Thread Jeremy Hanna
See https://issues.apache.org/jira/browse/CASSANDRA-5168 - should be fixed in 1.1.10 and 1.2.2. On Jan 30, 2013, at 9:18 AM, Tejas Patil wrote: > While reading data from Cassandra in map-reduce, I am getting > "InvalidRequestException(why:Start token sorts after end token)" > > Below is the c

Re: Cassandra 1.20 with Cloudera Hadoop (CDH4) Compatibility Issue

2013-02-16 Thread Jeremy Hanna
Fwiw - here are some changes that a friend said should make C*'s Hadoop support work with CDH4 - for ColumnFamilyRecordReader. https://gist.github.com/jeromatron/4967799 On Feb 16, 2013, at 8:23 AM, Edward Capriolo wrote: > Here is the deal. > > http://wiki.apache.org/hadoop/Defining%20Hado

Re: Authentication and Authorization with Cassandra 1.2.2.

2013-02-26 Thread Jeremy Hanna
does this help? Links at the bottom show the cql statements to add/modify users: http://www.datastax.com/docs/1.2/security/native_authentication On Feb 26, 2013, at 4:06 PM, C.F.Scheidecker Antunes wrote: > Hello all, > > Cassandra has changed and now has a default authentication and authori

Re: Pig-cassandra Scritps and Oozie

2013-11-28 Thread Jeremy Hanna
If I remember correctly when I configured pig, cassandra, and oozie to work together, I just used vanilla pig but gave it the jars it needed. What is the problem you’re experiencing that you are unable to do this? Jeremy On 28 Nov 2013, at 12:56, Miguel Angel Martin junquera wrote: > hi all;

Re: Pig-cassandra Scritps and Oozie

2013-11-28 Thread Jeremy Hanna
rt#Oozie > > I am using Cassandra 1.2.10, Oozie 4.0.0 and pig 0.11.1. > > I will test these options and see if they work. > > Thanks in advance > > 2013/11/28 Jeremy Hanna > >> If I rememb

Re: Snappy Load Error

2013-11-29 Thread Jeremy Hanna
With RHEL, there is a problem with snappy 1.0.5. You’d need to use 1.0.4.1 which works fine but you need to download it separately and put it in your lib directory. You can find the 1.0.4.1 file from https://github.com/apache/cassandra/tree/cassandra-1.1.12/lib Jeremy On 29 Nov 2013, at 10:1

Re: Pig 0.12.0 and Cassandra 2.0.2

2013-12-13 Thread Jeremy Hanna
I need to update those to be current with the Cassandra source download. You’re right, you would just use what’s in the examples directory now for Pig. You should be able to run the examples, but generally you need to specify the partitioner of the cluster, the host name of a node in the clust

Re: What % of cassandra developers are employed by Datastax?

2014-05-16 Thread Jeremy Hanna
Of the 16 active committers, 8 are not at DataStax. See http://wiki.apache.org/cassandra/Committers. That said, active involvement varies and there are other contributors inside DataStax and in the community. You can look at the dev mailing list as well to look for involvement in more detail

Re: PIG Cassandra - IPs of nodes in a ring

2011-05-10 Thread Jeremy Hanna
ething to do with different address for rpc_address >> and listen_address but not sure what it is... >> >> >> >> -Original Message- >> From: Jeremy Hanna [mailto:jeremy.hanna1...@gmail.com] >> Sent: Friday, May 06, 2011 11:10 PM >> To: u...@

Re: Keyspace creation error on 0.8 beta2

2011-05-11 Thread Jeremy Hanna
I downloaded a fresh 0.8 beta2 and created keyspaces fine - including the ones below. I don't know if there are relics of a previous install somewhere or something wonky about the classpath. You said that you might have /var/lib/cassandra data left over so one thing to try is starting fresh there

Re: How to configure internode encryption in 0.8.0?

2011-05-16 Thread Jeremy Hanna
Take a look at cassandra.yaml in your 0.8 download at the very bottom. There are docs and examples there. e.g. http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.8.0-beta2/conf/cassandra.yaml On May 16, 2011, at 6:36 PM, Sameer Farooqui wrote: > I understand that 0.8.0 has configurable

Re: How to configure internode encryption in 0.8.0?

2011-05-18 Thread Jeremy Hanna
eystore .keystore -rfc -file jdoe.cer > 4) cat jdoe.cer > 5) keytool -import -alias jdoecert -file jdoe.cer -keystore .truststore > 6) keytool -list -v -keystore .truststore > > > - Sameer > > On Mon, May 16, 2011 at 5:35 PM, Jeremy Hanna > wrote: > Take a

Re: [hadoop] Counters in ColumnFamilyOutputFormat?

2011-05-19 Thread Jeremy Hanna
FWIW, as I mentioned in the 1497 comments, the patch makes it abstract so that you can have any rpc/marshalling format you want with a simple extension point. So if we want to move to something besides avro, or even like I mentioned do something with Dumbo for streaming, it's easy to extend. O

Re: rainbird question (why is the 1minute buffer needed?)

2011-05-23 Thread Jeremy Hanna
On May 23, 2011, at 2:23 PM, Ryan King wrote: > On Mon, May 23, 2011 at 12:06 PM, Yang wrote: >> Thanks Ryan, >> >> could you please share more details: according to what you observed in >> testing, why was performance worse if you do not do extra buffering? >> >> I was thinking (could be wr

Re: Link & mirrors to download Cassandra is down...

2011-05-24 Thread Jeremy Hanna
The link was fixed in cassandra.apache.org/download a couple of hours ago. For the time being it may be better to scroll down to the Backup Sites section and use one of those links. On May 24, 2011, at 12:24 PM, Sameer Farooqui wrote: > http://cassandra.apache.org/download > > If you click th

Re: Forcing Cassandra to free up some space

2011-05-26 Thread Jeremy Hanna
For the purposes of clearing out disk space, you might also occasionally check to see if you have snapshots that you no longer need. Certain operations create snapshots (point-in-time backups of sstables) in the (default) /var/lib/cassandra/data/<keyspace>/snapshots directory. If you are absolutely sure

Re: clarification of the consistency guarantees of Counters

2011-05-30 Thread Jeremy Hanna
Some more recent documentation can be found here: http://wiki.apache.org/cassandra/Counters but even that may be out of date. One thing that has been added is multiple consistency levels are supported. There are a lot of other tickets that have been completed post 1072. Search for "cassandra

Re: Loading Keyspace from YAML in 0.8

2011-06-03 Thread Jeremy Hanna
In 0.8 (and 0.7) you can create a script that you run on the CLI to create your schema. We create something like a ddl file and run it on a new cluster. You just pass it to the cli with -f <filename>. On Jun 3, 2011, at 11:14 AM, Paul Loy wrote: > We embed cassandra in our app. Whe

Re: nosql yes but yescql, no?

2011-06-08 Thread Jeremy Hanna
I think that's partly the idea of it. CQL could end up being a way forward and it currently builds on thrift. Then if it becomes the API/client of record to build on, then it could move to something else underneath that's more efficient and CQL itself wouldn't have to change at all. On Jun 8,

Re: hadoop/pig notes

2011-06-08 Thread Jeremy Hanna
I need to update the wiki with better pig info. I did put some information in the getting started docs of pygmalion, but it would be good to transfer that to cassandra's wiki and add to it. fwiw - https://github.com/jeromatron/pygmalion/wiki/Getting-Started Thanks for the rundown William! On

Re: Purge Data

2011-06-09 Thread Jeremy Hanna
Have you looked at the TTL column feature in 0.7? http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns Those will automatically expire columns after a certain time period - not when you near the column limit, but might be helpful for you. On Jun 9, 2011, at 10:51 AM, Bahadur

Re: Python Client

2011-06-10 Thread Jeremy Hanna
I would take a look at pycassa - https://github.com/pycassa/pycassa though there is also a twisted client named Telephus - http://github.com/driftx/Telephus. The complete list of current client language options is found here: http://wiki.apache.org/cassandra/ClientOptions On Jun 10, 2011, at

Re: New web client & future API

2011-06-15 Thread Jeremy Hanna
Yes - avro is alive and well. Avro as an RPC alternative for Cassandra is dead. See reasoning here: http://goo.gl/urENc On Jun 15, 2011, at 8:28 AM, Holger Hoffstaette wrote: > On Wed, 15 Jun 2011 10:04:53 +1200, aaron morton wrote: > >> Avro is dead. > > Just so that this is not misundersto

useful little way to run locally with (pig|hive) && cassandra

2011-06-15 Thread Jeremy Hanna
We started doing this recently and thought it might be useful to others. Pig (and Hive) have a sample function that allows you to sample data from your data store. In pig it looks something like this: mysample = SAMPLE myrelation 0.01; One possible use for this, with pig and cassandra is to sol

Re: useful little way to run locally with (pig|hive) && cassandra

2011-06-15 Thread Jeremy Hanna
ng keys even if you sampled in a way that didn't actually > produce any, etc. > > D > > On Wed, Jun 15, 2011 at 10:35 AM, Jeremy Hanna > wrote: >> We started doing this recently and thought it might be useful to others. >> >> Pig (and Hive) have a sample

Re: prep for cassandra storage from pig

2011-06-15 Thread Jeremy Hanna
Hi Will, That's partly why I like to use FromCassandraBag and ToCassandraBag from pygmalion - it does the work for you to get it back into a form that cassandra understands. Others may know better how to massage the data into that form using just pig, but if all else fails, you could write a u

Re: prep for cassandra storage from pig

2011-06-15 Thread Jeremy Hanna
(the script). > > On Wed, Jun 15, 2011 at 3:04 PM, Jeremy Hanna > wrote: > >> Hi Will, >> >> That's partly why I like to use FromCassandraBag and ToCassandraBag from >> pygmalion - it does the work for you to get it back into a form that >> cassandr

Re: pig integration & NoClassDefFoundError TypeParser

2011-06-20 Thread Jeremy Hanna
Try running with cdh3u0 version of pig and see if it has the same problem. They backported the patch (to pig 0.9 which should be out in time for the hadoop summit next week) that adds the updated jackson dependency for avro. The download URL for that is - http://archive.cloudera.com/cdh/3/pig

Re: pig integration & NoClassDefFoundError TypeParser

2011-06-20 Thread Jeremy Hanna
> for jar in `ls *.jar` > do > jar -tf $jar | grep TypeParser > if [ $? -eq 0 ]; then > echo $jar > fi > done > > Shows me nothing in all the lib dirs > > > > On Mon, Jun 20, 2011 at 8:44 PM, Jeremy Hanna > wrote: >> Try running with cdh3u0 v
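
For anyone who wants to repeat the quoted jar search without shelling out to `jar -tf`, a rough equivalent is sketched below. It assumes only that jars are ordinary zip archives. The function name `jars_containing` and the `lib` directory are hypothetical and not from the thread.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, needle):
    """Return names of *.jar files in lib_dir with an entry containing needle."""
    matches = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                # A jar is a zip file; namelist() yields the class/resource paths.
                if any(needle in name for name in archive.namelist()):
                    matches.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are named *.jar but are not valid archives.
            continue
    return matches


if __name__ == "__main__":
    # "lib" is a placeholder path; point it at the directory to inspect.
    print(jars_containing("lib", "TypeParser"))
```

An empty result is consistent with what the quoted poster saw: no jar on that path ships the class.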

Re: pig integration & NoClassDefFoundError TypeParser

2011-06-20 Thread Jeremy Hanna
sr/local/src/apache-cassandra-0.8.0-src# echo $? >> 1 >> /usr/local/src/apache-cassandra-0.8.0-src# >> >> /usr/local/src/apache-cassandra-0.8.0-src# grep -Ri TypeError . >> /usr/local/src/apache-cassandra-0.8.0-src# echo $? >> 1 >> /usr/local/src/apache-cassa

Re: pig integration & NoClassDefFoundError TypeParser

2011-06-20 Thread Jeremy Hanna
ype$3.class BytesType.class >> MarshalException.class UUIDType.class >> AbstractType$4.class CounterColumnType.class >> TimeUUIDType.class >> AbstractType$5.class IntegerType.class >> UTF8Type$1.class >> >

Re: Keys-only query

2011-06-21 Thread Jeremy Hanna
Also - there is an open ticket to create a .NET CQL driver - may be worth watching or if you'd like to help out with it somehow: https://issues.apache.org/jira/browse/CASSANDRA-2634 On Jun 21, 2011, at 9:31 AM, Stephen Pope wrote: > We just recently switched to 0.8 (from 0.7.4), and it looks lik

Re: solandra or pig or....?

2011-06-21 Thread Jeremy Hanna
Just wanted to mention that there is also a #solandra irc channel on freenode in case people are interested. On Jun 21, 2011, at 1:26 PM, Mark Kerzner wrote: > Me too! > > I would be interested to know how such queries are done in Solandra. I would > understand it if it creates a complete Luce

Re: bulk load

2011-06-22 Thread Jeremy Hanna
This ticket's outcome replaces what BMT was supposed to do: https://issues.apache.org/jira/browse/CASSANDRA-1278 0.8.1 is being voted on now and will hopefully be out in the next day or two. You can try it out with the 0.8-branch if you want - looking near the bottom of the comments on the ticke

Re: replacement for KsDef.replication_factor (deprecated in 0.8 API)

2011-06-27 Thread Jeremy Hanna
The replacement is to use the replication_factor variable in strategy options. If you look in http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.8.0/conf/schema-sample.txt you can see an example of that. The issue to do that was https://issues.apache.org/jira/browse/CASSANDRA-1263 The

ruby client struggling with UTF8 inserts?

2011-07-02 Thread Jeremy Hanna
A coworker of mine in the UK has been having problems with inserting UTF8 Strings into Cassandra using the Ruby thrift client. I'm just wondering if anyone else is seeing this or if they have a workaround. It may have to do with ruby/thrift itself: https://issues.apache.org/jira/browse/THRIFT-

secondary index performance

2011-07-03 Thread Jeremy Hanna
Anyone know if secondary index performance should be in the 100-500 ms range? That's what we're seeing right now when doing lookups on a single value. We've increased keys_cached and rows_cached to 100% for that column family and assume that the secondary index gets the same attributes. I've

Re: secondary index performance

2011-07-03 Thread Jeremy Hanna
On Jul 3, 2011, at 4:29 PM, Jeremy Hanna wrote: > Anyone know if secondary index performance should be in the 100-500 ms range. > That's what we're seeing right now when doing lookups on a single value. > We've increased keys_cached and rows_cached to 100% for t

Pig pulling an older value from cassandra

2011-07-06 Thread Jeremy Hanna
I'm seeing some strange behavior and not sure how it is possible. We updated some data using a pig script and that wrote back to cassandra. We get the value and list the value on the Cassandra CLI and it's the updated value - from MARKET to market. However, when doing a pig script to filter b

Re: Pig pulling an older value from cassandra

2011-07-08 Thread Jeremy Hanna
n the background, so it > may only be visible to subsequent reads. > > Cheers > > - > Aaron Morton > Freelance Cassandra Developer > @aaronmorton > http://www.thelastpickle.com > > On 6 Jul 2011, at 20:52, Jeremy Hanna wrote: > >>

Re: Multiple input column families in Cassandra Hadoop mapreduce

2011-07-15 Thread Jeremy Hanna
+1 - We do a lot of this with Pig - joining over several column families. Pig makes it just work. I think Hive does something similar. Unless you really need that much control over your process, I would really use one of those two. On Jul 15, 2011, at 5:28 PM, Jonathan Ellis wrote: > The eas

Re: Is there a way to read a Double value from the CLI?

2011-07-18 Thread Jeremy Hanna
I know additional types have been added as of 0.8.1: https://issues.apache.org/jira/browse/CASSANDRA-2530 However, I'm not sure how those have propagated up to validators, the CLI, and hector though. On Jul 18, 2011, at 4:16 PM, Sameer Farooqui wrote: > I wrote some data to a standard column fa

Re: My "nodetool" in Java

2011-07-20 Thread Jeremy Hanna
If you look at the bin/nodetool file, it's just a shell script to run org.apache.cassandra.tools.NodeCmd. You could probably call that directly from your code. On Jul 20, 2011, at 3:18 PM, cbert...@libero.it wrote: > Hi all, > I'd like to build something like "nodetool" to show the status of t

Re: cqlsh error using assume

2011-07-21 Thread Jeremy Hanna
Just saw this and created a lhf ticket for it - http://issues.apache.org/jira/browse/CASSANDRA-2932 On Jul 21, 2011, at 8:20 AM, Stephen Pope wrote: > Boo-urns. Ok, thanks. > > -Original Message- > From: Brandon Williams [mailto:dri...@gmail.com] > Sent: Thursday, July 21, 2011 9:10 AM

Re: Error when set memtable_troughput with Cassandra-CLI

2011-07-27 Thread Jeremy Hanna
Try help on the CLI for how to do it, specifically "help update column family;" It looks like you're missing the "with." update column family columnfamily2 memtable_throughput=155; should be update column family columnfamily2 with memtable_throughput=155; On Jul 27, 2011, at 12:49 PM, lebron j

Re: Cassandra timeout exception when works with hadoop

2011-07-28 Thread Jeremy Hanna
See http://wiki.apache.org/cassandra/HadoopSupport#Troubleshooting - I would probably start with setting your rpc_timeout_in_ms to something like 3. On Jul 28, 2011, at 11:09 AM, Jian Fang wrote: > Hi, > > I run Cassandra 0.8.2 and hadoop 0.20.2 on three nodes, each node includes a > Cassa

Re: Cassandra timeout exception when works with hadoop

2011-07-28 Thread Jeremy Hanna
exceptions when I use hector to get > back data. > > Thanks, > > John > > On Thu, Jul 28, 2011 at 12:45 PM, Jian Fang > wrote: > > My current setting is 1. I will try 3. > > Thanks, > > John > > On Thu, Jul 28, 2011 at 12:

Re: Cassandra Pig with network topology and data centers.

2011-07-29 Thread Jeremy Hanna
fwiw - https://issues.apache.org/jira/browse/CASSANDRA-2970 thoughts? (please post on the ticket) On Jul 29, 2011, at 7:08 PM, Ryan King wrote: > It'd be great if we had different settings for inter- and intra-DC read > repair. > > -ryan > > On Fri, Jul 29, 2011 at 5:06 PM, Jake Luciani wrot

Re: Brisk and Hadoop question

2011-07-31 Thread Jeremy Hanna
Check out http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig and that whole page to see an intro to configuring your cluster. Brisk extends these basic ideas. On Jul 31, 2011, at 12:31 PM, mcasandra wrote: > Is it possible to add brisk nodes for analytics to already existing real tim

Re: Install Cassandra on EC2

2011-08-03 Thread Jeremy Hanna
Some quick thoughts that might be helpful: - use ephemeral instances and RAID0 over the local volumes for both cassandra's data and the log directory. The log directory because if you crash due to heap size, the heap dump will be stored in the log directory. You don't want that to go i

Re: cassandra 0.8.2 build failure (missing 2 artifacts)

2011-08-05 Thread Jeremy Hanna
That is something we have to update, thanks for mentioning that. We should just be depending on apache hadoop components now that we are no longer supporting hadoop output streaming. On Aug 5, 2011, at 10:27 AM, Dean Hiller wrote: > oh, cloudera repo is down like a previous poster just said...

Re: Cloudera repo down?

2011-08-05 Thread Jeremy Hanna
It won't be required in the future: https://issues.apache.org/jira/browse/CASSANDRA-2998 On Aug 5, 2011, at 1:34 PM, Martin Lansler wrote: > It solved itself as the cloudera repo is up again now... > > -Martin > > On Fri, Aug 5, 2011 at 12:06 PM, Martin Lansler > wrote: >> Hi, >> >> I'm tryin

Re: Client traffic encryption best practices....

2011-08-12 Thread Jeremy Hanna
Yes - that ticket was done by Nirmal Ranganathan with the intention of getting support in Cassandra. That's just for a java client though. In the future, I wonder if the CQL driver level is the right place for client encryption. On Aug 11, 2011, at 11:26 PM, Vijay wrote: > https://issues.apach

Re: Client traffic encryption best practices....

2011-08-12 Thread Jeremy Hanna
/browse/THRIFT-151 C# (patch attached but no progress in a while): https://issues.apache.org/jira/browse/THRIFT-181 PHP (patch attached but no progress in a while): https://issues.apache.org/jira/browse/THRIFT-948 On Aug 12, 2011, at 9:39 AM, Jeremy Hanna wrote: > Yes - that ticket was done by Nir

Re: What causes dropped messages?

2011-08-16 Thread Jeremy Hanna
http://wiki.apache.org/cassandra/FAQ#dropped_messages As to what's causing them - look in the logs: Cassandra logs the equivalent of a nodetool tpstats right after the log entries about dropped messages. That should give you a clue as to why there are dropped messages - which thread pools are backed up

hints system CF getting out of control

2011-08-18 Thread Jeremy Hanna
We're trying to bootstrap some new nodes and it appears when adding a new node that there is a lot of logging on hints being flushed and compacted. It's been taking about 75 minutes thus far to bootstrap for only about 10 GB of data. It's ballooned up to over 40 GB on the new node. I do 'ls -

Re: hints system CF getting out of control

2011-08-18 Thread Jeremy Hanna
: > I would assume it's because it thinks some node is down and is > creating hints for it. > > On Thu, Aug 18, 2011 at 6:31 PM, Jeremy Hanna > wrote: >> We're trying to bootstrap some new nodes and it appears when adding a new >> node that there is a lot

4/20 nodes get disproportionate amount of mutations

2011-08-22 Thread Jeremy Hanna
We've been having issues where as soon as we start doing heavy writes (via hadoop) recently, it really hammers 4 nodes out of 20. We're using random partitioner and we've set the initial tokens for our 20 nodes according to the general spacing formula, except for a few token offsets as we've re
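
For readers unfamiliar with the "general spacing formula" mentioned above: for RandomPartitioner it is usually given as token_i = i * 2**127 / N for N nodes. A minimal sketch follows. The function name `initial_tokens` is hypothetical, and the per-node offsets the message alludes to are not reproduced because the snippet does not state them.

```python
# RandomPartitioner tokens fall in the range 0 .. 2**127.
RING_SIZE = 2 ** 127


def initial_tokens(node_count):
    """Return evenly spaced initial_token values for node_count nodes."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(initial_tokens(20)):
        print(f"node {index:2d}: initial_token = {token}")
```

Even spacing balances the primary ranges only. How replicas are placed across racks or zones is a separate question.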

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna
On Aug 23, 2011, at 2:25 AM, Peter Schuller wrote: >> We've been having issues where as soon as we start doing heavy writes (via >> hadoop) recently, it really hammers 4 nodes out of 20. We're using random >> partitioner and we've set the initial tokens for our 20 nodes according to >> the ge

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-23 Thread Jeremy Hanna
m unreasonable - about a MB. I turned up logging to DEBUG for that class and I get plenty of dropped READ_REPAIR messages, but nothing coming out of DEBUG in the logs to indicate the time taken that I can see. > > Cheers > > - > Aaron Morton > Freelance Cass

Re: Memory overhead of vector clocks…. how often are they pruned?

2011-08-24 Thread Jeremy Hanna
At the point that book was written (about a year ago it was finalized), vector clocks were planned. In August or September of last year, they were removed. 0.7 was released in January. The ticket for vector clocks is here and you can see the reasoning for not using them at the bottom. https

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-25 Thread Jeremy Hanna
ext token of a different rack (depending on which it is looking for). So that is why alternating by rack is important. That might be able to be smarter in the future which would be nice - to not have to care and let Cassandra spread the replication around intelligently. On Aug 23, 2011, at 6:02 A

Re: 4/20 nodes get disproportionate amount of mutations

2011-08-28 Thread Jeremy Hanna
in token order, that can lead to serious hotspots. For more on this with ec2, see: http://www.slideshare.net/mattdennis/cassandra-on-ec2/5 where he talks about alternating zones. On Aug 25, 2011, at 10:45 AM, mcasandra wrote: > Thanks for the update > > Jeremy Hanna wrote: >&

minor compaction of secondary index that no longer exists?

2011-08-28 Thread Jeremy Hanna
I was watching compactionstats via opscenter and saw one of my nodes was minor compacting a secondary index column family. Problem is I removed all of my secondary indexes on Friday and just double checked on the CLI with 'show keyspaces;' and sure enough, no secondary indexes. Is this a bug?

Matt Dennis' presentation on Cassandra best practices on EC2

2011-08-29 Thread Jeremy Hanna
Just wanted to let people know about a great presentation that Matt Dennis did here at the Cassandra Austin meetup. It's on Cassandra best practices on EC2. We found the presentation extremely helpful. http://www.slideshare.net/mattdennis/cassandra-on-ec2

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Jeremy Hanna
FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to potentially move to Brisk because of the simplicity of operations there. Not sure what you mean about the true power of Hadoop. In my mind the true power of Hadoop is the ability to parallelize jobs and send each task to wher

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
I would not use nano time with cassandra. Internally and throughout the clients, milliseconds is pretty much a standard. You can get into trouble because when comparing nanoseconds with milliseconds as long numbers, nanoseconds will always win. That bit us a while back when we deleted someth

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
Ed- you're right - milliseconds * 1000. That's right. The other stuff about nano time still stands, but you're right - microseconds. Sorry about that. On Aug 30, 2011, at 1:20 PM, Edward Capriolo wrote: > > > On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna > wr
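
To make the nanosecond-versus-microsecond problem concrete, here is a small sketch. It is a simplification that assumes only the rule that the larger timestamp wins when two writes to the same column are reconciled. The names `micros_timestamp`, `nanos_timestamp` and `reconcile` are hypothetical and are not Cassandra APIs.

```python
def micros_timestamp(epoch_seconds):
    """Milliseconds * 1000: the microsecond convention discussed above."""
    return int(epoch_seconds * 1000) * 1000


def nanos_timestamp(epoch_seconds):
    """A nanosecond-scale timestamp, as a client using nano time would send."""
    return int(epoch_seconds * 1_000_000_000)


def reconcile(write_a, write_b):
    """Keep the (value, timestamp) pair with the larger timestamp."""
    return write_a if write_a[1] >= write_b[1] else write_b


earlier = 1_314_727_200.0   # an arbitrary wall-clock time, in epoch seconds
later = earlier + 3600      # one hour afterwards

old_write = ("old value", nanos_timestamp(earlier))
new_write = ("new value", micros_timestamp(later))

# The nanosecond number is about 1000x larger, so the older write is kept.
print(reconcile(old_write, new_write))
```

The same effect applies to deletes: a tombstone written with a nanosecond-scale timestamp shadows later microsecond-scale writes.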

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
0. > > Anyone sees problem with this approach? > > On Tue, Aug 30, 2011 at 2:20 PM, Edward Capriolo > wrote: >> >> >> On Tue, Aug 30, 2011 at 1:41 PM, Jeremy Hanna >> wrote: >>> >>> I would not use nano time with cassandra. Internall

Re: Recommendations on moving to Hadoop/Hive with Cassandra + RDBMS

2011-08-30 Thread Jeremy Hanna
/repos/asf/cassandra/trunk/contrib/pig. Are there any > other resource that you can point me to? There seems to be a lack of samples > on this subject. > > On Tue, Aug 30, 2011 at 10:56 PM, Jeremy Hanna > wrote: > FWIW, we are using Pig (and Hadoop) with Cassandra and are looking to

Re: Updates lost

2011-08-30 Thread Jeremy Hanna
rive the current time in > nano seconds though? > > On Tue, Aug 30, 2011 at 2:39 PM, Jeremy Hanna > wrote: >> Yes - the reason why internally Cassandra uses milliseconds * 1000 is >> because System.nanoTime javadoc says "This method can only be used to >> mea

Re: Cassandra prod environment

2011-09-02 Thread Jeremy Hanna
We moved off of ubuntu because of kernel issues in the AMIs we found in 10.04 and 10.10 in ec2. So we're now on debian squeeze with ext4. It's been great for us. One thing that bit us is we'd been using property file snitch and the availability zones as racks and had an equal number of nodes

Re: need help setting up production environment

2011-09-03 Thread Jeremy Hanna
I would look at http://www.slideshare.net/mattdennis/cassandra-on-ec2 Also, people generally do raid0 on the ephemerals. EBS is a bad fit for cassandra - see the presentation above. However, that means you'll need to have a backup strategy, which is also mentioned in the presentation. Also ar

Re: need help setting up production environment

2011-09-03 Thread Jeremy Hanna
I dont remember setting up snitch. > > The servers are all in a VPC, the only thing I did was configure the seed IP > so all the nodes can see each other. > > Ben > > On Sat, Sep 3, 2011 at 11:13 PM, Jeremy Hanna > wrote: > I would look at http://www.slideshar

Re: cassandra 0.8.4 + pig (using cloudera rpms)

2011-09-04 Thread Jeremy Hanna
Thanks William - so you were able to get everything running correctly, right? FWIW, we're in the process of upgrading to 0.8.4 and found that all we needed was that first link you mentioned - the VersionedValue modification. It's running fine on our staging cluster and we're in the process of m

Re: Any tentative data for 0.8.5 release?

2011-09-07 Thread Jeremy Hanna
The voting started on Monday and is a 72 hour vote. So if there aren't any problems that people find, it should be released sometime Thursday (8 September). On Sep 7, 2011, at 10:41 AM, Roshan Dawrani wrote: > Hi, > > Quick check: is there a tentative date for release of Cassandra 0.8.5? > >

Re: Anybody out there using 0.8 in production

2011-09-08 Thread Jeremy Hanna
We run 0.8 in production and it's been working well for us. There are some new settings that we had to tune for - for example, the default concurrent compaction is the number of cores. We had to tune that down because we also run hadoop jobs on our nodes. On Sep 8, 2011, at 4:44 PM, Anand Som

Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
We are experiencing massive writes to column families when only doing reads from Cassandra. A set of 5 hadoop jobs are reading from Cassandra and then writing out to hdfs. That is the only thing operating on the cluster. We are reading at CL.QUORUM with hadoop and have written with CL.QUORUM.

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
0 0 InternalResponseStage 0 0 0 HintedHandoff 0 0 0 CompactionManager n/a 29 MessagingService n/a 0,34 On Sep 10, 2011, at 3:38 PM, Jeremy Hanna wrote: > We are experiencing mass

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
Oh and we're running 0.8.4 and the RF is 3. On Sep 10, 2011, at 3:49 PM, Jeremy Hanna wrote: > In addition, the mutation stage and the read stage are backed up like: > > Pool Name Active Pending Blocked > ReadStage 32

Re: Massive writes when only reading from Cassandra

2011-09-10 Thread Jeremy Hanna
> 2) You have something doing writes that you're not aware of, I guess > you could track that down using wireshark to see where the write > messages are coming from > > On Sat, Sep 10, 2011 at 3:56 PM, Jeremy Hanna > wrote: > > Oh and we're running 0.8.4 and the RF

Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
We just tried to disable hinted handoff by setting: hinted_handoff_enabled: false in all the nodes of our cluster and restarting them. When they come back up, we continue to see things like this: INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java (line 323) Started hinted h

Re: Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
rowse/CASSANDRA-3176 On Sep 10, 2011, at 5:50 PM, Jeremy Hanna wrote: > INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java > (line 323) Started hinted handoff for endpoint /10.1.2.3 > INFO [HintedHandoff:1] 2011-09-10 22:41:40,813 HintedHandOffManager.java > (l

Re: Disabling hinted handoff doesn't work in 0.8.4?

2011-09-10 Thread Jeremy Hanna
Turned out that wasn't a problem - I put some notes on the ticket. On Sep 10, 2011, at 6:22 PM, Jeremy Hanna wrote: > I tried looking through the source to see if the log statements would happen > regardless but it doesn't look like it. Also I looked at one of the nodes >

Re: Replace Live Node

2011-09-12 Thread Jeremy Hanna
Yeah - I would bootstrap the new node at an initial_token of the current node's token minus 1. Once that has bootstrapped, decommission the old one. Avoid trying to use removetoken on anything before 0.8.3. Use decommission if you can if you're dealing with a live node. On Sep 12, 2011, at 10:42 AM, Kyle Gibson
