Re: Support/Consulting companies

2016-08-19 Thread Jim Ancona
There's also a list of companies that provide Cassandra-related services on the wiki: https://wiki.apache.org/cassandra/ThirdPartySupport Jim On Fri, Aug 19, 2016 at 3:37 PM, Chris Tozer wrote: > Instaclustr ( Instaclustr.com ) also offers Cassandra consulting > > > On Friday, August 19, 2016,

Re: Isolation in case of Single Partition Writes and Batching with LWT

2016-09-12 Thread Jim Ancona
Mark, Is there some official Apache policy on which sites it's appropriate to link to on an Apache mailing list? If so, could you please post a link to it so we can all understand the rules. Or is this your personal opinion on what you'd like to see here? Thanks! On Mon, Sep 12, 2016 at 7:34 AM,

Re: How to query '%' character using LIKE operator in Cassandra 3.7?

2016-09-22 Thread Jim Ancona
To answer DuyHai's question without introducing new syntax, I'd suggest: LIKE '%%%escape' means STARTS WITH '%' AND ENDS WITH 'escape' So the first two %'s are translated to a literal, non-wildcard % and the third % is a wildcard because it's not doubled. Jim On Thu, Sep 22, 2016 at 11:40 AM, M
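The doubled-% reading suggested above can be sketched in a few lines. This is only an illustration of the proposed convention, not Cassandra's actual LIKE implementation; the function names `parse_like` and `like_to_regex` are hypothetical, and the left-to-right pairing of `%%` is an assumption consistent with the example in the message.

```python
import re


def parse_like(pattern):
    """Tokenize a LIKE pattern where '%%' is a literal '%' and a lone '%' is a wildcard.

    Pairs are consumed left to right, so '%%%escape' is read as
    literal '%', wildcard, literal 'escape'.
    """
    tokens = []
    literal = []
    i = 0
    while i < len(pattern):
        if pattern[i] == '%':
            if i + 1 < len(pattern) and pattern[i + 1] == '%':
                literal.append('%')      # doubled: literal percent sign
                i += 2
                continue
            if literal:                  # flush pending literal text
                tokens.append(('lit', ''.join(literal)))
                literal = []
            tokens.append(('wild',))     # single: wildcard
            i += 1
        else:
            literal.append(pattern[i])
            i += 1
    if literal:
        tokens.append(('lit', ''.join(literal)))
    return tokens


def like_to_regex(pattern):
    """Compile the tokenized pattern into a regular expression for full matching."""
    parts = ['.*' if tok[0] == 'wild' else re.escape(tok[1])
             for tok in parse_like(pattern)]
    return re.compile(''.join(parts), re.DOTALL)
```

Under this reading, `like_to_regex('%%%escape').fullmatch(value)` succeeds only for values that start with '%' and end with 'escape'.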

Re: Cassandra users survey

2015-10-01 Thread Jim Ancona
Hi Jonathan, The survey asks about "your application." We have multiple applications using Cassandra. Are you looking for information about each application separately, or the sum of all of them? Jim On Wed, Sep 30, 2015 at 2:18 PM, Jonathan Ellis wrote: > With 3.0 approaching, the Apache Cass

Re: Replicating Data Between Separate Data Centres

2015-12-14 Thread Jim Ancona
Could you define what you mean by Causal Consistency and explain why you think you won't have that when using LOCAL_QUORUM? I ask because LOCAL_QUORUM and multiple data centers are the way many of us handle DR, so I'd like to understand why it doesn't work for you. I'm afraid I don't understand yo

Data Modeling: Partition Size and Query Efficiency

2016-01-04 Thread Jim Ancona
A problem that I have run into repeatedly when doing schema design is how to control partition size while still allowing for efficient multi-row queries. We want to limit partition size to some number between 10 and 100 megabytes to avoid operational issues. The standard way to do that is to figur

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
uster by querying each partition one at a time. > > Unfortunately due to the artificial partition key segment you cannot > iterate or page in any particular order...(at least across partitions) > Unless your hash function can also provide you some ordering guarantees. > > It all just depe

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
. Hence my reference to a "nasty distributed consensus problem" and Clint's reference to an "anti-pattern". I'd like to avoid it if I can. Jim > > -- Jack Krupansky > > On Tue, Jan 5, 2016 at 11:07 AM, Jim Ancona wrote: > >> Thanks for respondin

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
Hi Nate, Yes, I've been thinking about treating customers as either small or big, where "small" ones have a single partition and big ones have 50 (or whatever number I need to keep sizes reasonable). There's still the problem of how to handle a small customer who becomes too big, but that will hap
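A minimal sketch of the small/big split described above, assuming a synthetic bucket number is appended to the customer id to form the partition key. The names `bucket_for` and `partition_keys_for_read`, the use of CRC32 as the hash, and the `is_big` flag are all hypothetical; the bucket count of 50 is the figure mentioned in the message.

```python
import zlib

BIG_CUSTOMER_BUCKETS = 50  # figure from the message; tune to keep partition sizes reasonable


def bucket_for(customer_id, row_id, is_big):
    """Pick the bucket a row is written to.

    Small customers always use bucket 0 (a single partition).
    Big customers spread rows over a fixed number of buckets using a
    stable hash of the row id (CRC32 is an assumption; any stable hash works).
    """
    if not is_big:
        return 0
    return zlib.crc32(row_id.encode('utf-8')) % BIG_CUSTOMER_BUCKETS


def partition_keys_for_read(customer_id, is_big):
    """List every (customer_id, bucket) partition a full read must visit."""
    count = BIG_CUSTOMER_BUCKETS if is_big else 1
    return [(customer_id, bucket) for bucket in range(count)]
```

The sketch does not address the hard part raised in the thread: migrating a small customer that grows too big means rewriting existing rows into the new buckets.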

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-05 Thread Jim Ancona
> > Clint > On Jan 5, 2016 2:28 PM, "Jim Ancona" wrote: > >> Hi Nate, >> >> Yes, I've been thinking about treating customers as either small or big, >> where "small" ones have a single partition and big ones have 50 (or >> w

Re: Data Modeling: Partition Size and Query Efficiency

2016-01-06 Thread Jim Ancona
iters. What happens when you change the number of buckets? Does existing data have to be re-written into new buckets? If so, how do you make sure that's only done once for each bucket size increase? Or perhaps I'm misunderstanding your suggestion? Jim > On Tue, Jan 5, 2016 at 2:17 PM

Re: Cassandra Connection Pooling

2016-01-28 Thread Jim Ancona
It's typically handled by your client (e.g. https://docs.datastax.com/en/latest-java-driver/index.html) along with retries, timeouts and all the other things you would put in your datasource config for a SQL database in JBoss. On Thu, Jan 28, 2016 at 5:31 PM, KAMM, BILL wrote: > Hi, I’m looking

Re: Writing a large blob returns WriteTimeoutException

2016-02-08 Thread Jim Ancona
The "if not exists" in your INSERT means that you are incurring a performance hit by using Paxos. Do you need that? Have you tried your test without it? Jim

Re: best ORM for cassandra

2016-02-10 Thread Jim Ancona
Recent versions of the Datastax Java Driver include an object mapping API that might work for you: http://docs.datastax.com/en/latest-java-driver/java-driver/reference/objectMappingApi.html Jim On Wed, Feb 10, 2016 at 4:29 AM, Nirmallya Mukherjee wrote: > I have heard of that but I like to redu

Re: Is it possible to achieve "sticky" request routing?

2016-04-05 Thread Jim Ancona
Jon and Steve: I don't understand your point. The TokenAwareLoadBalancer identifies the nodes in the cluster that own the data for a particular token and routes requests to one of them. As I understand it, the OP wants to send requests for a particular token to the same node every time (assuming it

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Jim Ancona
Jack, the Datastax link he posted ( http://www.datastax.com/dev/blog/thrift-to-cql3) says that for column families with mixed dynamic and static columns: "The only solution to be able to access the column family fully is to remove the declared columns from the thrift schema altogether..." I think t

Re: Migrating to CQL and Non Compact Storage

2016-04-11 Thread Jim Ancona
ncerned: >> https://www.oreilly.com/ideas/apache-cassandra-for-analytics-a-performance-and-storage-analysis >> >> The flexibility of Cql comes at heavy cost until 3.x. >> >> >> >> Thanks >> Anuj >> Sent from Yahoo Mail on Android >> <http

nodetool cfstats and compression

2012-09-14 Thread Jim Ancona
Do the row size stats reported by 'nodetool cfstats' include the effect of compression? Thanks, Jim

Re: Effective partition key for time series data, which allows range queries?

2017-04-04 Thread Jim Ancona
The typical recommendation for maximum partition size is on the order of 100 MB and/or 100,000 rows. That's not a hard limit, but you may be setting yourself up for issues as you approach or exceed those numbers. If you need to reduce partition size, the typical way to do this is by "bucketing," th
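Since the message is cut off, the following is only one common form of bucketing for time series, sketched under the assumption that a day-granularity bucket is added to the partition key (for example, a source id plus a `YYYYMMDD` string). The function names and the one-day bucket size are hypothetical, not taken from the thread.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(ts):
    """Return the day bucket (UTC, 'YYYYMMDD') for a timezone-aware timestamp."""
    return ts.astimezone(timezone.utc).strftime('%Y%m%d')


def buckets_in_range(start, end):
    """List every day bucket that a range query from start to end must visit.

    Both arguments are expected to be timezone-aware datetimes.
    """
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    buckets = []
    while day <= last:
        buckets.append(day.strftime('%Y%m%d'))
        day += timedelta(days=1)
    return buckets
```

A range query then becomes one query per bucket in the list, each restricted by the clustering timestamp within that partition. A smaller bucket gives smaller partitions but more queries per range.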

Re: Effective partition key for time series data, which allows range queries?

2017-04-05 Thread Jim Ancona
t to the new bucket size. You definitely don't want to > paint yourself into a corner where you need a smaller bucket size but your > data model didn't leave room for it. > > On Tue, Apr 4, 2017 at 2:59 PM Jim Ancona wrote: > >> The typical recommendation for maximu

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jim Ancona
There are clever ways to encode coordinates into a single scalar value where points that are close on a surface are also close in value, making queries efficient. Examples are Geohash and Google's S2
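The underlying idea can be illustrated with plain bit interleaving (a Z-order or Morton code). This is a sketch of the general technique only: it is not the Geohash base-32 encoding and not the S2 cell scheme, and the function names and the 16-bit resolution are assumptions.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] onto an integer in [0, 2**bits - 1]."""
    scale = (1 << bits) - 1
    return int((value - lo) / (hi - lo) * scale)


def interleave(x, y, bits):
    """Interleave the low `bits` bits of x and y; x takes the even bit positions."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_order(lat, lon, bits=16):
    """Encode a latitude/longitude pair as a single integer."""
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

Points that share a long common prefix of the code fall in the same grid cell, so a bounding-box query can be turned into a small set of scalar ranges. The mapping is not perfect: some neighbouring points land far apart in code value, so queries usually cover several ranges and filter the results.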

Re: Smart Table creation for 2D range query

2017-05-09 Thread Jim Ancona
ifferent nodes, and you > end up doing a scatter gather. > > If the goal is to provide a scalable solution, building a table that > functions as an R-Tree or Quad Tree is the only way I know that can solve > the problem without scanning the entire cluster. > > Jon > > On Ma

Re: Way to Cassandra File System

2015-03-24 Thread Jim Ancona
There's also Brisk (https://github.com/riptano/brisk), the original open source version of CFS before Riptano/Datastax made it proprietary. It's been moribund for years, but there does appear to be a fork with commits up to 2013: https://github.com/milliondreams/brisk Jim On Tue, Mar 24, 2015 at

Re: How to store unique visitors in cassandra

2015-04-01 Thread Jim Ancona
Very interesting. I had saved your email from three years ago in hopes of an elegant answer. Thanks for sharing! Jim On Tue, Mar 31, 2015 at 8:16 AM, Alain RODRIGUEZ wrote: > People keep asking me if we finally found a solution (even if this is 3+ > years old) so I will just update this thread

Re: What % of cassandra developers are employed by Datastax?

2014-05-23 Thread Jim Ancona
I took a look at the Ohloh stats here: https://www.ohloh.net/p/cassandra/contributors/summary Note that committers are not the same as contributors. Dozens of people contribute patches that are committed to the codebase without being committers. Over the last year, the top four contributors (Jona

UnsupportedOperationException: Index manager cannot support deleting and inserting into a row in the same mutation

2011-06-23 Thread Jim Ancona
Since upgrading to 0.7.6-2 I'm seeing the following exception in our server logs: ERROR [MutationStage:1184874] 2011-06-22 23:59:43,867 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[MutationStage:1184874,5,main] java.lang.UnsupportedOperationException: Index manager cann

Re: UnsupportedOperationException: Index manager cannot support deleting and inserting into a row in the same mutation

2011-06-23 Thread Jim Ancona
Is there any reason this fix can't be back-ported to 0.7? Jim On Thu, Jun 23, 2011 at 3:00 PM, Jonathan Ellis wrote: > Sorry, 0.8.2 is correct. > > On Thu, Jun 23, 2011 at 1:36 PM, Les Hazlewood wrote: >> The issue has the fix version as 0.8.2, not 0.7.7.  Is that incorrect? >> Cheers, >> Les >

Re: UnsupportedOperationException: Index manager cannot support deleting and inserting into a row in the same mutation

2011-06-23 Thread Jim Ancona
.6, not in production, but this is not "mostly a non-problem" here. Jim On Thu, Jun 23, 2011 at 3:25 PM, Jonathan Ellis wrote: > The patch probably applies as-is but I don't want to take any risks > with 0.7 to solve what is mostly a non-problem. > > On Thu, Jun 23, 2011 at 2:

Cassandra client loses connectivity to cluster

2011-06-29 Thread Jim Ancona
In reviewing client logs as part of our Cassandra testing, I noticed several Hector "All host pools marked down" exceptions in the logs. Further investigation showed a consistent pattern of "java.net.SocketException: Broken pipe" and "java.net.SocketException: Connection reset" messages. These erro

Re: do I need to add more nodes? minor compaction eat all IO

2011-07-26 Thread Jim Ancona
On Mon, Jul 25, 2011 at 6:41 PM, aaron morton wrote: > There are no hard and fast rules to add new nodes, but here are two > guidelines: > > 1) Single node load is getting too high, rule of thumb is 300GB is probably > too high. What is that rule of thumb based on? I would guess that working se

Re: Damaged commit log disk causes Cassandra client to get stuck

2011-08-02 Thread Jim Ancona
Ideally, I would hope that a bad disk wouldn't hang a node but would instead just cause writes to fail, but if that is not the case, perhaps the bad disk somehow wedged that server node completely so that requests were not being processed at all (maybe not even being timed out). At that point you'd

Re: Damaged commit log disk causes Cassandra client to get stuck

2011-08-02 Thread Jim Ancona
2011 at 4:19 PM, Jim Ancona wrote: > Ideally, I would hope that a bad disk wouldn't hang a node but would > instead just cause writes to fail, but if that is not the case, > perhaps the bad disk somehow wedged that server node completely so > that requests were not being processed a

Re: cassandra server disk full

2011-08-02 Thread Jim Ancona
On Mon, Aug 1, 2011 at 6:12 PM, Ryan King wrote: > On Fri, Jul 29, 2011 at 12:02 PM, Chris Burroughs > wrote: >> On 07/25/2011 01:53 PM, Ryan King wrote: >>> Actually I was wrong: our patch will disable gossip and thrift but >>> leave the process running: >>> >>> https://issues.apache.org/jira/br

Re: Trying to find the problem with a broken pipe

2011-08-02 Thread Jim Ancona
On Tue, Aug 2, 2011 at 4:36 PM, Anthony Ikeda wrote: > I'm not sure if this is a problem with Hector or with Cassandra. > We seem to be seeing broken pipe issues with our connections on the client > side (Exception below). A bit of googling finds possibly a problem with the > amount of data we are

Re: Trying to find the problem with a broken pipe

2011-08-02 Thread Jim Ancona
Cassandra 0.8.1, Hector 0.8.0-1 Our issue is occurring with Cassandra 0.7.8 and Hector 0.7-30. We plan to deploy Hector 0.7-31 this week and to turn on useSocketKeepalive. Are you using that? We're also using tcpdump to capture packets when failures occur to see if there are anomalies i

Re: Updates lost

2011-08-31 Thread Jim Ancona
You could also look at Hector's approach in: https://github.com/rantav/hector/blob/master/core/src/main/java/me/prettyprint/cassandra/service/clock/MicrosecondsSyncClockResolution.java It works well and I believe there was some performance testing done on it as well. Jim On Tue, Aug 30, 2011 at
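For readers who do not want to open the link, the general approach of a synchronized microsecond clock can be sketched as follows. This is written in the spirit of the class named above and is not a transcription of Hector's code; the class name, method name, and injectable `now_ms` parameter are hypothetical.

```python
import threading
import time


class MicrosecondsSyncClock:
    """Hand out strictly increasing microsecond timestamps within one process."""

    def __init__(self, now_ms=None):
        # now_ms is injectable for testing; by default it reads the wall clock
        # at millisecond resolution.
        self._now_ms = now_ms or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last = -1

    def next_timestamp(self):
        with self._lock:
            candidate = self._now_ms() * 1000   # milliseconds to microseconds
            if candidate > self._last:
                self._last = candidate
            else:
                # Same millisecond (or clock went backwards): bump by one
                # microsecond so that later calls still get a greater value.
                self._last += 1
            return self._last
```

Because each call returns a value greater than the previous one, two writes issued back to back from the same client never carry the same timestamp. It does nothing about clock skew between different client machines.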

Re: Professional Support

2011-09-06 Thread Jim Ancona
We use Datastax (http://www.datastax.com) and we have been very happy with the support we've received. We haven't tried any of the other providers on that page, so I can't comment on them. Jim (Disclaimer: no connection with Datastax other than as a satisfied customer.) On Tue, Sep 6, 2011 at 1:

Re: Cassandra client loses connectivity to cluster

2011-09-06 Thread Jim Ancona
"Lesson: when you have millions of users it becomes easier to say things about averages, but harder to do the same for extremes." Jim On Wed, Jun 29, 2011 at 5:42 PM, Jim Ancona wrote: > In reviewing client logs as part of our Cassandra testing, I noticed > several Hector &quo

Re: what's the difference between repair CF separately and repair the entire node?

2011-09-12 Thread Jim Ancona
On Mon, Sep 12, 2011 at 1:44 PM, Peter Schuller wrote: >> I am using 0.7.4.  so it is always okay to do the routine repair on >> Column Family basis? thanks! > > It's "okay" but won't do what you want; due to a bug you'll see > streaming of data for other column families than the one you're trying

Re: TransportException when storing large values

2011-09-20 Thread Jim Ancona
Pete, See this thread http://groups.google.com/group/hector-users/browse_thread/thread/cb3e72c85dbdd398/82b18ffca0e3940a?#82b18ffca0e3940a for a bit more info. Jim On Tue, Sep 20, 2011 at 9:02 PM, Tyler Hobbs wrote: > From cassandra.yaml: > > # Frame size for thrift (maximum field length). > #

Re: Update of column sometimes takes 10 seconds

2011-09-26 Thread Jim Ancona
Do you actually see the update occur if you wait for 10 seconds (as your subject implies), or do you just see intermittent failures when running the unit test? If it's the latter, are you sure that the update has a greater timestamp than the insert? I've seen similar unit tests fail because

cassandra-cli: Create column family with composite column name

2011-10-05 Thread Jim Ancona
Using Cassandra 0.8.6, I've been trying to figure out how to use the CLI to create column families using composite keys and column names. The documentation on CompositeType seems pretty skimpy. But in the course of writing this email to ask how to do it, I figured out the proper syntax. In the hope

Re: yet a couple more questions on composite columns

2012-02-04 Thread Jim Ancona
I've used "special" values which still comply with the Composite schema for the metadata columns, e.g. a column of 1970-01-01:{accountId} for a metadata column where the Composite is DateType:UTF8Type. Jim On Sat, Feb 4, 2012 at 2:13 PM, Yiming Sun wrote: > Thanks Andrey and Chris.  It sounds li

Re: yet a couple more questions on composite columns

2012-02-06 Thread Jim Ancona
names must conform to that. Jim > > > On Sat, Feb 4, 2012 at 6:24 PM, Jim Ancona wrote: >> >> I've used "special" values which still comply with the Composite >> schema for the metadata columns, e.g. a column of >> 1970-01-01:{accountId} for a m

Re: single row key continues to grow, should I be concerned?

2012-03-23 Thread Jim Ancona
I'm dealing with a similar issue, with an additional complication. We are collecting time series data, and the amount of data per time period varies greatly. We will collect and query event data by account, but the biggest account will accumulate about 10,000 times as much data per time period as t

Secondary Indexes, Quorum and Cluster Availability

2012-06-01 Thread Jim Ancona
Hi, We have an application with two code paths, one of which uses a secondary index query and the other of which doesn't. While testing node down scenarios in our cluster we got a result which surprised (and concerned) me, and I wanted to find out if the behavior we observed is expected. Background

Re: Cassandra 1.1.1 release?

2012-06-02 Thread Jim Ancona
The release vote is going on now on the dev list. So probably in the next day or two, assuming no problems pop up. Jim On Wed, May 30, 2012 at 1:29 PM, Roland Mechler wrote: > Anyone have a rough idea of when Cassandra 1.1.1 is likely to be released? > > -Roland > >

Re: Secondary Indexes, Quorum and Cluster Availability

2012-06-05 Thread Jim Ancona
an no longer be selected by the partitioner. > > Cheers > > - > Aaron Morton > Freelance Developer > @aaronmorton > http://www.thelastpickle.com > > On 2/06/2012, at 5:15 AM, Jim Ancona wrote: > > Hi, > > We have an application with two code paths,

Re: Secondary Indexes, Quorum and Cluster Availability

2012-06-06 Thread Jim Ancona
On Tue, Jun 5, 2012 at 4:30 PM, Jim Ancona wrote: > It might be a good idea for the documentation to reflect the tradeoffs > more clearly. Here's a proposed addition to the Secondary Index FAQ at http://wiki.apache.org/cassandra/SecondaryIndexes Q: How does choice of Consistency L

Re: Secondary Indexes, Quorum and Cluster Availability

2012-06-07 Thread Jim Ancona
www.thelastpickle.com > > On 7/06/2012, at 7:54 AM, Jim Ancona wrote: > > On Tue, Jun 5, 2012 at 4:30 PM, Jim Ancona wrote: > >> It might be a good idea for the documentation to reflect the tradeoffs >> more clearly. > > > Here's a proposed addition to the

Re: Cassandra error while processing message

2012-06-15 Thread Jim Ancona
It's hard to tell exactly what happened--are there other messages in your client log before the "All host pools marked down"? Also, how many nodes are there in your cluster? I suspect that the Thrift protocol error was (incorrectly) retried by Hector, leading to the "All host pools marked down", bu

Re: What determines the memory that used by key cache??

2012-06-18 Thread Jim Ancona
On Mon, Jun 18, 2012 at 8:53 AM, mich.hph wrote: > Dear all! > In my cluster, I found every key needs 192bytes in the key cache.So I want > to know what determines the memory that used by key cache. How to calculate > the value. > According to http://cassandra-user-incubator-apache-org.3065146.n

Re: vnodes ready for production ?

2013-06-19 Thread Jim Ancona
On Tue, Jun 18, 2013 at 4:04 AM, aaron morton wrote: >> Even more if we could automate some up-scale thanks to AWS alarms, It >> would be awesome. > > I saw a demo for Priam (https://github.com/Netflix/Priam) doing that at > netflix in March, not sure if it's public yet. > >> Are the vnodes featur

Re: is there a SSTAbleInput for Map/Reduce instead of ColumnFamily?

2013-09-06 Thread Jim Ancona
Unfortunately, Netflix doesn't seem to have released Aegisthus as open source. Jim On Fri, Aug 30, 2013 at 1:44 PM, Jeremiah D Jordan < jeremiah.jor...@gmail.com> wrote: > FYI: > http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html > > -Jeremiah > > On Aug 30, 2013, at 9

Internal error when using SimpleSnitch and dynamic_snitch: true

2011-01-17 Thread Jim Ancona
We accidentally configured our cluster with SimpleSnitch (instead of PropertyFileSnitch) and dynamic_snitch: true. This is with version 0.7.0. We saw the errors below on get_slice and batch_mutate calls. The errors went away when we switched to PropertyFileSnitch. Should dynamic_snitch work with Si

Any plans to support key metadata?

2010-10-29 Thread Jim Ancona
In 0.7, Cassandra now supports column metadata CfDef.default_validation_class and ColumnDef.validation_class. Is there any plan to provide similar metadata for keys, at the key space or column family level? Jim

Re: Any plans to support key metadata?

2010-10-29 Thread Jim Ancona
On Fri, Oct 29, 2010 at 10:07 AM, Jim Ancona wrote: > In 0.7, Cassandra now supports column metadata > CfDef.default_validation_class and ColumnDef.validation_class. Is > there any plan to provide similar metadata for keys, at the key space > or column family level? Sorry to respo

Re: Could Not connect to cassandra-cli on windows

2010-11-09 Thread Jim Ancona
On Mon, Nov 8, 2010 at 8:31 PM, Alaa Zubaidi wrote: > Hi, > Failing to connect to cassandra client: on windows > > [defa...@unknown] connect localhost/9160 > Exception connecting to localhost/9160. Reason: Connection refused: connect. > > [defa...@unknown] connect xxx.xxx.x.xx/9160 > Syntax error