I've seen a lot of deployments, and I think you captured the scenarios and
reasoning quite well. You can apply other nuances and details to #2 (e.g.
segment based on SLA or topology), but I agree with all of your reasoning.
-Tupshin
-Global Field Strategy
-DataStax
On Jul 8, 2014 10:54 AM, "Jeremy
>
> On Wed, Jul 2, 2014 at 6:04 PM, Clint Kelly wrote:
> > Hi Tupshin,
> >
> > Thanks for the quick reply. Is the performance concern from the
> > Hadoop integration needing to set up separate SELECT operations for
> > all of the unique vnode ranges?
> >
For performance reasons, you shouldn't enable vnodes on any Cassandra/DSE
datacenter that is doing Hadoop analytics workloads. Other DCs in the
cluster can use vnodes.
-Tupshin
On Jul 2, 2014 5:50 PM, "Clint Kelly" wrote:
> Hi everyone,
>
> Apologies if this is the incorrect forum for a questio
When one node or DC is down, the coordinator nodes handling the writes will
notice and store hints (hinted handoff is the mechanism); those hints are
later replayed to deliver the data that could not be replicated initially.
http://www.datastax.com/dev/blog/modern-hinted-handoff
-Tupshin
On
While Astyanax 2.0 is still in beta, I think you will find it provides a very
good migration path from the Thrift-based 1.0 version to the native-driver
2.0 version. Well worth considering if you like the Astyanax API and
functionality. I know of multiple DataStax customers planning to use it.
-Tupshin
Pull requests encouraged. :)
-Tupshin
On May 17, 2014 7:43 PM, "Kevin Burton" wrote:
> AH… looks like there's one in the DataStax Java driver. Looks like it
> doesn't support everything but probably supports the features I need ;)
>
> So I'll just use that!
>
>
> On Sat, May 17, 2014 at 12:39 P
It's often an excellent strategy. No known issues.
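For example, a minimal sketch (datacenter names are hypothetical): with
NetworkTopologyStrategy you list only the DCs that should hold replicas, and
any DC you omit gets none for that keyspace:

  CREATE KEYSPACE regional_data
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'us_east': 3, 'eu_west': 3};
  -- A third DC in the same cluster (say 'ap_south') simply holds no replicas
  -- of regional_data, while other keyspaces may still replicate there.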
-Tupshin
On May 16, 2014 4:13 PM, "Anand Somani" wrote:
> Hi,
>
> It seems like it should be possible to have a keyspace replicated only to
> a subset of DCs on a given cluster spanning multiple DCs? Is there
> anything bad about this a
No, there isn't, though I would like to see such a feature, albeit more
at the CQL partition layer rather than the collection layer. Anyway,
that is sometimes referred to as a capped collection in other databases, and
you might find the history in this ticket interesting. It points to
ways to simulate the
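For illustration, a rough sketch of one way to simulate it in CQL (table and
column names are hypothetical): cluster the partition newest-first so readers
only ever slice the head of it:

  CREATE TABLE capped_events (
    stream_id  text,
    event_time timeuuid,
    payload    text,
    PRIMARY KEY (stream_id, event_time)
  ) WITH CLUSTERING ORDER BY (event_time DESC);

  -- Readers see at most the newest 100 entries:
  SELECT payload FROM capped_events WHERE stream_id = 's1' LIMIT 100;

Entries past the cap still exist until something prunes them, which is the
main way this differs from a true capped collection.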
That is a good approach. I have no better alternative to suggest.
-Tupshin
On Apr 17, 2014 10:36 AM, "abhinav chowdary"
wrote:
> We have one use case where we need to pull the entire wide row from
> Cassandra (using 2.0, DSE 4.0). Is there a preferred way to do this?
> Currently we are using
y level that fails to meet the specified check may not throw an
>> Exception, even if some replica nodes are not accessible.
>>
>>
>>
>> On Wed, Apr 16, 2014 at 2:00 PM, Tupshin Harper wrote:
>>
>>> No, but you do need a quorum of nodes.
>>>
buteReplication_c.html
>>
>>
>> On Wed, Apr 16, 2014 at 1:44 PM, Vivek Mishra wrote:
>>
>>> Hi,
>>> Mine is a simple case. Running on single node only. Keyspace is:
>>>
>>> create keyspace twitter with replication = {'class':
'replication_factor' : 3}
>
> -Vivek
>
>
> On Wed, Apr 16, 2014 at 1:27 AM, Tupshin Harper wrote:
>
>> Please provide your keyspace definition, and the output of "nodetool
>> ring"
>>
>> -Tupshin
>> On Apr 15, 2014 3:52 PM, &
Please provide your keyspace definition, and the output of "nodetool ring"
-Tupshin
On Apr 15, 2014 3:52 PM, "Vivek Mishra" wrote:
> Hi,
> I am trying Cassandra lightweight transaction support with Cassandra 2.0.4
>
> cqlsh:twitter> create table user(user_id text primary key, namef text);
> cq
It is not common, but I know of multiple organizations running with RF=5,
in at least one DC, for HA reasons.
-Tupshin
On Apr 15, 2014 2:36 PM, "Robert Coli" wrote:
> On Tue, Apr 15, 2014 at 6:14 AM, Ken Hancock wrote:
>
>> Keep in mind if you lose the wrong two, you can't satisfy quorum. In a
tl;dr: make sure you have enough capacity in the event of node failure. For
light workloads, that can be fulfilled with nodes=rf.
-Tupshin
On Apr 14, 2014 2:35 PM, "Robert Coli" wrote:
> On Mon, Apr 14, 2014 at 2:25 AM, Markus Jais wrote:
>
>> "It is generally not recommended to set a replicatio
ey can
> handle the load but eventually hardware will fail.
>
> Markus
>
>
>
>
>
> Tupshin Harper wrote on Monday, April 14, 2014 at 13:44:
>
> I do not agree with this advice. It can be perfectly reasonable to have
> #nodes < 2*RF.
> It is common to d
I do not agree with this advice. It can be perfectly reasonable to have
#nodes < 2*RF.
It is common to deploy a 3 node cluster with RF=3 and it works fine as long
as each node can handle 100% of your data and keep up with the workload.
-Tupshin
On Apr 14, 2014 5:25 AM, "Markus Jais" wrote:
>
Constant deletes and rewrites are a very poor pattern to use with
Cassandra. It would be better to write to a new row and partition every
minute and use a TTL to auto-expire the old data.
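A minimal sketch of that pattern (table name and bucket scheme are
hypothetical):

  CREATE TABLE readings_by_minute (
    bucket       text,       -- e.g. 'sensor42:2014-04-06T14:55'
    reading_time timestamp,
    value        double,
    PRIMARY KEY (bucket, reading_time)
  );

  -- Each write goes to the current minute's partition and expires itself:
  INSERT INTO readings_by_minute (bucket, reading_time, value)
  VALUES ('sensor42:2014-04-06T14:55', '2014-04-06 14:55:03+0000', 21.7)
  USING TTL 120;

Old partitions simply age out, so there is no explicit delete traffic against
hot rows.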
-Tupshin
On Apr 6, 2014 2:55 PM, "Yulian Oifa" wrote:
> Hello
> I am having a row in which approximately 100 v
d RF=3 ? That would make sense,
> wouldn't it...
>
> That's what I'll do for production.
>
> Oleg
>
> On 2014-04-07 12:23:51 +0000, Tupshin Harper said:
>
> Your us-east datacenter has RF=2 and 2 racks, which is the right way
>> to do it (I would
More details would be helpful (exact schema, method of inserting data,
etc.), but you can try dropping the indices and recreating them after the
import is finished.
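For example (index and table names are hypothetical):

  DROP INDEX users_email_idx;
  -- ... run the bulk import ...
  CREATE INDEX users_email_idx ON users (email);

Rebuilding the index once at the end is typically much cheaper than
maintaining it on every insert during the load.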
-Tupshin
On Apr 7, 2014 8:53 AM, "Fasika Daksa" wrote:
> We are running different workload test on Cassandra and Redis fo
Your us-east datacenter has RF=2 and 2 racks, which is the right way
to do it (I would rarely recommend using a different number of racks
than your RF). But by having three nodes on one rack (1b) and only one
on the other (1a), you are telling Cassandra to distribute the data so
that no two copies
Read the automatic paging portion of this post:
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
On Mar 17, 2014 8:09 PM, "Philip G" wrote:
> On Mon, Mar 17, 2014 at 4:54 PM, Robert Coli wrote:
>
>> The form of your question suggests you are Doing It Wrong, FWIW.
>>
>
It's the difference between reading only the partitions that you are
interested in, vs. reading every single partition before filtering the
results. At scale, and assuming you don't actually need to read every
partition, there would be a huge difference.
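To make that concrete, a hedged sketch (the schema is hypothetical):

  -- Touches only the single partition that holds the data:
  SELECT * FROM events WHERE device_id = 'd1';

  -- Forces a scan of every partition before filtering:
  SELECT * FROM events WHERE status = 'error' ALLOW FILTERING;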
If the model requires you to read every p
cussing a
> similar concept with a co-worker and going over the pros/cons of various
> approaches to realizing the goal. I'm still digging into Presto. I saw some
> people are working on support for cassandra in presto.
>
>
>
> On Wed, Mar 12, 2014 at 12:15 PM, Tupshin Harper
Peter,
I didn't specifically call it out, but the interface I just proposed in my
last email would be very much aimed at the goal of "making writing complex
queries less painful and more efficient" by providing a deep integration
mechanism to host that code. It's very much an "enough rope to hang
ourse
I agree that we are way off the initial topic, but I think we are spot on
the most important topic. As seen in various tickets, including #6704 (wide
row scanners), #6167 (end-slice termination predicate), the existence
of intravert-ug (Cassandra interface to intravert), and a number of others,
the
OK, cool. I can think of no such reason.
-Tupshin
On Mar 11, 2014 10:27 AM, "Wayne Schroeder"
wrote:
> I think it will work just fine. I was just asking for opinions on if
> there was some reason it would not work that I was not thinking of.
>
> On Mar 10, 2014, at 4
And to be clear, and to elaborate, null is the default state for a
Cassandra cell if you don't write to it, so you can always create a row
with a null column by writing the row without that column being specified.
Additionally, CQL's DELETE statement optionally takes a list of columns,
so if you
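Both behaviors in a minimal sketch (schema hypothetical):

  -- Omitting a column leaves it null; nothing is ever stored for it:
  INSERT INTO users (id, name) VALUES (1, 'alice');   -- email reads back null

  -- DELETE can also target an individual column rather than the whole row:
  DELETE email FROM users WHERE id = 1;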
at the expense of using LWT.
>
> Wayne
>
>
> On Mar 10, 2014, at 3:52 PM, Tupshin Harper wrote:
>
> If you really need to rely on this behavior, you should probably do the
> whole write as a lightweight transaction, despite the additional overhead.
>
>
>
Take a 3 node cluster with RF=3, and QUORUM reads and writes. Consistency
is achieved by ensuring that at least two nodes acknowledge a write, and at
least two nodes have to participate in a read. As a result, you know that
at least one of the two nodes that you are reading from has received the
la
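Worked out, the overlap guarantee is simple arithmetic: with RF=3, QUORUM is
floor(3/2) + 1 = 2, so a write is acknowledged by at least 2 replicas and a
read consults at least 2. Since 2 + 2 > 3, the read set and the write set
must share at least one node, and that node has the latest acknowledged
value.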
20, easily. Probably far more, but I lack data points beyond that.
-Tupshin
On Mar 9, 2014 10:26 AM, "Lu, Boying" wrote:
> Hi, experts,
>
>
>
> Since Cassandra 2.x supports databases that span multiple DCs, my question
> is how many DCs can Cassandra support in practice?
>
>
>
> Thanks
>
>
>
>
The complete rewrite of counters in 2.1 (which should address the counter
accuracy issues) will still have this limitation. Deleting and recreating
counters is not supported and will continue to be unsupported.
-Tupshin
On Mar 1, 2014 5:13 PM, "Manoj Khangaonkar" wrote:
> The last time I check
ults from this query would come
> from a single replica node (or set of replica nodes, if the consistency
> level is greater than 1).
>
> Would you mind clarifying? Thanks a lot!
>
> Best regards,
> Clint
>
>
>
>
>
>
> On Wed, Feb 26, 2014 at 4:56 AM, Tupshin
For the first question, try "select * from system.peers"
http://www.datastax.com/documentation/cql/cql_using/use_query_system_c.html?pagename=docs&version=1.2&file=cql_cli/using/query_system_tables
For the second, there is a JMX and nodetool command, but I'm not aware of
any way to get it directl
ack and does not seem to be perfect solution.
>
>
> On Thu, Feb 27, 2014 at 4:49 PM, Tupshin Harper wrote:
>
>> If you can programmatically roll over onto a new column family every 6
>> hours (or every day or other reasonable increment), and then just drop your
>> exi
If you can programmatically roll over onto a new column family every 6
hours (or every day or other reasonable increment), and then just drop your
existing column family after all the columns would have been expired, you
could skip your compaction entirely. It was not clear to me from your
descript
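A rough sketch of that rollover (table names and window are hypothetical):

  -- Writers switch to a fresh table at each window boundary:
  CREATE TABLE metrics_2014_02_27_06 (id text PRIMARY KEY, value blob);

  -- Once every cell in an old window's table would have expired anyway,
  -- drop it outright; the files are removed without compacting tombstones:
  DROP TABLE metrics_2014_02_27_00;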
This is a known issue that is fixed in 2.1beta1.
https://issues.apache.org/jira/browse/CASSANDRA-5202
Until 2.1, we do not recommend relying on the recycling of tables through
drop/create or truncate.
However, on a single node cluster, I suspect that truncate will work far
more reliably than drop
And one last clarification. Where I said "stored procedure" earlier, I
meant "prepared statement". Sorry for the confusion. Too much typing while
tired.
-Tupshin
On Tue, Feb 25, 2014 at 10:36 PM, Tupshin Harper wrote:
> I failed to address the matter of not knowing
rieve your families in nice sized batches:
SELECT family FROM id WHERE key=0;
and then do the fan-out selects that I described previously.
-Tupshin
On Tue, Feb 25, 2014 at 10:15 PM, Tupshin Harper wrote:
> Hi Clint,
>
> What you are describing could actually be accomplished with the
Hi Clint,
What you are describing could actually be accomplished with the Thrift API
and a multiget_slice with a slicerange having a count of 1. Initially I was
thinking that this was an important feature gap between Thrift and CQL, and
was going to suggest that it should be implemented (possible
ere any way to vote for that to get picked up again? :)
>
> Best regards,
> Clint
>
>
>
>
>
> On Mon, Feb 24, 2014 at 2:32 PM, Tupshin Harper wrote:
>
>> Hi Clint,
>>
>> That does appear to be an omission in CQL3. It would be possible to
>> simulate it
at 5:32 PM, Tupshin Harper wrote:
> Hi Clint,
>
> That does appear to be an omission in CQL3. It would be possible to
> simulate it by doing
> BEGIN BATCH
> UPDATE foo SET z = 10 WHERE x = 'a' AND y = 1 IF t = 2 AND z = 10;
> UPDATE foo SET t = 5, z = 6 WHERE
see how I could do this with what you outlined above---just
> curious. It seems like what I describe above under the hood would be
> a compare-and-(batch)-set on a single wide row, so it maybe is
> possible with the Thrift API (I have to check).
>
> Thanks again!
>
> Best
You can use OpsCenter in production with DSC/Apache Cassandra clusters.
Some features are only enabled with DSE, but the rest work fine with DSC.
-Tupshin
On Feb 22, 2014 11:20 PM, "user 01" wrote:
> I would be using nodetool & JConsole for monitoring. Though it would
> be less informative but I
#5633 was actually closed because of the static columns feature (
https://issues.apache.org/jira/browse/CASSANDRA-6561), which has been
checked in to the 2.0 branch but is not yet part of a release (it will be
in 2.0.6).
That feature will let you update multiple rows within a single partition by
doin
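A sketch of what that enables (the schema is hypothetical):

  CREATE TABLE plays (
    game   text,
    player text,
    score  int,
    round  int static,    -- one value shared by every row in the partition
    PRIMARY KEY (game, player)
  );

  BEGIN BATCH
    UPDATE plays SET score = 10 WHERE game = 'g1' AND player = 'p1' IF round = 3;
    UPDATE plays SET score = 7  WHERE game = 'g1' AND player = 'p2';
  APPLY BATCH;

The single IF on the static column guards the whole batch, so multiple rows
in one partition are updated conditionally together.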
While queuing in Cassandra has historically been an anti-pattern, it is also
true that Leveled Compaction addresses the worst aspect of frequent deletes
in Cassandra, and that overall, queuing in Cassandra is nowhere near the
anti-pattern that it used to be. This is
somethin
Scrub is very likely to resolve it. Note that scrub will drop (and log)
invalid items, so pay attention to what it logs and plan on doing a repair
afterwards to pull in a copy from a different node, assuming RF>1.
-Tupshin
On Feb 17, 2014 8:54 AM, "Oleg Dulin" wrote:
> Bumping this up -- anyth
You don't mention disks and RAM, but I would assume that the additional
data meant that you could now cache a lower percentage and that you have to
seek on disk more often.
-Tupshin
On Feb 10, 2014 4:14 PM, "Jiaan Zeng" wrote:
> Hi All,
>
> I am using Cassandra 1.2.4. I wonder if update operati
This is a known issue until Cassandra 2.1
https://issues.apache.org/jira/browse/CASSANDRA-5202
-Tupshin
On Feb 6, 2014 10:05 PM, "Robert Coli" wrote:
> On Thu, Feb 6, 2014 at 8:39 AM, Ondřej Černoš wrote:
>
>> Update: I dropped the keyspace, the system keyspace, deleted all the data
>> and sta
ssion tracking ids?
> What’d that be for?
>
> - Drew
>
> On Jan 21, 2014, at 10:48 AM, Tupshin Harper wrote:
>
> It does sound right.
>
> You might want to have additional session tracking IDs, separate from
> the user ID, but that is an additional implementation d
that sound right to you?
>
> - Drew
>
>
>
> On Jan 21, 2014, at 10:01 AM, Tupshin Harper wrote:
>
> One CQL row per user, keyed off of the UUID.
>
> Another table keyed off of email, with another column containing the UUID
> for lookups in the first table. Only regi
One CQL row per user, keyed off of the UUID.
Another table keyed off of email, with another column containing the UUID
for lookups in the first table. Only registration will require a
lightweight transaction, and only for the purpose of avoiding duplicate
email registration race conditions.
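A minimal sketch of that layout (column names are hypothetical):

  CREATE TABLE users (
    user_id uuid PRIMARY KEY,
    email   text,
    name    text
  );

  CREATE TABLE users_by_email (
    email   text PRIMARY KEY,
    user_id uuid
  );

  -- Registration claims the email exactly once via a lightweight transaction:
  INSERT INTO users_by_email (email, user_id)
  VALUES ('a@example.com', 62c36092-82a1-3a00-93d1-46196ee77204)
  IF NOT EXISTS;

Only when that returns [applied] = true does the app write the users row,
with a plain (non-LWT) insert.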
-Tupshin
This should be the doc you are looking for.
http://www.datastax.com/documentation/cassandra/2.0/webhelp/index.html#cassandra/operations/ops_add_dc_to_cluster_t.html
-Tupshin
On Jan 21, 2014 2:14 AM, "Lu, Boying" wrote:
> Hi, All,
>
>
>
> I’m new to Cassandra. I want to know how to add a DC to
ns are sequential and from
> the same thread and with Consistency ALL,
> the write should not return until all replicas have committed. So I am
> expecting all replicas to have the same value, when the next read happens.
> Not true ??
>
> regards
>
>
> On Fri, Jan 10, 201
It is bad because of the risk of concurrent modifications. If you don't
have some kind of global lock on the document/row, then 2 readers might
read version A, reader 1 writes version B based on A, and reader 2 writes
version C based on A, overwriting the changes in B. This is *inherent* to
the not
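One common mitigation, sketched in CQL (a hedged example, not the only
option): version the row and guard each write with a lightweight transaction:

  UPDATE documents SET body = 'B', version = 2
  WHERE id = 'doc1' IF version = 1;

The second writer's competing update with IF version = 1 then comes back
with [applied] = false instead of silently clobbering B.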
Yes, this is pretty close to the ultimate anti-pattern in Cassandra.
Whenever possible, we encourage models where your updates are idempotent,
and not dependent on a read before write. Manoj is looking for what is
essentially strong ordering in a distributed system, which always has
inherent trade-o
That is a fine option and can make perfect sense if you have keyspaces with
very different runtime characteristics.
-Tupshin
On Jan 7, 2014 7:30 AM, "Robert Wille" wrote:
> I’d like to have my keyspaces on different volumes, so that some can be on
> SSD and others on spinning disk. Is such a thi
This is a generally good interpretation of the state of vnodes with respect
to Cassandra versions 1.2.12 and 1.2.13.
Adding a new datacenter to a 1.2.12 cluster at your scale should be fine. I
consider vnodes fit for production at almost any scale after 1.2.13, or 50
nodes or less (ballpark) for 1
OK. Given the correction of my unfortunate partitioner error, you can, and
probably should, upgrade in place to 1.2, but with num_tokens=1 so it will
initially behave like 1.1 without vnodes would. Then you can do a rolling
conversion to more than one vnode per node, and once complete, shuffle your
vn
uster 1.2, you can backup your data and then use sstableloader (in this
>> case, you will not have to modify the timestamp as I did for the migration
>> from relational to Cassandra).
>> >
>> > Hope that helps !!
>> >
>> > Jean Armel
>> >
>> >
No. This is not going to work. The vnodes feature requires the Murmur3
partitioner, which was introduced with Cassandra 1.2.
Since you are currently using 1.1, you must be using the random
partitioner, which is not compatible with vnodes.
Because the partitioner determines the physical layout of
Increasing the phi value to 12 can be a partial workaround. It's certainly
not a fix, but it does partially alleviate the issue. Otherwise hang in
there until 1.2.12. Aaron is probably right that this is aggravated on
underpowered nodes, but larger nodes can still see these symptoms.
-Tupshin
On
It's conceivable that one of the faster USB 3.0 sticks would be sufficient
for this. I wouldn't exactly call it an "enterprise" configuration, but
it's worth considering. Keep in mind that if you are comfortable using your
RF for durability, you can turn off durable_writes on your keyspace and not
There is potentially a DSE-specific issue that you are running into and you
should probably contact DataStax support to confirm. Also, keep in mind
that Cassandra does recycle its commitlog files instead of deleting and
recreating them, so you shouldn't expect them to disappear even when the
node
What is in your Cassandra log right before and after that freeze?
-Tupshin
On Mar 20, 2013 8:06 AM, "Joel Samuelsson"
wrote:
> Hello,
>
> I've been trying to load test a one node cassandra cluster. When I add
> lots of data, the Cassandra node freezes for 4-5 minutes during which
> neither reads
Unless I'm misreading the git history, the stack trace you referenced isn't
from 1.1.2. In particular, the "writeHintForMutation" method in
StorageProxy.java wasn't added to the codebase until September 9th (
https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commitdiff;h=b38ca2879cf1cbf5de1
Rules that apply:
2 - guaranteed access
3 - treatment of nulls (though different than an RDBMS due to the inherent
sparse nature of rows)
4 - online catalog (not really true until Cassandra 1.2 and CQL 3)
5 - comprehensive data sublanguage (only if you remove the word relational)
6 - view updating
What consistency level are you writing with? If you were writing with ANY,
try writing with a higher consistency level.
-Tupshin
On Nov 18, 2012 9:05 PM, "Chuan-Heng Hsiao"
wrote:
> Hi Aaron,
>
> Thank you very much for the replying.
>
> The 700 CFs were created in the beginning (before any inse
Yes. Any FUSE filesystem is going to be substantially slower than a native
one like ext4.
-Tupshin
On Oct 30, 2012 2:09 PM, "Brian Tarbox" wrote:
> I got some new Ubuntu servers to add to my cluster and found that the file
> system is "fuseblk" which really means NTFS.
>
> All else being equal
I would generally recommend 1 drive for OS and commit log and a 3-drive
RAID 0 for data. The RAID does give you a good performance benefit, and it
can be convenient to have the OS on a separate drive for configuration ease
and better MTBF.
-Tupshin
On Oct 29, 2012 8:56 PM, "Ran User" wrote:
> I was hopi
Once you have created the CF from cqlsh, switch over to cassandra-cli
and run "describe schema". It will show you the schema for all your
column families in syntax that can be passed back into cassandra-cli
to create them.
The cassandra-cli syntax that you are looking for is probably the "and
colum
Any chance your server has been running for the last two weeks with the
leap second bug?
http://www.datastax.com/dev/blog/linux-cassandra-and-saturdays-leap-second-problem
-Tupshin
On Jul 12, 2012 1:43 PM, "Leonid Ilyevsky"
wrote:
> I am loading a large set of data into a CF with composite key.
Speaking from practical experience, it is possible to simulate this feature
by retrieving a slice of your row that only contains the most recent 100
items. You can then prevent the row from growing out of control by checking
the size of the row and pruning it back to 100 every N writes, where N is
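In later CQL terms, a hedged sketch of just the prune step (schema
hypothetical):

  -- Serve reads from the newest 100 entries only:
  SELECT item_time FROM recent_items WHERE user_id = 'u1' LIMIT 150;
  -- Every N writes, delete whatever the read found beyond position 100:
  DELETE FROM recent_items WHERE user_id = 'u1' AND item_time = ?;
  -- (? bound per stale clustering key, as a prepared statement)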
Generate a timeuuid for each post based on the original timestamp.
-Tupshin
On May 29, 2010 7:50 PM, "Erik" wrote:
Hi,
I have a list of posts I'm trying to insert into Cassandra. Each post has a
timestamp already (in the past) that is not necessarily unique. I'm trying
to
insert these posts
On 4/13/2010 3:39 PM, Christian Torres wrote:
Maybe some other people have asked this, but I'm testing PHP to use
with Cassandra, and the error the example here gave me was:
http://wiki.apache.org/cassandra/ThriftExamples03
*Bad type in structure.*
I think it is because I'm using the new ve