> I believe that the DateTieredCompactionStrategy would work for PRIMARY
> KEY (timeblock, timestamp) -- but does it also work for PRIMARY KEY
> (timeblock, timestamp, hash) ?
Yes.
(Are you sure you don't want to be using a timeuuid instead?)
~mck
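To illustrate the timeuuid suggestion: a version 1 (time-based) UUID carries both a timestamp and a uniqueness component, so it can stand in for a separate (timestamp, hash) pair. This is a minimal sketch, not the poster's schema. The hourly `timeblock` bucket and the helper names are assumptions made for illustration.

```python
import uuid
from datetime import datetime, timezone

# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def unix_seconds(time_uuid: uuid.UUID) -> float:
    """Recover the Unix timestamp embedded in a version 1 UUID."""
    return (time_uuid.time - UUID_EPOCH_OFFSET) / 1e7


def timeblock(time_uuid: uuid.UUID) -> str:
    """Hypothetical partition bucket: one block per UTC hour."""
    ts = datetime.fromtimestamp(unix_seconds(time_uuid), tz=timezone.utc)
    return ts.strftime("%Y%m%d%H")


# One value replaces (timestamp, hash): it sorts by time and is unique per event.
event_id = uuid.uuid1()
row_key = (timeblock(event_id), event_id)
```

With this shape the key would be PRIMARY KEY (timeblock, event_id) rather than three columns.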
> Here "partition" is a random digit from 0 to (N*M)
> where N=nodes in cluster, and M=arbitrary number.
Hopefully it was obvious, but here (unless you've got hot partitions),
you don't need N.
~mck
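A minimal sketch of the bucketing idea above, assuming the bucket count depends only on M and not on the number of nodes. The value of `M` and the function names are hypothetical.

```python
import random

M = 16  # arbitrary bucket count (hypothetical value, not from the thread)


def pick_partition(buckets: int = M) -> int:
    """Choose a random bucket for a write, in the range 0 .. buckets - 1."""
    return random.randrange(buckets)


def all_partitions(buckets: int = M) -> range:
    """A reader has to visit every bucket to see all rows."""
    return range(buckets)
```

Writes spread across M partitions, and a read fans out over `all_partitions()`, regardless of cluster size.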
y binary serialisation. We've learnt the hard way the
value of data transparency, and i'm guessing the storage cost is small
given c* compression.
Otherwise the advice here is largely repeating what Jens has already
said.
~mck
¹ slide 19+20 from
https://prezi.com/vt98oob9fvo4/cassandra-summit-cassandra-and-hadoop-at-finnno/
> Can I get data owned by a particular node and this way generate sum
> on different nodes by iterating over data from virtual nodes and later
> generate total sum by doing sum of data from all virtual nodes.
>
You're pretty much describing a map/reduce job using CqlInputFormat.
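The shape of that job, as a pure-Python stand-in rather than the Hadoop API: each split (in a real job, a token range handed out by CqlInputFormat) yields a partial sum, and a final step adds the partials. The split names and values below are made up for illustration.

```python
from functools import reduce


def map_partial_sum(rows):
    """Map step: sum the values found in one split."""
    return sum(rows)


def reduce_total(partials):
    """Reduce step: combine the per-split sums into one total."""
    return reduce(lambda a, b: a + b, partials, 0)


# Hypothetical splits; an empty split contributes zero.
splits = {"range-a": [1, 2, 3], "range-b": [10, 20], "range-c": []}

partials = {name: map_partial_sum(rows) for name, rows in splits.items()}
total = reduce_total(partials.values())
```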
Any errors in your log file?
We saw something similar when bootstrap crashed when rebuilding
secondary indexes.
See CASSANDRA-8798
~mck
his is one of the videos where I recall an off-hand mention of the Spark
> connector working with vnodes:
> https://www.youtube.com/watch?v=1NtnrdIUlg0
Thanks.
~mck
eos that I watched discussed how the Cassandra Spark
> connector has
> optimizations to deal with vnodes.
Are these videos public? If so, have you got any links to them?
~mck
serves requests to web applications that
> need low latency.
Let it be said this isn't something i'd recommend, just the path we had
to take because of our small initial dedicated-HW cluster.
(You really want to separate online and offline datacenters, so that you
can maximise the offline clusters for the heavy batch reads).
~mck
always been our 'big data' platform,
hadoop/spark is just an extra tool on top.
We've never kept data in hdfs and are very grateful for having made that
choice.
~mck
ref
https://prezi.com/vt98oob9fvo4/cassandra-summit-cassandra-and-hadoop-at-finnno/
reasonable number of connections.
We do this, using code similar to this patch
https://github.com/michaelsembwever/cassandra/pull/2/files
~mck
¹ https://issues.apache.org/jira/browse/CASSANDRA-8358
> However I guess it can be easily changed ?
that's correct.
NetworkTopologyStrategy gives you a better horizon and more flexibility
as you scale out, at least once you've gone past small cluster problems
like wanting RF=3 in a 4 node two dc cluster.
IMO I'd go with "DC:1,DC2:1".
~mck
> Are you using Leveled compaction strategy?
And if you're using the Date Tiered compaction strategy on a table that
isn't time-series data (for example, one where deletes happen), you'll
find it compacting over and over.
~mck
t data directly from Cassandra. See CqlInputFormat.
~mck
being able to replace HDFS with
Cassandra, but i don't think it's alive anymore.
~mck
to be
presented in cassandra.yaml?)
~mck
HSHA, particularly for our
offline (hadoop/spark) nodes.
Sorry i don't have the data anymore to support that statement, although
i can say that improvement paled in comparison to cross_node_timeout
which we enabled shortly afterwards.
~mck
: 56713727820156410577229101238628035242
node2: 113427455640312821154458202477256070484
If it is the former there's some important documentation missing.
~mck
ps CASSANDRA-1006 seems to be of some relation.
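For reference, the two token values shown above match multiples of floor(2^127 / 3), which is the usual way evenly spaced RandomPartitioner tokens are computed for a 3-node ring. Whether they were derived that way in this thread is an assumption, since the surrounding message is truncated. A small sketch of the arithmetic:

```python
# RandomPartitioner tokens live in the range 0 .. 2**127 - 1.
RING_SIZE = 2 ** 127


def evenly_spaced_tokens(node_count: int) -> list:
    """Return one token per node, spaced by floor(RING_SIZE / node_count)."""
    step = RING_SIZE // node_count
    return [i * step for i in range(node_count)]


tokens = evenly_spaced_tokens(3)
for index, token in enumerate(tokens):
    print(f"node{index}: {token}")
```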
al 1.1 TB 33.33%
Token(bytes[76118303760208547436305468318170713656])
~mck
ntil you attach it to an issue. (I think a new issue is appropriate
here).
~mck
asses, or to something else?
~mck
On Tue, 2011-05-03 at 14:22 -0500, Jonathan Ellis wrote:
> Can you create a ticket?
CASSANDRA-2598
i can also reproduce the problem with hadoop and
ColumnFamilyOutputFormat.
Turning off snapshot_before_compaction seems to be enough to prevent
it.
~mck
On Tue, 2011-05-03 at 16:52 +0200, Mck wrote:
> Running a 3 node cluster with cassandra-0.8.0-beta1
>
> I'm seeing the first node logging many (thousands) times
The only "special" thing about this first node is that it receives all the writes
from our sybase->cassandra import
r all column families (including system).
It happens a lot during startup.
The hardlinks do exist. Stopping, deleting the hardlinks, and starting
again does not help.
But i haven't seen it once on the other nodes...
~mck
ps the stacktrace
java.io.IOError: java.io.IOException: Unable to c
On Tue, 2011-04-26 at 12:53 +0100, Stephen Connolly wrote:
> (or did you want 20million unneeded deps for the
> client jars?)
Yes that's a good reason :-)
Is there anything i can help with?
Will beta versions be available under releases repository?
~mck
On Fri, 2011-04-22 at 16:49 -0500, Eric Evans wrote:
> I am pleased to announce the release of Apache Cassandra 0.8.0 beta1.
*Truly Awesome!*
CQL rocks in so many ways.
Is 0.8.0-beta1 available in apache's maven repository?
And if not, why not?
~mck
ake that over a billion .clone(..) calls... :-(
byte[] copies are relatively quick and cheap, still i am seeing a
performance degradation in m/r reduce performance with cloning of keys.
It's not that you don't have my vote here, i'm just stating my
uncertainty on what the correct API should be.
~mck
jobs
(millions of records) and the performance impact here.
The key isn't the only potential live byte[]. You also have names and
values in all the columns (and supercolumns) for all the mutations.
~mck
On Wed, 2011-01-12 at 14:21 -0800, Ryan King wrote:
> What consistency level did you use to write the
> data?
R=1,W=1 (reads happen a long time afterwards).
~mck
--
"It is now quite lawful for a Catholic woman to avoid pregnancy by a
resort to mathematics, though she is still f
On Wed, 2011-01-12 at 23:04 +0100, mck wrote:
> > Caused by: TimedOutException()
>
> What is the exception in the cassandra logs?
Or tried increasing rpc_timeout_in_ms?
~mck
--
"When there is no enemy within, the enemies outside can't hurt you."
African p
> You're using an ordered partitioner and your nodes are evenly spread
> around the ring, but your data probably isn't evenly distributed.
This load number seems equal to `du -hs ` and
since i've got N == RF shouldn't the data size always be the same on
every node?
~
On Wed, 2011-01-12 at 18:40 +, Jairam Chandar wrote:
> Caused by: TimedOutException()
What is the exception in the cassandra logs?
~mck
--
"Don't use Outlook. Outlook is really just a security hole with a small
e-mail client attached to it." Brian Trosko | www.semb.wever.
on the first node (regardless
of the cf they belong to).
"cleanup" didn't help. "compact" only took away 2GB. Otherwise there is a lot
here i don't understand.
~mck
--
"The turtle only makes progress when its neck is stuck out" Rollo May |
www.s
> Is this a bug or feature or a misuse?
i can confirm this bug
on a 3-node test cluster with RF 3
(and no issue exists for it AFAIK).
~mck
--
"Simplicity is the ultimate sophistication" Leonardo Da Vinci's (William
of Ockham)
| www.semb.wever.
On Thu, 2010-12-30 at 08:03 -0600, Jonathan Ellis wrote:
> We don't have any explicit code for enabling that, no.
https://issues.apache.org/jira/browse/CASSANDRA-1921
the patch was simple (NodeCmd and NodeProbe). just testing it now...
~mck
--
"I'm not one of those who t
word.file" to nodetool doesn't
help...
Is there any support for nodetool to connect to a password authenticated jmx
service?
~mck
--
"There are only two ways to live your life. One is as though nothing is
a miracle. The other is as if everything is." Albert Einstein
|
issues.apache.org/jira/browse/CASSANDRA-1774
~mck
add at line 132
> results.add(getMutation(key, sum));
> +results.add(getMutation(new Text("doubled"), sum*2));
Only the last mutation for any key seems to be written.
~mck
--
echo '[q]sa[ln0=aln256%
Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc
| www.sem