Why so?
What are the pluses and minuses?
As for me, I am looking at the number of files in the directory.
700GB/512MB*5 (files per SSTable) = ~7,000 files, which is OK from my point of view.
700GB/5MB*5 = ~700,000 files, which is too many for a single directory, too much
memory used for SSTable data, too huge a compaction queue (that le
Hi All,
I'm organising the NoSQL and Big Data track at Developer Day Dundee:
http://dun.dddscotland.co.uk/
This is a free mini conference at Dundee University, Dundee, Scotland. For the
past 2 years we've had a track on NoSQL and had some great speakers.
However I don't believe we've had anyone f
Hi
Actually the problem is that while we move a token in a 12-node cluster we
observe Cassandra misses (no data as per Cassandra for the requested row key). As
per our understanding we expect that when we move a token, that node will
first sync up the data as per the newly assigned token & only after
Hi!
I ran
UPDATE COLUMN FAMILY cf_name WITH
compression_options={sstable_compression:SnappyCompressor,
chunk_length_kb:64};
I then ran on all my nodes (3)
sudo nodetool -h localhost scrub tok cf_name
I have replication factor 3. The size of the data on disk was cut in half
in the first node and i
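A quick way to sanity-check the result once the scrubs finish (assuming the 1.1-era nodetool; exact fields vary by version) is to compare the column family stats on each node:
nodetool -h localhost cfstats
and look at "Space used (live)" for the column family; builds that report it will also show a compression ratio there.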
Are there any tested patches around for fixing this issue in 1.0 branch?
I have to do keyspace wide flush every 30 seconds to survive delete-only
workload. This is very inefficient.
https://issues.apache.org/jira/browse/CASSANDRA-3741
Hi Guys
We've noticed a strange behavior on our 3-node staging Cassandra cluster
with RF=2 and LeveledCompactionStrategy. When we run "nodetool repair
-pr" on a node, the other nodes start "validation"
process and when this process is finished one of the other 2 nodes reports
that there are app
On Sun, Sep 23, 2012 at 11:30 PM, aaron morton wrote:
> If this is intended behavior, could somebody please point me to where this
> is
> documented?
>
> It is intended.
It is not, in fact. We should either refuse the query as "yet
unsupported" or we should do the right thing, but returning nothin
On Thu, Sep 20, 2012 at 10:13:49AM +1200, aaron morton wrote:
> No.
> They use different minor file versions which are not backwards compatible.
Thanks Aaron.
Is upgradesstables capable of downgrading the files to 1.0.8?
Looking for a way to make this work.
Regards,
Arend-Jan
> On 18/09/2012
The Cassandra team is pleased to announce the release of the first beta for
the future Apache Cassandra 1.2.0.
Let me first stress that this is beta software and as such is *not* ready for
production use.
The goal of this release is to give a preview of what will become Cassandra
1.2 and to get w
2012/9/23 Hiller, Dean
> You need to split data among partitions or your query won't scale as more
> and more data is added to table. Having the partition means you are
> querying a lot less rows.
>
This works if I can query just one partition. But if I need to
query things in multipl
The repair process by itself is going well in the background, but the issue
I'm concerned about is a lot of unnecessary compaction tasks
The number in the compaction tasks counter is overestimated. For example I have
1100 tasks left and if I stop inserting data, all tasks will finish
within 30 minutes.
I
I am confused. In this email you say you want "get all requests for a user"
and in a previous one you said "Select all the users which has new requests,
since date D" so let me answer both…
For the latter, you make ONE query into the latest partition (ONE partition) of the
GlobalRequestsCF which gi
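As a rough illustration of the pattern (the bucket format and column layout below are just placeholders, not necessarily the exact model in use):

row key:  "2012-09"                              (one partition per month)
columns:  <request time>:<user id>  ->  <request id>

"all users with new requests since date D" then becomes one column slice over the newest partition(s), and older partitions are never read.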
2012/9/24 Hiller, Dean
> I am confused. In this email you say you want "get all requests for a
> user" and in a previous one you said "Select all the users which has new
> requests, since date D" so let me answer both…
>
I have both needs. These are the two queries I need to perform on the mode
Hi folks,
I looked at my mail below, and I'm rambling a bit, so I'll try to re-state my
queries pointwise.
a) what are the performance tradeoffs on reads & writes between creating a
standard column family and manually doing the counts by a lookup on a key,
versus using counters.
b) what's the
Oh, ok, you were talking about the wide row pattern, right?
yes
But playORM is compatible with Aaron's model, isn't it?
Not yet. PlayOrm supports partitioning one table multiple ways as it indexes
the columns (in your case, the userid FK column and the time column)
Can I map exactly this using
IMO
You would use Cassandra Counters (or other variation of distributed
counting) in case of having determined that a centralized version of
counting is not going to work.
You'd determine the non_feasibility of centralized counting by figuring the
speed at which you need to sustain writes and rea
Dean,
There is one last thing I would like to ask about playOrm on this list;
the next questions will come via StackOverflow. Just because of the context,
I prefer asking this here:
When you say playOrm indexes a table (which would be a CF behind the
scenes), what do you mean? PlayOrm will
PlayOrm will automatically create a CF to index my CF?
It creates 3 CFs for all indices (IntegerIndice, DecimalIndice, and
StringIndice) such that the ad-hoc tool that is in development can display the
indices, as it knows the prefix of the composite column name is of Integer,
Decimal or String
On Fri, Sep 14, 2012 at 7:05 AM, Xu, Zaili wrote:
> I am pretty new to Cassandra. I have a script that needs to set up a schema
> first before starting up the cassandra node. Is this possible? Can I create
> the schema directly on cassandra storage and then when the node starts up it
> will pick
Dean, this sounds like magic :D
I don't know details about the performance on the index implementations you
chose, but it would pave the way to using it in my case, as I don't need the
best performance in the world when reading, but I need to assure
scalability and have a simple model to maintain. I l
Is there anything I can do on the configuration side to prevent nodes from
going OOM due to queries that will read large amounts of data and exceed the
heap available?
For the past few days we have had some nodes consistently freezing/crashing with
OOM. We got a heap dump into MAT and figured ou
Hello,
We are running into an unusual situation that I'm wondering if anyone has any
insight on. We've been running a Cassandra cluster for some time, with
compression enabled on one column family in which text documents are stored.
We enabled compression on the column family, utilizing the S
I forgot to mention we are running Cassandra 1.1.2.
Thanks,
-Mike
On Sep 24, 2012, at 5:00 PM, Michael Theroux wrote:
> Hello,
>
> We are running into an unusual situation that I'm wondering if anyone has any
> insight on. We've been running a Cassandra cluster for some time, with
> compres
Suppose two cases:
1. I have a Cassandra column family with non-composite row keys =
incremental id
2. I have a Cassandra column family with composite row keys =
incremental id 1 : group id
Which one will be faster to insert? And which one will be faster to
read by incremental
Hey...
From my understanding, there are several ways to use composites
with SSTableSimpleUnsortedWriter but which is the best?
And as usual, code examples are welcome ;)
Thanks in advance!
On Thu, Sep 20, 2012 at 11:23 PM, Edward Kibardin wrote:
> Hi Everyone,
>
> I'm writing a conversion too
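Since code examples were asked for, here is a minimal sketch of one way to do it with the 1.1-era bulk-load API: a CompositeType as the column comparator and CompositeType.Builder to assemble column names. The keyspace, CF and path names are placeholders, and the class and method names are written from memory, so they may differ between versions; treat it as a starting point rather than the definitive approach.

import java.io.File;
import java.util.Arrays;

import org.apache.cassandra.db.marshal.AbstractType;
import org.apache.cassandra.db.marshal.CompositeType;
import org.apache.cassandra.db.marshal.LongType;
import org.apache.cassandra.db.marshal.UTF8Type;
import org.apache.cassandra.dht.RandomPartitioner;
import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

public class CompositeBulkWriterSketch
{
    public static void main(String[] args) throws Exception
    {
        // Column names are (text, long) composites, e.g. (user id, timestamp).
        CompositeType comparator = CompositeType.getInstance(
                Arrays.<AbstractType<?>>asList(UTF8Type.instance, LongType.instance));

        // Output directory, keyspace and CF names are placeholders.
        SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                new File("/tmp/bulk/MyKeyspace/MyCF"),
                new RandomPartitioner(),
                "MyKeyspace",
                "MyCF",
                comparator,
                null,
                64); // buffer size in MB before an SSTable is flushed

        long timestamp = System.currentTimeMillis() * 1000; // microseconds

        writer.newRow(UTF8Type.instance.decompose("row-key-1"));

        // Build one composite column name component by component.
        CompositeType.Builder name = new CompositeType.Builder(comparator);
        name.add(UTF8Type.instance.decompose("user42"));
        name.add(LongType.instance.decompose(1348470000000L));

        writer.addColumn(name.build(), UTF8Type.instance.decompose("request-id-1"), timestamp);
        writer.close();
    }
}

The generated SSTables can then be streamed into the cluster with sstableloader as usual.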
On Mon, Sep 24, 2012 at 10:02 AM, Віталій Тимчишин wrote:
> Why so?
> What are the pluses and minuses?
> As for me, I am looking at the number of files in the directory.
> 700GB/512MB*5 (files per SSTable) = ~7,000 files, which is OK from my point of view.
> 700GB/5MB*5 = ~700,000 files, which is too many for a single directory,
Haha Ok.
It is not a total waste, but practically your time is better spent in other
places. The problem is that just about everything is a moving target: schema,
request rate, hardware. Generally tuning nudges a couple of variables in one
direction or the other and you see some decent returns. But each nu
Thanks,
Akmal
Can you contribute your experience to this ticket
https://issues.apache.org/jira/browse/CASSANDRA-4670 ?
Thanks
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
On 24/09/2012, at 6:22 AM, Michael Theroux wrote:
> Hello,
>
> We have been noticing
If you are using ext3 there is a hard limit on the number of files in a
directory of 32K. EXT4 has a much higher limit (can't remember exactly,
IIRC). So it is true that having many files is not a problem for the file
system, though your VFS cache could be less efficient since you would
have a higher inode->dat
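For a quick check of where a node actually stands (assuming the default packaged data directory; adjust the path and keyspace name for your layout):
find /var/lib/cassandra/data/<keyspace> -type f | wc -l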
> What exactly is the problem with big rows?
During compaction the row will be passed through a slower two-pass processing,
and this adds to IO pressure.
Counting big rows requires that the entire row be read.
Repairing big rows requires that the entire row be repaired.
I generally avoid rows abo
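For reference, the point where compaction falls back to that slower two-pass path is the in_memory_compaction_limit_in_mb setting in cassandra.yaml (64MB by default on the 1.x line, if I remember right):
# rows larger than this are compacted on the slower two-pass path
in_memory_compaction_limit_in_mb: 64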
You are going to need a fully optimized flux-capacitor for that.
On Tue, Sep 25, 2012 at 5:00 AM, Michael Theroux wrote:
> Hello,
>
> We are running into an unusual situation that I'm wondering if anyone has
> any insight on. We've been running a Cassandra cluster for some time, with
> compressi
> It is not a total waste, but practically your time is better spent in other
> places. The problem is that just about everything is a moving target: schema,
> request rate, hardware. Generally tuning nudges a couple of variables in one
> direction or the other and you see some decent returns. But each nud
Hi Manu,
Glad that you have the issue resolved.
If I understand the issue correctly:
Your cassandra installation had RandomPartitioner but the bulk loader
configuration (cassandra.yaml) had Murmur3Partitioner?
And fixing the cassandra.yaml for the bulk loader resolved the issue?
If not then
Thanks Milind,
Has anyone implemented counting in a standard column family in Cassandra, when you
can have increments and decrements to the count? Any comparisons in performance
to using counter column families?
Regards, Roshni
Date: Mon, 24 Sep 2012 11:02:51 -0700
Subject: RE: Cassandra Counters
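To make the trade-off concrete, here is a rough sketch; readColumn, writeColumn and incrementCounter are hypothetical stand-ins for whatever client (Thrift, Hector, Astyanax) is in use, not a real API:

public class CountingTradeoffSketch
{
    // Hypothetical client stand-ins; substitute your real client calls.
    static long readColumn(String cf, String key, String column) { return 0L; }
    static void writeColumn(String cf, String key, String column, long value) { }
    static void incrementCounter(String cf, String key, String column, long delta) { }

    public static void main(String[] args)
    {
        // Standard column family: the client must read-modify-write the value.
        // Two clients doing this at the same time can overwrite each other's
        // update, so increments get lost unless updates are serialized
        // elsewhere, and every update also costs a read.
        long current = readColumn("page_stats", "page1", "view_count");
        writeColumn("page_stats", "page1", "view_count", current + 1);

        // Counter column family: only the delta is sent; the server applies it
        // commutatively, so concurrent increments and decrements from many
        // clients do not clobber each other and no client-side read is needed.
        incrementCounter("page_counters", "page1", "view_count", 1);
    }
}

The extra read per update is usually the bigger practical difference for the standard-CF approach, on top of the lost-update race.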
I had Murmur3Partitioner for both of them, otherwise bulk loader would have
complained since I put them under the same project. I saw some negative
token issues of Murmur3Partitioner on JIRA recently so I moved back to
RandomPartitioner.
Thanks for your concern
On Tue, Sep 25, 2012 at 12:49 PM,
Maybe I'm missing the point, but counting in a standard column family would
be a little overkill.
I assume that "distributed counting" here was more of a map/reduce
approach, where Hadoop (+ Cascading, Pig, Hive, Cascalog) would help you a
lot. We're doing some more complex counting (e.g. based on
Hi Radim
Unfortunately the number of compaction tasks is not overestimated. The number
is decremented one by one and this process takes several hours for our 40GB
node (( Also, when a lot of compaction tasks appear, we see that the total disk
space used (via JMX) is doubled and Cassandra really tries to c
Thanks for the reply and sorry for being bull-headed.
Once you're past the stage where you've decided it's distributed, and NoSQL, and
Cassandra out of all the NoSQL options, now to count something, you can do it in
different ways in Cassandra. In all the ways you want to use Cassandra's best
f