Sorry, something was wrong with my previous problem description. The fact is
that Cassandra denies my requests when I try to insert 50k rows (rather
than 50k columns) into a column family at one time, each row with 1 column.
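If it helps, one workaround is to send the mutations in smaller batches
rather than in one 50k-row request. A minimal sketch with pycassa, where the
keyspace, column family, and batch size are all hypothetical:

    import pycassa

    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'MyCF')

    # 50k single-column rows, as in the report above
    rows = (('row-%d' % i, {'col': 'value'}) for i in xrange(50000))

    # queue_size controls how many mutations are flushed per request,
    # so the server never sees one huge batch_mutate call
    batch = cf.batch(queue_size=500)
    for key, columns in rows:
        batch.insert(key, columns)
    batch.send()  # flush anything still queued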
2012/8/10 Jin Lei
> Hello everyone,
> I'm a novice to cassandra and me
Thanks Aaron for your reply.
Creating a vector for raw data is a good workaround for decreasing disk
space, but I am still not clear on tracking time for nodes. Say we want a
query like "give me the list of nodes for a cluster between this period of
time"; how do we get that information? do we sc
We've had a bug that caused one of our column families to grow very big:
280 GB on a 500 GB disk. We're using size-tiered compaction.
Since it's append-only data, I've now issued deletes of 260 GB of
superfluous data.
1. There are some quite large SSTables (80 GB, 40 GB, etc.). If I run a
major
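For reference, a major compaction of a single column family can be triggered
manually with nodetool (keyspace and column family names below are
placeholders); with size-tiered compaction this leaves one very large SSTable
behind:

    nodetool -h localhost compact MyKeyspace MyColumnFamily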
Hi all,
I'm trying to build a Cascading tap for Cassandra. Cascading is a
layer on top of Hadoop. For this purpose I use ColumnFamilyInputFormat and
ColumnFamilyRecordReader from Cassandra.
I ran into a problem where the record reader creates an endless
iterator because something goes wrong w
Hi,
Has any of you had experience with software RAID (RAID 1,
mirroring 2 disks)?
Our workload is rather read-based at the moment (the commit log directory
only grows by 128 MB every 2-3 minutes), while the second disk is under high
load due to the read requests to our Cassandra cluster.
I was
I was thinking about putting both the commit log and the data
directory on a software RAID partition spanning the two disks.
Would this increase the general read performance? In theory I could
get twice the read performance, but I don't know how the commit log
will influence the read per
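For what it's worth, the commit log and data locations are configured
independently in cassandra.yaml, so they can be pointed at separate devices
or at the RAID volume; the paths below are only illustrative:

    # cassandra.yaml (illustrative paths)
    data_file_directories:
        - /mnt/md0/cassandra/data                        # e.g. the software RAID 1 volume
    commitlog_directory: /var/lib/cassandra/commitlog    # e.g. a separate plain disk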
Using CQL (the Python library, at least), I didn't see a way to pass in
multiple nodes as hosts. With other libraries (like Hector and Pycassa) I
can set multiple hosts and my app will work with any one on that list. Is
there something similar going on in the background with CQL?
If not, then
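I don't believe the cql driver does that for you; a rough client-side
fallback is to retry the connection against each host in your own list. A
sketch, with hypothetical host names, keyspace, and table:

    import random
    import cql

    HOSTS = ['node1.example.com', 'node2.example.com']  # hypothetical node list

    def connect_any(hosts, port=9160):
        """Try the hosts in random order; return the first connection that works."""
        last_error = None
        for host in random.sample(hosts, len(hosts)):
            try:
                return cql.connect(host, port)
            except Exception as e:   # connection refused, host down, etc.
                last_error = e
        raise last_error

    conn = connect_any(HOSTS)
    cursor = conn.cursor()
    cursor.execute("USE MyKeyspace")
    cursor.execute("SELECT * FROM my_table LIMIT 10")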
There are many YCSB forks on GitHub that get optimized for specific
databases, but the default one is decent across the defaults. Cassandra
has its own internal stress tool that we like better.
The shortcomings are that generic tools and generic workloads are
generic and thus not real-world. But
I agree with Edward. We always develop our own stress tool that tests each
use case of interest. Every use case is different in certain ways that can
only be tested with a custom stress tool.
On Fri, Aug 10, 2012 at 7:25 AM, Edward Capriolo wrote:
> There are many YCSB forks on github that get opt
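As a rough illustration of such a use-case-specific tool, here is a minimal
threaded write-latency test with pycassa; the keyspace, column family,
thread count, and payload are all made up for the sketch:

    import threading
    import time
    import pycassa

    POOL = pycassa.ConnectionPool('StressKS', ['node1:9160', 'node2:9160'])
    CF = pycassa.ColumnFamily(POOL, 'StressCF')

    def writer(thread_id, n, latencies):
        # insert n single-row mutations and record each call's latency
        for i in xrange(n):
            start = time.time()
            CF.insert('key-%d-%d' % (thread_id, i), {'col': 'x' * 100})
            latencies.append(time.time() - start)

    threads, latencies = [], []
    for t in range(4):
        th = threading.Thread(target=writer, args=(t, 10000, latencies))
        th.start()
        threads.append(th)
    for th in threads:
        th.join()

    print 'avg write latency: %.2f ms' % (1000.0 * sum(latencies) / len(latencies))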
You need to track node membership separately. I do that in a SQL
database, but you can use Cassandra for that. For example:
rowkey = cluster name
column name Composite[ :] = [join|leave]
Then every time a node joins or leaves a cluster, write an entry.
Then you can just read the row (ordered b
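A minimal sketch of that layout with pycassa, using a plain string column
name of the form "<timestamp>:<node>" instead of a true composite, and
made-up keyspace and column family names:

    import datetime
    import pycassa

    pool = pycassa.ConnectionPool('OpsKS', ['localhost:9160'])
    membership = pycassa.ColumnFamily(pool, 'ClusterMembership')

    def record(cluster, node, event):
        # event is 'join' or 'leave'; one wide row per cluster
        ts = datetime.datetime.utcnow().isoformat()
        membership.insert(cluster, {'%s:%s' % (ts, node): event})

    def history(cluster, start, finish):
        # columns are ordered by name, so ISO timestamps come back chronologically
        return membership.get(cluster, column_start=start, column_finish=finish,
                              column_count=10000)

    record('prod-cluster', '10.0.0.1', 'join')
    print history('prod-cluster', '2012-08-01', '2012-08-31')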
On 3 August 2012 21:31, Data Craftsman 木匠 wrote:
>
> Nobody uses Leveled Compaction with CQL 3.0?
I tried this, and I can't get it to work either.
I'm using:
[cqlsh 2.2.0 | Cassandra 1.1.2 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
Here's what my create table looks like:
CREATE TABLE php_
Hi all
Just replaced (clean install) version 1.0.9 with 1.1.3 on a two-node
Amazon cluster. After yaml modification and starting both nodes, they
do not see each other:
Note: Ownership information does not include topology, please specify
a keyspace.
Address DC Rack
Do both nodes refer to one another as seeds in cassandra.yaml?
On Fri, Aug 10, 2012 at 1:46 PM, Dwight Smith
wrote:
> Hi all
>
>
> Just replaced ( clean install ) version 1.0.9 with 1.1.3 – two node amazon
> cluster. After yaml modification and starting both nodes – they do not see
>
Yes, BUT they are the node hostnames and not the IP addresses.
From: Derek Barnes [mailto:sj.clim...@gmail.com]
Sent: Friday, August 10, 2012 2:00 PM
To: user@cassandra.apache.org
Subject: Re: Problem with version 1.1.3
Do both nodes refer to one another as seeds in cassandra.yaml?
On Fri
Derek
I added both node hostnames to the seeds and it now has the correct
nodetool ring:
Address         DC           Rack    Status  State    Load      Owns     Token
                                                                         85070591730234615865843651857942052863
10.168.87.107   datacenter1  rack1   Up      Normal   13.5 KB   50.00%
I want to know this too.
http://www.datastax.com/support-forums/topic/when-will-pycassa-support-cql
Connection pooling and load balancing are necessary features for a
multi-user production application.
Thanks,
Charlie | DBA
On Fri, Aug 10, 2012 at 6:47 AM, David McNelis wrote:
> In using CQL (the pytho
Further info: it seems I had the seeds list backwards; it did not need
both nodes. I have corrected that, with each node pointing to the other as a
single seed entry, and it works fine.
Thanks again for the quick response.
From: Dwight Smith [mailto:dwight.sm...@genesyslab.com]
Sent: Friday,
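For anyone hitting the same thing, the working setup described above
corresponds to a seed_provider block like the one below on each node, with
the other node's address as its single seed (the address shown is just an
example from the ring output):

    # cassandra.yaml on node A; node B lists node A's address instead
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.168.87.107"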
3. In my test below, I see there is now 8 GB of data and 9,000,000 rows.
Does that sound right? Nearly 1 MB of space used per row for a 50-column
row sounds like a huge amount of overhead (my values are longs in
every column, but that is still not much). I was expecting KB
Ignore the third one, my math was bad... it worked out to 733 bytes/row, and it
ended up being 6.6 GB as it compacted some more after it was done, when the
load was light (noticed that a bit later).
But what about the other two? Is that approximately the time expected?
Thanks,
Dean
On 8/10/12 3:50 PM
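Spelling out the corrected arithmetic from the figures above:

    rows = 9000000
    data_bytes = 6.6e9            # ~6.6 GB on disk once compaction settled
    cols_per_row = 50

    per_row = data_bytes / rows           # ~733 bytes per row, as quoted above
    per_col = per_row / cols_per_row      # ~15 bytes per column on disk
    print per_row, per_col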
Curious, but does Cassandra store the row key along with every
column/value pair on disk (pre-compaction) like HBase does? If so
(which makes the most sense), I assume that's something that is
optimized during compaction?
--
Aaron Turner
http://synfin.net/ Twitter: @synfinatic
http://tcp
The row key is stored only once in any SSTable file.
That is, in the special case where you end up with one column/value per SSTable
file, you are correct, but normally, I guess, most of us are storing more per key.
Regards,
Terje
On 11 Aug 2012, at 10:34, Aaron Turner wrote:
> Curious, but does cassandra stor
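In other words, each SSTable lays a row out roughly as the key once, followed
by whatever columns of that row are present in that file, so the repeated
per-column cost is the column name, value, timestamp, and a small fixed
overhead rather than the row key. A simplified picture:

    <row key> -> [name1 | value1 | ts1][name2 | value2 | ts2] ...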
Thanks Edward and Mohit.
We do have an in-house tool, but it tests pretty much the same thing as
YCSB: read/write performance given a number of threads and a type of operation
as input.
The good thing here is that we own the code and we can modify it easily. YCSB
does not seem to be ve