Hi,
is there a way to fetch all rows where the value of one specific column has an
entry? And if yes, is this supported by CQL?
In normal SQL the statement would look like "SELECT key FROM table WHERE column
IS NOT NULL;"
I searched the CQL pages and CLI pages on Datastax.com and found nothing.
Hi all,
I'm pleased to announce our next Cassandra meetup on 5th September in
London.
http://www.meetup.com/Cassandra-London/events/29668191/
We will be looking at failure modes in Cassandra (how it deals with nodes
failing and returning etc..) as well as a comparison with HBase. It's a
great o
but it seems the row cache is cluster wide, how will the change of row
cache affect the read speed?
On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis wrote:
> Or leave row cache enabled but disable cache saving (and remove the
> one already on disk).
>
> On Sun, Aug 14, 2011 at 5:05 PM, aaron mor
I saw a lot of SliceQueryFilter things when I changed the log level to DEBUG. I
just thought even bringing up a new node would be faster than starting the old one.
It is weird.
DEBUG [main] 2011-08-16 06:32:49,213 SliceQueryFilter.java (line 123)
collecting 0 of 2147483647: 76616c7565:false:225@13130688454743
Hi,
I am using Cassandra-0.8.4 and Hadoop-0.20.2.
Is this "bulk-loading" supported by the Hadoop API?
Regards,
Thamizhannal P
--- On Wed, 10/8/11, Jonathan Ellis wrote:
From: Jonathan Ellis
Subject: Re: Reg loading of hdfs file to Cassandra
To: user@cassandra.apache.org
Date: Wednesday, 10
How can I tell what's causing dropped messages?
Is it just too much activity? I'm not getting any other, more specific
messages, just these:
WARN [ScheduledTasks:1] 2011-08-15 11:33:26,136 MessagingService.java (line
504) Dropped 1534 MUTATION messages in the last 5000ms
WARN [ScheduledTasks:1] 2
I am wondering about a certain volume situation.
I currently load a Keyspace with a certain amount of SCFs.
Each SCF (Super Column Family) represents an entity.
Each Entity may have up to 6000 values.
I am planning to have 500,000 Entities (SCF) with
6000 Columns (within Super Columns - numbe
If you look at the source code you will find there is no log message in
the ColumnFamilyOutputFormat class and the related classes.
How do I trace the problem then? Has no one actually got this working?
On Thu, Aug 11, 2011 at 6:10 PM, aaron morton wrote:
> Turn the logging up in cassandra or your
Hello, what are the guarantees regarding truncates issued through the CLI?
I have a 3 node ring at RF=3. No writes going to the keyspace at issue here.
I go to the CLI on one of the nodes and issue a truncate on all CF of the
keyspace. I run a list [CF] and make sure there is no data.
When I run
I think I've dropped all the indexes on a CF, but I see traces of them in
the CLI output of show keyspaces. I see a few validators left behind, and
one "built index". (output below)
1. Is there a better way to check schema for indexes?
2. I can't drop the "built" one so I assume they're all gone
Teijo,
Unfortunately my data set really does grow because it's a time series. I'm
going to add a trick to aggregate old data but it will still grow.
How often do you repair per day (or is it really continuous?)
I've been running experiments and I wonder if your decision to perform
continuous rep
https://issues.apache.org/jira/browse/CASSANDRA-2950
On Tue, Aug 16, 2011 at 8:45 AM, Philippe wrote:
> Hello, what are the guarantees regarding truncates issued through the CLI ?
> I have a 3 node ring at RF=3. No writes going to the keyspace at issue here.
> I go to the CLI on one of the nodes
The title and the comments describe a node restarting. This is not my case.
Could it still be the same thing?
As always, I appreciate the quick answers you guys provide!
2011/8/16 Jonathan Ellis
> https://issues.apache.org/jira/browse/CASSANDRA-2950
>
> On Tue, Aug 16, 2011 at 8:45 AM, Philip
http://wiki.apache.org/cassandra/FAQ#dropped_messages
As to what's causing them - look in the logs and it will do the equivalent of a
nodetool tpstats right after the dropped-message warnings. That should give
you a clue as to why there are dropped messages - which thread pools are backed
up
Probably not. In fact, I can't think of a scenario where truncated
data could reappear w/o a restart, assuming the truncate completes
successfully on all nodes. Is this 0.8.4? Can you reproduce with a
toy cluster using https://github.com/pcmanus/ccm ?
On Tue, Aug 16, 2011 at 9:44 AM, Philippe
I'll try and report in September (I'm "on vacation" but trying to get the
cluster to run!).
Thanks
2011/8/16 Jonathan Ellis
> Probably not. In fact, I can't think of a scenario where truncated
> data could reappear w/o a restart, assuming the truncate completes
> successfully on all nodes. I
[default@Race] list CounterCF;
Using default limit of 100
---
RowKey: Stats
=> (counter=APP, value=7503)
=> (counter=FILEUPLOAD, value=155)
=> (counter=MQUPLOAD, value=4726775)
=> (counter=PAGES, value=131948)
=> (counter=REST, value=3)
=> (counter=SOAP, value=44)
=> (counter=WS, va
yeah, sorry about that... pushed click before I added my comments.
I have a cluster of 5 nodes using 0.8.4 where I am using counters. One one
of my nodes, every time I do a list command I get different results. The
counters jump all over the place.
Any ideas? I have run nodetool repair on all
We are currently running a three node cluster where we assigned the initial
tokens using the Python script that is in the Wiki, and we're currently
using the Random Partitioner, RF=1, Cassandra 0.8 from the Riptano RPM
however we're seeing one node taken on over 60% of the data as we load
data.
May be the same as https://issues.apache.org/jira/browse/CASSANDRA-3006 ?
On Tue, Aug 16, 2011 at 12:20 PM, Ryan Lowe wrote:
> yeah, sorry about that... pushed click before I added my comments.
> I have a cluster of 5 nodes using 0.8.4 where I am using counters. One one
> of my nodes, every time
What tokens did you end up using?
Are you sure it's actually due to different amounts of rows? Have you
run cleanup and compact to make sure it's not unused data / obsolete
replicas taking up the space?
On Tue, Aug 16, 2011 at 1:41 PM, David McNelis
wrote:
> We are currently running a three nod
Currently we have the initial_token for the seed node blank, and then the
three tokens we ended up with are:
56713727820156410577229101238628035242
61396109050359754194262152792166260437
113427455640312821154458202477256070485
I would assume that we'd want to take the node that
is 613961090503597
Actually I think it was more related to our servers getting their time out
of sync... after finding this article:
http://ria101.wordpress.com/2011/02/08/cassandra-the-importance-of-system-clocks-avoiding-oom-and-how-to-escape-oom-meltdown/
I checked our servers, and sure enough, 2 of them were out
Thanks, Jonathan!
-Derek
--
Derek Deeter, Sr. Software Engineer
Intuit Financial Services
(818) 597-5932 (x76932)
5601 Lindero Canyon Rd.
Westlake, CA 91362
derek.dee...@digitalinsight.com
-Original Message-
F
Yes, that looks about right.
Totally baffled how the wiki script could spit out those tokens for a
3-node cluster.
On Tue, Aug 16, 2011 at 2:04 PM, David McNelis
wrote:
> Currently we have the initial_token for the seed node blank, and then the
> three tokens we ended up with are:
> 56713727820
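For reference, a minimal sketch (assuming the standard RandomPartitioner formula, token_i = i * 2**127 / N) of what evenly spaced tokens for a 3-node ring should look like:

```python
# Evenly spaced initial tokens for RandomPartitioner,
# whose token space is 0 .. 2**127 - 1.
def initial_tokens(node_count):
    return [i * (2 ** 127) // node_count for i in range(node_count)]

for i, token in enumerate(initial_tokens(3)):
    print("node %d: %d" % (i, token))
```

For three nodes this gives 0, 56713727820156410577229101238628035242 and 113427455640312821154458202477256070485 - the first and third tokens quoted above match positions 1 and 2 of that sequence, so only the middle one looks off.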
I'm still trying different stuff. Here are my latest findings, maybe someone
will find them useful:
- I have been able to repair some small column families by issuing a
repair [KS] [CF]. When testing on the ring with no writes at all, it still
takes about 2 repairs to get "consistent" log
Hello,
can anyone give an explanation of
start.addComponent("abc", StringSerializer.get()) ;
end.addComponent("abc", StringSerializer.get(), "UTF8Type",
AbstractComposite.ComponentEquality.GREATER_THAN_EQUAL) ;
Suppose my composite column names are like ("bob", 1982), ("bob", 1976).
There are m
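Not a Hector-specific answer, but as a model of what the slice bounds do (a Python sketch, with tuples standing in for composite column names and an infinite end component standing in for the inclusivity that GREATER_THAN_EQUAL gives you):

```python
# Composite column names sort component by component, so a slice
# bounded on the first component selects every column sharing it.
columns = sorted([("alice", 1990), ("bob", 1976), ("bob", 1982), ("carol", 1955)])

start = ("bob",)             # sorts before the first ("bob", *) column
end = ("bob", float("inf"))  # stand-in for GREATER_THAN_EQUAL on "bob"

selected = [c for c in columns if start <= c <= end]
print(selected)  # -> [('bob', 1976), ('bob', 1982)]
```

In other words, the equality flag on the end composite is what lets the slice capture all columns whose first component is "bob" rather than stopping at an exact match.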
On Tue, Aug 16, 2011 at 3:48 PM, Philippe wrote:
> I have been able to repair some small column families by issuing a repair
> [KS] [CF]. When testing on the ring with no writes at all, it still takes
> about 2 repairs to get "consistent" logs for all AES requests.
I think I linked these in anoth
Even more interesting behavior: a repair on a CF has consequences on other
CFs. I didn't expect that.
There are no writes being issued to the cluster yet the logs indicate that
- SSTableReader has opened dozens and dozens of files, most of them
unrelated to the CF being repaired
- compa
Hi all,
I followed instructions here:
http://wiki.apache.org/cassandra/Operations#Token_selection
to create a Cassandra cluster spanning two datacenters. Now I see that
nodes belonging to DC2 datacenter own 0% of the ring. I would expect
them to own 50%.
Does anyone have an idea what's going on h
Thanks for the pointers, responses inline.
On Tue, Aug 16, 2011 at 3:48 PM, Philippe wrote:
> > I have been able to repair some small column families by issuing a repair
> > [KS] [CF]. When testing on the ring with no writes at all, it still takes
> > about 2 repairs to get "consistent" logs for
>
> 2)
> I'm doing batch writes to the database (pulling data from multiple resources
> and put them together). I wish to know if there's some better methods to
> improve the writing efficiency since it's just about the same speed as MySQL,
> when writing sequentially. Seems like the commit
Ian did you sort this out ?
What values do you see passed in when you run ps aux | grep cassandra ?
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 16/08/2011, at 1:08 PM, Ian Danforth wrote:
> All,
>
> When I connect to a no
Yes, you can get there with a rolling restart.
Check the NEWS.txt file for info on how to upgrade
https://github.com/apache/cassandra/blob/cassandra-0.8.4/NEWS.txt
One thing to be aware of is that repairs can fail until all the data files are
at the same version. This ticket is for better logging a
None that I am aware of. Secondary indexes require an equality operator.
You would need to build a custom secondary index if you needed to do this.
Cheers
-
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 16/08/2011, at 9:33 PM, Jens Har
The logs say it took a long time to read a saved row cache. Try removing the
files from the saved_caches dir as Jonathan suggested.
The collecting log lines with the INT max count are indicative of the
IdentityQueryFilter. One of the places it is used is when adding rows to the
cache.
Cheers
Hi,
yes, we saw exactly the same messages. We got rid of these by doing the
following:
* Set all row & key caches in your CFs to 0 via cassandra-cli
* Kill Cassandra
* Remove all files in the saved_caches directory
* Start Cassandra
* Slowly bring back row & key caches (if desired, we left them
One last thought: what happens when you ctrl-c a nodetool repair? Does it
stop the repair on the server? If not, then I think I have multiple repairs
still running. Is there any way to check this?
Thanks
2011/8/16 Philippe
> Even more interesting behavior : a repair on a CF has consequences
Hi,
Unfortunately my data set really does grow because it's a time series. I'm
going to add a trick to aggregate old data but it will still grow.
That's fine, then you need to scale horizontally. Simply add a new node when
the load on a node exceeds a threshold (ballpark figure here is a max
Thanks Aaron.
>> 2)
>> I'm doing batch writes to the database (pulling data from multiple resources
>> and put them together). I wish to know if there's some better methods to
>> improve the writing efficiency since it's just about the same speed as
>> MySQL, when writing sequentially. See
Are you planning to create 500,000 Super Column Families or 500,000 rows in a
single Super Column Family ?
The former is somewhat crazy. Cassandra schemas typically have up to a few
tens of Column Families. Each column family involves a certain amount of memory
overhead; this is now automati
I suggested turning up the logging to see if the server processed a
batch_mutate call. This is done from the CassandraServer class
(https://github.com/apache/cassandra/blob/cassandra-0.8.4/src/java/org/apache/cassandra/thrift/CassandraServer.java#L531)
, not the CFOF.
The first step will be to
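A minimal sketch of how to turn that logging up in 0.8's log4j-server.properties (the file location and logger name are assumed from the default install layout):

```properties
# conf/log4j-server.properties -- raise logging for the Thrift entry point
# so calls handled by CassandraServer (e.g. batch_mutate) show up in system.log
log4j.logger.org.apache.cassandra.thrift.CassandraServer=DEBUG
```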
> I think I've dropped all the indexes on a CF, but I see traces of them in the
> CLI output of show keyspaces. I see a few validators left behind, and one
> "built index". (output below)
What process did you use to drop the indexes ? You need to use update column
family and not include the c
Sounds like it's a similar case as mine. The files are definitely extremely
big; 10x space overhead should be a good case if you are just putting values
into it.
I'm currently testing CASSANDRA-674 and hope the better SSTable can solve the
space overhead problem. Please follow my e-mail t
ctrl-c will not stop the repair.
You can kind of check things by looking at nodetool compactionstats; that will just
tell you if there are compactions backing up, not necessarily if they are
validation compactions used during repairs. You can trawl the logs to look for
messages from the AntiEntropy
BTW,
If I'm going to insert a SCF row with ~400 columns and ~50 subcolumns under
each column, how should I do the mutations: per column or per row?
On Aug 16, 2011, at 3:24 PM, Yi Yang wrote:
>
> Thanks Aaron.
>
>>> 2)
>>> I'm doing batch writes to the database (pulling data from multiple
> Is that because cassandra really cost a huge disk space?
The general design approach is / has been that storage space is cheap and
plentiful.
> Well my target is to simply get the 1.3T compressed to 700 Gig so that I can
> fit it into a single server, while keeping the same level of performa
Does this need to be cluster-wide, or could I just modify the caches
on one node? I could not connect to the node with
cassandra-cli; it says "connection refused":
[default@unknown] connect node2/9160;
Exception connecting to node2/9160. Reason: Connection refused.
so if I change the cac
[default@mykeyspace] list myCF;
Using default limit of 100
---
RowKey: 49505f4652
=> (counter=0004c77581888c00, value=161326)
=> (counter=00040001c77581888c00, value=161326)
Here the RowKey is actually the ASCII encoding of "IP_FR";
for the "get" command we could specify "as
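A quick way to check that decoding (a Python one-liner, nothing Cassandra-specific):

```python
# The CLI shows BytesType row keys as hex; decode to see the ASCII form.
print(bytes.fromhex("49505f4652").decode("ascii"))  # -> IP_FR
```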
>
> ctrl-c will not stop the repair.
>
Ok, so that's why I've been seeing logs of repairs on other CFs
That's probably the 2280 issue. Data from all CFs is streamed over
>
Ah, I get it now.
Thanks
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorto
http://www.datastax.com/dev/blog/bulk-loading indicates that "it is
perfectly reasonable to load data into a live, active cluster."
So lets say my cluster has a single KS & CF and it contains a key "test"
with a SC named "Cass" and a normal subcolumn named "Data" that has value 1.
If I SSTLoad da