Re: Distinct Counter Proposal for Cassandra

2012-06-13 Thread Utku Can Topçu
on of the algorithm which counts up to 10^9, > so may need some work. > > Other alternative is self-learning bitmap ( > http://ect.bell-labs.com/who/aychen/sbitmap4p.pdf) which, in my > understanding, is more memory efficient when counting small values. > > Yuki > > On W

Distinct Counter Proposal for Cassandra

2012-06-13 Thread Utku Can Topçu
Hi All, Let's assume we have a use case where we need to count the number of columns for a given key. Let's say the key is the URL and the column-name is the IP address or any cardinality identifier. The straightforward implementation seems to be simple, just inserting the IP addresses as columns
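The straightforward approach described above can be sketched in plain Python. This is only an in-memory illustration, not Cassandra client code: the `DistinctCounter` class and its method names are hypothetical, and a dictionary of sets stands in for a row whose column names are the distinct identifiers.

```python
from collections import defaultdict


class DistinctCounter:
    """Toy stand-in for a column family: row key -> set of column names."""

    def __init__(self):
        self._rows = defaultdict(set)

    def insert(self, key, column_name):
        # Inserting a column name that already exists changes nothing,
        # which is what makes the column count a distinct count.
        self._rows[key].add(column_name)

    def count(self, key):
        # Number of distinct column names stored under this key.
        return len(self._rows.get(key, ()))


counter = DistinctCounter()
counter.insert("http://example.com/", "10.0.0.1")
counter.insert("http://example.com/", "10.0.0.2")
counter.insert("http://example.com/", "10.0.0.1")  # repeat visitor
print(counter.count("http://example.com/"))
```

The cost of this design is that every distinct identifier is stored, so memory grows linearly with cardinality.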

Re: last record rowId

2011-06-15 Thread Utku Can Topçu
As far as I can tell, this functionality doesn't exist. However, you can use a method such as inserting the rowId into another column within a separate row, and requesting the latest column. I think this would work for you. However, every insert would need a get request, which I think would be performance

Re: expiring + counter column?

2011-05-28 Thread Utku Can Topçu
How about implementing a freezing mechanism on counter columns? If there are no more increments within "freeze" seconds after the last increment (it would be on the order of a day or so), the column would lock itself and stop accepting increments. And after this freeze period, the ttl should

Re: Corrupted Counter Columns

2011-05-28 Thread Utku Can Topçu
ri, May 27, 2011 at 1:59 PM, Sylvain Lebresne wrote: > On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu wrote: > > Hello, > > > > I'm using the 0.8.0-rc1, with RF=2 and 4 nodes. > > > > Strangely counters are corrupted. Say, the actual value should be :

Re: Corrupted Counter Columns

2011-05-26 Thread Utku Can Topçu
Some additional information on the settings: I'm using CL.ONE for both reading and writing; and replicate_on_write is true on the Counters CF. I think the problem occurs after a restart when the commitlogs are read. On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu wrote: > Hello, >

Corrupted Counter Columns

2011-05-26 Thread Utku Can Topçu
Hello, I'm using the 0.8.0-rc1, with RF=2 and 4 nodes. Strangely, counters are corrupted. Say the actual value should be 51664; the value that cassandra outputs is sometimes 51664 and sometimes 18651001. I have no idea how to diagnose the problem or reproduce it. Can you help me in

Re: CounterColumn increments gone after restart

2011-05-12 Thread Utku Can Topçu
see the ticket https://issues.apache.org/jira/browse/CASSANDRA-2642 please On Thu, May 12, 2011 at 3:28 PM, Utku Can Topçu wrote: > Hi guys, > > I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the > way it should be but: > - I create a ColumnFamily nam

CounterColumn increments gone after restart

2011-05-12 Thread Utku Can Topçu
Hi guys, I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the way it should be but: - I create a ColumnFamily named Counters - do a few increments on a column. - kill cassandra - start cassandra When I look at the counter column, the value is 1. See the following pastebin ple

Re: Does counter columns support TTL

2011-02-17 Thread Utku Can Topçu
And I think this patch would still be useful and legitimate if the TTL of the initial increment is taken into account. On Thu, Feb 17, 2011 at 6:11 PM, Utku Can Topçu wrote: > Yes, I've read the discussion. My use-case is similar to the use-case of > the contributor. > > So

Re: Does counter columns support TTL

2011-02-17 Thread Utku Can Topçu
the point is that the > approach is fundamentally flawed. > > On Thu, Feb 17, 2011 at 10:16 AM, Utku Can Topçu > wrote: > > Can anyone confirm that this patch works with the current trunk? > > > > On Thu, Feb 17, 2011 at 4:16 PM, Sylvain Lebresne > > wrote: >

Re: Does counter columns support TTL

2011-02-17 Thread Utku Can Topçu
Can anyone confirm that this patch works with the current trunk? On Thu, Feb 17, 2011 at 4:16 PM, Sylvain Lebresne wrote: > https://issues.apache.org/jira/browse/CASSANDRA-2103 > > > On Thu, Feb 17, 2011 at 4:05 PM, Utku Can Topçu wrote: > >> Hi All, >> >> I

Re: Commercial support for cassandra

2011-02-17 Thread Utku Can Topçu
http://wiki.apache.org/cassandra/ThirdPartySupport On Thu, Feb 17, 2011 at 12:20 AM, Sal Fuentes wrote: > They also offer great training sessions. Have a look at their site for more > information: http://www.datastax.com/about-us > > > On Wed, Feb 16, 2011 at 3:13 PM, Michael Widmann < > michael

Does counter columns support TTL

2011-02-17 Thread Utku Can Topçu
Hi All, I'm experimenting and developing using counters. However, I've come to a use case where I need counters to expire and get deleted after a certain time of inactivity (i.e. have the counter column deleted one hour after the last increment). As far as I can tell counter columns don't have TTL in t

Re: Super Slow Multi-gets

2011-02-10 Thread Utku Can Topçu
ategory.me.prettyprint=DEBUG, stdout > > Thanks... > > Bill- > > > On Thu, Feb 10, 2011 at 12:53 PM, Bill Speirs > wrote: > > Each message row is well under 1K. So I don't think it is network... plus > > all boxes are on a fast LAN. > > > > Bill- >

Re: Super Slow Multi-gets

2011-02-10 Thread Utku Can Topçu
Dear Bill, How big is a row in the Messages CF? Is it too big? Might bandwidth be the bottleneck? Regards, Utku On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs wrote: > I have a 7 node setup with a replication factor of 1 and a read > consistency of 1. I have two colum

Re: Implemeting a LRU in Cassandra

2011-02-10 Thread Utku Can Topçu
he set. > > Would that work for you? > > Aaron > > On 9 Feb 2011, at 23:58, Utku Can Topçu wrote: > > > Hi All, > > > > I'm sure people here have tried to solve similar questions. > > Say I'm tracking pages, I want to access the least recently us

Implemeting a LRU in Cassandra

2011-02-09 Thread Utku Can Topçu
Hi All, I'm sure people here have tried to solve similar questions. Say I'm tracking pages, I want to access the least recently used 1000 unique pages (i.e. columnnames). How can I achieve this? Using a row with say, ttl=60 seconds would solve the problem of accessing the least recently used uniq

Re: Hadoop Integration doesn't work when one node is down

2011-01-02 Thread Utku Can Topçu
I've created an issue; is this what you were asking for, Jonathan? https://issues.apache.org/jira/browse/CASSANDRA-1927 On Mon, Jan 3, 2011 at 12:24 AM, Jonathan Ellis wrote: > Can you create one? > > On Sun, Jan 2, 2011 at 4:39 PM, mck wrote: > > > >> Is this a bug or feature or a misuse? > > >

Re: Hadoop Integration doesn't work when one node is down

2010-12-31 Thread Utku Can Topçu
Oops, I forgot to mention that I'm using the 0.7-rc2 branch with some patches that have nothing to do with hadoop. On Fri, Dec 31, 2010 at 1:05 PM, Utku Can Topçu wrote: > Hi All, > > When I start the CFInputFormat to read a CF in a keyspace of RF=3 on a > 4-node cluster:

Hadoop Integration doesn't work when one node is down

2010-12-31 Thread Utku Can Topçu
Hi All, When I start the CFInputFormat to read a CF in a keyspace of RF=3 on a 4-node cluster: - If all the nodes are up, everything works fine and I don't have any problems walking through all the data in the CF, however - If there's a node down, the hadoop job does not even start, just dies

Re: Replacing nodes of the cluster in 0.7.0-RC1

2010-12-05 Thread Utku Can Topçu
Since no reply came in a few days, I tried my proposed steps and it all worked fine. Just to let you know. On Sat, Dec 4, 2010 at 10:31 PM, Utku Can Topçu wrote: > Hi All, > > I'm currently not happy with the hardware and the operating system of our > 4-node cassandra cluster

Replacing nodes of the cluster in 0.7.0-RC1

2010-12-04 Thread Utku Can Topçu
Hi All, I'm currently not happy with the hardware and the operating system of our 4-node cassandra cluster. I'm planning to move the cluster to a different hardware/OS architecture. For this purpose I'm planning to bring up 4 new nodes, so that each node will be a replacement of another node in

Detecting failed nodes and restarting

2010-12-02 Thread Utku Can Topçu
Hi All, The question is really simple. Is there anyone out there using a set of scripts in production that detects failures of cassandra processes and restarts them or takes the required actions? If so, how can we implement a generic solution for this problem? Regards, Utku

Re: Deleting the datadir for system keyspace in 0.7

2010-11-15 Thread Utku Can Topçu
> the token, hints. Everything but the hints can be replaced. > > > > Gary. > > > > On Mon, Nov 15, 2010 at 06:29, Utku Can Topçu wrote: > >> Hello All, > >> > >> I'm wondering before restarting the a node in a cluster. If I delete the >

Deleting the datadir for system keyspace in 0.7

2010-11-15 Thread Utku Can Topçu
Hello All, Before restarting a node in a cluster, I'm wondering: if I delete the system keyspace, what data would I be losing, if any? Regards, Utku

Cassandra Hadoop Integration not compatible with Hadoop 0.21.0

2010-11-05 Thread Utku Can Topçu
When I try to read a CF from Hadoop, just after issuing the run I get this error: Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSpli

Re: Time to wait for CF to be consistent after stopping writes.

2010-10-28 Thread Utku Can Topçu
k wrote: > On Wed, Oct 27, 2010 at 05:08, Utku Can Topçu wrote: > > Hi, > > > > For a columnfamily in a keyspace which has RF=3, I'm issuing writes with > > ConsistencyLevel.ONE. > > > > in the configuration I have: > > - memtable_flush_after_mins

Time to wait for CF to be consistent after stopping writes.

2010-10-27 Thread Utku Can Topçu
Hi, For a columnfamily in a keyspace which has RF=3, I'm issuing writes with ConsistencyLevel.ONE. in the configuration I have: - memtable_flush_after_mins : 30 - memtable_throughput_in_mb : 32 I'm writing to this columnfamily continuously for about 1 hour then stop writing. So the question is:

Re: Reading a keyrange when using RP

2010-10-21 Thread Utku Can Topçu
ck in order with RP. > > You can start out with a start key and end key of '' (empty) and use the > row count argument instead, if > your goal is paging the rows. To get the next page, start from the last > key you got in the > previous page. > > > On Thu
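The paging approach quoted above (start with empty start and end keys, cap the page with a row count, and begin the next page from the last key of the previous one) can be sketched as follows. This is a self-contained simulation, not the Thrift API: `get_range`, `iterate_all_keys` and `token` are hypothetical helpers, and sorting by MD5 hex digest only loosely mimics how the RandomPartitioner orders rows.

```python
import hashlib


def token(key):
    """MD5-based sort key, loosely mimicking RandomPartitioner ordering."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def get_range(rows, start_key, count):
    """Hypothetical stand-in for a range query: up to `count` keys in token
    order, starting at `start_key` (inclusive); '' means the very beginning."""
    ordered = sorted(rows, key=token)
    if start_key:
        ordered = [k for k in ordered if token(k) >= token(start_key)]
    return ordered[:count]


def iterate_all_keys(rows, page_size):
    """Page through every key, starting each page at the last key seen."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = ""
    while True:
        page = get_range(rows, start, page_size)
        # The start key is inclusive, so it repeats on every page but the first.
        fresh = page[1:] if start else page
        yield from fresh
        if len(page) < page_size:
            return
        start = page[-1]


rows = {"key%d" % i for i in range(10)}
for key in iterate_all_keys(rows, page_size=4):
    print(key)
```

Keys come back in token order rather than lexical order, which is why a lexical range such as key100 to key200 is not meaningful under this partitioner.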

creating and dropping columnfamilies as a usecase

2010-10-21 Thread Utku Can Topçu
Hi All, In the project I'm currently working on, I have a use case for analyzing the rows hourly. Since the 0.7x branch supports creating and dropping columnfamilies on the fly, my use case proposal will be: * Create a CF at the very beginning of every hour * At the end of the 1-hour period, analyze

Reading a keyrange when using RP

2010-10-21 Thread Utku Can Topçu
If I'm not mistaken, cassandra has been providing support for keyrange queries also on RP. However, when I try to define a keyrange such as (start: key100, end: key200), I get an error like: InvalidRequestException(why:start key's md5 sorts after end key's md5. this is not allowed; you probably sho

Re: using jna.jar "Unknown mlockall error 0"

2010-10-08 Thread Utku Can Topçu
get that mlockall > error 0. > Maybe there is another solution anyway. > > nico008 > > > > On 08/10/2010 11:33, Roger Schildmeijer wrote: > > > > On Fri, Oct 8, 2010 at 11:27 AM, Utku Can Topçu wrote: > >> Hi, >> >> In order to continue on memory

Re: using jna.jar "Unknown mlockall error 0"

2010-10-08 Thread Utku Can Topçu
I'm running an Ubuntu 9.10 linux box. On Fri, Oct 8, 2010 at 11:33 AM, Roger Schildmeijer wrote: > > > On Fri, Oct 8, 2010 at 11:27 AM, Utku Can Topçu wrote: > >> Hi, >> >> In order to continue on memory optimizations, I've been trying to use the >>

using jna.jar "Unknown mlockall error 0"

2010-10-08 Thread Utku Can Topçu
Hi, In order to continue on memory optimizations, I've been trying to use the JNA. However, when I copy the jna.jar to the lib directory, I get the following warning. I'm currently running the 0.6.5 version of cassandra. WARN [main] 2010-10-08 09:16:18,924 FBUtilities.java (line 595) Unknown mlockall error

Re: Tuning cassandra to use less memory

2010-10-06 Thread Utku Can Topçu
Hi Oleg, I've also been looking into this after some research. I've been tackling it with: 1. Setting the default max and min heap from 1G to 1500M. 2. I'm not using row caches, and the key caches are set to 1000, before they were 200K as default 3. I've lowered the memtable throughput to 32MB 4. We

Tuning cassandra to use less memory

2010-10-05 Thread Utku Can Topçu
Hi All, We're currently starting to get OOM exceptions in our cluster. I'm trying to push the limitations of our machines. Currently we have 1.7 GB of memory (ec2-medium). I'm wondering whether, by tweaking some of cassandra's configuration settings, it is possible to make it live in peace with less memory :)

A proposed use case, any comments and experience is appreciated

2010-10-04 Thread Utku Can Topçu
Hey All, I'm planning to run Map/Reduce on one of the ColumnFamilies. The keys are formed in such a fashion that, they are indexed in descending order by time. So I'll be analyzing the data for every hour iteratively. Since the current Hadoop integration does not support partial columnfamily anal

Re: A proposed use case, any comments and experience is appreciated

2010-10-04 Thread Utku Can Topçu
away. > > On Mon, Oct 4, 2010 at 8:48 AM, Utku Can Topçu wrote: > > Hi Jonathan, > > > > Thank you for mentioning about the expiring columns issue. I didn't know > > that it had existed. > > That's really great news. > > First of all, does the

Hardware change of a node in the cluster

2010-10-04 Thread Utku Can Topçu
Hey All, Recently I tried to upgrade (hw upgrade) one of the nodes in my cassandra cluster from ec2-small to ec2-large. However, there were problems: since the IP of the new instance was different from that of the previous instance, the other nodes did not recognize it in the ring. So what should b

Re: A proposed use case, any comments and experience is appreciated

2010-10-04 Thread Utku Can Topçu
> > On Mon, Oct 4, 2010 at 5:12 AM, Utku Can Topçu wrote: > > Hey All, > > > > I'm planning to run Map/Reduce on one of the ColumnFamilies. The keys are > > formed in such a fashion that, they are indexed in descending order by > time. > > So I'll be

Best strategy for adding new nodes to the cluster

2010-09-27 Thread Utku Can Topçu
Hi All, We're currently running a cassandra cluster with Replication Factor 3, consisting of 4 nodes. The current situation is: - The nodes are all identical (AWS small instances) - Data directory is in the partition (/mnt) which has 150G capacity and each node has around 90 GB load, so 60 G fre

Having different 0.6.x instances in one Cassandra cluster

2010-08-05 Thread Utku Can Topçu
Hi All, I'm planning to use the current 0.6.4 stable for creating an image that would be the base for nodes in our Cassandra cluster. However, the 0.6.5 release is on the way. Once 0.6.5 has been released, is it possible to have some of the nodes stay on 0.6.4 while new nodes run 0.6.5?

Lucene CassandraDirectory Implementation

2010-07-22 Thread Utku Can Topçu
Hi All, I was browsing through the Lucene JIRA and came across the issue named "A Column-Oriented Cassandra-Based Lucene Directory" at https://issues.apache.org/jira/browse/LUCENE-2456 Has anyone had a chance to test it? If so, do you think it's an efficient implementation as a replacement for th

Re: Implementing Counter on Cassandra

2010-07-01 Thread Utku Can Topçu
y,update. If this doesn't work for your application, then a > > (distributed) lock manager may be used until such time that you can > > take it out. Some are using ZooKeeper for this. > > > > > > On Tue, Jun 29, 2010 at 11:45 AM, Ryan King wrote: > >>

Implementing Counter on Cassandra

2010-06-29 Thread Utku Can Topçu
Hey Guys, Currently in a project I'm involved in, I need to have some columns holding incremented data. As far as I can tell, the easy approach to implementing a counter with increments right now is "read -> increment -> insert"; however, this approach is not an atomic operation and can easily be cor
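Why "read -> increment -> insert" is not atomic can be illustrated with a deterministic, single-threaded simulation. The `Store` class and `interleaved_increments` function below are hypothetical names for illustration only; a plain dictionary stands in for the column value, and no real client library is involved.

```python
class Store:
    """Toy key/value store standing in for a single column value."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # A missing counter reads as 0.
        return self._data.get(key, 0)

    def insert(self, key, value):
        self._data[key] = value


def interleaved_increments(store, key):
    """Two clients each try to add 1, but both read before either writes."""
    seen_by_a = store.get(key)         # client A reads the current value
    seen_by_b = store.get(key)         # client B reads the same value
    store.insert(key, seen_by_a + 1)   # A writes its incremented copy
    store.insert(key, seen_by_b + 1)   # B overwrites A's write
    return store.get(key)


store = Store()
print(interleaved_increments(store, "hits"))  # one of the two increments is lost
```

Two increments were issued but the stored value only advanced by one, which is the lost-update problem that a lock manager or a native counter type is meant to avoid.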

Cassandra Data Model Design Visualization

2010-06-29 Thread Utku Can Topçu
Hey Guys, I've been designing an application which consists of more than 20 ColumnFamilies. Each ColumnFamily has some columns referencing keys in other ColumnFamilies, and some keys in a ColumnFamily are combinations of keys/columns in other ColumnFamilies. I guess most of the people are using

Getting keys in a range sorted with respect to last access time

2010-06-07 Thread Utku Can Topçu
Hey All, First of all I'll start with some questions on the default behavior of the get_range_slices method defined in the thrift API. Given a keyrange with start-key "kstart" and end-key "kend", assuming kstartkend? Will I get an empty result list? Secondly, I have a use case where I need to access t

Re: Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Utku Can Topçu
call to achieve this. > > > > It’s read and write, plus a delete (if move) API calls I guess. > > > > *From:* Utku Can Topçu [mailto:u...@topcu.gen.tr] > *Sent:* Wednesday, May 26, 2010 9:09 PM > *To:* user@cassandra.apache.org > *Subject:* Moving/copying columns

Moving/copying columns in between ColumnFamilies

2010-05-26 Thread Utku Can Topçu
Hey All, Assume I have two ColumnFamilies in the same keyspace and I want to move or copy a range of columns (defined by a keyrange) into another columnfamily. Do you think it's somehow possible and doable with the current support of the API, if so how? Best Regards, Utku

Re: Anyone using hadoop/MapReduce integration currently?

2010-05-25 Thread Utku Can Topçu
Hi Jeremy, > Why are you using Cassandra versus using data stored in HDFS or HBase? - I'm thinking of using it for realtime streaming of user data. While streaming the requests, I'm also using Lucandra for indexing the data in realtime. It's a better option when you compare it with HBase or the na

Re: Real-time Web Analysis tool using Cassandra. Doubts...

2010-05-12 Thread Utku Can Topçu
What makes cassandra a poor choice is the fact that you can't use a keyrange as input for the map phase for Hadoop. On Wed, May 12, 2010 at 4:37 PM, Jonathan Ellis wrote: > On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati > wrote: > > - First of all, my first thoughts is to have two CF o

Inverted Indexing a ColumnFamily

2010-05-11 Thread Utku Can Topçu
Hello All, I guess the subject speaks for itself. I'm currently developing a document analysis engine using cassandra as the scalable storage. I just want to give a brief overview of the data model I'm using for this purpose. "the key" is formed in the format of timestamp.random(), so that it'

Distributed export and import into cassandra

2010-05-03 Thread Utku Can Topçu
Hey All, I have a simple sample use case. The aim is to export the columns in a column family into flat files with the keys in range from k1 to k2. Since all the nodes in the cluster are supposed to contain some of the distribution of data, is it possible to make each node dump its own local data v

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-30 Thread Utku Can Topçu
I meant in the first sentence "running the get_range_slices from a single point" On Fri, Apr 30, 2010 at 4:08 PM, Utku Can Topçu wrote: > Do you mean, running the get_range_slices from a single? Yes, it would be > reasonable for a relatively small key range, when it comes to an

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-30 Thread Utku Can Topçu
at 3:22 PM, Jonathan Ellis wrote: > Sounds like doing this w/o m/r with get_range_slices is a reasonable way to > go. > > On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu wrote: > > I'm currently writing collected data continuously to Cassandra, having > keys > >

ColumnFamilyOutputFormat?

2010-04-30 Thread Utku Can Topçu
Hey All, I've been looking at the documentation and related articles about Cassandra and Hadoop integration, I'm only seeing ColumnFamilyInputFormat for now. What if I want to write directly to cassandra after a reduce? What comes to my mind is, in the Reducer's setup I'd initialize a Cassandra c

Re: ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Utku Can Topçu
hu, Apr 29, 2010 at 11:32 PM, Jonathan Ellis wrote: > It's technically possible but 0.6 does not support this, no. > > What is the use case you are thinking of? > > On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu > wrote: > > Hi, > > > > I've been

ColumnFamilyInputFormat KeyRange scans on a CF

2010-04-29 Thread Utku Can Topçu
Hi, I've been trying to use Cassandra as some kind of supplementary input source for Hadoop MapReduce jobs. The default usage of the ColumnFamilyInputFormat does a full columnfamily scan for use within the MapReduce framework as map input. However, I believe that it should be possible to gi

Re: TimedOutException when using the ColumnFamilyInputFormat

2010-04-29 Thread Utku Can Topçu
batchsize > with a call to ConfigHelper.setRangeBatchSize(). This has eliminated > the TimedOutExceptions for us. > joost. > > On Thu, Apr 29, 2010 at 10:25 AM, Utku Can Topçu > wrote: > > Hey All, > > > > I'm trying to run some tests on cassandra and Hadoo

TimedOutException when using the ColumnFamilyInputFormat

2010-04-29 Thread Utku Can Topçu
Hey All, I'm trying to run some tests on cassandra and Hadoop integration. I'm basically following the word count example at https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java using the ColumnFamilyInputFormat. Currently I have one-node cassandra and hadoop setup

Re: Lucandra - Lucene/Solr on Cassandra: April 26, NYC

2010-04-25 Thread Utku Can Topçu
Could you please publish the talk somewhere after it has been given? Best Regards, Utku On Thu, Apr 22, 2010 at 6:51 PM, Otis Gospodnetic < otis_gospodne...@yahoo.com> wrote: > Hello folks, > > Those of you in or near NYC and using Lucene or Solr should come to > "Lucandra - a Cassandra-based backen