on of the algorithm which counts up to 10^9,
> so may need some work.
>
> Another alternative is the self-learning bitmap (
> http://ect.bell-labs.com/who/aychen/sbitmap4p.pdf) which, in my
> understanding, is more memory-efficient when counting small values.
>
> Yuki
>
> On W
Hi All,
Let's assume we have a use case where we need to count the number of
columns for a given key. Say the key is a URL and the column name is the
IP address or some other cardinality identifier.
The straightforward implementation seems simple: just insert the IP
addresses as columns.
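A minimal sketch of that approach against the 0.7/0.8-era Thrift API (the
keyspace, CF, and values below are made up for illustration). Re-inserting
the same IP simply overwrites the column, so get_count on the row yields the
distinct-IP count:

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.*;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class DistinctIpCount {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
            client.set_keyspace("Tracking");                       // hypothetical keyspace

            ByteBuffer url = ByteBuffer.wrap("http://example.com/".getBytes("UTF-8"));
            ColumnParent parent = new ColumnParent("UrlVisitors"); // hypothetical CF

            // one column per distinct IP; duplicate inserts just overwrite
            Column col = new Column();
            col.setName(ByteBuffer.wrap("10.0.0.1".getBytes("UTF-8")));
            col.setValue(ByteBuffer.wrap(new byte[0]));
            col.setTimestamp(System.currentTimeMillis() * 1000);
            client.insert(url, parent, col, ConsistencyLevel.ONE);

            // number of columns in the row == number of distinct IPs;
            // note get_count reads the whole row server-side, so very wide
            // rows make this expensive
            SlicePredicate all = new SlicePredicate().setSlice_range(new SliceRange(
                    ByteBuffer.wrap(new byte[0]), ByteBuffer.wrap(new byte[0]),
                    false, Integer.MAX_VALUE));
            System.out.println(client.get_count(url, parent, all, ConsistencyLevel.ONE));
            transport.close();
        }
    }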
As far as I can tell, this functionality doesn't exist.
However, you could use such a method to insert the rowId into another column
within a separate row, and request the latest column.
I think this would work for you. However, every insert would need a get
request, which I think would be a performance problem.
How about implementing a freezing mechanism on counter columns?
If there are no more increments within "freeze" seconds after the last
increment (it would be on the order of a day or so), the column would lock
itself and accept no further increments.
And after this freeze period, the TTL should
On Fri, May 27, 2011 at 1:59 PM, Sylvain Lebresne wrote:
> On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu wrote:
> > Hello,
> >
> > I'm using the 0.8.0-rc1, with RF=2 and 4 nodes.
> >
> > Strangely counters are corrupted. Say, the actual value should be :
Some additional information on the settings:
I'm using CL.ONE for both reads and writes, and replicate_on_write is
set to true on the Counters CF.
I think the problem occurs after a restart when the commitlogs are read.
On Thu, May 26, 2011 at 2:21 PM, Utku Can Topçu wrote:
> Hello,
>
Hello,
I'm using the 0.8.0-rc1, with RF=2 and 4 nodes.
Strangely, counters are corrupted. Say the actual value should be 51664;
the value that Cassandra sometimes outputs is either 51664 or 18651001.
And I have no idea how to diagnose the problem or reproduce it.
Can you help me in
Please see the ticket https://issues.apache.org/jira/browse/CASSANDRA-2642
On Thu, May 12, 2011 at 3:28 PM, Utku Can Topçu wrote:
> Hi guys,
>
> I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the
> way it should be but:
> - I create a ColumnFamily nam
Hi guys,
I have a strange problem with 0.8.0-rc1. I'm not quite sure if this is the way
it should be but:
- I create a ColumnFamily named Counters
- do a few increments on a column.
- kill cassandra
- start cassandra
When I look at the counter column, the value is 1.
See the following pastebin please
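For reference, a rough sketch of those repro steps against the 0.8 Thrift
counter API (the CF name is from the report; the row and column names are
made up, and client setup is as in the earlier sketch):

    ColumnParent parent = new ColumnParent("Counters");
    ByteBuffer key = ByteBuffer.wrap("row1".getBytes("UTF-8"));
    ByteBuffer name = ByteBuffer.wrap("hits".getBytes("UTF-8"));

    // a few increments
    for (int i = 0; i < 5; i++)
        client.add(key, parent, new CounterColumn(name, 1L), ConsistencyLevel.ONE);

    // kill cassandra here, start it again, then read the counter back
    ColumnPath path = new ColumnPath("Counters");
    path.setColumn(name);
    CounterColumn c = client.get(key, path, ConsistencyLevel.ONE).getCounter_column();
    System.out.println(c.getValue()); // expected 5; the report says it comes back as 1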
And I think this patch would still be useful and legitimate if the TTL of
the initial increment is taken into account.
On Thu, Feb 17, 2011 at 6:11 PM, Utku Can Topçu wrote:
> Yes, I've read the discussion. My use-case is similar to the use-case of
> the contributor.
>
> So
the point is that the
> approach is fundamentally flawed.
>
> On Thu, Feb 17, 2011 at 10:16 AM, Utku Can Topçu
> wrote:
> > Can anyone confirm that this patch works with the current trunk?
> >
> > On Thu, Feb 17, 2011 at 4:16 PM, Sylvain Lebresne
> > wrote:
>
Can anyone confirm that this patch works with the current trunk?
On Thu, Feb 17, 2011 at 4:16 PM, Sylvain Lebresne wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-2103
>
>
> On Thu, Feb 17, 2011 at 4:05 PM, Utku Can Topçu wrote:
>
>> Hi All,
>>
>> I
http://wiki.apache.org/cassandra/ThirdPartySupport
On Thu, Feb 17, 2011 at 12:20 AM, Sal Fuentes wrote:
> They also offer great training sessions. Have a look at their site for more
> information: http://www.datastax.com/about-us
>
>
> On Wed, Feb 16, 2011 at 3:13 PM, Michael Widmann <
> michael
Hi All,
I'm experimenting and developing using counters. However, I've come to a
use case where I need counters to expire and get deleted after a certain
time of inactivity (i.e. have the counter column deleted one hour after the
last increment).
As far as I can tell, counter columns don't have TTL in t
log4j.category.me.prettyprint=DEBUG, stdout
>
> Thanks...
>
> Bill-
>
>
> On Thu, Feb 10, 2011 at 12:53 PM, Bill Speirs
> wrote:
> > Each message row is well under 1K. So I don't think it is network... plus
> > all boxes are on a fast LAN.
> >
> > Bill-
>
Dear Bill,
What about the size of the rows in the Messages CF? Are they too big? Might
you be seeing bandwidth overhead?
Regards,
Utku
On Thu, Feb 10, 2011 at 5:00 PM, Bill Speirs wrote:
> I have a 7 node setup with a replication factor of 1 and a read
> consistency of 1. I have two colum
he set.
>
> Would that work for you?
>
> Aaron
>
> On 9 Feb 2011, at 23:58, Utku Can Topçu wrote:
>
> > Hi All,
> >
> > I'm sure people here have tried to solve similar questions.
> > Say I'm tracking pages, I want to access the least recently us
Hi All,
I'm sure people here have tried to solve similar questions.
Say I'm tracking pages, and I want to access the 1000 least recently used
unique pages (i.e. column names). How can I achieve this?
Using a row with, say, TTL=60 seconds would solve the problem of accessing
the least recently used uniq
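A sketch of that TTL idea using the 0.7-era per-column TTL (all names are
made up; client setup as in the earlier sketch). Every access re-inserts the
column, so any page not touched within the TTL drops out of the row on its
own:

    ByteBuffer trackingKey = ByteBuffer.wrap("recent-pages".getBytes("UTF-8"));
    String pageName = "/some/page.html"; // the page just accessed

    // re-insert on every access; the column silently expires 60 seconds
    // after the last insert, so the row only holds recently used pages
    Column col = new Column();
    col.setName(ByteBuffer.wrap(pageName.getBytes("UTF-8")));
    col.setValue(ByteBuffer.wrap(new byte[0]));
    col.setTimestamp(System.currentTimeMillis() * 1000);
    col.setTtl(60); // seconds of allowed inactivity
    client.insert(trackingKey, new ColumnParent("RecentPages"), col, ConsistencyLevel.ONE);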
I've created an issue; was this what you were asking for, Jonathan?
https://issues.apache.org/jira/browse/CASSANDRA-1927
On Mon, Jan 3, 2011 at 12:24 AM, Jonathan Ellis wrote:
> Can you create one?
>
> On Sun, Jan 2, 2011 at 4:39 PM, mck wrote:
> >
> >> Is this a bug or feature or a misuse?
> >
>
Oops, I forgot to mention that I'm using the 0.7-rc2 branch with some patches
that have nothing to do with Hadoop.
On Fri, Dec 31, 2010 at 1:05 PM, Utku Can Topçu wrote:
> Hi All,
>
> When I start the CFInputFormat to read a CF in a keyspace of RF=3 on a
> 4-node cluster:
>
Hi All,
When I start the CFInputFormat to read a CF in a keyspace of RF=3 on a
4-node cluster:
- If all the nodes are up, everything works fine and I don't have any
problems walking through all the data in the CF; however,
- If there's a node down, the Hadoop job does not even start; it just dies
Since no reply came in a few days, I tried my proposed steps and it all
worked fine.
Just to let you know.
On Sat, Dec 4, 2010 at 10:31 PM, Utku Can Topçu wrote:
> Hi All,
>
> I'm currently not happy with the hardware and the operating system of our
> 4-node cassandra cluster
Hi All,
I'm currently not happy with the hardware and the operating system of our
4-node cassandra cluster. I'm planning to move the cluster to a different
hardware/OS architecture.
For this purpose I'm planning to bring up 4 new nodes, so that each node
will be a replacement of another node in
Hi All,
The question is really simple. Is there anyone out there using a set of
scripts in production that detect failures of cassandra processes and
restart them or take required actions?
If so, how can we implement a generic solution for this problem?
Regards,
Utku
> > the token, hints. Everything but the hints can be replaced.
> >
> > Gary.
> >
> > On Mon, Nov 15, 2010 at 06:29, Utku Can Topçu wrote:
> >> Hello All,
> >>
> >> I'm wondering before restarting a node in a cluster. If I delete the
> &
Hello All,
I'm wondering, before restarting a node in a cluster: if I delete the
system keyspace, what data would I be losing? Would I be losing anything?
Regards,
Utku
When I try to read a CF from Hadoop, just after issuing the run I get this
error:
Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
interface org.apache.hadoop.mapreduce.JobContext, but class was expected
at
org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSpli
k wrote:
> On Wed, Oct 27, 2010 at 05:08, Utku Can Topçu wrote:
> > Hi,
> >
> > For a columnfamily in a keyspace which has RF=3, I'm issuing writes with
> > ConsistencyLevel.ONE.
> >
> > in the configuration I have:
> > - memtable_flush_after_mins
Hi,
For a columnfamily in a keyspace which has RF=3, I'm issuing writes with
ConsistencyLevel.ONE.
In the configuration I have:
- memtable_flush_after_mins : 30
- memtable_throughput_in_mb : 32
I'm writing to this columnfamily continuously for about 1 hour and then stop
writing.
So the question is:
ck in order with RP.
>
> You can start out with a start key and end key of '' (empty) and use the
> row count argument instead, if
> your goal is paging the rows. To get the next page, start from the last
> key you got in the
> previous page.
>
>
> On Thu
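In code, that paging pattern looks roughly like this (a sketch against the
0.7-style Thrift API; the CF name and page size are made up, and client
setup is as in the earlier sketch):

    ByteBuffer empty = ByteBuffer.wrap(new byte[0]);
    ColumnParent parent = new ColumnParent("Pages"); // hypothetical CF
    SlicePredicate pred = new SlicePredicate().setSlice_range(
            new SliceRange(empty, empty, false, 100));

    KeyRange range = new KeyRange(100); // rows per page
    range.setStart_key(empty);
    range.setEnd_key(empty);

    while (true) {
        List<KeySlice> page =
                client.get_range_slices(parent, pred, range, ConsistencyLevel.ONE);
        for (KeySlice row : page) {
            // process row.key / row.columns; remember that on every page
            // after the first, the first row repeats the previous last key
        }
        if (page.size() < 100) break; // short page == last page
        range.setStart_key(page.get(page.size() - 1).key);
    }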
Hi All,
In the current project I'm working on, I have a use case for hourly analysis
of the rows.
Since the 0.7.x branch supports creating and dropping columnfamilies on the
fly, my use case proposal will be:
* Create a CF at the very beginning of every hour
* At the end of the 1-hour period, analyze
If I'm not mistaken, Cassandra supports key-range queries on RP as well.
However, when I try to define a key range such as (start: key100, end:
key200), I get an error like:
InvalidRequestException(why:start key's md5 sorts after end key's md5. this
is not allowed; you probably sho
get that mlockall
> error 0.
> Maybe there is another solution anyway.
>
> nico008
>
>
>
> On 08/10/2010 11:33, Roger Schildmeijer wrote:
>
>
>
> On Fri, Oct 8, 2010 at 11:27 AM, Utku Can Topçu wrote:
>
>> Hi,
>>
>> In order to continue on memory
I'm running an Ubuntu 9.10 linux box.
On Fri, Oct 8, 2010 at 11:33 AM, Roger Schildmeijer
wrote:
>
>
> On Fri, Oct 8, 2010 at 11:27 AM, Utku Can Topçu wrote:
>
>> Hi,
>>
>> In order to continue on memory optimizations, I've been trying to use the
>>
Hi,
In order to continue with memory optimizations, I've been trying to use
JNA. However, when I copy jna.jar to the lib directory, I get the warning
below. I'm currently running the 0.6.5 version of cassandra.
WARN [main] 2010-10-08 09:16:18,924 FBUtilities.java (line 595) Unknown
mlockall error
Hi Oleg,
I've also been looking into this after some research.
I've been tackling it with:
1. Setting the default max and min heap from 1G to 1500M.
2. Not using row caches; the key caches are set to 1000, down from the
default of 200K.
3. Lowering the memtable throughput to 32MB
4. We
Hi All,
We're currently starting to get OOM exceptions in our cluster. I'm trying to
push the limitations of our machines. Currently we have 1.7 G memory
(ec2-medium).
I'm wondering whether, by tweaking some of cassandra's configuration
settings, it's possible to make it live in peace with less memory :)
Hey All,
I'm planning to run Map/Reduce on one of the ColumnFamilies. The keys are
formed in such a fashion that they are indexed in descending order by time,
so I'll be analyzing the data for every hour iteratively.
Since the current Hadoop integration does not support partial columnfamily
anal
away.
>
> On Mon, Oct 4, 2010 at 8:48 AM, Utku Can Topçu wrote:
> > Hi Jonathan,
> >
> > Thank you for mentioning about the expiring columns issue. I didn't know
> > that it had existed.
> > That's really great news.
> > First of all, does the
Hey All,
Recently I've tried to upgrade (hw upgrade) one of the nodes in my cassandra
cluster from ec2-small to ec2-large.
However, there were problems: since the IP of the new instance was different
from the previous instance's, the other nodes did not recognize it in the
ring.
So what should b
>
> On Mon, Oct 4, 2010 at 5:12 AM, Utku Can Topçu wrote:
> > Hey All,
> >
> > I'm planning to run Map/Reduce on one of the ColumnFamilies. The keys are
> > formed in such a fashion that, they are indexed in descending order by
> time.
> > So I'll be
Hi All,
We're currently running a cassandra cluster with Replication Factor 3,
consisting of 4 nodes.
The current situation is:
- The nodes are all identical (AWS small instances)
- Data directory is in the partition (/mnt), which has 150G capacity, and
each node has around 90 GB load, so 60 G fre
Hi All,
I'm planning to use the current 0.6.4 stable for creating an image that
would be the base for nodes in our Cassandra cluster.
However, the 0.6.5 release is on the way. When 0.6.5 has been released, is
it possible to have some of the nodes stay on 0.6.4 while bringing new nodes
up on 0.6.5?
Hi All,
I was browsing through the Lucene JIRA and came across the issue named "A
Column-Oriented Cassandra-Based Lucene Directory" at
https://issues.apache.org/jira/browse/LUCENE-2456
Has anyone had a chance to test it? If so, do you think it's an efficient
implementation as a replacement for th
y,update. If this doesn't work for your application, then a
> > (distributed) lock manager may be used until such time that you can
> > take it out. Some are using ZooKeeper for this.
> >
> >
> > On Tue, Jun 29, 2010 at 11:45 AM, Ryan King wrote:
> >>
Hey Guys,
Currently, in a project I'm involved in, I need to have some columns holding
incremented data.
The easy approach to implementing a counter with increments, as far as I can
tell, is "read -> increment -> insert"; however, this approach is not an
atomic operation and can easily be cor
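To make the race concrete, here is a sketch of that read -> increment ->
insert sequence against the 0.7-style Thrift API (CF and names are made up;
client setup as in the earlier sketch). Nothing stops two clients from
reading the same value and both writing back current + 1, losing an
increment:

    ByteBuffer key = ByteBuffer.wrap("page42".getBytes("UTF-8"));
    ByteBuffer name = ByteBuffer.wrap("hits".getBytes("UTF-8"));

    // 1. read
    ColumnPath path = new ColumnPath("Stats");
    path.setColumn(name);
    byte[] raw = client.get(key, path, ConsistencyLevel.QUORUM).getColumn().getValue();
    long current = ByteBuffer.wrap(raw).getLong();

    // 2. increment, 3. insert -- another client can interleave here
    Column col = new Column();
    col.setName(name);
    col.setValue(ByteBuffer.allocate(8).putLong(0, current + 1));
    col.setTimestamp(System.currentTimeMillis() * 1000);
    client.insert(key, new ColumnParent("Stats"), col, ConsistencyLevel.QUORUM);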
Hey Guys,
I've been designing an application which consists of more than 20
ColumnFamilies.
Each ColumnFamily has some columns referencing keys in other ColumnFamilies,
and some keys in a ColumnFamily are combinations of keys/columns in other
ColumnFamilies.
I guess most of the people are using
Hey All,
First of all, I'll start with some questions on the default behavior of the
get_range_slices method defined in the thrift API.
Given a keyrange with start-key "kstart" and end-key "kend", what happens
assuming kstart > kend? Will I get an empty result list?
Secondly, I have a use case where I need to access t
call to achieve this.
>
>
>
> It’s read and write, plus a delete (if move) API calls I guess.
>
>
>
> *From:* Utku Can Topçu [mailto:u...@topcu.gen.tr]
> *Sent:* Wednesday, May 26, 2010 9:09 PM
> *To:* user@cassandra.apache.org
> *Subject:* Moving/copying columns
Hey All,
Assume I have two ColumnFamilies in the same keyspace and I want to move or
copy a range of columns (defined by a keyrange) into another columnfamily.
Do you think it's somehow possible and doable with the current support of
the API? If so, how?
Best Regards,
Utku
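Nothing built-in does this, but a client-side copy is straightforward; a
sketch with the 0.7-style Thrift API, given startKey/endKey ByteBuffers (CF
names made up, no paging or error handling, client setup and java.util
imports as in the earlier sketches):

    ByteBuffer empty = ByteBuffer.wrap(new byte[0]);
    SlicePredicate all = new SlicePredicate().setSlice_range(
            new SliceRange(empty, empty, false, Integer.MAX_VALUE));
    KeyRange range = new KeyRange(100);
    range.setStart_key(startKey); // the key range to copy
    range.setEnd_key(endKey);

    for (KeySlice row : client.get_range_slices(
            new ColumnParent("Source"), all, range, ConsistencyLevel.QUORUM)) {
        List<Mutation> muts = new ArrayList<Mutation>();
        for (ColumnOrSuperColumn cosc : row.columns) {
            Mutation m = new Mutation();
            m.setColumn_or_supercolumn(cosc); // carries name/value/timestamp over
            muts.add(m);
        }
        client.batch_mutate(
                Collections.singletonMap(row.key,
                        Collections.singletonMap("Target", muts)),
                ConsistencyLevel.QUORUM);
        // for a move rather than a copy, follow up with a Deletion
        // mutation against "Source"
    }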
Hi Jeremy,
> Why are you using Cassandra versus using data stored in HDFS or HBase?
- I'm thinking of using it for realtime streaming of user data. While
streaming the requests, I'm also using Lucandra for indexing the data in
realtime. It's a better option when you compare it with HBase or the na
What makes cassandra a poor choice is the fact that you can't use a
keyrange as input for the map phase in Hadoop.
On Wed, May 12, 2010 at 4:37 PM, Jonathan Ellis wrote:
> On Tue, May 11, 2010 at 1:52 PM, Paulo Gabriel Poiati
> wrote:
> > - First of all, my first thoughts is to have two CF o
Hello All,
I guess the subject speaks for itself.
I'm currently developing a document analysis engine using cassandra as the
scalable storage.
I just want to give a brief overview of the data model I'm using for this
purpose.
"The key" is formed in the format of timestamp.random(), so that it'
Hey All,
I have a simple sample use case:
The aim is to export the columns in a column family into flat files, with
the keys in range from k1 to k2.
Since all the nodes in the cluster are supposed to contain some portion of
the data, is it possible to make each node dump its own local data v
I meant in the first sentence "running the get_range_slices from a single
point"
On Fri, Apr 30, 2010 at 4:08 PM, Utku Can Topçu wrote:
> Do you mean, running the get_range_slices from a single? Yes, it would be
> reasonable for a relatively small key range, when it comes to an
at 3:22 PM, Jonathan Ellis wrote:
> Sounds like doing this w/o m/r with get_range_slices is a reasonable way to
> go.
>
> On Thu, Apr 29, 2010 at 6:04 PM, Utku Can Topçu wrote:
> > I'm currently writing collected data continuously to Cassandra, having
> keys
> >
Hey All,
I've been looking at the documentation and related articles about Cassandra
and Hadoop integration, and I'm only seeing ColumnFamilyInputFormat for now.
What if I want to write directly to cassandra after a reduce?
What comes to my mind is: in the Reducer's setup I'd initialize a Cassandra
client.
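Something along those lines, assuming the Hadoop 0.20 mapreduce API and a
raw 0.7-era Thrift client (every name below is made up for illustration):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.*;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.thrift.TException;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class CassandraWriterReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        private TTransport transport;
        private Cassandra.Client client;

        @Override
        protected void setup(Context ctx) throws IOException {
            try {
                transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
                transport.open();
                client = new Cassandra.Client(new TBinaryProtocol(transport));
                client.set_keyspace("Analytics"); // hypothetical keyspace
            } catch (TException e) {
                throw new IOException(e);
            }
        }

        @Override
        protected void reduce(Text word, Iterable<LongWritable> counts, Context ctx)
                throws IOException {
            long sum = 0;
            for (LongWritable c : counts) sum += c.get();

            Column col = new Column();
            col.setName(ByteBuffer.wrap("count".getBytes("UTF-8")));
            col.setValue(ByteBuffer.wrap(Long.toString(sum).getBytes("UTF-8")));
            col.setTimestamp(System.currentTimeMillis() * 1000);
            try {
                client.insert(ByteBuffer.wrap(word.toString().getBytes("UTF-8")),
                        new ColumnParent("WordCounts"), col, ConsistencyLevel.ONE);
            } catch (TException e) {
                throw new IOException(e);
            }
        }

        @Override
        protected void cleanup(Context ctx) {
            if (transport != null) transport.close();
        }
    }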
On Thu, Apr 29, 2010 at 11:32 PM, Jonathan Ellis wrote:
> It's technically possible but 0.6 does not support this, no.
>
> What is the use case you are thinking of?
>
> On Thu, Apr 29, 2010 at 11:14 AM, Utku Can Topçu
> wrote:
> > Hi,
> >
> > I've been
Hi,
I've been trying to use Cassandra as a kind of supplementary input source
for Hadoop MapReduce jobs.
The default usage of ColumnFamilyInputFormat does a full columnfamily scan
to produce the map input for the MapReduce framework.
However, I believe it should be possible to gi
batchsize
> with a call to ConfigHelper.setRangeBatchSize(). This has eliminated
> the TimedOutExceptions for us.
> joost.
>
> On Thu, Apr 29, 2010 at 10:25 AM, Utku Can Topçu
> wrote:
> > Hey All,
> >
> > I'm trying to run some tests on cassandra an Hadoo
Hey All,
I'm trying to run some tests on cassandra and Hadoop integration. I'm
basically following the word count example at
https://svn.apache.org/repos/asf/cassandra/trunk/contrib/word_count/src/WordCount.java
using the ColumnFamilyInputFormat.
Currently I have one-node cassandra and hadoop setup
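For reference, the job wiring in that example boils down to roughly the
following (keyspace/CF/column names are placeholders, and the ConfigHelper
method names shifted a bit between early releases, so treat this as an
approximation):

    Job job = new Job(conf, "wordcount");
    job.setInputFormatClass(ColumnFamilyInputFormat.class);

    // which keyspace/CF to scan, and which columns the mappers receive
    ConfigHelper.setColumnFamily(job.getConfiguration(), "Keyspace1", "Standard1");
    SlicePredicate predicate = new SlicePredicate().setColumn_names(
            Arrays.asList(ByteBuffer.wrap("text".getBytes("UTF-8"))));
    ConfigHelper.setSlicePredicate(job.getConfiguration(), predicate);

    // smaller row batches per thrift call help avoid TimedOutExceptions
    // (the ConfigHelper.setRangeBatchSize() tip quoted earlier)
    ConfigHelper.setRangeBatchSize(job.getConfiguration(), 512);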
Can you please publish the talk somewhere after it's done?
Best Regards,
Utku
On Thu, Apr 22, 2010 at 6:51 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:
> Hello folks,
>
> Those of you in or near NYC and using Lucene or Solr should come to
> "Lucandra - a Cassandra-based backen