Re: Cassandra 0.8 questions

2011-05-24 Thread Victor Kabdebon
It's not really possible to give a general answer to your second question; it
depends on your implementation. Personally I do one of two things: the first is
to map the array onto a row, with the name of each column as a key of your
array and the value of the column as the data storage. However, for some
applications, since I am using Java, I just serialize my ArrayList (or List)
and push the whole content into one column. It all depends on what you want to
achieve. A sketch of both approaches follows.
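
A minimal sketch of both approaches with Hector, assuming an already-initialized
Keyspace object named keyspace; the "Items" CF, the row key and the values are
made-up names:

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
List<String> categories = Arrays.asList("sports", "news");

// Approach 1: one column per element; the column name is the array key.
for (int i = 0; i < categories.size(); i++) {
    m.addInsertion("row-key", "Items",
        HFactory.createStringColumn(String.valueOf(i), categories.get(i)));
}

// Approach 2: serialize the whole list into a single column value.
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(new ArrayList<String>(categories));
oos.close();  // flush the serialized bytes into the buffer
m.addInsertion("row-key", "Items",
    HFactory.createColumn("categories", bos.toByteArray(),
        StringSerializer.get(), BytesArraySerializer.get()));

m.execute();

Note that addInsertion only queues the columns and execute() sends them in a
single batch_mutate call, which is also the usual answer to your first question
about avoiding one round trip per column.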

Third question: try to create CFs according to what you want to achieve. I am
designing an internal messaging system and I use only two column families to
hold the message lists, message and message box. I would have used one, but I
need one sorted by TimeUUIDType and the other by UTF8Type (a sketch follows). I
think there is a general consensus here: try to avoid super columns. Two sets
of columns can do the same job as one SuperColumn, and that is
the preferred scheme.
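
A minimal cassandra-cli sketch of that layout; the CF names are the ones from
the description above, and the 0.8 cli syntax is assumed:

create column family MessageBox with comparator = TimeUUIDType;
create column family Message with comparator = UTF8Type;

Each message-box row then keeps its columns in time order, while the message
rows keep theirs in lexical order.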

Again, just experiment and be ready to change your organization if you are
beginning with Cassandra; this is the best way to figure out what to do for
your data organization.

Victor Kabdebon
http://www.voxnucleus.fr
http://www.victorkabdebon.net

2011/5/24 Jian Fang 

> Does anyone have a good suggestion on my second question? I believe that
> question is a pretty common one.
>
> My third question is a design question. For the same data, we can store
> it in multiple column families or in a single column family with multiple
> super columns.
> From a Cassandra read/write performance point of view, what are the general
> rules for making multiple column families, and when should one use a single
> column family?
>
> Thanks again,
>
> John
>
>
> On Mon, May 23, 2011 at 5:47 PM, Jian Fang 
> wrote:
>
>> Hi,
>>
>> I am pretty new to Cassandra and am going to use Cassandra 0.8.0. I have
>> two questions (sorry if they are very basic ones):
>>
>> 1) I have a column family to hold many super columns, say 30. When I first
>> insert the data into the column family, do I need to insert each column one at
>> a time, or can I insert the whole column family in one transaction (or
>> call?)? The latter seems more efficient to me. Does Cassandra
>> support that?
>>
>> For example, I saw the following code to do insertion (with Hector),
>>
>> Mutator m = HFactory.createMutator(keyspace, stringSerializer);
>> m.insert(p.getCassandraKey(), colFamily,
>>     HFactory.createStringColumn("type", p.getStringValue()));
>> m.insert(p.getCassandraKey(), colFamily,
>>     HFactory.createColumn("data", p.getCompressedXML(),
>>         StringSerializer.get(), BytesArraySerializer.get()));
>>
>> Will the insertions be two separate calls to Cassandra? Or are they just
>> one transaction? If it is the former case, is there any way to make them
>> one call to Cassandra?
>>
>> 2) How do I store a list/array of data in Cassandra? For example, I have a
>> data field called categories, which includes zero or more categories, and each
>> category includes a category id and a category description. Usually, how do
>> people handle this scenario when they use Cassandra?
>>
>> Thanks in advance,
>>
>> John
>>
>
>


Re: Appending to fields

2011-05-31 Thread Victor Kabdebon
As Jonathan stated, I believe that the insert is O(N + M), unless there
are some operations that I don't know about.

There are other NoSQL databases that can be used alongside Cassandra as "buffers"
for quick access and modification; afterwards the content can be dumped
into Cassandra for long-term storage. Here is an example with Redis:

http://redis.io/commands/append
The "append" command is said to be O(1), which seems a little bit suspicious
to me... A sketch of the buffer pattern follows.
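
For illustration, a minimal sketch of that pattern, assuming the Jedis client
for Redis and an initialized Hector keyspace; the key, CF and column names are
made up:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import redis.clients.jedis.Jedis;

Jedis redis = new Jedis("localhost");
redis.append("events:42", "new chunk of data");  // the fast, amortized-O(1) buffer write

// Later, flush the accumulated value into Cassandra for long-term storage.
String buffered = redis.get("events:42");
Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
m.insert("42", "EventLog", HFactory.createStringColumn("payload", buffered));
redis.del("events:42");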

Best regards,
Victor Kabdebon
http://www.voxnucleus.fr

2011/5/31 Jonathan Ellis 

> On Tue, May 31, 2011 at 2:22 PM, Marcus Bointon
>  wrote:
> > mysql reads the entire value of y, appends the data, then writes the
> whole thing back, which unfortunately is an O(n^2) operation.
>
> Actually, this analysis is incorrect. Appending M bytes to N is O(N +
> M) which isn't the same as N^2 at all.
>
> At least that is true in Cassandra; nor can I think of any possible algorithm
> which would allow MySQL to achieve N^2, but I don't claim to be an expert
> there.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>


Re: When should I use Solandra?

2011-06-04 Thread Victor Kabdebon
Why do you need Solandra for storing data? If you want to retrieve data,
simply use Cassandra. Solandra is for searching and indexing: it is a search
engine. I do not recommend storing data solely in a search engine.

Use the following design:

*Store ALL data in Cassandra, then extract from Cassandra only the data you
need to index in Solandra. For that matter you can use Solr instead of
Solandra. In Solr you have something called schema.xml where you can set up
which fields to index. My advice is: do not store your passwords in plain
text. Add a salt (random sequence) AND hash them, then insert the bytes into
Cassandra (a sketch follows). Otherwise you'll end up like Sony, facing a
massive lawsuit when hackers breach your website and steal the passwords.*
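
A minimal salt-and-hash sketch, using only the JDK plus a Hector mutator; the
"Users" CF, the column names, and the userId / password / keyspace variables
are assumptions for illustration:

import java.security.MessageDigest;
import java.security.SecureRandom;
import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

byte[] salt = new byte[16];
new SecureRandom().nextBytes(salt);               // random per-user salt

MessageDigest md = MessageDigest.getInstance("SHA-256");
md.update(salt);
byte[] hash = md.digest(password.getBytes("UTF-8"));

// Store the salt next to the hash; both are needed to verify a login later.
Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
m.addInsertion(userId, "Users", HFactory.createColumn("salt", salt,
    StringSerializer.get(), BytesArraySerializer.get()));
m.addInsertion(userId, "Users", HFactory.createColumn("passhash", hash,
    StringSerializer.get(), BytesArraySerializer.get()));
m.execute();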

If you really want to use Solandra, I guess there is an equivalent of
schema.xml with lines that tell it whether or not to index certain fields.

Victor Kabdebon
http://www.victorkabdebon.com


2011/6/4 Jean-Nicolas Boulay Desjardins 

> Hi,
>
> I am planning to use Cassandra to store my users' passwords and, at the same
> time, data for my website that needs to be accessible via search. My question
> is: should I use two DBs, Cassandra (for user passwords) and Solandra (for
> the website's data), or can I put everything in Solandra?
>
> Is there a way to stop Solandra from indexing my users' passwords?
>
> Thanks in advance for any help.
>


Re: When should I use Solandra?

2011-06-05 Thread Victor Kabdebon
Again, I don't really know the specifics of Solandra, but in Solr (and
Solandra being a cousin of Solr, it should hold there too) you declare the
fields to index in schema.xml, like this:
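
<!-- a typical schema.xml field declaration; the field name is just an example -->
<field name="password" type="string" indexed="false" stored="true"/>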

Just set indexed to "false" and the field is not going to be indexed...

Having Thrift already installed won't affect Solandra at all.

2011/6/4 Jean-Nicolas Boulay Desjardins 

> Hi,
>
> So, if I understand Solandra correctly:
>
> All the data is in Solandra, and you can query it like you normally would
> with a normal Cassandra setup, and search through it.
>
> The data from Solr's indexing is stored in a Cassandra column family...
>
> Second question: I already have Thrift installed; will it affect my setup of
> Solandra?
>
> Third question: my passwords (yes, I know, I will hash them :) I am no
> Sony). Even hashed, I don't want them to be indexed by Solr in Solandra; is there
> a way to stop Solandra from indexing the hashed passwords or any other data, or
> should I put this information in another DB?
>
> Is Solandra as stable as Cassandra?
>
> Sorry, I am just EXTREMELY curious! :)
>
> Thanks a lot for your time and help!
>
> On Sat, Jun 4, 2011 at 8:29 PM, Jake Luciani  wrote:
>
>> On Saturday, June 4, 2011, Kirk Peterson  wrote:
>> > I think the OP was asking if you can use the same Cassandra cluster
>> that Solandra is integrated with to store non-Solandra data in a different
>> keyspace. This would remove the need to run two Cassandra clusters, one for
>> storing his Solandra index, and another for his other data.
>> >
>>
>> Yes. Both services are running: Cassandra Thrift and Solr.
>>
>> > I'm not sure if Solandra supports this, but I would start by checking
>> whether the Cassandra thrift daemon is binding when running the Solandra
>> server. If the thrift daemon for Cassandra is available, then there is a
>> good chance (albeit I'm not sure how you would configure it) that it would
>> be possible, so long as you didn't mess with the Solandra keyspace.
>> >
>> > cheers,
>> > -kirk
>> >
>> >
>> > On Sat, Jun 4, 2011 at 11:57 AM, Norman Maurer <
>> norman.mau...@googlemail.com> wrote:
>> >
>> > Are you sure you really need Cassandra for this? To me it sounds
>> > like MySQL or other databases would be a better fit for you (if you
>> > don't need to store a very huge amount of data...)
>> >
>> > Bye,
>> > Norman
>> >
>> > 2011/6/4 Jean-Nicolas Boulay Desjardins :
>> >> Hi,
>> >> I am planning to use Cassandra to store my users' passwords and at the
>> same
>> >> time data for my website that needs to be accessible via search. My
>> Question
>> >> is: should I use two DBs, Cassandra (for user passwords) and Solandra
>> (for
>> >> the website's data), or can I put everything in Solandra?
>> >> Is there a way to stop Solandra from indexing my users' passwords?
>> >> Thanks in advance for any help.
>> >
>> >
>> > --
>> > ⑆gmail.com⑆necrobious⑈
>> >
>> >
>>
>> --
>> http://twitter.com/tjake
>>
>
>
>
> --
> Name / Nom: Boulay Desjardins, Jean-Nicolas
> Website / Site Web: www.jeannicolas.com
>


Re: New web client & future API

2011-06-14 Thread Victor Kabdebon
Hello Markus,

Actually, from what I understood (please correct me if I am wrong), CQL is
based on Thrift / Avro.

Victor Kabdebon

2011/6/14 Markus Wiesenbacher | Codefreun.de 

>
> Hi,
>
> what is the future API for Cassandra? Thrift, Avro, CQL?
>
> I just released an early version of my web client
> (http://www.codefreun.de/apollo) which is Thrift-based, and therefore I
> would like to know what the future is ...
>
> Many thanks
> MW
>


Re: New web client & future API

2011-06-15 Thread Victor Kabdebon
OK, thanks for the update. I thought the query string was translated into
Thrift calls and then sent to the server.
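
For reference, a minimal sketch of sending CQL over the Thrift transport with
the raw 0.8 client, per Eric's description below; the keyspace, table and key
are made up, and error handling is omitted:

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.cassandra.thrift.CqlResult;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

TFramedTransport tr = new TFramedTransport(new TSocket("localhost", 9160));
Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(tr));
tr.open();
client.set_keyspace("Keyspace1");
// The CQL string travels over Thrift but is parsed and executed server-side.
CqlResult result = client.execute_cql_query(
    ByteBuffer.wrap("SELECT * FROM users WHERE KEY = 'jsmith'".getBytes("UTF-8")),
    Compression.NONE);
tr.close();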

Victor Kabdebon

2011/6/15 Eric Evans 

> On Tue, 2011-06-14 at 09:49 -0400, Victor Kabdebon wrote:
> > Actually from what I understood (please correct me if I am wrong) CQL
> > is based on Thrift / Avro.
>
> In this project, we tend to use the word "Thrift" as a sort of shorthand
> for "Cassandra's RPC interface", and not, "The serialization and RPC
> framework from the Apache Thrift project".
>
> CQL does not (yet) have its own networking protocol, so it uses Thrift
> as a means of delivering queries, and serializing the results, but it is
> *not* a wrapper around the existing RPC methods.  The query string you
> provide is parsed entirely on the server.
>
> --
> Eric Evans
> eev...@rackspace.com
>
>


Re: solandra or pig or....?

2011-06-21 Thread Victor Kabdebon
I can speak for what I know:

Pig: I have only taken a quick look, and maybe some guys from Twitter can
answer better than me on that particular program. Pig is not for "on demand"
queries: they are quite slow, and as you said, you extract the relevant
information and append it to another CF from which you can quickly retrieve
the statistics.

Solr is purely a search engine. It is not only text based but also time
based, etc. To do statistics you need mathematical operations, and
Solr won't provide that. It can do simple things in terms of statistics, but
mostly it is a search engine.

Personally, for what you are asking, I would use Pig and store the results in
CFs, and update those CFs regularly. For simple statistics you can generate
them with your favorite language, or a specialized language such as R, as long
as it concerns small data sets.

Hope it helps,
Victor Kabdebon

2011/6/21 Sasha Dolgy 

> Folks,
>
> Simple question ... assuming my current use case is the ability to log
> lots of trivial and seemingly useless sports statistics ... I want a
> user to be able to query / compare them. For example:
>
> --> Show me all baseball players in cheektowaga and ontario,
> california who have hit a grandslam on tuesdays where it was just a
> leap year.
>
> Each baseball player is represented by a single row in a CF:
>
> player_uuid, fullname, hometown, game1, game2, game3, game4
>
> Games are UUIDs that are a reference to another row in the same CF
> that provides information about that game...
>
> location, final score, date (unix timestamp or ISO format), and
> statistics, which are represented as a new column timestamp:player_uuid
>
> I can use PIG, as I understand, to run a query to generate specific
> information about specific "things" and populate that data back into
> Cassandra in another CF ... similar to the hypothetical search
> above ... as the information is structured already, I assume PIG is the
> right tool for the job, but it may not be ideal for a web application and
> enabling ad-hoc queries ... it could take anywhere from 2 to ?
> seconds for that query to generate, populate, and return to the
> user...?
>
> On the other hand, I have started to read about Solr / Solandra /
> Lucandra ... can this provide similar functionality, or better? Or
> is it more geared towards full-text search and indexing ...
>
> I don't want to get into the habit of guessing what my potential users
> want to search for ... trying to think of ways to offload this to
> them.
>
>
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>


Re: [SOLVED] Very high memory utilization (not caused by mmap on sstables)

2010-12-18 Thread Victor Kabdebon
Hello everybody,

I actually have the exact same problem. I have a very small amount of data (a
few hundred KB) and the memory consumption goes up with no end in
sight.
On my node I have limited RAM (2 GB) to run Cassandra, but since I have
very little data, I thought it was not a problem; here is the result of $ du:

vic...@:~$ du /opt/cassandra/data/ -h
40K     /opt/cassandra/data/system
1,7M    /opt/cassandra/data/FallingDown
1,7M    /opt/cassandra/data/

Now, if I look at :
vic...@:~$ sudo ps aux | grep "cassandra"
cassandra 11034  0.2 22.9 *1107772 462764* ?  Sl   Dec17   6:13
/usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar
org.apache.cassandra.thrift.CassandraDaemon

Cassandra uses 462764 KB, roughly 460 MB, for 2 MB of data... and it keeps
getting bigger.
It is important to know that I have done just a few inserts, though quite a lot
of reads. Also, Cassandra seems to completely ignore JVM limitations such as
Xmx.
If I don't stop and relaunch Cassandra every 15 or 20 days it simply crashes
due to OOM errors.

Is there an explanation for this?

Thank you all,
Victor

2010/12/18 Zhu Han 

> Here is a typo, sorry...
>
> best regards,
> hanzhu
>
>
> On Sun, Dec 19, 2010 at 10:29 AM, Zhu Han  wrote:
>
>> The problem still seems to be the C-heap of the JVM, which leaks 70MB every
>> day. Here is the summary:
>>
>> on 12/19: 010c3000 178548K rw---[ anon ]
>> on 12/18: 010c3000 110320K rw---[ anon ]
>> on 12/17: 010c3000  39256K rw---[ anon ]
>>
>> This should not be the JVM object heap, because the object heap size is
>> fixed per the JVM settings below. Here is the map of the JVM object heap,
>> which remains constant.
>>
>> 010c3000  39256K rw---[ anon ]
>>
>
> It should be :
> 2b58433c 1069824K rw---[ anon ]
>
>
>>
>> I'll paste it to the OpenJDK mailing list to seek help.
>>
>> Zhu,
>>> Couple of quick questions:
>>>  How many threads are in your JVM?
>>>
>>
>> There are hundreds of threads. Here are the settings of Cassandra:
>> 1)  *8
>>   128*
>>
>> The thread stack size on this server is 1MB, so I observe hundreds of
>> single mmap segments of 1MB each.
>>
>>  Can you also post the full commandline as well?
>>>
>> Sure. All of them are default settings.
>>
>> /usr/bin/java -ea -Xms1G -Xmx1G -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
>> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8080
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> -Dstorage-config=bin/../conf -cp
>> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.8.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar
>> org.apache.cassandra.thrift.CassandraDaemon
>>
>>
>>>  Also, output of cat /proc/meminfo
>>>
>>
>> This is an OpenVZ-based testing environment, so /proc/meminfo is not very
>> helpful. Whatever, I'll paste it here.
>>
>>
>> MemTotal:  9838380 kB
>> MemFree:   4005900 kB
>> Buffers: 0 kB
>> Cached:  0 kB
>> SwapCached:  0 kB
>> Active:  0 kB
>> Inactive:0 kB
>> HighTotal:   0 kB
>> HighFree:0 kB
>> LowTotal:  9838380 kB
>> LowFree:   4005900 kB
>> SwapTotal:

Re: [SOLVED] Very high memory utilization (not caused by mmap on sstables)

2010-12-19 Thread Victor Kabdebon
Hello Peter,

So, more information on that problem:
Yes, I am using this node with very little data; it is used to design requests,
so I don't need a very large dataset.
I am running Apache Cassandra 0.6.6 on Debian stable, with java version
"1.6.0_22".

I recently restarted Cassandra, hence this low memory use, but if I
keep it running for 2 or 3 weeks then Cassandra will take about 1.5 GB. Here
is the result of the command, one day after the previous one:

vic...@***:~$ sudo ps aux | grep "cassandra"

root 11034  0.2 26.8 1167304 *540176* ?  Sl   Dec17   8:09
/usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar
org.apache.cassandra.thrift.CassandraDaemon

I have done very little work on it (a few inserts and reads).

Thank you,
Victor

2010/12/19 Peter Schuller 

> > vic...@:~$ sudo ps aux | grep "cassandra"
> > cassandra 11034  0.2 22.9 1107772 462764 ?  Sl   Dec17   6:13
> > /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> > -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.authenticate=false
> > -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
> >
> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar
> > org.apache.cassandra.thrift.CassandraDaemon
> >
> > Cassandra uses 462764 Kb, roughly 460 Mb for 2 Mb of data... And it keeps
> > getting bigger.
> > It is important to know that I have just a few insert, quite a lot of
> read
> > though. Also Cassandra seams to completly ignore the JVM limitations such
> as
> > Xmx.
> > If I don't stop and launch Cassandra every 15 ou 20 days it simply
> crashes,
> > due to oom errors.
>
> The resident size is not unexpected given that your Xmx is 512 MB. The
> virtual may or may not be expected depending; for example thread
> stacks as previously discussed in this thread.
>
> If you're not seeing the *resident* set size go above the maximum heap
> size, you're unlikely to be seeing the same problem.
>
> With respect to OOM, see
> http://www.riptano.com/docs/0.6/operations/tuning - but without more
> information it's difficult to know what specifically it is that you're
> hitting. Are you seriously saying you're running for 15-20 days with
> only 2 mb of live data?
>
> --
> / Peter Schuller
>


Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
Dear all,

In a project I would like to store "big" objects in columns, serialized: for
example entire images (several KB to several MB), Flash animations (several
MB), etc.
Does someone use Cassandra with such relatively big columns, and if so, does
it work well? Are there any drawbacks to using this method?

Thank you,
Victor K.


Re: Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
Is there any recommended maximum size for a column? (Not the very upper
limit, which is 2 GB.)
Why is it useful to chunk the content into multiple columns?

Thank you,
Victor K.

2011/1/13 Ryan King 

> On Thu, Jan 13, 2011 at 2:38 PM, Victor Kabdebon
>  wrote:
> > Dear all,
> > In a project I would like to store "big" objects in columns, serialized.
> For
> > example entire images (several KB to several MB), flash animations
> (several
> > MB) etc...
> > Does someone use Cassandra with those relatively big columns and if yes
> does
> > it work well ? Are there any drawbacks to using this method ?
>
> I haven't benchmarked this myself, but I think you'll want to chunk
> your content into multiple columns in the same row.
>
> -ryan
>


Re: Storing big objects into columns

2011-01-13 Thread Victor Kabdebon
OK, thank you very much for this information!
If somebody has more insights on this matter I am still interested! In the
meantime, here is the kind of chunking I understand Ryan to be describing:
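
A minimal sketch, assuming Hector's v2 API and an initialized keyspace; the
512 KB chunk size, the "Blobs" CF and the loadImage() helper are made-up
assumptions, not recommendations:

import java.util.Arrays;
import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

byte[] blob = loadImage();               // hypothetical: the object to store
int chunkSize = 512 * 1024;
Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
for (int i = 0, n = 0; i < blob.length; i += chunkSize, n++) {
    byte[] chunk = Arrays.copyOfRange(blob, i, Math.min(i + chunkSize, blob.length));
    // Zero-padded names keep the chunks in order under a UTF8/ASCII comparator.
    m.addInsertion("image-42", "Blobs",
        HFactory.createColumn(String.format("chunk-%05d", n), chunk,
            StringSerializer.get(), BytesArraySerializer.get()));
}
m.execute();

Reading the chunks back one slice at a time is what gives the pseudo-streaming
Ryan mentions.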

Victor K.

2011/1/13 Ryan King 

> On Thu, Jan 13, 2011 at 2:44 PM, Victor Kabdebon
>  wrote:
> > Is there any recommended maximum size for a Column ? (not the very upper
> > limit which is 2 GB)
> > Why is it useful to chunk the content into multiple columns ?
>
> I think you're going to have to do some tests yourself.
>
> You want to chunk it so that you can pseudo-stream the content. You
> don't want to have to load the whole content at once.
>
> -ryan
>


Re: live data migration from mysql to cassandra

2011-01-14 Thread Victor Kabdebon
I personally did it the other way around: from Cassandra to PostgreSQL. I
needed a hybrid system: Cassandra solidly holds all the data, while PostgreSQL
holds less data but requests against it are simple and efficient (with
SELECT ... WHERE). This is pretty easy once you master key browsing and
iterating; a sketch follows.
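
For illustration, a minimal key-iteration sketch, assuming Hector's v2 (0.7+)
API; the "Users" CF, the page size of 100 and the PostgreSQL step are
placeholders:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.beans.OrderedRows;
import me.prettyprint.hector.api.beans.Row;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.RangeSlicesQuery;

RangeSlicesQuery<String, String, String> q = HFactory.createRangeSlicesQuery(
    keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
q.setColumnFamily("Users").setRange("", "", false, 100).setRowCount(100);

String start = "";
while (true) {
    OrderedRows<String, String, String> rows = q.setKeys(start, "").execute().get();
    for (Row<String, String, String> row : rows) {
        // ... INSERT the row into PostgreSQL here ...
    }
    if (rows.getCount() < 100) break;
    start = rows.peekLast().getKey();  // resume from the last key (it is re-read once)
}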

I think that Cassandra is a totally different design: a Cassandra data model is
tailored to one's needs, whereas *SQL is more general. So all migrations are
different. Ruslan, if you are interested in "big" migrations you should
check Reddit's blog or Digg's blog. They switched from *SQL to Cassandra and
they hold a lot of data.

Best Regards,
Victor K.
http://www.voxnucleus.fr

2011/1/14 Edward Capriolo 

> On Fri, Jan 14, 2011 at 10:40 AM, ruslan usifov 
> wrote:
> > Hello
> >
> > Dear community please share your experience, home you make live(without
> > stop) migration from mysql or other RDBM to cassandra
> >
>
> There is no built in way to do this. I remember hearing at hadoop
> world this year that the hbase guys have a system to read mysql slave
> logs and replay into hbase. Since all the nosql community seems to do
> this maybe we can 'borrow' this idea.
>
> Edward
>


Re: live data migration from mysql to cassandra

2011-01-14 Thread Victor Kabdebon
Gosh, sorry for the mistakes, I am tired!

Victor K.

2011/1/14 Victor Kabdebon 

> I personally did it the other way around: from Cassandra to PostgreSQL. I
> needed a hybrid system: Cassandra solidly holds all the data, while PostgreSQL
> holds less data but requests against it are simple and efficient (with
> SELECT ... WHERE). This is pretty easy once you master key browsing and iterating.
>
> I think that Cassandra is a totally different design: a Cassandra data model
> is tailored to one's needs, whereas *SQL is more general. So all migrations are
> different. Ruslan, if you are interested in "big" migrations you should
> check Reddit's blog or Digg's blog. They switched from *SQL to Cassandra and
> they hold a lot of data.
>
> Best Regards,
> Victor K.
> http://www.voxnucleus.fr
>
> 2011/1/14 Edward Capriolo 
>
> On Fri, Jan 14, 2011 at 10:40 AM, ruslan usifov 
>> wrote:
>> > Hello
>> >
>> > Dear community please share your experience, home you make live(without
>> > stop) migration from mysql or other RDBM to cassandra
>> >
>>
>> There is no built in way to do this. I remember hearing at hadoop
>> world this year that the hbase guys have a system to read mysql slave
>> logs and replay into hbase. Since all the nosql community seems to do
>> this maybe we can 'borrow' this idea.
>>
>> Edward
>>
>
>


Re: Do you have a site in production environment with Cassandra? What client do you use?

2011-01-14 Thread Victor Kabdebon
Same here: Hector + Java.

Best Regards,
Victor K

2011/1/14 Ran Tavory 

> Java
> On Jan 14, 2011 8:25 PM, "Ertio Lew"  wrote:
> > what is the technology stack do you use?
> >
> > On 1/14/11, Ran Tavory  wrote:
> >> I use Hector, if that counts. ..
> >> On Jan 14, 2011 7:25 PM, "Ertio Lew"  wrote:
> >>> Hey,
> >>>
> >>> If you have a site in production environment or considering so, what
> >>> is the client that you use to interact with Cassandra. I know that
> >>> there are several clients available out there according to the
> >>> language you use but I would love to know what clients are being used
> >>> widely in production environments and are best to work with(support
> >>> most required features for performance).
> >>>
> >>> Also preferably tell about the technology stack for your applications.
> >>>
> >>> Any suggestions, comments appreciated ?
> >>>
> >>> Thanks
> >>> Ertio
> >>
>


Re: Cassandra in less than 1G of memory?

2011-01-14 Thread Victor Kabdebon
Dear Rajat,

Yes, it is possible; I have the same constraints. However, I must warn you:
from what I see, Cassandra's memory consumption is not bounded in 0.6.x on
Debian 64-bit.

Here is an example of an instance launched on a node:

root 19093  0.1 28.3 1210696 *570052* ?  Sl   Jan11   9:08
/usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Dcom.sun.management.jmxremote.port=8081
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
org.apache.cassandra.thrift.CassandraDaemon

Look at the second bold value: Xmx indicates the maximum memory that
Cassandra can use; it is set to 512 MB, so it should easily fit into 1 GB.
Now look at the first one: 570 MB > 512 MB. Moreover, if I come back in one
day the first value will be even higher, probably around 610 MB. Actually, it
increases to the point where I need to restart it, otherwise other programs
are shut down by Linux so that Cassandra can expand its memory usage further...

By the way, a call to other Cassandra users: am I the only one to
encounter this problem?

Best regards,

Victor K.

2011/1/14 Rajat Chopra 

> Hello.
>
>
>
> According to  JVM heap size topic at
> http://wiki.apache.org/cassandra/MemtableThresholds , Cassandra would need
> atleast 1G of memory to run. Is it possible to have a running Cassandra
> cluster with machines that have less than that memory… say 512M?
>
> I can live with slow transactions, no compactions etc, but do not want an
> OutOfMemory error. The reason for a smaller bound for Cassandra is that I
> want to leave room for other processes to run.
>
>
>
> Please help with specific parameters to tune.
>
>
>
> Thanks,
>
> Rajat
>
>
>


Re: Cassandra in less than 1G of memory?

2011-01-14 Thread Victor Kabdebon
Hi Jonathan, hi Edward,

Jonathan: but it looks like mmapping wants to consume the entire memory of
my server. It goes up to 1.7 GB for a ridiculously small amount of data.
Am I doing something wrong, or is there something I should change to prevent
this never-ending increase in memory consumption?
Edward: I am not sure; I will check that tomorrow, but my disk access
mode is standard, not mmap.

Anyway thank you very much,
Victor K.

PS: here is the result of ps aux | grep cassandra, a few hours later:
root 19093  0.1 30.0 1243940 *605060* ?  Sl   Jan11  10:15
/usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Dcom.sun.management.jmxremote.port=8081
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
org.apache.cassandra.thrift.CassandraDaemon


2011/1/15 Jonathan Ellis 

> mmapping only consumes memory that the OS can afford to feed it.
>
> On Fri, Jan 14, 2011 at 7:29 PM, Edward Capriolo 
> wrote:
> > On Fri, Jan 14, 2011 at 2:13 PM, Victor Kabdebon
> >  wrote:
> >> Dear Rajat,
> >>
> >> Yes, it is possible; I have the same constraints. However, I must warn you:
> >> from what I see, Cassandra's memory consumption is not bounded in 0.6.x on
> >> Debian 64-bit.
> >>
> >> Here is an example of an instance launch in a node :
> >>
> >> root 19093  0.1 28.3 1210696 570052 ?  Sl   Jan11   9:08
> >> /usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> >> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> >> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> >> -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
> >> -Dcom.sun.management.jmxremote.ssl=false
> >> -Dcom.sun.management.jmxremote.authenticate=false
> >> -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
> >>
> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
> >> org.apache.cassandra.thrift.CassandraDaemon
> >>
> >> Look at the second bold value: Xmx indicates the maximum memory that
> >> Cassandra can use; it is set to 512 MB, so it should easily fit into 1 GB.
> >> Now look at the first one: 570 MB > 512 MB. Moreover, if I come back in one
> >> day the first value will be even higher, probably around 610 MB. Actually,
> >> it increases to the point where I need to restart it, otherwise other
> >> programs are shut down by Linux so that Cassandra can expand its memory
> >> usage further...
> >>
> >> By the way, a call to other Cassandra users: am I the only one to
> >> encounter this problem?
> >>
> >> Best regards,
> >>
> >> Victor K.
> >>
> >> 2011/1/14 Rajat Chopra 
> >>>
> >>>

Re: cass0.7: Creating colum family & Sorting

2011-01-16 Thread Victor Kabdebon
The comparator sorts only the columns inside a given key (row).
Key sorting is done by your partitioner.
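
If you want the countries themselves sorted by name, the usual trick is to
invert the model: put all the countries in one row and let the comparator sort
the column names. A minimal cassandra-cli sketch; "CountriesByName" and the
'all' row key are made-up names:

create column family CountriesByName with comparator = UTF8Type;
set CountriesByName['all']['Afghanistan'] = '1';
set CountriesByName['all']['Germany'] = '2';
set CountriesByName['all']['Zimbabwe'] = '3';

A slice of the 'all' row then comes back in alphabetical order regardless of
the partitioner.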


Best regards,
Victor Kabdebon

2011/1/16 kh jo 

> I am having some problems with creating column families and sorting them,
>
> I want to create a countries column family where I can get a sorted list of
> countries (by country name).
>
> the following command fails:
>
> create column family Countries with comparator=LongType
> and column_metadata=[
> {column_name: cid, validation_class: LongType, index_type: KEYS},
> {column_name: cname, validation_class: UTF8Type},
> {column_name: code, validation_class: UTF8Type, index_type: KEYS}
> ];
>
> IT SHOWS: 'id' could not be translated into a LongType.
>
>
> the following works:
>
> create column family Countries with comparator=UTF8Type
> and column_metadata=[
> {column_name: cid, validation_class: LongType, index_type: KEYS},
> {column_name: cname, validation_class: UTF8Type},
> {column_name: code, validation_class: UTF8Type, index_type: KEYS}
> ];
>
>
> but when I insert some columns, they are not sorted as I want
>
> $countries = new ColumnFamily(Cassandra::con(), 'Countries');
> $countries->insert('Afghanistan', array('cid'=> '1', 'cname' =>
> 'Afghanistan', 'code' => 'AF'));
> $countries->insert('Germany', array('cid'=> '2', 'cname' => 'Germany',
> 'code' =>'DE'));
> $countries->insert('Zimbabwe', array('cid'=> '3', 'cname' => 'Zimbabwe',
> 'code' =>'ZM'));
>
> now:
> list Countries;
>
> shows:
> ---
> RowKey: Germany
> => (column=cid, value=2, timestamp=1295211346716047)
> => (column=cname, value=Germany, timestamp=1295211346716047)
> => (column=code, value=DE, timestamp=1295211346716047)
> ---
> RowKey: Zimbabwe
> => (column=cid, value=3, timestamp=1295211346713570)
> => (column=cname, value=Zimbabwe, timestamp=1295211346713570)
> => (column=code, value=ZM, timestamp=1295211346713570)
> ---
> RowKey: Afghanistan
> => (column=cid, value=1, timestamp=1295211346709448)
> => (column=cname, value=Afghanistan, timestamp=1295211346709448)
> => (column=code, value=AF, timestamp=1295211346709448)
>
>
> I don't see any sorting here?!
>
>


Re: Cassandra in less than 1G of memory?

2011-01-16 Thread Victor Kabdebon
If it's because of swapping by Linux, wouldn't I only see the swap
consumption rise? Because the problem is (apart from the swap becoming
bigger and bigger) that Cassandra's RAM consumption is going through
the roof.

However, I want to give the proposed method a try.

Thank you very much,
Best Regards,
Victor Kabdebon

PS: memory consumption:

root 19093  0.1 35.8 *1362108 722312* ?  Sl   Jan11  14:01
/usr/bin/java -ea -Xms128M -Xmx512M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
org.apache.cassandra.thrift.CassandraDaemon


2011/1/16 Aaron Morton 

> The OS will make its best guess as to how much memory it can give over to
> mmapped files. Unfortunately it will not always make the best decision; see
> the information on adding JNA and mlockall() support in Cassandra 0.6.5:
> http://www.datastax.com/blog/whats-new-cassandra-065
>
> As Jonathan says,
> try setting the disk access mode to standard to see the difference.
>
> WRT the resident memory for the process, not all memory allocation is done
> on the heap. To see the non-heap usage, connect to the process using
> JConsole and take a look at the Memory tab. For example, on my box right now
> Cassandra has 110M of heap memory and 20M of non-heap. AFAIK memory such as
> class definitions is not included in the heap memory usage.
>
> Hope that helps.
> Aaron
>
>
> On 15 Jan, 2011,at 08:03 PM, Victor Kabdebon 
> wrote:
>
> Hi Jonathan, hi Edward,
>
> Jonathan: but it looks like mmapping wants to consume the entire memory of
> my server. It goes up to 1.7 GB for a ridiculously small amount of data.
> Am I doing something wrong, or is there something I should change to prevent
> this never-ending increase in memory consumption?
> Edward: I am not sure; I will check that tomorrow, but my disk access
> mode is standard, not mmap.
>
> Anyway thank you very much,
> Victor K.
>
> PS: here is the result of ps aux | grep cassandra, a few hours later:
> root 19093  0.1 30.0 1243940 *605060* ?  Sl   Jan11  10:15
> /usr/bin/java -ea -Xms128M *-Xmx512M* -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
> -Dcom.sun.management.jmxremote.port=8081
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
> org.apache.cassandra.thrift.CassandraDaemon
>
>
> 2011/1/15 Jonathan Ellis 
>
>> mmapping only consumes memory that the OS can afford to feed it.
>>
>>
>

Re: Cassandra in less than 1G of memory?

2011-01-17 Thread Victor Kabdebon
Peter: what do you recommend? Using Aaron Morton's solution with JNA, or
just disabling mmap? (Or is it the same thing and I missed something?)

Thank you all for your advice. I am surprised to be the only one to have
this problem, even though I'm using a pretty standard distribution.

Best regards,
Victor K.

2011/1/16 Peter Schuller 

> > bigger and bigger) that cassandra ram memory consumption is going through
> > the roof.
>
> mmap():ed memory will be counted as virtual address space.
>
> Disable mmap() and use standard I/O if you want to see how it behaves
> for real; then, if you want mmap() for performance, you can re-enable
> it.
>
> --
> / Peter Schuller
>


Re: Secondary Index information

2011-01-28 Thread Victor Kabdebon
Dear Sasha,

I am currently thinking about using secondary indexes in the future. I have
seen two pros:
- Simplicity: it is "simpler" to query on a secondary index than to go
through a first column and then a second.
- "Consistency": depending on where you store your inverted index, it may
be unavailable to query because a node is down, or an error may let you
insert into the first column while a crash prevents you from inserting the
inverse information into the inverted index. Because of that you cannot query,
and you have to periodically check the consistency of the data in the two
columns. That's what I am doing right now for my applications, and making it
simpler and more consistent would be great.

Remember: I don't know the details of the implementation; I take the
principle as if it worked perfectly. But I am interested in experiences.

Best regards,
Victor K.
http://www.voxnucleus.fr

2011/1/28 Sasha Dolgy 

> Thank you.  So, after reading, I'm still unsure if this feature will
> afford me a larger benefit when compared to an inverted index
> solution.
>
> Has anyone done a pros / cons ?
>
> -sd
>
>
> On Fri, Jan 28, 2011 at 3:22 PM, Jake Luciani  wrote:
> > http://www.datastax.com/blog/whats-new-cassandra-07-secondary-indexes
> >
> > On Fri, Jan 28, 2011 at 7:15 AM, Sasha Dolgy 
> wrote:
> >>
> >> Hi there,
> >>
> >> Where can I find information regarding secondary indexes?  Spent the
> >> past 2 days looking for some good details.
> >>
> >> http://wiki.apache.org/cassandra/SecondaryIndexes doesn't yet exist,
> >> althought it's referenced from
> >> http://wiki.apache.org/cassandra/StorageConfiguration
> >>
> >> Trying to understand if this feature will afford me a larger benefit
> >> when compared to an inverted index solution.
> >>
> >> Thanks in advance,
> >> -sd
> >>
> >> --
> >> Sasha Dolgy
> >> @sdolgy
> >> sasha.do...@gmail.com
>


Re: Cassandra and count

2011-01-28 Thread Victor Kabdebon
Buddhasystem is right.
A count returns the columns to the client, which counts them. My advice: do not
count big columns / supercolumns. People on the dev team are trying to
develop distributed counters, but I don't know the state of that work.

Best regards,
Victor Kabdebon
http://www.voxnucleus.fr

2011/1/28 buddhasystem 

>
> As far as I know, there are no aggregate operations built into Cassandra,
> which means you'll have to retrieve all of the data to count it in the
> client. I had a thread on this topic 2 weeks ago. It's pretty bad.
>
> --
> View this message in context:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-and-count-tp5969159p5970315.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
> Nabble.com.
>


Re: Using Cassandra to store files

2011-02-03 Thread Victor Kabdebon
Dear Brendan,

I would really be interested in your findings too. I need a system to store
various documents; I am thinking of Cassandra (which I am already using),
a second type of database, or any other system. Maybe, as Dan
suggested, using MogileFS.

Thank you,
Victor Kabdebon
http://www.voxnucleus.fr

2011/2/3 Dan Kuebrich 

>
>> CouchDB
>>
> That's not what document-oriented means! (har har)
>
> I don't know all the details of your case, but with serving static files I
> suspect you could do ok with something that has a much smaller memory/cpu
> footprint as you won't have as great of write throughput / read latency
> concerns.  I've used mogilefs <http://www.danga.com/mogilefs/> for this
> before.
>
> --
>>
>> View this message in context:
>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Using-Cassandra-to-store-files-tp5988698p5989122.html
>> Sent from the cassandra-u...@incubator.apache.org mailing list archive at
>> Nabble.com.
>>
>
>


Re: revisioned data

2011-02-05 Thread Victor Kabdebon
Hello Raj,

No, it actually doesn't make sense from the point of view of Cassandra:
the OrderingPartitioner preserves the order of the *keys*, not of the columns.
In your model the supercolumn names (the object IDs) are sorted by compare_with,
and the update-time columns inside each supercolumn are sorted by
compare_subcolumns_with (sorry, I don't remember exactly the new terms in
Cassandra, but that's the idea).

However, and I think that many will agree here, tend to avoid SuperColumns.
Rather than using SuperColumns, try to think like this:

CF1 : "ObjectStore"
Key :ID (long)
Columns : {
name
other fields
update time (long [date])
...}

CF2 : "ObjectOrder"
Key : "myorderedobjects
Column:{
   { name : identifier that can be sorted
   value :ObjectID},
   ...
}

Best regards,
Victor Kabdebon,
http://www.voxnucleus.fr

2011/2/5 Raj Bakhru 

> Hi all -
>
> We're new to Cassandra and have read plenty on the data model, but we
> wanted to poll for thoughts on how to best handle this structure.
>
> We have simple objects that have and ID and we want to maintain a history
> of all the revisions.
>
> e.g.
> MyObject:
> ID (long)
> name
> other fields
> update time (long [date])
>
>
> Any time the object changes, we'll store down a new version of the object
> (same ID, but different update time and other fields).  We need to be able
> to query out what the object was as-of any time historically.  We also need
> to be able to query out what some or all of the items of this object type
> were as-of any time historically..
>
> In SQL, we'd just find the max(id) where update time < queried_as_of_time
>
> In Cassandra, we were thinking of modeling as follows:
>
> CF:  MyObjectType
> Super-Column: ID of object (e.g. 625)
> Column:  updatetime  (e.g. "1000245242")
> Value: byte[] of serialized object
>
> We were thinking of using the OrderingPartitioner and using range queries
> against the data.
>
> Does this make sense?  Are we approaching this in the wrong way?
>
> Thanks a lot
>
>
>
>


Re: unique key generation

2011-02-07 Thread Victor Kabdebon
Hello Kallin,
If you use TimeUUIDs, the chance of generating the same UUID twice is the
following: assuming both clients generate the UUID in the *same millisecond*,
the chance of a collision is

1 / (1.84467441 × 10^19),

which is equal to the probability of winning a national lottery every day for
1e11 days in a row (about 270 million years).
Well, if you do get a collision you should play the lottery :).
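
For what it's worth, a minimal generation sketch, assuming Hector's
TimeUUIDUtils helper (any version-1 UUID generator behaves the same way):

import java.util.UUID;
import me.prettyprint.cassandra.utils.TimeUUIDUtils;

UUID sessionId = TimeUUIDUtils.getUniqueTimeUUIDinMillis();
// The node bits of a version-1 UUID are derived from the generating host,
// so two clients must also share those bits for a collision to happen.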

Best regards,
Victor Kabdebon
http://www.voxnucleus.fr

2011/2/7 Kallin Nagelberg 

> Hey,
>
> I am developing a session management system using Cassandra and need
> to generate unique sessionIDs (cassandra columnfamily keys). Does
> anyone know of an elegant/simple way to accomplish this? I am not sure
> about using time based uuids on the client as there a chance that
> multiple clients could generate the same ID. I've heard suggestions of
> using zookeeper as a source for the IDs, but was just hoping that
> there might be something simpler for my purposes.
>
> Thanks,
> -Kal
>


Cassandra memory consumption

2011-02-07 Thread Victor Kabdebon
Dear all,

Sorry to come back to this point again, but I am really worried about
Cassandra's memory consumption. I have a single machine that runs one
Cassandra server. There is almost no data on it, but I see crazy memory
consumption, and it does not respect the configured limits at all...
Note that I am not using mmap but "standard" disk access, I use JNA (inside the
lib folder), and I am running on Debian 5 64-bit, so a pretty normal
configuration. I also use Cassandra 0.6.8.


Here is the information I gathered on Cassandra:

105  16765  0.1 34.1 1089424 *687476* ?  Sl   Feb02  14:58
/usr/bin/java -ea -Xms128M *-Xmx256M* -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Dcom.sun.management.jmxremote.port=8081
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
org.apache.cassandra.thrift.CassandraDaemon

result of nodetool info :

116024732779488843382476400091948985708
*Load : 1,94 MB*
Generation No: 1296673772
Uptime (seconds) : 467550
*Heap Memory (MB) : 120,26 / 253,94*


I have about 21 column families, and none of them holds a lot of information
(as you see, I have 2 MB of text, which is really small). Even though I set Xmx
to 256 MB, 687 MB of memory is used. Where does this memory come from? Bad
garbage collection? Something I am missing?
Thank you for your help; I really need to get rid of this problem.

Best regards,
Victor Kabdebon


Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
It is really weird that I am the only one to have this issue.
I restarted Cassandra today and the memory consumption is already over the
limit:

root  1739  4.0 24.5 664968 *494996* pts/4   SLl  15:51   0:12
/usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dstorage-config=bin/../conf -cp
bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
org.apache.cassandra.thrift.CassandraDaemon

It is a really annoying problem if we cannot foresee memory
consumption.

Best regards,
Victor K

2011/2/8 Victor Kabdebon 

> Dear all,
>
> Sorry to come back to this point again, but I am really worried about
> Cassandra's memory consumption. I have a single machine that runs one
> Cassandra server. There is almost no data on it, but I see crazy memory
> consumption, and it does not respect the configured limits at all...
> Note that I am not using mmap but "standard" disk access, I use JNA (inside the
> lib folder), and I am running on Debian 5 64-bit, so a pretty normal configuration.
> I also use Cassandra 0.6.8.
>
>
> Here is the information I gathered on Cassandra:
>
> 105  16765  0.1 34.1 1089424* 687476* ?  Sl   Feb02  14:58
> /usr/bin/java -ea* -Xms128M* *-Xmx256M* -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
> -Dcom.sun.management.jmxremote.port=8081
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
> org.apache.cassandra.thrift.CassandraDaemon
>
> result of nodetool info :
>
> 116024732779488843382476400091948985708
> *Load : 1,94 MB*
> Generation No: 1296673772
> Uptime (seconds) : 467550
> *Heap Memory (MB) : 120,26 / 253,94*
>
>
> I have about 21 column families, and none of them holds a lot of information
> (as you see, I have 2 MB of text, which is really small). Even though I set Xmx
> to 256 MB, 687 MB of memory is used. Where does this memory come from? Bad
> garbage collection? Something I am missing?
> Thank you for your help; I really need to get rid of this problem.
>
> Best regards,
> Victor Kabdebon
>

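On reading these numbers: the RSS column that ps reports (687476 KB above)
covers the whole process, including the permanent generation, thread stacks
and other native allocations, so it will normally sit above -Xmx even when
the heap itself is behaving. The heap alone can be read over the same JMX
endpoint that nodetool uses. A minimal sketch, assuming JMX is reachable on
localhost:8081 as in the command line above; the class name is illustrative:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapCheck {
    public static void main(String[] args) throws Exception {
        // Same JMX endpoint that nodetool talks to (port 8081 in this setup).
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8081/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            MemoryUsage heap = memory.getHeapMemoryUsage();
            MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
            // Shift by 20 bits to convert bytes to megabytes.
            System.out.printf("heap:     used=%dM committed=%dM max=%dM%n",
                    heap.getUsed() >> 20, heap.getCommitted() >> 20,
                    heap.getMax() >> 20);
            System.out.printf("non-heap: used=%dM committed=%dM%n",
                    nonHeap.getUsed() >> 20, nonHeap.getCommitted() >> 20);
        } finally {
            connector.close();
        }
    }
}

If the heap figure printed here tracks the nodetool value while RSS keeps
climbing, the growth is outside the heap and no -Xmx setting will cap it.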

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Sorry Jonathan :

So most of this information was gathered using the command :

sudo ps aux | grep cassandra

For the nodetool information it is :

/bin/nodetool --host localhost --port 8081 info


Regards,

Victor K.


2011/2/8 Jonathan Ellis 

> I missed the part where you explained where you're getting your numbers
> from.
>
> On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon
>  wrote:
> > It is really weird that I am the only one to have this issue.
> > I restarted Cassandra today and already the memory consumption is over the
> > limit :
> >
> > root  1739  4.0 24.5 664968 494996 pts/4   SLl  15:51   0:12
> > /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> > -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
> > -Dcom.sun.management.jmxremote.ssl=false
> > -Dcom.sun.management.jmxremote.authenticate=false
> > -Dstorage-config=bin/../conf -cp
> >
> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
> > org.apache.cassandra.thrift.CassandraDaemon
> >
> > It is really an annoying problem if we cannot really foresee memory
> > consumption.
> >
> > Best regards,
> > Victor K
> >
> > 2011/2/8 Victor Kabdebon 
> >>
> >> Dear all,
> >>
> >> Sorry to come back again to this point but I am really worried about
> >> Cassandra memory consumption. I have a single machine that runs one
> >> Cassandra server. There is almost no data on it but I see a crazy memory
> >> consumption and it doesn't care at all about the instructions...
> >> Note that I am not using mmap, but "Standard", I use also JNA (inside
> lib
> >> folder), i am running on debian 5 64 bits, so a pretty normal
> configuration.
> >> I also use Cassandra 0.6.8.
> >>
> >>
> >> Here is the information I gathered on Cassandra :
> >>
> >> 105  16765  0.1 34.1 1089424 687476 ?  Sl   Feb02  14:58
> >> /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> >> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> >> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> >> -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
> >> -Dcom.sun.management.jmxremote.ssl=false
> >> -Dcom.sun.management.jmxremote.authenticate=false
> >> -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
> >>
> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
> >> org.apache.cassandra.thrift.CassandraDaemon
> >>
> >> result of nodetool info :
> >>
> >> 116024732779488843382476400091948985708
> >> Load : 1,94 MB
> >> Generation No: 1296673772
> >> Uptime (seconds) : 467550
> >> Heap Memory (MB) : 120,26 / 253,94
> >>

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Information on the system :

Debian 5
JVM :
victor@testhost:~/database/apache-cassandra-0.6.6$ java -version
java version "1.6.0_22"
Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)

RAM : 2 GB


2011/2/8 Victor Kabdebon 

> Sorry Jonathan :
>
> So most of this information was gathered using the command :
>
> sudo ps aux | grep cassandra
>
> For the nodetool information it is :
>
> /bin/nodetool --host localhost --port 8081 info
>
>
> Regards,
>
> Victor K.
>
>
> 2011/2/8 Jonathan Ellis 
>
> I missed the part where you explained where you're getting your numbers
>> from.
>>
>> On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon
>>  wrote:
>> > It is really weird that I am the only one to have this issue.
>> > I restarted Cassandra today and already the memory consumption is over the
>> > limit :
>> >
>> > root  1739  4.0 24.5 664968 494996 pts/4   SLl  15:51   0:12
>> > /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1
>> > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
>> > -XX:+HeapDumpOnOutOfMemoryError -Dcom.sun.management.jmxremote.port=8081
>> > -Dcom.sun.management.jmxremote.ssl=false
>> > -Dcom.sun.management.jmxremote.authenticate=false
>> > -Dstorage-config=bin/../conf -cp
>> >
>> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
>> > org.apache.cassandra.thrift.CassandraDaemon
>> >
>> > It is really an annoying problem if we cannot really foresee memory
>> > consumption.
>> >
>> > Best regards,
>> > Victor K
>> >
>> > 2011/2/8 Victor Kabdebon 
>> >>
>> >> Dear all,
>> >>
>> >> Sorry to come back again to this point but I am really worried about
>> >> Cassandra memory consumption. I have a single machine that runs one
>> >> Cassandra server. There is almost no data on it but I see a crazy
>> memory
>> >> consumption and it doesn't care at all about the instructions...
>> >> Note that I am not using mmap, but "Standard", I use also JNA (inside
>> lib
>> >> folder), i am running on debian 5 64 bits, so a pretty normal
>> configuration.
>> >> I also use Cassandra 0.6.8.
>> >>
>> >>
>> >> Here is the information I gathered on Cassandra :
>> >>
>> >> 105  16765  0.1 34.1 1089424 687476 ?  Sl   Feb02  14:58
>> >> /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC
>> >> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1
>> >> -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly
>> >> -XX:+HeapDumpOnOutOfMemoryError
>> -Dcom.sun.management.jmxremote.port=8081
>> >> -Dcom.sun.management.jmxremote.ssl=false
>> >> -Dcom.sun.management.jmxremote.authenticate=false
>> >> -Dstorage-config=bin/../conf -Dcassandra-foreground=yes -cp
>> >>
>> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.

Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
I will do that in the future and I will post my results here (I upgraded
the server to Debian 6 to see if there is any change, so memory is back to
normal). I will report in a few days.
In the meantime I am open to any suggestions...

2011/2/8 Aaron Morton 

> When you attach to the JVM with JConsole how much non heap memory and how
> much heap memory is reported on the memory tab?
>
> Xmx controls the total size of the heap memory, which excludes the
> permanent generation.
> see
>
> http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#generation_sizing
> and
>
> http://blogs.sun.com/jonthecollector/entry/presenting_the_permanent_generation
>
> Total non-heap memory on a 0.7 box I have is around 27M. Your numbers seem
> large but it would be interesting to know what the JVM is reporting.
>
> Aaron
>
> On 09 Feb, 2011,at 05:57 AM, Victor Kabdebon 
> wrote:
>
> Information on the system :
>
> Debian 5
> JVM :
> victor@testhost:~/database/apache-cassandra-0.6.6$ java -version
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
>
> RAM : 2 GB
>
>
> 2011/2/8 Victor Kabdebon 
>
>> Sorry Jonathan :
>>
>> So most of this information was gathered using the command :
>>
>> sudo ps aux | grep cassandra
>>
>> For the nodetool information it is :
>>
>> /bin/nodetool --host localhost --port 8081 info
>>
>>
>> Regards,
>>
>> Victor K.
>>
>>
>> 2011/2/8 Jonathan Ellis 
>>
>>
>> I missed the part where you explained where you're getting your numbers
>>> from.
>>>
>>>
>>> On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon
>>>  wrote:
>>> > It is really weird that I am the only one to have this issue.
>>> > I restarted Cassandra today and already the memory consumption is over
>>> the
>>> > limit :
>>> >
>>> > root  1739  4.0 24.5 664968 494996 pts/4   SLl  15:51   0:12
>>> > /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC
>>> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>>> -XX:MaxTenuringThreshold=1
>>> > -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:+UseCMSInitiatingOccupancyOnly
>>> > -XX:+HeapDumpOnOutOfMemoryError
>>> -Dcom.sun.management.jmxremote.port=8081
>>> > -Dcom.sun.management.jmxremote.ssl=false
>>>
>>> > -Dcom.sun.management.jmxremote.authenticate=false
>>> > -Dstorage-config=bin/../conf -cp
>>> >
>>> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:bin/../lib/slf4j-api-1.5.8.jar:bin/../lib/slf4j-log4j12-1.5.8.jar:bin/../lib/uuid-3.1.jar
>>>
>>> > org.apache.cassandra.thrift.CassandraDaemon
>>> >
>>> > It is really an annoying problem if we cannot really foresee memory
>>> > consumption.
>>> >
>>> > Best regards,
>>> > Victor K
>>> >
>>> > 2011/2/8 Victor Kabdebon 
>>> >>
>>> >> Dear all,
>>> >>
>>> >> Sorry to come back again to this point but I am really worried about
>>> >> Cassandra memory consumption. I have a single machine that runs one
>>> >> Cassandra server. There is almost no data on it but I see a crazy
>>> memory
>>> >> consumption and it doesn't care at all about the instructions...
>>> >> Note that I am not using mmap, but "Standard", I use also JNA (inside
>>> lib
>>> >> folder), i am running on debian 5 64 bits, so a pretty normal
>>> configuration.

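To put rough numbers on Aaron's breakdown, a 256M-heap process can plausibly
show half a gigabyte resident (illustrative figures, not measurements from
this host):

heap ceiling (-Xmx)                         256 MB
permanent generation                        ~30 MB
thread stacks (say 100 threads x 0.5 MB)    ~50 MB
GC structures, JIT code, direct buffers     a few tens of MB

That lands in the same ballpark as the RSS figures quoted in this thread, so
a number well above -Xmx is not by itself evidence of a leak.
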
Re: Cassandra memory consumption

2011-02-08 Thread Victor Kabdebon
Yes I have, but I have to add that this is a server with so little
data (2.0 MB of text, roughly a book) that even if there were an overhead due
to those things it would be minimal.
I don't understand what's eating up all that memory; is it because Linux
has difficulty getting rid of used memory... I really am puzzled. (By the
way, it is not an Amazon EC2 server; this is a dedicated server.)

Regards,
Victor K.

2011/2/8 Edward Capriolo 

> On Tue, Feb 8, 2011 at 4:56 PM, Victor Kabdebon
>  wrote:
> > I will do that in the future and I will post my results here ( I upgraded
> > the server to debian 6 to see if there is any change, so memory is back
> to
> > normal). I will report in a few days.
> > In the meantime I am open to any suggestion...
> >
> > 2011/2/8 Aaron Morton 
> >>
> >> When you attach to the JVM with JConsole how much non heap memory and
> how
> >> much heap memory is reported on the memory tab?
> >> Xmx controls the total size of the heap memory, which excludes the
> >> permanent generation.
> >> see
> >>
> >>
> http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#generation_sizing
> >> and
> >>
> >>
> http://blogs.sun.com/jonthecollector/entry/presenting_the_permanent_generation
> >> Total non-heap memory on a 0.7 box I have is around 27M. Your numbers
> seem
> >> large but it would be interesting to know what the JVM is reporting.
> >> Aaron
> >> On 09 Feb, 2011,at 05:57 AM, Victor Kabdebon  >
> >> wrote:
> >>
> >> Information on the system :
> >>
> >> Debian 5
> >> Jvm :
> >> victor@testhost:~/database/apache-cassandra-0.6.6$ java -version
> >> java version "1.6.0_22"
> >> Java(TM) SE Runtime Environment (build 1.6.0_22-b04)
> >> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03, mixed mode)
> >>
> >> RAM : 2 GB
> >>
> >>
> >> 2011/2/8 Victor Kabdebon 
> >>>
> >>> Sorry Jonathan :
> >>>
> >>> So most of this information was gathered using the command :
> >>>
> >>> sudo ps aux | grep cassandra
> >>>
> >>> For the nodetool information it is :
> >>>
> >>> /bin/nodetool --host localhost --port 8081 info
> >>>
> >>>
> >>> Regards,
> >>>
> >>> Victor K.
> >>>
> >>>
> >>> 2011/2/8 Jonathan Ellis 
> >>>
> >>>> I missed the part where you explained where you're getting your
> numbers
> >>>> from.
> >>>>
> >>>>
> >>>> On Tue, Feb 8, 2011 at 9:32 AM, Victor Kabdebon
> >>>>  wrote:
> >>>> > It is really weird that I am the only one to have this issue.
> >>>> > I restarted Cassandra today and already the memory compution is over
> >>>> > the
> >>>> > limit :
> >>>> >
> >>>> > root  1739  4.0 24.5 664968 494996 pts/4   SLl  15:51   0:12
> >>>> > /usr/bin/java -ea -Xms128M -Xmx256M -XX:+UseParNewGC
> >>>> > -XX:+UseConcMarkSweepGC
> >>>> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> >>>> > -XX:MaxTenuringThreshold=1
> >>>> > -XX:CMSInitiatingOccupancyFraction=75
> >>>> > -XX:+UseCMSInitiatingOccupancyOnly
> >>>> > -XX:+HeapDumpOnOutOfMemoryError
> >>>> > -Dcom.sun.management.jmxremote.port=8081
> >>>> > -Dcom.sun.management.jmxremote.ssl=false
> >>>> > -Dcom.sun.management.jmxremote.authenticate=false
> >>>> > -Dstorage-config=bin/../conf -cp
> >>>> >
> >>>> >
> bin/../conf:bin/../build/classes:bin/../lib/antlr-3.1.3.jar:bin/../lib/apache-cassandra-0.6.6.jar:bin/../lib/avro-1.2.0-dev.jar:bin/../lib/cassandra-javautils.jar:bin/../lib/clhm-production.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-collections-3.2.1.jar:bin/../lib/commons-io-1.4.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/commons-pool-1.5.4.jar:bin/../lib/google-collections-1.0.jar:bin/../lib/hadoop-core-0.20.1.jar:bin/../lib/hector-0.6.0-14.jar:bin/../lib/high-scale-lib.jar:bin/../lib/ivy-2.1.0.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-r917130.jar:bin/../lib/log4j-1.2.14.jar:bin/../lib/perf4j-0.9.12.jar:b

Re: unique key generation

2011-02-09 Thread Victor Kabdebon
Yes, I made a mistake, I know! But I hoped nobody would notice :).

It is the odds of winning 3 days in a row (a standard probability fail). Still,
it is totally unlikely.

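(For the record, the usual back-of-envelope: an event with daily probability
p happens three days in a row with probability p * p * p = p^3, not 3p, so
the corrected figure is smaller by many orders of magnitude; either way it is
negligible.)
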
Sorry about this mistake,

Best regards,
Victor K.


Re: online chat scenario

2011-02-15 Thread Victor Kabdebon
Hello Sasha.

In this sort of real-time application, the way you insert (QUORUM, ONE,
etc.) and the way you retrieve are extremely important, because your data
may not have had time to propagate to all your nodes. Be sure to use
adequate policies for that : insert to a certain number of nodes, but don't
sacrifice too much time doing it, or you lose the real-time component.
Here is a presentation of how chat is done at Facebook; it may be useful
to you :

http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf

It's more focused on Erlang, but it might give you ideas on how to deal with
that problem (I am not sure that databases are the best way to deal with it...
but it's just my opinion).

Victor Kabdebon
http://www.voxnucleus.fr



2011/2/15 Sasha Dolgy 

> thanks for the response.  thinking about this, this would not allow for the
> sorting of messages into a chronological order for end user display.  i had
> thought about having each message as its own column against the room or the
> user, but i have had some inconsistencies in retrieving the data.  sometimes
> i get 3 columns, sometimes i get 50...( i think this is because of the
> random partitioner)
>
> i had thought about this structure:
>
> [messages][nickname][message id => message data]
> [chatrooms][room_name][message id]
>
> this way i can pull all messages a user ever posted, not specific to a
> room.  what i haven't been able to do so far is print the timestamp on the
> row or column.  does this have to be explicitly added somewhere or can it be
> returned as part of a 'get' request?
>
> -sd
>
>
> On Tue, Feb 15, 2011 at 2:12 PM, Michal Augustýn <
> augustyn.mic...@gmail.com> wrote:
>
>> The schema design depends on chatrooms/users/messages numbers. I.e. you
>> can have one CF, where key is chatroom, column name is username, column
>> value is the message and message time is the same as column timestamp.
>> You can add day-timestamp to the chatroom name to avoid large rows.
>>
>> Augi
>>
>> 2011/2/15 Andrey V. Panov 
>>
>> I never did it. But I suppose you can use "chatroom name" as key and store
>>> messages & nicks as columns in JSON and timestamp as columnName.
>>>
>>
>>
>
>
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>

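On the timestamp question: every column already carries a timestamp (the
clock Cassandra uses for conflict resolution), and the Hector client on the
classpaths quoted earlier in this archive exposes it when a column is read
back, so nothing extra has to be stored. A minimal sketch; the column family,
keys and exact method names (getClock() here) are illustrative and vary
between Hector versions:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
import me.prettyprint.hector.api.query.ColumnQuery;

public class ChatTimestamp {
    // Writes one chat message, then reads it back with its write timestamp.
    public static void demo(Keyspace keyspace) {
        // Row key = chatroom, column name = message id, value = message body.
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        m.insert("room-general", "messages",
                HFactory.createStringColumn("msg-0001", "hello world"));

        // Read the column back; getClock() returns the write timestamp
        // (microseconds since the epoch, by convention).
        ColumnQuery<String, String, String> q =
                HFactory.createStringColumnQuery(keyspace);
        q.setColumnFamily("messages").setKey("room-general").setName("msg-0001");
        HColumn<String, String> col = q.execute().get();
        if (col != null) {
            System.out.println(col.getValue() + " written at " + col.getClock());
        }
    }
}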

Re: Subscribe

2011-02-15 Thread Victor Kabdebon
Looks like your wish has been granted.

2011/2/15 Chris Goffinet 

> I would like to subscribe to your newsletter.
>
> On Tue, Feb 15, 2011 at 8:04 AM, A J  wrote:
>
>>
>>
>


Re: Cassandra memory consumption

2011-02-16 Thread Victor Kabdebon
Yes, I didn't see there were two different parameters. I was personally setting
MemtableThroughputInMB (in Cassandra 0.6.6), but I don't know what
BinaryMemtableThroughputInMB is.

And I take this opportunity to ask a question :
If you have a small amount of data per key, so that your memtable is maybe a
few KB big, is the memory footprint of the memtable going to be
MemtableThroughputInMB MB, or a few KB plus overhead ?

Ruslan I have seen your question in the other mail and I have the same
problem. How many CF do you have ?



2011/2/16 ruslan usifov 

>
> Each of your 21 column families will have its own memtable; if you have
>> the default memtable settings, your memory usage will grow quite large
>> over time. Have you tuned down your memtable size?
>>
>
> Which config parameter makes this? binary_memtable_throughput_in_mb?
>


Re: Cassandra memory consumption

2011-02-16 Thread Victor Kabdebon
Someone please correct me if I am wrong, but I think the overhead you can
expect is something like :

16 * MemtableThroughputInMB
but I don't know when BinaryMemtableThroughputInMB comes into account..

2011/2/16 ruslan usifov 

>
>
> 2011/2/16 Victor Kabdebon 
>
>
>>
>> Ruslan I have seen your question in the other mail and I have the same
>> problem. How many CF do you have ?
>>
>>
>> 16
>


Re: Cassandra memory consumption

2011-02-16 Thread Victor Kabdebon
Thanks Robert, and do you know if there is a way to control the maximum
likely number of memtables? (I'd like to cap it at 2.)

2011/2/16 Robert Coli 

> On Wed, Feb 16, 2011 at 7:12 AM, Victor Kabdebon
>  wrote:
> > Someone please correct me if I am wrong, but I think the overhead you can
> > expect is something like :
> >
>
> MemtableThroughputInMB * <JavaOverHeadFudgeFactor> * <maximum likely
> number of such memtables which might exist at once, due to flushing
> logic>
>
> JavaOverHeadFudgeFactor is "at least 2".
>
> The maximum likely number of such memtables is usually roughly "3"
> when considered across an assortment of columnfamilies with different
> write patterns.
>
> > but I don't know when BinaryMemtableThroughputInMB comes into account..
>
> BinaryMemTable options are only considered when using the Binary
> Memtable interface. If you don't know what that is, you're not using
> it.
>
> =Rob
>

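Plugging numbers into that formula explains the consumption reported earlier
in this thread: with the 0.6 default of MemtableThroughputInMB = 64 (check
your storage-conf.xml), one actively written column family can account for
roughly 64 MB * 2 * 3 = 384 MB of heap in the worst case, and the bound sums
across column families. With 21 column families left at the defaults, a 256M
heap has no chance; dropping the threshold to, say, 8 MB caps each family at
roughly 8 * 2 * 3 = 48 MB instead.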

Re: memory consumption

2011-02-17 Thread Victor Kabdebon
Is it possible to change the maximum JVM heap memory use in 0.6.X ?

2011/2/17 Aaron Morton 

> What are you using for disk_access_mode ?
> Have you tried reducing the JVM heap size?
> Have you added the Jna.jar file to lib/ ? This will allow Cassandra to lock
> the JVM memory.
>
>
> Aaron
>
>
> On 17/02/2011, at 9:20 PM, ruslan usifov  wrote:
>
>
>
> 2011/2/16 Aaron Morton < aa...@thelastpickle.com>
>
>> JVM heap memory is controlled by the settings in conf/Cassandra-env.sh
>>
>> Memory mapped files will use additional virtual memory, is controlled in
>> conf/Cassandra.yaml disk_access_mode
>>
>>
> And??? The JVM memory heap in Cassandra 0.7 is by default half of system
> memory, in my case 4GB; here is a part of cassandra-env.sh:
>
> calculate_heap_size()
> {
> case "`uname`" in
> Linux)
> system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
> MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M
> return 0
> ;;
> FreeBSD)
> system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'`
> MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M
> return 0
> ;;
> *)
> MAX_HEAP_SIZE=1024M
> return 1
> ;;
> esac
> }
>
>
>
> I left all these options at their defaults. All my nodes have 8GB of memory.
> And I am afraid that after some time all my nodes go into hard swap, and only
> a reboot helps them :-(((
>
> PS: do I understand correctly that occasional downtime of Cassandra is normal?
>
>

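Worked through for the 8GB nodes described above: free -m reports a bit under
8192 (say 7985), so calculate_heap_size() yields MAX_HEAP_SIZE = 7985 / 2 =
3992M, i.e. roughly a 4 GB heap. The other half of RAM has to cover the
permanent generation, thread stacks, any mmapped SSTables and the OS page
cache; swapping starts when those are squeezed, which is why the usual advice
is to disable swap or let Cassandra lock its memory via JNA.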

Re: memory consumption

2011-02-17 Thread Victor Kabdebon
Oh right, but Cassandra doesn't really respect that; I thought there was
another option to set it.

Just for your information, I set Xms and Xmx very low with a small amount of
data. I am waiting to be able to connect JConsole; I don't know why it is
not reachable at the moment. Here is my result :


105  26115  0.2 27.3 1125328 755316 ?  Sl   Feb09  23:58
/usr/bin/java -ea -Xms64M -Xmx128M

2011/2/17 Aaron Morton 

> bin/cassandra.in.sh
> set Xms and Xmx in the JVM_OPTS
>
> Aaron
>
>
> On 18 Feb, 2011,at 09:10 AM, Victor Kabdebon 
> wrote:
>
> Is it possible to change the maximum JVM heap memory use in 0.6.X ?
>
> 2011/2/17 Aaron Morton 
>
>> What are you using for disk_access_mode ?
>> Have you tried reducing the JVM heap size?
>> Have you added the Jna.jar file to lib/ ? This will allow Cassandra to
>> lock the JVM memory.
>>
>>
>> Aaron
>>
>>
>>
>> On 17/02/2011, at 9:20 PM, ruslan usifov  wrote:
>>
>>
>>
>>
>>
>> 2011/2/16 Aaron Morton < aa...@thelastpickle.com
>> >
>>
>>> JVM heap memory is controlled by the settings in conf/Cassandra-env.sh
>>>
>>> Memory mapped files will use additional virtual memory, is controlled in
>>> conf/Cassandra.yaml disk_access_mode
>>>
>>>
>> And??? JVM memory heap in cassandra 0.7 is by default half of memory is
>> system in my case 4GB, here is a part of cassandra-env.sh:
>>
>> calculate_heap_size()
>> {
>> case "`uname`" in
>> Linux)
>> system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
>> MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M
>> return 0
>> ;;
>> FreeBSD)
>> system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'`
>> MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M
>> return 0
>> ;;
>> *)
>> MAX_HEAP_SIZE=1024M
>> return 1
>> ;;
>> esac
>> }
>>
>>
>>
>> I set all this options by default. All my nodes have 8GB of memory. And i
>> affraid that after some time all my nodes goes to hard swap, and only reboot
>> help them :-(((
>>
>> PS: as i understand that down sometime of cassandra is normal?
>>
>>
>


Re: memory consumption

2011-02-17 Thread Victor Kabdebon
Sorry, I forgot to say that this is the partial output of :
ps aux | grep cassandra

Best regards

2011/2/17 Victor Kabdebon 

> Oh right but Cassandra doesn't really respect that, I thought there was
> another option to set that.
>
> Just for your information, I set xms and xmx very low with a small amount
> of data. I am waiting to be able to connect jconsole, I don't know why it is
> not reachable at the moment. Here is my result :
>
>
> 105  26115  0.2 27.3 1125328 755316 ?  Sl   Feb09  23:58
> /usr/bin/java -ea -Xms64M -Xmx128M
>
>
> 2011/2/17 Aaron Morton 
>
>> bin/cassandra.in.sh
>> set Xms and Xmx in the JVM_OPTS
>>
>> Aaron
>>
>>
>> On 18 Feb, 2011,at 09:10 AM, Victor Kabdebon 
>> wrote:
>>
>> Is it possible to change the maximum JVM heap memory use in 0.6.X ?
>>
>> 2011/2/17 Aaron Morton 
>>
>>> What are you using for disk_access_mode ?
>>> Have you tried reducing the JVM heap size?
>>> Have you added the Jna.jar file to lib/ ? This will allow Cassandra to
>>> lock the JVM memory.
>>>
>>>
>>> Aaron
>>>
>>>
>>>
>>> On 17/02/2011, at 9:20 PM, ruslan usifov 
>>> wrote:
>>>
>>>
>>>
>>>
>>>
>>> 2011/2/16 Aaron Morton < 
>>> aa...@thelastpickle.com>
>>>
>>>> JVM heap memory is controlled by the settings in conf/Cassandra-env.sh
>>>>
>>>> Memory mapped files will use additional virtual memory, is controlled in
>>>> conf/Cassandra.yaml disk_access_mode
>>>>
>>>>
>>> And??? JVM memory heap in cassandra 0.7 is by default half of memory is
>>> system in my case 4GB, here is a part of cassandra-env.sh:
>>>
>>> calculate_heap_size()
>>> {
>>> case "`uname`" in
>>> Linux)
>>> system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
>>> MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M
>>> return 0
>>> ;;
>>> FreeBSD)
>>> system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'`
>>> MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M
>>> return 0
>>> ;;
>>> *)
>>> MAX_HEAP_SIZE=1024M
>>> return 1
>>> ;;
>>> esac
>>> }
>>>
>>>
>>>
>>> I set all this options by default. All my nodes have 8GB of memory. And i
>>> affraid that after some time all my nodes goes to hard swap, and only reboot
>>> help them :-(((
>>>
>>> PS: as i understand that down sometime of cassandra is normal?
>>>
>>>
>>
>


Re: memory consumption

2011-02-17 Thread Victor Kabdebon
Already done. The disk access mode is standard in storage-conf.xml (I am
using 0.6.6 at the moment, I will upgrade to 0.7.x later). But this memory
consumption is a real issue.

2011/2/17 Aaron Morton 

> Looks like you are using virtual memory for memmapped files. Change the
> disk_access_mode to standard if you want to reduce the overall memory
> usage.
>
> Aaron
>
> On 18 Feb, 2011,at 09:34 AM, Victor Kabdebon 
> wrote:
>
> Sorry I forgot to say that this is the partial result of :
> ps aux | grep cassandra
>
> Best regards
>
> 2011/2/17 Victor Kabdebon 
>
>> Oh right but Cassandra doesn't really respect that, I thought there was
>> another option to set that.
>>
>> Just for your information, I set xms and xmx very low with a small amount
>> of data. I am waiting to be able to connect jconsole, I don't know why it is
>> not reachable at the moment. Here is my result :
>>
>>
>> 105  26115  0.2 27.3 1125328 755316 ?  Sl   Feb09  23:58
>> /usr/bin/java -ea -Xms64M -Xmx128M
>>
>>
>>
>> 2011/2/17 Aaron Morton 
>>
>>> bin/cassandra.in.sh
>>> set Xms and Xmx in the JVM_OPTS
>>>
>>> Aaron
>>>
>>>
>>>
>>> On 18 Feb, 2011,at 09:10 AM, Victor Kabdebon 
>>> wrote:
>>>
>>>
>>> Is it possible to change the maximum JVM heap memory use in 0.6.X ?
>>>
>>> 2011/2/17 Aaron Morton 
>>>
>>>> What are you using for disk_access_mode ?
>>>> Have you tried reducing the JVM heap size?
>>>> Have you added the Jna.jar file to lib/ ? This will allow Cassandra to
>>>> lock the JVM memory.
>>>>
>>>>
>>>> Aaron
>>>>
>>>>
>>>>
>>>> On 17/02/2011, at 9:20 PM, ruslan usifov 
>>>> >
>>>> wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> 2011/2/16 Aaron Morton < 
>>>> aa...@thelastpickle.com>
>>>>
>>>>> JVM heap memory is controlled by the settings in conf/Cassandra-env.sh
>>>>>
>>>>> Memory mapped files will use additional virtual memory, is controlled
>>>>> in conf/Cassandra.yaml disk_access_mode
>>>>>
>>>>>
>>>> And??? JVM memory heap in cassandra 0.7 is by default half of memory is
>>>> system in my case 4GB, here is a part of cassandra-env.sh:
>>>>
>>>> calculate_heap_size()
>>>> {
>>>> case "`uname`" in
>>>> Linux)
>>>> system_memory_in_mb=`free -m | awk '/Mem:/ {print $2}'`
>>>> MAX_HEAP_SIZE=$((system_memory_in_mb / 2))M
>>>> return 0
>>>> ;;
>>>> FreeBSD)
>>>> system_memory_in_bytes=`sysctl hw.physmem | awk '{print $2}'`
>>>> MAX_HEAP_SIZE=$((system_memory_in_bytes / 1024 / 1024 / 2))M
>>>> return 0
>>>> ;;
>>>> *)
>>>> MAX_HEAP_SIZE=1024M
>>>> return 1
>>>> ;;
>>>> esac
>>>> }
>>>>
>>>>
>>>>
>>>> I set all this options by default. All my nodes have 8GB of memory. And
>>>> i affraid that after some time all my nodes goes to hard swap, and only
>>>> reboot help them :-(((
>>>>
>>>> PS: as i understand that down sometime of cassandra is normal?
>>>>
>>>>
>>>
>>
>

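For anyone following along on 0.6: the equivalent knob there is the
<DiskAccessMode> element in conf/storage-conf.xml (auto, mmap,
mmap_index_only or standard). Memory-mapped SSTables inflate the VSZ column
of ps but live in evictable page cache, so they matter much less than growth
in the resident column, which is what is being chased in this thread.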

Re: Abnormal memory consumption

2011-04-04 Thread Victor Kabdebon
And for production, is 7 GB of RAM sufficient? Or is 11 GB the minimum?
Thank you for your input on the JVM; I'll try to tune that.


2011/4/4 Peter Schuller 

> > You can change VM settings and tweak things like memtable thresholds
> > and in-memory compaction limits to get it down and get away with a
> > smaller heap size, but honestly I don't recommend doing so unless
> > you're willing to spend some time getting that right and probably
> > repeating some of the work in the future with future versions of
> > Cassandra.
>
> That said, if you do want to do so to give it a try, I suggest (1)
> changing cassandra-env to remove all the GC stuff:
>
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>
> And then setting a fixed heap size, and removing the manual fixing of the
> new generation size:
>
> JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"
>
> Then maybe remove the initial heap size enforcement, but that might
> not help depending:
>
> JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
>
> And then go through cassandra.yaml and tune down all the various
> limitations. Less concurrent readers/writers, all the *_mb_* settings
> way down, and the RPC framing limitations.
>
> But let me re-iterate: I don't recommend running in any such
> configuration in production. But if you just want it running for
> testing/for just being available, with no special requirements, and
> not in production, the above might work. I haven't really tested it
> myself; there may be gotchas involved.
>
> --
> / Peter Schuller
>

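For concreteness, the cassandra.yaml settings Peter is pointing at in a
0.7/0.8-era config include (names from memory, so verify against your
version): concurrent_reads and concurrent_writes, memtable_throughput_in_mb,
in_memory_compaction_limit_in_mb, sliced_buffer_size_in_kb, and the RPC
framing limits thrift_framed_transport_size_in_mb and
thrift_max_message_length_in_mb.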

Re: database design

2011-04-13 Thread Victor Kabdebon
Dear Jean-Yves,

You can take a different approach to the problem.
You need on one side a relational database (MySQL, PostGreSQL) or SolR (as
a very efficient index) and on the other side Cassandra. The relational
database or SolR must contain the minimum amount of information possible : a
date and only the relevant data. It enabled me to keep a simple model for
Cassandra.
Cassandra acts as a "vault" where you keep all the data, and you then
dispatch the data from Cassandra to the relational database or SolR. When
you want to query, you query SolR or the relational database for the key /
column / supercolumn, and then retrieve the complete data from Cassandra. The
hard thing is to maintain coherence between the query part and the
Cassandra part.
I speak from personal experience: it was very hard for me to use only
Cassandra to do everything my (small amateur) website needed. Now I have found
an alternative I use : Cassandra (data vault) + Redis (sessions and other
volatile data) + SolR (search engine) + PostGreSQL (for relational
queries).

Best regards,
Victor Kabdebon
http://www.voxnucleus.fr

2011/4/13 Edward Capriolo 

> On Wed, Apr 13, 2011 at 10:39 AM, Jean-Yves LEBLEU 
> wrote:
> > Hi all,
> >
> > Just some thoughts and question I have about cassandra data modeling.
> >
> > If I understand well, cassandra is better on writing than on reading.
> > So you have to think about your queries to design cassandra schema. We
> > are doing incremental design, and already have our system in
> > production and we have to develop new queries.
> > How do you usually proceed when you have new queries, do you write a
> > specific job to update data in the database to match the new query you
> > are writing ?
> >
> > Thanks for your help.
> >
> > Jean-Yves
> >
>
> Good point. Generally you will need to write some type of range
> scanning / map-reduce application to process and backfill your data.
>

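A minimal sketch of the dispatch pattern described above: Cassandra gets the
complete record, the index gets only the queryable fields, and a query answers
with keys that are then resolved against Cassandra. It uses the Hector client;
the Indexer interface below is hypothetical, standing in for whatever SolR or
SQL client is used, and the error handling that keeps the two stores coherent
(the hard part) is omitted:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class VaultWriter {
    /** Stand-in for a SolR or relational-database client (hypothetical). */
    public interface Indexer {
        void index(String key, String date, String relevantField);
    }

    private final Keyspace keyspace;
    private final Indexer indexer;

    public VaultWriter(Keyspace keyspace, Indexer indexer) {
        this.keyspace = keyspace;
        this.indexer = indexer;
    }

    public void store(String key, String date, String relevantField, String fullRecord) {
        // 1. Cassandra is the vault: it always holds the complete record.
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        m.insert(key, "vault", HFactory.createStringColumn("data", fullRecord));

        // 2. The index holds only what queries need; a hit returns the key,
        //    which is then used to fetch the full record from the vault.
        indexer.index(key, date, relevantField);
    }
}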

Re: CQL v1.0.0: why super column family not described in it?

2011-05-05 Thread Victor Kabdebon
Hello Eric,

Compound columns seem to be a very interesting feature. Do you have any idea
in which Cassandra version it is going to be introduced : 0.8.X or 0.9.X ?

Thanks,

Victor

2011/5/5 Eric Evans 

> On Thu, 2011-05-05 at 18:19 +0800, Guofeng Zhang wrote:
> > I read the CQL v1.0 document. There are operations about column
> > families, but it does not describe how to operate on super column
> > families. Why? Does this mean that super column families would not be
> > supported by CQL in this version? Will it be supported in the future?
>
> No, CQL will never support super columns, but later versions (not 1.0.0)
> will support compound columns.  Compound columns are better; instead of
> a two-deep structure, you can have one of arbitrary depth.
>
> What you see is what you get for 1.0.0, there simply wasn't enough time
> to do everything (you have to start somewhere).
>
> --
> Eric Evans
> eev...@rackspace.com
>
>

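To illustrate what compound columns replace: a super column layout such as
events[row][day][reading-id] = value flattens into a plain row whose column
names are (day, reading-id) pairs; the composite comparator sorts on the
first component, then the second, and a third component can be added later,
which is the arbitrary depth mentioned above.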

Re: CQL v1.0.0: why super column family not described in it?

2011-05-05 Thread Victor Kabdebon
Thank you, I will look into that and I will probably wait until there is an
"out of the box" comparator. But it's an excellent new feature !

Regards,
Victor K.

2011/5/5 Eric Evans 

> On Thu, 2011-05-05 at 10:49 -0400, Victor Kabdebon wrote:
> > Hello Eric,
> >
> > Compound columns seem to be a very interesting feature. Do you have any
> idea
> > in which Cassandra version it is going to be introduced : 0.8.X or 0.9.X
> ?
>
> You can use these today with a custom comparator[1].  There is an open
> issue[2] (marked as for-0.8.1) to ship one out-of-the-box.
>
> Language support[3] for CQL will probably take a bit longer.
>
> [1]: https://github.com/edanuff/CassandraCompositeType
> [2]: https://issues.apache.org/jira/browse/CASSANDRA-2231
> [3]: https://issues.apache.org/jira/browse/CASSANDRA-2474
>
> --
> Eric Evans
> eev...@rackspace.com
>
>