[ANN] Mojo's Cassandra Maven Plugin 0.7.0-1 released

2011-02-12 Thread Stephen Connolly
Hi,

The Mojo team is pleased to announce the release of Mojo's Cassandra
Maven Plugin version 0.7.0-1.

Mojo's Cassandra Plugin is used when you want to install and control a
test instance of Apache Cassandra from within your Apache Maven build.

The plugin has the following goals.

* cassandra:start Starts up a test instance of Cassandra in the background.
* cassandra:stop Stops the test instance of Cassandra that was started
using cassandra:start.
* cassandra:run Starts up a test instance of Cassandra in the foreground.
* cassandra:load Runs a cassandra-cli script against the test instance
of Cassandra.
* cassandra:repair Runs nodetool repair against the test instance of Cassandra.
* cassandra:flush Runs nodetool flush against the test instance of Cassandra.
* cassandra:compact Runs nodetool compact against the test instance of
Cassandra.
* cassandra:cleanup Runs nodetool cleanup against the test instance of
Cassandra.
* cassandra:delete Deletes the test instance of Cassandra.

http://mojo.codehaus.org/cassandra-maven-plugin/

To use this version, simply specify the version in your project's
plugin configuration:


<plugin>
   <groupId>org.codehaus.mojo</groupId>
   <artifactId>cassandra-maven-plugin</artifactId>
   <version>0.7.0-1</version>
</plugin>


Release Notes

This is the first release of Mojo's Cassandra Maven Plugin.

Enjoy,

The Mojo team.

Stephen Connolly

Apache, Apache Maven, Apache Cassandra, Maven and Cassandra are
trademarks of The Apache Software Foundation.


per-connection "read-after-my-write" consistency

2011-02-12 Thread Michal Augustýn
Hi,

I'm running 2 nodes with RF=2 (not optimal, I know), Cassandra 0.7.1.

During one connection, I write a row (CL.ONE) and then read the same row
(CL.ONE) via Thrift.
I assumed that if I write a row to one node, I can immediately read that
row back from the same node.
That holds in most cases, but roughly 1 in 1000 attempts doesn't
work as expected: I get no row :(

Where is the problem? Should I use a different CL for reads and/or writes?
I would just like to achieve "per connection read-after-my-write
consistency".

Thank you very much!

Augi


Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-12 Thread Aditya Narayan
What if the caching requirements and sorting needs of the two kinds of data
are very similar? Is it preferable to go with a single CF in
that case?


Regards
Aditya

>>> On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs  wrote:
>>
>> I read somewhere that more no of column families is not a good idea as
>> it consumes more memory and more compactions to occur
>
> This is primarily true, but not in every case.
>
>> But the caching requirements may be different as they cater to two
>> different features.
>
> This is a great reason to *not* merge them.  Besides the key and row
> caches,
> don't forget about the OS buffer cache.
>
>> Is it recommended to merge these two column families into one ??
>> Thoughts
>> ?
>
> No, this sounds like an anti-pattern to me.  The overhead from having
> two
> separate CFs is not that high.
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library


How can I implement text based searching for the data/entities/items stored in Cassandra ?

2011-02-12 Thread Aklin_81
I would like to offer text search over some of the entities/items stored in
the database through an AJAX-powered application, so that as the user
starts typing he gets hints and suggested items. In SQL databases this is
implemented using LIKE. Is it possible to implement this somehow
in an application powered by Cassandra?

How should I go about implementing this feature? It is very much required for my case.

Would I have to use a MySQL DB for this particular
feature and keep the rest in Cassandra?


Thanks
-Asil
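
One common way to get LIKE 'abc%'-style suggestions from a store that keeps names in sorted order is a prefix range scan. The sketch below is plain in-memory Python, not a Cassandra client call; the `suggest` function and the sample names are hypothetical. It only assumes that item names are available in sorted order, much as column names are kept sorted within a row.

```python
import bisect

def suggest(sorted_names, prefix, limit=10):
    """Return up to `limit` names that start with `prefix`.

    `sorted_names` must already be sorted. The scan starts at the first
    name >= prefix and stops at the first name that no longer matches.
    """
    start = bisect.bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix) or len(matches) >= limit:
            break
        matches.append(name)
    return matches

# Hypothetical sample data, sorted the way a comparator would sort it.
names = sorted(["cassandra", "cassette", "castle", "maven", "mojo"])
print(suggest(names, "cas"))  # ['cassandra', 'cassette', 'castle']
```

This only covers prefix matches ("starts with"). Matching in the middle of a string, or full-text search, needs a different structure such as an inverted index.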


RE: How can I implement text based searching for the data/entities/items stored in Cassandra ?

2011-02-12 Thread Vivek Mishra
You can use Kundera to search text:
http://code.google.com/p/kundera/
It provides a way to search by any key over Cassandra. As far as I know, nothing
built in exists for this.

Vivek


From: rajkumar@gmail.com [rajkumar@gmail.com] on behalf of Aklin_81 
[asdk...@gmail.com]
Sent: 12 February 2011 17:27
To: user
Subject: How can I implement text based searching for the data/entities/items 
stored in Cassandra ?

I would like to text search for some of Entities/items stored in the
database through an AJAX powered application...Such that the user
starts typing and he can get the hints & suggested items. This is
implemented in SQL databases using the LIKE, is it possible to anyhow
implement this in an application powered by cassandra ?

How do I go forward to implement this feature, very much required for my case?

Would I have to consider a MySQL DB for implementing this particular
feature there, and rest in Cassandra ?


Thanks
-Asil



Impetus to Present Big Data -- Analytics Solutions and Strategies at TDWI World 
Conference (Feb 13-18) in Las Vegas.We are also bringing cloud experts together 
at CloudCamp, Delhi on Feb 12. CloudCamp is an unconference where early 
adopters of Cloud Computing technologies exchange ideas.

Click http://www.impetus.com to know more.


NOTE: This message may contain information that is confidential, proprietary, 
privileged or otherwise protected by law. The message is intended solely for 
the named addressee. If received in error, please destroy and notify the 
sender. Any use of this email is prohibited when received in error. Impetus 
does not represent, warrant and/or guarantee, that the integrity of this 
communication has been maintained nor that the communication is free of errors, 
virus, interception or interference.


RE: How can I implement text based searching for the data/entities/items stored in Cassandra ?

2011-02-12 Thread Vivek Mishra
Additionally, you can use Cassandra indexes for specific searches.


From: Vivek Mishra [vivek.mis...@impetus.co.in]
Sent: 12 February 2011 17:38
To: user@cassandra.apache.org
Subject: RE: How can I implement text based searching for the 
data/entities/items stored in Cassandra ?

You can use:
http://code.google.com/p/kundera/ to search text.
it provides a way to search by any key over Cassandra. I guess, nothing inbuilt 
is in place for this.

Vivek


From: rajkumar@gmail.com [rajkumar@gmail.com] on behalf of Aklin_81 
[asdk...@gmail.com]
Sent: 12 February 2011 17:27
To: user
Subject: How can I implement text based searching for the data/entities/items 
stored in Cassandra ?

I would like to text search for some of Entities/items stored in the
database through an AJAX powered application...Such that the user
starts typing and he can get the hints & suggested items. This is
implemented in SQL databases using the LIKE, is it possible to anyhow
implement this in an application powered by cassandra ?

How do I go forward to implement this feature, very much required for my case?

Would I have to consider a MySQL DB for implementing this particular
feature there, and rest in Cassandra ?


Thanks
-Asil





Re: Cassandra 0.7.0 Release in Riptano public repository?

2011-02-12 Thread Rauan Maemirov
When I try `rpm -i riptano-release-5-1.el6.noarch.rpm`, it just freezes. Does
the repository work?

2011/1/12 Michael Fortin 

> Thanks for your thoughtful and detailed replies Eric, it's much
> appreciated.
>
> Mike
>
> On Jan 11, 2011, at 11:23 AM, Eric Evans wrote:
>
> > On Tue, 2011-01-11 at 09:23 -0500, Michael Fortin wrote:
> >> This my understanding of 0.* releases.
> >> - They're not considered production ready by the maintainers
> >> - They subject to changes that break backwards compatibility
> >> - Generally poorly documented because the api is so volatile
> >> - Previous releases are unsupported
> >>
> >> for 1.* releases
> >> - The maintainer is saying this is tested and production ready,
> >> sometimes also marked as Final for GA
> >> - Minor releases do not break backward compatibility
> >> - The major and minor release have some level of support, with open
> >> source, that usually means docs and mailing lists but they should be
> >> very active.
> >> - thoroughly documented
> >
> > FWIW, your interpretation of what it means to be 1.0, is not wholly
> > unique, but it's far from universal either.
> >
> >> Sorting through the issue tracker is a little to fine grained to get a
> >> big picture view of where cassandra is going.
> >
> > Sorry, I should have been more clear here.
> >
> > The closest we have to a roadmap are the tickets that are marked as
> > blocking the next release, you shouldn't have to do any digging, they're
> > all available in one view here:
> >
> >
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12314820
> >
> > But, it's pretty fluid for the first few months after a new release.
> >
> >> And, just to be clear, I'm not questioning the maintainers approach,
> >> just humbling asking for a little more clarification.  Cassandra is
> >> awesome, and I'm itching to use it on some production projects where I
> >> think it would be a great fit, but 0.* designation scares me a little.
> >> Of course, a hastily released 1.* would be worse.
> >
> > I understand, but what I'm saying is a "1.0" release in this context
> > carries special significance that just doesn't map well to open source
> > projects.  And, in addition to being subjective, your criteria differs
> > from that of many people.  It might make things easier to just version
> > some future release 1.0 and be done with it, but I'd rather be honest
> > with you.
> >
> > This is honest:
> >
> > * We treated the Google code dump in 2008 as 0.1.0 (though no formal
> > release was made).
> > * We likewise treated the Apache code dump in 2009 as 0.2.0 (again, no
> > formal release).
> > * We called the first release under the Apache Incubator 0.3.0.
> > * We just now released 0.7.0.
> > * We maintain backward compatibility between the "minor" and "revision",
> > that is 0.6.1, 0.6.2, 0.6.3, etc.
> >
> > This is why I said my preference would be to just drop the leading 0.
> > We've been using the minor like a major, and the revision like a minor,
> > (and we haven't had need for a revision).  We've had 7 major releases,
> > (5 if you only want to count what's happened under Apache).
> >
> > Also:
> >
> > * Most of the "maintainers" would tell you that it is production-ready,
> > but then, they might be biased since most of them are running it in
> > production. YMMV.
> > * It is as poorly documented as most FLOSS projects.
> > * We provide support through the issue tracker, mailing lists, and IRC,
> > and you can purchase support contracts through Riptano.
> >
> >
> > --
> > Eric Evans
> > eev...@rackspace.com
> >
>
>


Cassandra concept

2011-02-12 Thread Filip Nguyen
Hey, I have just downloaded Cassandra and am playing with it. It's a fairly new
thing for me (I have always used relational DBs). I've read most of the material
on the wiki (including what it references) and I am puzzled why there is a maximum
of 3 dimensions (super column family - super column - columns). Who came up with 3?
Why not 4 or 5 or N dimensions?


Filip


Re: Cassandra concept

2011-02-12 Thread Sasha Dolgy
Probably best to read the mail archives. There has been a good thread in the
past month about supercolumns...

On Sat, Feb 12, 2011 at 1:40 PM, Filip Nguyen wrote:

> Hey i have just downloaded cassandra and playing with it. It's kinda new
> thing for me (i always used relation dbs). I've read most of the material on
> wiki (including referenced) and I am puzzled why there is maximum of 3
> dimensions (super column family - super column - columns)? who came up with
> 3? Why not 4 or 5 or N dimensions?
>
> Filip
>


Re: Indexes and hard disk

2011-02-12 Thread Bill de hÓra
On Fri, 2011-02-11 at 23:24 -0800, mcasandra wrote:

> Are indexes supported in Cassandra? If yes then what kind? Also, if it's
> supported then please point me to the place that give more information about
> it.



Secondary Indexes:





> Are there any kind of hard disk in particular recommended by Cassandra? We
> generally get only 500GB hard disks on our virtual machines. But I read that
> it's better to have 2TB HDD.


Hardware:






There's no replacement for understanding your application's workload and
the properties of the underlying hardware/vm and jvm you're being
supplied with (assuming you have a say in hardware selection). Cassandra
can run on pretty much anything, but the anything you get might not be
optimal for your app, which means you will have to tune the App and
Cassandra, or get different kit.

Bill


Re: Indexes and hard disk

2011-02-12 Thread Filip Nguyen


On 12.2.2011 13:56, Bill de hÓra wrote:

On Fri, 2011-02-11 at 23:24 -0800, mcasandra wrote:

Are indexes supported in Cassandra? If yes then what kind? Also, if it's
supported then please point me to the place that give more information about
it.

Secondary Indexes:




Why are secondary indexes even present in Cassandra? I thought the point is
that development in Cassandra is query driven: when you want to search and
fetch by, for example, birth date, you should create a new ColumnFamily...




Re: Cassandra documentation

2011-02-12 Thread Bill de hÓra
On Fri, 2011-02-11 at 10:06 +1300, Aaron Morton wrote:
> There is also a book http://oreilly.com/catalog/0636920010852
> 
> I've not read it yet, so cannot comment on it's quality.

Good book, worth getting. Cassandra is a moving target so some detailed
info is already out of date, eg references to Avro and the code samples,
but the overall gist of things is fine. I hope O'Reilly will do a 2nd ed
for 0.8 this year.

A High Performance Cassandra a la Zawodny's MySQL book would be a no
brainer. Almost all the problems I see on user@ are
operational/perf/tuning related. 

Bill



Is Avro still supported?

2011-02-12 Thread Joshua Partogi
Hi,

I saw that in the latest source in trunk, the Avro code has been deleted. Does
this mean Avro is not supported anymore? If so, what was the reasoning
behind dropping support for Avro?

Thanks

-- 
http://twitter.com/jpartogi


creating secondary index after data exists

2011-02-12 Thread Sasha Dolgy
hi there,

was playing around today and had a simple CF with sample user data with the
following columns:

| email | name | nickname |


From the cassandra-cli I ran the following:

 update column family user with column_metadata=[
   {column_name: nickname, validation_class: UTF8Type, index_type: KEYS},
   {column_name: email, validation_class: UTF8Type, index_type: KEYS}];


Subsequently, when I tried with an index clause to retrieve the rows that
matched my email address (only 1) it returned no results.  It was only after
I performed the following from the cassandra-cli:

set user ['some-random-key']['email'] = 'my@email.address';


That it worked.  Is this normal behaviour?  In that, I should clearly define
which columns I want indexed prior to data population?

Thanks in advance,
-sd

-- 
Sasha Dolgy
sasha.do...@gmail.com


Re: Indexes and hard disk

2011-02-12 Thread Javier Canillas
As for indexes, Cassandra supports indexes on columns (there is a
property, some sort of "type", that can have the value "KEYS").
Start your search in that direction. This is a feature introduced in 0.7
that you use when you create column families dynamically and declare
their columns.

On the other hand, if your application will use Cassandra heavily, I
recommend calculating the data volume it will handle and
using a partition that is 3 to 5 times bigger. That way you
will be able to play around with the compaction strategy to avoid
latency when your application is under heavy load.
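
The sizing rule above is simple arithmetic; here is a small sketch of it. The 150 GB figure is a made-up example, and `partition_size_gb` is a hypothetical helper, not part of any Cassandra tooling. The headroom matters because compaction rewrites data files and temporarily needs free space alongside the existing data.

```python
def partition_size_gb(expected_data_gb, headroom_factor):
    """Disk partition size for a given data volume and headroom factor."""
    if expected_data_gb < 0 or headroom_factor < 1:
        raise ValueError("data volume must be >= 0 and factor must be >= 1")
    return expected_data_gb * headroom_factor

# Hypothetical node expected to hold 150 GB of data.
low = partition_size_gb(150, 3)   # 450
high = partition_size_gb(150, 5)  # 750
print(f"plan for a partition of {low} to {high} GB")
```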

Hope this information is useful to you

Regards,

Javier canillas

On 12/02/2011, at 04:24, mcasandra  wrote:

>
> Are indexes supported in Cassandra? If yes then what kind? Also, if it's
> supported then please point me to the place that give more information about
> it.
>
> Are there any kind of hard disk in particular recommended by Cassandra? We
> generally get only 500GB hard disks on our virtual machines. But I read that
> it's better to have 2TB HDD.
> --
> View this message in context: 
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Indexes-and-hard-disk-tp6018234p6018234.html
> Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
> Nabble.com.


Re: Indexes and hard disk

2011-02-12 Thread Bill de hÓra
On Sat, 2011-02-12 at 14:03 +0100, Filip Nguyen wrote:


> Why the secondary indexes are even present in Cassandra? I thought the
> point is that development in Cassandra is query driven, that when you
> want to search and fetch for example by birth date you should create
> new ColumnFamilly...



Yes and no. Systems like Cassandra are designed such that you should
write the data out as you want to read it in (because writes are cheap).
However most systems will want to access data via a few other criteria.
For example, a blogging system that supports tags will need to list your
blog entries by date and by tag equally efficiently. As you say, you
can spin up a new ColumnFamily for that, but it's such a common need
that Cassandra 0.7 supports it directly and saves developers having to
manage indexes by hand (under the hood, a 0.7 index is a 'private' CF).
This for me is one of the features that really sets Cassandra apart -
scaling and indexing data at the same time is hard, and very few systems
do both well.

Bill 
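
To make "managing indexes by hand" concrete, here is a toy in-memory sketch of the blog example: one mapping plays the role of the posts column family and a second one plays the role of a hand-maintained index keyed by tag. Plain dicts stand in for column families, and all names (`save_post`, `posts_with_tag`, the sample posts) are hypothetical.

```python
from collections import defaultdict

posts = {}                        # row key -> columns of the post
posts_by_tag = defaultdict(dict)  # tag -> {post key: ""}, the manual index

def save_post(key, title, date, tags):
    """Write the post and every index entry that points at it."""
    posts[key] = {"title": title, "date": date, "tags": list(tags)}
    for tag in tags:
        posts_by_tag[tag][key] = ""

def posts_with_tag(tag):
    """Read through the index, returning post keys ordered by date."""
    keys = posts_by_tag.get(tag, {})
    return sorted(keys, key=lambda k: posts[k]["date"])

save_post("p1", "Hello", "2011-02-10", ["cassandra"])
save_post("p2", "Indexes", "2011-02-12", ["cassandra", "indexes"])
print(posts_with_tag("cassandra"))  # ['p1', 'p2']
```

Every write path has to remember to update both structures, which is the bookkeeping a built-in index takes over.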


Re: Is Avro still supported?

2011-02-12 Thread Edward Capriolo
https://issues.apache.org/jira/browse/CASSANDRA-926

On Sat, Feb 12, 2011 at 8:27 AM, Joshua Partogi  wrote:
> Hi,
>
> I saw in the latest source in trunk, avro codes has been deleted. Does
> this mean Avro is not supported anymore? If so, what was the decision
> behind dropping the support for Avro?
>
> Thanks
>
> --
> http://twitter.com/jpartogi
>


Re: How can I implement text based searching for the data/entities/items stored in Cassandra ?

2011-02-12 Thread Shaun Cutts
There is/are lucandra/solandra: https://github.com/tjake/Lucandra
 
-- Shaun






On Feb 12, 2011, at 6:57 AM, Aklin_81 wrote:

> I would like to text search for some of Entities/items stored in the
> database through an AJAX powered application...Such that the user
> starts typing and he can get the hints & suggested items. This is
> implemented in SQL databases using the LIKE, is it possible to anyhow
> implement this in an application powered by cassandra ?
> 
> How do I go forward to implement this feature, very much required for my case?
> 
> Would I have to consider a MySQL DB for implementing this particular
> feature there, and rest in Cassandra ?
> 
> 
> Thanks
> -Asil



Re: creating secondary index after data exists

2011-02-12 Thread Jonathan Ellis
index creation isn't instant, no.

cli describe keyspace will tell you which ones are still building.
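
A toy model may help picture why a query can come back empty right after an index is defined. This is only a sketch of the general idea (existing rows are back-filled over time, new writes are indexed as they arrive), not how Cassandra implements it; `ToyIndexedCF` and its methods are hypothetical.

```python
class ToyIndexedCF:
    """In-memory stand-in for a column family with a lazily built index."""

    def __init__(self):
        self.rows = {}        # row key -> {column: value}
        self.index = {}       # (column, value) -> set of row keys
        self.indexed = set()  # columns that have an index defined
        self.pending = []     # (column, row key) pairs awaiting back-fill

    def set(self, key, column, value):
        old = self.rows.get(key, {}).get(column)
        self.rows.setdefault(key, {})[column] = value
        if column in self.indexed:
            # New writes update the index immediately.
            if old is not None:
                self.index.get((column, old), set()).discard(key)
            self.index.setdefault((column, value), set()).add(key)

    def create_index(self, column):
        # Defining the index only queues the existing rows for later.
        self.indexed.add(column)
        self.pending += [(column, k) for k, cols in self.rows.items()
                         if column in cols]

    def build_step(self):
        """Index one existing row; return True while work remains."""
        if self.pending:
            column, key = self.pending.pop()
            value = self.rows[key][column]
            self.index.setdefault((column, value), set()).add(key)
        return bool(self.pending)

    def get_where(self, column, value):
        return sorted(self.index.get((column, value), set()))

cf = ToyIndexedCF()
cf.set("k1", "email", "a@example.org")
cf.set("k2", "email", "b@example.org")
cf.create_index("email")
print(cf.get_where("email", "a@example.org"))  # [] (build not finished)
cf.set("k1", "email", "a@example.org")         # a fresh write is indexed
print(cf.get_where("email", "a@example.org"))  # ['k1']
while cf.build_step():
    pass
print(cf.get_where("email", "b@example.org"))  # ['k2']
```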

On Sat, Feb 12, 2011 at 7:40 AM, Sasha Dolgy  wrote:
> hi there,
> was playing around today and had a simple CF with sample user data with the
> following columns:
>
> | email | name | nickname |
>
> From the cassandra-cli I ran the following:
>
>  update column family user with column_metadata=[{column_name: nickname,
> validation_class: UTF8Type,index_type: KEYS},
>
> {column_name: email, validation_class: UTF8Type, index_type: KEYS}];
>
> Subsequently, when I tried with an index clause to retrieve the rows that
> matched my email address (only 1) it returned no results.  It was only after
> I performed the following from the cassandra-cli:
>
> set user ['some-random-key']['email'] = 'my@email.address';
>
> That it worked.  Is this normal behaviour?  In that, I should clearly define
> which columns I want indexed prior to data population?
> Thanks in advance,
> -sd
> --
> Sasha Dolgy
> sasha.do...@gmail.com
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: cassandra solaris x64 support

2011-02-12 Thread Sylvain Lebresne
On Sat, Feb 12, 2011 at 2:52 AM, Xiaobo Gu  wrote:

> On Fri, Feb 11, 2011 at 11:54 PM, Sylvain Lebresne 
> wrote:
> >
> >
> > On Fri, Feb 11, 2011 at 4:27 PM, Xiaobo Gu 
> wrote:
> >>
> >> On Fri, Feb 11, 2011 at 11:21 PM, Roland Gude <
> roland.g...@yoochoose.com>
> >> wrote:
> >> > This is a problem with the start scripts, not with Cassandra itself
> (or
> >> > any of its configuration)
> >> > The shell you are using cannot start the cassandra shell script.
> >> >
> >> > Try
> >> > #bash bin/cassandra -f
> >> You are right, but there are other problems. Still, you helped me make a
> >> big step; I'll dig deeper tomorrow.
> >>
> >> -bash-3.00$ /bin/bash bin/cassandra -f
> >> Error occurred during initialization of VM
> >> Could not reserve enough space for object heap
> >> -bash-3.00$
> >
> > You apparently don't have enough free memory available for the
> > default setting (1G). Uncomment and change the value of
> > MAX_HEAP_SIZE in conf/cassandra-env.sh.
> >
> >>
> >> Does the start script automatically choose the 64bit Java virtual
> >> machine, I have both 64bit and 32bit jdk installed.
> >
> > It either use $JAVA_HOME/bin/java if $JAVA_HOME is defined in
> > your environment or the value returned by `which java`.
>
> 32bit and 64bit java binaries are mixed up in one repository, there is
> a $JAVA_HOME/bin/java, and there is also a $JAVA_HOME/bin/amd64/java,
> I think the first one is 32bit and the second is 64bit, how can I
> choose the 64bit one.


Look at bin/cassandra. It's a shell script that, among other things, sets the
JAVA variable, which is the binary that gets used. A quick (though slightly
dirty) way to force the binary would be to hardcode the value in there.

Otherwise, if you unset the $JAVA_HOME variable and make sure the
64-bit binary is in your path (and the 32-bit one is not, or comes "after" it),
that will work too.
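
The lookup order described in this thread ($JAVA_HOME/bin/java if JAVA_HOME is set, otherwise whatever `which java` finds) can be sketched in Python to check which binary would be picked up. This is an illustration of that order, not a copy of bin/cassandra; `resolve_java` and its `prefer_amd64` switch are hypothetical, and the bin/amd64 layout is the one mentioned above for the Solaris JDK.

```python
import os
import shutil

def resolve_java(env=None, prefer_amd64=False):
    """Return the path of the java binary that would be used, or None."""
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    if java_home:
        if prefer_amd64:
            # Hypothetical extra step: look for the 64-bit binary first.
            candidate = os.path.join(java_home, "bin", "amd64", "java")
            if os.path.isfile(candidate):
                return candidate
        return os.path.join(java_home, "bin", "java")
    # No JAVA_HOME: fall back to a PATH search, like `which java`.
    return shutil.which("java", path=env.get("PATH"))

print(resolve_java())
print(resolve_java(prefer_amd64=True))
```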

--
Sylvain


> >>
> >> > As far as I know, it should work fine. Actually it should work with sh
> >> > as well...
> >> >
> >> >
> >> > -Original Message-
> >> > From: Xiaobo Gu [mailto:guxiaobo1...@gmail.com]
> >> > Sent: Friday, 11 February 2011 16:12
> >> > To: user@cassandra.apache.org
> >> > Subject: Re: cassandra solaris x64 support
> >> >
> >> > On Fri, Feb 11, 2011 at 10:51 PM, Jonathan Ellis 
> >> > wrote:
> >> >> The vast majority run on Linux, but there are a few people running
> >> >> Cassandra on Solaris, FreeBSD, and Windows.
> >> > But I failed to start the one node test cluster,
> >> > # sh bin/cassandra -f
> >> > bin/cassandra: syntax error at line 22: `MAX_HEAP_SIZE=$' unexpected
> >> >
> >> > My environment is as follows:
> >> > # more /etc/release
> >> >   Solaris 10 10/09 s10x_u8wos_08a X86
> >> >   Copyright 2009 Sun Microsystems, Inc.  All Rights Reserved.
> >> >Use is subject to license terms.
> >> >   Assembled 16 September 2009
> >> >
> >> > # java -fullversion
> >> > java full version "1.6.0_23-b05"
> >> > # java -version
> >> > java version "1.6.0_23"
> >> > Java(TM) SE Runtime Environment (build 1.6.0_23-b05)
> >> > Java HotSpot(TM) Client VM (build 19.0-b09, mixed mode, sharing)
> >> >
> >> > I changed initial_token:0
> >> >
> >> >
> >> >> On Fri, Feb 11, 2011 at 4:40 AM, Xiaobo Gu 
> >> >> wrote:
> >> >>> Hi,
> >> >>> I can't access the archives of the mailing list, so my
> >> >>> apologies if someone has asked this before.
> >> >>>
> >> >>> Has anyone successfully run Cassandra on Solaris 10 X64 clusters?
> >> >>>
> >> >>> Regards,
> >> >>>
> >> >>> Xiaobo Gu
> >> >>>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Jonathan Ellis
> >> >> Project Chair, Apache Cassandra
> >> >> co-founder of DataStax, the source for professional Cassandra support
> >> >> http://www.datastax.com
> >> >>
> >> >
> >> >
> >> >
> >
> >
>


RE: How can I implement text based searching for the data/entities/items stored in Cassandra ?

2011-02-12 Thread Vivek Mishra
Absolutely. Interestingly, a solution based on Lucandra is integrated with Kundera.


From: Shaun Cutts [sh...@cuttshome.net]
Sent: 12 February 2011 21:18
To: user@cassandra.apache.org
Subject: Re: How can I implement text based searching for the 
data/entities/items stored in Cassandra ?

There is/are lucandra/solandra: https://github.com/tjake/Lucandra

-- Shaun






On Feb 12, 2011, at 6:57 AM, Aklin_81 wrote:

I would like to text search for some of Entities/items stored in the
database through an AJAX powered application...Such that the user
starts typing and he can get the hints & suggested items. This is
implemented in SQL databases using the LIKE, is it possible to anyhow
implement this in an application powered by cassandra ?

How do I go forward to implement this feature, very much required for my case?

Would I have to consider a MySQL DB for implementing this particular
feature there, and rest in Cassandra ?


Thanks
-Asil






Re: Indexes and hard disk

2011-02-12 Thread Dave Revell
Indexes have another important advantage over multiple denormalized column
families. If you make the copies yourself, eventually the copies will
diverge from the base "true" column family due to routine occasional
failures. You'll probably want to find and fix these inconsistencies.

If you're using built-in indexes, you won't have this problem and you can
save some effort.

-Dave
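
As a sketch of what "find and fix these inconsistencies" involves when the copies are maintained by hand: compare the index entries you would expect from the base data with the entries that actually exist. Dicts stand in for column families, and `find_divergence` plus the sample rows are hypothetical.

```python
def find_divergence(base, index, column):
    """Compare a base CF with a hand-maintained index on `column`.

    base:  row key -> {column name: value}
    index: value -> set of row keys
    Returns (missing, stale) as sets of (value, row key) pairs.
    """
    expected = {}
    for key, cols in base.items():
        if column in cols:
            expected.setdefault(cols[column], set()).add(key)
    missing = {(v, k) for v, keys in expected.items()
               for k in keys if k not in index.get(v, set())}
    stale = {(v, k) for v, keys in index.items()
             for k in keys if k not in expected.get(v, set())}
    return missing, stale

base = {"u1": {"city": "Prague"}, "u2": {"city": "Dublin"}}
index = {"Prague": {"u1"}, "Berlin": {"u2"}}  # u2 moved, index not updated
missing, stale = find_divergence(base, index, "city")
print(missing)  # {('Dublin', 'u2')}
print(stale)    # {('Berlin', 'u2')}
```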
On Feb 12, 2011 7:16 AM, "Bill de hÓra"  wrote:
> On Sat, 2011-02-12 at 14:03 +0100, Filip Nguyen wrote:
>
>
>> Why the secondary indexes are even present in Cassandra? I thought the
>> point is that development in Cassandra is query driven, that when you
>> want to search and fetch for example by birth date you should create
>> new ColumnFamilly...
>
>
>
> Yes and no. Systems like Cassandra are designed such that you should
> write the data out as you want to read it in (because writes are cheap).
> However most systems will want to access data via a few other criteria.
> For example a blogging system that supports tags will need to list your
> blog entries by date and by tag equally efficiently . As you say, you
> can spin up a new ColumnFamilly for that, but it's such a common need
> that Cassandra 0.7 supports it directly and saves developers having to
> manage indexes by hand (under the hood, a 0.7 index is a 'private' CF).
> This for me is one of the features that really sets Cassandra apart -
> scaling and indexing data at the same time is hard, and very few systems
> do both well.
>
> Bill


Re: Indexes and hard disk

2011-02-12 Thread mcasandra


Javier Canillas wrote:
> 
> 
> On the other hand, if your application will use heavily cassandra i
> recommend making a calcularon of the data volumen that will handle and
> using a partition that is actually 3 to 5 times bigger. Doing so you
> will be able to play around with compactation strategy to avoid
> latency when your application is under heavy load.
> 
> 

Can you please explain with an example? I just started reading about
Cassandra; all I know is that Cassandra partitions based on the hash space
and divides it equally among all the nodes. The tokens (keys) are then
distributed equally across N replica nodes.

-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Indexes-and-hard-disk-tp6018234p6019297.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Merging the rows of two column families(with similar attributes) into one ??

2011-02-12 Thread Aditya Narayan
Any comments/view points on this?


--On Sat, Feb 12, 2011 at 5:05 PM, Aditya Narayan wrote:

What if the caching requirements, sorting needs of two kind of data
are very much similar, is it preferable to go with a single CF in
those cases ?


Regards
Aditya

>
> >>> On Sat, Feb 5, 2011 at 10:43 AM, Tyler Hobbs
>  wrote:
> >>
> >> I read somewhere that more no of column families is not a good idea
> as
> >> it consumes more memory and more compactions to occur
> >
> > This is primarily true, but not in every case.
> >
> >> But the caching requirements may be different as they cater to two
> >> different features.
> >
> > This is a great reason to *not* merge them.  Besides the key and row
> > caches,
> > don't forget about the OS buffer cache.
> >
> >> Is it recommended to merge these two column families into one ??
> >> Thoughts
> >> ?
> >
> > No, this sounds like an anti-pattern to me.  The overhead from having
> > two
> > separate CFs is not that high.
> >
> > --
> > Tyler Hobbs
> > Software Engineer, DataStax
> > Maintainer of the pycassa Cassandra Python client library
>


Partioning and Sorting is it CF Key or Column Key?

2011-02-12 Thread mcasandra

public class ColumnFamily {
Byte[] name;

  // The key is a user generated key
  Map> value = null;
}
Some questions I have:

1) Is partitioning based on CF.KEY or KEY of Column? From what I read it's
based on column keys and not the CF keys but want to confirm.
2) When data is entered is it sorted based on CF.KEY or KEY of column or
both?
3) Looking at examples of indexing it looks like indexes are on columns.
Would it make sense to have index on CF.KEY also?
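
On question 1: to my understanding, partitioning is by row key (what this message calls CF.KEY), while columns inside a row are sorted by column name. The sketch below shows hash-based placement of a row key on a token ring. It is a simplification under stated assumptions: the MD5-derived token, the `RING_SIZE` constant, and the "walk clockwise for replicas" rule approximate a random partitioner with a simple replication strategy, and the node names and tokens are made up.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 127  # assumed token space for this sketch

def token_for(row_key):
    """Hash a row key to a position on the ring."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def replicas_for(row_key, node_tokens, rf):
    """Pick the node owning the key's token plus the next rf-1 nodes.

    node_tokens: node name -> token. A node owns the range ending at
    its own token, so the owner is the first node whose token is >= the
    key's token, wrapping around to the start of the ring.
    """
    ring = sorted((tok, name) for name, tok in node_tokens.items())
    tokens = [tok for tok, _ in ring]
    start = bisect.bisect_left(tokens, token_for(row_key)) % len(ring)
    count = min(rf, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]

# Four hypothetical nodes with evenly spaced tokens.
nodes = {"A": 0, "B": RING_SIZE // 4, "C": RING_SIZE // 2,
         "D": 3 * RING_SIZE // 4}
print(replicas_for("user42", nodes, rf=2))
```

Note that the column names never enter the placement calculation; only the row key does.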


Re: Indexes and hard disk

2011-02-12 Thread Filip Nguyen


On 12.2.2011 16:15, Bill de hÓra wrote:

(under the hood, a 0.7 index is a 'private' CF).

Thanks for the answer, that sounds very clever and neat!


Re: per-connection "read-after-my-write" consistency

2011-02-12 Thread Michal Augustýn
Hi,

I'm using .NET and I wrote my own client library (over Thrift) so I'm
absolutely sure that both operations are performed using the same
connection.
I can handle the current issue in the application, but I'm sure that I will
not be able to handle some future situation in the application.

So the suggestion is to use at least 3 nodes with RF=3 and CL.QUORUM for
both writes and reads where high consistency is required, right?

Thanks!

2011/2/12 Dan Hendry 

> Are you using a higher level client (hector/pelops/pycassa/etc) or the
> actual thrift API? Higher level clients often pool connections and two
> subsequent operations (write then read) may be performed with connections to
> different nodes.
>
>
>
> If you are sure you are using the same connection (the actual thrift api),
> there is a possible race condition. To the best of my understanding, here is
> how a write happens at cl ONE in your case :
>
> -  You make a request to node A which initiates a write to node A
> and B
>
> -  The server reports successful when the write to node A OR B is
> complete (can somebody else confirm?)
>
>
>
> Typically the write to A will complete quicker since that is the node you
> are connected to and there is additional network overhead initiating the
> write on node B. I suppose a 1:1000 chance of B completing first is
> possible, particularly if all nodes and the client are on the same network
> (or same machine) with very low latencies.
>
>
>
> Cassandra allows you to explicitly specify the trade-off between
> consistency and availability. When you read and write at ONE with RF=2,
> consistency is not guaranteed but high availability is (you can lose a node
> and continue to operate). If you require strong consistency you will either
> have to read or write at consistency level ALL. My suggestion is to either
> design your application to tolerate inconsistency (if possible) or move to
> RF=3 and quorum read and quorum writes.
>
>
>
> Dan
>
>
>
> *From:* Michal Augustýn [mailto:augustyn.mic...@gmail.com]
> *Sent:* February-12-11 4:13
> *To:* user@cassandra.apache.org
> *Subject:* per-connection "read-after-my-write" consistency
>
>
>
> Hi,
>
>
>
> I'm running 2 nodes with RF=2 (not optimal, I know), Cassandra 0.7.1.
>
>
>
> During one connection, I write (CL.ONE) a row and subsequently read
> (CL.ONE) the same row (via Thrift).
>
> I supposed that if I write row to one node then I can immediately read this
> row from this node.
>
> It seems to be true for most cases, but circa 1 of 1000 attempts doesn't
> work as expected - I get no row :(
>
>
>
> Where is the problem please? Should I use another CL for read and/or write?
> I would like just to achieve "per connection read-after-my-write
> consistency".
>
>
>
> Thank you very much!
>
>
>
> Augi


RE: per-connection "read-after-my-write" consistency

2011-02-12 Thread Dan Hendry
Are you using a higher level client (hector/pelops/pycassa/etc) or the
actual thrift API? Higher level clients often pool connections and two
subsequent operations (write then read) may be performed with connections to
different nodes.

If you are sure you are using the same connection (the actual thrift api),
there is a possible race condition. To the best of my understanding, here is
how a write happens at CL ONE in your case:

- You make a request to node A, which initiates a write to node A and B.

- The server reports success when the write to node A OR B is complete
  (can somebody else confirm?)

Typically the write to A will complete quicker since that is the node you
are connected to and there is additional network overhead initiating the
write on node B. I suppose a 1:1000 chance of B completing first is
possible, particularly if all nodes and the client are on the same network
(or same machine) with very low latencies.

Cassandra allows you to explicitly specify the trade-off between consistency
and availability. When you read and write at ONE with RF=2, consistency is
not guaranteed but high availability is (you can lose a node and continue to
operate). If you require strong consistency you will have to either read or
write at consistency level ALL. My suggestion is to either design your
application to tolerate inconsistency (if possible) or move to RF=3 and
quorum reads and quorum writes.
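
The trade-off above comes down to replica-count arithmetic: a read is only
guaranteed to overlap the latest acknowledged write when the number of
replicas read plus the number of replicas written exceeds the replication
factor. The following is a minimal Java sketch of that check. The class and
method names are made up for illustration, it is not Cassandra client code,
and it assumes QUORUM means floor(RF / 2) + 1 replicas.

```java
// Illustration only: replica-count arithmetic, not a Cassandra API.
final class ConsistencyMath {

    /** Number of replicas that make up a quorum: floor(rf / 2) + 1. */
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    /**
     * True when every read set must intersect every write set,
     * i.e. readReplicas + writeReplicas > rf.
     */
    static boolean readsSeeWrites(int rf, int writeReplicas, int readReplicas) {
        return writeReplicas + readReplicas > rf;
    }

    public static void main(String[] args) {
        // RF=2, write at ONE, read at ONE: the sets may not overlap.
        System.out.println(readsSeeWrites(2, 1, 1)); // false

        // RF=2, write at ALL, read at ONE: overlap is guaranteed.
        System.out.println(readsSeeWrites(2, 2, 1)); // true

        // RF=3, QUORUM writes and QUORUM reads: 2 + 2 > 3.
        int q = quorum(3);
        System.out.println(readsSeeWrites(3, q, q)); // true
    }
}
```

Note that with RF=2 a quorum is 2 replicas, the same as ALL, so losing one
node blocks quorum operations. With RF=3 a quorum is still 2, which is why
RF=3 with quorum reads and writes keeps working with one node down.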

 

Dan

 




RE: per-connection "read-after-my-write" consistency

2011-02-12 Thread Dan Hendry
> So the suggestion is to use at least 3 nodes with RF=3 and CL.QUORUM for
> both write and reads where high consistency is required, right?

Yes, this is the typical way to use Cassandra when both consistency and
availability are required.

Dan

 




0.7 PHP thrift example

2011-02-12 Thread Courtney Robinson
Does anyone have a working 0.7 thrift example in PHP...?
I compiled version 0.5 of thrift and built the PHP bindings,
but when I try to run the PHP example on the wiki I get

TException: Error: Attempt to send non-object type as a T_STRUCT


Confused about get_slice SliceRange behavior with bloom filter

2011-02-12 Thread E S
I've gotten myself really confused by
http://wiki.apache.org/cassandra/ArchitectureInternals and am hoping someone
can help me understand what the IO behavior of this operation would be.

When I do a get_slice for a column range, will it seek to every SSTable? I had
thought that it would use the bloom filter on the row key so that it would only
do a seek to SSTables that have a very high probability of containing columns
for that row. In the linked doc above, it seems to say that it is only used
for exact column names. Am I misunderstanding this?

On a related note, if instead of using a SliceRange I provide an explicit list
of columns, will I have to read all SSTables that have values for the columns,
or is it smart enough to stop after finding a value from the most recent
SSTable?
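
For readers unfamiliar with the idea in the first question, here is a toy
Java sketch of a row-key bloom filter used to decide whether an SSTable is
worth a seek. It only illustrates the behavior the question assumes: a
"no" answer is definite, a "yes" answer is only probable. It is not
Cassandra's implementation; the class name, the hash functions and the
sizes are all assumptions chosen for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Toy bloom filter over row keys; illustration only.
final class RowKeyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    RowKeyBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // i-th bit position, derived by double hashing: h1 + i * h2.
    private int index(String rowKey, int i) {
        int h1 = 0x811C9DC5; // FNV-1a offset basis
        for (byte b : rowKey.getBytes(StandardCharsets.UTF_8)) {
            h1 ^= (b & 0xff);
            h1 *= 16777619; // FNV prime; int overflow is intended here
        }
        int h2 = rowKey.hashCode() | 1; // force an odd step
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String rowKey) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(rowKey, i));
        }
    }

    /** false: key is definitely absent. true: key is probably present. */
    boolean mightContain(String rowKey) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(rowKey, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RowKeyBloomFilter sstable1 = new RowKeyBloomFilter(1024, 3);
        RowKeyBloomFilter sstable2 = new RowKeyBloomFilter(1024, 3);
        sstable1.add("user:42"); // only the first SSTable holds this row

        // A reader would seek only into SSTables whose filter says "maybe".
        System.out.println(sstable1.mightContain("user:42")); // true
        System.out.println(sstable2.mightContain("user:42")); // false
    }
}
```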



 



Does Cassandra support multiple listen_address and rpc_address?

2011-02-12 Thread Xiaobo Gu
Multiple network paths for intra-cluster communication would boost performance.

Thanks.

Xiaobo Gu


Extra Large Memtables

2011-02-12 Thread E S
I am trying to minimize my SSTable count to help cut down my read latency.  I 
have some very beefy boxes for my cassandra nodes (96 gigs of memory each).  I 
think this gives me a lot of flexibility to cut down SSTable count by having a 
very large memtable throughput setting.

While experimenting with this, I found a bug where you can't have memtable
throughput configured past 2 gigs without an integer overflow screwing up the
flushes. That makes me feel like I'm in uncharted territory :). I'm guessing
the standard answer to too many SSTables is to get more boxes, but I'm hoping I
can squeeze a lot more juice out of the ones I have, given the specs.
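
A 2 gig ceiling is what you would expect if a size configured in megabytes
is converted to bytes with 32-bit int arithmetic. The Java sketch below only
illustrates that arithmetic under that assumption. The post does not show
the actual Cassandra code, and the class and method names are hypothetical.

```java
// Illustration of 32-bit overflow when converting megabytes to bytes.
final class MemtableSizeOverflow {

    /** Conversion done entirely in int: wraps around at 2^31 bytes. */
    static int bytesAsInt(int megabytes) {
        return megabytes * 1024 * 1024;
    }

    /** Same conversion widened to long before multiplying: no wraparound. */
    static long bytesAsLong(int megabytes) {
        return megabytes * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(bytesAsInt(2047));  // 2146435072, still fits
        System.out.println(bytesAsInt(2048));  // -2147483648, wrapped
        System.out.println(bytesAsInt(8192));  // 0, wrapped
        System.out.println(bytesAsLong(8192)); // 8589934592
    }
}
```

A threshold that comes out negative or zero would make any "flush when
size exceeds threshold" comparison misbehave, which would be consistent
with the flush problems described.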

I had wanted to set the throughput to 8 gigs for my column family (I only have 
one) and set my heap to 30 gigs (still leaving 66 gigs for file cache).  When 
this failed due to the int overflow, I partitioned the column family into 4 
column families (simple mod operation on the row key when I save and retrieve) 
and set them each to 2 gigs throughput to replicate the same behavior.
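
The "simple mod operation on the row key" could look roughly like the Java
sketch below. This is a guess at the approach, not the poster's code: the
class name, the base column family name and the choice of hash are
assumptions. The same function has to be used on both the save and the
retrieve path so a row always lands in the same column family.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustration only: route each row key to one of N column families.
final class ColumnFamilyPartitioner {
    static final int PARTITIONS = 4;

    /** Picks a column family name such as "Data_0" .. "Data_3" for a row key. */
    static String columnFamilyFor(String baseName, byte[] rowKey) {
        // floorMod keeps the bucket non-negative even for negative hashes.
        int bucket = Math.floorMod(Arrays.hashCode(rowKey), PARTITIONS);
        return baseName + "_" + bucket;
    }

    public static void main(String[] args) {
        byte[] key = "user:42".getBytes(StandardCharsets.UTF_8);
        // Writes and reads for this key both go to the same column family.
        System.out.println(columnFamilyFor("Data", key));
    }
}
```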

I did some quick write stress tests and everything seemed to perform well and
be stable. Memory usage also seemed stable. However, I am nervous since the
defect makes me think most people are using much smaller memory loads. Has
anyone had any experience with having Cassandra use this much memory? Does
anyone see any pitfalls that I'm missing? If not, I'll let you guys know if I
learn anything interesting!



  


Re: 0.7 PHP thrift example

2011-02-12 Thread Sasha Dolgy
Check out phpcassa on GitHub, maintained by thobbs.
On 13 Feb 2011 01:52, "Courtney Robinson"  wrote:
> Does anyone have a working 0.7 thrift example in PHP...?
> I compiled 0.5 version of thrift and built the PHP bindings
> but when I try to run the php example on the wiki I get
>
> TException: Error: Attempt to send non-object type as a T_STRUCT