from:"Eldad Yamin"

Re: Schema advice: (Single row or multiple row!?) How do I store millions of columns when I need to read a set of around 500 columns at a single read query using column names ?

2012-07-23 Thread Eldad Yamin

In addition, if you don't know how many rows will be needed, you can store
the key of the next row in each row,
just like in a linked list.

OR

Have one row that holds the keys of all your other rows.
First select the main row (with the keys), then select the other rows.
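
A minimal sketch of the "main row holding the keys" option, using a plain Python dict as a stand-in for a column family. No real Cassandra client is involved, and the row-key format and helper names (write_chunked, read_all) are hypothetical:

```python
# In-memory stand-in for a column family: row key -> {column name: value}.
store = {}

def write_chunked(base_key, columns, chunk_size):
    """Split `columns` over several rows and record their keys in a main row."""
    names = sorted(columns)
    chunk_keys = []
    for i in range(0, len(names), chunk_size):
        chunk_key = "%s:%d" % (base_key, i // chunk_size)
        store[chunk_key] = {n: columns[n] for n in names[i:i + chunk_size]}
        chunk_keys.append(chunk_key)
    # The main row holds one (empty-valued) column per data row.
    store[base_key] = {key: "" for key in chunk_keys}

def read_all(base_key):
    """First read the main row, then fetch every row it points to."""
    result = {}
    for chunk_key in store[base_key]:
        result.update(store[chunk_key])
    return result

write_chunked("user42", {"col%03d" % i: i for i in range(10)}, chunk_size=4)
```

The cost of this layout is one extra read (the main row) before the data rows can be fetched.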

On Mon, Jul 23, 2012 at 3:40 PM, rohit bhatia  wrote:

> You should probably try to break the one-row scheme into a
> 2*Number_of_nodes rows scheme. This should ensure proper distribution
> of rows and still allow you to query from a small, fixed number of rows.
> How you do it depends on how you are going to choose your 200-500
> columns during reading (try to keep them in the same row).
>
> Even if you are forced to put them in separate rows, you can make the
> row key "some modulus of the hash of the column name", ensuring symmetry
> and easy access to the columns.
>
> On Mon, Jul 23, 2012 at 6:02 PM, Ertio Lew  wrote:
> > Any ideas/suggestions please?
>
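
The "modulus of hash of column name" idea quoted above could look roughly like this. It is only a sketch: the row-key format, the choice of MD5 and both function names are assumptions made for illustration, not something from the thread.

```python
import hashlib

def bucket_row_key(base_key, column_name, num_buckets):
    """Derive the row key for a column from a stable hash of its name.
    MD5 is used because Python's built-in hash() of str differs between runs."""
    digest = hashlib.md5(column_name.encode("utf-8")).hexdigest()
    return "%s:%d" % (base_key, int(digest, 16) % num_buckets)

def rows_to_read(base_key, column_names, num_buckets):
    """Group the wanted columns by the row that holds them (one read per row)."""
    plan = {}
    for name in column_names:
        key = bucket_row_key(base_key, name, num_buckets)
        plan.setdefault(key, []).append(name)
    return plan

num_nodes = 3
wanted = ["col%d" % i for i in range(500)]
plan = rows_to_read("user42", wanted, 2 * num_nodes)
```

With 2 * num_nodes buckets, reading 500 columns never touches more than six rows, and any reader can recompute where a given column lives.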

Re: Any meet ups in southern california

2011-07-07 Thread Eldad Yamin

You can use Watchitoo.com (like GoToMeeting/WebEx) to host an event.
Using that tool, everyone around the world can join and take part.

The great thing about it is that it's FREE!

On Wed, Jul 6, 2011 at 10:25 PM, Mike Rapuano  wrote:

> Hi all
>
> Are there any active cassandra meet ups in southern California?
>
> Thanks
> Mike
>

Re: Pre-CassandraSF Happy Hour on Sunday

2011-07-09 Thread Eldad Yamin

Can you please use Watchitoo.com (it's free) and broadcast the event?

On Fri, Jul 8, 2011 at 8:54 PM, Richard Low  wrote:

> Hi all,
>
> If you're in San Francisco for CassandraSF on Monday 11th, then come
> and join fellow Cassandra users and committers on Sunday evening.
> Starting at 6:30pm at ThirstyBear, the famous brewing company.  We'll
> have drinks, food and more.
>
> RSVP at Eventbrite: http://pre-cassandrasf-happyhour.eventbrite.com/
>
> Hope you can join us!
>
> --
> Richard Low
> Acunu | http://www.acunu.com | @acunu
>

Cassandra Secondary index/Twissandra

2011-07-09 Thread Eldad Yamin

Hi,
I have a few questions:

*Secondary index*

   1. Is there a limit on the number of columns in a single column family
   that serve as secondary indexes?
   2. Does performance decrease (significantly) if the uniqueness of the
   column’s values is high?


*Twissandra*

   1. Why, in the source (or any tutorial I've read), does the CF for
   "Userline"/"Timeline" have a comparator of "LONG_TYPE" and not
   TimeUUID?
   https://github.com/twissandra/twissandra/blob/master/tweets/management/commands/sync_cassandra.py
   2. Does performance decrease (significantly) if the uniqueness of the
   column’s name is high when the comparator is LONG_TYPE/TimeUUID and each
   row has lots of columns?


Thanks!
Eldad

Re: Cassandra Secondary index/Twissandra

2011-07-10 Thread Eldad Yamin

Aaron - Thank you for the fast response!


   1. Does performance decrease (significantly) if the uniqueness of the
   column’s name is high when comparator is LONG_TYPE/TimeUUID and each row has
   lots of columns?

> Depends on what sort of operations you are doing. Some read operations have
> to pay a constant cost to decode the row level column index, this can be
> tuned though. AFAIK the comparator type has very little to do with the
> performance.

In Twissandra, the columns are used as an "alternative" index for the
Userline/Timeline, so the operation I'm going to do is a slice_range.
I'm going to get (for example) the first 50 columns (using a comparator of
TimeUUID/LONG).
Can you recommend a better way of doing that, or a way to tune Cassandra
to support those 2 CFs?


Thanks!

On Sun, Jul 10, 2011 at 3:26 AM, aaron morton wrote:

>
>1. Is there a limit on the number of columns in a single column family
>that serve as secondary indexes?
>
> AFAIK there is no coded limit, however every index is implemented as
> another (hidden) Column Family that inherits the settings of the parent CF.
> So under 0.7 you may run out of memory, under 0.8 you may flush  a lot.
> Also, when an indexed column is updated there are potentially 3 operations
> that have to happen: read the old value, delete the old value, write the new
> value. More indexes == more index updating, just like any other database.
>
>
>1. Does performance decrease (significantly) if the uniqueness of the
>column’s values is high?
>
> Low cardinality is recommended
>
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Secondary-indices-Why-low-cardinality-td6160509.html
>
>
>1. The CF for "Userline"/"Timeline" - have comparator of "LONG_TYPE"
>and not TimeUUID?
>
> Probably just to make the demo easier. It's used to order tweets in the
> user and public timelines by the current time
> https://github.com/twissandra/twissandra/blob/master/cass.py#L204
>
>
>1. Does performance decrease (significantly) if the uniqueness of the
>column’s name is high when comparator is LONG_TYPE/TimeUUID and each row
>has lots of columns?
>
> Depends on what sort of operations you are doing. Some read operations have
> to pay a constant cost to decode the row level column index, this can be
> tuned though. AFAIK the comparator type has very little to do with the
> performance.
>
> Hope that helps.
>
> -
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
>
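
The three operations mentioned in the reply above (read the old value, delete the old entry, write the new one) can be pictured with a toy model. This sketch keeps the "hidden" index in a plain Python dict; it illustrates the bookkeeping only, it is not how Cassandra stores its indexes, and update_indexed is a hypothetical name.

```python
from collections import defaultdict

rows = {}                   # row key -> {column name: value}
index = defaultdict(set)    # (column name, value) -> set of row keys

def update_indexed(row_key, column, new_value):
    """Update one indexed column and keep the index in step."""
    old_value = rows.get(row_key, {}).get(column)       # 1. read the old value
    if old_value is not None:
        index[(column, old_value)].discard(row_key)     # 2. delete the old entry
    rows.setdefault(row_key, {})[column] = new_value
    index[(column, new_value)].add(row_key)             # 3. write the new entry

update_indexed("alice", "state", "CA")
update_indexed("bob", "state", "CA")
update_indexed("alice", "state", "NY")
```

Every additional indexed column repeats this work on each update, which is the "more indexes == more index updating" point.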

Re: Cassandra Secondary index/Twissandra

2011-07-11 Thread Eldad Yamin

Hi Aaron,
Thank you again for your response.

I've read the article, but I didn't understand everything. It would be great
if the benchmark included the actual CLI/Python commands (that way it would
be easier to understand each query), along with an explanation of row pages:
what are they?

Anyway, for a sense of scale, take the average Facebook/Twitter user, who
can have 100K columns (in the Userline).
What I need is to take the first 50 columns (ordered by TimeUUID), then
columns 51 to 100, 101 to 150, etc.
Any idea how fast that will be? How would you recommend configuring
Cassandra? Or is there a different way of achieving that goal?

Thanks,
Eldad.
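
A rough sketch of the paging described above (first 50 columns, then the next 50, and so on). A sorted Python list stands in for one wide row, and get_slice/iter_pages are hypothetical helpers, not a client API. The assumption is that the slice start is inclusive, so every page after the first asks for one extra column and drops the one already seen.

```python
import bisect

def get_slice(columns, start=None, count=50):
    """Return up to `count` column names >= start, in comparator order.
    `columns` is a sorted list standing in for one wide row."""
    i = 0 if start is None else bisect.bisect_left(columns, start)
    return columns[i:i + count]

def iter_pages(columns, page_size=50):
    """Yield successive pages of column names."""
    page = get_slice(columns, count=page_size)
    while page:
        yield page
        # Restart from the last column seen; it comes back first, so skip it.
        following = get_slice(columns, start=page[-1], count=page_size + 1)
        page = following[1:]

row = list(range(1000, 1120))   # 120 made-up LONG column names (timestamps)
pages = list(iter_pages(row))
```

Paging by "last column seen" rather than by offset means each request is an ordinary slice, no matter how deep into the row it starts.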

On Sun, Jul 10, 2011 at 8:31 PM, aaron morton wrote:

> Can you recommend on a better way of doing that or a way to tune Cassandra
> to support those 2 CF?
>
> A select with no start or finish column name, a column count and not in
> reversed order is about the fastest read query.
>
> You will need to do a reversed query, which will be a little slower. But
> may still be plenty fast enough, depending on scale and throughput and all
> those other things. see
> http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/
>
> Cheers
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>

b-tree

2011-07-20 Thread Eldad Yamin

Hello,
Is there any good way of storing a binary tree in Cassandra?
I wonder if someone has already implemented something like that, and how
they managed without transaction support (while the tree keeps
evolving)?

I'm asking because I want to save geospatial data, and SimpleGeo did it
using a b-tree:
http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php

Thanks!

Re: b-tree

2011-07-21 Thread Eldad Yamin

Aaron,
Nested set is exactly what I had in mind.
But how will you be able to maintain it while it evolves and new data is
added without transactions?

Thanks!

On Thu, Jul 21, 2011 at 1:44 AM, aaron morton wrote:

> Just throwing out a (half baked) idea, perhaps the Nested Set Model of
> trees would work: http://en.wikipedia.org/wiki/Nested_set_model
>
> * Every row would represent a set with a left and right encoded into the
> key.
> * Members are inserted as columns into *every* set / row they are a member
> of. So we are de-normalising and trading space for time.
> * May need to maintain a custom secondary index of the materialised sets,
> e.g. slice a row to get the first column >= the left value you are
> interested in; that is the key for the set.
>
> I've not thought it through much further than that, a lot would depend on
> your data. The top sets may get very big.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 21 Jul 2011, at 08:33, Jeffrey Kesselman wrote:
>
> Im not sure if I have an answer for you, anyway, but I'm curious
>
> A b-tree and a binary tree are not the same thing.  A binary tree is a
> basic fundamental data structure,  A b-tree is an approach to storing and
> indexing data on disc for a database.
>
> Which do you mean?
>
> On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin  wrote:
>
>> Hello,
>> Is there any good way of storing a binary-tree in Cassandra?
>> I wonder if someone already implement something like that and how
>> accomplished that without transaction supports (while the tree keep
>> evolving)?
>>
>> I'm asking that becouse I want to save geospatial-data, and SimpleGeo did
>> it using b-tree:
>> http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
>>
>> Thanks!
>>
>
>
>
> --
> It's always darkest just before you are eaten by a grue.
>
>
>
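
One possible reading of the nested-set idea quoted above, as a toy Python model. Each row is a set identified by (left, right) bounds, and a member is written into every set whose bounds enclose it. The bounds, names and helper functions are made up for illustration, and the sketch ignores the concurrency question raised in the thread.

```python
# Each row is a set keyed by its (left, right) bounds; members are columns.
sets = {
    (1, 10): {},   # root
    (2, 5): {},    # first child
    (3, 4): {},    # grandchild
    (6, 9): {},    # second child
}

def insert_member(position, name):
    """Write the member into every set whose bounds enclose `position`."""
    for (left, right), members in sets.items():
        if left <= position <= right:
            members[name] = position

def members_of(left, right):
    """A single row read returns the whole subtree, with no tree walk."""
    return sorted(sets[(left, right)])

insert_member(3, "cafe")
insert_member(7, "museum")
```

This is the space-for-time trade mentioned in the quote: writes fan out to every enclosing set, and the top-level row grows with every member.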

Re: b-tree

2011-07-21 Thread Eldad Yamin

Hi Jeffrey,
I meant a binary tree. Go and watch the video (linked in my first email); it
will give you a better understanding.

Eldad

On Wed, Jul 20, 2011 at 11:33 PM, Jeffrey Kesselman wrote:

> Im not sure if I have an answer for you, anyway, but I'm curious
>
> A b-tree and a binary tree are not the same thing.  A binary tree is a
> basic fundamental data structure,  A b-tree is an approach to storing and
> indexing data on disc for a database.
>
> Which do you mean?
>
>
> On Wed, Jul 20, 2011 at 4:30 PM, Eldad Yamin  wrote:
>
>> Hello,
>> Is there any good way of storing a binary-tree in Cassandra?
>> I wonder if someone already implement something like that and how
>> accomplished that without transaction supports (while the tree keep
>> evolving)?
>>
>> I'm asking that becouse I want to save geospatial-data, and SimpleGeo did
>> it using b-tree:
>> http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php
>>
>> Thanks!
>>
>
>
>
> --
> It's always darkest just before you are eaten by a grue.
>

Re: how to stop the whole cluster, start the whole cluster like in hadoop/hbase?

2011-07-21 Thread Eldad Yamin

I wonder whether that will cause problems...
Has anyone done it already?
 On Jul 21, 2011 10:39 PM, "Jonathan Ellis"  wrote:
> dsh -c -g cassandra /etc/init.d/cassandra stop
>
> http://www.netfort.gr.jp/~dancer/software/dsh.html.en
>
> P.S. mostly people are concerned about making sure their entire
> cluster does NOT stop at the same time :)
>
> On Thu, Jul 21, 2011 at 2:23 PM, Dean Hiller  wrote:
>> Is there a framework for stopping all nodes/starting all nodes for
>> cassandra?  I am okay with something like password-less ssh setup that
>> hadoop scripts did...just something that allows me to start and stop the
>> whole cluster.
>>
>> thanks,
>> Dean
>>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com

Re: b-tree

2011-07-22 Thread Eldad Yamin

In order to split the nodes.
SimpleGeo has a maximum of 1,000 records (i.e. places) on each node in the
tree; if the number exceeds 1,000, they split the node.
To prevent more than one process from editing/splitting the same node, a
transaction is needed.
On Jul 22, 2011 1:01 AM, "aaron morton"  wrote:
>> But how will you be able to maintain it while it evolves and new data is
>> added without transactions?
>
> What is the situation you think you need transactions for ?
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>

Question about eventually consistent

2011-07-31 Thread Eldad Yamin

Hi,

Let’s say that I have 2 datacenters, and a key is changed in both
datacenters at exactly the same time (or within 1-2 seconds of each other).

Datacenter #1 removes a column and Datacenter #2 adds 2 new columns.
Is there any problem with consistency, or will Cassandra handle this
situation easily?



Thanks!

Question about eventually consistent in Cassandra

2011-08-02 Thread Eldad Yamin

Hi,
Let’s say that I have 2 datacenters, and a key is changed in both
datacenters at exactly the same time (or within 1-2 seconds of each other).
Datacenter #1 adds column "abc" with value X; Datacenter #2 adds column "abc"
with value Y.

What is the result in that situation?
Is there any difference if the changes are made within the same data
center?

Thanks!
Eldad Yamin
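
One way to picture the situation in the question, assuming Cassandra's usual per-column reconciliation in which the write carrying the higher timestamp wins. The model below is deliberately simplified: the timestamps are made up, and what happens when two timestamps are exactly equal is not modelled.

```python
def reconcile(a, b):
    """Pick the winner of two versions of one column: higher timestamp wins.
    Each version is a (value, timestamp) pair; ties are not modelled here."""
    return a if a[1] > b[1] else b

dc1_write = ("X", 1312300000000000)   # made-up microsecond timestamps
dc2_write = ("Y", 1312300000000750)
winner = reconcile(dc1_write, dc2_write)
```

Under that assumption every replica ends up with the same value once both writes have arrived, regardless of which datacenter received which write first.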

HOW TO select a column or all columns that start with X

2011-08-02 Thread Eldad Yamin

Hello,
I wonder if I can select a column, or all columns, whose names start with X.
E.g., I have columns ABC_1, ABC_2, ZZZ_1 and I want to select all columns that
start with ABC_. Is that possible?



Thanks!

cassandra consistency level

2011-08-02 Thread Eldad Yamin

Does consistency level "ALL" for writes actually guarantee that my data is
updated on all of my nodes?
Does it apply to read actions as well?

I've read it on the wiki, I just want to make sure.
Thanks!

geo-data in Cassandra

2011-08-02 Thread Eldad Yamin

Hello,
I'm trying to save geo-data in Cassandra.
According to SimpleGeo, they did that using a nested tree:
http://www.readwriteweb.com/cloud/2011/02/video-simplegeo-cassandra.php


I wonder if someone has already implemented something like that, and how they
managed without transaction support (while the tree keeps
evolving)?
In addition, what consistency level did they use?

Thanks!

Re: cassandra consistency level

2011-08-03 Thread Eldad Yamin

So what you're saying is that no matter what consistency level I'm using,
the data will be written to all CF nodes right away; the consistency level
is just for making sure that all CF nodes are UP and all data is written.
In other words, if one of the nodes is down, the write (or read) will fail.

I'm asking because I'm a bit worried about consistency. For example:
every action that my client performs is stored in CF.x, in a specific
column keyed by his user_id.
I do that by de-serializing the data already in the column,
adding the new data (the action), then serializing and storing the result.
So I'm worried that some of the user's actions will be "dropped" due to
low consistency when there are lots of changes to a specific column in a
short period of time.
I know that I can solve this in a different way, by storing each
action in a new column etc., but this is just an example that explains my
question in a simple way.

Thanks!
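
The alternative mentioned above (one new column per action) can be sketched as follows. A dict stands in for the user's row, uuid1() supplies a time-based column name in the spirit of a TimeUUID comparator, and append_action is a hypothetical helper. The point of the sketch is that no writer ever reads and rewrites existing data.

```python
import uuid

row = {}   # stand-in for one user's row: column name -> value

def append_action(action):
    """Store each action under its own time-based column name, so two
    writers never overwrite each other's data."""
    row[uuid.uuid1()] = action

append_action("login")
append_action("post")

# Read the actions back ordered by the time embedded in each column name.
actions = [row[name] for name in sorted(row, key=lambda u: u.time)]
```

With the serialize-into-one-column approach, two clients that read the same old value would each write back a list missing the other's action; separate columns avoid that read-modify-write cycle.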

On Wed, Aug 3, 2011 at 3:21 AM, aaron morton wrote:

> Not sure I understand your question exactly, but will take a shot…
>
> Writes are sent to every UP node, the consistency level is how many nodes
> we require to complete before we say the request completed successfully. So
> we also make sure that CL nodes are UP before we start the request. If you
> run CL ALL then Replication Factor nodes must be up for each key you are
> writing.
>
> With the exception of CL ONE reads are also sent to all UP replicas.
>
> Hope that helps.
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3 Aug 2011, at 09:32, Eldad Yamin wrote:
>
> > Does consistency level "ALL" for writes actually guarantee that my data is
> > updated on all of my nodes?
> > Does it apply to read actions as well?
> >
> > I've read it on the wiki, I just want to make sure.
> > Thanks!
>
>
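
The rule in the reply above (the consistency level is how many replicas must complete, and that many must be UP before the request starts) can be written down as a small sketch. Only ONE, QUORUM and ALL are covered; the function names are made up, and QUORUM is taken to be a majority of the replication factor.

```python
def required_replicas(consistency_level, replication_factor):
    """Number of replicas that must acknowledge before the request succeeds."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[consistency_level]

def can_proceed(consistency_level, replication_factor, replicas_up):
    """The coordinator checks up front that enough replicas are UP."""
    needed = required_replicas(consistency_level, replication_factor)
    return replicas_up >= needed
```

For example, with a replication factor of 3 and one replica down, can_proceed("ALL", 3, 2) is False while can_proceed("QUORUM", 3, 2) is True.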

Re: HOW TO select a column or all columns that start with X

2011-08-03 Thread Eldad Yamin

Thanks!

On Wed, Aug 3, 2011 at 3:03 PM, aaron morton wrote:

> and AsciiType
>
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 3 Aug 2011, at 16:35, eldad87 wrote:
>
> Thank you!
> Will this situation work only for UTF8Type comparator?
>
>
> On Wed, Aug 3, 2011 at 4:50 AM, Tyler Hobbs  wrote:
>
>> A minor correction:
>>
>> To get all columns starting with "ABC_", you would set column_start="ABC_"
>> and column_finish="ABC`" (the '`' character comes after '_'), and ignore the
>> last column in your results if it happened to be "ABC`".
>>
>> column_finish, or the "slice end" in other clients, is inclusive.  You
>> could of course use "ABC_~" as column_finish and avoid the check if you know
>> that you don't have column names like "ABC_~FOO" that you want to include.
>>
>>
>> On Tue, Aug 2, 2011 at 7:17 PM, aaron morton wrote:
>>
>>> Yup, that's a pretty common pattern. How exactly depends on the client you
>>> are using.
>>>
>>> Say you were using pycassa, you would do a get()
>>> http://pycassa.github.com/pycassa/api/pycassa/columnfamily.html#pycassa.columnfamily.ColumnFamily.get
>>>
>>> with column_start="ABC_" , count to whatever, and column_finish not
>>> provided.
>>>
>>> You can also provide a finish and use the highest encoded character, e.g.
>>> ascii 126 is ~ so if you used column_finish = "ABC_~" you would get
>>> everything that starts with ABC_
>>>
>>> Cheers
>>>
>>>  -
>>> Aaron Morton
>>> Freelance Cassandra Developer
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 3 Aug 2011, at 09:28, Eldad Yamin wrote:
>>>
>>> Hello,
>>> I wonder if I can select a column or all columns that start with X.
>>> E.g I have columns ABC_1, ABC_2, ZZZ_1 and I want to select all columns
>>> that start with ABC_ - is that possible?
>>>
>>>
>>>
>>> Thanks!
>>>
>>>
>>>
>>
>>
>> --
>> Tyler Hobbs
>> Software Engineer, DataStax <http://datastax.com/>
>> Maintainer of the pycassa <http://github.com/pycassa/pycassa> Cassandra
>> Python client library
>>
>>
>
>
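
The prefix slice discussed in this thread can be simulated without a cluster. In this sketch a sorted Python list stands in for the row, get_range mimics an inclusive column_start/column_finish slice, and both function names are hypothetical. It assumes the comparator orders names the way Python orders ASCII strings.

```python
import bisect

def get_range(columns, column_start, column_finish):
    """Return names with column_start <= name <= column_finish (the finish
    is inclusive, as noted in the thread). `columns` must be sorted."""
    lo = bisect.bisect_left(columns, column_start)
    hi = bisect.bisect_right(columns, column_finish)
    return columns[lo:hi]

def with_prefix(columns, prefix):
    """Slice up to the smallest string that sorts after every name starting
    with `prefix`, then drop that boundary name if it happens to exist."""
    finish = prefix[:-1] + chr(ord(prefix[-1]) + 1)   # "ABC_" -> "ABC`"
    return [c for c in get_range(columns, prefix, finish) if c != finish]

row = sorted(["ABC_1", "ABC_2", "ABC_~FOO", "ABC`", "ZZZ_1"])
```

Here with_prefix(row, "ABC_") keeps "ABC_~FOO", while get_range(row, "ABC_", "ABC_~") misses it, which is the caveat about using "ABC_~" as the finish.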

Install Cassandra on EC2

2011-08-03 Thread Eldad Yamin

Hi,
Is there any manual or important notes I should know before I try to install
Cassandra on EC2?

Thanks!

Re: Install Cassandra on EC2

2011-08-03 Thread Eldad Yamin

Thanks!
But I prefer to learn how to install it first, if you have any good references
(I didn't find any, not even a general installation guide for an EC2/regular
machine).
I'm also going to try to install Solandra; I hope that Whirr will support
it in the near future.

On Wed, Aug 3, 2011 at 5:43 PM, John Conwell  wrote:

> One thing you might want to look at is the Apache Whirr project (which is
> awesome by the way!).  It automagically handles spinning up a cluster of
> resources on EC2 (or rackspace for that matter), installing and configuring
> cassandra, and starting it.
>
> One thing to be aware of if you go this route.  By default in the yaml file
> all data is written under the /var folder.  But on a server started by
> Whirr, this folder only has something like 4gb.  Most of the  hard disk
> space is under the /mnt folder.  So you'll either need to change what
> folders are pointed to what drives (not sure if you can or not...I'm sure
> you could), or change the yaml file to point the /mnt folder.
>
>
> On Wed, Aug 3, 2011 at 6:28 AM, Eldad Yamin  wrote:
>
>> Hi,
>> Is there any manual or important notes I should know before I try to
>> install Cassandra on EC2?
>>
>> Thanks!
>>
>
>
>
> --
>
> Thanks,
> John C
>
>

Cassandra and Solandra Installation guide

2011-08-03 Thread Eldad Yamin

Hi,
I'd like to find tutorials on how to install Cassandra and Solandra; I
couldn't find anything helpful.
In addition, tutorials on how to use Solandra (indexing/searching) would be great.


Thanks!

Installation Exception

2011-08-03 Thread Eldad Yamin

Hi,
I'm trying to install Cassandra on Amazon EC2 without success. This is what
I did:

   1. Created a new "Small" EC2 instance (this is just for testing), running
   Ubuntu OS - custom AMI (ami-596f3c1c) from:
   http://uec-images.ubuntu.com/releases/11.04/release/
   2. Installed Java:
   # sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
   # sudo apt-get update
   # sudo apt-get install sun-java6-jre sun-java6-plugin sun-java6-fonts openjdk-6-jre
   3. Upgraded:
   # sudo apt-get upgrade
   4. Downloaded Cassandra:
   # cd /usr/src/
   # sudo wget http://apache.mivzakim.net//cassandra/0.8.2/apache-cassandra-0.8.2-src.tar.gz
   # sudo tar xvfz apache-cassandra-*
   # cd apache-cassandra-*
   5. Config (according to README.txt)
   # sudo mkdir -p /var/log/cassandra
   # sudo chown -R `whoami` /var/log/cassandra
   # sudo mkdir -p /var/lib/cassandra
   # sudo chown -R `whoami` /var/lib/cassandra
   6. RUN CASSANDRA
   # bin/cassandra -f

Then I got an exception:
"ubuntu@ip-10-170-31-128:/usr/src/apache-cassandra-0.8.2-src$ bin/cassandra
-f
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/cassandra/thrift/CassandraDaemon
Caused by: java.lang.ClassNotFoundException:
org.apache.cassandra.thrift.CassandraDaemon
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon.
Program will exit."


Any idea what is wrong?
Thanks!

Re: Installation Exception

2011-08-03 Thread Eldad Yamin

Thanks Jonathan,
I saw the EC2 AMI that was made by DataStax. I prefer not to use it because
I want to learn how to install Cassandra first.

On Wed, Aug 3, 2011 at 8:03 PM, Jonathan Ellis  wrote:

>
> http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
>
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com
>

Re: Installation Exception

2011-08-03 Thread Eldad Yamin

Thanks! I missed that lol!
BTW, how do I compile it?

Thanks!

On Wed, Aug 3, 2011 at 6:51 PM, samal  wrote:

> did u compile source code? :)
> you have downloaded source code not binary.
>
> try with binary.
>
>
>

Solandra

2011-08-03 Thread Eldad Yamin

Hello,
I have a cluster of 3 Cassandra nodes and I would like to start using
Solandra.
1. How can I install Solandra and make use of the existing nodes?
2. Would it be better to install Solandra on a new node and add it to the
existing cluster?
3. How does Solandra index? Does it operate automatically, or do I need to
"tell" Solandra to index CF.keys every time a key is created or updated?

Thanks!

Re: Planet Cassandra (an aggregation site for Cassandra News)

2011-08-03 Thread Eldad Yamin

Great! I hope it will be open soon!

On Wed, Aug 3, 2011 at 10:33 PM, Ed Anuff  wrote:

> Awesome, great news!
>
>
> On Wed, Aug 3, 2011 at 11:53 AM, Lynn Bender  wrote:
>
>> Greetings all,
>>
>> I just wanted to send a note out to let everyone know about Planet
>> Cassandra -- an aggregation site for Cassandra news and blogs. Andrew
>> Llavore from DataStax and I built the site.
>>
>> We are currently waiting for approval from the Apache Software Foundation
>> before we publicly launch. However, in the meantime, we'd love to hear from
>> you. If you have any favorite Cassandra-related blogs, or blogs that
>> frequently contain quality Cassandra content, please send us the URL, so
>> that we can contact the author about including a site feed.
>>
>> If you have any questions or comments, please send them to
>> pla...@geekaustin.org.
>>
>> -Lynn Bender
>>
>> --
>> -Lynn Bender
>> http://geekaustin.org
>> http://linuxagainstpoverty.org
>> http://twitter.com/linearb
>> http://twitter.com/geekaustin
>>
>>
>>
>>
>

Re: Install Cassandra on EC2

2011-08-03 Thread Eldad Yamin

Hi Aaron,
Thanks for your reply.

I've already seen that, but at the moment I'm interested in installing
Cassandra from scratch - I want to learn.
Well, yesterday I installed 1 node - now I'm looking into how to add more
nodes and reading more about Cassandra's tools (node reaper etc.)

Thanks!

On Thu, Aug 4, 2011 at 1:23 AM, aaron morton wrote:

> Pre-built AMI here:
>
> http://www.datastax.com/dev/blog/setting-up-a-cassandra-cluster-with-the-datastax-ami
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4 Aug 2011, at 03:24, Jeremy Hanna wrote:
>
> Some quick thoughts that might be helpful:
>
> - use ephemeral instances and RAID0 over the local volumes for both
> cassandra's data as well as the log directory.  The log directory because if
> you crash due to heap size, the heap dump will be stored in the log
> directory.  you don't want that to go in your root/OS partition.
>
> - probably want to stripe across AZs so that a single AZ failure doesn't
> affect you as much.
>
> - for seeds, it's nice to use elastic ips so that your seed configuration
> doesn't have to change if a node is replaced.
>
> - the ec2snitch makes it so each AZ appears as a rack wrt topology -
> simpler as it inspects the ec2 metadata.  if you need more than one DC in
> your cluster (we need a second virtual DC for analytics), you'll probably
> want to use the property file snitch.  there's a cross region ec2snitch
> that's coming in 1.0.
>
> would probably be good to add some ec2 specific tips in the wiki.  the page
> that dave mentioned is a good step-by-step, but there's been a lot of
> community knowledge accumulated about best practices in the year since that
> was done.
>
> On Aug 3, 2011, at 8:28 AM, Eldad Yamin wrote:
>
> Hi,
>
> Is there any manual or important notes I should know before I try to
> install Cassandra on EC2?
>
>
> Thanks!
>
>
>
>

Re: Planet Cassandra (an aggregation site for Cassandra News)

2011-08-07 Thread Eldad Yamin

Great!
If possible, please blog about full-text-search options + how to use
them (Solandra, Elastic Search, Sphinx etc).

Thanks!

On Sun, Aug 7, 2011 at 5:58 AM, Edward Capriolo wrote:

>
>
> On Thu, Aug 4, 2011 at 5:12 AM, Boris Yen  wrote:
>
>> Looking forward to it. ^^
>>
>> On Thu, Aug 4, 2011 at 1:56 PM, Eldad Yamin  wrote:
>>
>>> Great! I hope it will be open soon!
>>>
>>>
>>> On Wed, Aug 3, 2011 at 10:33 PM, Ed Anuff  wrote:
>>>
>>>> Awesome, great news!
>>>>
>>>>
>>>> On Wed, Aug 3, 2011 at 11:53 AM, Lynn Bender  wrote:
>>>>
>>>>> Greetings all,
>>>>>
>>>>> I just wanted to send a note out to let everyone know about Planet
>>>>> Cassandra -- an aggregation site for Cassandra news and blogs. Andrew
>>>>> Llavore from DataStax and I built the site.
>>>>>
>>>>> We are currently waiting for approval from the Apache Software
>>>>> Foundation before we publicly launch. However, in the meantime, we'd
>>>>> love to hear from you. If you have any favorite Cassandra-related
>>>>> blogs, or blogs that frequently contain quality Cassandra content,
>>>>> please send us the URL, so that we can contact the author about
>>>>> including a site feed.
>>>>>
>>>>> If you have any questions or comments, please send them to
>>>>> pla...@geekaustin.org.
>>>>>
>>>>> -Lynn Bender
>>>>>
>>>>> --
>>>>> -Lynn Bender
>>>>> http://geekaustin.org
>>>>> http://linuxagainstpoverty.org
>>>>> http://twitter.com/linearb
>>>>> http://twitter.com/geekaustin
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> I have started a blog to support the High Performance Cassandra Cookbook:
>
> http://www.jointhegrid.com/highperfcassandra/
>
> I am going to use the blog to continue writing about features and tips for
> Cassandra in the writing style used for the book.
>
> Lynn, please consider it for syndication. All others, please enjoy.
>
>

Re: Best practices when deploying & upgrading a cassandra cluster

2011-08-14 Thread Eldad Yamin

Is there any good reason why we shouldn't build the latest version from
source?

Thanks!
On Fri, Aug 12, 2011 at 12:18 AM, aaron morton wrote:

> In a non dev system it's a lot easier to use the packages
> http://wiki.apache.org/cassandra/DebianPackaging
> http://www.datastax.com/docs/0.8/install/packaged_releases
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12 Aug 2011, at 02:30, Martin Lansler wrote:
>
> (Note: This is a repost from another thread which did not have a
> relevant subject, sorry for the spamming)
>
> Hi Eldad / All,
>
> On Wed, Aug 10, 2011 at 8:32 AM, Eldad Yamin  wrote:
>
> Can you please explain how you upgraded? Something like a step-by-step.
>
> Thanks!
>
>
> I took the liberty of replying to the group as it would be interesting
> to hear how other folks out there are doing it...
>
> I'm *not* running a prod system, just a test system of three nodes on
> my laptop. So it would be nice to hear about real setups. Here is my
> test setup:
>
> apache-cassandra -> apache-cassandra-0.8.3
> apache-cassandra-0.8.2/
> apache-cassandra-0.8.3/
> node1/
> node2/
> node3/
>
> All nodeX look like:
> bin -> ../apache-cassandra/bin/
> commitlog/
> conf/
> data/
> interface -> ../apache-cassandra/interface/
> lib -> ../apache-cassandra/lib/
> saved_caches/
>
> The 'conf' directory is copied into each node from the virgin
> cassandra distribution. I then create a local GIT repo and add the
> 'conf' directory so I can track any configuration changes on a node.
> Then relevant node specific configuration settings are set. The
> 'commitlog', 'data' and 'saved_caches' are created by cassandra and
> must be configured in 'cassandra.yaml' for each node.
>
> When I upgrade I do the following:
>
> 1. Make a diff of the conf files from the new version so that I get
> new parameters etc... I use emacs ediff-mode.
> 2. Remove the old "apache-cassandra" symlink and point it to the new
> cassandra dist.
> 3. In a rolling fashion stop one node, and then restart it... as the
> symlink has changed it will then boot with the upgraded cassandra dist.
> (remember to cd out & in of the bin/ dir, otherwise you will still be
> in the old directory).
> (4). Should something break... just re-create the old symlink and restart
> the node (provided cassandra has not performed any non backwards
> compatible changes to the db files; this should be noted in the README).
>
> That's pretty much it.
>
> On a prod setup one would probably use a tool such as puppet
> (www.puppetlabs.com/) to ease setting up on many nodes... But there
> are many ways to do this, for instance pssh
> (http://code.google.com/p/parallel-ssh/).
>
> Regards,
> -Martin
>
>
>
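
A minimal sketch of the symlink switch Martin describes, assuming a POSIX shell and an `ln` that accepts `-n` (GNU and BSD both do). The directory names come from his layout; the throwaway `DEMO_ROOT` directory and the empty `bin/` folders are stand-ins invented for illustration, and no Cassandra process is started or stopped here.

```shell
#!/bin/sh
# Illustration only: reproduces the directory layout from the message
# in a temporary directory and repoints the shared symlink.
set -eu

# DEMO_ROOT is a hypothetical scratch location, not part of the original setup.
DEMO_ROOT=$(mktemp -d)
cd "$DEMO_ROOT"

# Two unpacked distributions side by side, plus one node directory.
mkdir -p apache-cassandra-0.8.2/bin apache-cassandra-0.8.3/bin node1

# The shared symlink initially points at the old release.
ln -s apache-cassandra-0.8.2 apache-cassandra

# Each node reaches the binaries through the shared symlink.
ln -s ../apache-cassandra/bin node1/bin

# Step 2 of the upgrade: repoint the shared symlink.
# -f replaces the existing link; -n stops ln from following the old
# link and creating the new one inside the 0.8.2 directory.
ln -sfn apache-cassandra-0.8.3 apache-cassandra

# Show where the shared symlink now points.
readlink apache-cassandra

# Step (4), rollback, would be the same command with the old target:
#   ln -sfn apache-cassandra-0.8.2 apache-cassandra
```

Because every node's `bin` goes through the one shared link, a single `ln -sfn` changes what all nodes will run on their next restart, which is what makes the rolling restart in step 3 possible.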

Re: Cassandra London: failure modes and HBase

2011-08-17 Thread Eldad Yamin

Hi Dave,
Unfortunately, I and some others who are very interested won't be able to
get all the way to London.
Can you please consider using a video streaming service?

I recommend using Watchitoo.com (I used to work there).
At the moment it's free.

Thanks!

On Tue, Aug 16, 2011 at 12:47 PM, Dave Gardner  wrote:

> Hi all,
>
> I'm pleased to announce our next Cassandra meetup on 5th September in
> London.
>
> http://www.meetup.com/Cassandra-London/events/29668191/
>
> We will be looking at failure modes in Cassandra (how it deals with nodes
> failing and returning etc..) as well as a comparison with HBase.  It's a
> great opportunity to meet other users of Cassandra, so please come along!
>
>
> Dave
>
