Re: Cassandra 0.7.0 Release in Riptano public repository?

2011-02-12 Thread Rauan Maemirov
When I try `rpm -i riptano-release-5-1.el6.noarch.rpm`, it just freezes. Is the
repository working?

2011/1/12 Michael Fortin 

> Thanks for your thoughtful and detailed replies Eric, it's much
> appreciated.
>
> Mike
>
> On Jan 11, 2011, at 11:23 AM, Eric Evans wrote:
>
> > On Tue, 2011-01-11 at 09:23 -0500, Michael Fortin wrote:
> >> This is my understanding of 0.* releases:
> >> - They're not considered production-ready by the maintainers
> >> - They're subject to changes that break backwards compatibility
> >> - Generally poorly documented, because the API is so volatile
> >> - Previous releases are unsupported
> >>
> >> for 1.* releases
> >> - The maintainer is saying this is tested and production ready,
> >> sometimes also marked as Final for GA
> >> - Minor releases do not break backward compatibility
> >> - The major and minor release have some level of support, with open
> >> source, that usually means docs and mailing lists but they should be
> >> very active.
> >> - thoroughly documented
> >
> > FWIW, your interpretation of what it means to be 1.0, is not wholly
> > unique, but it's far from universal either.
> >
> >> Sorting through the issue tracker is a little too fine-grained to get a
> >> big-picture view of where Cassandra is going.
> >
> > Sorry, I should have been more clear here.
> >
> > The closest we have to a roadmap are the tickets that are marked as
> > blocking the next release, you shouldn't have to do any digging, they're
> > all available in one view here:
> >
> >
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310865&fixfor=12314820
> >
> > But, it's pretty fluid for the first few months after a new release.
> >
> >> And, just to be clear, I'm not questioning the maintainers' approach,
> >> just humbly asking for a little more clarification.  Cassandra is
> >> awesome, and I'm itching to use it on some production projects where I
> >> think it would be a great fit, but the 0.* designation scares me a
> >> little.  Of course, a hastily released 1.* would be worse.
> >
> > I understand, but what I'm saying is a "1.0" release in this context
> > carries special significance that just doesn't map well to open source
> > projects.  And, in addition to being subjective, your criteria differs
> > from that of many people.  It might make things easier to just version
> > some future release 1.0 and be done with it, but I'd rather be honest
> > with you.
> >
> > This is honest:
> >
> > * We treated the Google code dump in 2008 as 0.1.0 (though no formal
> > release was made).
> > * We likewise treated the Apache code dump in 2009 as 0.2.0 (again, no
> > formal release).
> > * We called the first release under the Apache Incubator 0.3.0.
> > * We just now released 0.7.0.
> > * We maintain backward compatibility between the "minor" and "revision",
> > that is 0.6.1, 0.6.2, 0.6.3, etc.
> >
> > This is why I said my preference would be to just drop the leading 0.
> > We've been using the minor like a major, and the revision like a minor,
> > (and we haven't had need for a revision).  We've had 7 major releases,
> > (5 if you only want to count what's happened under Apache).
> >
> > Also:
> >
> > * Most of the "maintainers" would tell you that it is production-ready,
> > but then, they might be biased since most of them are running it in
> > production. YMMV.
> > * It is as poorly documented as most FLOSS projects.
> > * We provide support through the issue tracker, mailing lists, and IRC,
> > and you can purchase support contracts through Riptano.
> >
> >
> > --
> > Eric Evans
> > eev...@rackspace.com
> >
>
>


Queries on secondary indexes

2011-02-18 Thread Rauan Maemirov
With this schema:

create column family Userstream with comparator=UTF8Type and rows_cached =
1 and keys_cached = 10
and column_metadata=[{column_name:account_id, validation_class:IntegerType,
index_type: 0, index_name:UserstreamAccountidIdx},
{column_name:from_id, validation_class:IntegerType, index_type: 0,
index_name:UserstreamFromidIdx},
{column_name:type, validation_class:IntegerType, index_type: 0,
index_name:UserstreamTypeIdx}];

I'm getting this:

[default@Keyspace1] get Userstream where from_id=5 and type<4;
---
RowKey: 23:feed:12980301937245
=> (column=account_id, value=23, timestamp=1298031252270173)
=> (column=activities,
value=5b2232313864333936302d336235362d313165302d393838302d666235613434333135343865225d,
timestamp=1298031252270173)
=> (column=from_id, value=5, timestamp=1298031252270173)
=> (column=type, value=5, timestamp=1298031252270173)
---
RowKey: 5:feed:12980301937196
=> (column=account_id, value=5, timestamp=1298031252270173)
=> (column=activities,
value=5b2232313863376339302d336235362d313165302d623536342d666235303739333835303234225d,
timestamp=1298031252270173)
=> (column=from_id, value=5, timestamp=1298031252270173)
=> (column=type, value=5, timestamp=1298031252270173)
---
RowKey: 9:feed:12980301937207
=> (column=account_id, value=9, timestamp=1298031252270173)
=> (column=activities,
value=5b2232313863613637302d336235362d313165302d39622d373530393638613764326561225d,
timestamp=1298031252270173)
=> (column=from_id, value=5, timestamp=1298031252270173)
=> (column=type, value=0, timestamp=1298031252270173)

3 Rows Returned.
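As an aside, the `activities` values in the CLI output above are just the raw column bytes printed as hex; decoding the first row's value shows it is a JSON-encoded array of activity UUIDs:

```python
# The CLI prints values it cannot interpret as raw hex bytes; this
# activities column actually holds a JSON-encoded list of activity ids.
raw = ("5b2232313864333936302d336235362d313165302d"
       "393838302d666235613434333135343865225d")
decoded = bytes.fromhex(raw).decode("ascii")
print(decoded)  # ["218d3960-3b56-11e0-9880-fb5a4431548e"]
```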


and

[default@Keyspace1] get Userstream where from_id=5 and type=5;

0 Row Returned.



What's wrong with it?


Re: Queries on secondary indexes

2011-02-19 Thread Rauan Maemirov
Hm, phpcassa works great, so I guess the problem is with the CLI. Now it
doesn't return any rows at all.
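For what it's worth, IntegerType column values are stored as minimal-length big-endian two's-complement bytes, which clients like phpcassa pack for you; a minimal sketch of that packing (`pack_integer` is a hypothetical helper name, shown only to illustrate the storage format):

```python
def pack_integer(value):
    """Serialize an int the way Cassandra's IntegerType stores it:
    minimal-length big-endian two's complement."""
    if value == 0:
        return b"\x00"
    # +8 bits leaves room for the sign bit; // 8 rounds to whole bytes
    length = (value.bit_length() + 8) // 8
    return value.to_bytes(length, "big", signed=True)

print(pack_integer(5).hex())  # 05 -- matches value=5 in the CLI output
```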



Data model for activity feed

2011-02-19 Thread Rauan Maemirov
Hi, with the help of the twissandra example, I tried to create a schema for
an activity feed.

The Activities CF stores all activities:

Activities: {
  '4d5fbfdf47878': {
'user': 'larry',
'activity_type': 'comment',
...
activity info
  }
}

And the Userfeed CF stores each user's feed, sorted by timestamp:

Userfeed: {
  'larry': {
1298120671317982: '4d5fbfdf47878',
1298120671392978: '4d5fbfdf5ff12',
1298120671417977: '4d5fbfdf660b9',
1298120671517972: '4d5fbfdf7e754',
1298120671542971: '4d5fbfdf848fb'
  }
}

That was simple. Now I want to group all similar activities within some small
time window (as Facebook and others do). The idea is to find the user's latest
activity of the same activity_type and merge it into the new activity (in
other words, merge the activities and update the timestamp). So it will be
something like:

Userfeed: {
  'larry': {
1298120671317982: '4d5fbfdf47878',
1298120671417977: '["4d5fbfdf660b9", "4d5fbfdf5ff12"]', //json encoded
array of activities
1298120671517972: '4d5fbfdf7e754',
1298120671542971: '["4d5fbfdf848fb", "4d5fbfdfd3e74"]'
  }
}
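The merge step described above can be sketched in Python (a plain dict stands in for the Userfeed row; `activity_type_of` stands in for a hypothetical lookup into the Activities CF; the 60-second window is an assumed value):

```python
import json

GROUP_WINDOW_US = 60 * 1_000_000  # assumed window: group within 60s (microseconds)

def merge_activity(feed, new_ts, new_id, activity_type_of):
    """feed maps timestamp -> bare activity id, or a JSON array of ids,
    mirroring the Userfeed row above.  Merge the new activity into a
    recent same-type entry, re-keyed at the new timestamp, or append
    it on its own."""
    new_type = activity_type_of(new_id)
    for ts in sorted(feed, reverse=True):            # walk newest-first
        if new_ts - ts > GROUP_WINDOW_US:
            break                                    # past the grouping window
        parsed = None
        try:
            parsed = json.loads(feed[ts])            # grouped entries are JSON arrays
        except ValueError:
            pass                                     # single entries are bare ids
        ids = parsed if isinstance(parsed, list) else [feed[ts]]
        if activity_type_of(ids[0]) == new_type:
            del feed[ts]                             # drop the old column and
            feed[new_ts] = json.dumps(ids + [new_id])  # re-insert at the new ts
            return feed
    feed[new_ts] = new_id                            # nothing to merge with
    return feed
```

Against Cassandra this would be a column delete plus insert on the Userfeed row; note the read-merge-write is racy if two writers hit the same feed at once.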

Is this okay? Maybe somebody has done this in a more elegant way.
Or should I add one more intermediate CF for grouped events?

The other problem is that this lets me say "Larry commented item1, item2
and item3", but how can "Larry, Curly and Hermes commented item1" be done?


Re: Data model for activity feed

2011-02-21 Thread Rauan Maemirov
Any advice? Maybe I should group events on the application side? Wouldn't
that be too much overhead?



Determining the issues of marking node down

2011-04-30 Thread Rauan Maemirov
I have a test cluster with 3 nodes; earlier I installed OpsCenter to watch
the cluster. Every day I see that the same node goes down (at a different
time, but every day). Then I just run `service cassandra start` to fix it.
system.log doesn't show me anything strange. What are the steps to diagnose
this? I didn't change the logging properties (and cassandra.yaml is not far
from the default), so maybe there are some options I should switch to debug?
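Assuming a stock 0.7 install, log verbosity is controlled by `conf/log4j-server.properties`; switching the root logger from INFO to DEBUG should surface more detail around the time the node drops (a sketch of the relevant line, assuming the default appender names):

```properties
# conf/log4j-server.properties -- change INFO to DEBUG on the root logger
log4j.rootLogger=DEBUG,stdout,R
```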

Btw, the node that goes down is the most loaded (in storage capacity). Maybe
the problem is in OPP?
Once I ran the loadbalance command and it changed the token for the first
node from 0 to one of the keys (without touching the other 2; I generated
the tokens with tokens.py).


Re: Poor performance; PHP & Thrift to blame

2010-03-30 Thread Rauan Maemirov
What about APC? Did you turn it on?
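For reference, enabling APC is a php.ini change (the exact ini file and extension path vary by distro; this assumes a pecl-built module):

```ini
; php.ini (or a conf.d snippet) -- enable the APC opcode cache
extension=apc.so
apc.enabled=1
```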

2010/3/30 Julian Simon :
> Hi,
>
> I've been trying to benchmark Cassandra for our use case and have been
> seeing poor performance on both writes and (extremely) poor
> performance on reads.
>
> Using Cassandra 0.5.1 stable & thrift-0.2.0.
>
> It turns out all the CPU time is going to the PHP client process - the
> JVM operating the Cassandra server isn't breaking much of a sweat.
>
> For reads the latency is often up to 1 second to fetch a row
> containing ~2000 columns, or around 300ms to fetch a 500-column wide
> row.  This is with get_slice(), and a predicate specifying the start &
> finish range.
>
> Using cachegrind and inspecting the code inside the Thrift bindings
> makes it pretty clear why the performance is so bad, particularly on
> reads. The biggest culprit is the translation code which casts data
> back and forth into binary representations for sending over the wire
> to the Cassandra server.
>
> There seems to be some 32-bit specific code which iterates heavily,
> apparently due to a limitation in PHP's implementation of LONGs.
>
> However, testing on a 64-bit host doesn't yield any performance improvement.
>
> More surprisingly, if I compile and enable the PHP native thrift
> bindings (following this guide
> https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP)
> read performance actually degrades by another 50%.  I have verified
> that the Thrift code is recognizing and using the native PHP functions
> provided by the library.
>
> I've tested all of this on both 32-bit and 64-bit installations of
> both PHP 5.1 & 5.2.  Results are the same in all cases.
>
> My environment is on vanilla CentOS 5.4 server installations inside
> VMWare on a 4 core 64bit host with plenty of RAM and fast disks.
>
> Has anyone been able to produce decent performance with PHP &
> Cassandra?  If so, how have you done it?
>
> Thanks,
> Jules
>