Re: New User: OSX vs. Debian on Cassandra 0.5.0 with Thrift

2010-04-14 Thread Zhiguo Zhang
Hi,

sorry, I can't help you with this, but could you please tell me how you
produced the charts in the attachment?
Thanks.

Mike

On Wed, Apr 14, 2010 at 6:38 AM, Heath Oderman  wrote:

> Hi,
>
> I wrote a few days ago and got a few good suggestions.  I'm still seeing
> dramatic differences between Cassandra 0.5.0 on OSX vs. Debian Linux.
>
> I've tried on Debian with the Sun JRE and the Open JDK with nearly
> identical results. I've tried a mix of hardware.
>
> Attached are some graphs I've produced of my results which show that in
> OSX, Cassandra takes longer with a greater load but is wicked fast
> (expected).
>
> In the SunJDK or Open JDK on Debian I get amazingly consistent time taken
> to do the writes, regardless of the load and the times are always
> ridiculously high.  It's insanely slow.
>
> I genuinely believe that I must be doing something very wrong in my Debian
> setups, but they are all vanilla installs, both 64 bit and 32 bit machines,
> 64bit and 32 bit installs.  Cassandra packs taken from
> http://www.apache.org/dist/cassandra/debian.
>
> I am using Thrift, and I'm using a c# client because that's how I intend to
> actually use Cassandra and it seems pretty sensible.
>
> An example of what I'm seeing is:
>
> 5 Threads Each writing 100,000 Simple Entries
> OSX: 1 min 16 seconds ~ 6515 Entries / second
> Debian: 1 hour 15 seconds ~ 138 Entries / second
>
> 15 Threads Each writing 100,000 Simple Entries
> OSX: 2 min 30 seconds ~ 10,000 Entries / second
> Debian: 1 hour 1.5 minutes ~406 Entries / second
>
> 20 Threads Each Writing 100,000 Simple Entries
> OSX: 3min 19 seconds ~ 10,050 Entries / second
> Debian: 1 hour 20 seconds ~ 492 Entries / second
>
> If anyone has any suggestions or pointers I'd be glad to hear them.
> Thanks,
> Stu
>
> Attached:
> 1. CassLoadTesting.ods (all my results and graphs in OpenOffice format
> downloaded from Google Docs)
> 2. OSX Records per Second - a graph of how many entries get written per
> second for 10,000 & 100,000 entries as thread count is increased in OSX.
> 3. Open JDK Records per Second - the same graph but of Open JDK on Debian
> 4. Open JDK Total Time By Thread - the total time taken from test start to
> finish (all threads completed) to write 10,000 & 100,000 entries as thread
> count is increased in Debian with Open JDK
> 5. OSX Total time by Thread - same as 4, but for OSX.
>
>
>
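[Editorial note: for anyone trying to reproduce measurements like the ones above, a minimal multi-threaded write-timing harness can be sketched in Python. The `write_entry` stub below stands in for a real Thrift insert against a live cluster; all names are illustrative and not taken from the original c# client:]

```python
import threading
import time

def write_entry(key, value):
    """Stub standing in for a real client insert call."""
    pass  # replace with a Thrift insert against a live cluster

def run_load_test(num_threads, entries_per_thread):
    """Run num_threads concurrent writers and report aggregate throughput."""
    def worker(thread_id):
        for i in range(entries_per_thread):
            write_entry("key-%d-%d" % (thread_id, i), "value")

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # total time is measured from start to last thread finished
    elapsed = time.time() - start
    total = num_threads * entries_per_thread
    rate = total / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, rate

if __name__ == "__main__":
    total, elapsed, rate = run_load_test(5, 1000)
    print("%d entries in %.2fs (%.0f entries/s)" % (total, elapsed, rate))
```

Measuring from test start to the last thread's completion, as above, matches how the per-second figures in the thread appear to be computed (total entries divided by wall-clock time).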


Re: History values

2010-04-14 Thread Zhiguo Zhang
I think it is still too young; you'll have to wait, or write the
"graphical console" yourself. At least, I haven't found one so far.

On Wed, Apr 14, 2010 at 10:04 AM, Bertil Chapuis  wrote:

> I'm also new to Cassandra and had roughly the same question: I wondered
> whether using super columns with one key per version would be feasible. Are
> there limitations to this use case (or better practices)?
>
> Thank you and best regards,
>
> Bertil Chapuis
>
> On 14 April 2010 09:45, Sylvain Lebresne  wrote:
>
>> > I am new to using Cassandra. From the documentation I have read, I
>> > understand that, as in other non-documentary databases, when the value
>> > of a key-value tuple is updated, the new value is stored with a
>> > different timestamp, without the old value being lost entirely.
>> > I wonder how I can restore the historic values that a particular field
>> > has had.
>>
>> You can't. Upon update, the old value is lost.
>> From a technical standpoint, it is true that this old value is not deleted
>> from disk right away, but it is deleted eventually by compaction (and you
>> don't really control when the compactions occur).
>>
>> --
>> Sylvain
>>
>
>
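[Editorial note: if version history is really needed, one common workaround, not discussed in the thread itself, is to keep each version as its own column, named by its write timestamp, rather than overwriting a single column. A toy Python sketch of the idea follows; the `VersionedRow` class and an in-memory dict stand in for a real Cassandra row, so all names here are illustrative only:]

```python
import bisect

class VersionedRow:
    """Keeps every version of a value as a separate (timestamp, value)
    column, mimicking 'one column per version' inside a single row."""
    def __init__(self):
        self.timestamps = []  # sorted column names (timestamps)
        self.values = {}      # timestamp -> value

    def put(self, timestamp, value):
        # Each write creates a new column instead of overwriting.
        if timestamp not in self.values:
            bisect.insort(self.timestamps, timestamp)
        self.values[timestamp] = value

    def latest(self):
        # Most recent version = column with the highest timestamp.
        ts = self.timestamps[-1]
        return ts, self.values[ts]

    def history(self):
        # Full history, oldest first.
        return [(ts, self.values[ts]) for ts in self.timestamps]

row = VersionedRow()
row.put(1271248215, "v1")
row.put(1271248220, "v2")
print(row.latest())   # the most recent (timestamp, value) pair
print(row.history())  # all versions, oldest first
```

With real Cassandra columns sorted by name, the same reads map onto a reversed slice of one column (latest) and a full row slice (history); old versions are then never overwritten, so compaction does not discard them.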


Re: Time-series data model

2010-04-14 Thread Zhiguo Zhang
First of all, I am a newbie to NoSQL. I'll try to write down my opinion
for reference:

If I were you, I would use 2 column families:

1. CF, keyed by device
2. CF, keyed by TimeUUID

What do you think about that?

Mike


On Wed, Apr 14, 2010 at 3:02 PM, Jean-Pierre Bergamin wrote:

> Hello everyone
>
> We are currently evaluating a new DB system (replacing MySQL) to store
> massive amounts of time-series data. The data are various metrics from
> various network and IT devices and systems. Metrics could be, for example,
> of the server "xy" in percent, memory usage of server "xy" in MB, ping
> response time of server "foo" in milliseconds, network traffic of router
> "bar" in MB/s and so on. Different metrics can be collected for different
> devices in different intervals.
>
> The metrics are stored together with a timestamp. The queries we want to
> perform are:
>  * The last value of a specific metric of a device
>  * The values of a specific metric of a device between two timestamps t1
>    and t2
>
> I stumbled across this blog post which describes a very similar setup with
> Cassandra:
> https://www.cloudkick.com/blog/2010/mar/02/4_months_with_cassandra/
> This post gave me confidence that what we want is definitively doable with
> Cassandra.
>
> But since I'm just digging into columns and super-columns and their
> families, I still have some problems understanding everything.
>
> Our data model could look in json'isch notation like this:
> {
> "my_server_1": {
>"cpu_usage": {
>{ts: 1271248215, value: 87 },
>{ts: 1271248220, value: 34 },
>{ts: 1271248225, value: 23 },
>{ts: 1271248230, value: 49 }
>}
>"ping_response": {
>{ts: 1271248201, value: 0.345 },
>{ts: 1271248211, value: 0.423 },
>{ts: 1271248221, value: 0.311 },
>{ts: 1271248232, value: 0.582 }
>}
> }
>
> "my_server_2": {
>"cpu_usage": {
>{ts: 1271248215, value: 23 },
>...
>}
>"disk_usage": {
>{ts: 1271243451, value: 123445 },
>...
>}
> }
>
> "my_router_1": {
>"bytes_in": {
>{ts: 1271243451, value: 2452346 },
>...
>}
>"bytes_out": {
>{ts: 1271243451, value: 13468 },
>...
>}
>"errors": {
>{ts: 1271243451, value: 24 },
>...
>}
> }
> }
>
> What I don't get is how to create the two-level hierarchy
> [device][metric].
>
> Am I right that the devices would be kept in a super column family? The
> ordering of those is not important.
>
> But the metrics per device are also a super column, where the columns would
> be the metric values ({ts: 1271243451, value: 24 }), aren't they?
>
> So I'd need a super column in a super column... Hm.
> My brain is definitively RDBMS-damaged and I don't see through columns and
> super-columns yet. :-)
>
> How could this be modeled in Cassandra?
>
>
> Thank you very much
> James
>
>
>
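[Editorial note: one way to avoid nesting a super column inside a super column, sketched here under the assumption that a composite row key such as "device:metric" is acceptable, is to use a single column family with one row per device/metric pair and the sample timestamp as the column name. The toy Python model below mirrors that layout with in-memory dicts; the class and method names are illustrative, not part of any Cassandra API:]

```python
import bisect

class MetricStore:
    """Row key = 'device:metric', column name = timestamp, column value =
    the sample. Mimics one column family with columns ordered by name."""
    def __init__(self):
        self.rows = {}  # row key -> (sorted timestamps, {ts: value})

    def insert(self, device, metric, ts, value):
        key = "%s:%s" % (device, metric)
        timestamps, values = self.rows.setdefault(key, ([], {}))
        if ts not in values:
            bisect.insort(timestamps, ts)
        values[ts] = value

    def last_value(self, device, metric):
        # Query 1: the most recent sample of a metric for a device.
        timestamps, values = self.rows["%s:%s" % (device, metric)]
        return values[timestamps[-1]]

    def range(self, device, metric, t1, t2):
        # Query 2: all samples between two timestamps, inclusive.
        timestamps, values = self.rows["%s:%s" % (device, metric)]
        lo = bisect.bisect_left(timestamps, t1)
        hi = bisect.bisect_right(timestamps, t2)
        return [(ts, values[ts]) for ts in timestamps[lo:hi]]

store = MetricStore()
store.insert("my_server_1", "cpu_usage", 1271248215, 87)
store.insert("my_server_1", "cpu_usage", 1271248220, 34)
store.insert("my_server_1", "cpu_usage", 1271248225, 23)
print(store.last_value("my_server_1", "cpu_usage"))  # 23
print(store.range("my_server_1", "cpu_usage", 1271248215, 1271248220))
```

In Cassandra terms, "last value" becomes a reversed slice of one column and "range between t1 and t2" becomes an ordinary column slice, since columns within a row are kept sorted by name.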


Re: PHP client crashed if a column value > 8192 bytes

2010-04-22 Thread Zhiguo Zhang
Maybe you should also send this message to the Thrift user mailing list?


On Thu, Apr 22, 2010 at 6:34 AM, Ken Sandney  wrote:

> After many attempts I found this error only occurs when using the PHP
> thrift_protocol extension. I don't know if there are parameters I could
> adjust for this issue. By the way, without the extension the speed is
> obviously slower.
>
>
> On Thu, Apr 22, 2010 at 12:01 PM, Ken Sandney  wrote:
>
>> I am using PHP as client to talk to Cassandra server but I found out if
>> any column value > 8192 bytes, the client crashed with the following error:
>>
>> PHP Fatal error:  Uncaught exception 'TException' with message 'TSocket:
>>> timed out reading 1024 bytes from 10.0.0.177:9160' in
>>> /home/phpcassa/include/thrift/transport/TSocket.php:264
>>> Stack trace:
>>> #0 /home/phpcassa/include/thrift/transport/TBufferedTransport.php(126):
>>> TSocket->read(1024)
>>> #1 [internal function]: TBufferedTransport->read(8192)
>>> #2 /home/phpcassa/include/thrift/packages/cassandra/Cassandra.php(642):
>>> thrift_protocol_read_binary(Object(TBinaryProtocolAccelerated),
>>> 'cassandra_Cassa...', false)
>>> #3 /home/phpcassa/include/thrift/packages/cassandra/Cassandra.php(615):
>>> CassandraClient->recv_batch_insert()
>>> #4 /home/phpcassa/include/phpcassa.php(197):
>>> CassandraClient->batch_insert('Keyspace1', '38246', Array, 1)
>>> #5 /home/phpcassa/test1.php(51): CassandraCF->insert('38246', Array)
>>> #6 {main}
>>>   thrown in /home/phpcassa/include/thrift/transport/TSocket.php on line
>>> 264
>>>
>>
>> Any idea about this?
>>
>
>


Re: New user asking for advice on database design

2010-04-22 Thread Zhiguo Zhang
Have you read the article "WTF is a SuperColumn? An Intro to the Cassandra
Data Model"?
link: http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

It is a good article on the data model.


On Thu, Apr 22, 2010 at 10:38 AM, Yésica Rey  wrote:

> Hi David,
>
> I think your architecture is right. I'm also new to Cassandra, and I've
> designed my database similarly to yours.
> I also think that separating the data and the indexes makes the queries
> more efficient.
>
> I hadn't considered your question about putting them in separate keyspaces,
> but I would also appreciate any suggestions.
>
> Yess
>
David Boxenhorn wrote:
>
>  Hi guys! I'm brand new to Cassandra, and I'm working on a database
>> design. I don't necessarily know all the advantages/limitations of
>> Cassandra, so I'm not sure that I'm doing it right...
>>  It seems to me that I can divide my database into two parts:
>>  1. The (mostly) normal data, where every piece of data appears only once
>> (I say "mostly" because I think I need reverse indexes for delete... and
>> once it's there, other things).
>>  2. The indexes, which I use for queries.
>>  Questions:
>>  1. Is the above a good architecture?
>> 2. Would there be an advantage to putting the two parts of the database in
>> different keyspaces? I expect the indexes to change every once in a while as
>> my querying needs progress, but the normal database won't change unless I
>> made a mistake.
>>  Any other advice?
>>
>
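[Editorial note: the split David describes, a data column family plus index column families, with a reverse index so that deletes can clean up their index entries, can be sketched in Python with plain dicts standing in for column families. All names here are illustrative, not from any client library:]

```python
class IndexedStore:
    """Data CF holds the records; the index CF maps an indexed field value
    to row keys; the reverse index maps each row key back to its index
    entries so a delete can remove them."""
    def __init__(self):
        self.data = {}     # row key -> record dict
        self.index = {}    # (field, value) -> set of row keys
        self.reverse = {}  # row key -> set of (field, value) entries

    def put(self, key, record, indexed_fields):
        self.data[key] = record
        for field in indexed_fields:
            entry = (field, record[field])
            self.index.setdefault(entry, set()).add(key)
            self.reverse.setdefault(key, set()).add(entry)

    def query(self, field, value):
        # Look up row keys via the index CF instead of scanning the data.
        return sorted(self.index.get((field, value), set()))

    def delete(self, key):
        # The reverse index tells us exactly which index entries to clean.
        del self.data[key]
        for entry in self.reverse.pop(key, set()):
            self.index[entry].discard(key)

store = IndexedStore()
store.put("u1", {"name": "ann", "city": "geneva"}, ["city"])
store.put("u2", {"name": "bob", "city": "geneva"}, ["city"])
print(store.query("city", "geneva"))  # ['u1', 'u2']
store.delete("u1")
print(store.query("city", "geneva"))  # ['u2']
```

Keeping the index structures separate from the data, as in this sketch, also makes it cheap to rebuild or reshape the indexes later without touching the primary records, which speaks to the question of whether a separate keyspace for them is worthwhile.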