thrift/cassandra stacktrace....

2012-07-31 Thread Hiller, Dean
1. I would think thrift would give better errors, I have no idea where to look(thrift or cassandra?). Anyone have any idea how to figure this out? (I was hoping thrift would give astyanax a better error than it did :( )…. 2. Anyone know if I am feeding my composite column info incorrectly belo

Re: composite table with cassandra without using cql3?

2012-08-02 Thread Hiller, Dean
For how to do it with astyanax, you can see here... Lines 310 and 335 https://github.com/deanhiller/nosqlORM/blob/indexing/input/javasrc/com/alvazan/orm/layer3/spi/db/cassandra/CassandraSession.java For how to do it with thrift, you could look at astyanax. I use it on that project for indexing f

Re: Use of SSD for commitlog

2012-08-08 Thread Hiller, Dean
Probably not since it is sequential writes….(ie. Seek performance is the big hit and if it is sequential it should not be seeking and is about just as fast as an SSD in theory). In practice, I have not measured the performance of one vs. the other though…that is always the best way to go.(you cou

Re: Use of SSD for commitlog

2012-08-08 Thread Hiller, Dean
your reply maybe I use a 15k RPM SCSI Disk, I think it'll perform better than a SSD disk. On Wed, Aug 8, 2012 at 10:01 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote: Probably not since it is sequential writes….(ie. Seek performance is the big hit and if it is sequential it

anyone have any performance numbers? and here are some perf numbers of my own...

2012-08-10 Thread Hiller, Dean
** 3. In my test below, I see there is now 8Gig of data and 9,000,000 rows. Does that sound right? Nearly 1MB of space is used per row for a 50 column row. That sounds like a huge amount of overhead. (my values are long on every column, but that is still not much). I was expecting KB
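
A quick check of the figures quoted above, as a minimal sketch: it assumes "8Gig" means 8 x 10^9 bytes and that the 50 columns share the space evenly, neither of which the post confirms. On those assumptions the footprint works out closer to 1 KB per row than 1 MB.

```python
# Figures taken from the post; the byte interpretation of "8Gig" is an assumption.
TOTAL_BYTES = 8 * 10**9
ROW_COUNT = 9_000_000
COLUMNS_PER_ROW = 50


def bytes_per_row(total_bytes, rows):
    """Average on-disk footprint of one row."""
    return total_bytes / rows


per_row = bytes_per_row(TOTAL_BYTES, ROW_COUNT)
per_column = per_row / COLUMNS_PER_ROW

print(f"{per_row:.0f} bytes per row, {per_column:.1f} bytes per column")
# prints: 889 bytes per row, 17.8 bytes per column
```

Reading "8Gig" as 8 GiB instead changes the result only slightly (about 954 bytes per row).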

Re: anyone have any performance numbers? and here are some perf numbers of my own...

2012-08-10 Thread Hiller, Dean
PM, "Hiller, Dean" wrote: >** 3. In my test below, I see there is now 8Gig of data and 9,000,000 >rows. Does that sound right?, nearly 1MB of space is used per row for a >50 column row That sounds like a huge amount of overhead. (my values >are long on every column,

indexing question related to playOrm on github

2012-08-15 Thread Hiller, Dean
1. Can playOrm be listed on cassandra's list of ORMs? It supports a JQL/HQL query on a trillion rows in under 100ms (partitioning is the trick so you can JQL a partition) 2. Many applications have a common indexing problem and I was wondering if cassandra has or could have any support for t

Re: indexing question related to playOrm on github

2012-08-16 Thread Hiller, Dean
which is not going to happen. Internally when Cassandra updates a secondary index it does the same thing. But it synchronises updates around the same row so one thread will apply the changes at a time. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.

Re: indexing question related to playOrm on github

2012-08-16 Thread Hiller, Dean
Maybe this would be a special type of column family that could contain these as my other tables definitely don't want the feature below by the way. Dean On 8/16/12 6:29 AM, "Hiller, Dean" wrote: >Yes, the synch may work, and no, I do "not" want a transaction

Re: indexing question related to playOrm on github

2012-08-17 Thread Hiller, Dean
saw the error. Hope that helps. - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 17/08/2012, at 12:47 AM, "Hiller, Dean" <dean.hil...@nrel.gov> wrote: Maybe this would be a special type of column family that could contain thes

Re: Why so slow?

2012-08-20 Thread Hiller, Dean
IF one has 1ms delay per request and the other has .001, 1000 requests will be a one second delay tacked on (which is huge). This is why he suggested multi-threaded ;). Maybe there are some other factors as well. Dean From: Peter Morris <mrpmor...@gmail.com> Reply-To: "user@cassandra.ap
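
The arithmetic behind that point can be sketched as follows. This is a simplified model, not a benchmark: it assumes every request costs the same fixed latency and that requests divide evenly across threads, and the function name is made up for illustration.

```python
import math


def total_wait_seconds(requests, latency_seconds, threads=1):
    """Latency paid in total when requests are issued one after another per thread."""
    requests_per_thread = math.ceil(requests / threads)
    return requests_per_thread * latency_seconds


# 1000 requests at 1 ms each on a single thread: about one second of pure waiting.
single_threaded = total_wait_seconds(1000, 0.001)

# The same 1000 requests at 0.001 ms (1 microsecond) each: about one millisecond.
low_latency = total_wait_seconds(1000, 0.000001)

# Ten threads overlap the 1 ms waits, so each thread only pays for 100 of them.
ten_threads = total_wait_seconds(1000, 0.001, threads=10)

print(single_threaded, low_latency, ten_threads)
```

The per-request latency does not change with more threads; only the number of waits stacked end to end does, which is why throughput and latency have to be looked at separately.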

Re: What is the ideal server-side technology stack to use with Cassandra?

2012-08-20 Thread Hiller, Dean
As far as opinions go, the stack we are using is Playframework 1.2.5 (the stateless nature rocks compared to other platforms like tomcat or servlet container stuff). playOrm Astyanax Later, Dean On 8/17/12 11:54 AM, "Aaron Turner" wrote: >My stack: > >Java + JRuby + Rails + Torquebox > >I'm us

(new nosqlOrm link) composite table with cassandra without using cql3?

2012-08-20 Thread Hiller, Dean
composite table with cassandra without using cql3? Hi Dean, I'm interested in this too, but I get a 404 with the link below, looks like I can't see your nosqlORM project. -Ben On Thu, Aug 2, 2012 at 9:04 AM, Hiller, Dean <dean.hil...@nrel.gov> wrote: For ho

Re: Why so slow?

2012-08-20 Thread Hiller, Dean
ve that connecting through a 1Gbps network cable is 14 times slower. I think I get a higher insert rate for SQL Server. On Mon, Aug 20, 2012 at 1:20 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote: IF one has 1ms delay per request and the other has .001, 1000 requests will be a on

Re: Why so slow?

2012-08-20 Thread Hiller, Dean
nks for correcting me! On Mon, Aug 20, 2012 at 4:32 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote: There is latency and throughput. These are two totally different things even for MySQL. If you are single threaded, each request (even with MySql) has to be delayed by 1ms or whatev

Re: nodetool , localhost connection refused

2012-08-20 Thread Hiller, Dean
My guess is "telnet localhost 7199" also fails? And if you are on linux and run netstat -anp, you will see no one is listening on that port? So database node did not start and bind to that port and you would see exception in the logs of that database node….just a guess. Dean On 8/20/12 4:10 PM,
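
The same check can be scripted instead of using telnet. This is a minimal sketch using only the standard library; the helper name `port_is_open` is made up for illustration, and 7199 is the JMX port named in the message.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" as well as timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    if port_is_open("localhost", 7199):
        print("something is listening on 7199")
    else:
        print("nothing is listening on 7199; check the node's logs for a startup exception")
```

A False result only says that nothing accepted the connection; it does not say why the node failed to bind, so the logs are still the next place to look.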

Re: Cluster per Application vs. Multi-Application Clusters

2012-08-22 Thread Hiller, Dean
Just an opinion here as we are having to do this ourselves loading tons of researchers' datasets into one cluster. We are going the path of one keyspace as it makes it easier if you ever want to mine the data so you don't have to keep building different clients for another keyspace. We ended u

Re: Cluster per Application vs. Multi-Application Clusters

2012-08-22 Thread Hiller, Dean
der, etc, etc that is not >the case with C* so getting all data into the same physical "system" >is not as important. > > > >On Wed, Aug 22, 2012 at 8:25 AM, Hiller, Dean >wrote: >> Just an opinion here as we are having to do this ourselves loading tons >>of

Re: Cassandra API Library.

2012-08-23 Thread Hiller, Dean
playOrm has a raw layer for when your columns are not defined ahead of time, and SQL with no limitations on <, =, <=, etc. etc. as well as joins being added shortly BUT joins are for joining partitions so that your system can still scale to infinity. It also has an in-memory database for unit t

new type of join just discovered on cassandra

2012-08-23 Thread Hiller, Dean
With playOrm we have been researching partitioning and joining partitions for OLTP applications which you typically partition per client anyways such that you can have infinite clients. Naturally, we have been looking at a lot of nested loop join, block nested loop join, sort merge join, and ha

Re: Cassandra API Library.

2012-08-23 Thread Hiller, Dean
email and any >attachments and destroy any copies thereof. Any review, retransmission, >dissemination, copying or other use of, or taking any action in reliance >upon, this information by persons or entities other than the intended >recipient is strictly prohibited. > > >

can you use hostnames in the topology file?

2012-08-27 Thread Hiller, Dean
In the example, I see all ips being used, but our machines are on dhcp so I would prefer using hostnames for everything(plus if a machine goes down, I can bring it back online on another machine with a different ip but same hostname). If I use hostname, does the listen_address have to be hardwir

Re: JMX(RMI) dynamic port allocation problem still exists?

2012-08-27 Thread Hiller, Dean
In cassandra-env.sh, search on JMX_PORT and it is set to 7199 (ie. Fixed) so that solves your issue, correct? Dean From: Yang <tedd...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Monday, August 27, 2012 3

Re: Cassandra 1.1.4 RPM required

2012-08-28 Thread Hiller, Dean
You are probably inside a company and the company has a proxy which is doing basic auth is my guess…try your company username/password or do it from home. Dean From: aaron morton <aa...@thelastpickle.com> Reply-To: "user@cassandra.apache.org" mailto:use

keyspace and column family creation…how to use ConsistencyLevel.ALL with creation?

2012-08-29 Thread Hiller, Dean
The playOrm test suite drops the keyspace and recreates it for tests to wipe out the in-memory or cassandra db. Today, we successfully ran our test suite on a 6 node cluster. The one issue I had though was I needed to sleep after keyspace creation and column family creation. BEFORE that I res

Re: Why Cassandra secondary indexes are so slow on just 350k rows?

2012-08-30 Thread Hiller, Dean
It seems to me you may want to revisit the design(but not 100% sure as I am not sure I understand the entire context) a bit as I could see having partitions and a few clients that poll in each partition so you can scale to infinity basically with no issues. If you are doing all this polling fro

Re: Cassandra and Apache Drill

2012-09-04 Thread Hiller, Dean
Many queries on a small portion of the data….sounds like playORM ;). As long as you partition your data with playOrm, you can do really fast queries into that data by partition using Scalable SQL (SQL with the addition of a partition clause in front as to what partitions you are querying). Joins

anyone know how to lookup non-contiguous columns BUT for prefixes?

2012-09-04 Thread Hiller, Dean
I have a row that is an index like so Index row -> ., ., ., . , ., ., . I would like to get all of the pks for which are pk32 and pk7 And which are pk54 This is a trimmed down example of course. I am thinking maybe I might just use the astyanax async to send out 500 requests instead. T

Re: are asynchronous schema updates possible ?

2012-09-04 Thread Hiller, Dean
+1 What kinds of problems? Thanks, Dean From: Илья Шипицин <chipits...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Tuesday, September 4, 2012 1:12 PM To: "user@cassandra.apache.org

playOrm now supports N-level joins on cassandra (no limitations on where clause)

2012-09-04 Thread Hiller, Dean
There are no = or < limitations. Joins are in beta and can currently only do inner joins….Also, queries return a Cursor so you can page as well and keep the cursor in a web server session if needed for paging. It also looks like joins may be faster with cassandra/playOrm vs. DBMS/h

playOrm S-SQL comparison with CQL

2012-09-06 Thread Hiller, Dean
Someone asked so I wrote the difference here https://github.com/deanhiller/playorm/wiki/Fast-Scalable-Queries playOrm queries are geared for a different problem than CQL is geared for. Summary is basically playOrm uses significantly less resources as it only queries the partitions it is interes

cassandra performance looking great...

2012-09-07 Thread Hiller, Dean
So we wrote 1,000,000 rows into cassandra and ran a simple S-SQL(Scalable SQL) query of PARTITIONS n(:partition) SELECT n FROM TABLE as n WHERE n.numShares >= :low and n.pricePerShare >= :price It ran in 60ms So basically playOrm is going to support millions of rows per partition. This is g

Re: cassandra performance looking great...

2012-09-07 Thread Hiller, Dean
: >Try to get Cassandra running the TPH-C benchmarks and beat oracle :) > >On Fri, Sep 7, 2012 at 10:01 AM, Hiller, Dean >wrote: >> So we wrote 1,000,000 rows into cassandra and ran a simple >>S-SQL(Scalable SQL) query of >> >> >> PARTITIONS n(:partition)

any way to prefer just 3 column families for partial row caching

2012-09-10 Thread Hiller, Dean
We have 3 tables for all indexing we do called IntegerIndexing DecimalIndexing StringIndexing playOrm would prefer that only these rows are cached as every row in those tables are indices. Customers/Clients of playOrm tend to always hit the same index rows over and over as they are using the ap

thoughts on this feature request

2012-09-12 Thread Hiller, Dean
Using wide rows for indexing is extremely common. I was wondering if we could get some type of command like so for index rows Remove . AND Add . such that if . is NOT found, the whole row will be scanned for . and remove that value instead. This would rock for all those people using wide rows

Re: Data Model

2012-09-14 Thread Hiller, Dean
playOrm uses EXACTLY that pattern where @OneToMany becomes student.rowkeyStudent1 student.rowkeyStudent2 and the other fields are fixed. It is a common pattern in noSQL. Dean From: aaron morton <aa...@thelastpickle.com> Reply-To: "user@cassandra.apache.org

Re: Composite Column Query Modeling

2012-09-14 Thread Hiller, Dean
There is another trick here. On the playOrm open source project, we need to do a sparse query for a join and so we send out 100 async requests and cache up the java "Future" objects and return the first needed result back without waiting for the others. With the S-SQL in playOrm, we have the IN
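
The pattern described here (send every request at once, keep the futures, block only on the result needed next) can be sketched like this. It is an illustration in Python rather than playOrm's actual Java code; `fetch_row` is a hypothetical stand-in for a remote read, and the pool size is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_row(key):
    """Hypothetical stand-in for one remote read; a real client call would go here."""
    return {"key": key, "value": key * 2}


def submit_all(pool, keys):
    """Fire off every request immediately and keep the Future objects in request order."""
    return [pool.submit(fetch_row, key) for key in keys]


def results_in_order(futures):
    """Yield results lazily: each step blocks only on the one Future it needs next."""
    for future in futures:
        yield future.result()


with ThreadPoolExecutor(max_workers=16) as pool:
    futures = submit_all(pool, range(100))
    cursor = results_in_order(futures)
    first = next(cursor)   # returns as soon as request 0 completes
    second = next(cursor)  # the other 98 requests keep running in the background

print(first, second)
```

The caller sees the first row after roughly one round trip instead of one hundred, while the remaining responses arrive in the background.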

Re: Astyanax - build

2012-09-14 Thread Hiller, Dean
I didn't need to compile it. It is up in the maven repositories as well: http://mvnrepository.com/artifact/com.netflix.astyanax/astyanax Or are you trying to see how it works? (We use the same client on playORM open source project…it works like a charm). Dean On 9/14/12 10:28 AM, "A J" wrote:

Re: Is Cassandra right for me?

2012-09-18 Thread Hiller, Dean
I wanted to clarify where that statement on wide rows comes from …. Realize some people make the claim that if you don't have 1000's of columns in "some" rows in cassandra you are doing something wrong. This is not true, BUT it comes from the fact that people are setting up indexes. This

Re: Is Cassandra right for me?

2012-09-18 Thread Hiller, Dean
Until Aaron replies, here are my thoughts on the relational piece… If everything in my model fits into a relational database, if my data is structured, would it still be a good idea to use Cassandra? Why? The playOrm project explores exactly this issue……A query on 1,000,000 rows in a

Re: Is Cassandra right for me?

2012-09-18 Thread Hiller, Dean
Cassandra is fully aware of all tables created with playOrm and you can still use DataStax enterprise features to get real time analytics. playOrm is a layer on top of cassandra and as with any layer it makes a developer more productive at a slight cost of performance just like hibernate on top o

Re: Is Cassandra right for me?

2012-09-18 Thread Hiller, Dean
today is it wouldn't be aware of the tables I create with playOrm, just of the column families this framework uses to store the data, right? 2012/9/18 Hiller, Dean <dean.hil...@nrel.gov> Until Aaron replies, here are my thoughts on the relational piece… If everythin

Re: Data Model - Consistency question

2012-09-19 Thread Hiller, Dean
Yes, this scenario can occur(even with quorum writes/reads as you are dealing with different rows) as one write may be complete and the other not while someone else is reading from the cluster. Generally though, you can do read repair when you read it in ;). Ie. See if things are inconsistent

Re: Data Model - Consistency question

2012-09-19 Thread Hiller, Dean
These students are saved one per column in the courses row } We sometimes do this with playOrm and don't even bother with the S-SQL it has which also means you don't need to worry about partitioning in that case. Later, Dean On 9/19/12 6:46 AM, "Hiller, Dean" wrote: >

higher layer library makes things faster?

2012-09-19 Thread Hiller, Dean
So there is this interesting case where a higher layer library makes things faster. This is counter-intuitive as every abstraction usually makes things slower with an increase in productivity. It would be cool if more and more libraries supported something to help with this scenario I think.

Re: higher layer library makes things faster?

2012-09-19 Thread Hiller, Dean
59 AM, "jef...@gmail.com" wrote: >Actually its not uncommon at all. Any caching implemented on a higher >level will generally improve speed at a cost in memory. > >Beware common wisdom, its seldom very wise >Sent from my Verizon Wireless BlackBerry > >-Origin

Re: Correct model

2012-09-19 Thread Hiller, Dean
Thinking out loud and I think a bit towards playOrm's model though you don't need to use playOrm for this. 1. I would probably have a User with the requests either embedded in or the Foreign keys to the requests…either is fine as long as, when you get the user, you get ALL FK's and make one request to g

Re: Correct model

2012-09-19 Thread Hiller, Dean
to:user@cassandra.apache.org>" <user@cassandra.apache.org> Subject: Re: Correct model 2012/9/19 Hiller, Dean <dean.hil...@nrel.gov> Thinking out loud and I think a bit towards playOrm's model though you don't need to use playOrm for this. 1. I would p

Re: Correct model

2012-09-19 Thread Hiller, Dean
ldap and know no one's username is really going to change so username is our primary key. Later, Dean On 9/19/12 2:33 PM, "Hiller, Dean" wrote: >Uhm, unless I am mistaken, a NEW request implies a new UUID so you can >just write it to both the index to the request row

any ways to have compaction use less disk space?

2012-09-20 Thread Hiller, Dean
While diskspace is cheap, nodes are not that cheap, and usually systems have a 1T limit on each node which means we would love to really not add more nodes until we hit 70% disk space instead of the normal 50% that we have read about due to compaction. Is there any way to use less disk space du

Re: Correct model

2012-09-23 Thread Hiller, Dean
But the only advantage in this solution is to split data among partitions? You need to split data among partitions or your query won't scale as more and more data is added to table. Having the partition means you are querying a lot less rows. What do you mean here by current partition? He mea

found major difference in CQL vs Scalable SQL(PlayOrm) and question

2012-09-23 Thread Hiller, Dean
I have been digging more and more into CQL vs. PlayOrm S-SQL and found a major difference that is quite interesting(thought you might be interested plus I have a question). CQL uses a composite row key with the prefix so now any other tables that want to reference that entity have references to th

Re: compression

2012-09-23 Thread Hiller, Dean
As well as your unlimited column names may all have the same prefix, right? Like "accounts".rowkey56, "accounts".rowkey78, etc. etc. so the "accounts" prefix gets a ton of compression then. Later, Dean From: Tyler Hobbs <ty...@datastax.com> Reply-To: "user@cassandra.apache.org

Re: Correct model

2012-09-24 Thread Hiller, Dean
cassandra.apache.org" <user@cassandra.apache.org> Subject: Re: Correct model 2012/9/23 Hiller, Dean <dean.hil...@nrel.gov> You need to split data among partitions or your query won't scale as more and more data is ad

Re: Correct model

2012-09-24 Thread Hiller, Dean
Elias Del Valle <mvall...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Monday, September 24, 2012 11:07 AM To: "user@cassandra.apache.org"

Re: Correct model

2012-09-24 Thread Hiller, Dean
m using secondary indexes I need to update index values manually, right? I got confused when you said "PlayOrm indexes the columns you choose". How do I choose and what exactly it means? Best regards, Marcelo Valle. 2012/9/24 Hiller, Dean <dean.hil...@nrel.gov>

Re: Correct model

2012-09-25 Thread Hiller, Dean
't need the best performance in the world when reading, but I need to assure scalability and have a simple model to maintain. I liked the playOrm concept regarding this. I have more doubts, but I will ask them at Stack Overflow from now on. 2012/9/24 Hiller, Dean <dean.hil...@nrel

Re: Correct model

2012-09-25 Thread Hiller, Dean
ike magic :D I don't know details about the performance on the index implementations you chose, but it would pay the way to use it in my case, as I don't need the best performance in the world when reading, but I need to assure scalability and have a simple model to maintain.

Re: Correct model

2012-09-25 Thread Hiller, Dean
veloped as well for the 1.2.x line. Later, Dean On 9/25/12 6:36 AM, "Hiller, Dean" wrote: >If you need anything added/fixed, just let PlayOrm know. PlayOrm has >been able to quickly add so far…that may change as more and more requests >come but so far PlayOrm seems to have manage

is this a cassandra bug?

2012-09-25 Thread Hiller, Dean
This is cassandra 1.1.4 Describe shows DecimalType and I test setting comparator TO the DecimalType and it fails (Realize I have never touched this column family until now except for posting data which succeeded) [default@unknown] use databus; Authenticated to keyspace: databus [default@da

Re: is this a cassandra bug?

2012-09-25 Thread Hiller, Dean
never saw anything client side…in fact, the client READ back the data fine so I am a bit confused here…1.1.4…I tested this on a single node after seeing it in our 6 node cluster with the same results. Thanks, Dean On 9/25/12 2:13 PM, "Hiller, Dean" wrote: >This is cassandra 1.1.

any ideas on what these mean

2012-09-26 Thread Hiller, Dean
We were consistently getting this exception over and over as we put data into the system. A reboot caused it to go away but we don't want to be rebooting in the future…. 1. When does this occur? 2. Is it affecting my data put? (I have seen other weird validation exceptions where my data i

Re: is this a cassandra bug?

2012-09-26 Thread Hiller, Dean
bump On 9/25/12 2:40 PM, "Hiller, Dean" wrote: >Hmmm, is rowkey validation asynchronous to the actual sending of the >data to cassandra? > >I seem to be able to put an invalid type and GET that invalid data back >just fine even though my key type was an int and the ke

1000's of column families

2012-09-26 Thread Hiller, Dean
We are streaming data with 1 stream per 1 CF and we have 1000's of CF. When using the tools they are all geared to analyzing ONE column family at a time :(. If I remember correctly, Cassandra supports as many CF's as you want, correct? Even though I am going to have tons of fun with limitati

is nodetool row count always way off?

2012-09-26 Thread Hiller, Dean
For nodetool cfstats, what is the row count estimate usually off by (what percentage? Or what absolute number?) We have a CF with 4 rows that prints this out…. Column Family: bacnet11700AnalogInput8 SSTable count: 3 Space used (live): 13526

Re: Once again, super columns or composites?

2012-09-27 Thread Hiller, Dean
Can you describe your use-case in detail as it might be easier to explain a model with composite names. Later, Dean From: Edward Kibardin <infa...@gmail.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Thursday, Septembe

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
with as many CF's as you want, does anyone know what that limit would be for 16G of RAM or something I could calculate with? Thanks, Dean On 9/27/12 2:37 AM, "Sylvain Lebresne" wrote: >On Thu, Sep 27, 2012 at 12:13 AM, Hiller, Dean >wrote: >> We are streaming data wi

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
ur data... But why wouldn't you use a single CF with partitions in these case? Wouldn't it be the same thing? I am asking because I might learn a new modeling technique with the answer. []s 2012/9/26 Hiller, Dean <dean.hil...@nrel.gov> We are streaming data with 1 stream pe

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
wouldn't it? Of course it is probably much harder than it might probably appear... :D Best regards, Marcelo Valle. 2012/9/27 Hiller, Dean <dean.hil...@nrel.gov> We have 1000's of different building devices and we stream data from these devices. The format and data from

Re: 1000's of column families

2012-09-27 Thread Hiller, Dean
one location), though we will see how implementing it goes. How much overhead per column family in RAM? So far we have around 4000 Cfs with no issue that I see yet. Dean On 9/27/12 11:10 AM, "Aaron Turner" wrote: >On Thu, Sep 27, 2012 at 3:11 PM, Hiller, Dean >wrote: >

Re: 1000's of column families

2012-09-28 Thread Hiller, Dean
eminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies. 2012/9/27 Hiller, De

Re: Help for creating a custom partitioner

2012-10-01 Thread Hiller, Dean
I would be surprised if random partitioner hurt your performance. In general, doing performance tests on a 6 node cluster with PlayOrm Scalable SQL, even joins queries ended up faster as the parallel disks of reading all the rows was way faster than reading from a single machine(remember, one d

Re: Rebalancing cluster

2012-10-01 Thread Hiller, Dean
You should check the cassandra.yaml file. There is an initial_token in that file that you should have set. The comment above that property reads # You should always specify InitialToken when setting up a production # cluster for the first time, and often when adding capacity later. # The princi
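
For reference, evenly spaced initial_token values can be computed as below. This is a minimal sketch assuming RandomPartitioner, whose token space runs from 0 to 2^127; the snippet above does not say which partitioner the cluster uses, and the six-node count is only an example.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumed partitioner)


def balanced_tokens(node_count):
    """One evenly spaced token per node, starting at token zero."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


tokens = balanced_tokens(6)
for node, token in enumerate(tokens):
    print(f"node {node}: initial_token: {token}")
```

Each node's cassandra.yaml would then carry its own value from this list. Other partitioners use different token ranges, so these numbers do not carry over to them.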

Re: Rebalancing cluster

2012-10-01 Thread Hiller, Dean
Nodetool has a move command so you can move to a new better token. Read up on the documentation there. I have not used it yet myself….good idea to test it on your test cluster first. Dean On 10/1/12 8:03 AM, "Darvin Denmian" wrote: >as you can see there is no "Zero Token". Maybe I did somethi

Re: Advice on correct storage configuration

2012-10-01 Thread Hiller, Dean
What is really going to matter is what the application is trying to read. That is really the critical piece of context. Without knowing what the application needs to read, it is very hard to design. One example from a previous post that was a great question was… 1. I need to get the last 100 r

Re: 1000's of column families

2012-10-01 Thread Hiller, Dean
the clocks of >> the two servers are skewed, you will severely compromise your schema >> (Cassandra will not understand in which order the >> updates must be applied). >> >> As I said, this applied to version 0.7, maybe current versions solved >>these >> p

read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
I know there is a 10 day limit if you have a node out of the cluster where you better be running read-repair or you end up with forgotten deletes, but what about on a clean cluster with all nodes always available? Shouldn't the deletes eventually take place or does one have to keep running read

Re: read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
you always need to run repair once per/gc_grace period. >You won't see empty/deleted rows go away until they're compacted away. > >On Mon, Oct 1, 2012 at 6:32 PM, Hiller, Dean wrote: >> I know there is a 10 day limit if you have a node out of the cluster >>where yo

Re: read-repair and deletes / forgotten deletes

2012-10-01 Thread Hiller, Dean
Oh, and I have been reading Aaron Morton's article here http://thelastpickle.com/2011/05/15/Deletes-and-Tombstones/ On 10/1/12 12:46 PM, "Hiller, Dean" wrote: >Thanks, (actually knew it was configurable) BUT what I don't get is why I >have to run a repair. IF all

1000's of CF's. virtual CFs do NOT work…..map/reduce

2012-10-02 Thread Hiller, Dean
So basically, with moving towards the 1000's of CF all being put in one CF, our performance is going to tank on map/reduce, correct? I mean, from what I remember we could do map/reduce on a single CF, but by stuffing 1000's of virtual Cf's into one CF, our map/reduce will have to read in all 999 v

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, to address your question, read my last post but to summarize, yes, there is less overhead in memory to prefix keys than manage multiple Cfs EXCEPT when doing map/reduce. Doing map/reduce, you will now have HUGE overhead in reading a whole slew of rows you don't care about as you can't map/
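
The trade-off can be illustrated with a toy model. This is a sketch under stated assumptions, not how PlayOrm or Cassandra store data: a plain dict stands in for ONE physical column family with no useful key ordering, and the `virtual_cf:row_key` format and all function names are made up for illustration.

```python
store = {}  # stand-in for one physical column family holding every virtual CF


def virtual_key(virtual_cf, row_key):
    """Prefix the row key with the name of the virtual column family."""
    return f"{virtual_cf}:{row_key}"


def put(virtual_cf, row_key, columns):
    store[virtual_key(virtual_cf, row_key)] = columns


def get(virtual_cf, row_key):
    """Point lookup: touches exactly one row, however many virtual CFs exist."""
    return store.get(virtual_key(virtual_cf, row_key))


def scan(virtual_cf):
    """Full scan of one virtual CF: every row in the store has to be examined."""
    prefix = virtual_cf + ":"
    examined = 0
    matches = []
    for key, columns in store.items():
        examined += 1
        if key.startswith(prefix):
            matches.append(columns)
    return matches, examined


for cf in ("deviceA", "deviceB", "deviceC"):
    for i in range(4):
        put(cf, i, {"value": i})

matches, examined = scan("deviceA")
print(len(matches), "rows wanted,", examined, "rows read")
```

Point reads stay cheap, but a job that wants all rows of one virtual CF pays for reading the rows of every other virtual CF too, which is the map/reduce concern raised above.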

Re: Read latency issue

2012-10-02 Thread Hiller, Dean
Interesting results. With PlayOrm, we did a 6 node test of reading 100 rows from 1,000,000 using PlayOrm Scalable SQL. It only took 60ms. Maybe we have better hardware though??? We are using 7200 RPM drives so nothing fancy on the disk side of things. More nodes puts at a higher throughput

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
e this email and any >attachments and destroy any copies thereof. Any review, retransmission, >dissemination, copying or other use of, or taking any action in reliance >upon, this information by persons or entities other than the intended >recipient is strictly prohibited. > > >

Re: 1000's of CF's. virtual CFs possible Map/Reduce SOLUTION...

2012-10-02 Thread Hiller, Dean
out work to? That would be my only missing piece (well, that and the PlayOrm virtual CF feature but I can add that within a week probably though I am on vacation this Thursday to monday). Later, Dean On 10/2/12 6:35 AM, "Hiller, Dean" wrote: >So basically, with moving towards the

easy repair questions on -pr

2012-10-02 Thread Hiller, Dean
If I understand -pr correctly… 1. -pr forces only the current node's sstables to be fixed (so I run it on each node once) 2. Can I run nodetool repair -pr on just 1/RF of my nodes if I do the correct nodes? 3. Without the -pr, it will fix all the stuff on the current node AND the nodes with

Re: easy repair questions on -pr

2012-10-02 Thread Hiller, Dean
GREAT answer, thanks and one last question… So, I suspect I can expect those rows to finally go away when queried from cassandra-cli once GCGraceSeconds has passed then? Or will they always be there forever and ever and ever(this can't be true, right). Thanks, Dean On 10/2/12 9:34 AM, "Sylvain

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Ben, Brian, By the way, PlayOrm offers a NoSqlTypedSession that is different than the ORM half of PlayOrm dealing in raw stuff that does indexing(so you can do Scalable SQL on data that has no ORM on top of it). That is what we use for our 1000's of CF's as we don't know the format of any of t

Re: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Hiller, Dean
Can you just use netstat and dig into the process id and do a ps -ef | grep to clear up all the confusion. Doing so you can tell which process communicates with which process (I am assuming you are on linux….on Mac or Windows the commands are different). Then, just paste all that in the email to th

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
Because the data for an index is not all together(ie. Need a multi get to get the data). It is not contiguous. The prefix in a partition they keep the data so all data for a prefix from what I understand is contiguous. QUESTION: What I don't get in the comment is I assume you are referring to

Re: 1000's of column families

2012-10-02 Thread Hiller, Dean
2, 2012 1:01 PM To: "user@cassandra.apache.org" <user@cassandra.apache.org> Subject: Re: 1000's of column families Dean, On Tuesday, October 2, 2012 at 18:52, Hiller, Dean wrote: Because the data for an index is not all together(ie. Need a m

Re: Simple data model for 1 simple range query?

2012-10-03 Thread Hiller, Dean
Is timeframe/date your composite key? Where timeframe is the first time of a partition of time (ie. If you partition by month, it is the very first time of that month). If so, then, yes, it will be very fast. The smaller your partitions are, the smaller your indexes are as well(ie. B-trees whi

1000's of CF's. PlayOrm solves the cassandra limit on #ColFamily

2012-10-03 Thread Hiller, Dean
Okay, so it only took me two solid days not a week. PlayOrm in master branch now supports virtual CF's or virtual tables in ONE CF, so you can have 1000's or millions of virtual CF's in one CF now. It works with all the Scalable-SQL, works with the joins, and works with the PlayOrm command lin

Re: Query over secondary indexes

2012-10-09 Thread Hiller, Dean
primary key approach. -Vivek On Tue, Oct 9, 2012 at 6:20 PM, Hiller, Dean <dean.hil...@nrel.gov> wrote: Another option may be PlayOrm for you and its Scalable SQL. We queried one million rows for 100 results in just 60ms. (and it does joins). Query CL =QUORUM. Dean From:

Re: Query over secondary indexes

2012-10-09 Thread Hiller, Dean
having like different dimensions of partitioning PlayOrm does plan on supporting CQL as well but it is not in yet. Later, Dean On 10/9/12 7:51 AM, "Hiller, Dean" wrote: >If I understand CQL correctly, behind the scenes in wide rows, there is a >B-tree. Even when doing the indexi

Re: Upgrading hardware on a node in a cluster

2012-10-10 Thread Hiller, Dean
Well, you could use amazon VPC in which case you DO pick the IP yourself ;)….it makes life a bit easier. Dean From: Martin Koch <m...@issuu.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Wednesday, October 10, 2012 3:

Re: 1000's of CF's.

2012-10-10 Thread Hiller, Dean
ose CFs. All we have is plain Hector and custom ORM on top of it. As far as i understand VirtualKeyspace doesn't help in our case. Also i dont understand why not implement support for many CF ( or build-in partitioning ) on cassandra side. Anybody can explain why this can or cannot be done in

CQL Sets and Maps

2012-10-11 Thread Hiller, Dean
I was reading Brian's post http://mail-archives.apache.org/mod_mbox/cassandra-dev/201210.mbox/%3ccajhhpg20rrcajqjdnf8sf7wnhblo6j+aofksgbxyxwcoocg...@mail.gmail.com%3E In which he asks > Any insight into why CQL puts that in column name? > Where does it store the metadata related to compound key

Re: [problem with OOM in nodes]

2012-10-11 Thread Hiller, Dean
"Splitting one report to multiple rows is uncomfortably" WHY? Reading from N disks is way faster than reading from 1 disk. I think in terms of PlayOrm and then explain the model you can use so I think in objects first Report { String uniqueId String reportName; //may be indexable and query

Re: Option for ordering columns by timestamp in CF

2012-10-12 Thread Hiller, Dean
Good, I wasn't the only one confused…I was not sure at all how "make column timestamps optional" was related to your question either…..he must have been in a hurry and misread the question. From: Ertio Lew <ertio...@gmail.com> Reply-To: "user@cassandra.apache.org

Re: unnecessary tombstone's transmission during repair process

2012-10-12 Thread Hiller, Dean
+1 I want to see how this plays out as well. Anyone know the answer? Dean From: Alexey Zotov <azo...@griddynamics.com> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org> Date: Friday, October 12, 2012 1:33 AM To: "user@cassandra.ap
