Dean,

There is one last thing I would like to ask about playOrm on this list; my next questions will go to Stack Overflow. Because of the context, I prefer asking this one here:

When you say playOrm indexes a table (which would be a CF behind the scenes), what do you mean? Will PlayOrm automatically create a CF to index my CF? Will it manage that index automatically, like Cassandra's secondary indexes? In Cassandra, the application is responsible for maintaining the index, right? I might be wrong, but unless I am using secondary indexes, I need to update index values manually, right? I got confused when you said "PlayOrm indexes the columns you choose". How do I choose them, and what exactly does that mean?
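To make the question concrete, here is a minimal sketch of what maintaining an index by hand means when you do not use Cassandra's built-in secondary indexes: every write to the data column family also writes an entry to a separate index column family keyed by the indexed value, and an update must delete the stale entry. Both column families are modeled as in-memory maps, and the names (`ManualIndex`, `requestsCF`, `requestsByUserCF`, `putRequest`) are hypothetical. This is neither PlayOrm's nor Cassandra's API, only an illustration of the bookkeeping the question is about.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration, not PlayOrm or Cassandra API: column families are
// modeled as in-memory maps. The index CF holds one wide row per indexed value
// (a userId), whose column names are the row keys of the matching data rows.
class ManualIndex {
    // Data CF: requestId -> (columnName -> value)
    final Map<String, Map<String, String>> requestsCF = new HashMap<>();
    // Index CF: userId -> sorted column names (requestIds) in that user's row
    final Map<String, SortedSet<String>> requestsByUserCF = new HashMap<>();

    // Write a request and keep the index in sync by hand.
    void putRequest(String requestId, String userId) {
        Map<String, String> old = requestsCF.get(requestId);
        if (old != null) {
            // On update, delete the stale index entry; otherwise a lookup by
            // the old userId would still return this request.
            String oldUser = old.get("userId");
            SortedSet<String> oldRow = requestsByUserCF.get(oldUser);
            if (oldRow != null) {
                oldRow.remove(requestId);
                if (oldRow.isEmpty()) {
                    requestsByUserCF.remove(oldUser);
                }
            }
        }
        Map<String, String> row = new HashMap<>();
        row.put("userId", userId);
        requestsCF.put(requestId, row);
        requestsByUserCF.computeIfAbsent(userId, k -> new TreeSet<>()).add(requestId);
    }

    // "WHERE userId = ?" becomes: read one index row, then (in a real store)
    // multiget the data rows whose keys it lists.
    List<String> findRequestIdsByUser(String userId) {
        SortedSet<String> indexRow = requestsByUserCF.get(userId);
        return indexRow == null ? new ArrayList<>() : new ArrayList<>(indexRow);
    }

    public static void main(String[] args) {
        ManualIndex db = new ManualIndex();
        db.putRequest("r1", "alice");
        db.putRequest("r2", "alice");
        db.putRequest("r2", "bob"); // r2 is reassigned from alice to bob
        System.out.println(db.findRequestIdsByUser("alice")); // [r1]
        System.out.println(db.findRequestIdsByUser("bob"));   // [r2]
    }
}
```

An ORM that "indexes the columns you choose" takes this double write off the application's hands; the sketch only shows what the application would otherwise have to do itself.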
Best regards,
Marcelo Valle.

2012/9/24 Hiller, Dean <dean.hil...@nrel.gov>

> Oh, ok, you were talking about the wide row pattern, right?
>
> Yes.
>
> But playORM is compatible with Aaron's model, isn't it?
>
> Not yet. PlayOrm supports partitioning one table multiple ways, as it
> indexes the columns (in your case, the userid FK column and the time
> column).
>
> Can I map exactly this using playORM?
>
> Not yet, but the plan is to map these typical Cassandra scenarios as well.
>
> Can I ask playOrm questions on this list?
>
> The best place to ask PlayOrm questions is Stack Overflow, tagged with
> PlayOrm, though I monitor both this list and Stack Overflow for questions
> (there are already a few questions on Stack Overflow).
>
> The examples directory is empty for now; I would like to see how to set
> up the connection with it.
>
> We always keep build and build.bat working, and all 62 tests pass (or we
> don't merge to master). To see how to make a connection or run an
> example:
>
> 1. Run build.bat or build, which generates the parsing code.
> 2. Import the project into Eclipse (the .classpath and .project files
> are already there for you).
> 3. In FactorySingleton.java you can change IN_MEMORY to CASSANDRA or not,
> and run any of the tests in memory or against localhost. (We also run the
> test suite against a 6-node cluster, and it all passes.)
> 4. FactorySingleton probably has the code you are looking for. You also
> need a class called nosql.Persistence, or it won't scan your jar file (a
> class file, not an XML file as in JPA).
>
> Do you mean I need to load all the keys in memory to do a multiget?
>
> No, you batch. I am not sure about CQL, but PlayOrm returns a Cursor, not
> the results, so you can loop through every key, and behind the scenes it
> is doing batch requests: you can load up 100 keys, make one multiget
> request for those 100 keys, then load up the next 100 keys, and so on.
> I need to look more into the APIs and protocol of CQL to see if it
> allows this style of batching. PlayOrm does support this style of
> batching today. Aaron would know if CQL does.
>
> Why did you move? Hector is being considered as the "official"
> client for Cassandra, isn't it?
>
> At the time, I wanted the file streaming feature. Also, Hector seemed a
> bit cumbersome compared to Astyanax, at least if you were building a
> platform and had no use for typing the columns. Just personal preference,
> really.
>
> I am not sure I understood this part. If I need to refactor, would having
> the partition id in the key be a bad thing? What would be the
> alternative? In my case, as I use userId:partitionId as the row key, this
> might be a problem, right?
>
> PlayOrm indexes the columns you choose (i.e. the ones you want to use in
> the WHERE clause) and partitions by columns you choose, not by the key.
> So in PlayOrm the key is typically a TimeUUID or something else unique
> across the cluster, and any tables referencing that TimeUUID never have
> to change. With Cassandra partitioning, if you repartition that table a
> different way or make some other major change (usually done with
> map/reduce), all your foreign keys "may" have to change. It really
> depends on the situation, though; maybe you get the design right and
> never have to change.
>
> @NoSqlQuery(name="findWithJoinQuery", query="PARTITIONS t(:partId) SELECT t FROM TABLE as t "+
> "INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares < :shares"),
>
> What would happen behind the scenes when I execute this query?
>
> In this case, t (TABLE) is a partitioned table, since a partition is
> defined. And t.activityTypeInfo refers to the ActivityTypeInfo table,
> which is not partitioned (AND ActivityTypeInfo won't scale to billions of
> rows, because there is no partitioning, but maybe you don't need it!).
> Behind the scenes, when you call getResult, it returns a cursor that has
> NOT done anything yet. When you start looping through the cursor, behind
> the scenes it batches requests, asking for the next 500 matches
> (configurable), so you never run out of memory. It is EXACTLY like a
> database cursor. You can even use the cursor to show a user the first set
> of results and, when the user clicks next, pick up right where the cursor
> left off (if you saved it to the HttpSession).
>
> You can only use joins with partition keys, right?
>
> Nope, joins work on anything. You only need to specify the partitionId
> when you have a partitioned table in the list of join tables. (That is
> what the PARTITIONS clause is for: to say which partitionId to use.) It
> was put BEFORE the SQL instead of within it. CQL took the opposite
> approach, but PlayOrm can also join different partitions together ;)
>
> In this case, is partId the row id of TABLE CF?
>
> Nope, partId is one of the columns. There is a test case on this in
> PlayOrm; see this class (notice the annotation NoSqlPartitionByThisField
> on the column/field in the entity):
>
> https://github.com/deanhiller/playorm/blob/master/input/javasrc/com/alvazan/test/db/PartitionedSingleTrade.java
>
> PlayOrm allows partitioned tables AND non-partitioned tables
> (non-partitioned tables won't scale, but maybe you will never have that
> many rows). You can join any combination of the two (non-partitioned with
> partitioned, non-partitioned with non-partitioned, partitioned with
> partitioned).
>
> I prefer Stack Overflow only because I like referencing questions by
> their URLs. Referencing this email later on is very hard, as I have to
> find it first, so in general I HATE email lists ;) but it seems the
> Cassandra community prefers them. You can put any questions on PlayOrm
> there; I am not sure how many people on this list may or may not be
> interested, so it creates less noise here too.
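The cursor behaviour described above (nothing is fetched until you start iterating, then requests go out in batches, e.g. 100 keys per multiget or 500 matches per query) can be sketched as a lazy iterator. This is not PlayOrm's Cursor class: `BatchingCursor`, the `fetchBatch` function and the batch size are hypothetical stand-ins, with `fetchBatch` playing the role of one round trip that returns up to `batchSize` results starting at a given position.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

// Hypothetical sketch of a lazy, batching cursor (not PlayOrm's Cursor API).
// fetchBatch.apply(offset, size) stands in for one round trip to the store,
// e.g. one multiget for `size` keys, and returns at most `size` results.
class BatchingCursor<T> implements Iterator<T> {
    private final BiFunction<Integer, Integer, List<T>> fetchBatch;
    private final int batchSize;
    private List<T> buffer = Collections.emptyList();
    private int indexInBuffer = 0;
    private int position;          // results consumed so far, across batches
    private boolean exhausted = false;
    int roundTrips = 0;            // tracked only to make the batching visible

    BatchingCursor(BiFunction<Integer, Integer, List<T>> fetchBatch,
                   int batchSize, int startPosition) {
        this.fetchBatch = fetchBatch;
        this.batchSize = batchSize;
        this.position = startPosition;
    }

    @Override
    public boolean hasNext() {
        if (indexInBuffer < buffer.size()) {
            return true;
        }
        if (exhausted) {
            return false;
        }
        // Nothing is fetched when the cursor is created; the first round trip
        // happens here, and later ones only when the current batch runs out.
        buffer = fetchBatch.apply(position, batchSize);
        indexInBuffer = 0;
        roundTrips++;
        if (buffer.size() < batchSize) {
            exhausted = true; // a short batch means there is nothing after it
        }
        return !buffer.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        position++;
        return buffer.get(indexInBuffer++);
    }

    // Saving this value (e.g. in an HttpSession) lets a new cursor resume here.
    int position() {
        return position;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            keys.add("key" + i);
        }
        // Fake store: returns the slice [offset, offset + size) of the keys.
        BiFunction<Integer, Integer, List<String>> multiget = (offset, size) ->
            keys.subList(Math.min(offset, keys.size()), Math.min(offset + size, keys.size()));

        BatchingCursor<String> cursor = new BatchingCursor<>(multiget, 100, 0);
        int count = 0;
        while (cursor.hasNext()) {
            cursor.next();
            count++;
        }
        // prints "250 rows in 3 round trips"
        System.out.println(count + " rows in " + cursor.roundTrips + " round trips");
    }
}
```

Only one batch is held in memory at a time, which is the point of the design: memory use depends on the batch size, not on how many rows match.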
>
> Later,
> Dean
>
>
> From: Marcelo Elias Del Valle <mvall...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Date: Monday, September 24, 2012 11:07 AM
> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
> Subject: Re: Correct model
>
> 2012/9/24 Hiller, Dean <dean.hil...@nrel.gov>
> I am confused. In this email you say you want "get all requests for a
> user" and in a previous one you said "Select all the users which has new
> requests, since date D", so let me answer both.
>
> I have both needs. These are the two queries I need to perform on the
> model.
>
> For the latter, you make ONE query into the latest partition (ONE
> partition) of the GlobalRequestsCF, which gives you the most recent
> requests ALONG with the user ids of those requests. If you queried all
> partitions, you would most likely blow out your JVM memory.
>
> For the former, you make ONE query to the UserRequestsCF with userid =
> <your user id> to get all the requests for that user.
>
> Now I think I got the main idea! This answered a lot!
>
> Sorry, I was skipping some context. A lot of the backing indexing is
> sometimes done as one long (wide) row, so in playOrm, too many rows in a
> partition means too many columns in the indexing row for that partition.
> I believe the same is true of Cassandra's own indexing.
>
> Oh, ok, you were talking about the wide row pattern, right? But playORM
> is compatible with Aaron's model, isn't it? Can I map exactly this using
> playORM? The hardest thing for me in using playORM now is that I don't
> know Cassandra well yet, and I know playORM even less. Can I ask playOrm
> questions on this list? I will try to create a POC here!
> Only now am I starting to understand what it does ;-) The examples
> directory is empty for now; I would like to see how to set up the
> connection with it.
>
> Cassandra spreads all your data out on all nodes, with or without
> partitions. A single partition does have its data co-located, though.
>
> Now I see. The main advantage of using partitions is keeping the indexes
> small enough. It has nothing to do with the nodes. Thanks!
>
> If you are at 100k (and the requests are rather small), you could embed
> all the requests in the user or go with Aaron's suggestion below of a
> UserRequestsCF. If your requests are rather large, you probably don't
> want to embed them in the User. Either way, it's one query or one row
> key lookup.
>
> I see it now.
>
> Multiget ignores partitions: you feed it a LIST of keys and it gets them.
> It just so happens that partitionId had to be part of your row key.
>
> Do you mean I need to load all the keys in memory to do a multiget?
>
> I have used Hector and now use Astyanax. I don't worry much about that
> layer, but I feed Astyanax 3 nodes and I believe it discovers some of the
> other ones. I believe the discovery part is true, but I am not 100% sure,
> as I have not looked at that code.
>
> Why did you move? Hector is being considered as the "official" client
> for Cassandra, isn't it? I looked at the Astyanax API, though, and it
> seemed much more high-level.
>
> As an analogy to the above: if you had used PlayOrm, you would ONLY need
> one Requests table, partitioned by user AND by time (two views into the
> same data, partitioned two different ways), and you could do exactly the
> same thing as Aaron's example.
> PlayOrm doesn't embed the partition ids in the key, which leaves it free
> to partition twice, as in your case. With the partition id in the key, a
> refactor means you have to map/reduce A LOT more rows, because rows carry
> FKs of the form <partitionid><subrowkey>; if you don't have the partition
> id in the key, you only map/reduce the partitioned table in a
> redesign/refactor. That said, we will be adding support for CQL
> partitioning in addition to PlayOrm partitioning, even though it can
> sometimes be a little less flexible.
>
> I am not sure I understood this part. If I need to refactor, would having
> the partition id in the key be a bad thing? What would be the
> alternative? In my case, as I use userId:partitionId as the row key, this
> might be a problem, right?
>
> Also, CQL locates all the data for a partition on one node. We have found
> it can "sometimes" be faster, thanks to parallelized disks, when a
> partition is NOT all on one node, so PlayOrm partitions are virtual only
> and do not relate to where the rows are stored. For example, on our 6
> nodes a join query on a partition with 1,000,000 rows took 60ms (of
> course, I can't compare to CQL here, since it doesn't do joins). It also
> really depends on how much data is going to come back in the query.
> There are tradeoffs between parallel disks across nodes and having your
> data all on one node, of course.
>
> I guess I am still not ready for this level of info. :D
> In the playORM readme, we have the following:
>
> @NoSqlQuery(name="findWithJoinQuery", query="PARTITIONS t(:partId) SELECT t FROM TABLE as t "+
> "INNER JOIN t.activityTypeInfo as i WHERE i.type = :type and t.numShares < :shares"),
>
> What would happen behind the scenes when I execute this query? You can
> only use joins with partition keys, right?
> In this case, is partId the row id of TABLE CF?
>
> Thanks a lot for the answers
>
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr

--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr
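The refactoring argument in this thread (a row key such as userId:partitionId ties every foreign key to the partitioning scheme, whereas a cluster-unique key with the partition kept in ordinary, indexed columns only requires rebuilding an index) can be illustrated with a toy model. Everything here is hypothetical and in memory: `VirtualPartitions`, the 30-day and 7-day bucket functions, and the use of `UUID.randomUUID()` as a stand-in for a TimeUUID are assumptions for illustration, not how PlayOrm actually stores its indexes. It models one Requests table with two "views", by user and by time bucket, over the same unique keys.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical toy model, not PlayOrm's storage format: one Requests table
// keyed by a cluster-unique id (a random UUID stands in for a TimeUUID), with
// two virtual partitionings built from ordinary columns: by user and by time.
class VirtualPartitions {
    static final class Request {
        final UUID id;
        final String userId;
        final long epochDay;

        Request(UUID id, String userId, long epochDay) {
            this.id = id;
            this.userId = userId;
            this.epochDay = epochDay;
        }
    }

    final Map<UUID, Request> requests = new HashMap<>();
    // partitionId -> ids of the rows in that partition, one map per "view"
    final Map<String, Set<UUID>> byUser = new HashMap<>();
    Map<String, Set<UUID>> byTime = new HashMap<>();
    private Function<Request, String> timeBucket;

    VirtualPartitions(Function<Request, String> timeBucket) {
        this.timeBucket = timeBucket;
    }

    static String days30Bucket(Request r) { return "days30-" + (r.epochDay / 30); }
    static String days7Bucket(Request r) { return "days7-" + (r.epochDay / 7); }

    UUID insert(String userId, long epochDay) {
        Request r = new Request(UUID.randomUUID(), userId, epochDay);
        requests.put(r.id, r);
        byUser.computeIfAbsent(r.userId, k -> new HashSet<>()).add(r.id);
        byTime.computeIfAbsent(timeBucket.apply(r), k -> new HashSet<>()).add(r.id);
        return r.id;
    }

    // Repartition the time view (e.g. 30-day buckets -> 7-day buckets). Only
    // the index is rebuilt; row keys, and any foreign keys pointing at them,
    // stay the same. With a composite row key such as userId + ":" + bucket,
    // the same change would rename the rows themselves.
    void repartitionTime(Function<Request, String> newBucket) {
        Map<String, Set<UUID>> rebuilt = new HashMap<>();
        for (Request r : requests.values()) {
            rebuilt.computeIfAbsent(newBucket.apply(r), k -> new HashSet<>()).add(r.id);
        }
        timeBucket = newBucket;
        byTime = rebuilt;
    }

    public static void main(String[] args) {
        VirtualPartitions table = new VirtualPartitions(VirtualPartitions::days30Bucket);
        UUID first = table.insert("alice", 10);
        table.insert("alice", 40);
        table.insert("bob", 12);
        Set<UUID> keysBefore = new HashSet<>(table.requests.keySet());

        table.repartitionTime(VirtualPartitions::days7Bucket);
        System.out.println(keysBefore.equals(table.requests.keySet()));  // true
        System.out.println(table.byUser.get("alice").size());            // 2
        System.out.println(table.byTime.get("days7-1").contains(first)); // true
    }
}
```

Querying "the latest partition" of the time view then amounts to reading one entry of `byTime`, which mirrors the single-partition GlobalRequestsCF query discussed above.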