We have already run some tests on an RDBMS, and the performance is not acceptable to us (for the second query, that means self-joining a table of 10 million records n times). That is why we are trying GAE now.
Thank you.

On Mar 16, 1:54 am, Max <[email protected]> wrote:
> Hi John,
>
> I am designing a data model quite similar to the cited *User-Skill*
> problem, but not exactly the same.
>
> People may not be familiar with our domain. Basically it will be a
> track record system. A user can perform different tasks and the same
> task can be done by many users. According to historical data, more than
> 5000 different users will finish the same task, and it is possible for
> some users to finish more than 5000 tasks within the data archive
> period. Additionally, the following queries will be performed frequently:
> 1, given 2 users, A and B, find the common tasks they have done (less
> important)
> 2, given several tasks, find the users who have done all these tasks
> (more important)
>
> Translating our problem into the user-skill scenario, one user
> can have more than 5000 skills and there could be more than 5000 users
> having the same skill.
> 1, given 2 users, A and B, how to find their mutual skills
> 2, given n skills, how to find the full list of users having all n skills
>
> Best regards,
> Max
>
> On Mar 15, 5:59 pm, John Patterson <[email protected]> wrote:
>
> > I meant just put the UserSkills of the two people into the set.
> > Each person only has a small number of skills, yeah?
>
> > Perhaps I misunderstood your last requirement "similarity between
> > user A and User B"
>
> > On 15 Mar 2010, at 14:23, Max wrote:
>
> > > Hi John,
>
> > > Thanks for your reply. I need some time to study and test your code.
>
> > > For the last point, Sets.intersection() means we need to load all keys
> > > into memory and perform an in-memory Sets.intersection(). Is it
> > > possible to do this with a query directly? In other words, is it
> > > possible to use more than one equality filter on a list property of a
> > > relation entity index for their parents?
>
> > > Best regards,
> > > Max
>
> > > On Mar 15, 2:45 pm, John Patterson <[email protected]> wrote:
> > >> Hi Max,
>
> > >> Regarding your original question, a more efficient solution would be
> > >> to embed the UserSkill in the User instance, which would allow you to
> > >> find all results in a single query. The problem is that embedded
> > >> instances can only be queried on a single value. There would be no
> > >> way to query skill and ability on the same UserSkill - just "java
> > >> and c++ with any skill over 3 and any skill over 5"
>
> > >> To solve this you could create a combined property in UserSkill for
> > >> querying, "skillAbility", which would hold values such as "java:5",
> > >> "c++:4". This will only work with ability values from 0-9 because it
> > >> depends on lexical ordering (or e.g. 000 - 999 with zero padding)
>
> > >> Both Twig and Objectify, but not JDO, support embedded collections of
> > >> instances.
>
> > >> In Twig it would be defined like this
>
> > >> class User
> > >> {
> > >>     @Embed List<UserSkill> skills;
> > >> }
>
> > >> class UserSkill
> > >> {
> > >>     String skillAbility;
> > >>     Skill skill; // direct reference to Skill instance
> > >>     int ability;
> > >> }
>
> > >> Disclaimer: I have not tried any of this code - it is just off the
> > >> top of my head
>
> > >> You would then do a single range query to find "java:5", "java:6",
> > >> "java:7"...
>
> > >> // find java developers with ability of 5 or more in a single query
> > >> datastore.find().type(User.class)
> > >>     .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5") // range start
> > >>     .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE) // range end
> > >>     .returnResultsNow();
>
> > >> But that doesn't fully answer your question, which includes an AND on
> > >> multiple property values, which is not supported by the datastore. To
> > >> do this you will need to perform two queries and merge the results.
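The lexical-ordering caveat on the combined "skillAbility" value can be checked without the datastore. The following is a minimal sketch that uses a plain java.util.TreeSet as a stand-in for the property index (an assumption; it does not call Twig or App Engine). The helper names encode and atLeast are hypothetical. It shows why unpadded abilities above 9 fall outside the range scan and how zero padding restores the expected order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class SkillAbilityRangeDemo {
    // Hypothetical helper: builds "skill:ability" with the ability zero-padded to `width` digits.
    static String encode(String skill, int ability, int width) {
        return skill + ":" + String.format(Locale.ROOT, "%0" + width + "d", ability);
    }

    // Emulates the filter pair (>= range start, < range end) as a lexical range scan.
    static List<String> atLeast(TreeSet<String> index, String skill, int minAbility, int width) {
        String start = encode(skill, minAbility, width);
        String end = skill + ":" + Character.MAX_VALUE; // sorts after every "skill:NN" value
        return new ArrayList<>(index.subSet(start, true, end, false));
    }

    public static void main(String[] args) {
        // Unpadded values compare character by character, so "java:10" sorts before "java:5".
        System.out.println("java:10".compareTo("java:5") < 0); // prints true

        TreeSet<String> index = new TreeSet<>();
        for (int ability : new int[] {3, 5, 7, 10}) {
            index.add(encode("java", ability, 2));
        }
        index.add(encode("c++", 4, 2));

        // With two-digit padding the range scan returns every java value of 5 or more.
        System.out.println(atLeast(index, "java", 5, 2)); // prints [java:05, java:07, java:10]
    }
}
```

The padding width has to be chosen up front for the largest ability value expected, since changing it later would mean rewriting every stored value.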
>
> > >> Twig has support for merging only OR queries right now so you can do:
>
> > >> // find users with c++ ability of 2 or more OR java ability of 5 or more
>
> > >> RootFindCommand or = datastore.find().type(User.class); // default (only) operator is OR
>
> > >> or.addChildCommand()
> > >>     .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5") // range start
> > >>     .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE); // range end
>
> > >> or.addChildCommand()
> > >>     .addFilter("skillAbility", GREATER_THAN_EQUAL, "c++:2") // range start
> > >>     .addFilter("skillAbility", LESS_THAN, "c++:" + Character.MAX_VALUE); // range end
>
> > >> // merges results from both queries into a single iterator
> > >> Iterator<User> results = or.returnResultsNow();
>
> > >> Supporting AND merges is coming! Add a feature request if you like.
> > >> But for now you will have to do two separate queries as in the first
> > >> example and join the results in your own code. You should make sure
> > >> both queries are sorted by key; then you can "stream" the results
> > >> without loading them all into memory at once.
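The "stream both sorted results" join mentioned above can be sketched in plain Java. This is only an illustration under stated assumptions: both iterators yield keys in the same ascending order with no duplicates, and Long values stand in for entity keys. The class and method names are hypothetical and nothing here is Twig API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class SortedIntersectionDemo {
    // Walks two ascending iterators in lockstep and keeps only the elements present in both.
    // Only one element per input is held at a time; the matches are collected into a list.
    static <T extends Comparable<T>> List<T> intersect(Iterator<T> a, Iterator<T> b) {
        List<T> matches = new ArrayList<>();
        if (!a.hasNext() || !b.hasNext()) {
            return matches;
        }
        T x = a.next();
        T y = b.next();
        while (true) {
            int cmp = x.compareTo(y);
            if (cmp == 0) {
                matches.add(x);
                if (!a.hasNext() || !b.hasNext()) break;
                x = a.next();
                y = b.next();
            } else if (cmp < 0) {
                // x is behind y, so advance the first input
                if (!a.hasNext()) break;
                x = a.next();
            } else {
                // y is behind x, so advance the second input
                if (!b.hasNext()) break;
                y = b.next();
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Iterator<Long> javaUsers = Arrays.asList(1L, 3L, 5L, 7L, 9L).iterator();
        Iterator<Long> cppUsers = Arrays.asList(2L, 3L, 4L, 7L, 10L).iterator();
        System.out.println(intersect(javaUsers, cppUsers)); // prints [3, 7]
    }
}
```

If the two inputs are not ordered by the same key, this walk silently misses matches; in that case collecting one side into a HashSet and probing it with the other is the safer fallback.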
>
> > >> // find java developers with ability of 5 or more
> > >> datastore.find().type(User.class)
> > >>     .addSort("skillAbility") // first sort must be on the inequality filter property
> > >>     .addSort(Entity.KEY_RESERVED_PROPERTY) // ensure results in same order
> > >>     .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5") // range start
> > >>     .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE) // range end
> > >>     .returnResultsNow();
>
> > >> // find c++ developers with ability of 2 or more
> > >> datastore.find().type(User.class)
> > >>     .addSort("skillAbility")
> > >>     .addSort(Entity.KEY_RESERVED_PROPERTY)
> > >>     .addFilter("skillAbility", GREATER_THAN_EQUAL, "c++:2") // range start
> > >>     .addFilter("skillAbility", LESS_THAN, "c++:" + Character.MAX_VALUE) // range end
> > >>     .returnResultsNow();
>
> > >> // now iterate through both results and only include those in both
> > >> // iterators
>
> > >> Again, I have not run this code so I might have made a mistake. Let
> > >> me know how you get on! I'll be adding support for these merged AND
> > >> queries on multiple property values soon - unless someone else wants
> > >> to contribute it first ;)
>
> > >> Finding the similarity between two users is simple now that the
> > >> skills are just a property of the User: just do a
> > >> Sets.intersection() of the skills.
>
> > >> John
>
> > >> On 15 Mar 2010, at 12:07, Max wrote:
>
> > >>> Thanks John,
>
> > >>> Brett Slatkin's talk is impressive. Let's say we have m skills with n
> > >>> levels (i.e., m x n SkillLevel entities). Each SkillLevel entity
> > >>> consists of at least one SkillLevelIndex.
>
> > >>> We define similarity between user A and user B as the number of
> > >>> skills with the same level, i.e., the number of SkillLevel entities
> > >>> matching the query:
> > >>> "from SkillLevel where userKeyList contains A and userKeyList contains B"
>
> > >>> It works fine if userKeyList contains all user keys.
> > >>> However, after we applied the relation index pattern, we have more
> > >>> than one user key list, so how do we perform a query to calculate
> > >>> the similarity between two users?
>
> > >>> On Mar 10, 12:21 pm, John Patterson <[email protected]> wrote:
> > >>>> On 10 Mar 2010, at 10:53, Max wrote:
>
> > >>>>> Rusty Wright suggested a list of user keys to be stored in the skill
> > >>>>> entity. But that means only 5000 users can have the same skill.
>
> > >>>> If you use the "Relation Index Entity" pattern as described in Brett
> > >>>> Slatkin's talk you can store 5000 users per index entity and an
> > >>>> unlimited number of index entities. This will actually give you much
> > >>>> better query performance too because you will not need to load the
> > >>>> list of 5000 Keys every time you read your main entity.
>
> > >>>> The new release of Twig has direct support for RIEs and can actually
> > >>>> let you query for multiple skills *in parallel*, then merge the
> > >>>> results and return the parents:
>
> > >>>> http://code.google.com/p/twig-persist/wiki/Using#Relation_Index_Entities
>
> > >>> --
> > >>> You received this message because you are subscribed to the Google
> > >>> Groups "Google App Engine for Java" group.
> > >>> To post to this group, send email to [email protected].
> > >>> To unsubscribe from this group, send email to [email protected].
> > >>> For more options, visit this group at
> > >>> http://groups.google.com/group/google-appengine-java?hl=en.
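On the question in the quoted thread about computing similarity once user keys are split across several index entities: two users may sit in different index entities of the same SkillLevel, so a single filter that requires both keys in one list can miss that parent. One possible approach, sketched below in plain Java with an in-memory model, is to look up the parents for each user separately and intersect the parent sets. The IndexEntity class, the method names, and the chunk size of 2 used in the demo are all hypothetical stand-ins, not datastore or Twig API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class RelationIndexDemo {
    // Stand-in for one index entity: its parent SkillLevel id plus a bounded set of user keys.
    static final class IndexEntity {
        final String parent;
        final Set<Long> userKeys;

        IndexEntity(String parent, Collection<Long> userKeys) {
            this.parent = parent;
            this.userKeys = new HashSet<>(userKeys);
        }
    }

    // Splits a parent's user keys into index entities holding at most maxPerIndex keys each.
    static List<IndexEntity> buildIndexes(String parent, List<Long> users, int maxPerIndex) {
        List<IndexEntity> out = new ArrayList<>();
        for (int i = 0; i < users.size(); i += maxPerIndex) {
            int end = Math.min(i + maxPerIndex, users.size());
            out.add(new IndexEntity(parent, users.subList(i, end)));
        }
        return out;
    }

    // Emulates "find the parents of every index entity whose key list contains this user".
    static Set<String> parentsOf(List<IndexEntity> all, long user) {
        Set<String> parents = new TreeSet<>();
        for (IndexEntity e : all) {
            if (e.userKeys.contains(user)) {
                parents.add(e.parent);
            }
        }
        return parents;
    }

    // Similarity = number of parents (SkillLevels) that both users belong to.
    static int similarity(List<IndexEntity> all, long a, long b) {
        Set<String> common = parentsOf(all, a);
        common.retainAll(parentsOf(all, b));
        return common.size();
    }

    public static void main(String[] args) {
        List<IndexEntity> all = new ArrayList<>(
                buildIndexes("java:5", Arrays.asList(1L, 2L, 3L, 4L, 5L), 2));
        all.addAll(buildIndexes("c++:3", Arrays.asList(2L, 5L), 2));

        // Users 1 and 5 share "java:5" even though they land in different index entities.
        System.out.println(similarity(all, 1L, 5L)); // prints 1
        System.out.println(similarity(all, 2L, 5L)); // prints 2
    }
}
```

In a real datastore the two parentsOf lookups would be separate queries, so the cost grows with the number of SkillLevels each user belongs to rather than with the size of the key lists.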
-- You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
