Hi John, Thanks for your reply. To be honest I am quite disappointed that GAE can not perform such query directly.
According do your suggestion, to query with n filters on one relation index entity, we need to: 1, perform n separate queries to obtain n cursors of UserSkill entities by some sorting order 2, use in-memory zig-zag method to obtain User keys match all query 3, use a batch get to obtain all User entities Please correct me if my understanding is wrong. Best regards, Max On Mar 16, 10:53 am, John Patterson <[email protected]> wrote: > Hi Max, in that case definitely the model suggested by Jeff would be > perfect. So something like: > > class User > { > String name; > > } > > class UserSkill > { > String skill; > int ability; > > } > > When you store them make sure the User is the parent of the UserSkill > so they can be updated in a single transaction. > > datastore.store(aUser); > datastore.store(aSkill, aUser); // set the parent relationship > > Twig has support for this "Relation Entity" pattern like so: > > // return all high skilled java developers > datastore.find().type(UserSkill.class) > .addFilter("skill", EQUAL, "java") > .addFilter("ability", GREATER_THAN, 4) > .returnParentsNow(); > > To find Users with Java > 4 and C++ > 2 you still need to do two > queries. To avoid having to load all the results into memory make > sure you sort by ability and key then "stream" the results using the > "zig-zag" method. This will soon be built into Twig. Also do > parallel queries to speed up the response. > > datastore.find().type(UserSkill.class) > .addSort("ability") > .addSort(Entity.KEY_RESERVED_PROPERTY) > .addFilter("skill", EQUAL, "java") > .addFilter("ability", GREATER_THAN, 4) > .returnParentsLater(); > > If you are expecting a lot of results you should tune this query by > setting fetchResultsBy(100) or something similar to reduce the number > of trips to the datastore. The default is 20 results returned at a > time. > > Hope this helps! > > John > > On 16 Mar 2010, at 01:06, Max wrote: > > > We have already done some testings on RDBMS and the performance is not > > acceptable to us. (for the second query, that means self join a table > > with 10 million records for n times). That's why we try GAE now. > > > Thank you. > > > On Mar 16, 1:54 am, Max <[email protected]> wrote: > >> Hi John, > > >> I am designing a quite similar data model of cited *User-Skill* > >> problem, but not exactly the same. > > >> People may not be familiar with our domain. Basically it will be a > >> track record system. User can perform different tasks and same task > >> can be done by many users. According to historical data, more than > >> 5000 different users will finish the same task, and it is possible > >> for > >> some users to finish more than 5000 tasks within the data archive > >> period. Additionally, following queries will be performed frequently: > >> 1, given 2 user, A and B, find out common tasks they have done (less > >> important) > >> 2, given several tasks, find out users who have done all these tasks > >> (more important) > > >> Translate our problem into user-skill scenario, in this case, one > >> user > >> can have more than 5000 skills and there could be more than 5000 > >> users > >> having the same skill. > >> 1, given 2 user, A and B, how to find out their mutual skills > >> 2, given n skills, how to find full list of users having all n skills > > >> Best regards, > >> Max > > >> On Mar 15, 5:59 pm, John Patterson <[email protected]> wrote: > > >>> I was meaning just put the UserSkills of the two people into the > >>> set. > >>> Each person only has a small number of skills yeah? > > >>> Perhaps I mis understood your last requirement "similarity between > >>> user A and User B" > > >>> On 15 Mar 2010, at 14:23, Max wrote: > > >>>> Hi John, > > >>>> Thanks for your reply. I need some time to study and test your > >>>> codes. > > >>>> For the last point, Sets.intersection() means we need to load all > >>>> keys > >>>> into memory and perform an in memory Sets.intersection(). Is that > >>>> possible to do this by a query directly. In other words, is that > >>>> possible to use more than one equality filter on a list property > >>>> of a > >>>> relation entity index for their parents? > > >>>> Best regards, > >>>> Max > > >>>> On Mar 15, 2:45 pm, John Patterson <[email protected]> wrote: > >>>>> Hi Max, > > >>>>> Regarding your original question, a more efficient solution > >>>>> would be > >>>>> to embed the UesrSkill in the User instance which would allow > >>>>> you to > >>>>> find all results in a single query. Th problem is that embedded > >>>>> instances can only be queried on a single value. There would be > >>>>> no > >>>>> way to query skill and ability on the same UserSkill - just "java > >>>>> and c > >>>>> ++ with any skill over 3 and any skill over 5" > > >>>>> To solve this you could create a combined property in UserSkill > >>>>> for > >>>>> querying "skillAbility" which would hold values such as "java: > >>>>> 5", "c > >>>>> ++: > >>>>> 4". This will only work with skill from 0-9 because it depends on > >>>>> lexical ordering (or e.g. 000 - 999) > > >>>>> Both Twig and Objectify but not JDO support embedded collections > >>>>> of > >>>>> instances. > > >>>>> In Twig it would be defined like this > > >>>>> class User > >>>>> { > >>>>> @Embed List<UserSkill> skills; > > >>>>> } > > >>>>> class UserSkill > >>>>> { > >>>>> String skillAbility; > >>>>> Skill skill; // direct reference to Skill instance > >>>>> int ability; > > >>>>> } > > >>>>> Disclaimer: I have not tried any of this code - it is just off the > >>>>> top > >>>>> of my head > > >>>>> You would then do a single range query to find "java-5", "java-6, > >>>>> "java-7"... > > >>>>> // find java developers with ability over 5 in a single query > >>>>> datastore.find().type(User.class) > >>>>> .addFilter("skillAbility", GREATER_THAN_EQUAL, "java: > >>>>> 5") // range > >>>>> start > >>>>> .addFilter("skillAbility", LESS_THAN, "java-" + > >>>>> Character.MAX_VALUE) // range end > >>>>> .returnResultsNow(); > > >>>>> But that doesn't fully answer your question which includes an > >>>>> AND on > >>>>> multiple property values which is not supported by the > >>>>> datastore. To > >>>>> do this you will need to perform two queries and merge the > >>>>> results. > > >>>>> Twig has support for merging only OR queries right now so you > >>>>> can do: > > >>>>> // find users with c++ ability > 2 OR java ability > 5 > > >>>>> RootFindCommand or = datastore.find().type(User.class); // > >>>>> default > >>>>> (only) operator is OR > > >>>>> or.addChildCommand() > >>>>> .addFilter("skillAbility", GREATER_THAN_EQUAL, "java: > >>>>> 5") // range > >>>>> start > >>>>> .addFilter("skillAbility", LESS_THAN, "java-" + > >>>>> Character.MAX_VALUE); // range end > > >>>>> or.addChildCommand() > >>>>> .addFilter("skillAbility", GREATER_THAN_EQUAL, "java: > >>>>> 5") // range > >>>>> start > >>>>> .addFilter("skillAbility", LESS_THAN, "java-" + > >>>>> Character.MAX_VALUE); // end > > >>>>> // merges results from both queries into a single iterator > >>>>> Iterator<User> results = or.returnResultsNow(); > > >>>>> Supporting AND merges is coming! Add a feature request if you > >>>>> like. > >>>>> But for now you will have to do two separate queries as in the > >>>>> first > >>>>> example and join the results in your own code. You should make > >>>>> sure > >>>>> both queries are sorted by key then you can "stream" the results > >>>>> without loading them all into memory at once. > > >>>>> // find java developers with ability over 5 > >>>>> datastore.find().type(User.class) > >>>>> .addSort("skillAbility") // first sort required to be > >>>>> inequality filter > >>>>> .addSort(Entity.KEY_RESERVED_PROPERTY) // ensure results > >>>>> in same order > >>>>> .addFilter("skillAbility", GREATER_THAN_EQUAL, "java: > >>>>> 5") // range > >>>>> start > >>>>> .addFilter("skillAbility", LESS_THAN, "java-" + > >>>>> Character.MAX_VALUE) // range end > >>>>> .returnResultsNow(); > > >>>>> // find c++ developers with ability over 2 > >>>>> datastore.find().type(User.class) > >>>>> .addSort("skillAbility") > >>>>> .addSort(Entity.KEY_RESERVED_PROPERTY) > >>>>> .addFilter("skillAbility", GREATER_THAN_EQUAL, "c++: > >>>>> 2") // > >>>>> range start > >>>>> .addFilter("skillAbility", LESS_THAN, "c++:-" + > >>>>> Character.MAX_VALUE) // range end > >>>>> .returnResultsNow(); > > >>>>> // now iterate through both results and only include those in both > >>>>> iterators > > >>>>> Again, I have not run this code so I might have made a mistake. > >>>>> Let > >>>>> me know how you get on! I'll be adding support for these merged > >>>>> AND > >>>>> queries on multiple property values soon - unless someone else > >>>>> wants > >>>>> to contribute it first ;) > > >>>>> To find the similarity between two users is now simple now that > >>>>> they > >>>>> are just a property of the User? just do a Sets.intersection() of > >>>>> the > >>>>> skills. > > >>>>> John > > >>>>> On 15 Mar 2010, at 12:07, Max wrote: > > >>>>>> Thanks John, > > >>>>>> Bret Slatkins' talk is impressive. Let's say we have m skills > >>>>>> with n > >>>>>> levels. (i.e., m x n SkillLevel entities). Each SkillLevel entity > >>>>>> consists of at least one SkillLevelIndex. > > >>>>>> We define similarity between user A and User B as number of > >>>>>> skills > >>>>>> with same level. i.e., number of SkillLevel entities of query: > >>>>>> "from SkillLevel where userKeyList contains A and userKeyList > >>>>>> contains > >>>>>> B" > > >>>>>> It works fine if userKeyList contains all user keys. However, > >>>>>> after we > >>>>>> applied relation index pattern, we have more than one user keys > >>>>>> lists, > >>>>>> then how to perform a query to calculate similarity between two > >>>>>> users? > > >>>>>> On Mar 10, 12:21 pm, John Patterson <[email protected]> > >>>>>> wrote: > >>>>>>> On 10 Mar 2010, at 10:53, Max wrote: > > >>>>>>>> Rusty Wright suggested a list of user keys to be stored in > >>>>>>>> skill > >>>>>>>> entity. But that means only 5000 users can have the same skill. > > >>>>>>> If you use the "Relation Index Entity" pattern as described in > >>>>>>> Bret > >>>>>>> Slatkins talk you can store 5000 users per index entity and an > >>>>>>> unlimited number of index entities. This will actually give you > >>>>>>> much > >>>>>>> better query performance too because you will not need to load > >>>>>>> the > >>>>>>> list of 5000 Keys every time you read your main entity. > > >>>>>>> The new release of Twig has direct support for RIE's and can > >>>>>>> actually > > ... > > read more » -- You received this message because you are subscribed to the Google Groups "Google App Engine for Java" group. To post to this group, send email to [email protected]. To unsubscribe from this group, send email to [email protected]. For more options, visit this group at http://groups.google.com/group/google-appengine-java?hl=en.
