Hi John,

Thanks for your reply. To be honest I am quite disappointed that GAE
can not perform such query directly.

According do your suggestion, to query with n filters on one relation
index entity, we need to:
1, perform n separate queries to obtain n cursors of UserSkill
entities by some sorting order
2, use in-memory zig-zag method to obtain User keys match all query
3, use a batch get to obtain all User entities

Please correct me if my understanding is wrong.

Best regards,
Max

On Mar 16, 10:53 am, John Patterson <[email protected]> wrote:
> Hi Max, in that case definitely the model suggested by Jeff would be  
> perfect.  So something like:
>
> class User
> {
>         String name;
>
> }
>
> class UserSkill
> {
>         String skill;
>         int ability;
>
> }
>
> When you store them make sure the User is the parent of the UserSkill  
> so they can be updated in a single transaction.
>
> datastore.store(aUser);
> datastore.store(aSkill, aUser);  // set the parent relationship
>
> Twig has support for this "Relation Entity" pattern like so:
>
> // return all high skilled java developers
> datastore.find().type(UserSkill.class)
>         .addFilter("skill", EQUAL, "java")
>         .addFilter("ability", GREATER_THAN, 4)
>         .returnParentsNow();
>
> To find Users with Java > 4 and C++ > 2 you still need to do two  
> queries.  To avoid having to load all the results into memory make  
> sure you sort by ability and key then "stream" the results using the  
> "zig-zag" method.  This will soon be built into Twig.  Also do  
> parallel queries to speed up the response.
>
> datastore.find().type(UserSkill.class)
>         .addSort("ability")
>         .addSort(Entity.KEY_RESERVED_PROPERTY)
>         .addFilter("skill", EQUAL, "java")
>         .addFilter("ability", GREATER_THAN, 4)
>         .returnParentsLater();
>
> If you are expecting a lot of results you should tune this query by  
> setting fetchResultsBy(100)  or something similar to reduce the number  
> of trips to the datastore.  The default is 20 results returned at a  
> time.
>
> Hope this helps!
>
> John
>
> On 16 Mar 2010, at 01:06, Max wrote:
>
> > We have already done some testings on RDBMS and the performance is not
> > acceptable to us. (for the second query, that means self join a table
> > with 10 million records for n times). That's why we try GAE now.
>
> > Thank you.
>
> > On Mar 16, 1:54 am, Max <[email protected]> wrote:
> >> Hi John,
>
> >> I am designing a quite similar data model of cited *User-Skill*
> >> problem, but not exactly the same.
>
> >> People may not be familiar with our domain. Basically it will be a
> >> track record system. User can perform different tasks and same task
> >> can be done by many users. According to historical data, more than
> >> 5000 different users will finish the same task, and it is possible  
> >> for
> >> some users to finish more than 5000 tasks within the data archive
> >> period. Additionally, following queries will be performed frequently:
> >> 1, given 2 user, A and B, find out common tasks they have done  (less
> >> important)
> >> 2, given several tasks, find out users who have done all these tasks
> >> (more important)
>
> >> Translate our problem into user-skill scenario, in this case, one  
> >> user
> >> can have more than 5000 skills and there could be more than 5000  
> >> users
> >> having the same skill.
> >> 1, given 2 user, A and B, how to find out their mutual skills
> >> 2, given n skills, how to find full list of users having all n skills
>
> >> Best regards,
> >> Max
>
> >> On Mar 15, 5:59 pm, John Patterson <[email protected]> wrote:
>
> >>> I was meaning just put the UserSkills of the two people into the  
> >>> set.
> >>> Each person only has a small number of skills yeah?
>
> >>> Perhaps I mis understood your last requirement "similarity between
> >>> user A and User B"
>
> >>> On 15 Mar 2010, at 14:23, Max wrote:
>
> >>>> Hi John,
>
> >>>> Thanks for your reply. I need some time to study and test your  
> >>>> codes.
>
> >>>> For the last point, Sets.intersection() means we need to load all  
> >>>> keys
> >>>> into memory and perform an in memory Sets.intersection(). Is that
> >>>> possible to do this by a query directly. In other words, is that
> >>>> possible to use more than one equality filter on a list property  
> >>>> of a
> >>>> relation entity index for their parents?
>
> >>>> Best regards,
> >>>> Max
>
> >>>> On Mar 15, 2:45 pm, John Patterson <[email protected]> wrote:
> >>>>> Hi Max,
>
> >>>>> Regarding your original question, a more efficient solution  
> >>>>> would be
> >>>>> to embed the UesrSkill in the User instance which would allow  
> >>>>> you to
> >>>>> find all results in a single query.  Th problem is that embedded
> >>>>> instances can only be queried on a single value.  There would be  
> >>>>> no
> >>>>> way to query skill and ability on the same UserSkill - just "java
> >>>>> and c
> >>>>> ++ with any skill over 3 and any skill over 5"
>
> >>>>> To solve this you could create a combined property in UserSkill  
> >>>>> for
> >>>>> querying "skillAbility" which would hold values such as "java:
> >>>>> 5", "c
> >>>>> ++:
> >>>>> 4".  This will only work with skill from 0-9 because it depends on
> >>>>> lexical ordering (or e.g. 000 - 999)
>
> >>>>> Both Twig and Objectify but not JDO support embedded collections  
> >>>>> of
> >>>>> instances.
>
> >>>>> In Twig it would be defined like this
>
> >>>>> class User
> >>>>> {
> >>>>>         @Embed List<UserSkill> skills;
>
> >>>>> }
>
> >>>>> class UserSkill
> >>>>> {
> >>>>>         String skillAbility;
> >>>>>         Skill skill;    // direct reference to Skill instance
> >>>>>         int ability;
>
> >>>>> }
>
> >>>>> Disclaimer: I have not tried any of this code - it is just off the
> >>>>> top
> >>>>> of my head
>
> >>>>> You would then do a single range query to find "java-5", "java-6,
> >>>>> "java-7"...
>
> >>>>> // find java developers with ability over 5 in a single query
> >>>>> datastore.find().type(User.class)
> >>>>>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:
> >>>>> 5")  // range
> >>>>> start
> >>>>>         .addFilter("skillAbility", LESS_THAN, "java-" +
> >>>>> Character.MAX_VALUE)  // range end
> >>>>>         .returnResultsNow();
>
> >>>>> But that doesn't fully answer your question which includes an  
> >>>>> AND on
> >>>>> multiple property values which is not supported by the  
> >>>>> datastore.  To
> >>>>> do this you will need to perform two queries and merge the  
> >>>>> results.
>
> >>>>> Twig has support for merging only OR queries right now so you  
> >>>>> can do:
>
> >>>>> // find users with c++ ability > 2 OR java ability > 5
>
> >>>>> RootFindCommand  or = datastore.find().type(User.class);  //  
> >>>>> default
> >>>>> (only) operator is OR
>
> >>>>> or.addChildCommand()
> >>>>>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:
> >>>>> 5")  // range
> >>>>> start
> >>>>>         .addFilter("skillAbility", LESS_THAN, "java-" +
> >>>>> Character.MAX_VALUE);  // range end
>
> >>>>> or.addChildCommand()
> >>>>>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:
> >>>>> 5")  // range
> >>>>> start
> >>>>>         .addFilter("skillAbility", LESS_THAN, "java-" +
> >>>>> Character.MAX_VALUE);  // end
>
> >>>>> // merges results from both queries into a single iterator
> >>>>> Iterator<User> results = or.returnResultsNow();
>
> >>>>> Supporting AND merges is coming!  Add a feature request if you  
> >>>>> like.
> >>>>> But for now you will have to do two separate queries as in the  
> >>>>> first
> >>>>> example and join the results in your own code.  You should make  
> >>>>> sure
> >>>>> both queries are sorted by key then you can "stream" the results
> >>>>> without loading them all into memory at once.
>
> >>>>> // find java developers with ability over 5
> >>>>> datastore.find().type(User.class)
> >>>>>         .addSort("skillAbility")      // first sort required to be
> >>>>> inequality filter
> >>>>>         .addSort(Entity.KEY_RESERVED_PROPERTY)  // ensure results
> >>>>> in same order
> >>>>>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:
> >>>>> 5")  // range
> >>>>> start
> >>>>>         .addFilter("skillAbility", LESS_THAN, "java-" +
> >>>>> Character.MAX_VALUE)  // range end
> >>>>>         .returnResultsNow();
>
> >>>>> // find c++ developers with ability over 2
> >>>>> datastore.find().type(User.class)
> >>>>>         .addSort("skillAbility")
> >>>>>         .addSort(Entity.KEY_RESERVED_PROPERTY)
> >>>>>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "c++:
> >>>>> 2")  //
> >>>>> range start
> >>>>>         .addFilter("skillAbility", LESS_THAN, "c++:-" +
> >>>>> Character.MAX_VALUE)  // range end
> >>>>>         .returnResultsNow();
>
> >>>>> // now iterate through both results and only include those in both
> >>>>> iterators
>
> >>>>> Again, I have not run this code so I might have made a mistake.  
> >>>>> Let
> >>>>> me know how you get on!  I'll be adding support for these merged  
> >>>>> AND
> >>>>> queries on multiple property values soon - unless someone else  
> >>>>> wants
> >>>>> to contribute it first ;)
>
> >>>>> To find the similarity between two users is now simple now that  
> >>>>> they
> >>>>> are just a property of the User?  just do a Sets.intersection() of
> >>>>> the
> >>>>> skills.
>
> >>>>> John
>
> >>>>> On 15 Mar 2010, at 12:07, Max wrote:
>
> >>>>>> Thanks John,
>
> >>>>>> Bret Slatkins' talk is impressive. Let's say we have m skills  
> >>>>>> with n
> >>>>>> levels. (i.e., m x n SkillLevel entities). Each SkillLevel entity
> >>>>>> consists of at least one SkillLevelIndex.
>
> >>>>>> We define similarity between user A and User B as number of  
> >>>>>> skills
> >>>>>> with same level. i.e., number of SkillLevel entities of query:
> >>>>>> "from SkillLevel where userKeyList contains A and userKeyList
> >>>>>> contains
> >>>>>> B"
>
> >>>>>> It works fine if userKeyList contains all user keys. However,
> >>>>>> after we
> >>>>>> applied relation index pattern, we have more than one user keys
> >>>>>> lists,
> >>>>>> then how to perform a query to calculate similarity between two
> >>>>>> users?
>
> >>>>>> On Mar 10, 12:21 pm, John Patterson <[email protected]>  
> >>>>>> wrote:
> >>>>>>> On 10 Mar 2010, at 10:53, Max wrote:
>
> >>>>>>>> Rusty Wright suggested a list of user keys to be stored in  
> >>>>>>>> skill
> >>>>>>>> entity. But that means only 5000 users can have the same skill.
>
> >>>>>>> If you use the "Relation Index Entity" pattern as described in  
> >>>>>>> Bret
> >>>>>>> Slatkins talk you can store 5000 users per index entity and an
> >>>>>>> unlimited number of index entities.  This will actually give you
> >>>>>>> much
> >>>>>>> better query performance too because you will not need to load  
> >>>>>>> the
> >>>>>>> list of 5000 Keys every time you read your main entity.
>
> >>>>>>> The new release of Twig has direct support for RIE's and can
> >>>>>>> actually
>
> ...
>
> read more »

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine for Java" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine-java?hl=en.

Reply via email to