We have already tested this on an RDBMS, and the performance was not
acceptable to us: the second query means self-joining a table
of 10 million records n times. That is why we are trying GAE now.

Thank you.

On Mar 16, 1:54 am, Max <[email protected]> wrote:
> Hi John,
>
> I am designing a data model that is quite similar to the cited
> *User-Skill* problem, but not exactly the same.
>
> People may not be familiar with our domain. Basically it is a
> track record system. A user can perform different tasks, and the same
> task can be done by many users. According to historical data, more than
> 5000 different users may finish the same task, and some users may
> finish more than 5000 tasks within the data archive
> period. Additionally, the following queries will be performed frequently:
> 1. Given two users, A and B, find the tasks they have both done (less
> important).
> 2. Given several tasks, find the users who have done all of them
> (more important).
>
> Translated into the user-skill scenario: one user
> can have more than 5000 skills, and more than 5000 users can
> have the same skill.
> 1. Given two users, A and B, how do we find their mutual skills?
> 2. Given n skills, how do we find the full list of users having all n skills?
>
> Best regards,
> Max
>
> On Mar 15, 5:59 pm, John Patterson <[email protected]> wrote:
>
> > I meant just put the UserSkills of the two people into sets.
> > Each person only has a small number of skills, yes?
>
> > Perhaps I misunderstood your last requirement, "similarity between
> > user A and User B".
>
> > On 15 Mar 2010, at 14:23, Max wrote:
>
> > > Hi John,
>
> > > Thanks for your reply. I need some time to study and test your code.
>
> > > On the last point, Sets.intersection() means we need to load all keys
> > > into memory and intersect them there. Is it
> > > possible to do this directly with a query? In other words, is it
> > > possible to use more than one equality filter on a list property of a
> > > relation index entity to find the parents?
>
> > > Best regards,
> > > Max
>
> > > On Mar 15, 2:45 pm, John Patterson <[email protected]> wrote:
> > >> Hi Max,
>
> > > >> Regarding your original question, a more efficient solution would be
> > > >> to embed the UserSkill in the User instance, which would allow you to
> > > >> find all results in a single query. The problem is that embedded
> > > >> instances can only be queried on a single value. There would be no
> > > >> way to query skill and ability on the same UserSkill - just "java
> > > >> and c++ with any skill over 3 and any skill over 5".
>
> > > >> To solve this you could create a combined property in UserSkill for
> > > >> querying, "skillAbility", which would hold values such as "java:5",
> > > >> "c++:4". This will only work with abilities from 0-9 (or zero-padded,
> > > >> e.g. 000-999) because it depends on lexical ordering.
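
A minimal sketch of why the combined value needs a fixed width, assuming a hypothetical helper `skillAbilityKey` and a 3-digit padding (neither is part of Twig or the datastore API). Strings are compared character by character, so "java:10" sorts before "java:5" unless the number is zero-padded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Twig.
final class SkillAbilityKeys {
    private SkillAbilityKeys() {}

    // Builds "skill:ability" with the ability zero-padded to 3 digits,
    // so that lexical order matches numeric order for 0-999.
    static String skillAbilityKey(String skill, int ability) {
        if (ability < 0 || ability > 999) {
            throw new IllegalArgumentException("ability must be in 0-999");
        }
        return String.format(Locale.ROOT, "%s:%03d", skill, ability);
    }

    // Returns the keys in plain String order, which for these ASCII
    // values is the order a lexically sorted index would use.
    static List<String> sortedKeys(String skill, int... abilities) {
        List<String> keys = new ArrayList<>();
        for (int ability : abilities) {
            keys.add(skillAbilityKey(skill, ability));
        }
        Collections.sort(keys);
        return keys;
    }
}
```

With padding, a range filter from `skillAbilityKey("java", 5)` upwards would also cover ability 10 and above, which the unpadded form would miss.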
>
> > > >> Twig and Objectify, but not JDO, support embedded collections of
> > > >> instances.
>
> > >> In Twig it would be defined like this
>
> > > >> class User
> > > >> {
> > > >>         @Embed List<UserSkill> skills;
> > > >> }
>
> > > >> class UserSkill
> > > >> {
> > > >>         String skillAbility;    // combined value, e.g. "java:5"
> > > >>         Skill skill;    // direct reference to Skill instance
> > > >>         int ability;
> > > >> }
>
> > > >> Disclaimer: I have not tried any of this code - it is just off the
> > > >> top of my head.
>
> > > >> You would then do a single range query to find "java:5", "java:6",
> > > >> "java:7"...
>
> > > >> // find java developers with ability of 5 or more in a single query
> > > >> datastore.find().type(User.class)
> > > >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5")  // range start
> > > >>         .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE)  // range end
> > > >>         .returnResultsNow();
>
> > >> But that doesn't fully answer your question which includes an AND on
> > >> multiple property values which is not supported by the datastore.  To
> > >> do this you will need to perform two queries and merge the results.
>
> > >> Twig has support for merging only OR queries right now so you can do:
>
> > > >> // find users with c++ ability >= 2 OR java ability >= 5
>
> > > >> // default (only) operator is OR
> > > >> RootFindCommand or = datastore.find().type(User.class);
>
> > > >> or.addChildCommand()
> > > >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "c++:2")  // range start
> > > >>         .addFilter("skillAbility", LESS_THAN, "c++:" + Character.MAX_VALUE);  // range end
>
> > > >> or.addChildCommand()
> > > >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5")  // range start
> > > >>         .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE);  // range end
>
> > > >> // merges results from both queries into a single iterator
> > > >> Iterator<User> results = or.returnResultsNow();
>
> > > >> Support for AND merges is coming! Add a feature request if you like.
> > > >> But for now you will have to do two separate queries as in the first
> > > >> example and join the results in your own code. Make sure
> > > >> both queries are sorted by key; then you can "stream" the results
> > > >> without loading them all into memory at once.
>
> > > >> // find java developers with ability of 5 or more
> > > >> datastore.find().type(User.class)
> > > >>         .addSort("skillAbility")      // first sort must be on the inequality filter property
> > > >>         .addSort(Entity.KEY_RESERVED_PROPERTY)  // ensure results in same order
> > > >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "java:5")  // range start
> > > >>         .addFilter("skillAbility", LESS_THAN, "java:" + Character.MAX_VALUE)  // range end
> > > >>         .returnResultsNow();
>
> > > >> // find c++ developers with ability of 2 or more
> > > >> datastore.find().type(User.class)
> > > >>         .addSort("skillAbility")
> > > >>         .addSort(Entity.KEY_RESERVED_PROPERTY)
> > > >>         .addFilter("skillAbility", GREATER_THAN_EQUAL, "c++:2")  // range start
> > > >>         .addFilter("skillAbility", LESS_THAN, "c++:" + Character.MAX_VALUE)  // range end
> > > >>         .returnResultsNow();
>
> > > >> // now iterate through both results and only include those in both iterators
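
A minimal sketch of that last step, assuming both iterators yield user keys in ascending order with no duplicates. `SortedIntersection` and `intersect` are hypothetical names rather than Twig API, and plain `Long` ids stand in for datastore keys. If the query results are not globally ordered by key, this merge will miss matches.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper for illustration only; not part of Twig.
final class SortedIntersection {
    private SortedIntersection() {}

    // Walks two ascending iterators in step and keeps only the values
    // present in both. Only the matches are held in memory.
    static <T extends Comparable<T>> List<T> intersect(Iterator<T> left, Iterator<T> right) {
        List<T> common = new ArrayList<>();
        if (!left.hasNext() || !right.hasNext()) {
            return common;
        }
        T a = left.next();
        T b = right.next();
        while (true) {
            int cmp = a.compareTo(b);
            if (cmp == 0) {
                // Same key on both sides: a match, advance both.
                common.add(a);
                if (!left.hasNext() || !right.hasNext()) break;
                a = left.next();
                b = right.next();
            } else if (cmp < 0) {
                // Left is behind: advance left only.
                if (!left.hasNext()) break;
                a = left.next();
            } else {
                // Right is behind: advance right only.
                if (!right.hasNext()) break;
                b = right.next();
            }
        }
        return common;
    }
}
```

For n skills, the same function could be applied pairwise, feeding the result of one intersection into the next.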
>
> > >> Again, I have not run this code so I might have made a mistake.  Let
> > >> me know how you get on!  I'll be adding support for these merged AND
> > >> queries on multiple property values soon - unless someone else wants
> > >> to contribute it first ;)
>
> > > >> Finding the similarity between two users is simple now that the
> > > >> skills are just a property of the User: just do a
> > > >> Sets.intersection() of the skills.
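
A minimal sketch of the in-memory intersection, assuming each user's skills have already been loaded as a set of skill names. It uses `java.util` (`retainAll`) instead of Guava's `Sets.intersection()` so it runs without extra dependencies; `MutualSkills` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only.
final class MutualSkills {
    private MutualSkills() {}

    // Returns a new set holding the skills present in both inputs.
    // Neither input set is modified.
    static Set<String> mutualSkills(Set<String> skillsOfA, Set<String> skillsOfB) {
        Set<String> common = new HashSet<>(skillsOfA);
        common.retainAll(skillsOfB);
        return common;
    }
}
```

The size of the returned set is the similarity count when the set elements encode both skill and level.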
>
> > >> John
>
> > >> On 15 Mar 2010, at 12:07, Max wrote:
>
> > >>> Thanks John,
>
> > > >>> Brett Slatkin's talk is impressive. Let's say we have m skills with n
> > > >>> levels (i.e., m x n SkillLevel entities). Each SkillLevel entity
> > > >>> has at least one SkillLevelIndex.
>
> > > >>> We define the similarity between user A and user B as the number of
> > > >>> skills they hold at the same level, i.e., the number of SkillLevel
> > > >>> entities returned by the query:
> > > >>> "from SkillLevel where userKeyList contains A and userKeyList contains B"
>
> > > >>> This works fine if a single userKeyList contains all the user keys.
> > > >>> However, after applying the relation index pattern we have more than
> > > >>> one user key list, so how do we perform a query to calculate the
> > > >>> similarity between two users?
>
> > >>> On Mar 10, 12:21 pm, John Patterson <[email protected]> wrote:
> > >>>> On 10 Mar 2010, at 10:53, Max wrote:
>
> > >>>>> Rusty Wright suggested a list of user keys to be stored in skill
> > >>>>> entity. But that means only 5000 users can have the same skill.
>
> > > >>>> If you use the "Relation Index Entity" pattern as described in Brett
> > > >>>> Slatkin's talk, you can store 5000 users per index entity and have an
> > > >>>> unlimited number of index entities. This will actually give you much
> > > >>>> better query performance too, because you will not need to load the
> > > >>>> list of 5000 Keys every time you read your main entity.
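
A minimal sketch of the partitioning idea only, assuming the 5000-per-entity figure from this thread. `RelationIndexChunks` and `chunk` are hypothetical names; how Twig or the datastore actually stores index entities is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Twig.
final class RelationIndexChunks {
    // Per-entity limit cited in the thread.
    static final int MAX_KEYS_PER_INDEX = 5000;

    private RelationIndexChunks() {}

    // Splits the user keys into consecutive lists of at most maxPerIndex
    // keys, one list per index entity under the same parent.
    static <K> List<List<K>> chunk(List<K> userKeys, int maxPerIndex) {
        if (maxPerIndex <= 0) {
            throw new IllegalArgumentException("maxPerIndex must be positive");
        }
        List<List<K>> chunks = new ArrayList<>();
        for (int start = 0; start < userKeys.size(); start += maxPerIndex) {
            int end = Math.min(start + maxPerIndex, userKeys.size());
            chunks.add(new ArrayList<>(userKeys.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk would become the list property of one index entity, so a skill with 12001 users would need three index entities under this assumption.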
>
> > > >>>> The new release of Twig has direct support for RIEs and can actually
> > > >>>> let you query for multiple skills *in parallel*, then merge the
> > > >>>> results and return the parents:
>
> > > >>>> http://code.google.com/p/twig-persist/wiki/Using#Relation_Index_Entities
>
> > >>> --
> > >>> You received this message because you are subscribed to the Google
> > >>> Groups "Google App Engine for Java" group.
> > >>> To post to this group, send email to 
> > >>> [email protected]
> > >>> .
> > >>> To unsubscribe from this group, send email to 
> > >>> [email protected]
> > >>> .
> > >>> For more options, visit this group 
> > >>> athttp://groups.google.com/group/google-appengine-java?hl=en
> > >>> .
>
> > > --
> > > You received this message because you are subscribed to the Google  
> > > Groups "Google App Engine for Java" group.
> > > To post to this group, send email to 
> > > [email protected]
> > > .
> > > To unsubscribe from this group, send email to 
> > > [email protected]
> > > .
> > > For more options, visit this group 
> > > athttp://groups.google.com/group/google-appengine-java?hl=en
> > > .

