> I'm developing a large webapp using django. One of my requirements
> is that it needs to be able to handle 10,000+ different entities or
> models that need to be associated with a user. A single user
> needs to be able to associate himself with any of the existing
> models and have one record per model. Each model will have an average
> of 30 fields (each needs to be searchable). There will be several
> hundred thousand and on some occasions millions of records per model
> and we expect to have millions of users using the webapp.
> 
> My questions are:
> 
> 1) Do you recommend using a django model per entity or should I try a
> different approach?

Looking at things from a different perspective may help reframe
the problem in a more manageable context:

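  from django.db import models

  class Entity(models.Model):
    # e.g. "Manager", "Pet's Name"
    name = models.CharField(...)

  class Values(models.Model):
    # one allowed value for an entity
    # (newer Django versions also require an on_delete argument here)
    entity = models.ForeignKey(Entity)
    value = models.CharField(...)

  class Person(models.Model):
    name = models.CharField(...)
    entity_values = models.ManyToManyField(Values)
    # other stuff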

This allows you to set up an arbitrary number of entities, each
with their own number of allowed values.  Thus, you might have
Entities such as "Manager" (with values "John Smith", "Jane
Miller"), "Pet's Name" (with values "Spot", "Fluffy", and "Rex"),
"Favorite Breakfast Cereal" (with values "Cheerios", "Oatmeal",
and "Chocolate Frosted Sugar Bombs"), etc.

Entities and their values can be added arbitrarily, and users can
be associated with as many of them as you need.
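To make the shape of the data concrete, here is a rough sketch of the same layout as plain tables, using Python's sqlite3 module rather than Django so it runs on its own. The table names (entity, entity_value, person, person_value) and the helpers add_value() and associate() are made up for illustration; they are not what Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE entity_value (
        id INTEGER PRIMARY KEY,
        entity_id INTEGER REFERENCES entity(id),
        value TEXT
    );
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    -- stand-in for the many-to-many link between Person and Values
    CREATE TABLE person_value (
        person_id INTEGER REFERENCES person(id),
        value_id INTEGER REFERENCES entity_value(id)
    );
""")


def add_value(conn, entity_name, value):
    """Create the entity on first use, then register one allowed value."""
    conn.execute("INSERT OR IGNORE INTO entity (name) VALUES (?)",
                 (entity_name,))
    entity_id = conn.execute("SELECT id FROM entity WHERE name = ?",
                             (entity_name,)).fetchone()[0]
    cur = conn.execute(
        "INSERT INTO entity_value (entity_id, value) VALUES (?, ?)",
        (entity_id, value))
    return cur.lastrowid


def associate(conn, person_id, value_id):
    """Link a person to one value of some entity."""
    conn.execute("INSERT INTO person_value (person_id, value_id) VALUES (?, ?)",
                 (person_id, value_id))


alice = conn.execute("INSERT INTO person (name) VALUES (?)",
                     ("Alice",)).lastrowid
associate(conn, alice, add_value(conn, "Manager", "John Smith"))
associate(conn, alice, add_value(conn, "Favorite Breakfast Cereal",
                                 "Chocolate Frosted Sugar Bombs"))
# A brand-new entity is just more rows; no new table is needed.
associate(conn, alice, add_value(conn, "Pet's Name", "Rex"))

rows = conn.execute("""
    SELECT e.name, v.value
    FROM person_value pv
    JOIN entity_value v ON v.id = pv.value_id
    JOIN entity e ON e.id = v.entity_id
    WHERE pv.person_id = ?
    ORDER BY e.name
""", (alice,)).fetchall()

table_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]
```

However many entities get added, the number of tables stays at four; only the row counts grow.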

The only caveat comes with searching...until the query-set
refactor hits the trunk, you have to do some spiffy SQL extra()
calls to do things that would ordinarily be written something like

  p = Person.objects.filter(
    Q(entity_values__entity__name='Manager',
      entity_values__value='John Smith'),
    Q(entity_values__entity__name='Favorite Breakfast Cereal',
      entity_values__value='Chocolate Frosted Sugar Bombs')
    ) # yes, the Q()'s are redundant, but it makes the
    # problem's intent clearer

to find people that have John Smith as their manager and
Chocolate Frosted Sugar Bombs as their favorite breakfast cereal.

However, because of the way the SQL is currently generated, this
produces an empty result set: it asks for an impossible
condition in the join (that a single field, "entity__name"
or "value", hold two different values at the same time).
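The effect is easy to reproduce outside Django. The sketch below uses sqlite3 with hypothetical table names (entity, entity_value, person, person_value) and hand-written SQL that only approximates the single-join query described above; it is not the SQL Django actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entity_value (id INTEGER PRIMARY KEY,
                               entity_id INTEGER, value TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_value (person_id INTEGER, value_id INTEGER);

    INSERT INTO entity VALUES (1, 'Manager');
    INSERT INTO entity VALUES (2, 'Favorite Breakfast Cereal');
    INSERT INTO entity_value VALUES (1, 1, 'John Smith');
    INSERT INTO entity_value VALUES (2, 2, 'Chocolate Frosted Sugar Bombs');
    INSERT INTO person VALUES (1, 'Alice');
    -- Alice has both the manager and the cereal we will search for
    INSERT INTO person_value VALUES (1, 1);
    INSERT INTO person_value VALUES (1, 2);
""")

# One set of joins shared by every condition in the WHERE clause.
SINGLE_JOIN = """
    SELECT DISTINCT p.name
    FROM person p
    JOIN person_value pv ON pv.person_id = p.id
    JOIN entity_value v ON v.id = pv.value_id
    JOIN entity e ON e.id = v.entity_id
    WHERE
"""

# Both pairs ANDed against the same joined row: no single row can have
# e.name equal to two different strings, so nothing matches.
both_at_once = conn.execute(
    SINGLE_JOIN + "e.name = ? AND v.value = ? AND e.name = ? AND v.value = ?",
    ("Manager", "John Smith",
     "Favorite Breakfast Cereal", "Chocolate Frosted Sugar Bombs"),
).fetchall()

# Either pair on its own matches Alice as expected.
manager_only = conn.execute(
    SINGLE_JOIN + "e.name = ? AND v.value = ?",
    ("Manager", "John Smith"),
).fetchall()
```

Here both_at_once comes back empty even though Alice satisfies each condition separately, which is the behaviour described above.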

I've posted my interim solution several times here on the ML (to
use an extra() call and an IN/EXISTS clause) if you want an
example of how to work around the problem in such a context[1].
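For a flavour of the EXISTS approach, here is a minimal sketch in plain sqlite3. It is not necessarily the exact form used in the linked threads, and the table names and the people_matching() helper are invented for the example. In Django, fragments like EXISTS_SQL are the sort of raw SQL you would hand to extra(where=[...]), adjusted to your real table names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entity_value (id INTEGER PRIMARY KEY,
                               entity_id INTEGER, value TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person_value (person_id INTEGER, value_id INTEGER);

    INSERT INTO entity VALUES (1, 'Manager');
    INSERT INTO entity VALUES (2, 'Favorite Breakfast Cereal');
    INSERT INTO entity_value VALUES (1, 1, 'John Smith');
    INSERT INTO entity_value VALUES (2, 2, 'Chocolate Frosted Sugar Bombs');
    INSERT INTO person VALUES (1, 'Alice');
    INSERT INTO person VALUES (2, 'Bob');
    -- Alice has both values; Bob only shares the manager
    INSERT INTO person_value VALUES (1, 1);
    INSERT INTO person_value VALUES (1, 2);
    INSERT INTO person_value VALUES (2, 1);
""")

# One correlated subquery per (entity, value) pair. Each subquery gets
# its own joins, so the pairs no longer compete for the same row.
EXISTS_SQL = """
    EXISTS (
        SELECT 1
        FROM person_value pv
        JOIN entity_value v ON v.id = pv.value_id
        JOIN entity e ON e.id = v.entity_id
        WHERE pv.person_id = person.id
          AND e.name = ? AND v.value = ?
    )
"""


def people_matching(conn, pairs):
    """Return people who have every (entity name, value) pair in pairs."""
    where = " AND ".join(EXISTS_SQL for _ in pairs) or "1"
    params = [item for pair in pairs for item in pair]
    sql = "SELECT name FROM person WHERE " + where + " ORDER BY name"
    return conn.execute(sql, params).fetchall()


both = people_matching(conn, [
    ("Manager", "John Smith"),
    ("Favorite Breakfast Cereal", "Chocolate Frosted Sugar Bombs"),
])
manager_only = people_matching(conn, [("Manager", "John Smith")])
```

With this data, both contains only Alice, while manager_only contains Alice and Bob.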

I've worked on some large-scale "enterprise" applications[2] in
my life, and having 1000+ tables all associated with a given
entity generally indicates a design flaw.

-tim

[1]
http://groups.google.com/group/django-users/browse_thread/thread/dbf9068482849d7/d0de78597fa6b9f7#d0de78597fa6b9f7

http://groups.google.com/group/django-users/browse_thread/thread/8e265aeb33f3ec32/5c169c88eef79409?#5c169c88eef79409

http://groups.google.com/group/django-users/browse_thread/thread/9517fe61d1e8e20f/aab62f3a3e1f5ba0#aab62f3a3e1f5ba0

http://groups.google.com/groups/search?q=exists&qt_s=Search&enc_author=f_5GNh4AAADtSWJxSFR4zrj8u9Z0bQb_U9DkLoOxide0N_XCIlgvOQ

[2] the system used by the PA Dept of Corrections, with several
hundreds of tables for everything from inmate intake to tracking
litigation to keeping tabs on the cable-TV privileges allotted to
various inmates.







