Well, I guess the answer to the thread is looking like:

- a data warehouse is preferable to multiple db connections and cross-
vendor joins

But that's only because I haven't heard anyone voice support for the
multi-db idea.

Now the question is:

How should the warehouse be constructed in tandem with new
functionality and data entities?

What I've read about data warehouses seems to limit them to read-only
OLAP, with special data modelling (namely dimensional, as opposed to
the ER modelling I'm used to).  But in the example I give in my first
post, I'm talking about creating new order and order_item data with
keys to data in the warehouse.  Should that data be stored in the
warehouse, as well, so I don't have to go back to multiple db
connections for my joins?  If it co-exists with the read-only data,
should it all be modelled the same? (all ER or all dimensional)  Or
can it be mixed?

I would start a new thread, but whereas the multi-db issue related
directly to developing Django technology, I'm not sure the issue of
the warehouse design is an appropriate topic for this group (although
I'm also not sure where I should discuss it).

On Sep 24, 2:36 am, Joshua Russo <josh.r.ru...@gmail.com> wrote:
> If he wants to discuss it here I can do that too. It was just a little off
> topic, but not not too much I suppose.
>
> On Thu, Sep 24, 2009 at 7:41 AM, nausikaa <g.n.muel...@gmail.com> wrote:
>
> > Hi Joshua
>
> > Thanks for sharing your knowledge.
> > Too bad, I would've liked to read your suggestion to Tony as well. ; )
> > I'm a just graduate and at work I'm working on something very similar
> > to what has been discussed here.
>
> > On Sep 23, 9:51 pm, Joshua Russo <josh.r.ru...@gmail.com> wrote:
> > > I have many years of database design experience and it sounds like you
> > are
> > > getting lost in terminology.
> > > I would start out with a set of tables in the warehouse that mirror the
> > > existing data. In these tables you can either wipe and reload each time,
> > or
> > > create a mechanism to track each individual load.
>
> > > You can then simply write your application against these tables or if you
> > > feel adventurous you can add a more unified set of tables with linking
> > > foreign keys and such and have your process also update the cleaner/more
> > > unified data set. This second part get's tricky and takes practice to
> > know
> > > you have a good structure. You will also have to track changes,
> > additions,
> > > and deletions. If you want some help or advice I would be happy to take a
> > > gander at your structures to show you what I might do. Feel free to email
> > me
> > > directly at josh.r.ru...@gmail.com.
>
> > > On Wed, Sep 23, 2009 at 6:35 PM, Tony Schmidt <tschm...@sacfoodcoop.com
> > >wrote:
>
> > > > Thanks for your reply, Joshua.
>
> > > > Complexity is exactly what I'm trying to avoid - but after doing a
> > > > little research on data warehouses (I've never built one), that seems
> > > > like a very complex route as well!
>
> > > > I would like to just come up with my "dream schema" and run update
> > > > scripts on a daily or event-driven basis, but there seem to be a
> > > > million concepts related to data warehousing: data marts, dimensions,
> > > > star schemas, snowflakes, EAV tables, and so on.
>
> > > > It seems more straight forward if I could just connect directly to the
> > > > operational systems.  I wonder if all that theory is really necessary
> > > > for a basic data warehouse...
>
> > > > On Sep 23, 12:09 pm, Joshua Russo <josh.r.ru...@gmail.com> wrote:
> > > > > The multi-db branch is just now in the process of being baked into
> > the
> > > > core
> > > > > and from what I can tell it's not quite done yet. Even when it is
> > > > completely
> > > > > done I would recommend the data warehouse approach. I view the
> > multi-db
> > > > > functionality more as a last resort, where you really don't have an
> > > > option
> > > > > to merge the data. It adds a lot of complexity so if you can do
> > without I
> > > > > would recommend an alternative.
> > > > > Just my 2 cents
> > > > > Josh
>
> > > > > On Wed, Sep 23, 2009 at 4:05 PM, Tony Schmidt <
> > tschm...@sacfoodcoop.com
> > > > >wrote:
>
> > > > > > Hi, Nausikaa.  Thanks for your reply.
>
> > > > > > Unfortunately, the legacy systems must remain in place until they
> > are
> > > > > > gradually (if ever) phased out.  There's a whole bunch of
> > > > > > functionality that I don't want to have to recreate in those
> > systems
> > > > > > (POS, inventory/accounting, products, etc.).  I just want to build
> > new
> > > > > > functionality that accesses that data (like the example of an order
> > > > > > entry form that creates new order/item data entities in a new DB
> > with
> > > > > > keys to entities in the other DBs or to their versions in the
> > > > > > warehouse).  Most of it should be read only.
>
> > > > > > I'm starting to hear that a data warehouse is the way to go - but
> > then
> > > > > > there's the question of data warehouse vs. data mart, Inmon vs.
> > > > > > Kimball, and how I can get started building one in python.  I'm not
> > > > > > hearing many suggestions for the multi-db approach (which makes me
> > > > > > wonder what the Django muli-db branch is for?)
>
> > > > > > On Sep 22, 2:01 am, nausikaa <g.n.muel...@gmail.com> wrote:
> > > > > > > Hi snfctech
>
> > > > > > > With warehouse I assume you mean keeping the datasources and
> > periodic
> > > > > > > transfer into a central db (the warehouse).
> > > > > > > Why not migrate all your datasources into e.g. a PostgreDQL db?
> > > > > > > It is easy to write forms and implement logins/access rights in
> > > > > > > django so that your non-technical users can read or edit the
> > > > > > > data. Besides you'd remove some (unnecessary) heterogenity and
> > > > thereby
> > > > > > > complexity from your system.
> > > > > > > But since I don't know your system I might be missing the point
> > > > > > > completely.
>
> > > > > > > Nausikaa
>
> > > > > > > On Sep 22, 3:10 am, snfctech <tschm...@sacfoodcoop.com> wrote:
>
> > > > > > > > I understand that there is a Django branch being actively
> > worked on
> > > > > > > > for connections to multiple DB vendors, or that Django + Elixir
> > may
> > > > be
> > > > > > > > a good option.  But I'm wondering if building a single data
> > > > warehouse
> > > > > > > > may still be a better way to go?
>
> > > > > > > > Here's an example of some of the relations I'm going to have to
> > > > build
> > > > > > > > for my project:
>
> > > > > > > > I've got order and order_item tables with their own data and
> > > > relations
> > > > > > > > to members (Access DB), products (flat file) and employees
> > (MySQL).
>
> > > > > > > > I initially thought that the best way to manage this would be
> > to
> > > > > > > > create a new DB for the order and order_item tables, and then
> > > > create
> > > > > > > > cross-vendor joins in the ORM.  But then I came across an
> > > > unexpected
> > > > > > > > advantage of having all the data in an updated warehouse - my
> > semi-
> > > > > > > > technical staff could still use products like OOBase, that are
> > > > limited
> > > > > > > > to a single vendor connection, to make reports and forms based
> > on
> > > > the
> > > > > > > > warehouse data.
>
> > > > > > > > So now I'm wondering - are direct connections to multiple
> > databases
> > > > > > > > really the best way to go?  Or are there more advantages to
> > > > building a
> > > > > > > > data warehouse (keeping in mind the complexities of the data
> > > > > > > > replication, scripts for pushing and pulling data, etc.)
>
> > > > > > > > Thanks in advance for any tips.
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to 
django-users+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Reply via email to