Well, I guess the answer to the thread is looking like: - a data warehouse is preferable to multiple db connections and cross- vendor joins
But that's only because I haven't heard anyone voice support for the multi-db idea. Now the question is: How should the warehouse be constructed in tandem with new functionality and data entities? What I've read about data warehouses seems to limit them to read-only OLAP, with special data modelling (namely dimensional, as opposed to the ER modelling I'm used to). But in the example I give in my first post, I'm talking about creating new order and order_item data with keys to data in the warehouse. Should that data be stored in the warehouse, as well, so I don't have to go back to multiple db connections for my joins? If it co-exists with the read-only data, should it all be modelled the same? (all ER or all dimensional) Or can it be mixed? I would start a new thread, but whereas the multi-db issue related directly to developing Django technology, I'm not sure the issue of the warehouse design is an appropriate topic for this group (although I'm also not sure where I should discuss it). On Sep 24, 2:36 am, Joshua Russo <josh.r.ru...@gmail.com> wrote: > If he wants to discuss it here I can do that too. It was just a little off > topic, but not not too much I suppose. > > On Thu, Sep 24, 2009 at 7:41 AM, nausikaa <g.n.muel...@gmail.com> wrote: > > > Hi Joshua > > > Thanks for sharing your knowledge. > > Too bad, I would've liked to read your suggestion to Tony as well. ; ) > > I'm a just graduate and at work I'm working on something very similar > > to what has been discussed here. > > > On Sep 23, 9:51 pm, Joshua Russo <josh.r.ru...@gmail.com> wrote: > > > I have many years of database design experience and it sounds like you > > are > > > getting lost in terminology. > > > I would start out with a set of tables in the warehouse that mirror the > > > existing data. In these tables you can either wipe and reload each time, > > or > > > create a mechanism to track each individual load. > > > > You can then simply write your application against these tables or if you > > > feel adventurous you can add a more unified set of tables with linking > > > foreign keys and such and have your process also update the cleaner/more > > > unified data set. This second part get's tricky and takes practice to > > know > > > you have a good structure. You will also have to track changes, > > additions, > > > and deletions. If you want some help or advice I would be happy to take a > > > gander at your structures to show you what I might do. Feel free to email > > me > > > directly at josh.r.ru...@gmail.com. > > > > On Wed, Sep 23, 2009 at 6:35 PM, Tony Schmidt <tschm...@sacfoodcoop.com > > >wrote: > > > > > Thanks for your reply, Joshua. > > > > > Complexity is exactly what I'm trying to avoid - but after doing a > > > > little research on data warehouses (I've never built one), that seems > > > > like a very complex route as well! > > > > > I would like to just come up with my "dream schema" and run update > > > > scripts on a daily or event-driven basis, but there seem to be a > > > > million concepts related to data warehousing: data marts, dimensions, > > > > star schemas, snowflakes, EAV tables, and so on. > > > > > It seems more straight forward if I could just connect directly to the > > > > operational systems. I wonder if all that theory is really necessary > > > > for a basic data warehouse... > > > > > On Sep 23, 12:09 pm, Joshua Russo <josh.r.ru...@gmail.com> wrote: > > > > > The multi-db branch is just now in the process of being baked into > > the > > > > core > > > > > and from what I can tell it's not quite done yet. Even when it is > > > > completely > > > > > done I would recommend the data warehouse approach. I view the > > multi-db > > > > > functionality more as a last resort, where you really don't have an > > > > option > > > > > to merge the data. It adds a lot of complexity so if you can do > > without I > > > > > would recommend an alternative. > > > > > Just my 2 cents > > > > > Josh > > > > > > On Wed, Sep 23, 2009 at 4:05 PM, Tony Schmidt < > > tschm...@sacfoodcoop.com > > > > >wrote: > > > > > > > Hi, Nausikaa. Thanks for your reply. > > > > > > > Unfortunately, the legacy systems must remain in place until they > > are > > > > > > gradually (if ever) phased out. There's a whole bunch of > > > > > > functionality that I don't want to have to recreate in those > > systems > > > > > > (POS, inventory/accounting, products, etc.). I just want to build > > new > > > > > > functionality that accesses that data (like the example of an order > > > > > > entry form that creates new order/item data entities in a new DB > > with > > > > > > keys to entities in the other DBs or to their versions in the > > > > > > warehouse). Most of it should be read only. > > > > > > > I'm starting to hear that a data warehouse is the way to go - but > > then > > > > > > there's the question of data warehouse vs. data mart, Inmon vs. > > > > > > Kimball, and how I can get started building one in python. I'm not > > > > > > hearing many suggestions for the multi-db approach (which makes me > > > > > > wonder what the Django muli-db branch is for?) > > > > > > > On Sep 22, 2:01 am, nausikaa <g.n.muel...@gmail.com> wrote: > > > > > > > Hi snfctech > > > > > > > > With warehouse I assume you mean keeping the datasources and > > periodic > > > > > > > transfer into a central db (the warehouse). > > > > > > > Why not migrate all your datasources into e.g. a PostgreDQL db? > > > > > > > It is easy to write forms and implement logins/access rights in > > > > > > > django so that your non-technical users can read or edit the > > > > > > > data. Besides you'd remove some (unnecessary) heterogenity and > > > > thereby > > > > > > > complexity from your system. > > > > > > > But since I don't know your system I might be missing the point > > > > > > > completely. > > > > > > > > Nausikaa > > > > > > > > On Sep 22, 3:10 am, snfctech <tschm...@sacfoodcoop.com> wrote: > > > > > > > > > I understand that there is a Django branch being actively > > worked on > > > > > > > > for connections to multiple DB vendors, or that Django + Elixir > > may > > > > be > > > > > > > > a good option. But I'm wondering if building a single data > > > > warehouse > > > > > > > > may still be a better way to go? > > > > > > > > > Here's an example of some of the relations I'm going to have to > > > > build > > > > > > > > for my project: > > > > > > > > > I've got order and order_item tables with their own data and > > > > relations > > > > > > > > to members (Access DB), products (flat file) and employees > > (MySQL). > > > > > > > > > I initially thought that the best way to manage this would be > > to > > > > > > > > create a new DB for the order and order_item tables, and then > > > > create > > > > > > > > cross-vendor joins in the ORM. But then I came across an > > > > unexpected > > > > > > > > advantage of having all the data in an updated warehouse - my > > semi- > > > > > > > > technical staff could still use products like OOBase, that are > > > > limited > > > > > > > > to a single vendor connection, to make reports and forms based > > on > > > > the > > > > > > > > warehouse data. > > > > > > > > > So now I'm wondering - are direct connections to multiple > > databases > > > > > > > > really the best way to go? Or are there more advantages to > > > > building a > > > > > > > > data warehouse (keeping in mind the complexities of the data > > > > > > > > replication, scripts for pushing and pulling data, etc.) > > > > > > > > > Thanks in advance for any tips. --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Django users" group. To post to this group, send email to django-users@googlegroups.com To unsubscribe from this group, send email to django-users+unsubscr...@googlegroups.com For more options, visit this group at http://groups.google.com/group/django-users?hl=en -~----------~----~----~----~------~----~------~--~---