Hello Tim and Richard, sorry for the late reply, but I was offline for the last few days.
I asked the same question of the Brazilian Django users and got the same answer, "focus on your DBMS, Django is not going to be your problem," so I decided to take a better look at Postgres and do some tests BEFORE writing my Django project. I'll paste the first sketches I plan to test at the bottom of this message. Thanks a lot!

On 9/25/07, Tim Chase <[EMAIL PROTECTED]> wrote:
>
> > I'm developing a Django project that's going to handle big
> > sets of data and want to ask your advice. I have 10 internal
> > bureaus, each of them has a database of 1.5 million records,
> > and it really looks like it will keep growing in size on and
> > on. I intend to use Postgres.
> >
> > The question: what's the best way to handle and store this
> > data? I thought about breaking the app model into 10 smaller
> > ones (Bureau_1, Bureau_2, Bureau_3, etc.) because the main
> > reports are split by Bureau. Response time matters. What do
> > you think?
>
> I deal with fairly large datasets (my employer does cell-phone
> management, tending tens of thousands of phones for hundreds of
> companies, with historical statement detail for each phone, and
> about 2.8e6 records of call detail for those clients that
> require the three months' worth that we keep...and it's only
> growing).
>
> I can't say that splitting across multiple databases makes for
> very useful partitioning, and it forces you to design your
> application around performance. It also becomes a maintenance
> headache, as you have to touch each DB (or script it) when
> performing changes. Rather than just adding a column to a
> table, you have to spew your ALTER TABLE statement across each
> DB. It would also not be able to aggressively cache common
> tables (unless each DB is on its own machine, where that
> doesn't matter).
>
> Learning the ins and outs of PostgreSQL's EXPLAIN command can
> help you find bottlenecks (such as missing indexes). I'm afraid
> I haven't become adroit at this.
>
> Running VACUUM ANALYZE reclaims dead rows and refreshes the
> statistics the planner uses to optimize queries.
>
> I have had some performance problems with that call-detail
> table (with its 2.8e6 rows or so), but find that since it's
> indexed, as long as I pull from a joined table and only pull in
> the records I care about, it can be pretty snappy. It's mostly
> sluggish when I try to do operations across the whole table
> rather than a subset of it, but even then, it's not too bad.
>
> Fast disks (a RAID configuration helps) and loads of memory (as
> much as your server will hold, or at least a couple of gigs if
> you've got a super-server) will go a long way towards easing
> your data pains. Multiple processors can help too, but most
> notably after you've eased the I/O and memory bottlenecks.
>
> Just my observations from the field,
>
> -tim
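P.S. As promised, here is the first test I sketched out from Tim's advice above: a single model for all ten bureaus instead of Bureau_1 through Bureau_10, with an index on the bureau column so the per-bureau reports only touch the rows they need. This is just a rough sketch; "Register" and its field names are placeholders I made up, not a real schema.

    from django.db import models

    class Register(models.Model):
        # One table for all ten bureaus; the index on "bureau" lets
        # Postgres narrow a per-bureau report down to its own rows
        # instead of scanning all 15 million of them.
        bureau = models.IntegerField(db_index=True)
        created = models.DateField(db_index=True)
        description = models.CharField(max_length=200)

    # A report pulls in only the subset it cares about:
    rows = Register.objects.filter(bureau=3, created__year=2007)

This way a schema change is one ALTER TABLE instead of ten, and Postgres gets to cache one table's indexes instead of spreading its memory across ten copies.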
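And a little helper for learning EXPLAIN, run through Django's own database connection so I can test the exact SQL my app will issue. The table name below is just what Django would generate for the sketch above under an app named "myapp"; that's an assumption, not something from a real project.

    from django.db import connection

    def explain(sql, params=()):
        # EXPLAIN ANALYZE runs the query and prints the plan that
        # Postgres chose; a "Seq Scan" over a big table in the output
        # usually means an index is missing.
        cursor = connection.cursor()
        cursor.execute("EXPLAIN ANALYZE " + sql, params)
        for (line,) in cursor.fetchall():
            print(line)

    explain("SELECT * FROM myapp_register WHERE bureau = %s", [3])

I'll also run VACUUM ANALYZE from psql while loading the test data, per the tip above (VACUUM can't run inside a transaction block, so psql is the easy place for it).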

