Hi,

Thanks for the replies. I was tempted to accept Radosław Smogura's proposal. There will be about 100 websites to capture data from on a daily basis. Each website adds, on average, 2 articles per day.
Thomas talked about the NoSQL possibility. What do you think would be better? I have no experience with NoSQL, and that could be a weakness.

Best Regards,
André

On Mon, Jan 3, 2011 at 11:58 AM, Thomas Schmidt <postg...@stephan.homeunix.net> wrote:

> Hello,
>
> On 03.01.11 12:46, Radosław Smogura wrote:
>
>> I can propose something like this:
>>
>> website(id int, url varchar);
>> attr_def (id int, name varchar);
>> attr_val (id int, def_id int references attr_def.id, website_id int references
>> website.id, value varchar);
>> If all of your attributes in website are single-valued, then you can remove
>> id from attr_val and use (website_id, def_id) as the PK.
>>
>> Depending on your needs, one or more of the following indexes:
>> attr_val(value) - search for attributes with value;
>>
> (...)
>
>> Probably you will use 2nd or 3rd index.
>>
>> Example of search on website
>> select d.name, v.value from attr_def d join attr_val v on (v.def_id =
>> d.id) join website w on (v.website_id = w.id)
>> where d.name = 'xxxx' and w.url='http://somtehing'
>>
>
> IMHO it's hard (if not impossible) to recommend a specific database schema
> (incl. indexes) without knowing the applications behind it.
> Your schema is nice for specific querying, but might blow up if lots of
> data is stored in the database (joins and index building might be time
> consuming).
> On the other hand, Google put some effort into their "BigTable"
> http://en.wikipedia.org/wiki/BigTable for storing tons of data...
>
> Thus - it all depends on the usage :-)
>
>
> Thomas
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
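
For anyone who wants to try the proposed website / attr_def / attr_val layout before deciding, here is a minimal sketch of the single-valued variant (primary key on website_id, def_id) together with the example search. It rests on several assumptions that are not from the thread: it uses Python's built-in sqlite3 with an in-memory database instead of PostgreSQL so that it runs without a server, and the names build_db, set_attr, get_attr and attr_val_value_idx are hypothetical, chosen only for illustration. At the volume mentioned above (about 100 websites with 2 articles each per day) that is roughly 200 new articles per day, or about 73,000 per year.

```python
import sqlite3

# Assumption: SQLite (in-memory) stands in for PostgreSQL so the sketch runs
# without a server; column types and some syntax differ in PostgreSQL.
SCHEMA = """
CREATE TABLE website (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE attr_def (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Single-valued variant: (website_id, def_id) is the primary key,
-- so attr_val needs no id column of its own.
CREATE TABLE attr_val (
    website_id INTEGER NOT NULL REFERENCES website(id),
    def_id     INTEGER NOT NULL REFERENCES attr_def(id),
    value      TEXT,
    PRIMARY KEY (website_id, def_id)
);
-- Hypothetical index name; supports searching attributes by value.
CREATE INDEX attr_val_value_idx ON attr_val (value);
"""


def build_db():
    """Create an in-memory database holding the three tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def set_attr(conn, url, name, value):
    """Store one attribute value for a website, creating rows as needed."""
    conn.execute("INSERT OR IGNORE INTO website (url) VALUES (?)", (url,))
    conn.execute("INSERT OR IGNORE INTO attr_def (name) VALUES (?)", (name,))
    # INSERT OR REPLACE is SQLite syntax; it overwrites an existing value
    # for the same (website_id, def_id) pair.
    conn.execute(
        """
        INSERT OR REPLACE INTO attr_val (website_id, def_id, value)
        SELECT w.id, d.id, ?
        FROM website w, attr_def d
        WHERE w.url = ? AND d.name = ?
        """,
        (value, url, name),
    )


def get_attr(conn, url, name):
    """Run the example search: one named attribute of one website."""
    row = conn.execute(
        """
        SELECT v.value
        FROM attr_def d
        JOIN attr_val v ON v.def_id = d.id
        JOIN website w ON v.website_id = w.id
        WHERE d.name = ? AND w.url = ?
        """,
        (name, url),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = build_db()
    set_attr(conn, "http://example.org", "title", "Example article")
    print(get_attr(conn, "http://example.org", "title"))
```

Every attribute lookup goes through two joins, which is the cost Thomas points to. Whether that matters at this data volume would have to be measured against the real queries.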