Hi,

Thanks for the replies. I was tempted to accept Radosław Smogura's
proposal. There will be about 100 websites to capture data from on a daily
basis. On average, each website adds 2 articles per day, i.e. roughly 200
new articles per day (about 73,000 per year).

Thomas mentioned the NoSQL option. Which do you think would be
better? I have no experience with NoSQL, and that could be a weakness.

Best Regards,
André




On Mon, Jan 3, 2011 at 11:58 AM, Thomas Schmidt <
postg...@stephan.homeunix.net> wrote:

>  Hello,
>
> On 03.01.11 12:46, Radosław Smogura wrote:
>
>  I can propose something like this:
>>
>> website(id int, url varchar);
>> attr_def (id int, name varchar);
>> attr_val (id int, def_id int references attr_def.id, website_id int references
>> website.id, value varchar);
>> If every attribute of a website is single-valued, you can drop id
>> from attr_val and use (website_id, def_id) as the primary key.
>>
>> Depending on your needs, create one or more of the following indexes:
>> attr_val(value) - to search for attributes by value;
>>
> (...)
>
>  You will probably use the 2nd or 3rd index.
>>
>> Example search on a website:
>> select d.name, v.value from attr_def d join attr_val v on (v.def_id =
>> d.id) join website w on (v.website_id = w.id)
>> where d.name = 'xxxx' and w.url='http://something'
>>
>
> IMHO it's hard (if not impossible) to recommend a specific database schema
> (incl. indexes) without knowing the applications running behind it.
> Your schema is nice for specific queries, but it might blow up if lots of
> data is stored in the database (joins and index building might be
> time-consuming).
> On the other hand, Google put some effort into their "BigTable"
> http://en.wikipedia.org/wiki/BigTable for storing tons of data...
>
> Thus - it all depends on the usage :-)
>
>
> Thomas
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
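A minimal sketch of Radosław's proposed attribute/value schema, using Python's built-in sqlite3 module in place of PostgreSQL so it runs standalone. It assumes the single-valued variant he describes (composite primary key on website_id, def_id instead of a separate id), creates the attr_val(value) index from his list, and runs his example query with the table name spelled attr_def. The sample URL and attribute names (title, language) are hypothetical, and lookup is a helper introduced only for illustration.

```python
import sqlite3

# In-memory SQLite database used only for illustration; the thread targets
# PostgreSQL, where the same DDL works with minor type changes.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE website (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE attr_def (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Single-valued variant: (website_id, def_id) is the primary key,
-- so attr_val needs no surrogate id column.
CREATE TABLE attr_val (
    website_id INTEGER NOT NULL REFERENCES website(id),
    def_id     INTEGER NOT NULL REFERENCES attr_def(id),
    value      TEXT,
    PRIMARY KEY (website_id, def_id)
);
-- Index for searching attributes by value.
CREATE INDEX attr_val_value_idx ON attr_val(value);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO website (id, url) VALUES (1, 'http://example.com')")
conn.executemany("INSERT INTO attr_def (id, name) VALUES (?, ?)",
                 [(1, 'title'), (2, 'language')])
conn.executemany(
    "INSERT INTO attr_val (website_id, def_id, value) VALUES (?, ?, ?)",
    [(1, 1, 'Example site'), (1, 2, 'en')])


def lookup(conn, url, attr_name):
    """Return (name, value) rows for one attribute of one website."""
    return conn.execute(
        """
        SELECT d.name, v.value
        FROM attr_def d
        JOIN attr_val v ON v.def_id = d.id
        JOIN website w ON v.website_id = w.id
        WHERE d.name = ? AND w.url = ?
        """,
        (attr_name, url),
    ).fetchall()


print(lookup(conn, 'http://example.com', 'title'))  # [('title', 'Example site')]
```

Passing the attribute name and URL as query parameters, rather than pasting them into the SQL string, keeps the query safe for arbitrary scraped URLs; in PostgreSQL the DDL would look nearly the same with varchar/integer columns.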
