> Well, what problem exactly are you trying to solve?
> Having large tables itself isn't a problem, but it often
> tends to imply other things that might be problematic:

I'm trying to troubleshoot a very low cache hit rate as returned by this query:

    SELECT  sum(heap_blks_read) as heap_read,
            sum(heap_blks_hit)  as heap_hit,
            sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
    FROM    pg_statio_user_tables;
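
In case it helps narrow things down, a per-table breakdown from the same view should show which relations are actually missing the cache (the NULLIF just avoids division by zero on untouched tables):

    SELECT  relname,
            heap_blks_read,
            heap_blks_hit,
            heap_blks_hit::numeric
                / NULLIF(heap_blks_hit + heap_blks_read, 0) as ratio
    FROM    pg_statio_user_tables
    ORDER BY heap_blks_read DESC
    LIMIT   20;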

So I think that's your fourth option:

> - blowing out your buffer cache with useless dirty pages - this is where you 
> might want to look into separate servers.

So it sounds like I'm on the right track. But a separate
cluster/server seems like a drastic solution.
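
If I did end up splitting the big tables out, I gather the cross-cluster queries via dblink would look something like this -- the port, database, table, and column names here are all made up, and I haven't actually tried it:

    -- run from the main cluster; on 9.1 dblink comes from CREATE EXTENSION dblink.
    -- The AS clause is required because dblink returns a generic record set.
    SELECT t.id, t.html
    FROM   dblink('port=5433 dbname=scraped',
                  'SELECT id, html FROM pages WHERE id = 42')
           AS t(id integer, html text);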

Thanks,
Paul


On Mon, Mar 11, 2013 at 12:17 PM, Ben Chobot <be...@silentmedia.com> wrote:
>
> On Mar 9, 2013, at 11:54 AM, Paul Jungwirth wrote:
>
> Hello,
>
> I'm running a specialized search engine that indexes a few tens of millions 
> of web pages, keeping everything in Postgres, and one problem I'm starting to 
> see is poor cache hit rates. My database has two or three tables just for the 
> text of the scraped pages, with one row every time a page is scraped and a 
> `text` column for the HTML. These tables are almost-but-not-quite write-only. 
> They are only read by one job, which uses them to create other objects in the 
> system. I'd like the rest of the database to be in-memory all the time, but I 
> don't really care if I have to read these tables from disk. To keep my 
> problem tables from dragging down performance on the rest of the system, I'm 
> thinking of splitting them out. I suppose I could save the HTML outside of 
> Postgres entirely, but I'm wondering if a nicer solution would be to keep 
> these tables in a separate "cluster" (i.e. 
> /var/lib/postgresql/9.1/something_else -- I hope this is the right 
> terminology). Then I could tune that cluster differently from the main 
> cluster, or even put it on a different machine. And I could still use dblink 
> to query both clusters conveniently (I think; this isn't really that 
> important). Does this seem like a worthwhile approach? Is there a better way 
> to deal with a few out-sized tables?
>
>
> Well, what problem exactly are you trying to solve? Having large tables 
> itself isn't a problem, but it often tends to imply other things that might 
> be problematic:
>
> - large vacuum or analyze times. (Probably fixable in your case with 
> per-table autovacuum thresholds.)
> - large disk usage. (Probably fixable in your case with tablespaces.)
> - slow selects or updates. (Probably fixable in your case with partitioning.)
> - blowing out your buffer cache with useless dirty pages - this is where you 
> might want to look into separate servers.
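
(For my own reference, I take it the per-table autovacuum thresholds from the first bullet would be set roughly like this -- table name and numbers are just placeholders:

    ALTER TABLE scraped_pages SET (
        autovacuum_vacuum_scale_factor  = 0,
        autovacuum_vacuum_threshold     = 100000,
        autovacuum_analyze_scale_factor = 0,
        autovacuum_analyze_threshold    = 100000
    );
)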




--
_________________________________
Pulchritudo splendor veritatis.


