[GENERAL] Why facebook used mysql ?

2010-11-08 Thread Sandeep Srinivasa
There was an interesting post today on highscalability -
http://highscalability.com/blog/2010/11/4/facebook-at-13-million-queries-per-second-recommends-minimiz.html

The discussion/comments touched upon why mysql is a better idea for Facebook
than Postgres. Here's an interesting one


> One is that PG doesn't scale that well on multiple cores as MySQL nowadays.
>
> Another is in fundamental differences of storage architecture - all
> MySQL/InnoDB data is either a clustered primary key, or secondary key with
> PK pointers - logical relationships between entries allow to have index-only
> scans, which are a must for web-facing databases (good response times, no
> variance).
>
> One more reason is that in heavily indexed databases vacuum will have to do
> full index passes, rather than working with LRU.
>
> As for sharding, etc - there's no way to scale vertically infinitely - so
> the "stupid people shard" point is very very moot.
>
> It is much cheaper to go the commodity hardware path.


or

> In general Postgresql is faster at complex queries with a lot of joins and
> such, while MySQL is faster at simple queries such as primary key look up.


I wonder if anyone can comment on this - especially the claim that PG doesn't
scale as well as MySQL on multiple cores?

regards
Sandeep


Re: [GENERAL] Why facebook used mysql ?

2010-11-08 Thread Sandeep Srinivasa
On Tue, Nov 9, 2010 at 10:31 AM, Richard Broersma <
richard.broer...@gmail.com> wrote:

> The following link contains hundreds of comments that you may be
> interested in, some that address issues that are much more interesting
> and well established:
>
>
> http://search.postgresql.org/search?q=mysql+performance&m=1&l=NULL&d=365&s=r&p=1
>

I did actually try to search for topics on multiple cores vs MySQL, but I
wasn't able to find anything of much use. Elsewhere (on Hacker News, for
example), I have indeed come across statements that PG scales better on
multiple cores, which are usually offset by claims that MySQL is better.

Google isn't of much use for this either - while MySQL has several resources
talking about benchmarks/tuning on multi-core servers (e.g.
http://dimitrik.free.fr/blog/archives/2010/09/mysql-performance-55-notes.html),
I can't find any such serious discussion on PostgreSQL.

However, what I did find (
http://www.pgcon.org/2008/schedule/events/72.en.html) was titled "Problems
with PostgreSQL on Multi-core Systems with Multi-Terabyte Data"
(interestingly, published by the PostgreSQL Performance Team @ Sun).

Ergo, my question still stands - maybe my google-fu was bad... which is why I
am asking for help.

regards
Sandeep


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Sandeep Srinivasa
hi,
   I am the OP.

With due respect to everyone (and sincere apologies to Richard Broersma), my
intention was not to create a thread about MySQL/Oracle's business
practices.

It was about the technical discussion on Highscalability - I have been
trying to wrap my head around the concept of multi-core scaling for
Postgres, especially beyond 8 cores (like Scott's Magny-Cours example). My
question is whether Postgres depends on the kernel scheduler for
multiple CPU/core utilization.

If that is the case, then does using FreeBSD vs Linux give rise to any
differences in scaling?

Taking the question one step further, do different Linux kernels (and
schedulers) impact Postgres scalability? The Phoronix Test Suite already
tests Linux kernel releases for performance regressions in Postgres (e.g.
http://www.phoronix.com/scan.php?page=article&item=linux_perf_regressions&num=1),
but doesn't particularly focus on multiple cores.

Is this something that should be benchmarked?

thanks
-Sandeep

P.S. On the topic of scalability, here is another article -
http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html
- where people have asked if a similar thing can be done using Postgres UDFs
or a marshalling ODBA (http://scm.ywesee.com/?p=odba/.git;a=summary).


Re: [GENERAL] Why facebook used mysql ?

2010-11-09 Thread Sandeep Srinivasa
On Tue, Nov 9, 2010 at 11:46 PM, David Boreham wrote:
>
>
> Hmm...typically multi-core scaling issues are in the area of memory
> contention and cache coherence (and therefore are for the most part not
> dependent on the OS and its scheduler).


If it is independent of the OS, then how does one go about tuning it?

Consider this - I get a 12-core server on which I want multiple webserver
instances + DB. Can one create CPU pools (say cores 1,2,3 for webservers,
4,5,6,7 for the DB, etc.)?

I know about taskset, but should one be using it?


Re: [GENERAL] Simple, free PG GUI/query tool wanted

2010-12-16 Thread Sandeep Srinivasa
you could try SQLWorkbench (http://www.sql-workbench.net/)


[GENERAL] Which CMS/Ecommerce/Shopping cart ?

2010-07-27 Thread Sandeep Srinivasa
hi,
   The question is very simple - which CMS/shopping cart/ecommerce solution
are people using in conjunction with Postgresql?

Except for Drupal's partial support, I can't find any which has a sizeable
deployment and community behind it. Spree is a new RoR-based system
that would obviously work with PG, but doesn't have a sizeable deployment
base.

What do you guys know of?

thanks!


Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?

2010-07-28 Thread Sandeep Srinivasa
Could you point me to any deployments of Drupal + Ubercart + Postgres?

It felt really strange that nobody on IRC or forums could answer that they
had been involved in a Postgres-based deployment.

thanks!

On Wed, Jul 28, 2010 at 8:23 PM, Joshua D. Drake wrote:

>
> > Except for Drupal's partial support, I cant find any which has a sizeable
> > deployment and community size behind it. Spree is a new RoR based system,
> > that would obviously work with PG, but doesnt have a sizeable deployment
> > base.
> >
>
> Drupal + Ubercart + a ton of their modules work great. It is what drives:
>
> http://www.postgresqlconference.org/
> http://www.postgresql.us
> http://www.fossexperts.com/
> http://www.commandprompt.com/portal
>
>
>
> --
> PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
>   Consulting, Development, Support, Training
>   503-667-4564 - http://www.commandprompt.com/
>   The PostgreSQL Company, serving since 1997
>


Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?

2010-07-28 Thread Sandeep Srinivasa
Yup, I did. The reason why I wanted examples was to amply demonstrate, to
clients, that postgresql is viable.
It is kinda weird if the only examples I have are restricted to the
postgresql _community_ websites themselves.

This may sound irrelevant, but please do understand the huge opposition to
having anything to do with PG in the whole CMS/e-store community. In fact I
even saw a request to eliminate postgresql support in Drupal 7 (that was
taken care of by the valiant efforts of the PG community):
http://drupal.org/node/337146

Plus, it would have been interesting to know which versions of Drupal,
Ubercart, etc. were being used for such deployments. Again, it is relevant
because of certain (older) benchmarks which show significantly worse
performance due to the suboptimal way that Drupal integrates with
Postgresql:
http://mikkel.hoegh.org/blog/2008/oct/13/drupal-database-performance-mysql-postgresql/
There has been _nothing_ to disprove the above numbers since - please
correct me if I am wrong.

What does a person making a case for Postgres do in this situation?

thanks



On Wed, Jul 28, 2010 at 10:40 PM, Joshua D. Drake wrote:

> On Wed, 2010-07-28 at 22:37 +0530, Sandeep Srinivasa wrote:
> > Could you point me to any deployments of Drupal + Ubercart  +
> > Postgres ?
>
> Did you not see the links below?
>
> >
> >
> >
> > Drupal + Ubercart + a ton of their modules work great. It is
> > what drives:
> >
> > http://www.postgresqlconference.org/
> > http://www.postgresql.us
> > http://www.fossexperts.com/
> > http://www.commandprompt.com/portal
> >
> >
> >
> > --
> > PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
> >   Consulting, Development, Support, Training
> >   503-667-4564 - http://www.commandprompt.com/
> >   The PostgreSQL Company, serving since 1997
> >
> >
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>


Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?

2010-07-28 Thread Sandeep Srinivasa
hi,
Thanks for several of the links that you guys posted.

The issue is not that I am looking for consulting companies who will set up
and optimize Postgres + software. There are a million small firms that do
M*SQL+ work. And I am looking to do that kind of work with clients
- but I want to use the best DB out there, which I believe to be Postgres.
Yet I find it hard to do so.

Clients do not want to engage in full custom s/w development, because they
worry about what happens if we go out of business. I am sure many of you
out there who have bigger clients have different experiences - but this is
the truth for most of the business that we see. And most of the existing
community or paid software out there does not play nice with Postgres.

This vicious cycle can only be broken at the level of pre-packaged web
software, which ought to work beautifully out-of-the-box with postgres.
There is just no way out of this.

What really, really hurts me is this - come Postgres 9.0 you will have the
most amazing DB software in the open source community. I (and millions of
small-time developers like me) won't be able to leverage that - because our
clients will still demand , which have no good support for Postgres.


-Sandeep

On Thu, Jul 29, 2010 at 10:54 AM, Joshua D. Drake wrote:

> On Thu, 2010-07-29 at 07:04 +0200, Ivan Sergio Borgonovo wrote:
>
> > BTW up to my memory Django suggest postgres. I haven't seen any
> > benchmark of Django with pg vs mysql.
>
> Django was originally developed for Postgres but really, they are wholly
> different beasts.
>
> Joshua D. Drake
>
> --
> PostgreSQL.org Major Contributor
> Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
> Consulting, Training, Support, Custom Development, Engineering
> http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?

2010-07-29 Thread Sandeep Srinivasa
On Thu, Jul 29, 2010 at 10:46 PM, Ivan Sergio Borgonovo <
m...@webthatworks.it> wrote:

> On Thu, 29 Jul 2010 08:52:46 -0700
> "Joshua D. Drake"  wrote:
>
> > The issue isn't Drupal. It is modules. There are a lot of popular
> > modules that do not work with PostgreSQL (Lightbox for example).
>
> > The google checkout module for Ubercart didn't work either until
> > relatively recently.
>
> I'd say the opposite but I'll wait to test more D7.
> Core takes ages to agree on what should be done to fix bugs for
> Postgres without affecting even the "feelings" of MySQL developers.
>
> Modules may have more problems but fixing them is generally trivial
> and generally upstream is quick to integrate the fix.
>
> The problem for core is maintaining your patches till and if they
> fix the bug.
>

Both of you are right - as I said, it is a vicious circle. For a very long
time now (as evidenced by subsequent mails on this thread), a lot of PG
enthusiasts have tried to understand the reason for MySQL's popularity.
It isn't that developers are lazy - nor that they are "conditioned" in the
MySQL way.

The plain and simple truth is that there are some hugely popular pieces of
software out there (that, incidentally, are written in PHP) that have no
equivalent elsewhere.

Biggest example - WordPress. Strictly MySQL-only. If I want to throw
together a company blog + mailing list + SEO, I can get it done using
WordPress in a matter of hours.

Same for shopping carts, though there is no single canonical software that I
can name - all of them are strictly or strongly MySQL only. These are
restrictions that cannot (and should not) always be answered with a
roll-your-own solution.

A very interesting example is Bugzilla - just search for "bugzilla install"
on Google and see how many of the results are MySQL-centric. This, even
though Postgres is officially supported.

I searched and asked on IRC and forums - I got no answer as to what works with
Postgres. I google for stuff - I get no answers. Yet I do end up with lots
of results that claim to show how bad Postgres is compared to MySQL, in
the context of all the popular pieces of software out there.

So what am I driving at? If you have the time to spare or the inclination,
please, please put out blog posts, how-tos, and guides which relate
to use cases like putting up a blog, setting up an online shop, configuring a
popular CRM package, etc., rather than pure DB-centric information. Please
put information out there on how to configure Drupal, SugarCRM, Trac, etc.
Yes, I know to RTFM - but then I see the buzz around MySQL and am swayed.

regards
Sandeep


Re: [GENERAL] Which CMS/Ecommerce/Shopping cart ?

2010-07-29 Thread Sandeep Srinivasa
On Fri, Jul 30, 2010 at 3:05 AM, John Gage  wrote:

> Shopping carts, company blogs, etc.  Popular pieces of software.
>
> As common denominators go, that's pretty low.
>
> Perhaps what is needed is a dumbed down version of Postgres.
>

I don't think that is what is required - as I mentioned above, developers who
work on PHP+MySQL don't do it because they are dumb. A LOT of clients
specifically ask for it (similarly for J2EE) because there is a huge amount
of talent available. Ruby on Rails is getting there.

But the MySQL camp has been innovating - Percona, XtraDB, MariaDB - all of
them are going beyond the legacy MySQL architecture, yet retaining language
compatibility.

All of them get the ecosystem for free.

What Postgres needs is a "dumbed-down" ecosystem, rather than a dumbed-down DB.
As you mentioned above, these are low common denominators, but that
denominator makes up 90% of usage in the real world. They need help from you
guys to make choices which include Postgres.
Hang out in the popular software forums and help us in using Postgres +
, because there is simply too much FUD about PG and too much love
for every other DB.

regards
Sandeep


Re: [GENERAL] MySQL versus Postgres

2010-08-06 Thread Sandeep Srinivasa
On Sat, Aug 7, 2010 at 12:29 AM, Joshua J. Kugler wrote:

>
> So, a "getting started" guide and/or cookbook would be great.  Another
> cool idea: a MySQL -> PostgreSQL migration guide (I'm sure there's
> already one out there) that would show "To do *this* MySQL function in
> PostgreSQL, use this function/feature/etc."  A good example is
> auto_increment -> serial. A comparison/contrast of permission would be
> good too. I'm sure others can think of more examples.
>

+1 on this.
This is very interesting from the point of view of transitioning MySQL
webapps to Postgres. The truth is that for a lot of people, MySQL is their
first DB (because of the loads of pre-existing software; refer to my thread
"Which CMS/Ecommerce/Shopping cart?"). When we are ready to move to PG, we
are already used to the MySQL way of doing things.

Take, for example, the MySQL command "show databases". A commonly taught PG
equivalent is "\l". But, instead of mnemonics, I suppose "select datname
from pg_database;" would be more intuitive and easier to remember. Yes, it is
longer, but it helps me understand what is going on behind the covers...
especially the cryptic "pg_" tables.
Another example is MySQL's "use" command - I spent a long time
searching for a similar command in PG.
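
(For reference, a rough and non-exhaustive mapping between the two - psql
meta-commands alongside the plain catalog queries behind them; names like
some_db and some_tbl below are just placeholders:)

    -- MySQL: SHOW DATABASES;
    SELECT datname FROM pg_database;            -- psql shortcut: \l

    -- MySQL: SHOW TABLES;
    SELECT tablename FROM pg_tables
    WHERE schemaname = 'public';                -- psql shortcut: \dt

    -- MySQL: USE some_db;        psql: \c some_db
    -- MySQL: DESCRIBE some_tbl;  psql: \d some_tbl

    -- MySQL: id INT AUTO_INCREMENT PRIMARY KEY
    -- PG:    id serial PRIMARY KEY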


Re: [GENERAL] MySQL versus Postgres

2010-08-09 Thread Sandeep Srinivasa
On Mon, Aug 9, 2010 at 12:15 PM, Allan Kamau  wrote:

>
> There may be worry of "copy and paste" without proper understanding of
> the code and concepts but this may be mitigated IMHO by fact that it
> seems unlikely that when presented with a case to solve, simple "copy
> and paste" of several commands (in the right sequence) will happen and
> correctly solve the problem at hand without appreciation of what these
> commands and statements do. Also, more often than not, the "copy and
> paste" will only work on the specific schema definitions and data used in
> the example, therefore reconstruction (hence understanding) of these
> commands is necessary.


The way I see it - for those who want to truly learn, there is the
documentation. For those who don't, there are ORMs.
For the rest of us, still floundering in MySQL land, please build a bridge.

-Sandeep


Re: [GENERAL] MySQL versus Postgres

2010-08-09 Thread Sandeep Srinivasa
On Mon, Aug 9, 2010 at 2:28 PM, Michael A. Peters
wrote:
>
>
> I have 4x6 cards that I write the postgresql way of doing what I use to do
> with MySQL so that I can easily reference them when I need to.
>
> Should I sit down and read a book and go through the exercises?
> Yes. But I need to get stuff done now, and the cheat sheets I make for
> myself let me do just that.
>
> I am not a DBA - I am not even a web developer.
> I do both because I can't afford to hire them, and when I have used stuff
> created by them, very frequently their code is clearly crap and insecure
> and even I can see that, so unless I really want to pay the big bucks,
> it's better for me to do it myself and cheat sheets really help.
>

Could you share them? Maybe put them up on the Postgres wiki.
The other similar resource that I see is:
http://en.wikibooks.org/wiki/Converting_MySQL_to_PostgreSQL

-Sandeep


Re: [GENERAL] MySQL versus Postgres

2010-08-11 Thread Sandeep Srinivasa
On Thu, Aug 12, 2010 at 10:53 AM, Greg Smith  wrote:
>
> If only it were that easy.  25%, but only on a dedicated server, don't go
> above 8GB, limit to much less than that on Windows, and be extremely careful
> if you're writing heavily lest large checkpoints squash you.  Giving simple
> advice that people might follow without thinking about actually has some
> worst-case downsides that are worse than not tuning the server at all.
>
>
>> This makes increasing work_mem a complex calculation for a new user
>> trying out PostgreSQL.
>>
>>
>
> Increasing work_mem is a complex calculation for everyone, not just new
> users.  If it were easy for anyone, we'd just bottle whatever experts do
> into the software directly.  I tried doing a round of that with pgtune, and
> it's still not quite right yet even after surveying everyone who had an
> opinion on the subject.


 Maybe a tabular form would be nice - "work_mem" under a) Windows < 8GB, b)
Windows > 8GB, c) Linux < 8GB, d) Linux > 8GB, e) read-heavy, f) write-heavy,
g) log shipping enabled, etc.
Rinse and repeat for all important parameters - in a wiki, you can do nifty
things like add footnotes, etc.
That would be awesome!

-Sandeep


Re: [GENERAL] MySQL versus Postgres

2010-08-12 Thread Sandeep Srinivasa
On Thu, Aug 12, 2010 at 12:37 PM, Scott Marlowe wrote:

> On Wed, Aug 11, 2010 at 11:41 PM, Greg Smith  wrote:
> > Sandeep Srinivasa wrote:
> >>
> >>  Maybe a tabular form would be nice - "work_mem" under...
> >
> > The problem with work_mem in particular is that the useful range depends
> > quite a bit on how complicated you expect the average query running to
> be.
>
> And it's very dependent on max connections.  A machine with 512GB that
> runs batch processes for one or two import processes and then has
> another two or three used to query it can run much higher work_mem
> than a machine with 32G set to handle hundreds of concurrent accesses.
>  Don't forget that when you set work_mem too high it has a very sharp
> dropoff in performance as swapping starts to occur.  If work_mem is a
> little low, queries run 2 or 3 times slower.  If it's too high the
> machine can grind to a halt.
>

^
Right there - could this information not have been captured in the tabular
form I was talking about? Again, I'm not sure how to present the data, but
it sure would be of *some* help to the next poor soul who comes along with
the same question.

This here is golden knowledge - yes, you might not be able to add all the
qualifiers to just saying ">8GB, use X work_mem", but it really, really is
much better than what we have now, which is nothing.
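
(As one concrete illustration of the kind of advice such a table could carry:
work_mem can also be raised per session for a known-heavy query instead of
globally. A minimal sketch - the value, table, and query are purely
illustrative:)

    -- Raise work_mem for this session only; the global default stays low.
    SET work_mem = '256MB';
    SELECT customer_id, sum(amount)      -- hypothetical reporting query
    FROM orders
    GROUP BY customer_id
    ORDER BY sum(amount) DESC;
    RESET work_mem;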

-Sandeep


[GENERAL] .psqlrc and custom functions (mysql-like)

2010-08-14 Thread Sandeep Srinivasa
hi,
  As part of a product we are building, we are using postgres as our
database. But most of our developers have a (loud) mysql background.
I want to potentially provide "aliases" for a few commonly used mysql
functions.

 I see that the .psqlrc could potentially be the place where you could
define these custom functions - I looked at
http://okbob.blogspot.com/2009/12/macros-for-epsql.html but it seems to be a
patched psql. Is it possible to have the exact same commands as defined on
that page in a .psqlrc... via plpgsql or something?

I might be totally way off the target here, so please correct me if I am
wrong.

As an example, how would I do something as simple as "DESCRIBE table"? (Yes,
I know the \d+ equivalent... I still want to be able to define this function.)

thanks and regards
Sandeep
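
(One possible direction, sketched here rather than taken from the thread: a
plain SQL function over information_schema can stand in for DESCRIBE, and the
CREATE FUNCTION statement can live in an init script or be created once per
database. The function name and output columns below are illustrative only.)

    -- Hypothetical DESCRIBE-like helper built on the standard information_schema.
    CREATE OR REPLACE FUNCTION describe(tbl text)
    RETURNS TABLE(col text, type text, nullable text) AS $$
        SELECT c.column_name::text, c.data_type::text, c.is_nullable::text
        FROM information_schema.columns AS c
        WHERE c.table_name = $1
        ORDER BY c.ordinal_position;
    $$ LANGUAGE sql STABLE;

    -- Usage:
    --   SELECT * FROM describe('mytable');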


[GENERAL] Relationship between ulimit and effective_cache_size, work_mem, shared_buffers, maintenance_work_mem and vacuum

2010-08-15 Thread Sandeep Srinivasa
hi guys,
 we have a single Ubuntu 10.04 box on which we are going to be
running a Postgres 8.4 server, RoR Passenger, and a Solr search server.
I was looking at ways to optimize the Postgres database and yet limit the
amount of memory that it can consume.

I am going to set shared_buffers to 256MB, work_mem to 12MB, and temp_buffers
to 20MB (on a 4GB machine).
Now, the "effective_cache_size" variable seems to be more of a hint to the
query planner than any hard limit on the database server.

Q1. If I add "ulimit -m" and "ulimit -v" lines to my Postgres upstart files,
will that be good enough to hard-limit Postgres memory usage?

Q2. Once I have decided my max memory allocation (call it MY_ULIMIT) -
should effective_cache_size be set to MY_ULIMIT - 256 - 12 - 20? Round it
off to MY_ULIMIT - 512MB, maybe.

Q3. Or will doing something like this play havoc with the query planner or
cause unexplained OOM kills/crashes? I ask this because I see that there are
other variables that I am not sure will play nice with ulimit:
1. Will this affect the memory usage of vacuum (going to be using the default
vacuum settings for 8.4)? Ideally I would want to have some control
over it as well.
2. Would I have to tune max_connections, max_files_per_process (and any
related variables)?
3. When I turn on WAL, would I have to tune wal_buffers accordingly and set
effective_cache_size to account for wal_buffers as well?


thanks
-Sandeep


Re: [GENERAL] Relationship between ulimit and effective_cache_size, work_mem, shared_buffers, maintenance_work_mem and vacuum

2010-08-16 Thread Sandeep Srinivasa
Thanks for the comprehensive reply. It definitely cleared a lot of things up
for me.

regards
Sandeep

On Mon, Aug 16, 2010 at 12:39 PM, Greg Smith  wrote:

> Sandeep Srinivasa wrote:
>
>>
>> I was looking at ways to optimize the postgres database and yet limit the
>> amount of memory that it can consume.
>>
>
> You can't usefully limit the amount of memory that PostgreSQL will consume
> yet.  Each client can easily allocate multiples of work_mem worth of memory
> as they run queries, and there's temp_buffers to be concerned about too.
>  One way to work around problems like this is to significantly limit the
> number of clients that can be running at a time, using something like a
> connection pooler, and then keep a low setting for max_connections.  This
> can have some positive performance benefits of its own, and it will keep the
> number of clients (the only real source of variable allocations on the
> server) from getting too large.  Software suitable for that purpose includes
> pgbouncer and pgpool.
>
>
>  Now, the "effective cache size" variable seems more of a hint to the query
>> planner, than any hard limit on the database server.
>>
>
> That's correct.  It doesn't allocate anything.  Doesn't limit anything
> either.
>
>
>  Q1.  if I add "ulimit -m" and "ulimit -v" lines in my postgres upstart
>> files  will that be good enough to hard-limit Postgres memory usage ?
>>
>
> After fighting a few random crashes where the server runs into ulimit, you
> will find that trying to hard limit PostgreSQL memory usage is more trouble
> than it's worth.  It's really a bad place to go.
>
>
>  Q2.  once I have decided my max memory allocation (call it MY_ULIMIT) -
>> should effective cache size be set to MY_ULIMIT - 256 - 12 -20 ?  round it
>> off to MY_ULIMIT - 512mb maybe
>>
>
> effective_cache_size has no relation to the limits.  Take a look at how
> much of your operating system cache you think might be free at any time.
>  Figure out what percentage of that you might want PostgreSQL to be able to
> use sometime.  Set effective_cache_size to it.  If a query goes wild and
> decides to execute a really bad query plan that reads a bunch of data, it
> will trash the operating system cache regardless; you can't stop it like
> this.
>
>
>  Q3. Or will doing something like this play havoc with the query
>> planner/unexplained OOM/crashes ?
>>
>
> If you ever let the system get so low on RAM that the Linux OOM killer
> becomes active, it will almost always kill the main database server process,
> the one that spawns all the clients off, due to how Linux does shared memory
> accounting.  This is another really bad thing to be avoided.
>
>
>  1. will this affect the memory usage of vacuum (going to be using default
>> vacuum settings for 8.4) - because ideally I would want to have some control
>> over it as well.
>>
>
> Each of the autovacuum processes (defaulting to 3) will use up to
> maintenance_work_mem worth of memory when they are running.  You should
> account for that when estimating peak usage.
>
>
>  2. Would I have to tune max_connections, max_files_per_process (and any
>> related variables) ?
>>
>
> Limiting max_connections, and accordingly dealing with the fact that some
> connections might be refused temporarily in your application, is the most
> effective thing you can do here.  max_files_per_process is really secondary
> to any of the other bits you're asking about.
>
>
>  3. When I turn on WAL, would I have to tune wal_buffers accordingly  set
>> effective cache size to account for wal_buffers as well ?
>>
>
> Set wal_buffers somewhere between 1MB and 16MB, include it in the general
> server shared memory overhead, and then ignore it.  It takes up a little
> memory but isn't nearly as important as these other bits.
>
> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> g...@2ndquadrant.com   www.2ndQuadrant.us
>
>
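
(A small addendum, not from the reply above: besides max_connections in
postgresql.conf, connection ceilings can also be set per database or per role
in SQL. A sketch with hypothetical names:)

    ALTER DATABASE appdb CONNECTION LIMIT 40;   -- per-database ceiling
    ALTER ROLE appuser   CONNECTION LIMIT 20;   -- per-role ceiling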


Re: [GENERAL] searchable book database

2010-08-19 Thread Sandeep Srinivasa
If you don't ever need to return the complete book text to a user (which
means you need the book text only for your search indexes), then keep
the text on file and use Apache Solr to index it.
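
(For completeness - if one did keep the full text in the database, Postgres's
built-in full-text search is the in-database route. A minimal sketch, with
hypothetical table and column names:)

    -- Hypothetical books table with an indexed tsvector column.
    CREATE TABLE books (
        id        serial PRIMARY KEY,
        title     text NOT NULL,
        author    text,
        body      text,          -- full book text
        body_tsv  tsvector       -- search vector derived from body
    );

    UPDATE books SET body_tsv = to_tsvector('english', coalesce(body, ''));
    CREATE INDEX books_body_tsv_idx ON books USING gin (body_tsv);

    -- Example search:
    SELECT id, title
    FROM books
    WHERE body_tsv @@ plainto_tsquery('english', 'whale hunting');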

regards
Sandeep

On Fri, Aug 20, 2010 at 1:05 AM, Miguel Vaz  wrote:

>
> Hi,
>
> I need to make a database of books. Several specific subject books that are
> to be searchable.
>
> Is it viable to have the complete book text on a database and search inside
> it? Or should i consider keeping only its metadata (name, author, filename,
> etc) on the DB, keep the book file on the HD and use some sort of
> search algorithm on the file? If you agree on the second option, what would
> you guys suggest for text file searching? Its for a web project, so how
> could i go about doing this? (PHP, python...)
>
> Thanks.
>
> MV
>


Re: [GENERAL] postgres.conf settings

2010-08-29 Thread Sandeep Srinivasa
Do check a previous thread:
http://www.mail-archive.com/pgsql-general@postgresql.org/msg149691.html

On Sun, Aug 29, 2010 at 10:32 PM, Dario Beraldi wrote:

> Hello,
>
> I'm trying to tune the parameters in postgres.conf to improve the
> performance of my database (where performance = 'make queries run faster').
> I would be very grateful if you could give me some comments about my choice
> of configuration settings below (did I do anything very silly? Am I missing
> something relevant?).
>
> This my situation:
>
> 1. I installed postgresql (8.4) on my own machine to manage my own data. So
> I'm its only user. Queries are not complicated but should handle tables with
> (tens of) millions of rows.
> 2. Hardware: Laptop (DELL Latitude E6500) with 3.48 GB of RAM; Intel Core 2
> Duo Mobile Processor P8600 2.40 GHz; running Windows XP. 160 GB hard disk (+
> an external one of 640 GB).
> 3. Size of database: The /data directory is c.ca 37 GB, 88 tables in the
> main schema.
> 4. Raw data is backed-up elsewhere so I'm not terribly worried about risks
> of losing data.
>
> And these are the entries in postgres.conf that I changed from default
> (after reading some literature/docs):
>
> max_connections = 20 # Default was 100
> shared_buffers = 512MB   # min 128kB # Default was 32MB
> work_mem = 256MB # min 64kB default was 1MB
> wal_buffers = 1MB# min 32kB default was 64kb
> checkpoint_segments = 30 # in logfile segments, min 1, 16MB each;
> default was 3
> checkpoint_timeout = 30min   # range 30s-1h; default was 3
> effective_cache_size = 1GB   # Default 128MB
>
>
> I understand that tuning is a very database specific issue, but even some
> general pointers would help me...
>
> Many thanks!
>
> Dario
>
> --
>
> Dr. Dario Beraldi
> Institute of Evolutionary Biology
> University of Edinburgh
> West Mains Road
> Edinburgh EH9 3JT
> Scotland, UK
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] where does postgres keep the query result until it is returned?

2010-09-20 Thread Sandeep Srinivasa
On Mon, Sep 20, 2010 at 5:27 PM, Merlin Moncure  wrote:

> On Fri, Sep 17, 2010 at 3:59 PM, Sam Mason  wrote:
> > On Fri, Sep 17, 2010 at 04:51:46PM +0200, Willy-Bas Loos wrote:
> >> i have a function that produces a result in xml.
> >> that is one row, one value even, but it grows pretty large.
> >> how is that handled?
> >
> > Rows are sent back in the entireity, so the PG instance would need
> > enough memory to work with that row.  When you're running a 32bit
> > version of PG, values whose size is beyond ~100MB are a bit touch and go
> > whether it will work.
>

Isn't "work_mem" used for this on the server side (when sorts and
hash tables are required), and "temp_buffers" (when temporary tables are
created)?


Re: [GENERAL] Unable to upgrade old cluster from 8.4 to 9.0

2010-09-23 Thread Sandeep Srinivasa
There is already a bug filed for this:
http://bugs.debian.org/597600

According to mpitt (the maintainer), the supported upgrade path is to use
pg_upgradecluster.



On Thu, Sep 23, 2010 at 3:21 PM, Boris  wrote:

> Hello,
>
> I have Debian Linux (unstable) server, and I am trying to upgrade 8.4
> version
> database to 9.0, all developer packages for both versions were installed,
> as
> well as client ( postgresql-server-dev-8.4
> postgresql-client-8.4 postgresql-server-dev-9.0 postgresql-client-9.0).
>
> Here is the command I use
>
> /usr/lib/postgresql/9.0/bin/pg_upgrade -d -c /var/lib/postgresql/8.4/main/
> -D
> /var/lib/postgresql/9.0/main/ -b /usr/lib/postgresql/8.4/bin/ -B
> /usr/lib/postgresql/9.0/bin/ --user=postgres
>
> output is as follows:
>
> Performing Consistency Checks
> -
> Checking old data directory (-c)
> check for base failed:  No such file or directory
>
>
> I stopped server as it was suggested in upgrade manual
> (http://developer.postgresql.org/pgdocs/postgres/pgupgrade.html)
> All data directories are exist:
>
> $ ls -lha /var/lib/postgresql/8.4/main/
> total 72K
> drwx-- 11 postgres postgres 4.0K Sep 22 23:25 .
> drwxr-xr-x  3 root root 4.0K Sep 22 23:03 ..
> drwx-- 11 postgres postgres 4.0K Feb 13  2010 base
> drwx--  2 postgres postgres 4.0K Sep 21 15:41 global
> drwx--  2 postgres postgres 4.0K Mar  4  2010 pg_clog
> drwx--  4 postgres postgres 4.0K Dec  3  2009 pg_multixact
> drwx--  2 postgres postgres 4.0K Sep 21 15:41 pg_stat_tmp
> drwx--  2 postgres postgres 4.0K Sep 12 00:58 pg_subtrans
> drwx--  2 postgres postgres 4.0K Dec  3  2009 pg_tblspc
> drwx--  2 postgres postgres 4.0K Dec  3  2009 pg_twophase
> -rw---  1 postgres postgres4 Dec  3  2009 PG_VERSION
> drwx--  3 postgres postgres 4.0K Sep 18 13:26 pg_xlog
> -rw-r--r--  1 postgres postgres  17K Sep 22 23:25 postgresql.conf
> -rw---  1 postgres postgres  133 Sep 15 21:01 postmaster.opts
> lrwxrwxrwx  1 root root   36 Dec  3  2009 server.crt ->
> /etc/ssl/certs/ssl-cert-snakeoil.pem
> lrwxrwxrwx  1 root root   38 Dec  3  2009 server.key ->
> /etc/ssl/private/ssl-cert-snakeoil.key
>
>
> $ ls -lha /var/lib/postgresql/9.0/main/
> total 76K
> drwx-- 12 postgres postgres 4.0K Sep 23 13:34 .
> drwxr-xr-x  3 postgres postgres 4.0K Sep 21 15:49 ..
> drwx--  6 postgres postgres 4.0K Sep 22 22:51 base
> drwx--  2 postgres postgres 4.0K Sep 23 13:34 global
> drwx--  2 postgres postgres 4.0K Sep 21 15:49 pg_clog
> drwx--  4 postgres postgres 4.0K Sep 21 15:49 pg_multixact
> drwx--  2 postgres postgres 4.0K Sep 23 11:52 pg_notify
> drwx--  2 postgres postgres 4.0K Sep 23 13:34 pg_stat_tmp
> drwx--  2 postgres postgres 4.0K Sep 21 15:49 pg_subtrans
> drwx--  2 postgres postgres 4.0K Sep 21 15:49 pg_tblspc
> drwx--  2 postgres postgres 4.0K Sep 21 15:49 pg_twophase
> -rw---  1 postgres postgres4 Sep 21 15:49 PG_VERSION
> drwx--  3 postgres postgres 4.0K Sep 21 15:49 pg_xlog
> -rw-r--r--  1 postgres postgres  18K Sep 22 23:27 postgresql.conf
> -rw---  1 postgres postgres  133 Sep 23 11:52 postmaster.opts
> lrwxrwxrwx  1 root root   36 Sep 22 23:00 server.crt ->
> /etc/ssl/certs/ssl-cert-snakeoil.pem
> lrwxrwxrwx  1 root root   38 Sep 22 23:00 server.key ->
> /etc/ssl/private/ssl-cert-snakeoil.key
>
>
>
> Can anyone please give some advice why pg_upgrade fails to upgrade the old
> cluster to the new one?
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>


Re: [GENERAL] Trade Study on Oracle vs. PostgreSQL

2010-09-25 Thread Sandeep Srinivasa
@OP - One of the things you _might_ be interested in is the difference in
tracing and performance frameworks between Oracle and Postgres.
There was a recent thread about it -
http://osdir.com/ml/pgsql-general/2010-09/msg00308.html

You could get your company to sponsor some work on that front.


[GENERAL] Postgres, DB2 and Materialized Query Tables (MQTs - DB2s Materialized Views)

2010-10-05 Thread Sandeep Srinivasa
Below is an excerpt from a comment over at Reddit. It was quite interesting
to read about the difference between PG's and DB2's query optimizers.


Can anybody shed any more light on this?


-Sandeep



--


there are a couple really important things that DB2 does with Materialized
Query Tables (MQTs, DB2's name for Oracle/everyone else's Materialized
Views) that other DBs -- especially free DBs -- don't do:

   1. Consider the contents of MQTs in the query optimizer
   2. Update MQTs incrementally

for example, imagine you have an MQT backed by this query:

SELECT YEAR(somedate), COUNT(*) FROM sometable GROUP BY YEAR(somedate)

now imagine that, completely independently, you want to run the following
query:

SELECT COUNT(*) FROM sometable WHERE YEAR(somedate)=2010

if you're in PostgreSQL (or MySQL, or pretty much any DB but Oracle), then
the query optimizer will build a plan that involves a table scan over
sometable. for a large table, that sucks. (a clustered index over somedate
on sometable can make this less painful, but still. bleh.)

if you're in DB2, the DB2 query optimizer is smart enough to realize that
you have that data laying around in your MQT, so it just does a lookup for
2010 in the MQT and returns the corresponding COUNT(*) value. if the MQT has
an index over YEAR(somedate), then this is *blazing* fast. Point DB2. and
because the query optimizer considers MQTs out-of-the-box with no changes to
existing applications querying the database, if a client's aggregation query
is running slow and you can define an MQT that backs that query and a
suitable update schedule for the MQT, the client's query will go 10x's
faster and require no application changes at all. Another point DB2.

also, DB2 (and Oracle too, pretty sure) lets you update MQTs incrementally.
in DB2, when you want to add records to an MQT, you can update the MQT's
contents with just those records as opposed to re-running the query that
backs it. (for most aggregation functions, anyway.) so, whereas in
PostgreSQL you'd have to re-run SELECT COUNT(*) FROM sometable WHERE
YEAR(somedate)=2010 to update your MQT if you add records, in DB2 you can
just pass those records through the existing MQT and update aggregate values
where appropriate. (the syntax for that escapes me at the moment,
unfortunately. any DB2-heads lurking about that can help with that?)

anyway, DB2 is *excellent* at managing enterprise-level data sets. MQTs are
an awesome feature it has, and they really change the kinds of applications
you can write against the database.
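
(By way of contrast - this is not from the Reddit comment: PostgreSQL, as of
9.0, has neither materialized views nor MQT-aware query rewriting, so the
closest hand-rolled equivalent is a summary table kept current by triggers.
A minimal sketch for the same hypothetical sometable/somedate schema,
insert-only and with no concurrency handling:)

    -- Summary table playing the role of the MQT above.
    CREATE TABLE sometable_by_year (
        yr  int PRIMARY KEY,
        cnt bigint NOT NULL
    );

    -- Seed it once from the base table.
    INSERT INTO sometable_by_year (yr, cnt)
    SELECT EXTRACT(year FROM somedate)::int, COUNT(*)
    FROM sometable
    GROUP BY 1;

    -- Maintain it incrementally on every insert into sometable.
    CREATE OR REPLACE FUNCTION sometable_by_year_inc() RETURNS trigger AS $$
    BEGIN
        UPDATE sometable_by_year
           SET cnt = cnt + 1
         WHERE yr = EXTRACT(year FROM NEW.somedate)::int;
        IF NOT FOUND THEN
            INSERT INTO sometable_by_year (yr, cnt)
            VALUES (EXTRACT(year FROM NEW.somedate)::int, 1);
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER sometable_by_year_trg
    AFTER INSERT ON sometable
    FOR EACH ROW EXECUTE PROCEDURE sometable_by_year_inc();

    -- Unlike DB2, the application must query the summary table explicitly:
    --   SELECT cnt FROM sometable_by_year WHERE yr = 2010;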