Hi Jeff,

I agree with most of your statements.
2014-04-01 4:20 GMT+02:00 Jeff Janes <jeff.ja...@gmail.com>:

> On Sunday, March 30, 2014, Stefan Keller <sfkel...@gmail.com> wrote:
>
>> Hi Jeff
>>
>> 2013/11/20 Jeff Janes <jeff.ja...@gmail.com>
>>
>>> I don't know what you mean about enhancements in the buffer pool.
>>> For an in-memory database, there shouldn't be a buffer pool in the
>>> first place, as it is *all* in memory.
>>
>> You are right: in-memory DBs make buffer pooling obsolete - except
>> for making data persistent (see below).
>
> I would be very reluctant to use any database engine which considered
> disk access obsolete. There will always be a risk that data grows to
> exceed RAM, and that it would be inconvenient to expand RAM fast
> enough to accommodate it. I've played those games enough with Perl and
> C in-memory systems. You fight and squeeze to fit the data into RAM,
> then the data size grows 3% and all of your work is for naught. You
> can buy more RAM, if you have the budget, and if the RAM isn't
> back-ordered for 3 months because the factory that makes it had a
> fire, and if more RAM fits on your motherboard, and....

I think there are some use cases where you can plan for and foresee how
the data will grow.

>>> Do you know why it is slow? I'd give high odds that it would be a
>>> specific implementation detail in the code that is suboptimal, or
>>> maybe a design decision of PostGIS, rather than some high-level
>>> architectural decision of PostgreSQL.
>>
>> Referring to the application is something you can always say - but
>> that shouldn't prevent us from enhancing Postgres.
>>
>> Postgres has been enhanced. Now we need to change osm2pgsql to take
>> advantage of those enhancements. It defines indexes on the tables
>> that are going to be bulk loaded with COPY, which defeats some recent
>> optimizations made to COPY. The creation of the indexes should be
>> delayed until after the bulk load is done.
>>
>> A further enhancement to Postgres would be to automatically defer
>> creation of the indexes when a table is truncated or created within
>> a transaction, so that users get the benefit of the improvement.
>>
>> These enhancements to osm2pgsql seem reasonable to me. I hope
>> somebody has time to take care of them.
>
> I have a fork of osm2pgsql on github which delays the index build
> until the COPY is done. I'm not really motivated to convince anyone to
> merge it (as my interest is PostgreSQL, not osm itself), but if
> someone wants to pick it up, that is fine with me. It helps somewhat,
> but it is not a game-changer, because there are other, bigger
> bottlenecks, at least for HDD-based systems.
>
> One of the bigger bottlenecks is building the GIN indexes on the way
> table at the end. Setting maintenance_work_mem to huge values helps a
> lot, if you can find a setting that is safe given the multiple index
> builds that might be running (at that point in the load, osm2pgsql's
> node cache has been released, so there is substantial RAM to
> re-purpose). It would be better for this use if PostgreSQL built the
> index using an external sort, rather than iterating over the table
> building maintenance_work_mem-sized chunks of red-black trees. The
> problem there is that osm uses the GIN index in an odd way (the vast
> majority of nodes occur in exactly one way, with a minority occurring
> in more than one), and a disk sort might not be ideal for the more
> common GIN use cases, where a given token usually occurs in far more
> than one document. So an improvement that only helps osm2pgsql and
> degrades other uses is unlikely to be adopted.
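Just to be sure I understand the pattern your fork uses, I'd sketch it
like this in plain SQL (schema abbreviated; the path and the memory
figure are made up):

  BEGIN;
  -- create and bulk load the table in one transaction, with no indexes
  -- defined yet, so COPY doesn't have to maintain an index row by row
  CREATE TABLE planet_osm_ways (
      id    bigint   NOT NULL,
      nodes bigint[] NOT NULL,
      tags  text[]
  );
  COPY planet_osm_ways FROM '/path/to/ways.copy';
  COMMIT;

  -- only now build the GIN index, giving it as much memory as is safe
  SET maintenance_work_mem = '4GB';
  CREATE INDEX planet_osm_ways_nodes
      ON planet_osm_ways USING gin (nodes);
  RESET maintenance_work_mem;

Is that about right?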
> Another bottleneck is just the raw COPY into the node table. When that
> is running against an unindexed table which was created in the same
> transaction, I see that osm2pgsql takes about 50% of a CPU to print
> the copy stream, and PostgreSQL uses about 50% of a CPU to parse that
> stream and insert into the table. So they add up to about one CPU,
> despite the fact that this is a multi-CPU machine. They seem to be
> playing ping-pong with the pipe buffer when in theory they should each
> be able to run at almost full speed. I don't know how to get them to
> stop playing ping-pong, but I have other use cases where this shows
> up, so a trade-off-free solution would be nifty. I suspect that is
> more of a kernel issue than a PostgreSQL or osm2pgsql one.
>
> You could do the COPY in parallel in multiple threads, but the problem
> there is that you can't use the "created in the same transaction"
> optimization to avoid WAL overhead. There is no fix for this without
> changing PostgreSQL to accommodate it, and I have no clear idea how
> one would do that. Importing a snapshot doesn't seem like it would be
> enough, as you can only import snapshots for reading, not for writing.
>
> Also, I'm not too sure how seriously to take the goal of optimizing
> osm2pgsql. Development on it seems to be less than vigorous. And its
> purpose is to create a database to be used, so wouldn't it make more
> sense to optimize the use, not the creation? And if you do want to
> optimize the creation, the obvious way to do it is to create the
> export in a way more closely aligned to that need, rather than a
> generic export.

As long as the planet file import takes about 5 days, I think it is an
issue.

>> In the meantime I discussed with HANA users and thought about what
>> makes in-memory DBs special and how to configure Postgres to be an
>> in-memory DB.
>>
>> There seem to be a few main things which make in-memory DBs special:
>>
>> 1. Index: With all data in memory there is no need for a default
>> index; a full-table scan "suddenly" becomes the default.
>
> Surely not. I would say that full table scans are *already* the
> default, deviated from only if the planner thinks an index seems to be
> better. If you don't make an index, it can't seem to be better. And I
> don't know of any higher-level in-memory language which fails to
> provide a way to do efficient searching into an in-memory structure,
> usually in the form of hash tables or balanced trees. If "let's seq
> scan everything as long as it is already in memory" were a good idea,
> why would Perl, Java, Python, etc. (not to mention C libraries and the
> source code of PostgreSQL itself) provide ways to do efficient
> searches in memory?
>
> The caveat here is that you can't enforce certain constraints without
> an index. In theory you *could* have a unique constraint without an
> index to support it, but if it were implemented you would probably
> rapidly learn that you don't actually want to do that.

OK. But at least it seems obvious to me that there is a need for
different index types (or configurations), since the currently
implemented ones are designed to avoid secondary storage.

>> 2. Persistence: All updates/changes to the DB are streamed to disk.
>
> That is already done, through the WAL files. But the changes are also
> written to memory in their proper location, and then that location is
> written to disk as well, in a scattered manner, once per checkpoint.
> If you suppress checkpoints entirely, then every time the database is
> restarted it would have to read the entire history of the database
> since creation from the WAL to create the initial in-memory image. So,
> how should checkpoints be handled?

I have to look up how HANA solves this, since that's one of the USPs
they claim. At least they delay checkpoints and accept a "small data
loss", that's for sure.
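For Postgres, I suppose "delaying checkpoints" boils down to stretching
the checkpoint settings as far as they will go, something like this in
postgresql.conf (9.3-era parameter names; the numbers are made up):

  checkpoint_timeout = 1h             # currently the maximum, as you say
  checkpoint_segments = 256           # lots of WAL before a forced checkpoint
  checkpoint_completion_target = 0.9  # spread out the writes that do happen

at the price that crash recovery then has to replay all the WAL written
since the last checkpoint.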
>> 3. An increase of data simply needs to be compensated for with more
>> memory (since memory has become cheap).
>
> The size of the data seems to be increasing at the same rate as RAM is
> getting cheaper, if not faster. And the new RAM might need a new
> motherboard, and then a new power supply, and a new dev environment
> and a new test environment, and then a new rack to mount them, and
> then a union electrician to wire up the new rack, and then a new data
> center to hold the next new rack....
>
> The same could be said for new hard drives, too, but I usually have
> more head room in my hard drives than in my RAM.
>
>> AFAIK one can configure Postgres for all these properties: one can
>> force the use of full-table scans(?) and replicate to disk(?).
>
> Yes, that can already be done. Don't build an index, and delay
> checkpoints as long as you dare (1 hour is currently the max, but a
> simple hack can extend that; whether you will like the consequences is
> another matter).
>
>> What do you think about this?
>
> I think there is no reason to believe these changes would do much
> good. In-memory databases usually run as libraries, not as
> client-server infrastructures. Without that, you already have one hand
> tied behind your back. Also, I think they usually map datatypes
> directly to hardware-supported types, which is surely faster but would
> render PostgreSQL's flexible type system nonfunctional. I don't know
> how they deal with visibility, but I have to assume they either
> compromise on correctness or limit the amount of updating which is
> possible on the in-memory portion. By the time you do all that, it
> isn't clear what would be left of the things that make PostgreSQL be
> PostgreSQL.
>
>> Are there any show cases out there?
>
> What did the HANA users have to say? It seems like they would be in
> the best position to provide test cases.

Yes: HANA, Oracle TimesTen, or H-Store (VoltDB).

> Cheers,
>
> Jeff
>
>> Yours, Stefan