2016-09-12 17:01 GMT-03:00 Jeff Janes <jeff.ja...@gmail.com>: > On Mon, Sep 12, 2016 at 7:03 AM, Vinicius Segalin <vinisega...@gmail.com> > wrote: > >> Hi everyone, >> >> I'm trying to find a way to predict query runtime (I don't need to be >> extremely precise). I've been reading some papers about it, and people are >> using machine learning to do so. For the feature vector, they use what the >> DBMS's query planner provide, such as operators and their cost. The thing >> is that I haven't found any work using PostgreSQL, so I'm struggling to >> adapt it. >> My question is if anyone is aware of a work that uses machine learning >> and PostgreSQL to predict query runtime, or maybe some other method to >> perform this. >> > > I don't know about machine learning, but if there were some way to get the > planner to tell you predicted cost in terms of a breakdown of how many > multiples of each *_cost factor (rather than only a grand total which is > what it does now), then it would be fairly easy to combine that with wall > times from log_duration and do a simple linear regression. > > I suspect the result would be that seq_page_cost and random_page_cost > would have huge uncertainties on them. And since pretty much every query > has non-zero predicted values for at least one of those, the huge > uncertainties would then pollute all the rest of the fitted values as > well. Perhaps that is where the machine learning would come in? > > Another issue is the predicted costs are only meant to choose between > different plans, not to predict overall wall time. Some parts of the > planner only have one way to do something, and so doesn't bother to compute > a cost for that as there is no choice to be made. This would leave glaring > holes in the estimates (particularly for updates) > > But to get that data out would require quite a bit of tedious altering of > the planner code, and then you would have to find people willing to run > that altered code on real world databases with a high level of logging to > gather the data. (I suspect that gathering data from only toy databases > would not be very useful). > > Cheers, > > Jeff >
Modifying the planner is way too complex for me at this time, so I really can't go into that kind of solution, but I can try to use as much as the planner gives me today, make the best out of it and hope it's enough to give me some satisfactory results.