Hi,

Please don't top-post. If you're not responding to parts of the e-mail,
then don't quote it.

On Fri, Sep 06, 2019 at 12:50:33PM +0200, Esteban Zimanyi wrote:
> Dear Tom
>
> Many thanks for your quick reply. Indeed, both solutions you proposed can
> be combined to solve all the problems. However, this requires changes in
> the code. Let me first elaborate on the solution based on combining
> stakind/staop, and afterwards on adding a new kind identifier.

> In order to understand the setting, let me explain a little more about the
> different kinds of temporal types. As explained in my previous email, these
> are types whose values are composed of elements v@t, where v is a
> PostgreSQL/PostGIS type (float or geometry) and t is a TimestampTz. There
> are four kinds of temporal types, depending on their duration:
> * Instant: Values of the form v@t. These are used, for example, to
>   represent car accidents, as in Point(0 0)@2000-01-01 08:30
> * InstantSet: A set of values {v1@t1, ..., vn@tn} where the values between
>   the points are unknown. These are used, for example, to represent
>   check-ins in FourSquare or RFID readings
> * Sequence: A sequence of values [v1@t1, ..., vn@tn] where the values
>   between two successive instants vi@ti and vi+1@ti+1 are (linearly)
>   interpolated. These are used, for example, to represent GPS tracks.
> * SequenceSet: A set of sequences {s1, ..., sn} with temporal gaps between
>   them. These are used, for example, to represent GPS tracks where the
>   signal was lost during a time period.
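A minimal sketch of the Sequence interpolation described above, assuming float values and timestamps stored as int64 microseconds (the way PostgreSQL represents TimestampTz). The names InstantSketch, TimestampTzSketch and sequence_value_at are hypothetical and not taken from the extension. An InstantSet, by contrast, would only yield a value at one of its own instants, since the values in between are unknown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for TimestampTz: int64 microseconds. */
typedef long long TimestampTzSketch;

/* One element v@t of a temporal float (illustrative layout only). */
typedef struct
{
    double            value;  /* v */
    TimestampTzSketch t;      /* t */
} InstantSketch;

/*
 * Value of a Sequence [v1@t1, ..., vn@tn] at time t, linearly interpolating
 * between successive instants. Instants must be in strictly increasing time
 * order. Returns 0 (and leaves *result untouched) if t is outside [t1, tn].
 */
int
sequence_value_at(const InstantSketch *inst, size_t n,
                  TimestampTzSketch t, double *result)
{
    assert(n > 0);
    if (t < inst[0].t || t > inst[n - 1].t)
        return 0;

    for (size_t i = 0; i + 1 < n; i++)
    {
        const InstantSketch *a = &inst[i];
        const InstantSketch *b = &inst[i + 1];

        assert(a->t < b->t);
        if (t <= b->t)
        {
            /* Fraction of the segment [a->t, b->t] elapsed at time t. */
            double frac = (double) (t - a->t) / (double) (b->t - a->t);

            *result = a->value + frac * (b->value - a->value);
            return 1;
        }
    }

    /* Only reached for a single-instant sequence with t == t1. */
    *result = inst[0].value;
    return 1;
}
```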


So these are 4 different data types (or classes of data types) that you
introduce in your extension? Or is that just a conceptual view and it's
stored in some other way (e.g. normalized in some way)?

> To compute the selectivity of temporal types, we assume that the value
> (e.g. space) and time dimensions are independent, and thus we can reuse all
> the existing analyze and selectivity infrastructure in PostgreSQL/PostGIS.
> For the various durations this amounts to:
> * Instant: Use the functions in analyze.c and selfuncs.c independently for
>   the value and time dimensions
> * InstantSet: Use the functions in array_typanalyze.c and array_selfuncs.c
>   independently for the value and time dimensions
> * Sequence and SequenceSet: To simplify, we do not take the gaps into
>   account, and thus use the functions in rangetypes_typanalyze.c and
>   rangetypes_selfuncs.c independently for the value and time dimensions
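Under the independence assumption, the selectivity of a predicate restricting both dimensions is simply the product of the per-dimension selectivities, which is also how clauselist_selectivity() combines clauses it treats as independent. A minimal sketch, where temporal_selectivity and clamp_probability are hypothetical helpers (the clamping mirrors the intent of the CLAMP_PROBABILITY macro in utils/selfuncs.h) and each input is assumed to come from the existing scalar, array or range selectivity functions:

```c
#include <assert.h>

/* Clamp an estimate to a valid probability in [0, 1]. */
static double
clamp_probability(double p)
{
    if (p < 0.0)
        return 0.0;
    if (p > 1.0)
        return 1.0;
    return p;
}

/*
 * Combined selectivity assuming independent value and time dimensions:
 *     sel(value AND time) = sel(value) * sel(time)
 */
double
temporal_selectivity(double value_sel, double time_sel)
{
    double sel = clamp_probability(value_sel) * clamp_probability(time_sel);

    /* Sanity check: the product of two probabilities is a probability. */
    assert(sel >= 0.0 && sel <= 1.0);
    return sel;
}
```

When the two dimensions are correlated, this product can be far off, which is the trade-off for reusing the per-dimension infrastructure unchanged.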


OK.

> However, this requires that the analyze and selectivity functions in all
> the above files satisfy the following requirements:
> * Set the staop when computing statistics. For example, in
>   rangetypes_typanalyze.c the staop is set for
>   STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM but not for
>   STATISTIC_KIND_BOUNDS_HISTOGRAM
> * Always call get_attstatsslot with the operator Oid, not with InvalidOid.
>   For example, of the 17 calls to this function in selfuncs.c, only two
>   pass an operator. This also requires passing the operator as an
>   additional parameter to several functions; for example, the operator
>   should be passed to ineq_histogram_selectivity in selfuncs.c
> * Export several top-level functions that are currently static. For
>   example, var_eq_const, ineq_histogram_selectivity, eqjoinsel_inner and
>   several others in selfuncs.c should be exported.
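get_attstatsslot() returns the first slot whose stakind matches and, unless reqop is InvalidOid, whose staop also matches. The self-contained model below follows that matching loop (in utils/cache/lsyscache.c) to show why the first two requirements go together: with two bounds histograms for the same column, InvalidOid always selects the first one, while passing the operator reaches the time-dimension one, but only if staop was set at analyze time. The struct, the function find_stats_slot and both operator OIDs are illustrative assumptions, not real catalog values.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

#define STATISTIC_NUM_SLOTS 5              /* as in catalog/pg_statistic.h */
#define STATISTIC_KIND_BOUNDS_HISTOGRAM 7  /* as in catalog/pg_statistic.h */

/* Hypothetical "<" operator OIDs for value ranges and tstzranges. */
#define VALUE_RANGE_LT_OP ((Oid) 90001)
#define TSTZRANGE_LT_OP   ((Oid) 90002)

/* Simplified view of the stakindN/staopN columns of one pg_statistic row. */
typedef struct
{
    short stakind[STATISTIC_NUM_SLOTS];
    Oid   staop[STATISTIC_NUM_SLOTS];
} StatsTupleSketch;

/*
 * Index of the first slot with the requested kind and, unless reqop is
 * InvalidOid, the requested operator; -1 if no slot matches.
 */
int
find_stats_slot(const StatsTupleSketch *st, short reqkind, Oid reqop)
{
    assert(st != NULL);
    for (int i = 0; i < STATISTIC_NUM_SLOTS; i++)
    {
        if (st->stakind[i] == reqkind &&
            (reqop == InvalidOid || st->staop[i] == reqop))
            return i;
    }
    return -1;
}
```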

> That would solve all the problems except for
> STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM, since in this case the staop will
> always be Float8LessOperator, regardless of whether we are computing
> lengths of value ranges or of tstzranges. This could be solved by using a
> different stakind for the value and time dimensions.
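A self-contained model of get_attstatsslot()'s slot matching (first slot whose stakind matches and, unless the operator is InvalidOid, whose staop matches) illustrates the length-histogram problem: when both length histograms carry the same float8 "<" operator, the operator cannot tell them apart, and only distinct stakind codes can. The kind codes TEMPORAL_KIND_VALUE_LENGTH_HISTOGRAM and TEMPORAL_KIND_TIME_LENGTH_HISTOGRAM, the stand-in operator OID, and the names LengthStatsSketch and find_length_slot are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

#define STATISTIC_NUM_SLOTS 5                    /* as in catalog/pg_statistic.h */
#define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM 6  /* as in catalog/pg_statistic.h */

/* Hypothetical extension-specific kind codes (not assigned values). */
#define TEMPORAL_KIND_VALUE_LENGTH_HISTOGRAM 20001
#define TEMPORAL_KIND_TIME_LENGTH_HISTOGRAM  20002

/* Illustrative stand-in for Float8LessOperator, not the real OID. */
#define FLOAT8_LT_OP ((Oid) 90003)

typedef struct
{
    short stakind[STATISTIC_NUM_SLOTS];
    Oid   staop[STATISTIC_NUM_SLOTS];
} LengthStatsSketch;

/* First slot matching both kind and operator, or -1. */
int
find_length_slot(const LengthStatsSketch *st, short reqkind, Oid reqop)
{
    assert(st != NULL);
    for (int i = 0; i < STATISTIC_NUM_SLOTS; i++)
    {
        if (st->stakind[i] == reqkind && st->staop[i] == reqop)
            return i;
    }
    return -1;
}
```

With a shared stakind the time-dimension histogram is unreachable, because the value-dimension slot always matches first; giving each dimension its own stakind makes the lookup unambiguous without touching staop.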


I don't think we're strongly against changing the code to allow this, as long as it does not break existing extensions/code (unnecessarily).

> If you want, I can prepare a PR to explore the implications of these
> changes. Please let me know.


I think having an actual patch to look at would be helpful.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services