Vovan, When I’m speaking of JOIN metrics I’m simply assume that we need to add metrics relevant for queries with joins, metrics that will help us get more insights on non-collocated and collocated joins execution flow.
> 1) Query exec count > 2) Query exec time (first define what "time" means) - min, max, avg Total query execution time might not be helpful in the trickiest cases. What if you have multiple joins in your query? How do I know which one contributes to the execution most? So, I would do a breakdown having total time, map time, per-join time, reduce time. Hope it’s possible. If it’s unclear how to support everything at the first place then it’s a different question. Let’s create several tickets and start implementing everything gracefully. > 3) Number of bytes exchanged between nodes during query execution It will be really helpful to make a breakdown showing a number of bytes exchanged per-join (physical join). Again, if you believe it makes sense to do the breakdown later then let’s create an additional ticket for the 2 tier metrics then. > 4) Number of returned rows - min, max, avg Plus, let’s add the following to the list: 5) Collocated: yes or no — Denis > On Mar 2, 2017, at 1:32 PM, Vladimir Ozerov <voze...@gridgain.com> wrote: > > Denis, > > The main problem with suggested metrics is that they implies that ceratin > internal mechanics work in predefined way. For example, what is JOIN > metrics? There are no guarantees that JOIN in user's query will be > translated to a real physical join. What if several different query > execution pieces happen in parallel? What if we rework our distributed > query engine from pull to push approach for performance reasons and there > will be no JOINs in classical sense? > > This is why I think that we should start with very basic things. Something > like: > 1) Query exec count > 2) Query exec time (first define what "time" means) - min, max, avg > 3) Number of bytes exchanged between nodes during query execution > 4) Number of returned rows - min, max, avg > > Once we have base numbers in place, we can think of carefully integrating > and enhancing all pieces of query execution into more verbose formats, > similar to query plans with relative weights in classical RDBMS systems. > > Thoughts? > > > > On Thu, Mar 2, 2017 at 11:31 PM, Denis Magda <dma...@apache.org> wrote: > >> Vovan, >> >> Your metrics make perfect sense to me. However, I see a high demand for >> JOINs based metrics especially from those who give a try to non-collocated >> joins in production and want to measure them somehow. This is why, >> personally, I prefer to see the metrics below in the top priority list as >> well: >> >> if a query was executed in the collocated or non-collocated mode. Three >> results are valid: collocated, non-collocated, simple query (no joins). >> non-collocated query: size of the data exchanged between the nodes to >> complete a specific join. If there are multiple joins in the query we need >> to provide this metric for every of them. >> non-collocated and collocated query: a part of the time spent joining the >> data. If there are multiple joins in the query we need to provide this >> metric for every of them. >> >> As for “unicast” and “broadcast”, agree, let’s ignore it for now. >> >> In any case, can we include timing information (map phase, reduce phase, >> join phase) into an execution plan produced by H2? Are there any >> implementation hooks? >> >> — >> Denis >> >> >>> On Mar 2, 2017, at 12:02 PM, Dmitriy Setrakyan <dsetrak...@apache.org> >> wrote: >>> >>> I think some of the metrics specified by Denis also make sense, so I >> would >>> add them as well. See below... >>> >>> On Thu, Mar 2, 2017 at 12:36 AM, Vladimir Ozerov <voze...@gridgain.com >> <mailto:voze...@gridgain.com>> >>> wrote: >>> >>>> Denis, >>>> >>>> Query execution is complex process involving different stages which are >> not >>>> very easy to match with each other. Especially provided that any node >> can >>>> leave topology at any time. Another problem is that engine evolves and >>>> metrics like "did a query do broadcast or unicast" may easily become >>>> useless at some point, because for example there will be neither >> unicast, >>>> nor broadast, but something different. On the other hand I completely >> agree >>>> that performance monitoring is essential part of any mature DBMS. >>>> >>>> I would start with metrics which are both very basic and easy to >> implement >>>> at the same time. For example we can add fingerprint (hash) to every >> query >>>> which will be used to join "map" and "reduce" parts with each other and >> add >>>> the following basic metrics: >>>> 1) Execution count for particular query >>>> 2) Number of map nodes - min, max, avg >>>> >>> >>> (1) and (2) makes sense >>> >>> >>>> 3) Map step duration (if applicable) - min, max, >>> >>> 4) Reduce step duration (if applicable) - min, max, avg >>>> >>> >>> Not sure if (3) and (4) are needed. I would only add them if they are >> easy >>> to implement. >>> >>> I would also add these: >>> >>> 5) Collocated: yes/no >>> 6) last execution time >>> 7) min/max/average execution duration >>> >>> >>>> >>>> Once done users will be able to get statistics for particular queries. >>>> >>>> Vladimir. >>>> >>>> >>>> On Tue, Feb 28, 2017 at 3:12 AM, Denis Magda <dma...@apache.org> wrote: >>>> >>>>> BTW, >>>>> >>>>> What if we expose per-query metrics below as a part of EXPLAIN ANALYZE? >>>>> Sergi, is this feasible? >>>>> >>>>> — >>>>> Denis >>>>> >>>>>> On Feb 27, 2017, at 2:35 PM, Denis Magda <dma...@apache.org> wrote: >>>>>> >>>>>> Igniters, >>>>>> >>>>>> Let’s shed more light on SQL query execution internals introducing a >>>> set >>>>> of useful metrics (https://issues.apache.org/jira/browse/IGNITE-4757). >>>>>> >>>>>> Per-query metrics. Total history size is defined by >>>> *CacheConfiguration. >>>>> getQueryDetailMetricsSize*: >>>>>> * if a query was executed in the collocated or non-collocated mode. >>>>> Three results are valid: collocated, non-collocated, simple query (no >>>>> joins). >>>>>> * non-collocated query: size of the data exchanged between the nodes >> to >>>>> complete a join. >>>>>> * non-collocated query: did a query do broadcast or unicast to get >> data >>>>> needed to complete a join. >>>>>> * non-collocated and collocated query: a part of the time spent >> joining >>>>> the data. >>>>>> >>>>>> CacheMetrics: >>>>>> * an average number of executed SQL queries (collocated, >>>> non-collocated, >>>>> simple query (no joins)). >>>>>> >>>>>> Please don’t hesitate do share suggest another metrics or improve >>>>> proposed ones. >>>>>> >>>>>> — >>>>>> Denis >> >>