Summarized the discussion updating ticket’s description: https://issues.apache.org/jira/browse/IGNITE-4757
— Denis > On Mar 3, 2017, at 11:14 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote: > > On Fri, Mar 3, 2017 at 11:07 AM, Denis Magda <dma...@apache.org> wrote: > >> What you’re saying should be default behavior. Plus, we can add a special >> parameter that will gather metrics for queries executed longer that N. >> > > Agree. I thought we already have that in some form. I remember seeing a > warning for long queries in the log. > > >> >> — >> Denis >> >>> On Mar 3, 2017, at 11:01 AM, Dmitriy Setrakyan <dsetrak...@apache.org> >> wrote: >>> >>> Hm... as a user I would be interested to know that, say, 95% of my >> "select >>> * from sometable where..." query executes under 10ms or so. >>> >>> I think holding some history is important and is not that hard to >> implement. >>> >>> D. >>> >>> On Fri, Mar 3, 2017 at 10:55 AM, Denis Magda <dma...@apache.org> wrote: >>> >>>> Sergey, agree, good point! >>>> >>>> Igniters, any other thoughts before we wrap up the discussion updating >> the >>>> ticket content? >>>> >>>> — >>>> Denis >>>> >>>>> On Mar 3, 2017, at 10:06 AM, Valentin Kulichenko < >>>> valentin.kuliche...@gmail.com> wrote: >>>>> >>>>> Sergey, that's great idea! Generally, user is not interested much in >> some >>>>> average numbers, especially in case of SQL queries. What they need is a >>>>> list of slow queries and detailed information about the execution flow >> of >>>>> these particular queries. >>>>> >>>>> -Val >>>>> >>>>> On Fri, Mar 3, 2017 at 2:50 AM, Sergey Kozlov <skoz...@gridgain.com> >>>> wrote: >>>>> >>>>>> One more comment: >>>>>> >>>>>> In general the customer is interested in slow queries details thus we >>>> can >>>>>> introduce an option which will allow to store only queries executed >> more >>>>>> than NNN seconds. It may significantly reduce the the memory >> consumption >>>>>> for history (but logging of all queries is still available if set that >>>>>> option to 0). >>>>>> >>>>>> On Fri, Mar 3, 2017 at 1:00 AM, Denis Magda <dma...@apache.org> >> wrote: >>>>>> >>>>>>> Vovan, >>>>>>> >>>>>>> When I’m speaking of JOIN metrics I’m simply assume that we need to >> add >>>>>>> metrics relevant for queries with joins, metrics that will help us >> get >>>>>> more >>>>>>> insights on non-collocated and collocated joins execution flow. >>>>>>> >>>>>>>> 1) Query exec count >>>>>>>> 2) Query exec time (first define what "time" means) - min, max, avg >>>>>>> >>>>>>> Total query execution time might not be helpful in the trickiest >> cases. >>>>>>> What if you have multiple joins in your query? How do I know which >> one >>>>>>> contributes to the execution most? >>>>>>> >>>>>>> So, I would do a breakdown having total time, map time, per-join >> time, >>>>>>> reduce time. Hope it’s possible. If it’s unclear how to support >>>>>> everything >>>>>>> at the first place then it’s a different question. Let’s create >> several >>>>>>> tickets and start implementing everything gracefully. >>>>>>> >>>>>>>> 3) Number of bytes exchanged between nodes during query execution >>>>>>> >>>>>>> It will be really helpful to make a breakdown showing a number of >> bytes >>>>>>> exchanged per-join (physical join). Again, if you believe it makes >>>> sense >>>>>> to >>>>>>> do the breakdown later then let’s create an additional ticket for >> the 2 >>>>>>> tier metrics then. >>>>>>> >>>>>>>> 4) Number of returned rows - min, max, avg >>>>>>> >>>>>>> >>>>>>> Plus, let’s add the following to the list: >>>>>>> >>>>>>> 5) Collocated: yes or no >>>>>>> >>>>>>> — >>>>>>> Denis >>>>>>> >>>>>>>> On Mar 2, 2017, at 1:32 PM, Vladimir Ozerov <voze...@gridgain.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> Denis, >>>>>>>> >>>>>>>> The main problem with suggested metrics is that they implies that >>>>>> ceratin >>>>>>>> internal mechanics work in predefined way. For example, what is JOIN >>>>>>>> metrics? There are no guarantees that JOIN in user's query will be >>>>>>>> translated to a real physical join. What if several different query >>>>>>>> execution pieces happen in parallel? What if we rework our >> distributed >>>>>>>> query engine from pull to push approach for performance reasons and >>>>>> there >>>>>>>> will be no JOINs in classical sense? >>>>>>>> >>>>>>>> This is why I think that we should start with very basic things. >>>>>>> Something >>>>>>>> like: >>>>>>>> 1) Query exec count >>>>>>>> 2) Query exec time (first define what "time" means) - min, max, avg >>>>>>>> 3) Number of bytes exchanged between nodes during query execution >>>>>>>> 4) Number of returned rows - min, max, avg >>>>>>>> >>>>>>>> Once we have base numbers in place, we can think of carefully >>>>>> integrating >>>>>>>> and enhancing all pieces of query execution into more verbose >> formats, >>>>>>>> similar to query plans with relative weights in classical RDBMS >>>>>> systems. >>>>>>>> >>>>>>>> Thoughts? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Mar 2, 2017 at 11:31 PM, Denis Magda <dma...@apache.org> >>>>>> wrote: >>>>>>>> >>>>>>>>> Vovan, >>>>>>>>> >>>>>>>>> Your metrics make perfect sense to me. However, I see a high demand >>>>>> for >>>>>>>>> JOINs based metrics especially from those who give a try to >>>>>>> non-collocated >>>>>>>>> joins in production and want to measure them somehow. This is why, >>>>>>>>> personally, I prefer to see the metrics below in the top priority >>>> list >>>>>>> as >>>>>>>>> well: >>>>>>>>> >>>>>>>>> if a query was executed in the collocated or non-collocated mode. >>>>>> Three >>>>>>>>> results are valid: collocated, non-collocated, simple query (no >>>>>> joins). >>>>>>>>> non-collocated query: size of the data exchanged between the nodes >> to >>>>>>>>> complete a specific join. If there are multiple joins in the query >> we >>>>>>> need >>>>>>>>> to provide this metric for every of them. >>>>>>>>> non-collocated and collocated query: a part of the time spent >> joining >>>>>>> the >>>>>>>>> data. If there are multiple joins in the query we need to provide >>>> this >>>>>>>>> metric for every of them. >>>>>>>>> >>>>>>>>> As for “unicast” and “broadcast”, agree, let’s ignore it for now. >>>>>>>>> >>>>>>>>> In any case, can we include timing information (map phase, reduce >>>>>> phase, >>>>>>>>> join phase) into an execution plan produced by H2? Are there any >>>>>>>>> implementation hooks? >>>>>>>>> >>>>>>>>> — >>>>>>>>> Denis >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Mar 2, 2017, at 12:02 PM, Dmitriy Setrakyan < >>>>>> dsetrak...@apache.org> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> I think some of the metrics specified by Denis also make sense, >> so I >>>>>>>>> would >>>>>>>>>> add them as well. See below... >>>>>>>>>> >>>>>>>>>> On Thu, Mar 2, 2017 at 12:36 AM, Vladimir Ozerov < >>>>>> voze...@gridgain.com >>>>>>>>> <mailto:voze...@gridgain.com>> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> Denis, >>>>>>>>>>> >>>>>>>>>>> Query execution is complex process involving different stages >> which >>>>>>> are >>>>>>>>> not >>>>>>>>>>> very easy to match with each other. Especially provided that any >>>>>> node >>>>>>>>> can >>>>>>>>>>> leave topology at any time. Another problem is that engine >> evolves >>>>>> and >>>>>>>>>>> metrics like "did a query do broadcast or unicast" may easily >>>> become >>>>>>>>>>> useless at some point, because for example there will be neither >>>>>>>>> unicast, >>>>>>>>>>> nor broadast, but something different. On the other hand I >>>>>> completely >>>>>>>>> agree >>>>>>>>>>> that performance monitoring is essential part of any mature DBMS. >>>>>>>>>>> >>>>>>>>>>> I would start with metrics which are both very basic and easy to >>>>>>>>> implement >>>>>>>>>>> at the same time. For example we can add fingerprint (hash) to >>>> every >>>>>>>>> query >>>>>>>>>>> which will be used to join "map" and "reduce" parts with each >> other >>>>>>> and >>>>>>>>> add >>>>>>>>>>> the following basic metrics: >>>>>>>>>>> 1) Execution count for particular query >>>>>>>>>>> 2) Number of map nodes - min, max, avg >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> (1) and (2) makes sense >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> 3) Map step duration (if applicable) - min, max, >>>>>>>>>> >>>>>>>>>> 4) Reduce step duration (if applicable) - min, max, avg >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Not sure if (3) and (4) are needed. I would only add them if they >>>> are >>>>>>>>> easy >>>>>>>>>> to implement. >>>>>>>>>> >>>>>>>>>> I would also add these: >>>>>>>>>> >>>>>>>>>> 5) Collocated: yes/no >>>>>>>>>> 6) last execution time >>>>>>>>>> 7) min/max/average execution duration >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Once done users will be able to get statistics for particular >>>>>> queries. >>>>>>>>>>> >>>>>>>>>>> Vladimir. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Feb 28, 2017 at 3:12 AM, Denis Magda <dma...@apache.org> >>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> BTW, >>>>>>>>>>>> >>>>>>>>>>>> What if we expose per-query metrics below as a part of EXPLAIN >>>>>>> ANALYZE? >>>>>>>>>>>> Sergi, is this feasible? >>>>>>>>>>>> >>>>>>>>>>>> — >>>>>>>>>>>> Denis >>>>>>>>>>>> >>>>>>>>>>>>> On Feb 27, 2017, at 2:35 PM, Denis Magda <dma...@apache.org> >>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Igniters, >>>>>>>>>>>>> >>>>>>>>>>>>> Let’s shed more light on SQL query execution internals >>>>>> introducing a >>>>>>>>>>> set >>>>>>>>>>>> of useful metrics (https://issues.apache.org/ >>>>>> jira/browse/IGNITE-4757 >>>>>>> ). >>>>>>>>>>>>> >>>>>>>>>>>>> Per-query metrics. Total history size is defined by >>>>>>>>>>> *CacheConfiguration. >>>>>>>>>>>> getQueryDetailMetricsSize*: >>>>>>>>>>>>> * if a query was executed in the collocated or non-collocated >>>>>> mode. >>>>>>>>>>>> Three results are valid: collocated, non-collocated, simple >> query >>>>>> (no >>>>>>>>>>>> joins). >>>>>>>>>>>>> * non-collocated query: size of the data exchanged between the >>>>>> nodes >>>>>>>>> to >>>>>>>>>>>> complete a join. >>>>>>>>>>>>> * non-collocated query: did a query do broadcast or unicast to >>>> get >>>>>>>>> data >>>>>>>>>>>> needed to complete a join. >>>>>>>>>>>>> * non-collocated and collocated query: a part of the time spent >>>>>>>>> joining >>>>>>>>>>>> the data. >>>>>>>>>>>>> >>>>>>>>>>>>> CacheMetrics: >>>>>>>>>>>>> * an average number of executed SQL queries (collocated, >>>>>>>>>>> non-collocated, >>>>>>>>>>>> simple query (no joins)). >>>>>>>>>>>>> >>>>>>>>>>>>> Please don’t hesitate do share suggest another metrics or >> improve >>>>>>>>>>>> proposed ones. >>>>>>>>>>>>> >>>>>>>>>>>>> — >>>>>>>>>>>>> Denis >>>>>>>>> >>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Sergey Kozlov >>>>>> GridGain Systems >>>>>> www.gridgain.com >>>>>> >>>> >>>> >> >>