Re: Additional SQL metrics

Denis Magda Mon, 06 Mar 2017 15:03:43 -0800

Summarized the discussion updating ticket’s description:
https://issues.apache.org/jira/browse/IGNITE-4757


—
Denis

> On Mar 3, 2017, at 11:14 AM, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
> 
> On Fri, Mar 3, 2017 at 11:07 AM, Denis Magda <dma...@apache.org> wrote:
> 
>> What you’re saying should be default behavior. Plus, we can add a special
>> parameter that will gather metrics for queries executed longer that N.
>> 
> 
> Agree. I thought we already have that in some form. I remember seeing a
> warning for long queries in the log.
> 
> 
>> 
>> —
>> Denis
>> 
>>> On Mar 3, 2017, at 11:01 AM, Dmitriy Setrakyan <dsetrak...@apache.org>
>> wrote:
>>> 
>>> Hm... as a user I would be interested to know that, say, 95% of my
>> "select
>>> * from sometable where..." query executes under 10ms or so.
>>> 
>>> I think holding some history is important and is not that hard to
>> implement.
>>> 
>>> D.
>>> 
>>> On Fri, Mar 3, 2017 at 10:55 AM, Denis Magda <dma...@apache.org> wrote:
>>> 
>>>> Sergey, agree, good point!
>>>> 
>>>> Igniters, any other thoughts before we wrap up the discussion updating
>> the
>>>> ticket content?
>>>> 
>>>> —
>>>> Denis
>>>> 
>>>>> On Mar 3, 2017, at 10:06 AM, Valentin Kulichenko <
>>>> valentin.kuliche...@gmail.com> wrote:
>>>>> 
>>>>> Sergey, that's great idea! Generally, user is not interested much in
>> some
>>>>> average numbers, especially in case of SQL queries. What they need is a
>>>>> list of slow queries and detailed information about the execution flow
>> of
>>>>> these particular queries.
>>>>> 
>>>>> -Val
>>>>> 
>>>>> On Fri, Mar 3, 2017 at 2:50 AM, Sergey Kozlov <skoz...@gridgain.com>
>>>> wrote:
>>>>> 
>>>>>> One more comment:
>>>>>> 
>>>>>> In general the customer is interested in slow queries details thus we
>>>> can
>>>>>> introduce an option which will allow to store only queries executed
>> more
>>>>>> than NNN seconds. It may significantly reduce the the memory
>> consumption
>>>>>> for history (but logging of all queries is still available if set that
>>>>>> option to 0).
>>>>>> 
>>>>>> On Fri, Mar 3, 2017 at 1:00 AM, Denis Magda <dma...@apache.org>
>> wrote:
>>>>>> 
>>>>>>> Vovan,
>>>>>>> 
>>>>>>> When I’m speaking of JOIN metrics I’m simply assume that we need to
>> add
>>>>>>> metrics relevant for queries with joins, metrics that will help us
>> get
>>>>>> more
>>>>>>> insights on non-collocated and collocated joins execution flow.
>>>>>>> 
>>>>>>>> 1) Query exec count
>>>>>>>> 2) Query exec time (first define what "time" means) - min, max, avg
>>>>>>> 
>>>>>>> Total query execution time might not be helpful in the trickiest
>> cases.
>>>>>>> What if you have multiple joins in your query? How do I know which
>> one
>>>>>>> contributes to the execution most?
>>>>>>> 
>>>>>>> So, I would do a breakdown having total time, map time, per-join
>> time,
>>>>>>> reduce time. Hope it’s possible. If it’s unclear how to support
>>>>>> everything
>>>>>>> at the first place then it’s a different question. Let’s create
>> several
>>>>>>> tickets and start implementing everything gracefully.
>>>>>>> 
>>>>>>>> 3) Number of bytes exchanged between nodes during query execution
>>>>>>> 
>>>>>>> It will be really helpful to make a breakdown showing a number of
>> bytes
>>>>>>> exchanged per-join (physical join). Again, if you believe it makes
>>>> sense
>>>>>> to
>>>>>>> do the breakdown later then let’s create an additional ticket for
>> the 2
>>>>>>> tier metrics then.
>>>>>>> 
>>>>>>>> 4) Number of returned rows - min, max, avg
>>>>>>> 
>>>>>>> 
>>>>>>> Plus, let’s add the following to the list:
>>>>>>> 
>>>>>>> 5) Collocated: yes or no
>>>>>>> 
>>>>>>> —
>>>>>>> Denis
>>>>>>> 
>>>>>>>> On Mar 2, 2017, at 1:32 PM, Vladimir Ozerov <voze...@gridgain.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Denis,
>>>>>>>> 
>>>>>>>> The main problem with suggested metrics is that they implies that
>>>>>> ceratin
>>>>>>>> internal mechanics work in predefined way. For example, what is JOIN
>>>>>>>> metrics? There are no guarantees that JOIN in user's query will be
>>>>>>>> translated to a real physical join. What if several different query
>>>>>>>> execution pieces happen in parallel? What if we rework our
>> distributed
>>>>>>>> query engine from pull to push approach for performance reasons and
>>>>>> there
>>>>>>>> will be no JOINs in classical sense?
>>>>>>>> 
>>>>>>>> This is why I think that we should start with very basic things.
>>>>>>> Something
>>>>>>>> like:
>>>>>>>> 1) Query exec count
>>>>>>>> 2) Query exec time (first define what "time" means) - min, max, avg
>>>>>>>> 3) Number of bytes exchanged between nodes during query execution
>>>>>>>> 4) Number of returned rows - min, max, avg
>>>>>>>> 
>>>>>>>> Once we have base numbers in place, we can think of carefully
>>>>>> integrating
>>>>>>>> and enhancing all pieces of query execution into more verbose
>> formats,
>>>>>>>> similar to query plans with relative weights in classical RDBMS
>>>>>> systems.
>>>>>>>> 
>>>>>>>> Thoughts?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Mar 2, 2017 at 11:31 PM, Denis Magda <dma...@apache.org>
>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Vovan,
>>>>>>>>> 
>>>>>>>>> Your metrics make perfect sense to me. However, I see a high demand
>>>>>> for
>>>>>>>>> JOINs based metrics especially from those who give a try to
>>>>>>> non-collocated
>>>>>>>>> joins in production  and want to measure them somehow. This is why,
>>>>>>>>> personally, I prefer to see the metrics below in the top priority
>>>> list
>>>>>>> as
>>>>>>>>> well:
>>>>>>>>> 
>>>>>>>>> if a query was executed in the collocated or non-collocated mode.
>>>>>> Three
>>>>>>>>> results are valid: collocated, non-collocated, simple query (no
>>>>>> joins).
>>>>>>>>> non-collocated query: size of the data exchanged between the nodes
>> to
>>>>>>>>> complete a specific join. If there are multiple joins in the query
>> we
>>>>>>> need
>>>>>>>>> to provide this metric for every of them.
>>>>>>>>> non-collocated and collocated query: a part of the time spent
>> joining
>>>>>>> the
>>>>>>>>> data. If there are multiple joins in the query we need to provide
>>>> this
>>>>>>>>> metric for every of them.
>>>>>>>>> 
>>>>>>>>> As for “unicast” and “broadcast”, agree, let’s ignore it for now.
>>>>>>>>> 
>>>>>>>>> In any case, can we include timing information (map phase, reduce
>>>>>> phase,
>>>>>>>>> join phase) into an execution plan produced by H2? Are there any
>>>>>>>>> implementation hooks?
>>>>>>>>> 
>>>>>>>>> —
>>>>>>>>> Denis
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Mar 2, 2017, at 12:02 PM, Dmitriy Setrakyan <
>>>>>> dsetrak...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> I think some of the metrics specified by Denis also make sense,
>> so I
>>>>>>>>> would
>>>>>>>>>> add them as well. See below...
>>>>>>>>>> 
>>>>>>>>>> On Thu, Mar 2, 2017 at 12:36 AM, Vladimir Ozerov <
>>>>>> voze...@gridgain.com
>>>>>>>>> <mailto:voze...@gridgain.com>>
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Denis,
>>>>>>>>>>> 
>>>>>>>>>>> Query execution is complex process involving different stages
>> which
>>>>>>> are
>>>>>>>>> not
>>>>>>>>>>> very easy to match with each other. Especially provided that any
>>>>>> node
>>>>>>>>> can
>>>>>>>>>>> leave topology at any time. Another problem is that engine
>> evolves
>>>>>> and
>>>>>>>>>>> metrics like "did a query do broadcast or unicast" may easily
>>>> become
>>>>>>>>>>> useless at some point, because for example there will be neither
>>>>>>>>> unicast,
>>>>>>>>>>> nor broadast, but something different. On the other hand I
>>>>>> completely
>>>>>>>>> agree
>>>>>>>>>>> that performance monitoring is essential part of any mature DBMS.
>>>>>>>>>>> 
>>>>>>>>>>> I would start with metrics which are both very basic and easy to
>>>>>>>>> implement
>>>>>>>>>>> at the same time. For example we can add fingerprint (hash) to
>>>> every
>>>>>>>>> query
>>>>>>>>>>> which will be used to join "map" and "reduce" parts with each
>> other
>>>>>>> and
>>>>>>>>> add
>>>>>>>>>>> the following basic metrics:
>>>>>>>>>>> 1) Execution count for particular query
>>>>>>>>>>> 2) Number of map nodes - min, max, avg
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> (1) and (2) makes sense
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 3) Map step duration (if applicable) - min, max,
>>>>>>>>>> 
>>>>>>>>>> 4) Reduce step duration (if applicable) - min, max, avg
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Not sure if (3) and (4) are needed. I would only add them if they
>>>> are
>>>>>>>>> easy
>>>>>>>>>> to implement.
>>>>>>>>>> 
>>>>>>>>>> I would also add these:
>>>>>>>>>> 
>>>>>>>>>> 5) Collocated: yes/no
>>>>>>>>>> 6) last execution time
>>>>>>>>>> 7) min/max/average execution duration
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Once done users will be able to get statistics for particular
>>>>>> queries.
>>>>>>>>>>> 
>>>>>>>>>>> Vladimir.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Feb 28, 2017 at 3:12 AM, Denis Magda <dma...@apache.org>
>>>>>>> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> BTW,
>>>>>>>>>>>> 
>>>>>>>>>>>> What if we expose per-query metrics below as a part of EXPLAIN
>>>>>>> ANALYZE?
>>>>>>>>>>>> Sergi, is this feasible?
>>>>>>>>>>>> 
>>>>>>>>>>>> —
>>>>>>>>>>>> Denis
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Feb 27, 2017, at 2:35 PM, Denis Magda <dma...@apache.org>
>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Igniters,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Let’s shed more light on SQL query execution internals
>>>>>> introducing a
>>>>>>>>>>> set
>>>>>>>>>>>> of useful metrics (https://issues.apache.org/
>>>>>> jira/browse/IGNITE-4757
>>>>>>> ).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Per-query metrics. Total history size is defined by
>>>>>>>>>>> *CacheConfiguration.
>>>>>>>>>>>> getQueryDetailMetricsSize*:
>>>>>>>>>>>>> * if a query was executed in the collocated or non-collocated
>>>>>> mode.
>>>>>>>>>>>> Three results are valid: collocated, non-collocated, simple
>> query
>>>>>> (no
>>>>>>>>>>>> joins).
>>>>>>>>>>>>> * non-collocated query: size of the data exchanged between the
>>>>>> nodes
>>>>>>>>> to
>>>>>>>>>>>> complete a join.
>>>>>>>>>>>>> * non-collocated query: did a query do broadcast or unicast to
>>>> get
>>>>>>>>> data
>>>>>>>>>>>> needed to complete a join.
>>>>>>>>>>>>> * non-collocated and collocated query: a part of the time spent
>>>>>>>>> joining
>>>>>>>>>>>> the data.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> CacheMetrics:
>>>>>>>>>>>>> * an average number of executed SQL queries (collocated,
>>>>>>>>>>> non-collocated,
>>>>>>>>>>>> simple query (no joins)).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Please don’t hesitate do share suggest another metrics or
>> improve
>>>>>>>>>>>> proposed ones.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> —
>>>>>>>>>>>>> Denis
>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Sergey Kozlov
>>>>>> GridGain Systems
>>>>>> www.gridgain.com
>>>>>> 
>>>> 
>>>> 
>> 
>>

Re: Additional SQL metrics

Reply via email to