You can add a flag to each driver query identifying what kind of query it is (a hw perf counter or a CPU-only query). Then you can enumerate all queries and see if there's at least one perf counter and if so, advertise the extension.
Or add a CAP and let drivers decide if they want the extension or not. You still need the per-query flag to skip the non-hw queries though. Marek On Mon, Mar 30, 2015 at 3:06 PM, Samuel Pitoiset <samuel.pitoi...@gmail.com> wrote: > > > On 03/29/2015 06:11 PM, Martin Peres wrote: >> >> On 29/03/2015 17:56, Samuel Pitoiset wrote: >>> >>> >>> On 03/28/2015 09:43 PM, Martin Peres wrote: >>>> >>>> On 22/03/2015 17:35, Samuel Pitoiset wrote: >>>>> >>>>> From: Christoph Bumiller <e0425...@student.tuwien.ac.at> >>>>> >>>>> This is based on the original patch of Christoph Bumiller. >>>>> (source: http://people.freedesktop.org/~chrisbmr/perfmon.diff) >>>> >>>> It would be nice if you could add "v2: Samuel Pitoiset" and tell what >>>> you changed. Christoph may delete his perfmon.diff and no-one will be >>>> able to diff the diffs :) >>> >>> Good idea! >>> >>>>> As for the Gallium HUD, we keep a list of busy queries in a ring >>>>> buffer in order to prevent stalls when reading queries. >>>>> >>>>> Drivers must implement get_driver_query_group_info and >>>>> get_driver_query_info in order to enable this extension. >>>>> >>>>> Signed-off-by: Samuel Pitoiset <samuel.pitoi...@gmail.com> >>>>> --- >>>>> src/mesa/Makefile.sources | 2 + >>>>> src/mesa/state_tracker/st_cb_perfmon.c | 455 >>>>> +++++++++++++++++++++++++++++++++ >>>>> src/mesa/state_tracker/st_cb_perfmon.h | 70 +++++ >>>>> src/mesa/state_tracker/st_context.c | 4 + >>>>> src/mesa/state_tracker/st_extensions.c | 3 + >>>>> 5 files changed, 534 insertions(+) >>>>> create mode 100644 src/mesa/state_tracker/st_cb_perfmon.c >>>>> create mode 100644 src/mesa/state_tracker/st_cb_perfmon.h >>>>> >>>>> diff --git a/src/mesa/Makefile.sources b/src/mesa/Makefile.sources >>>>> index 217be9a..e54e618 100644 >>>>> --- a/src/mesa/Makefile.sources >>>>> +++ b/src/mesa/Makefile.sources >>>>> @@ -432,6 +432,8 @@ STATETRACKER_FILES = \ >>>>> state_tracker/st_cb_flush.h \ >>>>> state_tracker/st_cb_msaa.c \ >>>>> state_tracker/st_cb_msaa.h \ >>>>> + state_tracker/st_cb_perfmon.c \ >>>>> + state_tracker/st_cb_perfmon.h \ >>>>> state_tracker/st_cb_program.c \ >>>>> state_tracker/st_cb_program.h \ >>>>> state_tracker/st_cb_queryobj.c \ >>>>> diff --git a/src/mesa/state_tracker/st_cb_perfmon.c >>>>> b/src/mesa/state_tracker/st_cb_perfmon.c >>>>> new file mode 100644 >>>>> index 0000000..fb6774b >>>>> --- /dev/null >>>>> +++ b/src/mesa/state_tracker/st_cb_perfmon.c >>>>> @@ -0,0 +1,455 @@ >>>>> +/* >>>>> + * Copyright (C) 2013 Christoph Bumiller >>>>> + * Copyright (C) 2015 Samuel Pitoiset >>>>> + * >>>>> + * Permission is hereby granted, free of charge, to any person >>>>> obtaining a >>>>> + * copy of this software and associated documentation files (the >>>>> "Software"), >>>>> + * to deal in the Software without restriction, including without >>>>> limitation >>>>> + * the rights to use, copy, modify, merge, publish, distribute, >>>>> sublicense, >>>>> + * and/or sell copies of the Software, and to permit persons to whom >>>>> the >>>>> + * Software is furnished to do so, subject to the following >>>>> conditions: >>>>> + * >>>>> + * The above copyright notice and this permission notice shall be >>>>> included in >>>>> + * all copies or substantial portions of the Software. >>>>> + * >>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >>>>> EXPRESS OR >>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>>>> MERCHANTABILITY, >>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO >>>>> EVENT SHALL >>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES >>>>> OR >>>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR >>>>> OTHERWISE, >>>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE >>>>> USE OR >>>>> + * OTHER DEALINGS IN THE SOFTWARE. >>>>> + */ >>>>> + >>>>> +/** >>>>> + * Performance monitoring counters interface to gallium. >>>>> + */ >>>>> + >>>>> +#include "st_context.h" >>>>> +#include "st_cb_bitmap.h" >>>>> +#include "st_cb_perfmon.h" >>>>> + >>>>> +#include "util/bitset.h" >>>>> + >>>>> +#include "pipe/p_context.h" >>>>> +#include "pipe/p_screen.h" >>>>> +#include "util/u_memory.h" >>>>> + >>>>> +/** >>>>> + * Return a PIPE_QUERY_x type >= PIPE_QUERY_DRIVER_SPECIFIC, or -1 if >>>>> + * the driver-specific query doesn't exist. >>>>> + */ >>>>> +static int >>>>> +find_query_type(struct pipe_screen *screen, const char *name) >>>>> +{ >>>>> + int num_queries; >>>>> + int type = -1; >>>>> + int i; >>>>> + >>>>> + num_queries = screen->get_driver_query_info(screen, 0, NULL); >>>>> + if (!num_queries) >>>>> + return type; >>>>> + >>>>> + for (i = 0; i < num_queries; i++) { >>>>> + struct pipe_driver_query_info info; >>>>> + >>>>> + if (!screen->get_driver_query_info(screen, i, &info)) >>>>> + continue; >>>>> + >>>>> + if (!strncmp(info.name, name, strlen(name))) { >>>>> + type = info.query_type; >>>>> + break; >>>>> + } >>>>> + } >>>>> + return type; >>>>> +} >>>>> + >>>>> +static bool >>>>> +init_perf_monitor(struct gl_context *ctx, struct >>>>> gl_perf_monitor_object *m) >>>>> +{ >>>>> + struct st_perf_monitor_object *stm = st_perf_monitor_object(m); >>>>> + struct pipe_screen *screen = st_context(ctx)->pipe->screen; >>>>> + struct pipe_context *pipe = st_context(ctx)->pipe; >>>>> + int gid, cid; >>>>> + >>>>> + st_flush_bitmap_cache(st_context(ctx)); >>>>> + >>>>> + /* Create a query for each active counter. */ >>>>> + for (gid = 0; gid < ctx->PerfMonitor.NumGroups; gid++) { >>>>> + const struct gl_perf_monitor_group *g = >>>>> &ctx->PerfMonitor.Groups[gid]; >>>>> + for (cid = 0; cid < g->NumCounters; cid++) { >>>>> + const struct gl_perf_monitor_counter *c = &g->Counters[cid]; >>>>> + struct st_perf_counter_object *cntr; >>>>> + int query_type; >>>>> + >>>>> + if (!BITSET_TEST(m->ActiveCounters[gid], cid)) >>>>> + continue; >>>> >>>> It would seem like the extension would not work with more than 32 >>>> counters per group. >>>> >>>> This certainly is not a problem on the NVIDIA side but it may become a >>>> problem for another >>>> GPU manufacturer. It may warrant a note disclosing this limitation. >>>> What do you think? >>> >>> Ok, I'll add a note for that. >>> >>>>> + >>>>> + query_type = find_query_type(screen, c->Name); >>>>> + assert(query_type != -1); >>>>> + >>>>> + cntr = CALLOC_STRUCT(st_perf_counter_object); >>>>> + if (!cntr) >>>>> + return false; >>>>> + >>>>> + cntr->queries[cntr->head] = pipe->create_query(pipe, >>>>> query_type, 0); >>>>> + cntr->query_type = query_type; >>>>> + cntr->id = cid; >>>>> + cntr->group_id = gid; >>>>> + >>>>> + list_addtail(&cntr->list, &stm->active_counters); >>>>> + } >>>>> + } >>>>> + return true; >>>>> +} >>>>> + >>>>> +static void >>>>> +reset_perf_monitor(struct st_perf_monitor_object *stm, >>>>> + struct pipe_context *pipe) >>>>> +{ >>>>> + struct st_perf_counter_object *cntr, *tmp; >>>>> + int i; >>>>> + >>>>> + LIST_FOR_EACH_ENTRY_SAFE(cntr, tmp, &stm->active_counters, list) { >>>>> + for (i = 0; i < ARRAY_SIZE(cntr->queries); i++) { >>>>> + if (cntr->queries[i]) >>>>> + pipe->destroy_query(pipe, cntr->queries[i]); >>>>> + } >>>>> + list_del(&cntr->list); >>>>> + free(cntr); >>>>> + } >>>>> +} >>>>> + >>>>> +static unsigned >>>>> +read_query_results(struct pipe_context *pipe, GLenum type, >>>>> + struct st_perf_counter_object *cntr, >>>>> + union pipe_query_result *results) >>>>> +{ >>>>> + unsigned num_results = 0; >>>>> + >>>>> + while (1) { >>>>> + union pipe_query_result result; >>>>> + >>>>> + if (!pipe->get_query_result(pipe, cntr->queries[cntr->tail], >>>>> + FALSE, &result)) >>>>> + break; >>>>> + >>>>> + switch (type) { >>>>> + case GL_UNSIGNED_INT64_AMD: >>>>> + (*results).u64 += result.u64; >>>>> + break; >>>>> + case GL_UNSIGNED_INT: >>>>> + (*results).u32 += result.u32; >>>>> + break; >>>>> + case GL_FLOAT: >>>>> + case GL_PERCENTAGE_AMD: >>>>> + (*results).f += result.f; >>>>> + break; >>>>> + } >>>>> + num_results++; >>>>> + >>>>> + if (cntr->tail == cntr->head) >>>>> + break; >>>>> + >>>>> + cntr->tail = (cntr->tail + 1) % ST_PERFMON_NUM_QUERIES; >>>>> + } >>>>> + return num_results; >>>>> +} >>>>> + >>>>> +static struct gl_perf_monitor_object * >>>>> +st_NewPerfMonitor() >>>>> +{ >>>>> + struct st_perf_monitor_object *stq = >>>>> ST_CALLOC_STRUCT(st_perf_monitor_object); >>>>> + if (stq) { >>>>> + list_inithead(&stq->active_counters); >>>>> + return &stq->base; >>>>> + } >>>>> + return NULL; >>>>> +} >>>>> + >>>>> +static void >>>>> +st_DeletePerfMonitor(struct gl_context *ctx, struct >>>>> gl_perf_monitor_object *m) >>>>> +{ >>>>> + struct st_perf_monitor_object *stm = st_perf_monitor_object(m); >>>>> + struct pipe_context *pipe = st_context(ctx)->pipe; >>>>> + >>>>> + reset_perf_monitor(stm, pipe); >>>>> + FREE(stm); >>>>> +} >>>>> + >>>>> +static GLboolean >>>>> +st_BeginPerfMonitor(struct gl_context *ctx, struct >>>>> gl_perf_monitor_object *m) >>>>> +{ >>>>> + struct st_perf_monitor_object *stm = st_perf_monitor_object(m); >>>>> + struct pipe_context *pipe = st_context(ctx)->pipe; >>>>> + struct st_perf_counter_object *cntr; >>>>> + int gid; >>>>> + >>>>> + /* Check the number of active counters for each group. */ >>>>> + for (gid = 0; gid < ctx->PerfMonitor.NumGroups; gid++) { >>>>> + const struct gl_perf_monitor_group *g = >>>>> &ctx->PerfMonitor.Groups[gid]; >>>>> + if (m->ActiveGroups[gid] > g->MaxActiveCounters) { >>>>> + /* Maximum number of counters reached. Cannot start the >>>>> session. */ >>>>> + return false; >>>>> + } >>>>> + } >>>> >>>> I was about to complain that you did not check for errors returned by >>>> pipe_query, but I guess that with the above test, only the lying >>>> drivers would get impacted. It is going to be hard for Nouveau to >>>> allow more than 1 counter at a time because of the B6 ones... >>> >>> This extension has not been designed for NVIDIA hardware. :-) >>> But yes, it will not really possible to monitor more than 1 counter per >>> group for "special" counting modes like B4/B6. >> >> >> As discussed on IRC, it will still be possible, but it will be more >> annoying for applications. We can discuss about this later when the core >> gallium code is in place :) >> >>> >>>>> + >>>>> + if (LIST_IS_EMPTY(&stm->active_counters)) { >>>>> + /* Create queries for each active counter before starting >>>>> + * a new monitoring session. */ >>>>> + if (!init_perf_monitor(ctx, m)) >>>>> + goto fail; >>>>> + } else { >>>>> + /* As for the Gallium HUD, we keep a list of busy queries in a >>>>> ring >>>>> + * buffer in order to prevent stalls when reading queries. */ >>>>> + LIST_FOR_EACH_ENTRY(cntr, &stm->active_counters, list) { >>>>> + if ((cntr->head + 1) % ST_PERFMON_NUM_QUERIES == cntr->tail) >>>>> { >>>>> + /* All queries are busy, throw away the last query and >>>>> create >>>>> + * a new one. */ >>>>> + pipe->destroy_query(pipe, cntr->queries[cntr->head]); >>>>> + cntr->queries[cntr->head] = >>>>> + pipe->create_query(pipe, cntr->query_type, 0); >>>> >>>> Shouldn't you block here, waiting for the oldest query to complete? By >>>> dropping some queries, you may create a deadlock if the application >>>> submits a ton of queries and then waits for them sequentially. >>>> >>>> An easy fix is to make sure that ST_PERFMON_NUM_QUERIES > 2 * >>>> g->MaxActiveCounters, then, you can simply drop the oldest request. >>>> Here is why: >>> >>> Yeah, this was a bit insane, I'll make the changes and try to avoid that >>> possible deadlock. >> >> >> Wonderful! >>> >>> >>>> - The spec forbids nested BeginPerfMonitor calls. >>>> - EndPerfMonitor make previous counter values impossible to reach. >>>> >>>> So, the worst case scenario is the following: >>>> BeginPerfMonitor(mon=1, ...) >>>> EndPerfMonitor(mon=1, ...) >>>> BeginPerfMonitor(mon=1, ...) >>>> GetPerfMonitorCounterData(mon=1, ...) >>> >>> No, this scenario is not possible. >>> This is actually not clearly specified by the spec but the "core >>> support" in mesa doesn't allow to read counters data while a monitoring >>> sessions is active, and this makes sense. >> >> >> In this case, you only need to have ST_PERFMON_NUM_QUERIES == >> g->MaxActiveCounters >>> >>> >>>> Anything that happened before the first BeginPerf here is irrelevant >>>> and can be safely optimised away. >>>> >>>> I would thus suggest dynamically allocating this ring buffer and >>>> dropping the oldest requests. To make the code simpler, you could >>>> simply always select the oldest. >>>> >>>> Any thoughts on this? >>>> >>>> + } else { >>>> + /* Add a new query for this frame. */ >>>> + cntr->head = (cntr->head + 1) % ST_PERFMON_NUM_QUERIES; >>>> + if (!cntr->queries[cntr->head]) { >>>> + cntr->queries[cntr->head] = >>>> + pipe->create_query(pipe, cntr->query_type, 0); >>>> + } >>>> + } >>>> + } >>>> + } >>>> + >>>> + /* Start the newest query for each active counter. */ >>>> + LIST_FOR_EACH_ENTRY(cntr, &stm->active_counters, list) { >>>> + if (!pipe->begin_query(pipe, cntr->queries[cntr->head])) >>>> + goto fail; >>>> + } >>>> + return true; >>>> + >>>> +fail: >>>> + /* Failed to start the monitoring session. */ >>>> + reset_perf_monitor(stm, pipe); >>>> + return false; >>>> +} >>>> + >>>> +static void >>>> +st_EndPerfMonitor(struct gl_context *ctx, struct >>>> gl_perf_monitor_object *m) >>>> +{ >>>> + struct st_perf_monitor_object *stm = st_perf_monitor_object(m); >>>> + struct pipe_context *pipe = st_context(ctx)->pipe; >>>> + struct st_perf_counter_object *cntr; >>>> + >>>> + /* Stop the newest query for each active counter. */ >>>> + LIST_FOR_EACH_ENTRY(cntr, &stm->active_counters, list) >>>> + pipe->end_query(pipe, cntr->queries[cntr->head]); >>>> +} >>>> + >>>> +static void >>>> +st_ResetPerfMonitor(struct gl_context *ctx, struct >>>> gl_perf_monitor_object *m) >>>> +{ >>>> + struct st_perf_monitor_object *stm = st_perf_monitor_object(m); >>>> + struct pipe_context *pipe = st_context(ctx)->pipe; >>>> + >>>> + if (!m->Ended) >>>> + st_EndPerfMonitor(ctx, m); >>>> + >>>> + reset_perf_monitor(stm, pipe); >>>> + >>>> + if (m->Active) >>>> + st_BeginPerfMonitor(ctx, m); >>>> +} >>>> + >>>> +static GLboolean >>>> +st_IsPerfMonitorResultAvailable(struct gl_context *ctx, >>>> + struct gl_perf_monitor_object *m) >>>> +{ >>>> + struct st_perf_monitor_object *stm = st_perf_monitor_object(m); >>>> + struct pipe_context *pipe = st_context(ctx)->pipe; >>>> + struct st_perf_counter_object *cntr; >>>> + >>>> + if (LIST_IS_EMPTY(&stm->active_counters)) >>>> + return false; >>>> + >>>> + /* The result of a monitoring session is only available if the >>>> oldest >>>> + * query of each active counter is idle. */ >>>> + LIST_FOR_EACH_ENTRY(cntr, &stm->active_counters, list) { >>>> + union pipe_query_result result; >>>> + if (!pipe->get_query_result(pipe, cntr->queries[cntr->tail], >>>> + FALSE, &result)) { >>>> + /* The oldest query is busy. */ >>>> + return false; >>>> + } >>>> + } >>>> + return true; >>>> +} >>>> + >>>> +static void >>>> +st_GetPerfMonitorResult(struct gl_context *ctx, >>>> + struct gl_perf_monitor_object *m, >>>> + GLsizei dataSize, >>>> + GLuint *data, >>>> + GLint *bytesWritten) >>>> +{ >>>> + struct st_perf_monitor_object *stm = st_perf_monitor_object(m); >>>> + struct pipe_context *pipe = st_context(ctx)->pipe; >>>> + struct st_perf_counter_object *cntr; >>>> + >>>> + /* Copy data to the supplied array (data). >>>> + * >>>> + * The output data format is: <group ID, counter ID, value> for each >>>> + * active counter. The API allows counters to appear in any order. >>>> + */ >>>> + GLsizei offset = 0; >>>> + >>>> + /* Read query results for each active counter. */ >>>> + LIST_FOR_EACH_ENTRY(cntr, &stm->active_counters, list) { >>>> + union pipe_query_result results = {}; >>>> + unsigned num_results = 0; >>>> + int gid, cid; >>>> + GLenum type; >>>> + >>>> + cid = cntr->id; >>>> + gid = cntr->group_id; >>>> + type = ctx->PerfMonitor.Groups[gid].Counters[cid].Type; >>>> + >>>> + num_results = read_query_results(pipe, type, cntr, &results); >>>> + if (!num_results) >>>> + continue; >>>> >>>> It seems like you do not allow the case I presented earlier >>>> (GetPerfMonitorCounterData after BeginPerfMonitor) unless you rely on >>>> the drivers to do the right thing? >>> >>> See above. >>> >>>> + >>>> + data[offset++] = gid; >>>> + data[offset++] = cid; >>>> + switch (type) { >>>> + case GL_UNSIGNED_INT64_AMD: >>>> + *(uint64_t *)&data[offset] = results.u64 / num_results; >>>> + offset += sizeof(uint64_t) / sizeof(GLuint); >>>> + break; >>>> + case GL_UNSIGNED_INT: >>>> + *(uint32_t *)&data[offset] = results.u32 / num_results; >>>> + offset += sizeof(uint32_t) / sizeof(GLuint); >>>> + break; >>>> + case GL_FLOAT: >>>> + case GL_PERCENTAGE_AMD: >>>> + *(GLfloat *)&data[offset] = results.f / (float)num_results; >>>> + offset += sizeof(GLfloat) / sizeof(GLuint); >>>> + break; >>>> + } >>>> + } >>>> + >>>> + if (bytesWritten) >>>> + *bytesWritten = offset * sizeof(GLuint); >>>> +} >>>> + >>>> +bool >>>> +st_init_perfmon(struct st_context *st) >>>> +{ >>>> + struct gl_perf_monitor_state *perfmon = &st->ctx->PerfMonitor; >>>> + struct pipe_screen *screen = st->pipe->screen; >>>> + struct gl_perf_monitor_group *groups = NULL; >>>> + int num_counters, num_groups; >>>> + int gid, cid; >>>> + >>>> + if (!screen->get_driver_query_info || >>>> !screen->get_driver_query_group_info) >>>> + return false; >>>> + >>>> + /* Get the number of available queries. */ >>>> + num_counters = screen->get_driver_query_info(screen, 0, NULL); >>>> + if (!num_counters) >>>> + return false; >>>> + >>>> + /* Get the number of available groups. */ >>>> + num_groups = screen->get_driver_query_group_info(screen, 0, NULL); >>>> + if (num_groups) >>>> + groups = CALLOC(num_groups, sizeof(*groups)); >>>> + if (!groups) >>>> + return false; >>>> + >>>> + for (gid = 0; gid < num_groups; gid++) { >>>> + struct gl_perf_monitor_group *g = &groups[gid]; >>>> + struct pipe_driver_query_group_info group_info; >>>> + struct gl_perf_monitor_counter *counters = NULL; >>>> + >>>> + if (!screen->get_driver_query_group_info(screen, gid, >>>> &group_info)) >>>> + continue; >>>> + >>>> + g->Name = group_info.name; >>>> + g->MaxActiveCounters = group_info.max_active_queries; >>>> + g->NumCounters = 0; >>>> + g->Counters = NULL; >>>> + >>>> + if (group_info.num_queries) >>>> + counters = CALLOC(group_info.num_queries, sizeof(*counters)); >>>> + if (!counters) >>>> + goto fail; >>>> + >>>> + for (cid = 0; cid < num_counters; cid++) { >>>> + struct gl_perf_monitor_counter *c = &counters[g->NumCounters]; >>>> + struct pipe_driver_query_info info; >>>> + >>>> + if (!screen->get_driver_query_info(screen, cid, &info)) >>>> + continue; >>>> + if (info.group_id != gid) >>>> + continue; >>>> + >>>> + c->Name = info.name; >>>> + switch (info.type) { >>>> + case PIPE_DRIVER_QUERY_TYPE_UINT64: >>>> + c->Minimum.u64 = 0; >>>> + c->Maximum.u64 = info.max_value.u64; >>>> + c->Type = GL_UNSIGNED_INT64_AMD; >>>> + break; >>>> + case PIPE_DRIVER_QUERY_TYPE_UINT: >>>> + c->Minimum.u32 = 0; >>>> + c->Maximum.u32 = info.max_value.u32; >>>> + c->Type = GL_UNSIGNED_INT; >>>> + break; >>>> + case PIPE_DRIVER_QUERY_TYPE_FLOAT: >>>> + c->Minimum.f = 0.0; >>>> + c->Maximum.f = info.max_value.f; >>>> + c->Type = GL_FLOAT; >>>> + break; >>>> + case PIPE_DRIVER_QUERY_TYPE_PERCENTAGE: >>>> + c->Minimum.f = 0.0; >>>> + c->Maximum.f = 100.0; >>>> + c->Type = GL_PERCENTAGE_AMD; >>>> + break; >>>> + default: >>>> + unreachable("Should never happen: invalid driver query >>>> type"); >>>> >>>> >>>> The definition of "unreachable" should be enough, no need to add >>>> "Should never happen:" in my opinion. >>> >>> Sure. >>> >>>>> + } >>>>> + g->NumCounters++; >>>>> + } >>>>> + g->Counters = counters; >>>>> + } >>>>> + >>>>> + perfmon->NumGroups = num_groups; >>>>> + perfmon->Groups = groups; >>>>> + return true; >>>>> + >>>>> +fail: >>>>> + for (gid = 0; gid < num_groups; gid++) >>>>> + FREE((void *)groups[gid].Counters); >>>>> + FREE(groups); >>>>> + return false; >>>>> +} >>>>> + >>>>> +void >>>>> +st_destroy_perfmon(struct st_context *st) >>>>> +{ >>>>> + struct gl_perf_monitor_state *perfmon = &st->ctx->PerfMonitor; >>>>> + int gid; >>>>> + >>>>> + for (gid = 0; gid < perfmon->NumGroups; gid++) >>>>> + FREE((void *)perfmon->Groups[gid].Counters); >>>>> + FREE((void *)perfmon->Groups); >>>>> +} >>>>> + >>>>> +void st_init_perfmon_functions(struct dd_function_table *functions) >>>>> +{ >>>>> + functions->NewPerfMonitor = st_NewPerfMonitor; >>>>> + functions->DeletePerfMonitor = st_DeletePerfMonitor; >>>>> + functions->BeginPerfMonitor = st_BeginPerfMonitor; >>>>> + functions->EndPerfMonitor = st_EndPerfMonitor; >>>>> + functions->ResetPerfMonitor = st_ResetPerfMonitor; >>>>> + functions->IsPerfMonitorResultAvailable = >>>>> st_IsPerfMonitorResultAvailable; >>>>> + functions->GetPerfMonitorResult = st_GetPerfMonitorResult; >>>> >>>> I don't see SelectPerfMonitorCountersAMD, who defined the entry >>>> points? Seems like I am missing something here. Are you piggy backing >>>> on the intel implementation? >>> >>> SelectPerfMonitorCountersAMD is defined in >>> src/mesa/main/performance_monitor.c >> >> >> Ok, I will have a look at it later. Thanks. >> >>> >>>>> +} >>>>> diff --git a/src/mesa/state_tracker/st_cb_perfmon.h >>>>> b/src/mesa/state_tracker/st_cb_perfmon.h >>>>> new file mode 100644 >>>>> index 0000000..ba5bf04 >>>>> --- /dev/null >>>>> +++ b/src/mesa/state_tracker/st_cb_perfmon.h >>>>> @@ -0,0 +1,70 @@ >>>>> +/* >>>>> + * Copyright (C) 2013 Christoph Bumiller >>>>> + * Copyright (C) 2015 Samuel Pitoiset >>>>> + * >>>>> + * Permission is hereby granted, free of charge, to any person >>>>> obtaining a >>>>> + * copy of this software and associated documentation files (the >>>>> "Software"), >>>>> + * to deal in the Software without restriction, including without >>>>> limitation >>>>> + * the rights to use, copy, modify, merge, publish, distribute, >>>>> sublicense, >>>>> + * and/or sell copies of the Software, and to permit persons to whom >>>>> the >>>>> + * Software is furnished to do so, subject to the following >>>>> conditions: >>>>> + * >>>>> + * The above copyright notice and this permission notice shall be >>>>> included in >>>>> + * all copies or substantial portions of the Software. >>>>> + * >>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, >>>>> EXPRESS OR >>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF >>>>> MERCHANTABILITY, >>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO >>>>> EVENT SHALL >>>>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES >>>>> OR >>>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR >>>>> OTHERWISE, >>>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE >>>>> USE OR >>>>> + * OTHER DEALINGS IN THE SOFTWARE. >>>>> + */ >>>>> + >>>>> +#ifndef ST_CB_PERFMON_H >>>>> +#define ST_CB_PERFMON_H >>>>> + >>>>> +#include "util/u_double_list.h" >>>>> + >>>>> +#define ST_PERFMON_NUM_QUERIES 8 >>>>> + >>>>> +/** >>>>> + * Subclass of gl_perf_monitor_object >>>>> + */ >>>>> +struct st_perf_monitor_object >>>>> +{ >>>>> + struct gl_perf_monitor_object base; >>>>> + struct list_head active_counters; >>>>> +}; >>>>> + >>>>> +struct st_perf_counter_object >>>>> +{ >>>>> + struct list_head list; >>>>> + int query_type; >>>>> + int id; >>>>> + int group_id; >>>>> + >>>>> + /* Ring of queries. If a query is busy, we use another slot. */ >>>>> + struct pipe_query *queries[ST_PERFMON_NUM_QUERIES]; >>>>> + unsigned head, tail; >>>>> +}; >>>>> + >>>>> +/** >>>>> + * Cast wrapper >>>>> + */ >>>>> +static INLINE struct st_perf_monitor_object * >>>>> +st_perf_monitor_object(struct gl_perf_monitor_object *q) >>>>> +{ >>>>> + return (struct st_perf_monitor_object *)q; >>>>> +} >>>>> + >>>>> +bool >>>>> +st_init_perfmon(struct st_context *st); >>>>> + >>>>> +void >>>>> +st_destroy_perfmon(struct st_context *st); >>>>> + >>>>> +extern void >>>>> +st_init_perfmon_functions(struct dd_function_table *functions); >>>>> + >>>>> +#endif >>>>> diff --git a/src/mesa/state_tracker/st_context.c >>>>> b/src/mesa/state_tracker/st_context.c >>>>> index 5fe132a..2146424 100644 >>>>> --- a/src/mesa/state_tracker/st_context.c >>>>> +++ b/src/mesa/state_tracker/st_context.c >>>>> @@ -51,6 +51,7 @@ >>>>> #include "st_cb_fbo.h" >>>>> #include "st_cb_feedback.h" >>>>> #include "st_cb_msaa.h" >>>>> +#include "st_cb_perfmon.h" >>>>> #include "st_cb_program.h" >>>>> #include "st_cb_queryobj.h" >>>>> #include "st_cb_readpixels.h" >>>>> @@ -116,6 +117,7 @@ st_destroy_context_priv(struct st_context *st) >>>>> st_destroy_bitmap(st); >>>>> st_destroy_drawpix(st); >>>>> st_destroy_drawtex(st); >>>>> + st_destroy_perfmon(st); >>>>> for (shader = 0; shader < ARRAY_SIZE(st->state.sampler_views); >>>>> shader++) { >>>>> for (i = 0; i < ARRAY_SIZE(st->state.sampler_views[0]); i++) { >>>>> @@ -190,6 +192,7 @@ st_create_context_priv( struct gl_context *ctx, >>>>> struct pipe_context *pipe, >>>>> st_init_bitmap(st); >>>>> st_init_clear(st); >>>>> st_init_draw( st ); >>>>> + st_init_perfmon(st); >>>> >>>> Why is there no checks here on the return value? >>> >>> Good catch! >>> >>>>> /* Choose texture target for glDrawPixels, glBitmap, >>>>> renderbuffers */ >>>>> if (pipe->screen->get_param(pipe->screen, >>>>> PIPE_CAP_NPOT_TEXTURES)) >>>>> @@ -414,6 +417,7 @@ void st_init_driver_functions(struct >>>>> dd_function_table *functions) >>>>> st_init_fbo_functions(functions); >>>>> st_init_feedback_functions(functions); >>>>> st_init_msaa_functions(functions); >>>>> + st_init_perfmon_functions(functions); >>>>> st_init_program_functions(functions); >>>>> st_init_query_functions(functions); >>>>> st_init_cond_render_functions(functions); >>>>> diff --git a/src/mesa/state_tracker/st_extensions.c >>>>> b/src/mesa/state_tracker/st_extensions.c >>>>> index bc20f73..5b6d6c6 100644 >>>>> --- a/src/mesa/state_tracker/st_extensions.c >>>>> +++ b/src/mesa/state_tracker/st_extensions.c >>>>> @@ -629,6 +629,9 @@ void st_init_extensions(struct pipe_screen *screen, >>>>> extensions->OES_EGL_image_external = GL_TRUE; >>>>> extensions->OES_draw_texture = GL_TRUE; >>>>> + if (screen->get_driver_query_info && >>>>> screen->get_driver_query_group_info) >>>>> + extensions->AMD_performance_monitor = GL_TRUE; >>>> >>>> Isn't this a bit silly to enable this extension if a driver does not >>>> export any group or counter? >>>> The return value of st_init_perfmon() would be well suited for >>>> enabling or not the extension. >>>> >>>> I am OK with the current solution though as driver devs should not >>>> export those functions >>>> unless they do have counters to expose. >>> >>> I don't think this is silly because there is no point to define those >>> functions if the driver doesn't expose any counters. >> >> >> You mean it would be silly for the driver dev to export those functions if >> they had not counters? Yes, I guess you are right. > > > Well, actually it's a bad solution to expose this extension like that > because amd perfmon must only expose hw performance counters. > The good one is to enable it *only* if the driver expose GPU counters. > > > >> >>> >>>>> + >>>>> /* Expose the extensions which directly correspond to gallium >>>>> caps. */ >>>>> for (i = 0; i < ARRAY_SIZE(cap_mapping); i++) { >>>>> if (screen->get_param(screen, cap_mapping[i].cap)) { >>>> >>>> Great work Samuel! 1-5 are R-b: Martin Peres <martin.pe...@free.fr> >>>> >>>> As for this patch, I will hold my R-b until we clear this query >>>> management issues. >>> >>> Thanks a lot for the review. >> >> > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev