Re: [Mesa-dev] [v2 4/6] glsl: ir_deserializer class for the shader cache

Tapani Pälli Fri, 08 Nov 2013 07:11:53 -0800

On 11/05/2013 10:16 PM, Paul Berry wrote:

On 1 November 2013 02:16, Tapani Pälli <tapani.pa...@intel.com<mailto:tapani.pa...@intel.com>> wrote:
    ir_deserializer can create a gl_shader structure out of binary
    data from ir_serializer, will be used by OES_get_program_binary
    implementation.

    Signed-off-by: Tapani Pälli <tapani.pa...@intel.com
    <mailto:tapani.pa...@intel.com>>
    ---
     src/glsl/Makefile.sources          |    1 +
     src/glsl/ir_cache_deserializer.cpp | 1341
    ++++++++++++++++++++++++++++++++++++
     src/glsl/ir_cache_deserializer.h   |  301 ++++++++
     3 files changed, 1643 insertions(+)
     create mode 100644 src/glsl/ir_cache_deserializer.cpp
     create mode 100644 src/glsl/ir_cache_deserializer.h
General comment: my attitude about error handling is different whenreviewing this patch than it was when reviewing the last patch,because in the case of deserialization, we're dealing with data passedin by the client, so I think it's far more important to be robust inthe case of malformed input data. We probably don't need to worryabout malicious clients, since the client is in process so if theywanted to be malicious they could just walk all over memory. But itwill make life a lot easier for library clients (and for us when we'redebugging) if we can avoid having Mesa die in a fire when it receivesa corrupted shader binary.

Yes, I agree, it makes sense to have some more error checking indeserialization, it has to be basic and quick though.

Having said that, I still don't think integer return values are theway to go, for the following reasons:
- it's confusing for 0 to mean success.
- we're going to forget to check return values.
- it complicates the signatures of a lot of functions that should beable to just return the thing that they read.
What I'd recommend doing instead is adding a "failed" boolean tomemory_map. That way rather than having to check every single returnvalue for failure, we just have to check the "failed" boolean atappropriate critical times (such as inside loops and at the end offunctions like ir_read_variable()), and return early in the event offailure. An additional advantage of this is that it makes it easy toadd bounds checking to memory_map--the read() and ffwd() functions cansimply check if the position goes out of bounds, and if so, set"failed" flag to 0 and read zeros.

ok, I'll take a look how this would work out. The current checks haveactually helped as they catch not just invalid/corrupt data from clientbut possible errors in serialization too, even though they are justsimple value checks but there's quite a lot of them and they help instopping the parsing immediately. If allowed to go further after error,it'll be harder to track when did it start to fail so I think there willbe still a lot of checks.

    +ir_deserializer::read_header(struct gl_shader *shader, memory_map
    &map,
    +   const char *mesa_sha)
Rather than pass the memory_map to all of the ir_deserializerfunctions, I'd prefer to see it just be a member of their_deserializer class. It's a closer parallel to what you do withmemory_writer in ir_serializer, and it makes the code easier to readby making all of the function signatures and invocations smaller.

Yep, I think it can be changed, the actual memory_map will come fromcaller of this class. I think read_glsl_type() is the only method whichwill require support to pass a memory_map, this is used when readinguniform_storage when deserializing a whole pgoram.

As with the memory_writer::write*() functions, I think thememory_map::read() functions should include the type in their name,and should use return by value rather than a pointer argument. E.g.:
int32_t read_int32();
bool read_bool();

and so on.
(Note: I think it's still ok to have a read() function that operateson a pointer for reading larger structs).


Yes, will change these.

Although technically the C standard allows locals whose name beginwith an underscore (provided that what follows is a lower caseletter), it seems strange--names beginning with underscores typicallyrefer to globals that belong to libraries.
Also, it looks like all the callers of read_glsl_type pass this->statefor this parameter, so I don't understand why the parameter is neededat all.

Yes, it is possible that in some of calls it might not be explicitlyneeded parameter, read_glsl_type must support also situation where wedon't have state, this is used when reading uniform_storage.

    +{
    +   uint32_t type_size;
    +
    +   char *name = map.read_string();
    +   map.read(&type_size);
    +
    +   const glsl_type *type_exists = NULL;
"type_exists" sounds like the name of a boolean. Can we rename thisto "existing_type"?

ok

    +
    +   if (_state && _state->symbols)
    +      type_exists = _state->symbols->get_type(name);
I don't see any circumstances where _state or _state->symbols would beNULL. Am I missing something?


state can be null, state->symbols is a paranoid check and can be removed

    +
    +   /* if type exists, move read pointer forward and return type */
    +   if (type_exists) {
    +      map.ffwd(type_size);
    +      return type_exists;
    +   }
A single shader is allowed to have multiple distinct interface typeswith the same name (e.g. an input and an output with the block name,but different interface members), so this won't work. It will causethe distinct interface types to alias to each other.

This will need some more thought then, I'll check if there's some teststhat would utilize this.

I would have thought that the logic above for looking up existingtypes would take care of samplers, since they are built-in types (andhence populated in the symbol table before deserialization). Am Imissing something here?

Yes, samplers could be removed. I wanted this function to be able toconstruct whatever type but you are right that samplers are all built-ins.

If we do need to keep this switch statement, I recommend adding someerror handling in the default case to make sure the "failed" flag isset and then return glsl_type::error_type. That will reduce the riskof Mesa crashing in the event of a corrupted shader binary.


I will remove samplers from here

    +  glsl_struct_field *fields = ralloc_array(mem_ctx,
    +         glsl_struct_field, length);
We should check for a memory allocation failure here and doappropriate error handling; that way a corrupted binary that has anoutrageously large length stored in it won't lead to a crash.


ok, will fix

    +ir_deserializer::read_ir_variable(struct exec_list *list,
    memory_map &map)
General comment on code organization: I'd prefer to seedeserialization functions like these written as constructors (i.e.this function would be an ir_variable constructor, which takes anir_deserializer as a single parameter). The advantages of thisapproach are (a) the deserialization code is closer to the otherir_variable-related code, so there's less danger of us forgetting toupdate it when we change the structure of ir_variable, and (b) if wechange ir_variable and forget to update the deserialization code,static analysis tools like coverity will alert us to the bug.


Yep, I'll refactor and go this way now.

uint8_t read_only;
map.read(&read_only);
var->read_only = read_only;

we simply would do:

var->read_only = map.read_uint8();
Of course, my idea from the last patch of serializing a "bits" datastructure would make this example moot.

I will do the separate 'bits' structure, it'll make code much moreeasier to maintain.

+

    + list->push_tail(var);
    +
    +   return 0;
    +}
Rather than having this function read the variable and add it to alist, I'd prefer for it to read the variable and return a pointer toit. Then the caller can decide whether to add it to a list or dosomething else to it. The key advantage IMHO is more flexibility forthe future. For example, we have talked about the idea of changingthe storage of function parameter ir_variables so that they use anarray of ir_variable pointers rather than an exec_list. As thedeserialization code is written right now we have extra obstacles tomaking that future change.
A similar comment applies to most of the other read...() functions below.

Sure, can be done. Currently it's been all the time about building thelists so I wanted to have it in functions instead of caller.

    +   uint32_t sig_amount;
As in the previous patch, I'd recommend renaming "sig_amount" to"num_signatures" and "par_count" to "param_count".

ok

    +      if (ir_type != ir_type_function_signature) {
    +         CACHE_DEBUG("cache format error with function %s\n", name);
    +         return -1;
    +      }
It seems silly to store ir_type in the instruction stream and thengenerate an error if it doesn't match. If we want to be robust in theface of a corrupted shader binary, we should be checking for thingsthat don't make sense (like is_builtin being a number other than 0 or1) rather than storing redundant information like ir_type in theinstruction stream and then checking that it equals the only valuethat it could possibly be.


agreed, this check can be removed

+

    +      /* insert function parameter variables to prototypes list
    ... */
    +      foreach_list_const(node, &sig->parameters) {
    +         ir_variable *var = ((ir_instruction *) node)->as_variable();
    +         if (var)
    +  prototypes->push_tail(var->clone(mem_ctx, NULL));
    +      }

Can you explain what's going on with the variables in the prototypeslist? I don't understand what purpose this serves.

There can be further references to these variables so this list will bethen used to search for them.

    +read_errors:
    +   CACHE_DEBUG("%s: read errors with [%s]\n", __func__, name);
    +   if (sig)
    +      ralloc_free(sig);
Part of the point of ralloc is that we shouldn't have to do this kindof clean up. Instead, in the event of failure, we should just freethe toplevel memory context, and all the memory we allocated duringthe failed deserialization attempt will be automatically reclaimed.
If we let ralloc take care of reclaiming the memory at the end ofdeserialization, then we can replace a lot of these goto labels forerror conditions with early returns, which I think will make the codeeasier to read.


true, I will go through these cases

    +   return NULL;
If we adopt the idea of the "failed" boolean that I talked aboutabove, then that gives us the opportunity to return a dummyir_dereference_array rather than NULL. The advantage of doing that isthat there's less risk of the caller segfaulting before it figures outthat we are in an error condition.
Of course, if we adopt my suggestion of turning these reader functionsinto IR constructors, that will happen automatically.


Yes, I'm going for the constructor approach.

    +
    +read_return:
    +   if (list)
    +      list->push_tail(con);
This is a great example of why it should be the caller'sresponsibility to add the deserialized IR to the appropriate list.That way callers that need to put the IR in a list can do so, andcallers that don't need to put it in a list won't.


will be changed

    +
    +static uint8_t
    +read_bool(memory_map &map)
    +{
    +   uint8_t value = 0;
    +   map.read(&value);
    +   return value;
    +}


As in the previous patch, this should just be a function in memory_map.


will fix


Thanks for the review;

// Tapani

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [v2 4/6] glsl: ir_deserializer class for the shader cache

Reply via email to