On 11/05/2013 10:16 PM, Paul Berry wrote:
On 1 November 2013 02:16, Tapani Pälli <tapani.pa...@intel.com <mailto:tapani.pa...@intel.com>> wrote:

    ir_deserializer can create a gl_shader structure out of binary
    data from ir_serializer, will be used by OES_get_program_binary
    implementation.

    Signed-off-by: Tapani Pälli <tapani.pa...@intel.com
    <mailto:tapani.pa...@intel.com>>
    ---
     src/glsl/Makefile.sources          |    1 +
     src/glsl/ir_cache_deserializer.cpp | 1341
    ++++++++++++++++++++++++++++++++++++
     src/glsl/ir_cache_deserializer.h   |  301 ++++++++
     3 files changed, 1643 insertions(+)
     create mode 100644 src/glsl/ir_cache_deserializer.cpp
     create mode 100644 src/glsl/ir_cache_deserializer.h


General comment: my attitude about error handling is different when reviewing this patch than it was when reviewing the last patch, because in the case of deserialization, we're dealing with data passed in by the client, so I think it's far more important to be robust in the case of malformed input data. We probably don't need to worry about malicious clients, since the client is in process so if they wanted to be malicious they could just walk all over memory. But it will make life a lot easier for library clients (and for us when we're debugging) if we can avoid having Mesa die in a fire when it receives a corrupted shader binary.


Yes, I agree, it makes sense to have some more error checking in deserialization, it has to be basic and quick though.

Having said that, I still don't think integer return values are the way to go, for the following reasons:
- it's confusing for 0 to mean success.
- we're going to forget to check return values.
- it complicates the signatures of a lot of functions that should be able to just return the thing that they read.

What I'd recommend doing instead is adding a "failed" boolean to memory_map. That way rather than having to check every single return value for failure, we just have to check the "failed" boolean at appropriate critical times (such as inside loops and at the end of functions like ir_read_variable()), and return early in the event of failure. An additional advantage of this is that it makes it easy to add bounds checking to memory_map--the read() and ffwd() functions can simply check if the position goes out of bounds, and if so, set "failed" flag to 0 and read zeros.

ok, I'll take a look how this would work out. The current checks have actually helped as they catch not just invalid/corrupt data from client but possible errors in serialization too, even though they are just simple value checks but there's quite a lot of them and they help in stopping the parsing immediately. If allowed to go further after error, it'll be harder to track when did it start to fail so I think there will be still a lot of checks.

    +ir_deserializer::read_header(struct gl_shader *shader, memory_map
    &map,
    +   const char *mesa_sha)


Rather than pass the memory_map to all of the ir_deserializer functions, I'd prefer to see it just be a member of the ir_deserializer class. It's a closer parallel to what you do with memory_writer in ir_serializer, and it makes the code easier to read by making all of the function signatures and invocations smaller.

Yep, I think it can be changed, the actual memory_map will come from caller of this class. I think read_glsl_type() is the only method which will require support to pass a memory_map, this is used when reading uniform_storage when deserializing a whole pgoram.

As with the memory_writer::write*() functions, I think the memory_map::read() functions should include the type in their name, and should use return by value rather than a pointer argument. E.g.:

int32_t read_int32();
bool read_bool();

and so on.

(Note: I think it's still ok to have a read() function that operates on a pointer for reading larger structs).

Yes, will change these.


Although technically the C standard allows locals whose name begin with an underscore (provided that what follows is a lower case letter), it seems strange--names beginning with underscores typically refer to globals that belong to libraries.

Also, it looks like all the callers of read_glsl_type pass this->state for this parameter, so I don't understand why the parameter is needed at all.

Yes, it is possible that in some of calls it might not be explicitly needed parameter, read_glsl_type must support also situation where we don't have state, this is used when reading uniform_storage.

    +{
    +   uint32_t type_size;
    +
    +   char *name = map.read_string();
    +   map.read(&type_size);
    +
    +   const glsl_type *type_exists = NULL;


"type_exists" sounds like the name of a boolean. Can we rename this to "existing_type"?

ok

    +
    +   if (_state && _state->symbols)
    +      type_exists = _state->symbols->get_type(name);


I don't see any circumstances where _state or _state->symbols would be NULL. Am I missing something?

state can be null, state->symbols is a paranoid check and can be removed

    +
    +   /* if type exists, move read pointer forward and return type */
    +   if (type_exists) {
    +      map.ffwd(type_size);
    +      return type_exists;
    +   }


A single shader is allowed to have multiple distinct interface types with the same name (e.g. an input and an output with the block name, but different interface members), so this won't work. It will cause the distinct interface types to alias to each other.

This will need some more thought then, I'll check if there's some tests that would utilize this.


I would have thought that the logic above for looking up existing types would take care of samplers, since they are built-in types (and hence populated in the symbol table before deserialization). Am I missing something here?


Yes, samplers could be removed. I wanted this function to be able to construct whatever type but you are right that samplers are all built-ins.

If we do need to keep this switch statement, I recommend adding some error handling in the default case to make sure the "failed" flag is set and then return glsl_type::error_type. That will reduce the risk of Mesa crashing in the event of a corrupted shader binary.

I will remove samplers from here

    +  glsl_struct_field *fields = ralloc_array(mem_ctx,
    +         glsl_struct_field, length);


We should check for a memory allocation failure here and do appropriate error handling; that way a corrupted binary that has an outrageously large length stored in it won't lead to a crash.

ok, will fix

    +ir_deserializer::read_ir_variable(struct exec_list *list,
    memory_map &map)


General comment on code organization: I'd prefer to see deserialization functions like these written as constructors (i.e. this function would be an ir_variable constructor, which takes an ir_deserializer as a single parameter). The advantages of this approach are (a) the deserialization code is closer to the other ir_variable-related code, so there's less danger of us forgetting to update it when we change the structure of ir_variable, and (b) if we change ir_variable and forget to update the deserialization code, static analysis tools like coverity will alert us to the bug.

Yep, I'll refactor and go this way now.

uint8_t read_only;
map.read(&read_only);
var->read_only = read_only;

we simply would do:

var->read_only = map.read_uint8();

Of course, my idea from the last patch of serializing a "bits" data structure would make this example moot.

I will do the separate 'bits' structure, it'll make code much more easier to maintain.

+

    + list->push_tail(var);
    +
    +   return 0;
    +}


Rather than having this function read the variable and add it to a list, I'd prefer for it to read the variable and return a pointer to it. Then the caller can decide whether to add it to a list or do something else to it. The key advantage IMHO is more flexibility for the future. For example, we have talked about the idea of changing the storage of function parameter ir_variables so that they use an array of ir_variable pointers rather than an exec_list. As the deserialization code is written right now we have extra obstacles to making that future change.

A similar comment applies to most of the other read...() functions below.

Sure, can be done. Currently it's been all the time about building the lists so I wanted to have it in functions instead of caller.

    +   uint32_t sig_amount;


As in the previous patch, I'd recommend renaming "sig_amount" to "num_signatures" and "par_count" to "param_count".

ok


    +      if (ir_type != ir_type_function_signature) {
    +         CACHE_DEBUG("cache format error with function %s\n", name);
    +         return -1;
    +      }


It seems silly to store ir_type in the instruction stream and then generate an error if it doesn't match. If we want to be robust in the face of a corrupted shader binary, we should be checking for things that don't make sense (like is_builtin being a number other than 0 or 1) rather than storing redundant information like ir_type in the instruction stream and then checking that it equals the only value that it could possibly be.

agreed, this check can be removed

+

    +      /* insert function parameter variables to prototypes list
    ... */
    +      foreach_list_const(node, &sig->parameters) {
    +         ir_variable *var = ((ir_instruction *) node)->as_variable();
    +         if (var)
    +  prototypes->push_tail(var->clone(mem_ctx, NULL));
    +      }


Can you explain what's going on with the variables in the prototypes list? I don't understand what purpose this serves.

There can be further references to these variables so this list will be then used to search for them.

    +read_errors:
    +   CACHE_DEBUG("%s: read errors with [%s]\n", __func__, name);
    +   if (sig)
    +      ralloc_free(sig);


Part of the point of ralloc is that we shouldn't have to do this kind of clean up. Instead, in the event of failure, we should just free the toplevel memory context, and all the memory we allocated during the failed deserialization attempt will be automatically reclaimed.

If we let ralloc take care of reclaiming the memory at the end of deserialization, then we can replace a lot of these goto labels for error conditions with early returns, which I think will make the code easier to read.

true, I will go through these cases

    +   return NULL;


If we adopt the idea of the "failed" boolean that I talked about above, then that gives us the opportunity to return a dummy ir_dereference_array rather than NULL. The advantage of doing that is that there's less risk of the caller segfaulting before it figures out that we are in an error condition.

Of course, if we adopt my suggestion of turning these reader functions into IR constructors, that will happen automatically.

Yes, I'm going for the constructor approach.

    +
    +read_return:
    +   if (list)
    +      list->push_tail(con);


This is a great example of why it should be the caller's responsibility to add the deserialized IR to the appropriate list. That way callers that need to put the IR in a list can do so, and callers that don't need to put it in a list won't.

will be changed


    +
    +static uint8_t
    +read_bool(memory_map &map)
    +{
    +   uint8_t value = 0;
    +   map.read(&value);
    +   return value;
    +}


As in the previous patch, this should just be a function in memory_map.

will fix


Thanks for the review;

// Tapani

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Reply via email to