Re: Thoughts on new metastore APIs

Thejas Nair Thu, 06 Mar 2014 23:40:25 -0800

Thanks for the detailed explanation.
If we follow a consistent pattern of creating a new Request/Response
object for each method, I think the readability is not hurt by this
approach.


I think the best practice can be clarified as "practice of a single
request and response struct, specific for each method."

On Thu, Mar 6, 2014 at 10:37 AM, Brock Noland <br...@cloudera.com> wrote:
> On Thu, Mar 6, 2014 at 12:13 PM, Thejas Nair <the...@hortonworks.com> wrote:
>> Thanks for discussing this, Brock. I agree that is important to
>> consider while writing a new metastore api call.
>> But I think this (single input/output struct)  should be a guideline,
>> I am not sure if this should be used in every case.
>
> As with any rule, there are always exceptions. However, looking at the
> new API's I don't see an instance where it would would have been
> harmful to use this model. Exceptions should be extremely rare since
> thousands of RPC implementations successfully use the request/response
> model. However, as always, it would be up to the developers working on
> the change to make this call. They'd do so knowing they are going
> against the guideline and could be asked to justify why they are doing
> so.
>
>> What you are saying shows that there is a tradeoff between ending up
>> with more functions vs ending up with more input/output
>> structs/classes. I am not sure if having more input/output structs is
>> any better.
>>
>> Take the case of create_table/create_table_with_environment_context
>> that you mentioned. Even though create_table had a single input
>> argument Table, instead of adding  EnvironmentContext contents to
>> Table struct, the authors decided to create a new function with
>> additional  EnvironmentContext argument. This makes sense because the
>> Table struct is used by other functions as well such as get_table, and
>> EnvironmentContext fields don't make sense for those cases as it is
>> not persisted as part of table.
>>
>> Which means that the only way to prevent creation of
>> create_table_with_environment_context method would have been to have a
>> CreateTableArg struct as input argument instead of Table as the
>> argument. ie, creating a different struct for the single input/output
>> every function is only way you can be sure that you don't need more
>> functions.
>
> RPC methods are "special". They are published to the world and
> therefore cannot be easily modified or refactored. Once we create a
> new RPC method, we are stuck with it for a very long time. In this
> way, Thrift is rather strange in that it allows you to exposed the api
> signatures. The request/response model is far more common.
>
> Therefore, if we were creating create_table from scratch, I would
> suggest we use:
>
> CreateTableRequest/CreateTableResponse
>
> That way, we can add optional arguments such as "environment context"
> very easily. Although, I'd love to take credit for this idea, I didn't
> just come up with this myself. This is a standard way of handling RPC.
>
>> This approach of reducing the number of functions also means that you
>> would start encoding different types of actions within the single
>> input argument.
>>
>> Consider the case of get_partition vs get_partition_by_name. It would
>> need a single struct with an enum that tells if it to lookup based on
>> the partition key-values or the name, and based on the enum it would
>> use different fields in the struct. I feel having different functions
>> is more readable for this case.
>
> There will be cases where we'll need similar methods. This point where
> is with the request/response model, adding a single parameter doesn't
> require an entirely new method. The developers working on the change
> would have to make the call as to whether a new functionality requires
> a new API or can be handled within the current API.
>
>> For example, the api in HIVE-5931 would need to change from
>> ====================================
>> list<RolePrincipalGrant> get_principals_in_role(1:string role_name)
>> ====================================
>> to
>>
>> ====================================
>> struct GetPrincipalInRoleOutput{
>>  1:list<RolePrincipalGrant> rolePrincList;
>> }
>>
>> struct GetPrincipalInRoleInput{
>> 1:string role_name;
>> }
>> GetPrincipalInRoleOutput get_principals_in_role(1:
>> GetPrincipalInRoleInput input);
>> ====================================
>>
>> I am not sure if the insurance costs in terms of readability is low
>> here. I think we should consider the risk in each case of function
>> proliferation and pay the costs accordingly.
>> Let me know if I have misunderstood what you are proposing here.
>
> The "Output" and "Input" names do feel odd. As I said above.
> Request/Response are standard names for these kinds of objects. Today,
> for a method like the one above, it may feel like overheard or extra
> work. However, in the future when you want to add another parameter
> such as "isFilter" or "encoding" etc, then the insurance pays off big
> time.
>
> Brock

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Thoughts on new metastore APIs

Reply via email to