Dmitry Dolgov <9erthali...@gmail.com> writes: >> On 4 January 2018 at 03:05, Tom Lane <t...@sss.pgh.pa.us> wrote: >> I wonder what happened to the plan to separate assignment and fetch into two >> different node types. I can see that that didn't happen so far as primnodes.h >> is concerned, but you've made some changes that seem to assume it did happen.
> There was one version of this patch that followed this plan. It turns out that > it's quite unnatural approach (at least within the current implementation), > because I had to duplicate or reuse a lot of code for those two node types. I'm not entirely convinced by that argument. Sure, there would be a lot of duplicative code in the node support functions, but that's inherent in our approach to those functions. I'm more concerned about readability, especially given that exposing this feature to extensions is going to set all these decisions in concrete forever. > Here is a new rebased version of the patch > with incorporated improvements that you've mentioned. I spent a couple of hours looking through this. I'm still pretty unhappy with the state of the parser APIs. In the first place, you haven't documented what those APIs are in any meaningful way. I do not think it's adequate to tell people to go read array_subscript_parse as the first and only way to understand what a subscript parse function must do. We do often expect extension authors to pick up small details that way, but there should be a textual API spec of some sort --- for example, look at the index AM API specs in indexam.sgml, which give pretty clear high-level definitions of what each AM API function has to do. Part of the reason why I'm insistent on that is that I think it will expose that the division of labor between the core parser and the datatype-specific parse function is still a mess. One particular sore spot is the question of who decides what the return data type of a subscripting function is. Right now you seem to be making that decision in the core parser, at least for the assignment case, which is pretty bad from an extensibility standpoint and also leads to all of this messiness: * You changed transformArrayType so that it doesn't throw an error if the given type isn't an array --- without, I note, updating either the function header comment or the internal comment that you falsified. * That in turn means that where transformAssignmentSubscripts thinks it's determining the result type, it may or may not be producing InvalidOid instead (again, with no comment to warn the reader). * And then you had to kludge transformAssignmentIndirection horribly (and I'm not sure you kludged it enough, either, because there are still a bunch of places where it uses targetTypeId without any concern for the possibility that that's zero). It doesn't seem to me to be acceptable to just ignore coercion failure, as that code does now. If it's no longer the responsibility of this function to guarantee that the result is of the right type, why is it attempting coercion at all? In any case you failed to update its header comment to explain what it's doing differently than before. In short the division of labor in this area still needs to be thought about. I don't think we really want transformAssignmentSubscripts determining the required input data type at all; that should be farmed out to the datatype-specific code. I'm also pretty unconvinced about refnestedfunc --- why do we need that? I also notice that you still haven't done anything about the problem of the subscripting operation possibly yielding a different typmod or collation than the container type has. It was okay to let that go for awhile, but we're not shipping it like this, because it's going to be awfully hard to change that struct type once extensions are building them. While I'm on the topic, I am not really happy with s/array/container/ as you've done in some of this code. To my mind, "container type" includes composite types. Particularly in the parse_target code, where we're currently dealing with either composites or arrays, making it say that we're dealing with either composites or containers is just a recipe for confusion. Unfortunately I can't think of a better word offhand, but some attention to this is needed. As far as the comments go, we might be able to use the term "subscriptable type", but not sure if that will work for function/variable names. On the executor side of things, I suspect Andres will be unhappy that you are making ExprEvalStep part of the API for datatypes --- he objected to my exposing it to plpgsql param eval in https://postgr.es/m/20171220174243.n4y3hgzf7xd3m...@alap3.anarazel.de and there was a lot more reason to do so there than there is here, IMO. It looks like what you actually need is just the SubscriptingRefState and an isnull flag, so it might be better to specify that the fetch and assign functions have signatures like Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull) (representing both of the last two arguments as INTERNAL at SQL level). Now on the other hand, maybe the right way to go is to embrace a similar approach to what I did for plpgsql param eval, and let the datatype control what gets generated as the expression execution step. The main point here would be to let the datatype provide the address of a callback function that gets executed for a subscripting step, rather than having it specify the OID of a pg_proc entry to call. There would be two big wins from that: * The callback function would have a plain C call signature, so we would not have to go through FunctionCallN, saving a few cycles. This is attractive because it would pretty much eliminate any concern about this patch making array access slower at execution time. * There would no longer be a wired-in restriction that there be two and only two subscripting execution functions per datatype, since there would not be any need for those functions to be identified in pg_type. Basically, with this approach, a subscriptable data type would need to provide two cataloged support functions: parse, as we have now, and compile. Actual execution functions would be outside that. (Possibly we could merge the support functions into one function that takes an operation code, similar to one of your earlier designs. Not sure that that's better, but it'd be easier to extend in future if we decide we need three support operations...) The two disadvantages I can see of approaching things this way are: * There'd be at least some connection of subscriptable types to expression compilation, which is what Andres was objecting to in the message I cited above. Possibly we could alleviate that a bit by providing helper functions that mask exactly what goes into the expression step structs, but I'm not sure that that gets us far. * We'd not have OIDs of execution functions in the parse trees for subscripting operations, which would mean that we would not have a convenient way to identify subscripting operations that are mutable, parallel-unsafe, or leaky. Probably it'd be fine to assume that subscripting is always immutable and parallel-safe, although I'm slightly more concerned about whether people would want the option to label it leaky vs leakproof. As against that, the approach that's there right now adds planning overhead that wasn't there before for exactly those function property lookups, and again I'm a bit worried about the performance impact. (I did some crude performance tests today that indicated that the existing patch has small but measurable penalties, maybe on the order of 10 percent; and it'd be more by the time we're done because I'm pretty sure you've missed some places that ought to check these function properties if we're going to have them. So I'm afraid that we'll get pushback from people who don't care about extensible subscripts and do care about array performance.) So roughly speaking, I'm imagining that we'd go back to a design similar to one I recall you had at one point, where there's a single SQL-visible subscripting support function per datatype, with a signature like subscript_support(int opcode, internal other_info) returns internal but the opcodes would now be "parse" and "compile". Actual execution would use callback functions that don't have to be SQL-visible. This is closer to the approach we've been using of late for things like AM APIs: to the extent possible, there's just one SQL-registered handler function and all else is a callback. Actually, we could make it *exactly* like that, and have the registered handler give back a struct full of function pointers rather than doing anything much itself. Maybe that's an even better way to go. regards, tom lane