Dear Nikolay, On Wed, Feb 20, 2019 at 12:28 PM Nikolay Shaplov <dh...@nataraj.su> wrote:
> В письме от вторник, 29 января 2019 г. 20:43:07 MSK пользователь Dmitry > Belyavsky написал: > > Dear all, > > > > Please find attached the patch extending the sets of symbols allowed in > > ltree labels. The patch introduces 2 variants of escaping symbols, via > > backslashing separate symbols and via quoting the labels in whole for > > ltree, lquery and ltxtquery datatypes. > > Sorry I still did not do full revision, I've stumbled up on other things > in > the code, which, as I think should be fixed before final checking through. > (I > guess that code do what is expected, since it passes tests and so on, it > needs > recheck, but I'd like to do this full recheck only once) > > So my doubts about this code is that a lot parts of code is just a > copy-paste > of the same code that was written right above. And it can be rewritten in > a > way that the same lines (or parts of lines) is repeated only once... > This should make code more readable, and make code more similar to the > code of > postgres core, I've seen. > > Here I will speak about lquery_in function from ltree_io.c. > > 1. May be it is better here to switch from else if (state == > LQPRS_[something]) to switch (stat) { case LQPRS_[something]: } > because because here the main case is to determine what is the current > state > of your state machine, do something with it, and then go to the next step. > > Now after you've done what should be done in this step, you should make > sure > that no other code is executed writing more ifs... If you use switch/case > you > will be able to say break wherever you like, it will save you some ifs. > I thought about it writing the code. But it significantly increased the size of the patch so I decided to follow the original coding style here. > > Theoretical example > > (cl is for character length fc is for fist char) > > if (cl==1 && fc =='1') {do_case1();} > else if (cl==1 && fc =='2') {do_case2();} > else if (cl==1 && fc =='3') {do_case3();} > else {do_otherwise();} > > If it is wrapped in switch/case it will be like > > if (cl==1) > { > if (fc =='1') {do_case1(); break;} > if (fc =='2') {do_case2(); break;} > if (fc =='3') {do_case3(); break;} > } > /*if nothing is found */ > do_otherwise(); > > This for example will allow you to get rid of multiply charlen == 1 checks > in > > else if ((lptr->flag & LVAR_QUOTEDPART) == 0) > > > { > > > if (charlen == 1 && t_iseq(ptr, '@')) > > > { > > > if (lptr->start == ptr) > > > UNCHAR; > > > lptr->flag |= LVAR_INCASE; > > > curqlevel->flag |= LVAR_INCASE; > > > } > > > else if (charlen == 1 && t_iseq(ptr, '*')) > > > { > > > if (lptr->start == ptr) > > > UNCHAR; > > > lptr->flag |= LVAR_ANYEND; > > > curqlevel->flag |= LVAR_ANYEND; > > > } > > > else if (charlen == 1 && t_iseq(ptr, '%')) > > > { > > > if (lptr->start == ptr) > > > UNCHAR; > > > lptr->flag |= LVAR_SUBLEXEME; > > > curqlevel->flag |= LVAR_SUBLEXEME; > > > } > > > else if (charlen == 1 && t_iseq(ptr, '|')) > > > { > > > real_nodeitem_len(lptr, ptr, escaped_count, > tail_space_bytes, tail_space_symbols); > > > if (state == LQPRS_WAITDELIMSTRICT) > > > adjust_quoted_nodeitem(lptr); > > > > check_level_length(lptr, pos); > > > state = LQPRS_WAITVAR; > > > } > > > else if (charlen == 1 && t_iseq(ptr, '.')) > > > { > > > real_nodeitem_len(lptr, ptr, escaped_count, > tail_space_bytes, tail_space_symbols); > > > if (state == LQPRS_WAITDELIMSTRICT) > > > adjust_quoted_nodeitem(lptr); > > > check_level_length(lptr, pos); > > > > state = LQPRS_WAITLEVEL; > > > curqlevel = NEXTLEV(curqlevel); > > > } > > > else if (charlen == 1 && t_iseq(ptr, '\\')) > > > { > > > if (state == LQPRS_WAITDELIMSTRICT) > > > UNCHAR; > > > state = LQPRS_WAITESCAPED; > > > } > > > else > > > { > > > if (charlen == 1 && strchr("!{}", *ptr)) > > > UNCHAR; > > > if (state == LQPRS_WAITDELIMSTRICT) > > > { > > > if (t_isspace(ptr)) > > > { > > > ptr += charlen; > > > pos++; > > > tail_space_bytes += charlen; > > > tail_space_symbols = 1; > > > continue; > > > } > > > > UNCHAR; > > > } > > > if (lptr->flag & ~LVAR_QUOTEDPART) > > > UNCHAR; > > > } > > > } > Yes, this piece of code could be converted to the form you suggest, but only partially, because the final else includes a strchr() call that, being rewritten to the 'case' syntax will look ugly enough. > > There are a lot of them, I would suggest to reduce there number to one > check > in the right place. > > Similar thing is about "GETVAR(curqlevel) = lptr = (nodeitem *) > palloc0(sizeof(nodeitem) * numOR);" in this part of code. > > if (charlen == 1) > > > { > > > if (t_iseq(ptr, '!')) > > > { > > > GETVAR(curqlevel) = lptr = (nodeitem *) > palloc0(sizeof(nodeitem) * numOR); > > lptr->start = ptr + 1; > > > state = LQPRS_WAITDELIM; > > > curqlevel->numvar = 1; > > > curqlevel->flag |= LQL_NOT; > > > hasnot = true; > > > } > > > else if (t_iseq(ptr, '*')) > > > state = LQPRS_WAITOPEN; > > > else if (t_iseq(ptr, '\\')) > > > { > > > GETVAR(curqlevel) = lptr = (nodeitem *) > palloc0(sizeof(nodeitem) * numOR); > > lptr->start = ptr; > > > curqlevel->numvar = 1; > > > state = LQPRS_WAITESCAPED; > > > } > > > else if (strchr(".|@%{}", *ptr)) > > > { > > > UNCHAR; > > > } > > > else > > > { > > > GETVAR(curqlevel) = lptr = (nodeitem *) > palloc0(sizeof(nodeitem) * numOR); > > lptr->start = ptr; > > > state = LQPRS_WAITDELIM; > > > curqlevel->numvar = 1; > > > if (t_iseq(ptr, '"')) > > > { > > > lptr->flag |= LVAR_QUOTEDPART; > > > } > > > } > > > } > else > > > { > > > GETVAR(curqlevel) = lptr = (nodeitem *) > palloc0(sizeof(nodeitem) * numOR); > > lptr->start = ptr; > > > state = LQPRS_WAITDELIM; > > > curqlevel->numvar = 1; > > > } > > It is repeated almost is every branch of this if-subtree... Can it be > rewritten the way it is repeated only once? > Well, I see the only one so long sequence of 'if'. > > And so on. > > So my request is to reduce multiply repetition of the same code... > > PS. One other thing that I've been thinking over but did not came to final > conclusion... > > It is often quite often case > > if (t_iseq(ptr, 'a'){same_code(); code_for_a();} > else if (t_iseq(ptr, 'b'){same_code(); code_for_b();} > else if (t_iseq(ptr, 'c'){same_code(); code_for_c);} > else something_else(); > > I've been thinking if it is possible to move this to > > const unsigned char c=TOUCHAR(ptr); > switch(c) > { > case 'a': > case 'b': > case 'c': > same_code(); > if (c=='a') {code_for_a();} > if (c=='b') {code_for_b();} > if (c=='c') {code_for_c();} > break; > default: > something_else(); > } > > I did not still get, how valid is replacement of t_iseq() with TOUCHAR() + > "==". It is quite equivalent now, but I do not know how the core code is > going > to evolve, so can't get final understanding. I even tried to discuss it > with > Teodor (you've seen it) but still did come to any conclusion. > I agree with Teodor here. We do not know about various charsets so the current syntax seems to be more safe. > > So for now status of this idea is "my considerations about ways to > deduplicate > the code". > > So this is all essential things I can say about this patch as it is now. > May > be somebody can add something more... > > -- SY, Dmitry Belyavsky