Re: Ltree syntax improvement

Dmitry Belyavsky Sun, 24 Feb 2019 03:32:48 -0800

Dear Nikolay,

On Wed, Feb 20, 2019 at 12:28 PM Nikolay Shaplov <dh...@nataraj.su> wrote:


> В письме от вторник, 29 января 2019 г. 20:43:07 MSK пользователь Dmitry
> Belyavsky написал:
> > Dear all,
> >
> > Please find attached the patch extending the sets of symbols allowed in
> > ltree labels. The patch introduces 2 variants of escaping symbols, via
> > backslashing separate symbols and via quoting the labels in whole for
> > ltree, lquery and ltxtquery datatypes.
>
> Sorry I still did not do full revision, I've stumbled up on other things
> in
> the code, which, as I think should be fixed before final checking through.
> (I
> guess that code do what is expected, since it passes tests and so on, it
> needs
> recheck, but I'd like to do this full recheck only once)
>
> So my doubts about this code is that a lot parts of code is just a
> copy-paste
> of the same code that was written right above. And it can be rewritten in
> a
> way that the same lines (or parts of lines) is repeated only once...
> This should make code more readable, and make code more similar to the
> code of
> postgres core, I've seen.
>
> Here I will speak about lquery_in function from ltree_io.c.
>
> 1. May be it is better here to switch from else if (state ==
> LQPRS_[something]) to switch (stat) { case LQPRS_[something]: }
> because because here the main case is to determine what is the current
> state
> of your state machine, do something with it, and then go to the next step.
>
> Now after you've done what should be done in this step, you should make
> sure
> that no other code is executed writing more ifs... If you use switch/case
> you
> will be able to say break wherever you like, it will save you some ifs.
>

I thought about it writing the code. But it significantly increased the
size of the patch
so I decided to follow the original coding style here.


>
> Theoretical example
>
> (cl is for character length fc is for fist char)
>
> if (cl==1 && fc =='1') {do_case1();}
> else if (cl==1 && fc =='2') {do_case2();}
> else if (cl==1 && fc =='3') {do_case3();}
> else {do_otherwise();}
>
> If it is wrapped in switch/case it will be like
>
> if (cl==1)
> {
>   if (fc =='1') {do_case1(); break;}
>   if (fc =='2') {do_case2(); break;}
>   if (fc =='3') {do_case3(); break;}
> }
> /*if nothing is found */
> do_otherwise();
>
> This for example will allow you to get rid of multiply charlen == 1 checks
> in
>
>         else if ((lptr->flag & LVAR_QUOTEDPART) == 0)
>
>
>             {
>
>
>                 if (charlen == 1 && t_iseq(ptr, '@'))
>
>
>                 {
>
>
>                     if (lptr->start == ptr)
>
>
>                         UNCHAR;
>
>
>                     lptr->flag |= LVAR_INCASE;
>
>
>                     curqlevel->flag |= LVAR_INCASE;
>
>
>                 }
>
>
>                 else if (charlen == 1 && t_iseq(ptr, '*'))
>
>
>                 {
>
>
>                     if (lptr->start == ptr)
>
>
>                         UNCHAR;
>
>
>                     lptr->flag |= LVAR_ANYEND;
>
>
>                     curqlevel->flag |= LVAR_ANYEND;
>
>
>                 }
>
>
>                 else if (charlen == 1 && t_iseq(ptr, '%'))
>
>
>                 {
>
>
>                     if (lptr->start == ptr)
>
>
>                         UNCHAR;
>
>
>                     lptr->flag |= LVAR_SUBLEXEME;
>
>
>                     curqlevel->flag |= LVAR_SUBLEXEME;
>
>
>                 }
>
>
>                 else if (charlen == 1 && t_iseq(ptr, '|'))
>
>
>                 {
>
>
>                     real_nodeitem_len(lptr, ptr, escaped_count,
> tail_space_bytes, tail_space_symbols);
>
>
>                     if (state == LQPRS_WAITDELIMSTRICT)
>
>
>                         adjust_quoted_nodeitem(lptr);
>
>
>
>                     check_level_length(lptr, pos);
>
>
>                     state = LQPRS_WAITVAR;
>
>
>                 }
>
>
>                 else if (charlen == 1 && t_iseq(ptr, '.'))
>
>
>                 {
>
>
>                     real_nodeitem_len(lptr, ptr, escaped_count,
> tail_space_bytes, tail_space_symbols);
>
>
>                     if (state == LQPRS_WAITDELIMSTRICT)
>
>
>                         adjust_quoted_nodeitem(lptr);
>
>
>                     check_level_length(lptr, pos);
>
>
>
>                     state = LQPRS_WAITLEVEL;
>
>
>                     curqlevel = NEXTLEV(curqlevel);
>
>
>                 }
>
>
>                 else if (charlen == 1 && t_iseq(ptr, '\\'))
>
>
>                 {
>
>
>                     if (state == LQPRS_WAITDELIMSTRICT)
>
>
>                         UNCHAR;
>
>
>                     state = LQPRS_WAITESCAPED;
>
>
>                 }
>
>
>                 else
>
>
>                 {
>
>
>                     if (charlen == 1 && strchr("!{}", *ptr))
>
>
>                         UNCHAR;
>
>
>                     if (state == LQPRS_WAITDELIMSTRICT)
>
>
>                     {
>
>
>                         if (t_isspace(ptr))
>
>
>                         {
>
>
>                             ptr += charlen;
>
>
>                             pos++;
>
>
>                             tail_space_bytes += charlen;
>
>
>                             tail_space_symbols = 1;
>
>
>                             continue;
>
>
>                         }
>
>
>
>                         UNCHAR;
>
>
>                     }
>
>
>                     if (lptr->flag & ~LVAR_QUOTEDPART)
>
>
>                         UNCHAR;
>
>
>                 }
>
>
>             }
>

Yes, this piece of code could be converted to the form you suggest, but
only partially, because the final else includes a strchr() call that,
being rewritten to the 'case' syntax will look ugly enough.


>
> There are a lot of them, I would suggest to reduce there number to one
> check
> in the right place.
>
> Similar thing is about "GETVAR(curqlevel) = lptr = (nodeitem *)
> palloc0(sizeof(nodeitem) * numOR);" in this part of code.
>
>             if (charlen == 1)
>
>
>             {
>
>
>                 if (t_iseq(ptr, '!'))
>
>
>                 {
>
>
>                     GETVAR(curqlevel) = lptr = (nodeitem *)
> palloc0(sizeof(nodeitem) * numOR);
>
>                     lptr->start = ptr + 1;
>
>
>                     state = LQPRS_WAITDELIM;
>
>
>                     curqlevel->numvar = 1;
>
>
>                     curqlevel->flag |= LQL_NOT;
>
>
>                     hasnot = true;
>
>
>                 }
>
>
>                 else if (t_iseq(ptr, '*'))
>
>
>                     state = LQPRS_WAITOPEN;
>
>
>                 else if (t_iseq(ptr, '\\'))
>
>
>                 {
>
>
>                     GETVAR(curqlevel) = lptr = (nodeitem *)
> palloc0(sizeof(nodeitem) * numOR);
>
>                     lptr->start = ptr;
>
>
>                     curqlevel->numvar = 1;
>
>
>                     state = LQPRS_WAITESCAPED;
>
>
>                 }
>
>
>                 else if (strchr(".|@%{}", *ptr))
>
>
>                 {
>
>
>                     UNCHAR;
>
>
>                 }
>
>
>                 else
>
>
>                 {
>
>
>                     GETVAR(curqlevel) = lptr = (nodeitem *)
> palloc0(sizeof(nodeitem) * numOR);
>
>                     lptr->start = ptr;
>
>
>                     state = LQPRS_WAITDELIM;
>
>
>                     curqlevel->numvar = 1;
>
>
>                     if (t_iseq(ptr, '"'))
>
>
>                     {
>
>
>                         lptr->flag |= LVAR_QUOTEDPART;
>
>
>                     }
>
>
>                 }
>
>
>             }
>             else
>
>
>             {
>
>
>                 GETVAR(curqlevel) = lptr = (nodeitem *)
> palloc0(sizeof(nodeitem) * numOR);
>
>                 lptr->start = ptr;
>
>
>                 state = LQPRS_WAITDELIM;
>
>
>                 curqlevel->numvar = 1;
>
>
>             }
>
> It is repeated almost is every branch of this if-subtree... Can it be
> rewritten the way it is repeated only once?
>

Well, I see the only one so long sequence of 'if'.

>
> And so on.
>
> So my request is to reduce multiply repetition of the same code...
>
> PS. One other thing that I've been thinking over but did not came to final
> conclusion...
>
> It is often quite often case
>
> if (t_iseq(ptr, 'a'){same_code(); code_for_a();}
> else if (t_iseq(ptr, 'b'){same_code(); code_for_b();}
> else if (t_iseq(ptr, 'c'){same_code(); code_for_c);}
> else something_else();
>
> I've been thinking if it is possible to move this to
>
> const unsigned char c=TOUCHAR(ptr);
> switch(c)
> {
>   case 'a':
>   case 'b':
>   case 'c':
>     same_code();
>     if (c=='a') {code_for_a();}
>     if (c=='b') {code_for_b();}
>     if (c=='c') {code_for_c();}
>     break;
>   default:
>    something_else();
> }
>
> I did not still get, how valid is replacement of t_iseq() with TOUCHAR() +
> "==". It is quite equivalent now, but I do not know how the core code is
> going
> to evolve, so can't get final understanding. I even tried to discuss it
> with
> Teodor (you've seen it) but still did come to any conclusion.
>

I agree with Teodor here. We do not know about various charsets so
the current syntax seems to be more safe.

>
> So for now status of this idea is "my considerations about ways to
> deduplicate
> the code".
>
> So this is all essential things I can say about this patch as it is now.
> May
> be somebody can add something more...
>
>

-- 
SY, Dmitry Belyavsky

Re: Ltree syntax improvement

Reply via email to