Responding on top

Jacque's method only gets us a list, not an array, so one ends up having to 
write more code to parse the list anyway. Your method is more efficient.

"not comfortable with RegEx" Ha, right. But it is worth the effort to keep the 
little grey cells green! I will have to study the regEx; things like (?ms)
are "brand new" to me.


re: extracting the head first: I was under the impression your repeat loop 
would have to work through the entire text of _HTML unnecessarily and that 
extracting the heads would reduce processing time. OTOH, Andre tells me that 
for this kind of operation, even cell phones have CPUs that are more powerful 
than some desktop machines and so perhaps the time to loop through the entire 
html source is too trivial to consider at all.

Thanks for the effort you put into this. We are adding OG tags to all the media 
on our web site (eventually) and our apps will need to parse that out in 
various contexts.

BR



 

On 8/1/17, 10:07 PM, "use-livecode on behalf of Thierry Douez via use-livecode" 
<use-livecode-boun...@lists.runrev.com on behalf of 
use-livecode@lists.runrev.com> wrote:

    2017-08-02 6:45 GMT+02:00 Sannyasin Brahmanathaswami:
    
    
    Hi Brahmanathaswami,
    
    
    > Thanks Thierry
    >
    > though I'm not yet sure whether using regEx is better than using Jacque's
    > method
    >
    
    Those are 2 different ways,
    but with the regex one, you have the exact key and value of each tag,
    with nothing more to do.
    
    
    > Either way it would seem prudent to extract the head first before processing
    >
    
    Mmm, I don't really see why, but I've added a line of code for this too
    below.
    
    
    > Using Jacque's method just gets the list,
    > and we need to do more coding to get the array we need.
    >
    > But your method can only handle 1 tag.
    >
    
    
    I was aware of that but didn't know what you wanted to achieve, so I
    left it for the reader.
    However, this has nothing to do with the regex; it comes from the code
    inside the repeat loop.
    
    
    Here is another way to do it, changing only *1* line of code inside the loop
    with the same regex as before:
    
    
    
      -- to honor BR's wish, though not necessary:
      -- erase everything after </head>
      -- ((?s) lets . match line breaks, so the greedy .* runs to the end of _Html)
       put replaceText( _Html, "(?s)</head>.*", empty) into _Html
    
       repeat while matchChunk( _Html, Rx, p1,p2,p3,p4 )
          put char p1 to p2 of _Html & tab & char p3 to p4 of _Html & cr after Rslt
          delete char 1 to p4 of _Html
       end repeat
       delete last char of Rslt -- extra cr
    
       put Rslt into fld 1
       answer "Got " & the number of lines of Rslt & " og: meta tags!"
    
    
    To build a multi-dimensional array after the extraction,
    a bit more work inside the repeat loop will be needed,
    but the extraction part is still valid.
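    
    A minimal sketch of that extra work, assuming Rslt holds the
    tab-separated lines produced above. The t-prefixed names are illustrative
    only, and numbering the entries under each key is just one possible way
    to keep tags that repeat (several og:image tags, for example):
    
       put empty into OG
       set the itemDelimiter to tab
       repeat for each line tLine in Rslt
          put item 1 of tLine into tKey
          -- next free index under this key (no elements yet counts as 0)
          put (the number of elements of OG[tKey]) + 1 into tCount
          -- item 2 to -1 keeps a value intact even if it contains a tab
          put item 2 to -1 of tLine into OG[tKey][tCount]
       end repeat
    
    After the loop, OG["image"][1] would hold the first og:image value.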
    
    Finally, if you are not at ease with regex, go with Jacque's way and
    everything will be fine.
    Fundamentally, there is not much difference between the 2 ways.
    
    
    Kind regards,
    
    Thierry
    
    
    
    
    
    
    > On 7/31/17, 12:31 AM, "use-livecode on behalf of Thierry Douez" wrote:
    >
    >     So, here is the code:
    >
    >        local Rx, Rslt, _Html, OG
    >
    >        put empty into Rslt
    >        put URL "https://www.youtube.com/user/kauaiaadheenam" into _Html
    >
    >        get "(?ms)<meta\s+property=\x{22}og:(.+?)\x{22}\s+content=\x{22}(.+?)\x{22}>"
    >        put IT into Rx
    >
    >        repeat while matchChunk( _Html, Rx,p1,p2,p3,p4 )
    >           put  char p3 to p4 of _Html  into OG[  char p1 to p2 of _Html ]
    >           delete char 1 to p4 of _Html
    >        end repeat
    >
    >
    >
    >     and you can test it this way:
    >
    >        combine OG using return and ":"
    >        put OG into fld 1
    >
    >
    >
    >     HTH and feel free to ask any question...
    >
    >     Kind regards,
    >
    >     Thierry
    >
    
    
    -- 
    ------------------------------------------------
    Thierry Douez - sunny-tdz.com
    sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage
    _______________________________________________
    use-livecode mailing list
    use-livecode@lists.runrev.com
    Please visit this url to subscribe, unsubscribe and manage your 
subscription preferences:
    http://lists.runrev.com/mailman/listinfo/use-livecode
