2017-08-02 17:54 GMT+02:00 Sannyasin Brahmanathaswami via use-livecode <use-livecode@lists.runrev.com>:
> Responding on top
>
> Jacque's method only gets us a list, not an array, so one ends up having
> to write more code to parse the list anyway; your method is more efficient.
>
> "not comfortable with RegEx" Ha, right, but it is worth the effort to keep
> the little grey cells green! I will have to study the regEx… things like ?ms
> are "brand new" to me.

So, you win your first Regex training :)

(?ms) are regex options:
m means multi-line: ^ and $ also match at the start and end of each line.
s means the dot ( '.' ) can also match a return/cr/lf char.

> re: extracting the head first: I was under the impression your repeat loop
> would have to work through the entire text of _HTML unnecessarily and that
> extracting the heads would reduce processing time.

Well, you are right, but only once the regex tries to match after the last
valid pattern.
What is most costly is the delete inside the loop, so working only with the
<head>...</head> of your html might be more efficient in this case.
But this is more a LC thing.

> OTOH, Andre tells me that for this kind of operation, even cell phones
> have CPUs that are more powerful than some desktop machines and so perhaps
> the time to loop through the entire html source is too trivial to consider
> at all.

Yep, as I said, only after the last match will the regex scan through to the
end of the html, and only once.

As for quality concerns, restricting the regex to the <head> part is a good
idea, as you never know what the html might contain in the future...

> Thanks for the effort you put into this.

You're welcome.

Kind regards,

Thierry

> We are adding OG tags to all the media on our web site (eventually) and our
> apps will need to parse that out in various contexts.
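As a small illustration of the two regex options, here is a minimal sketch that could be pasted into a button script. The handler, the sample text and the variable names (tText, tReport) are made up for this example; only matchText and the (?m) / (?s) options themselves come from LiveCode and PCRE.

```livecode
on mouseUp
   local tText, tReport

   -- two lines of sample text, separated by a return
   put "first" & return & "second" into tText

   -- s option: controls whether '.' is allowed to match the line break
   put "dot, no option: " & matchText( tText, "first.second" ) & return after tReport
   put "dot, with (?s): " & matchText( tText, "(?s)first.second" ) & return after tReport

   -- m option: controls whether ^ and $ also match at each line start/end
   put "^second$, no option: " & matchText( tText, "^second$" ) & return after tReport
   put "^second$, with (?m): " & matchText( tText, "(?m)^second$" ) after tReport

   answer tReport
end mouseUp
```

Note that the og: regex used in this thread contains no ^ or $, so the m option has no effect there; it is the s option that lets a value span more than one line.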
> BR
>
> On 8/1/17, 10:07 PM, "use-livecode on behalf of Thierry Douez via
> use-livecode" <use-livecode-boun...@lists.runrev.com on behalf of
> use-livecode@lists.runrev.com> wrote:
>
> 2017-08-02 6:45 GMT+02:00 Sannyasin Brahmanathaswami:
>
> Hi Brahmanathaswami,
>
> > Thanks Thierry
> >
> > though I'm not yet sure when using regEx is better than using
> > Jacque's method
>
> Those are 2 different ways,
> but with the regex one you have the exact key and value of each tag,
> nothing more to do.
>
> > Either way it would seem prudent to extract the head first before
> > processing
>
> Mmm, I don't really see why, but I've added a line of code for this too,
> below.
>
> > Using Jacque's method just gets the list,
> > and we need to do more coding to get the array we need.
> >
> > But your method can only handle 1 tag.
>
> I was aware of that but didn't know what you wanted to achieve,
> therefore I left it for the reader.
> However, this has nothing to do with the regex but with the code inside
> the repeat loop.
>
> Here is another way to do it, changing only *1* line of code inside
> the loop, with the same regex as before:
>
>    -- to please BR's wishes, but not necessary
>    -- erase everything after </head>
>    -- (greedy .* with the s option runs to the end of the text)
>    put replaceText( _Html, "(?s)</head>.*", empty) into _Html
>
>    repeat while matchChunk( _Html, Rx, p1,p2,p3,p4 )
>       put char p1 to p2 of _Html & tab & char p3 to p4 of _Html & cr after Rslt
>       delete char 1 to p4 of _Html
>    end repeat
>    delete last char of Rslt -- extra cr
>
>    put Rslt into fld 1
>    answer "Got " & the number of lines of Rslt & " og: meta tags!"
>
> To build a multi-dimensional array after the extraction,
> a bit more work inside the repeat loop will be needed,
> but the extraction part is still valid.
>
> Finally, if you are not at ease with regex, go with Jacque's way and
> everything will be fine.
> There is fundamentally not much difference between the 2 ways.
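On the multi-dimensional array: a minimal sketch of what "a bit more work inside the repeat loop" could look like, assuming the goal is to keep every value when the same og: key appears more than once. The function name ogTagsToArray, the t/p variable names and the numbering scheme (1, 2, 3... under each key) are assumptions made for illustration, not something from the earlier posts; the regex is the same as before.

```livecode
-- hypothetical helper: returns an array such as tOG[ "image" ][ 1 ], tOG[ "image" ][ 2 ]
function ogTagsToArray pHtml
   local tRx, tOG, tKey, tCount, p1, p2, p3, p4

   put "(?ms)<meta\s+property=\x{22}og:(.+?)\x{22}\s+content=\x{22}(.+?)\x{22}>" into tRx

   repeat while matchChunk( pHtml, tRx, p1, p2, p3, p4 )
      -- first capture is the og: key, second capture is its content
      put char p1 to p2 of pHtml into tKey

      -- number the values under each key, so a repeated key keeps all of them
      put the number of elements of tOG[ tKey ] + 1 into tCount
      put char p3 to p4 of pHtml into tOG[ tKey ][ tCount ]

      -- drop what has been scanned so the next match starts after it
      delete char 1 to p4 of pHtml
   end repeat

   return tOG
end ogTagsToArray

-- possible usage, with _Html and OG as in the code quoted in this thread:
--    put ogTagsToArray( _Html ) into OG
--    put OG[ "title" ][ 1 ] into fld 1
```

The extraction part is unchanged; only the line that stores the value differs from the single-level version, which overwrites an earlier value when a key repeats.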
> Kind regards,
>
> Thierry
>
> > On 7/31/17, 12:31 AM, "use-livecode on behalf of Thierry Douez" wrote:
> >
> > So, here is the code:
> >
> >    local Rx, Rslt, _Html, OG
> >
> >    put empty into Rslt
> >    put URL "https://www.youtube.com/user/kauaiaadheenam" into _Html
> >
> >    get "(?ms)<meta\s+property=\x{22}og:(.+?)\x{22}\s+content=\x{22}(.+?)\x{22}>"
> >    put IT into Rx
> >
> >    repeat while matchChunk( _Html, Rx,p1,p2,p3,p4 )
> >       put char p3 to p4 of _Html into OG[ char p1 to p2 of _Html ]
> >       delete char 1 to p4 of _Html
> >    end repeat
> >
> > and you can test it this way:
> >
> >    combine OG using return and ":"
> >    put OG into fld 1
> >
> > HTH and feel free to ask any question...
> >
> > Kind regards,
> >
> > Thierry
>
> --
> ------------------------------------------------
> Thierry Douez - sunny-tdz.com
> sunnYrex - sunnYtext2speech - sunnYperl - sunnYmidi - sunnYmage
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode