Thanks. I really appreciate everyone's help on this. I was at a high level of frustration the other day.
monty-3 wrote
> You could use XMLHTMLParser from SmalltalkHub PharoExtras/XMLParserHTML
> (supported on Pharo, Squeak, and GemStone):
>
> descriptions := OrderedCollection new.
> (XMLHTMLParser parseURL: aURL)
>     allElementsNamed: 'meta'
>     do: [:each |
>         ((each attributeAt: 'name') asLowercase = 'description'
>             or: [(each attributeAt: 'http-equiv') asLowercase = 'description'])
>                 ifTrue: [descriptions addLast: (each attributeAt: 'content')]].
>
> It accepts messy HTML and produces an XML DOM tree from it.
>
>> Sent: Thursday, March 30, 2017 at 1:58 PM
>> From: "PAUL DEBRUICKER" <pdebruic@>
>> To: "Any question about pharo is welcome" <pharo-users@.pharo>
>> Subject: [Pharo-users] PetitParser question parsing HTML meta tags
>>
>> This is kind of a "I'm tired of thinking about this and not making much
>> progress for the amount of time I'm putting in question" but here it is:
>>
>> I'm trying to parse descriptions from HTML meta elements. I can't use
>> Soup because there isn't a working GemStone port.
>>
>> I've got it to work with the structure:
>>
>> <meta name="description" content="my description">
>>
>> and
>>
>> <meta name="Description" content="my description">
>>
>> but I'm running into instances of:
>>
>> <meta http-equiv="description" content="my description">
>>
>> and
>>
>> <meta http-equiv="Description" content="my description">
>>
>> and am having trouble adapting my parsing code (such as it is).
>>
>> The parsing code that addresses the first two cases is:
>>
>> parseHtmlPageForDescription: htmlString
>>     | startParser endParser ppStream descParser result text lower str doubleQuoteIndex |
>>     lower := 'escription' asParser.
>>     startParser := '<meta name=' asParser , #'any' asParser , #'any' asParser.
>>     endParser := '>' asParser.
>>     ppStream := htmlString readStream asPetitStream.
>>     descParser := ((#'any' asParser starLazy: startParser , lower)
>>         , (#'any' asParser starLazy: endParser)) ==> #'second'.
>>     result := descParser parse: ppStream.
>>     text := (result
>>         inject: (WriteStream on: String new)
>>         into: [ :stream :char |
>>             stream nextPut: char.
>>             stream ]) contents trimBoth.
>>     str := text copyFrom: (text findString: 'content=') + 9 to: text size.
>>     doubleQuoteIndex := 8 - ((str last: 7) indexOf: $").
>>     ^ str copyFrom: 1 to: str size - doubleQuoteIndex
>>
>> I can't figure out how to change the startParser parser to accept the
>> second idiom. And maybe there's a better approach altogether. Anyway.
>> If anyone has any ideas on different approaches I'd appreciate learning
>> them.
>>
>> Thanks for giving it some thought
>>
>> Paul

--
View this message in context: http://forum.world.st/PetitParser-question-parsing-HTML-meta-tags-tp4940587p4941367.html
Sent from the Pharo Smalltalk Users mailing list archive at Nabble.com.
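For the PetitParser route Paul asked about, one way to accept both idioms is to make the attribute name a choice (`name` or `http-equiv`) and to compare the literals case-insensitively, instead of skipping two characters and matching 'escription'. This is a hedged sketch, not Paul's code: it assumes PetitParser is loaded, that the `name`/`http-equiv` attribute comes directly after `<meta` and directly before `content`, and that attribute values are double-quoted. The variable names (`ws`, `quoted`, `keyAttr`, `contentAttr`, `metaParser`) and the sample HTML are illustrative only.

```smalltalk
"Hedged sketch: assumes PetitParser is loaded, key attribute before content,
 double-quoted values. All variable names are illustrative."
| ws quoted keyAttr contentAttr metaParser html descriptions |
"Zero or more whitespace characters between tokens."
ws := #space asParser star.
"A double-quoted value; the action keeps only the text between the quotes."
quoted := ($" asParser , $" asParser negate star flatten , $" asParser) ==> #second.
"Either name=... or http-equiv=... whose value is description, in any letter case."
keyAttr := ('name' asParser caseInsensitive / 'http-equiv' asParser caseInsensitive)
	, ws , $= asParser , ws
	, $" asParser , 'description' asParser caseInsensitive , $" asParser.
"content=... ; the action answers only the quoted text."
contentAttr := ('content' asParser caseInsensitive , ws , $= asParser , ws , quoted) ==> #last.
"The whole tag; element 5 of the sequence is the content string."
metaParser := ('<meta' asParser caseInsensitive , ws , keyAttr , ws , contentAttr
	, $> asParser negate star , $> asParser) ==> [ :nodes | nodes at: 5 ].
html := '<head><meta name="Description" content="first">
<meta http-equiv="description" content="second"></head>'.
"Scan the whole page, collecting the content of every matching tag."
descriptions := metaParser matchesSkipIn: html.
```

`matchesSkipIn:` scans the input for every match and answers an OrderedCollection of the action results, so there is no need for the `starLazy:` prefix or the manual quote trimming. Compared with the XMLHTMLParser suggestion, this is still a pattern scan over raw text: reordered attributes or single-quoted values would need further alternatives in `keyAttr` and `quoted`.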