Kjell
Almost certainly the HTML files will not contain the code for the actual pictures; they will just contain a reference to each one, in practice an <img> element whose 'src' attribute gives the address to load the picture file from. If the web pages are built to a regular pattern, you should be able to parse them and locate the image elements you want.

I haven't found any problem with the parse from XMLHTMLParser taking up too much memory. My machine has 4 GB of RAM; if you have much less than that, you might have trouble. If you have found a systematic way to locate the picture file, you could minimise the size of the DOM the parser creates by using a streaming parser. The streaming version of Monty's parser is called StAXHTMLParser.

I have a bit of experience playing with these parsers; I have put a couple of rough sketches below your quoted message. If you get stuck, ask again here with more details; I may be able to help.

Peter Kenny

From: Pharo-users [mailto:pharo-users-boun...@lists.pharo.org] On Behalf Of Kjell Godo
Sent: 08 November 2017 23:00
To: Any question about pharo is welcome <pharo-users@lists.pharo.org>
Subject: Re: [Pharo-users] Soup bug(fix)

I like to collect some newspaper comics from an online newspaper, but it takes really long to do it by hand. I tried Soup but didn't get anywhere; the pictures were hidden behind a script or something. Is there anything to do about that? I don't want to collect them all. I have the XPath .pdf but I haven't read it yet.

These browsers seem to gobble up memory, and while open they just keep getting bigger until the OS session crashes. Might there be a browser that is more minimal? Vivaldi seems better at not bloating up RAM.
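To make the DOM suggestion concrete, a rough sketch. The URL and output file names are invented, and it assumes the 'src' attributes hold complete addresses; if the page assembles them in a script, as you suspect, the parser will not see them:

    | doc imageUrls |
    "Fetch the page with Zinc, then let XMLHTMLParser build a DOM from it.
     The URL is made up; substitute the real comics page."
    doc := XMLHTMLParser parse:
        (ZnEasy get: 'http://example.com/comics/today.html') contents.
    "Collect the address from every <img> element on the page"
    imageUrls := (doc allElementsNamed: 'img')
        collect: [:img | img attributeAt: 'src'].
    "Fetch each picture and write the raw bytes to a numbered file"
    imageUrls doWithIndex: [:url :i |
        ('comic-' , i printString , '.png') asFileReference
            binaryWriteStreamDo: [:out |
                out nextPutAll: (ZnEasy get: url) contents]].

If the comic is always the only <img> inside some identifiable element, you can narrow the search by that element's id or class instead of taking every image on the page.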
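And the streaming idea. I am writing this from memory, so the event protocol I use here (#atEnd, #next, #isStartTag, #name, #attributes) is my assumption about the pull-parser interface; check it against the StAXHTMLParser class in your image:

    | parser srcs |
    "The parser hands back one event at a time; it never builds a whole tree.
     The URL is made up, as above."
    parser := StAXHTMLParser on:
        (ZnEasy get: 'http://example.com/comics/today.html') contents.
    srcs := OrderedCollection new.
    [parser atEnd] whileFalse: [
        | event |
        event := parser next.
        "Keep only the start tags of <img> elements that carry a src"
        (event isStartTag and: [event name = 'img']) ifTrue: [
            srcs add: (event attributes at: 'src' ifAbsent: [nil])]].
    srcs := srcs reject: [:each | each isNil].

Because nothing is kept once each event has been handled, memory use stays roughly flat however big the page is, which is the point of the streaming version.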