Re: [Pharo-users] ZnClient GET, but just the content of the <head> tag?

Sven Van Caekenberghe Sat, 26 Nov 2016 10:19:59 -0800

Paul,

> On 26 Nov 2016, at 18:31, PAUL DEBRUICKER <pdebr...@gmail.com> wrote:
> 
> This is a micro-optimization if there ever was one, but I wondered whether it
> was possible to stop downloading and get the entity once the </head> tag has
> been received.
> 
> Right now I download the whole page, parse it with Soup, then extract the
> tags I want from the head. That works fine, e.g.:
> 
> head:=((Soup fromString: (ZnEasy get: 'http://pharo.org') entity)
>                               findChildTag: 'html') findChildTag: 'head'.


This would only be useful for large pages. Dealing with the content of 
resources (like parsing HTML) is outside the scope of Zinc. However, I can help 
you get started.

What you want to do is use streaming. That gives you access to the content of a
resource through a direct stream, so you can decide to stop reading at any
point. But then you have to close the connection; otherwise you need to read
everything anyway.

Start by having a look at ZnClient>>#downloadTo: and ZnStreamingEntity. What 
you want to do is more or less the following.

ZnClient new
  url: 'http://pharo.org';
  streaming: true;
  get.

At this point, the request has been sent and the response has been received,
but the entity of the response has not been read yet. When you ask for the
entity, you get a ZnStreamingEntity which holds the stream that you then have
to read from. You can check the response (and its headers) for meta
information.
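For example, something like the following (an untested sketch, assuming
ZnClient>>#response and ZnMessage>>#contentType / #contentLength as in current
Zinc versions) looks only at that meta information, never touches the entity
stream, and closes the connection afterwards:

```smalltalk
"Sketch: inspect the response meta information without reading the body.
Untested; assumes a Zinc version with ZnClient>>#response."
| client response |
client := ZnClient new
	url: 'http://pharo.org';
	streaming: true;
	yourself.
[ client get.
  response := client response.
  Transcript
	show: 'Status: ', response statusLine printString; cr;
	show: 'Content-Type: ', response contentType printString; cr;
	"contentLength may be nil, e.g. for a chunked response"
	show: 'Content-Length: ', response contentLength printString; cr;
	show: 'Entity class: ', response entity class name; cr ]
	"Close the connection, since the entity was never read"
	ensure: [ client close ].
```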

Your next challenge is then to process this stream so that you can parse it in
a true streaming fashion. I don't know whether Soup can do this.
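A crude way to stop at </head> without a streaming parser is to read element by
element and check the buffer each time a tag closes. This is only a rough,
untested sketch: it assumes the entity's #stream answers #next and #atEnd one
element at a time, and since that stream may answer bytes rather than
characters, each element is converted with #asCharacter. That is only right for
ASCII, which is enough to spot the tag, but non-ASCII text in the head will come
out garbled. The variable names are just for illustration.

```smalltalk
"Hypothetical sketch: read the streaming entity only up to '</head>',
then close the connection so the rest of the body is never read."
| client stream buffer found head |
client := ZnClient new
	url: 'http://pharo.org';
	streaming: true;
	yourself.
found := false.
[ client get.
  client isSuccess ifTrue: [
	stream := client response entity stream.
	buffer := WriteStream on: (String new: 4096).
	[ found or: [ stream atEnd ] ] whileFalse: [
		| next char |
		next := stream next.
		"A binary stream answers integers, a character stream characters"
		char := next isCharacter
			ifTrue: [ next ]
			ifFalse: [ next asCharacter ].
		buffer nextPut: char.
		"Only inspect the buffer when a tag is closed"
		char = $> ifTrue: [
			found := buffer contents asLowercase endsWith: '</head>' ] ].
	"Answer nil when the page has no </head> at all"
	head := found ifTrue: [ buffer contents ] ifFalse: [ nil ] ] ]
	ensure: [ client close ].
head
```

The resulting string could then be handed to Soup fromString:, which will
hopefully tolerate a fragment that ends right after </head>.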

Sven
