Google's PDF-as-text

2009-11-14 Thread Jim Nagel
when the results of a Google search include a PDF, Google offers to 
display it as plain text.  but it comes out as nonsense in Netsurf 
(r9629): all the lines of text seem to be piled on top of one another.

at least in this example:

http://209.85.229.132/search?q=cache:dr7rk6-FAmcJ:www.mendip.gov.uk/Download.asp%3Fpath%3D%252FDocuments%252FRegeneration%252Fcar%2Bparks%252FGlastonbury%2BCarnival%2BRoad%2BClosure%2BOrder%2B2009.pdf+Glastonbury+carnival+traffic&cd=2&hl=en&ct=clnk&gl=uk&;

-- 
Jim Nagel    www.archivemag.co.uk



Re: Google's PDF-as-text

2009-11-14 Thread Kevin Wells
In message 
  Jim Nagel  wrote:

>when the results of a Google search include a PDF, Google offers to 
>display it as plain text.  but it comes out as nonsense in Netsurf 
>(r9629): all the lines of text seem to be piled on top of one another.
>
>at least in this example:
>
>http://209.85.229.132/search?q=cache:dr7rk6-FAmcJ:www.mendip.gov.uk/Download.asp%3Fpath%3D%252FDocuments%252FRegeneration%252Fcar%2Bparks%252FGlastonbury%2BCarnival%2BRoad%2BClosure%2BOrder%2B2009.pdf+Glastonbury+carnival+traffic&cd=2&hl=en&ct=clnk&gl=uk&;
>

Is the original pdf just an image file served up as a pdf?

PDFs from the http://whatdotheyknow.com site, viewed as HTML, work
fine if they are not purely image files, as these two examples show.



Failed.



Works.



-- 
Kev Wells  http://riscos.kevsoft.co.uk/
http://kevsoft.co.uk/   http://kevsoft.co.uk/AleQuest/
ICQ 238580561
Feeling stupid I know I am.



Re: Google's PDF-as-text

2009-11-14 Thread cj
In article ,
   Jim Nagel  wrote:
> when the results of a Google search include a PDF, Google offers to 
> display it as plain text.  but it comes out as nonsense in Netsurf 
> (r9629): all the lines of text seem to be piled on top of one another.

> at least in this example:

> http://209.85.229.132/search?q=cache:dr7rk6-FAmcJ:www.mendip.gov.uk/Download.asp%3Fpath%3D%252FDocuments%252FRegeneration%252Fcar%2Bparks%252FGlastonbury%2BCarnival%2BRoad%2BClosure%2BOrder%2B2009.pdf+Glastonbury+carnival+traffic&cd=2&hl=en&ct=clnk&gl=uk&;

True, but if you SHIFT-click on the link displayed above the
nonsense, the pdf file downloads and can be read in e.g. !PDF without
any problem.

-- 
Chris Johnson




Re: Google's PDF-as-text

2009-11-14 Thread Michael Bell
In message 
  Jim Nagel  wrote:

> when the results of a Google search include a PDF, Google offers to
> display it as plain text.  but it comes out as nonsense in Netsurf
> (r9629): all the lines of text seem to be piled on top of one another.

> at least in this example:

> http://209.85.229.132/search?q=cache:dr7rk6-FAmcJ:www.mendip.gov.uk/Download.asp%3Fpath%3D%252FDocuments%252FRegeneration%252Fcar%2Bparks%252FGlastonbury%2BCarnival%2BRoad%2BClosure%2BOrder%2B2009.pdf+Glastonbury+carnival+traffic&cd=2&hl=en&ct=clnk&gl=uk&


Yes, it's always done that.

Michael Bell
-- 



Re: Google's PDF-as-text

2009-11-14 Thread Michael Bell
In message <50ba21fe2ach...@chris-johnson.org.uk>
  cj  wrote:

> In article ,
>Jim Nagel  wrote:
>> when the results of a Google search include a PDF, Google offers to
>> display it as plain text.  but it comes out as nonsense in Netsurf
>> (r9629): all the lines of text seem to be piled on top of one another.

>> at least in this example:

>> http://209.85.229.132/search?q=cache:dr7rk6-FAmcJ:www.mendip.gov.uk/Download.asp%3Fpath%3D%252FDocuments%252FRegeneration%252Fcar%2Bparks%252FGlastonbury%2BCarnival%2BRoad%2BClosure%2BOrder%2B2009.pdf+Glastonbury+carnival+traffic&cd=2&hl=en&ct=clnk&gl=uk&

> True, but if you SHIFT-click on the link displayed above the
> nonsense, the pdf file downloads and can be read in e.g. !PDF without
> any problem.

I didn't know that. Thank you!

Michael Bell



-- 



Re: Google's PDF-as-text

2009-11-14 Thread Erving
In message  you wrote:

> In message <50ba21fe2ach...@chris-johnson.org.uk>
>   cj  wrote:
> 
> > In article ,
> >Jim Nagel  wrote:
> >> when the results of a Google search include a PDF, Google offers to
> >> display it as plain text.  but it comes out as nonsense in Netsurf
> >> (r9629): all the lines of text seem to be piled on top of one another.
> 
> >> at least in this example:
> 
> >> http://209.85.229.132/search?q=cache:dr7rk6-FAmcJ:www.mendip.gov.uk/Download.asp%3Fpath%3D%252FDocuments%252FRegeneration%252Fcar%2Bparks%252FGlastonbury%2BCarnival%2BRoad%2BClosure%2BOrder%2B2009.pdf+Glastonbury+carnival+traffic&cd=2&hl=en&ct=clnk&gl=uk&
> 
> > True, but if you SHIFT-click on the link displayed above the
> > nonsense, the pdf file downloads and can be read in e.g. !PDF without
> > any problem.
> 
> I didn't know that. Thank you!
> 
> Michael Bell
> 
> 
> 

I'm puzzled. When I click on that link I don't get any 'piling'
(NetSurf 2.1), and clicking the link at the top of the page
(not SHIFT-click) downloads the PDF.
But I see that with r9640 (26 Aug 2009 11:30) I do get the 'piling',
though I still do not need to SHIFT-click to get the PDF.

-- 
Erving



Re: Google's PDF-as-text

2009-11-14 Thread cj
In article <54392fba50.erv...@orpheusnet.co.uk>,
   Erving  wrote:
> I'm puzzeled. whent I click on that link I don't get any 'piling' 
> (Netsurf 2.1) and clicking the link at the top of the page 
> (not SHIFT-click) downloads the PDF.
> But I see with r9640 (26 Aug 2009 11:30) I do get the 'piling'
> but still do not need to SHIFT-click to get the PDF.

Ditto - it's obviously a problem with the recent development versions.

-- 
Chris Johnson




Re: Google's PDF-as-text

2009-11-14 Thread Chris Newman
In article <50ba30204ach...@chris-johnson.org.uk>,
   cj  wrote:
> In article <54392fba50.erv...@orpheusnet.co.uk>,
>Erving  wrote:
> > I'm puzzeled. whent I click on that link I don't get any 'piling' 
> > (Netsurf 2.1) and clicking the link at the top of the page 
> > (not SHIFT-click) downloads the PDF.
> > But I see with r9640 (26 Aug 2009 11:30) I do get the 'piling'
> > but still do not need to SHIFT-click to get the PDF.

> Ditto - it's obviously a problem with the recent development versions.

Also no problem with r8643 (21 Jul 2009), which was the last build before the
memory-grabbing problems started.

-- 
Chris