Hi,

When I wget the page "absolute_instrument", I get a gzipped version of it:

file absolute_instrument
absolute_instrument: gzip compressed data, from Unix

By contrast, the "-a" example is not gzipped; it arrives as plain HTML right
away.

So the former one will look garbled to you unless you run "gunzip" on it
first to remove the compression. (If gunzip complains about an "unknown
suffix", rename the file to absolute_instrument.gz first.)
Then you should get regular HTML.

Here's an example of how to gunzip data in Java:

http://code.hammerpig.com/how-to-gunzip-files-with-java.html
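
In case the link goes away, here is a minimal sketch of the same idea using
only java.util.zip from the JDK. It is an assumption on my part that checking
the first two bytes for the gzip magic number (0x1f 0x8b) is good enough for
your case; the class and method names (GunzipSketch, maybeGunzip, readUtf8)
are made up for illustration and are not from any library.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GunzipSketch {

    // Hypothetical helper: wraps the stream in a GZIPInputStream only if it
    // starts with the gzip magic bytes 0x1f 0x8b; otherwise returns it as-is.
    public static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);               // remember the start so we can rewind
        int b1 = in.read();
        int b2 = in.read();
        in.reset();               // rewind to the marked position
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Hypothetical helper: reads the whole (possibly gzipped) stream and
    // decodes the resulting bytes as UTF-8.
    public static String readUtf8(InputStream raw) throws IOException {
        try (InputStream in = maybeGunzip(raw)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory round trip: gzip some HTML, then read it back.
        String html = "<html><body>absolute instrument</body></html>";
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(compressed)) {
            gz.write(html.getBytes(StandardCharsets.UTF_8));
        }
        String viaGzip = readUtf8(new ByteArrayInputStream(compressed.toByteArray()));
        String viaPlain = readUtf8(new ByteArrayInputStream(html.getBytes(StandardCharsets.UTF_8)));
        System.out.println(viaGzip.equals(html) && viaPlain.equals(html));
    }
}
```

In your application you would pass the stream you get from the HTTP
connection to readUtf8 instead of the in-memory arrays, so that gzipped and
plain pages go through the same code path.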

I am not sure, however, how the server side decides whether to compress a
page or not.
Hope that helps anyway,

Daniel

On Fri, Jul 29, 2011 at 2:58 PM, Matthew Pocock <
[email protected]> wrote:

> Hi,
>
> I've been pulling down pages from wiktionary in a Java application. The
> majority of pages seem to work fine (e.g. http://en.wiktionary.org//wiki/-a).
> I can load them in Java, and if I wget them, I end up with a file containing
> what I'd expect.
>
> However, some pages seem not to work (e.g.
> http://en.wiktionary.org/wiki/absolute_instrument). In Java, I get a codec
> exception and when using wget, the resulting downloaded file is garbled. I
> think this is because although they claim to be UTF-8 encoded, they are not.
> These pages show up fine in my browser, but it isn't telling me what charset
> it uses to decode the text.
>
> Is this a known issue? Is there any workaround for this? Can it be fixed
> server-side?
>
> Thanks,
>
> Matthew
>
> --
> Dr Matthew Pocock
> Visitor, School of Computing Science, Newcastle University
> mailto: [email protected]
> gchat: [email protected]
> msn: [email protected]
> irc.freenode.net: drdozer
> tel: (0191) 2566550
> mob: +447535664143
> _______________________________________________
> Wiktionary-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wiktionary-l
>



-- 
Daniel Zahn <[email protected] <[email protected]>>
