
Re: Having strange result on processing UTF-8 file

2021-12-18 Thread Tim via users

On Sat, 2021-12-18 at 20:40 -0500, Tom Horsley wrote:
> Just a (possibly) relevant note: I've seen many html pages with
> headers claiming they are UTF-8, but text that only displays
> correctly if you treat them as one of the windows code pages.
>
> Worse yet, some browsers have heuristics to det

Re: Having strange result on processing UTF-8 file

2021-12-18 Thread Ed Greshko

On 19/12/2021 09:50, Michael D. Setzer II wrote: %10.10s and %20.20s both would cause the problem. I believe those are both printf format indicators. Which is why I was wondering if converting to plain text would be better because those would be removed (dealt with) during the convert. -- Did
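
One possible reason a format such as %10.10s could damage UTF-8 text: in C, the precision of a plain %s conversion counts bytes, not characters, so the cut can land in the middle of a multi-byte character. The snippet above does not show the actual program, so this is only a sketch of that assumption. The function names and the sample string are hypothetical.

```python
def truncate_bytes(text: str, width: int) -> bytes:
    """Mimic a byte-based cut such as C's printf("%.10s"): keep at most
    `width` bytes of the UTF-8 encoding, ignoring character boundaries."""
    return text.encode("utf-8")[:width]


def truncate_chars(text: str, width: int) -> bytes:
    """Cut on character boundaries first, then encode."""
    return text[:width].encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample: nine ASCII bytes followed by a two-byte character.
sample = "123456789é and more"

byte_cut = truncate_bytes(sample, 10)   # ends with a lone lead byte
char_cut = truncate_chars(sample, 10)   # ends with the complete character

print(is_valid_utf8(byte_cut), is_valid_utf8(char_cut))
```

If this is what happens, a file written with byte-truncated fields would contain stray lead bytes, which is consistent with a tool no longer recognizing the output as UTF-8.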

Re: Having strange result on processing UTF-8 file

2021-12-18 Thread Michael D. Setzer II via users

On 19 Dec 2021 at 9:14, Ed Greshko wrote:
From: Ed Greshko
Date sent: Sun, 19 Dec 2021 09:14:37 +0800
Subject: Re: Having strange result on processing UTF-8 file
To: "Michael D. Setzer II", Community support for Fedora users
Send reply to: Community s

Re: Having strange result on processing UTF-8 file

2021-12-18 Thread Tom Horsley

On Sun, 19 Dec 2021 09:14:37 +0800 Ed Greshko wrote:
> Does it contain html?

Just a (possibly) relevant note: I've seen many html pages with headers claiming they are UTF-8, but text that only displays correctly if you treat them as one of the windows code pages. Worse yet, some browsers have he
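
A minimal sketch of how a script might cope with pages that claim UTF-8 but are really in a Windows code page: try strict UTF-8 first and fall back only when that fails. The function name and the choice of cp1252 as the fallback are assumptions for illustration; nothing in the thread says which code page the pages actually use.

```python
def decode_with_fallback(data: bytes, fallback: str = "cp1252") -> tuple[str, str]:
    """Decode `data` as strict UTF-8; if that fails, decode with `fallback`.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Bytes undefined in the fallback code page become U+FFFD
        # instead of raising a second error.
        return data.decode(fallback, errors="replace"), fallback


# 0xE9 on its own is not valid UTF-8, but it is "é" in cp1252.
print(decode_with_fallback(b"caf\xe9"))
print(decode_with_fallback("café".encode("utf-8")))
```

Trying UTF-8 first is the safer order because cp1252 text containing non-ASCII characters rarely happens to be valid UTF-8, while most byte sequences decode without error as cp1252.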

Re: Having strange result on processing UTF-8 file

2021-12-18 Thread Ed Greshko

On 19/12/2021 08:31, Michael D. Setzer II wrote: But could change if they add more or remove some currently 633 records. Some lines in the file are over 25000 characters?? Total download is about 13M. The actual lines I need for the data are just 256K, so it has lots of junk (stuff I don't nee

Re: Having strange result on processing UTF-8 file

2021-12-18 Thread Michael D. Setzer II via users

On 19 Dec 2021 at 7:54, Ed Greshko wrote:
From: Ed Greshko
Date sent: Sun, 19 Dec 2021 07:54:31 +0800
Subject: Re: Having strange result on processing UTF-8 file
To: users@lists.fedoraproject.org
Send reply to: Community

Re: Having strange result on processing UTF-8 file

2021-12-18 Thread Ed Greshko

On 19/12/2021 02:15, Michael D. Setzer II via users wrote:
> Download 64 web pages into a single file using wget2. That is fine.

One more thing. The single file you get is an html formatted file, yes? For the results that you want, and how you want to use it, do you really want html? If n
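
If plain text is enough, one way to strip the markup is Python's standard html.parser module. This is only a sketch: the class and function names are hypothetical, and the thread does not say which conversion tool was being suggested.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &eacute; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style elements.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text chunks of `html`, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


page = ("<html><head><style>p{}</style></head><body>"
        "<p>Caf&eacute; <b>open</b></p><script>var x=1;</script></body></html>")
print(html_to_text(page))
```

Working on decoded text rather than raw bytes means any later field truncation operates on whole characters.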

Re: Having strange result on processing UTF-8 file

2021-12-18 Thread Ed Greshko

On 19/12/2021 02:15, Michael D. Setzer II via users wrote:
> $ ./findnoascii2 allraw.uog
> Think this is the issue, but no idea how to fix it.
> $ file allraw.uog.out
> allraw.uog.out: Non-ISO extended-ASCII text

I assume findnoascii2 is written by you? Without knowing what it does (source), I think

Having strange result on processing UTF-8 file

2021-12-18 Thread Michael D. Setzer II via users

I've spent a number of hours trying all kinds of things I've found on the web, but not getting anywhere. Probably something simple. Download 64 web pages into a single file using wget2. That is fine.
file allraw.uog
allraw.uog: HTML document, UTF-8 Unicode text, with very long lines
File is about 13M


