I have written and attached an example that compares bufio.Reader and 
bufio.Scanner.
Here's the output from `go run .` (a line count followed by the first error 
encountered):
```
Reader 1333665 <nil>
Scanner 777758 bufio.Scanner: token too long
```
This probably _won't_ fail on your 2M line file; the problem appears to be 
the length of individual lines in a Debian Packages file, not the number of 
lines. If you have a Debian-derived distro you could try replacing the 
filename in the attached program with one from `/var/lib/apt/lists/`.

The docs for bufio.Scanner do say:
"Programs that need more control over error handling or large tokens, or 
must run sequential scans on a reader, should use bufio.Reader instead."
Perhaps it would be more helpful if they also mentioned what the token 
length limit is?

On Thursday, October 12, 2023 at 9:45:10 AM UTC+1 Rob Pike wrote:

> I just did a simple test with a 2M line file and it worked fine, so I 
> suspect it's a bug in your code. But if not, please provide a complete 
> working executable example, with data, to help identify the problem.
>
> -rob
>
>
> On Thu, Oct 12, 2023 at 7:39 PM 'Mark' via golang-nuts <
> golan...@googlegroups.com> wrote:
>
>> I'm reading Debian *Package files, some of which are over 1M lines long.
>> I used bufio.Scanner and found that it won't read past 1M lines (I'm 
>> using Go 1.21.1 linux/amd64).
>> Is this a limitation of bufio.Scanner? If so then it ought to be in the 
>> docs.
>> Or is it a bug?
>> Or maybe I made a mistake (although using bufio.Scanner seems easy)?
>> ```
>> scanner := bufio.NewScanner(file)
>> lino := 1
>> for scanner.Scan() {
>>         line := scanner.Text()
>>         lino++
>>         ... // etc
>> }
>> ```
>> Anyway, I've switched to using bufio.Reader and that works great.
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/ebf242ab-c8aa-4de8-821b-3abe77a9da86n%40googlegroups.com.
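
```
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

const pkgFile = "/var/lib/apt/lists/gb.archive.ubuntu.com_ubuntu_" +
	"dists_jammy_universe_binary-amd64_Packages"

func main() {
	lines, err := readPackages(pkgFile)
	fmt.Println("Reader", lines, err)
	lines, err = scanPackages(pkgFile)
	fmt.Println("Scanner", lines, err)
}

// readPackages counts the lines in filename using bufio.Reader, which
// places no limit on line length.
func readPackages(filename string) (int, error) {
	file, err := os.Open(filename)
	if err != nil {
		return 0, err
	}
	defer file.Close()
	reader := bufio.NewReader(file)
	lines := 0
	for {
		line, err := reader.ReadString('\n')
		if err == io.EOF {
			// Count a final line that has no trailing newline, so the
			// result matches what bufio.Scanner would report.
			if len(line) > 0 {
				lines++
			}
			break
		} else if err != nil {
			return 0, err
		}
		lines++
	}
	return lines, nil
}

// scanPackages counts the lines in filename using bufio.Scanner. It stops
// with bufio.ErrTooLong at the first line that exceeds the Scanner's
// maximum token size, returning the number of lines read up to that point.
func scanPackages(filename string) (int, error) {
	file, err := os.Open(filename)
	if err != nil {
		return 0, err
	}
	defer file.Close()
	scanner := bufio.NewScanner(file)
	lines := 0
	for scanner.Scan() {
		_ = scanner.Text()
		lines++
	}
	return lines, scanner.Err()
}
```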
