I have written and attached an example that compares bufio.Reader and bufio.Scanner. Here's the output from `go run .` (a line count followed by the first error encountered):

```
Reader 1333665 <nil>
Scanner 777758 bufio.Scanner: token too long
```

This probably _won't_ fail on your 2M line file; it looks like the problem is with the line length of a Debian Packages file. If you have a Debian-derived distro you could try replacing the filename in the file with one from `/var/lib/apt/lists/`.
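One way to check the line-length theory would be to measure the longest line with bufio.Reader and compare it against `bufio.MaxScanTokenSize` (64 KiB), which is the default limit for a Scanner token. The following is only a minimal sketch: `demoInput` and `longestLine` are names made up for illustration, and the in-memory input stands in for a real Packages file.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// demoInput is a hypothetical stand-in for a Packages file: three lines,
// the second of which is longLen bytes long. The last line has no
// trailing newline.
func demoInput(longLen int) io.Reader {
	return strings.NewReader("short\n" + strings.Repeat("x", longLen) + "\nlast")
}

// longestLine reads r line by line with bufio.Reader and returns the number
// of lines and the length in bytes of the longest one (line ending excluded).
func longestLine(r io.Reader) (int, int, error) {
	reader := bufio.NewReader(r)
	lines, longest := 0, 0
	for {
		line, err := reader.ReadString('\n')
		if len(line) > 0 {
			// Count a final line even if it lacks a trailing newline.
			lines++
			if n := len(strings.TrimRight(line, "\r\n")); n > longest {
				longest = n
			}
		}
		if err == io.EOF {
			return lines, longest, nil
		}
		if err != nil {
			return lines, longest, err
		}
	}
}

func main() {
	lines, longest, err := longestLine(demoInput(70_000))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A line at or near bufio.MaxScanTokenSize is likely to trip a
	// Scanner that uses the default buffer.
	fmt.Printf("lines=%d longest=%d limit=%d tooLong=%t\n",
		lines, longest, bufio.MaxScanTokenSize, longest >= bufio.MaxScanTokenSize)
}
```

To try it on a real file, pass the result of `os.Open` to `longestLine` in place of `demoInput`.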
The docs for bufio.Scanner do say "Programs that need more control over error handling or large tokens, or must run sequential scans on a reader, should use bufio.Reader instead." Perhaps it would be more helpful to mention what the token length limit is?

On Thursday, October 12, 2023 at 9:45:10 AM UTC+1 Rob Pike wrote:

> I just did a simple test with a 2M line file and it worked fine, so I
> suspect it's a bug in your code. But if not, please provide a complete
> working executable example, with data, to help identify the problem.
>
> -rob
>
> On Thu, Oct 12, 2023 at 7:39 PM 'Mark' via golang-nuts <golan...@googlegroups.com> wrote:
>
>> I'm reading Debian *Package files, some of which are over 1M lines long.
>> I used bufio.Scanner and found that it won't read past 1M lines (I'm
>> using Go 1.21.1 linux/amd64).
>> Is this a limitation of bufio.Scanner? If so then it ought to be in the
>> docs.
>> Or is it a bug?
>> Or maybe I made a mistake (although using bufio.Scanner seems easy)?
>> ```
>> scanner := bufio.NewScanner(file)
>> lino := 1
>> for scanner.Scan() {
>>     line := scanner.Text()
>>     lino++
>>     ... // etc
>> }
>> ```
>> Anyway, I've switched to using bufio.Reader and that works great.
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// pkgFile is the Packages file to read; replace it with one from
// /var/lib/apt/lists/ on your own system.
const pkgFile = "/var/lib/apt/lists/gb.archive.ubuntu.com_ubuntu_" +
	"dists_jammy_universe_binary-amd64_Packages"

func main() {
	lines, err := readPackages(pkgFile)
	fmt.Println("Reader", lines, err)
	lines, err = scanPackages(pkgFile)
	fmt.Println("Scanner", lines, err)
}

// readPackages counts newline-terminated lines using bufio.Reader.
func readPackages(filename string) (int, error) {
	file, err := os.Open(filename)
	if err != nil {
		return 0, err
	}
	defer file.Close()
	reader := bufio.NewReader(file)
	lines := 0
	for {
		_, err := reader.ReadString('\n')
		if err == io.EOF {
			break
		} else if err != nil {
			return 0, err
		}
		lines++
	}
	return lines, nil
}

// scanPackages counts lines using bufio.Scanner with its default buffer,
// returning the count so far along with the first error encountered.
func scanPackages(filename string) (int, error) {
	file, err := os.Open(filename)
	if err != nil {
		return 0, err
	}
	defer file.Close()
	scanner := bufio.NewScanner(file)
	lines := 0
	for scanner.Scan() {
		_ = scanner.Text()
		lines++
	}
	return lines, scanner.Err()
}
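For anyone who would rather stay with bufio.Scanner: the "token too long" error in `scanPackages` comes from the default token limit, `bufio.MaxScanTokenSize` (64 KiB), and `Scanner.Buffer` can raise it before scanning starts. This is only a sketch under assumptions: `sampleInput` and `scanLines` are hypothetical helpers, the 1 MiB cap is an arbitrary choice, and the in-memory input stands in for a Packages file with one oversized line.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strings"
)

// sampleInput is a hypothetical stand-in for a Packages file: three lines,
// the second of which is longLen bytes long.
func sampleInput(longLen int) io.Reader {
	return strings.NewReader("short\n" + strings.Repeat("x", longLen) + "\nlast\n")
}

// scanLines counts lines with bufio.Scanner. If maxToken > 0 it raises the
// scanner's token limit to maxToken bytes; otherwise the default limit
// (bufio.MaxScanTokenSize) applies.
func scanLines(r io.Reader, maxToken int) (int, error) {
	scanner := bufio.NewScanner(r)
	if maxToken > 0 {
		// Buffer must be called before the first Scan. The initial buffer
		// is small and may grow up to maxToken bytes as needed.
		scanner.Buffer(make([]byte, 0, 64*1024), maxToken)
	}
	lines := 0
	for scanner.Scan() {
		lines++
	}
	return lines, scanner.Err()
}

func main() {
	const longLen = 100_000 // longer than bufio.MaxScanTokenSize (65536)

	// Default buffer: scanning stops at the long line with ErrTooLong.
	n, err := scanLines(sampleInput(longLen), 0)
	fmt.Println("default limit:", n, err, errors.Is(err, bufio.ErrTooLong))

	// Raised limit (1 MiB, an arbitrary choice): all lines are read.
	n, err = scanLines(sampleInput(longLen), 1024*1024)
	fmt.Println("1 MiB limit:", n, err)
}
```

With the default limit the count stops at 1 and the error is `bufio.ErrTooLong`; with the raised limit all 3 lines are counted and the error is nil. If the maximum line length isn't known in advance, bufio.Reader is probably still the simpler choice.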