-----Original Message-----
From: R-help On Behalf Of Dave Dixon
Sent: Wednesday, April 10, 2024 12:20 PM
To: Rui Barradas; r-help@r-project.org
Subject: Re: [R] Exceptional slowness with read.csv
That's basically what I did
1. Get text lines using readLines
2. use tryCatch to parse each line using read.csv(text=...)
3. in the catch, use gregexpr to find any quotes not adjacent to a comma
(gregexpr("[^,]\"[^,]", ...))
4. escape any quotes found by adding a second quote (using str_sub from stringr)
once, ...
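A rough sketch of steps 1-4 (my reconstruction, not the original code; file_name is a placeholder and base R's substr() stands in for stringr::str_sub):

# Double any quote that is not adjacent to a comma, assuming such quotes are
# stray, unescaped quotes inside a field.
fix_line <- function(line) {
  m <- gregexpr('[^,]"[^,]', line)[[1]]
  if (m[1] == -1) return(line)        # nothing suspicious on this line
  pos <- m + 1                        # the quote sits one character into each match
  for (p in rev(pos)) {               # right to left so earlier offsets stay valid
    line <- paste0(substr(line, 1, p), '"', substr(line, p + 1, nchar(line)))
  }
  line
}

# An unmatched quote makes read.csv swallow the rest of the line and signal
# "EOF within quoted string", so both handlers fall back to the repaired line.
parse_line <- function(line) {
  tryCatch(read.csv(text = line, header = FALSE),
           warning = function(w) read.csv(text = fix_line(line), header = FALSE),
           error   = function(e) read.csv(text = fix_line(line), header = FALSE))
}

lines <- readLines(file_name)
bad   <- lapply(lines[2459466:2459470], parse_line)  # the lines skip = 2459465, nrows = 5 would hit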
-----Original Message-----
From: R-help On Behalf Of Rui Barradas
Sent: Wednesday, April 10, 2024 9:46 AM
To: Dave Dixon; r-help@r-project.org
Subject: Re: [R] Exceptional slowness with read.csv
At 06:47 on 08/04/2024, Dave Dixon wrote:
Greetings,
I have a csv file of 76 fields and about 4 million records. I know that
some of the records have errors - unmatched quotes, specifically.
Reading the file with readLines and parsing the lines with read.csv(text
= ...) is really slow. I know that the first 2459465 records are good.
Try reading the lines in (readLines), count the number of both types of
quotes in each line. Find out which are not even and investigate.
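A quick sketch of that check (file_name is a placeholder), stripping each quote character with gsub and comparing line lengths:

lines <- readLines(file_name)
n_dq  <- nchar(lines) - nchar(gsub('"', "", lines, fixed = TRUE))  # double quotes per line
n_sq  <- nchar(lines) - nchar(gsub("'", "", lines, fixed = TRUE))  # single quotes per line
suspects <- which(n_dq %% 2 != 0 | n_sq %% 2 != 0)
head(suspects)  # line numbers worth investigating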
On Mon, Apr 8, 2024, 15:24 Dave Dixon wrote:
> I solved the mystery, but not the problem. The problem is that there's
> an unclosed quote somewhere in those 5 additional records I'm trying to access.
I find QSV very helpful.
el
On 08/04/2024 22:21, Dave Dixon wrote:
> I solved the mystery, but not the problem. The problem is that
> there's an unclosed quote somewhere in those 5 additional records I'm
> trying to access. So read.csv is reading million-character fields.
> It's slow at that. That mystery solved.
Right, I meant to add header=FALSE. And, it looks now like the next line
is the one with the unclosed quote, so read.csv is trying to read
million-character headers!
On 4/8/24 12:42, Ivan Krylov wrote:
On Sun, 7 Apr 2024 23:47:52 -0600, Dave Dixon wrote:
> second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
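In other words, something along these lines (my spelling-out of the header = FALSE fix, with the real column names pulled back in from the file's first line):

second_records <- read.csv(file_name, skip = 2459465, nrows = 5, header = FALSE)
names(second_records) <- names(read.csv(file_name, nrows = 1))  # reuse the real header row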
Good suggestion - I'll look into data.table.
On 4/8/24 12:14, CALUM POLWART wrote:
> data.table's fread is also fast. Not sure about error handling. But I
> can merge 300 csvs with a total of 0.5m lines and 50 columns in a
> couple of minutes versus a lifetime with read.csv or readr::read_csv
>
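For reference, a minimal sketch of that fread pattern (the directory and file list are made up; fread's skip/nrows arguments also cover the single-file case in this thread):

library(data.table)

# merge many CSVs
files  <- list.files("csv_dir", pattern = "\\.csv$", full.names = TRUE)
merged <- rbindlist(lapply(files, fread), use.names = TRUE, fill = TRUE)

# or jump straight to the suspect rows of one large file
second_records <- fread(file_name, skip = 2459465, nrows = 5, header = FALSE)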
Thanks, yeah, I think scan is more promising. I'll check it out.
On 4/8/24 11:49, Bert Gunter wrote:
> No idea, but have you tried using ?scan to read those next 5 rows? It
> might give you a better idea of the pathologies that are causing
> problems. For example, an unmatched quote might result
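One way to do that with scan, quoting switched off so the raw lines come back untouched (a sketch, not from the thread):

raw_lines <- scan(file_name, what = character(), sep = "\n", quote = "",
                  skip = 2459465, nlines = 5)
nchar(raw_lines)  # a million-character element points straight at the bad record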
I solved the mystery, but not the problem. The problem is that there's
an unclosed quote somewhere in those 5 additional records I'm trying to
access. So read.csv is reading million-character fields. It's slow at
that. That mystery solved.
However, the problem persists: how to fix what is o
At 19:42 on 08/04/2024, Ivan Krylov via R-help wrote:
On Sun, 7 Apr 2024 23:47:52 -0600, Dave Dixon wrote:
> second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
It may or may not be important that read.csv defaults to header =
TRUE. Having skipped 2459465 lines, it may attempt to parse the next
one as a header, so the second call read
On Sun, 7 Apr 2024 23:47:52 -0600, Dave Dixon wrote:
> > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
It may or may not be important that read.csv defaults to header =
TRUE. Having skipped 2459465 lines, it may attempt to parse the next
one as a header, so the second call read
data.table's fread is also fast. Not sure about error handling. But I can
merge 300 csvs with a total of 0.5m lines and 50 columns in a couple of
minutes versus a lifetime with read.csv or readr::read_csv
On Mon, 8 Apr 2024, 16:19 Stevie Pederson wrote:
> Hi Dave,
>
> That's rather frustrating.
No idea, but have you tried using ?scan to read those next 5 rows? It might
give you a better idea of the pathologies that are causing problems. For
example, an unmatched quote might result in some huge number of characters
trying to be read into a single element of a character variable. As your
pr
Hi Dave,
That's rather frustrating. I've found vroom (from the package vroom) to be
helpful with large files like this.
Does the following give you any better luck?
vroom(file_name, delim = ",", skip = 2459465, n_max = 5)
Of course, when you know you've got errors & the files are big like that
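Spelled out (the col_names = FALSE is my addition, since skipping that far lands past the real header row):

library(vroom)
second_records <- vroom(file_name, delim = ",", skip = 2459465, n_max = 5,
                        col_names = FALSE)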
Greetings,
I have a csv file of 76 fields and about 4 million records. I know that
some of the records have errors - unmatched quotes, specifically.
Reading the file with readLines and parsing the lines with read.csv(text
= ...) is really slow. I know that the first 2459465 records are good.