The software attempts to read the registry and temporarily augment the path in case you have Rtools installed so that the filter can access all the tools that Rtools provides. I am not sure why its failing on your system but there is evidently some differences between systems here and I have added some code to trap and bypass that portion in case it fails. I have added the new version to the svn repository so try this:
library(sqldf) # overwrite with development version source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R") # your code to call read.csv.sql On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA} <satish.vadlam...@fritolay.com> wrote: > > Gabor: > Here is the update. As you can see, I got the same error as below in 1. > > 1. Error > test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", > header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n") > Error in readRegistry(key, maxdepth = 3) : > Registry key 'SOFTWARE\R-core' not found > > 2. But the loading of the bigger file was successful as you can see below. > 857 MB, 333,250 rows, 227 columns. This is good. > > I will have to just do an inline edit in Perl and change the file to csv from > within R and then call the read.csv.sql. > > If you have any suggestions to fix 1, I would like to try them. > > system.time(test_df <- read.csv.sql(file="out.txt")) > user system elapsed > 192.53 15.50 213.68 > Warning message: > closing unused connection 3 (out.txt) > > Thanks again. > > Satish > > -----Original Message----- > From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] > Sent: Saturday, February 06, 2010 3:02 PM > To: Vadlamani, Satish {FLNA} > Cc: r-help@r-project.org > Subject: Re: [R] Reading large files > > Note that you can shorten #1 to read.csv.sql("out.txt") since your > other arguments are the default values. > > For the second one, use read.csv.sql, eliminate the arguments that are > defaults anyways (should not cause a problem but its error prone) and > add an explicit eol= argument since SQLite can have problems with end > of line in some cases. Also test out your perl script separately from > R first to ensure that it works: > > test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl > parse_3wkout.pl", eol = "\n") > > SQLite has some known problems with end of line so try it with and > without the eol= argument just in case. When I just made up the > following gawk example I noticed that I did need to specify the eol= > argument. > > Also I have added a complete example using gawk as Example 13c on the > home page just now: > http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql > > > On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA} > <satish.vadlam...@fritolay.com> wrote: >> Gabor: >> >> I had success with the following. >> 1. I created a csv file with a perl script called "out.txt". Then ran the >> following successfully >> library("sqldf") >> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = >> TRUE, sep = ",", dbname = tempfile()) >> >> 2. I did not have success with the following. Could you tell me what I may >> be doing wrong? I could paste the perl script if necessary. From the perl >> script, I am reading the file, creating the csv record and printing each >> record one by one and then exiting. >> >> Thanks. >> >> Not had success with below.. >> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * >> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname >> = tempfile()) >> test_df >> >> Error message below: >> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * >> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname >> = tempfile()) >> Error in readRegistry(key, maxdepth = 3) : >> Registry key 'SOFTWARE\R-core' not found >> In addition: Warning messages: >> 1: closing unused connection 14 (3wkoutstatfcst_small.dat) >> 2: closing unused connection 13 (3wkoutstatfcst_small.dat) >> 3: closing unused connection 11 (3wkoutstatfcst_small.dat) >> 4: closing unused connection 9 (3wkoutstatfcst_small.dat) >> 5: closing unused connection 3 (3wkoutstatfcst_small.dat) >>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * >>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname >>> = tempfile()) >> Error in readRegistry(key, maxdepth = 3) : >> Registry key 'SOFTWARE\R-core' not found >> >> -----Original Message----- >> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] >> Sent: Saturday, February 06, 2010 12:14 PM >> To: Vadlamani, Satish {FLNA} >> Cc: r-help@r-project.org >> Subject: Re: [R] Reading large files >> >> No. >> >> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA} >> <satish.vadlam...@fritolay.com> wrote: >>> Gabor: >>> Can I pass colClasses as a vector to read.csv.sql? Thanks. >>> Satish >>> >>> >>> -----Original Message----- >>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] >>> Sent: Saturday, February 06, 2010 9:41 AM >>> To: Vadlamani, Satish {FLNA} >>> Cc: r-help@r-project.org >>> Subject: Re: [R] Reading large files >>> >>> Its just any Windows batch command string that filters stdin to >>> stdout. What the command consists of should not be important. An >>> invocation of perl that runs a perl script that filters stdin to >>> stdout might look like this: >>> read.csv.sql("myfile.dat", filter = "perl myprog.pl") >>> >>> For an actual example see the source of read.csv2.sql which defaults >>> to using a Windows vbscript program as a filter. >>> >>> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA} >>> <satish.vadlam...@fritolay.com> wrote: >>>> Jim, Gabor: >>>> Thanks so much for the suggestions where I can use read.csv.sql and embed >>>> Perl (or gawk). I just want to mention that I am running on Windows. I am >>>> going to read the documentation the filter argument and see if it can take >>>> a decent sized Perl script and then use its output as input. >>>> >>>> Suppose that I write a Perl script that parses this fwf file and creates a >>>> CSV file. Can I embed this within the read.csv.sql call? Or, can it only >>>> be a statement or something? If you know the answer, please let me know. >>>> Otherwise, I will try a few things and report back the results. >>>> >>>> Thanks again. >>>> Saitsh >>>> >>>> >>>> -----Original Message----- >>>> From: jim holtman [mailto:jholt...@gmail.com] >>>> Sent: Saturday, February 06, 2010 6:16 AM >>>> To: Gabor Grothendieck >>>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org >>>> Subject: Re: [R] Reading large files >>>> >>>> In perl the 'unpack' command makes it very easy to parse fixed fielded >>>> data. >>>> >>>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck >>>> <ggrothendi...@gmail.com> wrote: >>>>> Note that the filter= argument on read.csv.sql can be used to pass the >>>>> input through a filter written in perl, [g]awk or other language. >>>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk") >>>>> >>>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed >>>>> width fields, e.g. >>>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html >>>>> making this very easy but perl or whatever you are most used to would >>>>> be fine too. >>>>> >>>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA} >>>>> <satish.vadlam...@fritolay.com> wrote: >>>>>> Hi Gabor: >>>>>> Thanks. My files are all in fixed width format. They are a lot of them. >>>>>> It would take me some effort to convert them to CSV. I guess this cannot >>>>>> be avoided? I can write some Perl scripts to convert fixed width format >>>>>> to CSV format and then start with your suggestion. Could you let me know >>>>>> your thoughts on the approach? >>>>>> Satish >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com] >>>>>> Sent: Friday, February 05, 2010 5:16 PM >>>>>> To: Vadlamani, Satish {FLNA} >>>>>> Cc: r-help@r-project.org >>>>>> Subject: Re: [R] Reading large files >>>>>> >>>>>> If your problem is just how long it takes to load the file into R try >>>>>> read.csv.sql in the sqldf package. A single read.csv.sql call can >>>>>> create an SQLite database and table layout for you, read the file into >>>>>> the database (without going through R so R can't slow this down), >>>>>> extract all or a portion into R based on the sql argument you give it >>>>>> and then remove the database. See the examples on the home page: >>>>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql >>>>>> >>>>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani >>>>>> <satish.vadlam...@fritolay.com> wrote: >>>>>>> >>>>>>> Matthew: >>>>>>> If it is going to help, here is the explanation. I have an end state in >>>>>>> mind. It is given below under "End State" header. In order to get >>>>>>> there, I >>>>>>> need to start somewhere right? I started with a 850 MB file and could >>>>>>> not >>>>>>> load in what I think is reasonable time (I waited for an hour). >>>>>>> >>>>>>> There are references to 64 bit. How will that help? It is a 4GB RAM >>>>>>> machine >>>>>>> and there is no paging activity when loading the 850 MB file. >>>>>>> >>>>>>> I have seen other threads on the same types of questions. I did not see >>>>>>> any >>>>>>> clear cut answers or errors that I could have been making in the >>>>>>> process. If >>>>>>> I am missing something, please let me know. Thanks. >>>>>>> Satish >>>>>>> >>>>>>> >>>>>>> End State >>>>>>>> Satish wrote: "at one time I will need to load say 15GB into R" >>>>>>> >>>>>>> >>>>>>> ----- >>>>>>> Satish Vadlamani >>>>>>> -- >>>>>>> View this message in context: >>>>>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html >>>>>>> Sent from the R help mailing list archive at Nabble.com. >>>>>>> >>>>>>> ______________________________________________ >>>>>>> R-help@r-project.org mailing list >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>>> PLEASE do read the posting guide >>>>>>> http://www.R-project.org/posting-guide.html >>>>>>> and provide commented, minimal, self-contained, reproducible code. >>>>>>> >>>>>> >>>>> >>>>> ______________________________________________ >>>>> R-help@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>> >>>> >>>> >>>> -- >>>> Jim Holtman >>>> Cincinnati, OH >>>> +1 513 646 9390 >>>> >>>> What is the problem that you are trying to solve? >>>> >>> >> > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.