I have uploaded another version which suppresses display of the error message but otherwise works the same. Omitting the redundant arguments we have:
library(sqldf)
# next line is only needed once per session to read in devel version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl parse_3wkout.pl")

On Sat, Feb 6, 2010 at 5:48 PM, Vadlamani, Satish {FLNA}
<satish.vadlam...@fritolay.com> wrote:
> Gabor:
> Please see the results below. Sourcing your new R script worked (although
> with the same error message). If I put in the eol="\n" option, it adds a
> "\r" to the last column, so I took the eol option out below. This is just
> some more feedback to you.
>
> I am thinking that I will just do an inline edit in Perl (that is, create
> the csv file through Perl by overwriting the current file) and then use
> read.csv.sql without the filter= option. This seems to be more tried and
> tested. If you have any suggestions, please let me know. Thanks.
> Satish
>
>
> BEFORE SOURCING YOUR NEW R SCRIPT
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select *
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>   Registry key 'SOFTWARE\R-core' not found
>> test_df
> Error: object 'test_df' not found
>
> AFTER SOURCING YOUR NEW R SCRIPT
>> source("f:/dp_modeling_team/downloads/R/sqldf.R")
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", sql = "select *
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl")
> Error in readRegistry(key, maxdepth = 3) :
>   Registry key 'SOFTWARE\R-core' not found
> In addition: Warning messages:
> 1: closing unused connection 5 (3wkoutstatfcst_small.dat)
> 2: closing unused connection 4 (3wkoutstatfcst_small.dat)
> 3: closing unused connection 3 (3wkoutstatfcst_small.dat)
>> test_df
>   allgeo area1 zone dist ccust1 whse bindc ccust2 account area2 ccust3
> 1      A     4    1   37     99 4925  4925     99      99     4     99
> 2      A     4    1   37     99 4925  4925     99      99     4     99
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 4:28 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> The software attempts to read the registry and temporarily augment the
> path in case you have Rtools installed so that the filter can access
> all the tools that Rtools provides. I am not sure why it's failing on
> your system, but there are evidently some differences between systems
> here, and I have added some code to trap and bypass that portion in
> case it fails. I have added the new version to the svn repository, so
> try this:
>
> library(sqldf)
> # overwrite with development version
> source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
> # your code to call read.csv.sql
>
>
> On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
> <satish.vadlam...@fritolay.com> wrote:
>>
>> Gabor:
>> Here is the update. As you can see, I got the same error as below in 1.
>>
>> 1. Error
>> test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file",
>> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
>> Error in readRegistry(key, maxdepth = 3) :
>>   Registry key 'SOFTWARE\R-core' not found
>>
>> 2. But the loading of the bigger file was successful, as you can see
>> below: 857 MB, 333,250 rows, 227 columns. This is good.
>>
>> I will have to just do an inline edit in Perl to change the file to csv
>> from within R and then call read.csv.sql.
>>
>> If you have any suggestions to fix 1, I would like to try them.
>>
>> system.time(test_df <- read.csv.sql(file="out.txt"))
>>    user  system elapsed
>>  192.53   15.50  213.68
>> Warning message:
>> closing unused connection 3 (out.txt)
>>
>> Thanks again.
>>
>> Satish
>>
>> -----Original Message-----
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 3:02 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> Note that you can shorten #1 to read.csv.sql("out.txt") since your
>> other arguments are the default values.
>>
>> For the second one, use read.csv.sql and eliminate the arguments that
>> are defaults anyway (they should not cause a problem, but they are error
>> prone), and add an explicit eol= argument, since SQLite can have
>> problems with end of line in some cases. Also test your perl script
>> separately from R first to ensure that it works:
>>
>> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat", filter="perl
>> parse_3wkout.pl", eol = "\n")
>>
>> SQLite has some known problems with end of line, so try it with and
>> without the eol= argument just in case. When I made up the gawk example
>> mentioned below, I noticed that I did need to specify the eol=
>> argument.
>>
>> Also, I have just added a complete example using gawk as Example 13c on
>> the home page:
>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>
>>
>> On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
>> <satish.vadlam...@fritolay.com> wrote:
>>> Gabor:
>>>
>>> I had success with the following.
>>> 1. I created a csv file called "out.txt" with a perl script, then ran
>>> the following successfully:
>>> library("sqldf")
>>> test_df <- read.csv.sql(file="out.txt", sql = "select * from file",
>>> header = TRUE, sep = ",", dbname = tempfile())
>>>
>>> 2. I did not have success with the following. Could you tell me what I
>>> may be doing wrong? I could paste the perl script if necessary. From
>>> the perl script, I am reading the file, creating the csv record and
>>> printing each record one by one and then exiting.
>>>
>>> Thanks.
>>>
>>> I have not had success with the code below.
>>> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select *
>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl",
>>> dbname = tempfile())
>>> test_df
>>>
>>> Error message below:
>>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select *
>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl",
>>> dbname = tempfile())
>>> Error in readRegistry(key, maxdepth = 3) :
>>>   Registry key 'SOFTWARE\R-core' not found
>>> In addition: Warning messages:
>>> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
>>> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
>>> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
>>> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
>>> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>>>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select *
>>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl",
>>>> dbname = tempfile())
>>> Error in readRegistry(key, maxdepth = 3) :
>>>   Registry key 'SOFTWARE\R-core' not found
>>>
>>> -----Original Message-----
>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>> Sent: Saturday, February 06, 2010 12:14 PM
>>> To: Vadlamani, Satish {FLNA}
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Reading large files
>>>
>>> No.
>>>
>>> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
>>> <satish.vadlam...@fritolay.com> wrote:
>>>> Gabor:
>>>> Can I pass colClasses as a vector to read.csv.sql? Thanks.
>>>> Satish
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>>> Sent: Saturday, February 06, 2010 9:41 AM
>>>> To: Vadlamani, Satish {FLNA}
>>>> Cc: r-help@r-project.org
>>>> Subject: Re: [R] Reading large files
>>>>
>>>> It's just any Windows batch command string that filters stdin to
>>>> stdout. What the command consists of should not be important.
>>>> An invocation of perl that runs a perl script that filters stdin to
>>>> stdout might look like this:
>>>> read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>>>>
>>>> For an actual example see the source of read.csv2.sql, which defaults
>>>> to using a Windows vbscript program as a filter.
>>>>
>>>> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>>>> <satish.vadlam...@fritolay.com> wrote:
>>>>> Jim, Gabor:
>>>>> Thanks so much for the suggestions that I can use read.csv.sql and
>>>>> embed Perl (or gawk). I just want to mention that I am running on
>>>>> Windows. I am going to read the documentation for the filter argument
>>>>> and see if it can take a decent sized Perl script and then use its
>>>>> output as input.
>>>>>
>>>>> Suppose that I write a Perl script that parses this fwf file and
>>>>> creates a CSV file. Can I embed this within the read.csv.sql call?
>>>>> Or can it only be a statement or something? If you know the answer,
>>>>> please let me know. Otherwise, I will try a few things and report
>>>>> back the results.
>>>>>
>>>>> Thanks again.
>>>>> Satish
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: jim holtman [mailto:jholt...@gmail.com]
>>>>> Sent: Saturday, February 06, 2010 6:16 AM
>>>>> To: Gabor Grothendieck
>>>>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>>>>> Subject: Re: [R] Reading large files
>>>>>
>>>>> In perl the 'unpack' command makes it very easy to parse fixed-field
>>>>> data.
>>>>>
>>>>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>>>>> <ggrothendi...@gmail.com> wrote:
>>>>>> Note that the filter= argument of read.csv.sql can be used to pass
>>>>>> the input through a filter written in perl, [g]awk or another
>>>>>> language. For example:
>>>>>> read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>>>>
>>>>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>>>>>> width fields, e.g.
>>>>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>>>>>> making this very easy, but perl or whatever you are most used to
>>>>>> would be fine too.
>>>>>>
>>>>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>>>>> <satish.vadlam...@fritolay.com> wrote:
>>>>>>> Hi Gabor:
>>>>>>> Thanks. My files are all in fixed width format, and there are a lot
>>>>>>> of them. It would take me some effort to convert them to CSV. I
>>>>>>> guess this cannot be avoided? I can write some Perl scripts to
>>>>>>> convert fixed width format to CSV format and then start with your
>>>>>>> suggestion. Could you let me know your thoughts on the approach?
>>>>>>> Satish
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>>>>>> Sent: Friday, February 05, 2010 5:16 PM
>>>>>>> To: Vadlamani, Satish {FLNA}
>>>>>>> Cc: r-help@r-project.org
>>>>>>> Subject: Re: [R] Reading large files
>>>>>>>
>>>>>>> If your problem is just how long it takes to load the file into R,
>>>>>>> try read.csv.sql in the sqldf package. A single read.csv.sql call
>>>>>>> can create an SQLite database and table layout for you, read the
>>>>>>> file into the database (without going through R, so R can't slow
>>>>>>> this down), extract all or a portion into R based on the sql
>>>>>>> argument you give it, and then remove the database. See the
>>>>>>> examples on the home page:
>>>>>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>>>>>>
>>>>>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>>>>>> <satish.vadlam...@fritolay.com> wrote:
>>>>>>>>
>>>>>>>> Matthew:
>>>>>>>> If it is going to help, here is the explanation. I have an end
>>>>>>>> state in mind; it is given below under the "End State" header. In
>>>>>>>> order to get there, I need to start somewhere, right? I started
>>>>>>>> with an 850 MB file and could not load it in what I think is a
>>>>>>>> reasonable time (I waited for an hour).
>>>>>>>>
>>>>>>>> There are references to 64 bit. How will that help? It is a 4 GB
>>>>>>>> RAM machine, and there is no paging activity when loading the
>>>>>>>> 850 MB file.
>>>>>>>>
>>>>>>>> I have seen other threads on the same types of questions. I did
>>>>>>>> not see any clear-cut answers or errors that I could have been
>>>>>>>> making in the process. If I am missing something, please let me
>>>>>>>> know. Thanks.
>>>>>>>> Satish
>>>>>>>>
>>>>>>>>
>>>>>>>> End State
>>>>>>>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>>>>>>>
>>>>>>>>
>>>>>>>> -----
>>>>>>>> Satish Vadlamani
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>>>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jim Holtman
>>>>> Cincinnati, OH
>>>>> +1 513 646 9390
>>>>>
>>>>> What is the problem that you are trying to solve?
>>>>>
>>>>
>>>
>>
>
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
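[Editor's note: the whole thread revolves around a filter script that turns fixed-width records into CSV on stdin/stdout, so that read.csv.sql(filter=...) can load the result. The actual parse_3wkout.pl never appears in the thread, so as a rough sketch of what such a filter does, here is a Python equivalent. The field widths, sample data, and the file name fwf2csv.py are made-up placeholders, not taken from the thread.]

```python
import csv
import io
import sys

# Hypothetical column widths; in practice these would come from the
# actual fixed-width layout of the .dat file (not shown in the thread).
WIDTHS = [6, 5, 4, 4]

def split_fixed(line, widths):
    """Slice one fixed-width record into whitespace-trimmed fields."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return fields

def fwf_to_csv(src, dst, widths=WIDTHS):
    """Read fixed-width records from src and write CSV rows to dst.

    lineterminator="\n" avoids the stray "\r" on the last column that
    Satish reported when line endings were handled inconsistently.
    """
    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        line = line.rstrip("\r\n")
        if line:
            writer.writerow(split_fixed(line, widths))

if __name__ == "__main__":
    # Demo on an in-memory record; a real filter script would instead
    # call fwf_to_csv(sys.stdin, sys.stdout) so it works as a pipe.
    fwf_to_csv(io.StringIO("ABC   12345X   7   \n"), sys.stdout)
```

Saved as a stdin-to-stdout script (say fwf2csv.py), it would be invoked the same way as the Perl filter discussed above: read.csv.sql("myfile.dat", filter = "python fwf2csv.py").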