Re: [R] Reading large files

Gabor Grothendieck Sat, 06 Feb 2010 14:29:53 -0800

The software attempts to read the registry and temporarily augment the
path in case you have Rtools installed so that the filter can access
all the tools that Rtools provides.  I am not sure why it's failing on
your system, but there are evidently some differences between systems
here, and I have added some code to trap and bypass that portion in
case it fails.  I have added the new version to the svn repository, so
try this:


library(sqldf)
# overwrite with development version
source("http://sqldf.googlecode.com/svn/trunk/R/sqldf.R")
# your code to call read.csv.sql
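
For anyone wondering what "trap and bypass" might look like in practice, here is a minimal sketch. It is not the code in the svn repository. The helper names find_rtools_path and with_rtools_path, the Rtools$InstallPath key layout and the "bin" subdirectory are all assumptions made for illustration.

```r
# Hypothetical helper, not part of sqldf: look up the Rtools install
# directory in the Windows registry, returning NULL if the lookup fails.
find_rtools_path <- function(key = "SOFTWARE\\R-core") {
  tryCatch({
    # utils::readRegistry only exists on Windows; elsewhere this line
    # raises an error that the handler below turns into NULL.
    reg <- utils::readRegistry(key, maxdepth = 3)
    reg$Rtools$InstallPath   # assumed key layout
  }, error = function(e) NULL)
}

# Hypothetical wrapper: evaluate expr with the Rtools bin directory
# prepended to PATH when it was found, restoring PATH afterwards.
with_rtools_path <- function(expr) {
  rtools <- find_rtools_path()
  if (!is.null(rtools)) {
    old_path <- Sys.getenv("PATH")
    on.exit(Sys.setenv(PATH = old_path), add = TRUE)
    Sys.setenv(PATH = paste(file.path(rtools, "bin"), old_path,
                            sep = .Platform$path.sep))
  }
  force(expr)  # lazily evaluated here, i.e. after PATH has been set
}
```

With something like this, a missing 'SOFTWARE\R-core' key just means the filter runs with the unmodified path, e.g. with_rtools_path(read.csv.sql("out_small.txt", filter = "perl parse_3wkout.pl", eol = "\n")).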


On Sat, Feb 6, 2010 at 5:18 PM, Vadlamani, Satish {FLNA}
<satish.vadlam...@fritolay.com> wrote:
>
> Gabor:
> Here is the update. As you can see, I got the same error as below in 1.
>
> 1. Error
>  test_df <- read.csv.sql(file="out_small.txt", sql = "select * from file", 
> header = TRUE, sep = ",", filter="perl parse_3wkout.pl", eol="\n")
> Error in readRegistry(key, maxdepth = 3) :
>  Registry key 'SOFTWARE\R-core' not found
>
> 2. But the loading of the bigger file was successful as you can see below. 
> 857 MB, 333,250 rows, 227 columns. This is good.
>
> I will have to just do an inline edit in Perl and change the file to csv from 
> within R and then call the read.csv.sql.
>
> If you have any suggestions to fix 1, I would like to try them.
>
>  system.time(test_df <- read.csv.sql(file="out.txt"))
>   user  system elapsed
>  192.53   15.50  213.68
> Warning message:
> closing unused connection 3 (out.txt)
>
> Thanks again.
>
> Satish
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 3:02 PM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> Note that you can shorten #1 to read.csv.sql("out.txt") since your
> other arguments are the default values.
>
> For the second one, use read.csv.sql, eliminate the arguments that are
> defaults anyway (this should not cause a problem, but it's error prone) and
> add an explicit eol= argument since SQLite can have problems with end
> of line in some cases.  Also test out your perl script separately from
> R first to ensure that it works:
>
> test_df <- read.csv.sql(file="3wkoutstatfcst_small.dat",
>   filter="perl parse_3wkout.pl", eol = "\n")
>
> SQLite has some known problems with end of line so try it with and
> without the eol= argument just in case.  When I just made up the
> following gawk example I noticed that I did need to specify the eol=
> argument.
>
> Also I have added a complete example using gawk as Example 13c on the
> home page just now:
> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>
>
> On Sat, Feb 6, 2010 at 3:52 PM, Vadlamani, Satish {FLNA}
> <satish.vadlam...@fritolay.com> wrote:
>> Gabor:
>>
>> I had success with the following.
>> 1. I created a csv file with a perl script called "out.txt". Then ran the 
>> following successfully
>> library("sqldf")
>> test_df <- read.csv.sql(file="out.txt", sql = "select * from file", header = 
>> TRUE, sep = ",", dbname = tempfile())
>>
>> 2. I did not have success with the following. Could you tell me what I may 
>> be doing wrong? I could paste the perl script if necessary. From the perl 
>> script, I am reading the file, creating the csv record and printing each 
>> record one by one and then exiting.
>>
>> Thanks.
>>
>> I did not have success with the call below:
>> #test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
>> test_df
>>
>> Error message below:
>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>> = tempfile())
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>> In addition: Warning messages:
>> 1: closing unused connection 14 (3wkoutstatfcst_small.dat)
>> 2: closing unused connection 13 (3wkoutstatfcst_small.dat)
>> 3: closing unused connection 11 (3wkoutstatfcst_small.dat)
>> 4: closing unused connection 9 (3wkoutstatfcst_small.dat)
>> 5: closing unused connection 3 (3wkoutstatfcst_small.dat)
>>> test_df <- read.csv2.sql(file="3wkoutstatfcst_small.dat", sql = "select * 
>>> from file", header = TRUE, sep = ",", filter="perl parse_3wkout.pl", dbname 
>>> = tempfile())
>> Error in readRegistry(key, maxdepth = 3) :
>>  Registry key 'SOFTWARE\R-core' not found
>>
>> -----Original Message-----
>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>> Sent: Saturday, February 06, 2010 12:14 PM
>> To: Vadlamani, Satish {FLNA}
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> No.
>>
>> On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
>> <satish.vadlam...@fritolay.com> wrote:
>>> Gabor:
>>> Can I pass colClasses as a vector to read.csv.sql? Thanks.
>>> Satish
>>>
>>>
>>> -----Original Message-----
>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>> Sent: Saturday, February 06, 2010 9:41 AM
>>> To: Vadlamani, Satish {FLNA}
>>> Cc: r-help@r-project.org
>>> Subject: Re: [R] Reading large files
>>>
>>> It's just any Windows batch command string that filters stdin to
>>> stdout.  What the command consists of should not be important.  An
>>> invocation of perl that runs a perl script that filters stdin to
>>> stdout might look like this:
>>>  read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>>>
>>> For an actual example see the source of read.csv2.sql which defaults
>>> to using a Windows vbscript program as a filter.
>>>
>>> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
>>> <satish.vadlam...@fritolay.com> wrote:
>>>> Jim, Gabor:
>>>> Thanks so much for the suggestions where I can use read.csv.sql and embed
>>>> Perl (or gawk). I just want to mention that I am running on Windows. I am
>>>> going to read the documentation on the filter argument and see if it can
>>>> take a decent-sized Perl script and then use its output as input.
>>>>
>>>> Suppose that I write a Perl script that parses this fwf file and creates a 
>>>> CSV file. Can I embed this within the read.csv.sql call? Or, can it only 
>>>> be a statement or something? If you know the answer, please let me know. 
>>>> Otherwise, I will try a few things and report back the results.
>>>>
>>>> Thanks again.
>>>> Satish
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: jim holtman [mailto:jholt...@gmail.com]
>>>> Sent: Saturday, February 06, 2010 6:16 AM
>>>> To: Gabor Grothendieck
>>>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>>>> Subject: Re: [R] Reading large files
>>>>
>>>> In perl the 'unpack' command makes it very easy to parse fixed-field
>>>> data.
>>>>
>>>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>>>> <ggrothendi...@gmail.com> wrote:
>>>>> Note that the filter= argument on read.csv.sql can be used to pass the
>>>>> input through a filter written in perl, [g]awk or other language.
>>>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>>>
>>>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed
>>>>> width fields, e.g.
>>>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>>>>> making this very easy but perl or whatever you are most used to would
>>>>> be fine too.
>>>>>
>>>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>>>> <satish.vadlam...@fritolay.com> wrote:
>>>>>> Hi Gabor:
>>>>>> Thanks. My files are all in fixed width format. There are a lot of them.
>>>>>> It would take me some effort to convert them to CSV. I guess this cannot 
>>>>>> be avoided? I can write some Perl scripts to convert fixed width format 
>>>>>> to CSV format and then start with your suggestion. Could you let me know 
>>>>>> your thoughts on the approach?
>>>>>> Satish
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>>>>> Sent: Friday, February 05, 2010 5:16 PM
>>>>>> To: Vadlamani, Satish {FLNA}
>>>>>> Cc: r-help@r-project.org
>>>>>> Subject: Re: [R] Reading large files
>>>>>>
>>>>>> If your problem is just how long it takes to load the file into R try
>>>>>> read.csv.sql in the sqldf package.  A single read.csv.sql call can
>>>>>> create an SQLite database and table layout for you, read the file into
>>>>>> the database (without going through R so R can't slow this down),
>>>>>> extract all or a portion into R based on the sql argument you give it
>>>>>> and then remove the database.  See the examples on the home page:
>>>>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>>>>>
>>>>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>>>>> <satish.vadlam...@fritolay.com> wrote:
>>>>>>>
>>>>>>> Matthew:
>>>>>>> If it is going to help, here is the explanation. I have an end state in
>>>>>>> mind. It is given below under the "End State" header. In order to get
>>>>>>> there, I need to start somewhere, right? I started with an 850 MB file
>>>>>>> and could not load it in what I think is a reasonable time (I waited for
>>>>>>> an hour).
>>>>>>>
>>>>>>> There are references to 64 bit. How will that help? It is a 4GB RAM
>>>>>>> machine and there is no paging activity when loading the 850 MB file.
>>>>>>>
>>>>>>> I have seen other threads on the same types of questions. I did not see
>>>>>>> any clear-cut answers or errors that I could have been making in the
>>>>>>> process. If I am missing something, please let me know. Thanks.
>>>>>>> Satish
>>>>>>>
>>>>>>>
>>>>>>> End State
>>>>>>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>>>>>>
>>>>>>>
>>>>>>> -----
>>>>>>> Satish Vadlamani
>>>>>>> --
>>>>>>> View this message in context: 
>>>>>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>>>
>>>>>>> ______________________________________________
>>>>>>> R-help@r-project.org mailing list
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>> PLEASE do read the posting guide 
>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jim Holtman
>>>> Cincinnati, OH
>>>> +1 513 646 9390
>>>>
>>>> What is the problem that you are trying to solve?
>>>>
>>>
>>
>
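
The quoted thread suggests perl or gawk for turning fixed-width records into CSV inside the filter= argument, and notes that whatever language you are most used to would be fine too. As an illustration only, here is a minimal sketch of such a stdin-to-stdout filter written in R itself. The function names fwf_to_csv and fwf_filter and the example widths are hypothetical, and the sketch assumes that no field contains a comma or a double quote.

```r
# Hypothetical helper: cut each line into fixed-width fields, trim the
# padding and join the fields with commas.  Assumes no field contains
# a comma or a double quote.
fwf_to_csv <- function(lines, widths) {
  ends   <- cumsum(widths)
  starts <- ends - widths + 1
  vapply(lines, function(ln) {
    paste(trimws(substring(ln, starts, ends)), collapse = ",")
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical driver: copy `input` to `output` in chunks so the whole
# file never has to be held in memory at once.
fwf_filter <- function(input, output, widths, chunk_size = 10000L) {
  if (!isOpen(input)) {
    open(input, "r")
    on.exit(close(input), add = TRUE)
  }
  repeat {
    chunk <- readLines(input, n = chunk_size)
    if (length(chunk) == 0L) break
    writeLines(fwf_to_csv(chunk, widths), output)
  }
  invisible(NULL)
}

# Small demonstration on in-memory data with two 5-character fields.
demo <- textConnection(c("abc  12345", "de   67   "))
fwf_filter(demo, stdout(), widths = c(5, 5))
close(demo)
```

Saved as a script (say fwf2csv.R, a made-up name) that ends with fwf_filter(file("stdin"), stdout(), widths) instead of the demonstration, it could be tried as filter = "Rscript fwf2csv.R". Since read.csv.sql defaults to header = TRUE, the script would also need to write a header line first, or the call would need header = FALSE.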

