Re: [R] Dealing With Extremely Large Files

2008-09-30 Thread Gabor Grothendieck
There are no built-in facilities for fixed column widths, but it's not hard to parse out the fields yourself using the SQLite substr function. I've added example 6f to the sqldf home page, which illustrates this: http://sqldf.googlecode.com
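A minimal sketch in the spirit of that example (not the exact code from example 6f), assuming a hypothetical file fixed.dat whose lines hold an id in columns 1-5 and an amount in columns 6-12; the field names, widths, and file.format settings are illustrative:

    library(sqldf)

    # fixed.dat is assumed to have no header; sep is set to a character that
    # never occurs in the data, so each whole line loads as one column, V1,
    # which SQLite's substr(start, length) then splits by position
    fixed <- file("fixed.dat")
    DF <- sqldf("select substr(V1, 1, 5) id, substr(V1, 6, 7) amount from fixed",
                file.format = list(header = FALSE, sep = ";"))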

Re: [R] Dealing With Extremely Large Files

2008-09-30 Thread zerfetzen
Thank you Gabor, this is fantastic, easy to use, and so powerful. I was instantly able to do many things with .csv files that are much too large for my PC's memory. This is clearly my new favorite way to read in data; I love it! Is it possible to use sqldf with a fixed width format that requires a
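For reference, a sketch of the kind of use described above, assuming a hypothetical big.csv with a grouping column grp and a numeric column x; only the small aggregated result comes back into R:

    library(sqldf)

    # big.csv is never loaded into R as a whole; SQLite does the grouping
    big <- file("big.csv")
    means <- sqldf("select grp, avg(x) as mean_x from big group by grp",
                   file.format = list(header = TRUE, sep = ","))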

Re: [R] Dealing With Extremely Large Files

2008-09-26 Thread Gabor Grothendieck
Not sure if it applies to your file or not, but if it does, the sqldf package facilitates reading a large file into an SQLite database. It's a front end to RSQLite, which is in turn a front end to SQLite, and it reads the data straight into the database without going through R, so R does not limit it in any way.
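A sketch of that workflow, again assuming a hypothetical big.csv; the dbname argument points sqldf at an on-disk SQLite file rather than the default in-memory database, so the import is not constrained by R's memory:

    library(sqldf)

    # the table is built inside big.db on disk, not in R's workspace;
    # only the one-row count comes back into R
    bigfile <- file("big.csv")
    n <- sqldf("select count(*) from bigfile",
               dbname = "big.db",
               file.format = list(header = TRUE, sep = ","))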

Re: [R] Dealing With Extremely Large Files

2008-09-26 Thread jim holtman
You can always set up a "connection" and then read in the number of lines you need for the analysis, write out the results, and then read in the next ones. I have also used 'filehash' to initially read in portions of a file and then write the objects into the database. These are quickly retrieved if needed.
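A minimal sketch of the connection approach, assuming a hypothetical file big.txt and a chunk size of 100,000 lines:

    con <- file("big.txt", open = "r")
    repeat {
      chunk <- readLines(con, n = 100000)
      if (length(chunk) == 0) break
      # ... analyze this chunk and write out its results here ...
    }
    close(con)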

Re: [R] Dealing With Extremely Large Files

2008-09-26 Thread Charles C. Berry
Try RSiteSearch("biglm") for some threads that discuss strategies for analyzing big datasets. HTH, Chuck
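For instance, biglm fits a linear model in bounded memory by updating the fit one chunk at a time; a sketch, assuming hypothetical data frames chunk1, chunk2, ... each holding columns y, x1, and x2:

    library(biglm)

    # fit on the first chunk, then fold in the rest with update()
    fit <- biglm(y ~ x1 + x2, data = chunk1)
    fit <- update(fit, chunk2)   # repeat for each remaining chunk
    summary(fit)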

[R] Dealing With Extremely Large Files

2008-09-26 Thread zerfetzen
Hi, I'm sure that a large fixed width file, such as 300 million rows and 1,000 columns, is too large for R to handle on a PC, but are there ways to deal with it? For example, is there a way to combine some sampling method with read.fwf so that you can read in a sample of 100,000 records, for example?
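One way to do what is being asked, as a sketch assuming a hypothetical file huge.fwf, illustrative field widths, and a 0.1% sampling rate: read the raw lines in blocks, keep a random subset, and parse only the kept lines with read.fwf:

    w <- c(5, 7, 3)                      # assumed field widths
    con <- file("huge.fwf", open = "r")
    kept <- character(0)
    repeat {
      block <- readLines(con, n = 100000)
      if (length(block) == 0) break
      # keep roughly 0.1% of each block of raw lines
      kept <- c(kept, sample(block, size = ceiling(length(block) * 0.001)))
    }
    close(con)
    samp <- read.fwf(textConnection(kept), widths = w)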