On Tue, Apr 13, 2010 at 6:26 PM, Sebastian Kruk <residuo.so...@gmail.com> wrote:
> Dear R-list users:
>
> I would like to import a database of web robots,
> http://www.robotstxt.org/db/all.txt, it's formatted RFC-822, how can
> I do it?

RFC822 looks very much like R's package DESCRIPTION files, and they
are read in using read.dcf because they are conformant to 'Debian
Control File' format. So I tried read.dcf on it:

 > robots = read.dcf("all.txt")
 > dim(robots)
 [1] 298  38

so that's a matrix:

 > dimnames(robots)
 [[1]]
 NULL

 [[2]]
  [1] "robot-id"                  "robot-name"
  [3] "robot-cover-url"           "robot-details-url"
  [5] "robot-owner-name"          "robot-owner-url"
  [7] "robot-owner-email"         "robot-status"
  [9] "robot-purpose"             "robot-type"
 [11] "robot-platform"            "robot-availability"
 [13] "robot-exclusion"           "robot-exclusion-useragent"
 [15] "robot-noindex"             "robot-host"
 [17] "robot-from"                "robot-useragent"
 [19] "robot-language"            "robot-description"
 [21] "robot-history"             "robot-environment"
 [23] "modified-date"             "modified-by"
 [25] "robot-nofollow"            "robot-owner-name2"
 [27] "robot-owner-url2"          "robot-owner-email2"
 [29] "robot-owner-name3"         "robot-owner-name4"
 [31] "robot-environment1"        "robot-environment2"
 [33] "robot-purpose1"            "robot-purpose2"
 [35] "robot-purpose3"            "robot-platform1"
 [37] "robot-description1"        "robot-description2"

and I guess it pads out the columns so every row has every possible
variable value even if it doesn't exist in the record for that robot.

Sorted?

Barry

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
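
To check the guess about padding without downloading all.txt, here is a small sketch. The two records are made up for illustration (only the field names come from the output above), and the temporary file stands in for all.txt. With the default all = FALSE, read.dcf should return a character matrix in which a field missing from a record shows up as NA.

```r
# Two made-up records in the same "field: value" layout;
# a blank line separates one record from the next.
records <- c(
  "robot-id: alpha",
  "robot-name: Alpha Crawler",
  "robot-status: active",
  "",
  "robot-id: beta",
  "robot-name: Beta Spider"
)

# Write them to a temporary file that stands in for all.txt.
tmp <- tempfile(fileext = ".txt")
writeLines(records, tmp)

# One row per record, one column per field seen in any record.
robots <- read.dcf(tmp)
dim(robots)

# The second record has no robot-status field, so that cell is NA.
is.na(robots[2, "robot-status"])

# A data frame is often handier for subsetting than a character matrix.
robots_df <- as.data.frame(robots, stringsAsFactors = FALSE)
robots_df[is.na(robots_df[["robot-status"]]), "robot-id"]
```

All columns come back as character, so dates or other typed fields would still need converting by hand afterwards.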