On Nov 1, 2009, at 8:24 AM, onyourmark wrote:


Hello. The "fields" are separated by a ';'. I think that the data is
"rectangular" in the sense that there are about 15 fields for each row.

There either are 15 fields or there aren't. You can't make a dataframe with an approximate number of fields. In the fragment below there appear to be 14 fields. Try:

# Split each record on ";" to get one character vector of fields per tweet.
twitfrag <- strsplit(c(
"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings 15K Manage Holiday Rush [Black Friday] http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136",
"4927863;05:04:14;28;10;2009;padden;Rachel master chef cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty  success   bored attentions people formerly snubbed you. -Mary Wilson Little #quote;UK;United Kingdom;;;;55.378051;-3.435973",
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight conchords, pleeeeeaaase :) thanks rosie xx;Australia;Australia;;;;-25.274398;133.775136",
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n. kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526",
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care reform  rural America: By Christopher Smart The health-care crisis  .. http://bit.ly/49Iqcu;London;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362",
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters Studio O+A: San Francisco based interior design firm Studio O+A  designed  .. http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), ";")
twitfrag

I think you will see some patterns emerging.
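
One way to check that pattern, and then build the data frame once every record has the same number of fields, is sketched below. It uses only two records copied from the fragment above (line wraps removed). The column names are hypothetical labels guessed from the values; nothing in the thread confirms them.

```r
# Two records copied from the fragment above (mail line wraps removed).
recs <- c(
  "4927863;05:04:14;28;10;2009;padden;Rachel master chef cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
  "4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight conchords, pleeeeeaaase :) thanks rosie xx;Australia;Australia;;;;-25.274398;133.775136"
)

# Split on the literal ";" and count the fields in each record.
parts <- strsplit(recs, ";", fixed = TRUE)
n_fields <- sapply(parts, length)

# rbind() only yields a sensible table if every record has the same length.
stopifnot(length(unique(n_fields)) == 1)

twitDF <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)

# Hypothetical column names, guessed from the values (an assumption).
names(twitDF) <- c("id", "time", "day", "month", "year", "user", "text",
                   "location", "country", "region", "subregion", "unknown",
                   "lat", "lon")

# Coordinates arrive as text; convert them to numbers.
twitDF$lat <- as.numeric(twitDF$lat)
twitDF$lon <- as.numeric(twitDF$lon)
```

Records whose field count differs from the rest (for example, a tweet whose text contains an unescaped ';') would need to be inspected separately before the rbind() step.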

Some of the fields are empty. In the dput() display below, it seems that the rows are delimited by ' " '.
Any idea from this?

They are strings (in our aRgot, objects of type character). That is an effect of whatever processing you have done with components of the tm package, which you have not shared with us in full.
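
Since the tweets are ordinary character strings sitting inside a one-element list, they can probably be pulled out as a plain character vector before any splitting. The sketch below builds a stand-in object in base R whose classes and attributes mimic the str(twitter) output quoted in the original message; the names doc and twitter_like are hypothetical, and the real tm corpus may differ in ways that output does not show.

```r
# Stand-in for the corpus, built in base R so that tm is not required.
# Classes and attributes are copied from the str(twitter) output; this is
# an assumption about the object's shape, not a tm constructor.
doc <- structure(
  c("4927863;05:04:14;28;10;2009;padden;Rachel master chef cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
    "4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight conchords, pleeeeeaaase :) thanks rosie xx;Australia;Australia;;;;-25.274398;133.775136"),
  class = c("PlainTextDocument", "TextDocument", "character"),
  ID = "1",
  Language = "en"
)
twitter_like <- structure(list(doc), class = c("VCorpus", "Corpus", "list"))

# Drop the corpus class, take the single document, drop its class too,
# and let as.character() strip the remaining attributes.
tweets <- as.character(unclass(unclass(twitter_like)[[1]]))
```

The result is one string per tweet with no metadata attached, which is the form strsplit() expects.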


Here is the end of the output for dput(twitter)

The whole point of using dput is to create a complete representation of an object, so posting only the end of its output defeats the purpose.



"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings 15K Manage
Holiday Rush [Black Friday]
http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136",
"4927863;05:04:14;28;10;2009;padden;Rachel  master chef  cook
anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty success bored
attentions  people  formerly snubbed you. -Mary Wilson Little
#quote;UK;United Kingdom;;;;55.378051;-3.435973",
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight conchords,
pleeeeeaaase :) thanks rosie
xx;Australia;Australia;;;;-25.274398;133.775136",
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach las ich jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um 10 n.
kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526",
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health care
reform  rural America: By Christopher Smart The health-care crisis  ..
http://bit.ly/49Iqcu;London;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362",
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters Studio O +A: San
Francisco based interior design firm Studio O+A  designed  ..
http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), Author = character(0), DateTimeStamp = structure(list(sec =
56.4049999713898,
  min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L,
  wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", "min",
"hour", "mday", "mon", "year", "wday", "yday", "isdst"), class = c("POSIXt",
"POSIXlt"), tzone = "GMT"), Description = character(0), Heading =
character(0), ID = "1", Language = "en", LocalMetaData = list(), Origin =
character(0), class = c("PlainTextDocument",
"TextDocument", "character"))), CMetaData = structure(list(NodeID = 0,
  MetaData = structure(list(create_date = structure(list(sec =
56.4059998989105,
      min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L,
      wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec",
  "min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"
  ), class = c("POSIXt", "POSIXlt"), tzone = "GMT"), creator =
structure("", .Names = "LOGNAME")), .Names = c("create_date",
  "creator")), Children = NULL), .Names = c("NodeID", "MetaData",
"Children"), class = "MetaDataNode"), DMetaData = structure(list(
  MetaID = 0), .Names = "MetaID", row.names = c(NA, -1L), class =
"data.frame"), class = c("VCorpus",
"Corpus", "list"))




onyourmark wrote:

Hi. I have a huge list called twitter:

dim(twitter)
NULL
str(twitter)
List of 1
$ :Classes 'PlainTextDocument', 'TextDocument', 'character'  atomic
[1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For Governance From Campaigner-in-chief: President obama jumps campaign
09  tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
12210;10:47:37;20;10;2009;David_Stringer;William Hague heading Washington meets Gen. Jim Jones, Sen. John McCain others. Will Obama team raise
worries  EU ties?;London, England;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362
12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses
wearing thin  Obama, media pals... http://tinyurl.com/yfw6cd9;So.
California;USA;CA;;;36.778261;-119.4179324
12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama
#video;USA;USA;;;;37.09024;-95.712891 ...
.. ..- attr(*, "Author")= chr(0)
.. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31
04:46:56"
.. ..- attr(*, "Description")= chr(0)
.. ..- attr(*, "Heading")= chr(0)
.. ..- attr(*, "ID")= chr "1"
.. ..- attr(*, "Language")= chr "en"
.. ..- attr(*, "LocalMetaData")= list()
.. ..- attr(*, "Origin")= chr(0)
- attr(*, "CMetaData")=List of 3
..$ NodeID  : num 0
..$ MetaData:List of 2
.. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
.. ..$ creator    : Named chr ""
.. .. ..- attr(*, "names")= chr "LOGNAME"
..$ Children: NULL
..- attr(*, "class")= chr "MetaDataNode"
- attr(*, "DMetaData")='data.frame':   1 obs. of  1 variable:
..$ MetaID: num 0
- attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"

It contains tweets but in many languages. The "columns" are separated by
semi-colons. I am using the tm package and it is a "corpus".

It looks like this:

547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1   day
:p;Huddersfield/Lincoln;United
Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
547283;06:37:17;21;10;2009;fabiomafra;alguém traz mais lenha pro
computador da facool? BOM DIA.;Belo Horizonte - MG -
BR;Brazil;MG;;;-19.8157306;-43.9542226
547284;06:37:17;21;10;2009;romanotr;Вау, "Репортеры без границ" опубликовали список стран со свободой слова, из 173 Грузия на 81 месте опережая Украину. Успехи,успехи...;Portugal
Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton &lt\;Someone's
Daughter&gt\;;Kanazawa, Japan;Japan;Ishikawa
Prefecture;;;36.5613254;136.6562051
Error: invalid input
'547286;06:37:18;21;10;2009;Atogey;æ”¯æŒä½ ï¼Œå›½å®¶éœ€è¦ä»– ä»¬ï¼Œä½†æ˜¯å›½å®¶çš„æœªæ ¥ä¸èƒ½é 他们…RT
@zuola ￿我觉得 @wenyunc

I want to convert it to "fields" or columns and so I thought I should
convert it to a dataframe. I tried

twitterDF<-as.data.frame(twitter)
Error in sort.list(y) :
invalid input
'547286;06:37:18;21;10;2009;Atogey;æ”¯æŒä½ ï¼Œå›½å®¶éœ€è¦ä»– ä»¬ï¼Œä½†æ˜¯å›½å®¶çš„æœªæ ¥ä¸èƒ½é 他们…RT
@zuola ￿我觉得 @wenyunchao
ä¸€ç‚¹éƒ½ä¸ä¹è§‚ã€‚çœŸæ £çš„ä¹è§‚åº”è¯¥æ˜¯ï¼šä½ å… ³æˆ‘åˆæ€Žä¹ˆæ ·ï¼Œåæ £æ”¿æ²»æ– —äº‰ä¸ä¼šä¸¢æŽ‰æ€§å‘½ï¼Œè€å å‡ºæ¥åŽæ›´æ˜¯ä¸€æ¡å¥½æ±‰ã€ ‚北风还是舍不得*霸地位㠀è‚‰ã€ä¹¦ã€å ¥³äººå’Œç½‘ç»œçš„ï¼Œä¸è¿‡ç‰ ¢é‡Œä¸ä¼šæä¾›è¿™äº›ã€‚另⠀¦;山西,浙江;China;Zhejiang;;; 28.695035;119.751054'
in 'utf8towcs'
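
An "invalid input ... in 'utf8towcs'" error usually means that some strings are not valid UTF-8 in the current locale. A minimal sketch for locating such strings with base R's validUTF8() follows. The second element is a synthetic string with a deliberately invalid byte, made up for illustration and not taken from the data; the iconv() repair step is a common idiom whose behaviour can vary by platform, so treat it as an assumption to verify locally.

```r
# One valid record prefix from the thread, plus a synthetic string that
# contains the byte 0xFF, which can never appear in valid UTF-8.
good <- "4927863;05:04:14;28;10;2009;padden"
bad  <- rawToChar(as.raw(c(0x61, 0xff, 0x62)))  # made-up example
x <- c(good, bad)

# TRUE for elements that are valid UTF-8, FALSE otherwise.
ok <- validUTF8(x)

# Option 1: keep only the valid strings.
kept <- x[ok]

# Option 2: keep every string, replacing invalid bytes with a visible
# marker. Platform-dependent; check the result on your own system.
repaired <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "byte")
```

If most of the file shows up as invalid or as mojibake, the more likely cause is that it was read with the wrong encoding in the first place, in which case re-reading it with the correct encoding declared is the better fix.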


Can anyone suggest what I can do?

P.S. Actually, I would love to remove all the non-English tweets but I
have no clue about how to do that.
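
A very crude first pass at the P.S. is to keep only tweets made entirely of ASCII bytes. This is a swapped-in heuristic, not language detection: it drops English tweets that contain any accented or typographic character, and it keeps non-English tweets written in plain ASCII (the German example in this thread would pass). The sketch below uses two records from this thread, with the accented letter written as an escape so the code does not depend on the file's encoding.

```r
tweets <- c(
  # Portuguese tweet from the sample above; "\u00e9" is the accented e.
  "547283;06:37:17;21;10;2009;fabiomafra;algu\u00e9m traz mais lenha pro computador da facool? BOM DIA.;Belo Horizonte - MG - BR;Brazil;MG;;;-19.8157306;-43.9542226",
  # English tweet from the fragment, ASCII only.
  "4927863;05:04:14;28;10;2009;padden;Rachel master chef cook anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114"
)

# Byte-wise search for anything outside the 7-bit ASCII range.
# useBytes = TRUE avoids "invalid input" errors on badly encoded strings.
has_non_ascii <- grepl("[\\x80-\\xFF]", tweets, perl = TRUE, useBytes = TRUE)

ascii_only <- tweets[!has_non_ascii]
```

Anything more reliable would need a real language identifier, which is outside what this sketch attempts.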



--
View this message in context: 
http://old.nabble.com/convert-list-to-Dataframe-tp26148889p26148893.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
