"4927861;05:04:14;28;10;2009;HOYTSTHEATRES;GameStop Brings 15K
Manage
Holiday Rush [Black Friday]
http://bit.ly/2d3OJg;Australia;Australia;;;;-25.274398;133.775136",
"4927863;05:04:14;28;10;2009;padden;Rachel master chef cook
anytime!;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
"4927878;05:04:17;28;10;2009;GSpotMagazine;The penalty success
bored
attentions people formerly snubbed you. -Mary Wilson Little
#quote;UK;United Kingdom;;;;55.378051;-3.435973",
"4927885;05:04:20;28;10;2009;super_assassin;@triplejsr flight
conchords,
pleeeeeaaase :) thanks rosie
xx;Australia;Australia;;;;-25.274398;133.775136",
"4927893;05:04:21;28;10;2009;SLMFE;Gestern:Achso,ja okey,um 5 nach
las ich
jemanden komen der dir die Akupunkturnadel(zb 5!im Ohr!)entfernt..Um
10 n.
kommt immer noch keiner..;Germany;Germany;;;;51.165691;10.451526",
"4927901;05:04:23;28;10;2009;mikesemple;HHS Secretary pushes health
care
reform rural America: By Christopher Smart The health-care crisis ..
http://bit.ly/49Iqcu;London;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362",
"4927913;05:04:26;28;10;2009;coax_k;Facebook Headquarters Studio O
+A: San
Francisco based interior design firm Studio O+A designed ..
http://bit.ly/hdqWp;Sydney;Australia;NSW;;;-33.867139;151.207114"
), Author = character(0),
DateTimeStamp = structure(list(sec = 56.4049999713898,
min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L,
wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec", "min",
"hour", "mday", "mon", "year", "wday", "yday", "isdst"),
class = c("POSIXt", "POSIXlt"), tzone = "GMT"),
Description = character(0), Heading = character(0), ID = "1",
Language = "en", LocalMetaData = list(), Origin = character(0),
class = c("PlainTextDocument", "TextDocument", "character"))),
CMetaData = structure(list(NodeID = 0,
MetaData = structure(list(create_date = structure(list(sec = 56.4059998989105,
min = 46L, hour = 4L, mday = 31L, mon = 9L, year = 109L,
wday = 6L, yday = 303L, isdst = 0L), .Names = c("sec",
"min", "hour", "mday", "mon", "year", "wday", "yday", "isdst"),
class = c("POSIXt", "POSIXlt"), tzone = "GMT"),
creator = structure("", .Names = "LOGNAME")),
.Names = c("create_date", "creator")), Children = NULL),
.Names = c("NodeID", "MetaData", "Children"), class = "MetaDataNode"),
DMetaData = structure(list(MetaID = 0), .Names = "MetaID",
row.names = c(NA, -1L), class = "data.frame"),
class = c("VCorpus", "Corpus", "list"))
onyourmark wrote:
Hi. I have a huge list called twitter:
dim(twitter)
NULL
str(twitter)
List of 1
$ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic [1:35575]
11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For Governance From Campaigner-in-chief: President obama jumps campaign 09 tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
12210;10:47:37;20;10;2009;David_Stringer;William Hague heading Washington meets Gen. Jim Jones, Sen. John McCain others. Will Obama team raise worries EU ties?;London, England;United Kingdom;Greater London;Westminster;;51.5001524;-0.1262362
12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing thin Obama, media pals... http://tinyurl.com/yfw6cd9;So. California;USA;CA;;;36.778261;-119.4179324
12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama #video;USA;USA;;;;37.09024;-95.712891 ...
.. ..- attr(*, "Author")= chr(0)
.. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31 04:46:56"
.. ..- attr(*, "Description")= chr(0)
.. ..- attr(*, "Heading")= chr(0)
.. ..- attr(*, "ID")= chr "1"
.. ..- attr(*, "Language")= chr "en"
.. ..- attr(*, "LocalMetaData")= list()
.. ..- attr(*, "Origin")= chr(0)
- attr(*, "CMetaData")=List of 3
..$ NodeID : num 0
..$ MetaData:List of 2
.. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
.. ..$ creator : Named chr ""
.. .. ..- attr(*, "names")= chr "LOGNAME"
..$ Children: NULL
..- attr(*, "class")= chr "MetaDataNode"
- attr(*, "DMetaData")='data.frame': 1 obs. of 1 variable:
..$ MetaID: num 0
- attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"
It contains tweets in many languages. The "columns" are separated by
semicolons. I am using the tm package, and the object is a "corpus".
It looks like this:
547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1 day :p;Huddersfield/Lincoln;United Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
547283;06:37:17;21;10;2009;fabiomafra;alguém traz mais lenha pro computador da facool? BOM DIA.;Belo Horizonte - MG - BR;Brazil;MG;;;-19.8157306;-43.9542226
547284;06:37:17;21;10;2009;romanotr;Вау, "Репортеры без границ" опубликовали список стран со свободой слова, из 173 Грузия на 81 месте опережая Украину. Успехи,успехи...;Portugal Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton <\;Someone's Daughter>\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051
Error: invalid input
'547286;06:37:18;21;10;2009;Atogey;支æŒä½
,国家需è¦ä»–
们,但是国家的未æ
¥ä¸èƒ½é 他们…RT
@zuola ￿我觉得 @wenyunc
I want to split it into "fields" or columns, so I thought I should
convert it to a data frame. I tried:
twitterDF <- as.data.frame(twitter)
Error in sort.list(y) :
invalid input
'547286;06:37:18;21;10;2009;Atogey;支æŒä½
,国家需è¦ä»–
们,但是国家的未æ
¥ä¸èƒ½é 他们…RT
@zuola ￿我觉得 @wenyunchao
一点都ä¸ä¹è§‚。真æ
£çš„ä¹è§‚åº”è¯¥æ˜¯ï¼šä½ å…
³æˆ‘åˆæ€Žä¹ˆæ ·ï¼Œåæ £æ”¿æ²»æ–
—争ä¸ä¼šä¸¢æŽ‰æ€§å‘½ï¼Œè€å
出æ¥åŽæ›´æ˜¯ä¸€æ¡å¥½æ±‰ã€
‚北风还是èˆä¸å¾—*霸地ä½ã
€è‚‰ã€ä¹¦ã€å
¥³äººå’Œç½‘络的,ä¸è¿‡ç‰
¢é‡Œä¸ä¼šæä¾›è¿™äº›ã€‚å¦â
€¦;山西,浙江;China;Zhejiang;;;
28.695035;119.751054'
in 'utf8towcs'
Can anyone suggest what I can do?
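One possible approach, sketched here under several assumptions: the 'utf8towcs' error suggests some records contain bytes that are not valid UTF-8, so the sketch first drops such bytes with iconv(..., sub = ""), then splits each record on semicolons that are not escaped as "\;" (as in the Beth Orton tweet above). The sample records, the function name parse_tweets, and all column names are hypothetical. The posted data has 14 unnamed fields per record, and the real lines would presumably come from something like as.character(twitter[[1]]).

```r
# Hypothetical stand-in for the real data; the actual records would
# presumably be obtained with something like as.character(twitter[[1]]).
bad_byte <- rawToChar(as.raw(0xe6))  # a lone byte that is not valid UTF-8
raw_lines <- c(
  "1;05:04:14;28;10;2009;userA;hello world;Sydney, Australia;Australia;NSW;;;-33.867139;151.207114",
  "2;06:37:18;21;10;2009;userB;Playing: A <\\;B>\\;;Kanazawa, Japan;Japan;Ishikawa Prefecture;;;36.5613254;136.6562051",
  paste0("3;06:37:18;21;10;2009;userC;hi ", bad_byte,
         ";Zhejiang;China;Zhejiang;;;28.695035;119.751054")
)

# Column names are guesses; the post does not name the fields.
field_names <- c("id", "time", "day", "month", "year", "user", "text",
                 "location", "country", "region", "subregion", "unknown",
                 "lat", "lon")

parse_tweets <- function(x, fields = field_names) {
  # Drop bytes that are not valid UTF-8 instead of failing on them.
  clean <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
  # Split on ";" unless it is preceded by a backslash (escaped "\;").
  parts <- strsplit(clean, "(?<!\\\\);", perl = TRUE)
  # Keep only records with the expected number of fields.
  ok <- lengths(parts) == length(fields)
  mat <- matrix(as.character(unlist(parts[ok])),
                ncol = length(fields), byrow = TRUE)
  df <- as.data.frame(mat, stringsAsFactors = FALSE)
  names(df) <- fields
  # Restore escaped semicolons inside the tweet text.
  df$text <- gsub("\\;", ";", df$text, fixed = TRUE)
  df$lat <- as.numeric(df$lat)
  df$lon <- as.numeric(df$lon)
  df
}

tweet_df <- parse_tweets(raw_lines)
str(tweet_df)
```

Records that do not split into exactly 14 fields are silently dropped in this sketch; on real data it would be worth inspecting them (for example, the elements where lengths(parts) differs from 14) before discarding anything.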
P.S. Actually, I would love to remove all the non-English tweets,
but I have no idea how to do that.
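As a very rough first pass at the P.S., one could keep only tweets whose text consists entirely of ASCII bytes. This is only a crude heuristic and an assumption on my part, not real language detection: it removes text in Cyrillic or Chinese script and anything with accented letters, but it also keeps non-English text that happens to be plain ASCII, and it drops English tweets containing, say, curly quotes. The function name is_ascii_only and the sample vector are hypothetical.

```r
# TRUE for strings made up only of ASCII bytes (0x01-0x7F).
# useBytes = TRUE makes the check byte-wise, so it also works on
# strings that contain invalid UTF-8.
is_ascii_only <- function(x) {
  !grepl("[^\\x01-\\x7F]", x, perl = TRUE, useBytes = TRUE)
}

# Hypothetical sample texts modelled on the tweets shown above.
tweet_text <- c(
  "Rachel master chef cook anytime!",   # English, ASCII
  "algu\u00e9m traz mais lenha",         # Portuguese, has an accented letter
  "Gestern:Achso,ja okey"               # German, but plain ASCII
)

keep <- is_ascii_only(tweet_text)
english_guess <- tweet_text[keep]
print(english_guess)
```

Note that the German example survives the filter, which shows the limits of the approach. A dedicated language-identification tool would be needed for anything more reliable.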
--
View this message in context:
http://old.nabble.com/convert-list-to-Dataframe-tp26148889p26148893.html
Sent from the R help mailing list archive at Nabble.com.