I did this on the source files, which were semicolon-delimited (the semicolons delimit the fields; I am not sure what character marks the start of a new tweet).
After loading the tm package:

> txt <- system.file("texts", "txt", package = "tm")
> (twitter <- Corpus(DirSource(txt),
+     readerControl = list(language = "lat")))

then

twitter <- tm_map(twitter, removeWords, stopwords("english"))

That last command took about an hour to complete.

onyourmark wrote:
>
> Hi. I have a huge list called twitter:
>
>> dim(twitter)
> NULL
>> str(twitter)
> List of 1
> $ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic
> [1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons
> For Governance From Campaigner-in-chief: President obama jumps campaign
> 09 tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
> 12210;10:47:37;20;10;2009;David_Stringer;William Hague heading Washington
> meets Gen. Jim Jones, Sen. John McCain others. Will Obama team raise
> worries EU ties?;London, England;United Kingdom;Greater
> London;Westminster;;51.5001524;-0.1262362
> 12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses
> wearing thin Obama, media pals... http://tinyurl.com/yfw6cd9;So.
> California;USA;CA;;;36.778261;-119.4179324
> 12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama Afghanistan
> troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama
> #video;USA;USA;;;;37.09024;-95.712891 ...
> .. ..- attr(*, "Author")= chr(0)
> .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31
> 04:46:56"
> .. ..- attr(*, "Description")= chr(0)
> .. ..- attr(*, "Heading")= chr(0)
> .. ..- attr(*, "ID")= chr "1"
> .. ..- attr(*, "Language")= chr "en"
> .. ..- attr(*, "LocalMetaData")= list()
> .. ..- attr(*, "Origin")= chr(0)
> - attr(*, "CMetaData")=List of 3
> ..$ NodeID : num 0
> ..$ MetaData:List of 2
> .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
> .. ..$ creator : Named chr ""
> .. .. ..- attr(*, "names")= chr "LOGNAME"
> ..$ Children: NULL
> ..- attr(*, "class")= chr "MetaDataNode"
> - attr(*, "DMetaData")='data.frame': 1 obs. of 1 variable:
> ..$ MetaID: num 0
> - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"
>
> It contains tweets but in many languages. The "columns" are separated by
> semi-colons. I am using the tm package and it is a "corpus".
>
> It looks like this:
>
> 547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1 day
> :p;Huddersfield/Lincoln;United
> Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
> 547283;06:37:17;21;10;2009;fabiomafra;alguém traz mais lenha pro
> computador da facool? BOM DIA.;Belo Horizonte - MG -
> BR;Brazil;MG;;;-19.8157306;-43.9542226
> 547284;06:37:17;21;10;2009;romanotr;Вау, "Репортеры без границ"
> опубликовали список стран со свободой слова, из 173 Грузия на 81 месте
> опережая Украину. Успехи,успехи...;Portugal
> Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
> 547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton <\;Someone's
> Daughter>\;;Kanazawa, Japan;Japan;Ishikawa
> Prefecture;;;36.5613254;136.6562051
> Error: invalid input
> '547286;06:37:18;21;10;2009;Atogey;支æŒä½ ,国家需è¦ä»–们,但是国家的未æ¥ä¸èƒ½é 他们…RT
> @zuola ￿我觉得 @wenyunc
>
> I want to convert it to "fields" or columns and so I thought I should
> convert it to a dataframe. I tried
>
>> twitterDF<-as.data.frame(twitter)
> Error in sort.list(y) :
> invalid input
> '547286;06:37:18;21;10;2009;Atogey;支æŒä½ ,国家需è¦ä»–们,但是国家的未æ¥ä¸èƒ½é 他们…RT
> @zuola ￿我觉得 @wenyunchao
> 一点都ä¸ä¹è§‚。真æ£çš„ä¹è§‚åº”è¯¥æ˜¯ï¼šä½ å…³æˆ‘åˆæ€Žä¹ˆæ ·ï¼Œåæ£æ”¿æ²»æ–—争ä¸ä¼šä¸¢æŽ‰æ€§å‘½ï¼Œè€å出æ¥åŽæ›´æ˜¯ä¸€æ¡å¥½æ±‰ã€‚北风还是èˆä¸å¾—*霸地ä½ã€è‚‰ã€ä¹¦ã€å¥³äººå’Œç½‘络的,ä¸è¿‡ç‰¢é‡Œä¸ä¼šæä¾›è¿™äº›ã€‚å¦â€¦;山西,浙江;China;Zhejiang;;;28.695035;119.751054'
> in 'utf8towcs'
>>
>
> Can anyone suggest what I can do?
>
> P.S. Actually, I would love to remove all the non-English tweets but I
> have no clue about how to do that.
>
>

--
View this message in context: http://old.nabble.com/convert-list-to-Dataframe-tp26148889p26148898.html
Sent from the R help mailing list archive at Nabble.com.
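
For the "fields" part of the quoted question, here is a rough base-R sketch rather than a tested recipe for the real data. It assumes each record is already one complete string (the real dump wraps records across lines, so they would need to be rejoined first), and that every record has exactly 14 fields. The column names are my own guesses from the sample rows, not anything the data declares, and the two sample records are abridged from the quoted message. The ASCII check at the end is only a crude stand-in for "English": it drops any tweet containing a non-ASCII character and says nothing about language as such.

```r
# Two abridged records in the layout from the quoted message (hypothetical sample).
records <- c(
  "11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons;Florida;USA;FL;;;27.6648274;-81.5157535",
  "547283;06:37:17;21;10;2009;fabiomafra;algu\u00e9m traz mais lenha;Belo Horizonte;Brazil;MG;;;-19.8157306;-43.9542226"
)

# Drop bytes that are not valid UTF-8; invalid input is what 'utf8towcs' complained about.
records <- iconv(records, from = "UTF-8", to = "UTF-8", sub = "")

# Split on the literal semicolon. A tweet whose text itself contains ';'
# would yield extra fields, so check the field count before binding.
fields <- strsplit(records, ";", fixed = TRUE)
stopifnot(all(lengths(fields) == 14))

# Assumed column names, guessed from the sample rows.
col_names <- c("id", "time", "day", "month", "year", "user", "text",
               "location", "country", "region", "subregion", "unknown",
               "lat", "lon")

tweets <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(tweets) <- col_names
tweets$lat <- as.numeric(tweets$lat)
tweets$lon <- as.numeric(tweets$lon)

# Crude heuristic: iconv() returns NA when the text cannot be expressed in ASCII.
ascii_only <- !is.na(iconv(tweets$text, from = "UTF-8", to = "ASCII"))
english_guess <- tweets[ascii_only, ]
```

On the real data the field-count check will probably fail for some records (escaped semicolons such as "<\;" appear in the sample), so those rows would need separate handling. Unaccented non-English tweets will also pass the ASCII filter.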