Re: [R] convert list to Dataframe

Duncan Murdoch Sun, 01 Nov 2009 06:07:30 -0800

On 01/11/2009 7:43 AM, onyourmark wrote:

Hi. I have a huge list called twitter:

It's a list, but more importantly it's a VCorpus and a Corpus. You should use the functions appropriate to those classes to extract the strings making up the data, declare their encoding properly (or convert them to your native encoding), then use read.delim() on a textConnection to read them in.
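
A minimal sketch of those steps, under some assumptions: the strings are UTF-8, the fields are separated by semicolons as described in the question, and the two sample records stand in for the real corpus contents. How the strings are pulled out of the corpus depends on the tm version, so that step is only hinted at in a comment.

```r
# Stand-in for the strings stored in the corpus. For the object shown by
# str(twitter), something like unclass(twitter[[1]]) might return them, but
# that is an assumption and depends on the tm version.
tweet_lines <- c(
  "547282;06:37:17;21;10;2009;dani_jade18;day :p;Huddersfield/Lincoln;United Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296",
  "547283;06:37:17;21;10;2009;fabiomafra;algu\u00e9m traz mais lenha. BOM DIA.;Belo Horizonte - MG - BR;Brazil;MG;;;-19.8157306;-43.9542226"
)

# Declare the encoding (assumed here to be UTF-8) instead of letting R guess.
Encoding(tweet_lines) <- "UTF-8"

# Read the strings as semicolon-separated records through a textConnection.
# quote = "" keeps apostrophes and quotation marks inside tweets from being
# treated as string delimiters.
con <- textConnection(tweet_lines)
tweets <- read.delim(con, sep = ";", header = FALSE, quote = "",
                     stringsAsFactors = FALSE)
close(con)

str(tweets)
```

Records whose text contains an escaped semicolon (such as `&lt\;` in the sample data) would still be split at that point, so they would need separate handling before this step.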


Duncan Murdoch

dim(twitter)

NULL

str(twitter)

List of 1
 $ :Classes 'PlainTextDocument', 'TextDocument', 'character'  atomic
[1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For
Governance From Campaigner-in-chief: President obama jumps campaign 09
tuesday.. http://bit.ly/2eHMaN;Florida;USA;FL;;;27.6648274;-81.5157535
12210;10:47:37;20;10;2009;David_Stringer;William Hague heading Washington
meets Gen. Jim Jones, Sen. John McCain others. Will Obama team raise
worries  EU ties?;London, England;United Kingdom;Greater
London;Westminster;;51.5001524;-0.1262362
12355;10:47:53;20;10;2009;Singsabit;RT @Drudge_Report PAPER: Excuses wearing
thin  Obama, media pals... http://tinyurl.com/yfw6cd9;So.
California;USA;CA;;;36.778261;-119.4179324
12407;10:47:59;20;10;2009;obamavideonews;Obama News Obama   Afghanistan
troop decision timing (AFP) : AFP - Pres.. http://bit.ly/3KPUr8 #obama
#video;USA;USA;;;;37.09024;-95.712891 ...

  .. ..- attr(*, "Author")= chr(0)
  .. ..- attr(*, "DateTimeStamp")= POSIXlt[1:9], format: "2009-10-31 04:46:56"
  .. ..- attr(*, "Description")= chr(0)
  .. ..- attr(*, "Heading")= chr(0)
  .. ..- attr(*, "ID")= chr "1"
  .. ..- attr(*, "Language")= chr "en"
  .. ..- attr(*, "LocalMetaData")= list()
  .. ..- attr(*, "Origin")= chr(0)
 - attr(*, "CMetaData")=List of 3

  ..$ NodeID  : num 0
  ..$ MetaData:List of 2
  .. ..$ create_date: POSIXlt[1:9], format: "2009-10-31 04:46:56"
  .. ..$ creator    : Named chr ""
  .. .. ..- attr(*, "names")= chr "LOGNAME"
  ..$ Children: NULL
  ..- attr(*, "class")= chr "MetaDataNode"
 - attr(*, "DMetaData")='data.frame':   1 obs. of  1 variable:
  ..$ MetaID: num 0
 - attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"

It contains tweets but in many languages. The "columns" are separated by
semi-colons. I am using the tm package and it is a "corpus".

It looks like this:

547282;06:37:17;21;10;2009;dani_jade18;@Laura_Whyte1   day
:p;Huddersfield/Lincoln;United
Kingdom;Kirklees;Kirklees;;53.6468475;-1.7727296
547283;06:37:17;21;10;2009;fabiomafra;alguém traz mais lenha pro computador
da facool? BOM DIA.;Belo Horizonte - MG -
BR;Brazil;MG;;;-19.8157306;-43.9542226
547284;06:37:17;21;10;2009;romanotr;Вау, "Репортеры без границ" опубликовали
список стран со свободой слова, из 173 Грузия на 81 месте опережая Украину.
Успехи,успехи...;Portugal Aveiro;Portugal;Aveiro;;;40.6411848;-8.6536169
547285;06:37:18;21;10;2009;Y_T_;Playing: Beth Orton &lt\;Someone's
Daughter&gt\;;Kanazawa, Japan;Japan;Ishikawa
Prefecture;;;36.5613254;136.6562051
Error: invalid input
'547286;06:37:18;21;10;2009;Atogey;æ”¯æŒä½ 
ï¼Œå›½å®¶éœ€è¦ä»–ä»¬ï¼Œä½†æ˜¯å›½å®¶çš„æœªæ¥ä¸èƒ½é ä»–ä»¬â€¦RT
@zuola ï¿¿æˆ‘è§‰å¾— @wenyunc

I want to convert it to "fields" or columns and so I thought I should
convert it to a dataframe. I tried

twitterDF<-as.data.frame(twitter)

Error in sort.list(y) :invalid input

'547286;06:37:18;21;10;2009;Atogey;æ”¯æŒä½ 
ï¼Œå›½å®¶éœ€è¦ä»–ä»¬ï¼Œä½†æ˜¯å›½å®¶çš„æœªæ¥ä¸èƒ½é ä»–ä»¬â€¦RT
@zuola ï¿¿æˆ‘è§‰å¾— @wenyunchao
ä¸€ç‚¹éƒ½ä¸ä¹è§‚ã€‚çœŸæ£çš„ä¹è§‚åº”è¯¥æ˜¯ï¼šä½ å…³æˆ‘åˆæ€Žä¹ˆæ 
·ï¼Œåæ£æ”¿æ²»æ–—äº‰ä¸ä¼šä¸¢æŽ‰æ€§å‘½ï¼Œè€åå‡ºæ¥åŽæ›´æ˜¯ä¸€æ¡å¥½æ±‰ã€‚åŒ—é£Žè¿˜æ˜¯èˆä¸å¾—*éœ¸åœ°ä½ã€è‚‰ã€ä¹¦ã€å¥³äººå’Œç½‘ç»œçš„ï¼Œä¸è¿‡ç‰¢é‡Œä¸ä¼šæä¾›è¿™äº›ã€‚å¦â€¦;å±±è¥¿ï¼Œæµ™æ±Ÿ;China;Zhejiang;;;28.695035;119.751054'
in 'utf8towcs'

Can anyone suggest what I can do?

P.S. Actually, I would love to remove all the non-English tweets but I have
no clue about how to do that.
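
One crude way to approach the P.S. is to keep only the tweets made up entirely of ASCII characters. This is a rough heuristic rather than language detection: it drops English tweets that contain accented or typographic characters and keeps non-English tweets written in plain ASCII. The sample strings and variable names below are illustrative only.

```r
# Illustrative tweet texts: one English, one Portuguese, one Russian.
tweet_text <- c(
  "Obama News: troop decision timing",
  "algu\u00e9m traz mais lenha pro computador",
  "\u0412\u0430\u0443, \u0443\u0441\u043f\u0435\u0445\u0438"
)

# Flag any string containing a byte outside the 7-bit ASCII range.
# useBytes = TRUE makes the match byte-wise, so every multibyte character
# is caught regardless of the locale.
has_non_ascii <- grepl("[^\\x01-\\x7F]", tweet_text, perl = TRUE, useBytes = TRUE)

# Keep only the all-ASCII tweets as a rough guess at "English".
english_guess <- tweet_text[!has_non_ascii]
print(english_guess)
```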


