Thank you very much for all your comments, and sorry for the confusion
of my messages. My corpus is a collection of responses to an open
question from a questionnaire. My intention is not to create groups of
respondents but to treat all responses as a "whole discourse" on a
particular issue, so that I can identify different "semantic contexts"
within the text. I have all the responses in a single document, which I
then want to split into strings of a specified number (n) of words. The
resulting semantic contexts would be sets of (correlated) word-strings
containing particularly relevant (correlated) words.
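That splitting step could be sketched in base R as follows (a minimal illustration, not from the thread; the sample text and the chunk size n are made-up values):

```r
# Split a document into consecutive n-word strings.
# The text and n = 4 are purely illustrative.
words <- strsplit("the cat chased the greedy rat the dog chased the clever cat",
                  " ")[[1]]
n <- 4
segment <- ceiling(seq_along(words) / n)           # segment id for each word
chunks <- tapply(words, segment, paste, collapse = " ")
```

Each element of `chunks` is then one n-word string that can be treated as a "context" unit for the co-occurrence analysis discussed below.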
I guess I must dive deeper into the "ca" and "tm" packages. Any other
ideas will be really welcome.
best,
Pep Vallbé
On Mar 30, 2009, at 2:05 PM, Alan Zaslavsky wrote:
Maybe not terribly hard, depending on exactly what you need.
Suppose you turn your text into a character vector 'mytext' of
words. Then for a table of words appearing delta words apart
(ordered), you can table mytext against itself with a lag:
nwords <- length(mytext)
## pair each word with the word delta positions before it
burttab <- table(mytext[-(1:delta)], mytext[-(nwords + 1 - (1:delta))])
Add to its transpose and sum over delta up to your maximum distance
apart. If you want only words appearing near each other within the
same sentence (or some other unit), pad out the sentence break with
at least delta instances of a dummy spacer:
the cat chased the greedy rat SPACER SPACER SPACER the dog chased the clever cat
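Putting those steps together, a runnable sketch might look like this (the variable names, the maximum distance, and the toy text are my own; the lagged table is paired so that word i + delta is tabulated against word i):

```r
# Sum the lagged co-occurrence table and its transpose over
# delta = 1..maxdist, after padding the sentence break with
# maxdist SPACER tokens so pairs never cross it.
mytext <- c("the", "cat", "chased", "the", "rat",
            rep("SPACER", 3),
            "the", "dog", "chased", "the", "cat")
maxdist <- 3
vocab  <- sort(unique(mytext))
burt   <- matrix(0, length(vocab), length(vocab),
                 dimnames = list(vocab, vocab))
nwords <- length(mytext)
for (delta in 1:maxdist) {
  tab <- table(factor(mytext[-(1:delta)], levels = vocab),
               factor(mytext[-(nwords + 1 - (1:delta))], levels = vocab))
  burt <- burt + tab + t(tab)   # count both orders, as suggested above
}
burt <- burt[vocab != "SPACER", vocab != "SPACER"]  # drop the dummy spacer
```

The `factor(..., levels = vocab)` wrapper keeps every table the same size across values of delta, so they can be summed directly.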
This will count all pairings at distance delta; if you want to count
only those for which this was the NEAREST co-occurrence (so
the cat and the rat chased the dog
would count as two at delta=3 but not one at delta=6), it will be
trickier and I'm not sure this approach can be modified to handle it.
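One possible reading of "nearest" (my interpretation, not from the thread) is to count a pair at distance d only when the second word does not also occur strictly between the two positions. A brute-force sketch, too slow for large corpora but illustrative:

```r
# Count an ordered pair (w1, w2) at distance d only if w2 does not
# occur between the two positions, i.e. this is the nearest match.
nearest_pairs <- function(mytext, maxdist) {
  out <- list()
  n <- length(mytext)
  for (i in seq_len(n)) {
    for (d in seq_len(min(maxdist, n - i))) {
      j <- i + d
      between <- if (d > 1) mytext[(i + 1):(j - 1)] else character(0)
      if (!mytext[j] %in% between)        # nearest occurrence of this word
        out[[length(out) + 1]] <- c(mytext[i], mytext[j], d)
    }
  }
  do.call(rbind, out)
}
```

On "the cat and the rate chased the dog" (with "rat" for "rate"), this records the two the-the pairs at distance 3 but skips the one at distance 6, matching the example above.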
Date: Sun, 29 Mar 2009 22:20:15 -0400
From: "Murray Cooper" <myrm...@earthlink.net>
Subject: Re: [R] Burt table from word frequency list
The usual approach is to count the co-occurrence within so many words
of each other. Typical is between 5 words before and 5 words after a
given word. So for each word in the document, you look for the
occurrence of all other words within -5 -4 -3 -2 -1 0 1 2 3 4 5 words.
Depending on the language and the question being asked, certain words
may be excluded.
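A naive sketch of that +/- 5-word window count (the function name, window size default, and stopword list are my own illustrative choices, not an existing package API):

```r
# Count ordered co-occurrences within a +/- win word window,
# after dropping excluded (stop) words.
window_cooc <- function(words, win = 5, exclude = c("the", "a", "of")) {
  words <- words[!words %in% exclude]
  n <- length(words)
  pairs <- character(0)
  for (i in seq_len(n)) {
    lo <- max(1, i - win)
    hi <- min(n, i + win)
    for (j in setdiff(lo:hi, i))          # every other position in window
      pairs <- c(pairs, paste(words[i], words[j]))
  }
  table(pairs)
}
```

Growing `pairs` inside a loop is fine for a toy example but would need vectorising (or the lag-table approach above) for a real corpus.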
This is not a simple function! I don't know if anyone has done a
package for this type of analysis, but with over 2000 packages
floating around you might get lucky.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.