[R] how to find end of a FASTA file

2012-09-13 Thread mail me
Hi:

I am trying to find the end of a FASTA file:

library(ShortRead)
fastadata <- readFasta("fastafolder", "fa$")
file <- tempfile()
writeFasta(fastadata, file)
var1 <- readLines(file)
while(countlength(tmp <- readLines(file, n = -1)) > 0)  {
#do something
}

I want the while loop to run until the end of the file is reached, but
the while statement doesn't work. Thanks for your help.

Regards
Jac
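
A minimal sketch of one way to loop until the end of a file, under some
assumptions not in the post: it uses a small hand-written FASTA file
instead of ShortRead output, and base R's length() in place of
countlength, which does not appear to be a base R function. Calling
readLines() on a file name with n = -1 re-reads the whole file on every
call, so the loop condition never becomes false; reading from an open
connection one line at a time returns a zero-length vector at the end of
the file.

```r
# Hypothetical stand-in for the poster's FASTA file.
fasta_path <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGT", "GGCC", ">seq2", "TTAA"), fasta_path)

# Open a connection so successive readLines() calls continue
# from the current position instead of restarting at the top.
con <- file(fasta_path, open = "r")
n_lines <- 0L
n_records <- 0L
while (length(line <- readLines(con, n = 1L)) > 0L) {
  n_lines <- n_lines + 1L
  # Header lines in FASTA start with ">".
  if (startsWith(line, ">")) n_records <- n_records + 1L
}
close(con)

cat("lines:", n_lines, "records:", n_records, "\n")
```

The variable names (fasta_path, n_lines, n_records) are illustrative
only; the "do something" step would go inside the loop body.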

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] PCA on high-dimensional data

2011-12-10 Thread mail me
Hi:

I have a large dataset mydata, of 1000 rows and 1000 columns. The rows
have gene names and columns have condition names (cond1, cond2, cond3,
etc).

mydata<- read.table(file="c:/file1.mtx", header=TRUE, sep="")

I applied PCA as follows:

data_after_pca<- prcomp(mydata, retx=TRUE, center=TRUE, scale.=TRUE);

Now I get 1000 PCs; I choose the first three and make a new data frame:

new_data_frame<- cbind(data_after_pca$x[,1], data_after_pca$x[,2],
data_after_pca$x[,3]);

After the PCA, in new_data_frame, I lose the previous cond1, cond2,
cond3 labels and instead have PC1, PC2, PC3 as column names.

My question is: is there any way I can map PC1, PC2, PC3 back to the
original conditions, so that I still have a reference to the original
condition labels after PCA?

Thanks:
deb
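
A minimal sketch of how the link between components and original
columns can be inspected, assuming a small random matrix in place of the
poster's 1000 x 1000 data. Each PC is a linear combination of all
conditions rather than a relabelled single condition, and the weights
are stored in the rotation element returned by prcomp(), whose row names
are the original column names. Picking the largest absolute loading per
PC is only one illustrative summary, not a standard mapping.

```r
set.seed(1)
# Hypothetical data: 20 genes (rows) by 5 conditions (columns).
mydata <- matrix(rnorm(20 * 5), nrow = 20,
                 dimnames = list(paste0("gene", 1:20),
                                 paste0("cond", 1:5)))

pca <- prcomp(mydata, retx = TRUE, center = TRUE, scale. = TRUE)

# Scores: one row per gene, one column per retained PC.
scores <- pca$x[, 1:3]

# Loadings: one row per original condition, one column per PC.
loadings <- pca$rotation[, 1:3]
print(round(loadings, 3))

# Condition with the largest absolute weight on each PC.
top_cond <- rownames(loadings)[apply(abs(loadings), 2, which.max)]
names(top_cond) <- colnames(loadings)
print(top_cond)
```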



[R] data generation

2012-03-04 Thread mail me
Hi:
I am trying to generate data from a simple linear regression model.
For the training data T = {(x1, y1), ..., (xn, yn)}, I want to sample x
uniformly from the range [0, 1], compute the uncorrupted response
y = x^2, and generate random noise "e" from the normal distribution
N(0, 1). Any idea how to do this in simple steps?

Thanks in advance.
deb
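
A minimal sketch of the steps described, assuming a sample size of
n = 100 and that N(0, 1) means mean 0 and standard deviation 1 (neither
is stated in the post). The noisy response is taken to be y = x^2 + e.

```r
set.seed(42)             # for reproducibility
n <- 100                 # assumed sample size

x <- runif(n, min = 0, max = 1)   # x sampled uniformly from [0, 1]
y_clean <- x^2                    # uncorrupted response
e <- rnorm(n, mean = 0, sd = 1)   # noise from N(0, 1)
y <- y_clean + e                  # observed response

train <- data.frame(x = x, y = y)
head(train)
```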



[R] word frequency count

2012-03-18 Thread mail me
Hi:

I have a dataframe containing comma-separated groups of words such as

milk,bread
bread,butter
beer,diaper
beer,diaper
milk,bread
beer,diaper

I want to output the frequency of occurrence of comma separated words
for each row and collapse duplicate rows, to make the output as shown
in the following dataframe:

milk,bread   2
bread,butter 1
beer,diaper  3
milk,bread   2

Thanks for help!

deb



Re: [R] word frequency count

2012-03-18 Thread mail me
Hi:

Suppose I create the dataframe df using the following code:

df <- data.frame( item1 = c('milk',
'bread','beer','beer','milk','beer'), item2 =c('bread',
'butter','diaper','diaper','bread', 'diaper'), stringsAsFactors = F);


df

 item1  item2
1  milk  bread
2 bread butter
3  beer diaper
4  beer diaper
5  milk  bread
6  beer diaper

And now I want the following output:

milk,bread   2
bread,butter 1
beer,diaper  3
milk,bread   2

and "milk,bread" is a single datum. I hope this clarifies the problem!

Thanks!



On 3/18/12, John Kane  wrote:
> ? table
>
> First, however, confirm that "milk,bread" is a single datum. str() should
> do this
>
> Can you post a sample of the data here using dput()?
>
> John Kane
> Kingston ON Canada



Re: [R] word frequency count

2012-03-18 Thread mail me
Hi:
Thanks for the reply. I am using the following statement

res <- with(df, table(paste(item1, item2, sep=', ')) )

to get the frequency counts of the rows, which gives the following output:
milk,bread 2
bread,butter 1
beer,diaper 3
milk,bread 2

But I need to extract from the above result two vectors or dataframes
(such as DF1 and DF2) to make the final output as below:

DF1
milk,bread
bread,butter
beer,diaper
milk,bread

DF2
2
1
3
2

Can anyone help? Thanks in advance!
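
A minimal sketch of splitting the table into two vectors, assuming the
df defined earlier in the thread and that one entry per distinct pair is
wanted. names() gives the pair labels and as.vector() gives the counts;
fixing the factor levels with unique() keeps the pairs in order of first
appearance instead of alphabetical order.

```r
df <- data.frame(item1 = c("milk", "bread", "beer", "beer", "milk", "beer"),
                 item2 = c("bread", "butter", "diaper", "diaper", "bread",
                           "diaper"),
                 stringsAsFactors = FALSE)

# One key per row, e.g. "milk,bread".
key <- paste(df$item1, df$item2, sep = ",")

# Count each distinct key, keeping first-appearance order.
res <- table(factor(key, levels = unique(key)))

DF1 <- names(res)       # the distinct pairs
DF2 <- as.vector(res)   # their counts

print(data.frame(pair = DF1, count = DF2))
```

If a count is needed for every original row rather than per distinct
pair, as.vector(res[key]) looks up the count for each row's key.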




On Sun, Mar 18, 2012 at 4:22 PM, S Ellison  wrote:
> You could try
> with(df, table(factor(item1):factor(item2)) )
> or
> with(df, table(paste(item1, item2, sep=', ')) )
>
> If the order is immaterial, so that (milk, bread) is the same as (bread,
> milk), there's a bit more work to do. Maybe
>
> table( apply(df, 1, function(x) paste(sort(x), collapse=', ')) )



[R] writing data to file

2012-03-22 Thread mail me
Hi:

I created a data frame

df <- data.frame( person = c('John','Bob','Mary'), team =
c('a','b','c'), stringsAsFactors = F);

and obtained the expected  output

df
  person team
1   John    a
2    Bob    b
3   Mary    c

Now I want to save the whole content of df, preserving its row and
column order, to a file on disk with the following command:

write(df, file = "testfile",  append=FALSE, sep=" ");

and I get the error message

Error in cat(list(...), file, sep, fill, labels, append) :
  argument 1 (type 'list') cannot be handled by 'cat'

Can you help to solve the problem? Thanks in advance.

deb
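
A minimal sketch of one way around the error, assuming a plain
space-separated text file is acceptable. write() passes its argument to
cat(), which cannot handle a list such as a data frame; write.table() is
the base R function intended for data frames. The temporary file path is
used here only so the example is self-contained.

```r
df <- data.frame(person = c("John", "Bob", "Mary"),
                 team = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

out_path <- tempfile(fileext = ".txt")  # stand-in for "testfile"

# Write rows and columns in their existing order, without quotes
# or row names.
write.table(df, file = out_path, sep = " ", quote = FALSE,
            row.names = FALSE, append = FALSE)

# Read the file back to confirm the round trip.
back <- read.table(out_path, header = TRUE, sep = " ",
                   stringsAsFactors = FALSE)
print(back)
```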



[R] how to cluster rows of words in a text file

2012-03-23 Thread mail me
Hi:

I am trying to cluster the rows of a text file with kmeans.

I load the data as follows:
file1 <- read.csv("somefile.csv")

and the file contains the following rows of words:
> file1

1   word1 word3 word4 word1
2   word1 word4 word3 word1
3   word4 word2 word4 word3
4   word4 word2 word1 word3
5   word2 word2 word4 word2

file_as_matrix <- as.matrix(file1);

Now, I want to apply some clustering algorithm such as kmeans to
cluster the rows  in the file to get the following output:

Cluster1
  word1 word3 word4 word1
  word1 word4 word3 word1


Cluster2
   word4 word2 word4 word3
   word4 word2 word1 word3
   word2 word2 word4 word2

But as kmeans takes a numeric matrix as input, it cannot be used
directly to cluster the rows in this case.
Is there any simple way to cluster the rows of such a text file?
Example code would be really useful.


Thanks and regards:
debb
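
A minimal sketch of one possible approach, assuming each row can be
treated as a bag of words: count how often each word occurs in each row,
which gives a numeric matrix that kmeans accepts. The rows are typed in
directly rather than read from somefile.csv, and the choice of two
clusters and of word counts as features are assumptions; the clusters
found need not match the grouping shown in the post.

```r
rows <- c("word1 word3 word4 word1",
          "word1 word4 word3 word1",
          "word4 word2 word4 word3",
          "word4 word2 word1 word3",
          "word2 word2 word4 word2")

# Split each row into words and collect the vocabulary.
tokens <- strsplit(rows, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# Row-by-word count matrix (one row per line, one column per word).
counts <- t(vapply(tokens,
                   function(w) as.numeric(table(factor(w, levels = vocab))),
                   numeric(length(vocab))))
colnames(counts) <- vocab

set.seed(123)  # kmeans starts from random centers
fit <- kmeans(counts, centers = 2, nstart = 10)

# Group the original rows by assigned cluster.
clusters <- split(rows, fit$cluster)
print(clusters)
```

Because word order is discarded, rows 1 and 2 have identical counts and
always land in the same cluster.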



[R] create waveform sawtooth

2012-03-26 Thread mail me
Hi:

I am trying to create a sawtooth waveform. I used the following:

x <- runif(500, min = -2, max = 2)
y <- (1 - abs(x)) * (x <= 1)
combined <- data.frame(x = x, y = y)
plot(combined)

and I get a triangular waveform, not a sawtooth. Can someone suggest a
way to create a sawtooth waveform?

Thanks
deb
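
A minimal sketch of one common sawtooth definition, the fractional part
of x / period, which rises linearly from 0 towards 1 and then drops back
to 0. The helper name sawtooth, the period of 1, and the evenly spaced x
grid (instead of runif) are assumptions made for illustration.

```r
# Hypothetical helper: fractional part of x / period, in [0, 1).
sawtooth <- function(x, period = 1) {
  x / period - floor(x / period)
}

x <- seq(-2, 2, length.out = 500)  # ordered grid so lines connect cleanly
y <- sawtooth(x, period = 1)

combined <- data.frame(x = x, y = y)

# Draw only in an interactive session.
if (interactive()) {
  plot(combined, type = "l")
}
```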
