Re: [R] I need help making a data.fame comprised of selected columns of an original data frame.

Ted Byers Fri, 16 Jul 2010 09:33:40 -0700

Hi Steve,

Thanks


Here is a tiny subset of the data:
> dput(head(moreinfo, 40))
structure(list(m_id = c(171, 206, 206, 206, 206, 206, 206, 218,
224, 224, 227, 229, 229, 229, 229, 229, 229, 229, 229, 233, 233,
238, 238, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251, 251,
251, 251, 251, 251, 251, 251), sale_date = c("2008-04-25 07:41:09",
"2008-05-09 20:58:12", "2008-09-06 19:51:52", "2008-05-01 21:26:40",
"2008-08-06 23:53:17", "2008-05-29 18:44:50", "2008-05-16 16:10:52",
"2008-12-30 17:59:54", "2008-11-06 18:15:40", "2008-09-05 17:43:51",
"2008-10-31 21:55:52", "2008-04-30 21:30:36", "2008-11-11 00:43:54",
"2008-07-24 22:26:29", "2008-10-07 17:57:22", "2008-04-23 20:39:41",
"2008-09-08 22:42:12", "2008-11-13 00:09:59", "2008-04-15 22:57:31",
"2008-07-05 08:52:58", "2008-10-04 13:17:02", "2008-03-20 23:02:12",
"2008-08-08 16:48:42", "2008-06-04 04:31:20", "2008-09-27 07:02:14",
"2008-09-08 07:16:39", "2008-09-25 07:09:11", "2008-09-23 07:02:39",
"2008-08-09 07:31:46", "2008-09-28 07:02:13", "2008-07-05 07:26:46",
"2008-05-11 04:01:55", "2008-06-26 07:46:17", "2008-07-09 07:36:16",
"2008-07-21 18:36:44", "2008-10-11 07:01:36", "2008-07-21 19:03:42",
"2008-05-07 04:21:23", "2008-10-14 07:07:02", "2008-05-12 04:26:21"
), sale_year = c(2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L, 2008L,
2008L, 2008L, 2008L, 2008L, 2008L, 2008L), sale_week = c(16L,
18L, 35L, 17L, 31L, 21L, 19L, 52L, 44L, 35L, 43L, 17L, 45L, 29L,
40L, 16L, 36L, 45L, 15L, 26L, 39L, 11L, 31L, 22L, 38L, 36L, 38L,
38L, 31L, 39L, 26L, 19L, 25L, 27L, 29L, 40L, 29L, 18L, 41L, 19L
), return_type = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), elapsed_time = c(1e-04, 1e-04, 3.0001, 4.0001,
21.0001, 5.0001, 24.0001, 1.0001, 8.0001, 1e-04, 1e-04, 8.0001,
14.0001, 55.0001, 35.0001, 1e-04, 1e-04, 4.0001, 1e-04, 2.0001,
5.0001, 1e-04, 52.0001, 4.0001, 28.0001, 49.0001, 34.0001, 72.0001,
5.0001, 53.0001, 128.0001, 8.0001, 2.0001, 55.0001, 1.0001, 12.0001,
46.0001, 30.0001, 12.0001, 12.0001)), .Names = c("m_id", "sale_date",
"sale_year", "sale_week", "return_type", "elapsed_time"), row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
"36", "37", "38", "39", "40"), class = "data.frame")
>

The full dataset has almost 200,000 observations!  That is why I hadn't
posted the raw data.  And m_id_default_res is even bigger because it
includes all the original data along with the computed stats.


Yes, the following line you pointed out has a typo:

ndf$n[i] = m_id_default_res[i]

It should have been

ndf$n[i] = m_id_default_res$n[i]

Correcting that makes the error go away, but at the end of the loop R reports
that ndf has 0 columns and 0 rows, which I don't understand.
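
On the typo itself, a small sketch may help show why the uncorrected line
misbehaved. Indexing a data frame with single brackets and one index selects
a column and returns a data frame, whereas `$n[i]` picks the i-th element of
column n. The data frame `df` and its values below are made up for
illustration and are not taken from m_id_default_res:

```r
# Hypothetical toy data frame; values are invented for illustration.
df <- data.frame(mid = c(171, 206, 218), n = c(10L, 42L, 7L))
i <- 2

by_bracket <- df[i]     # the i-th COLUMN, still a data.frame
by_dollar  <- df$n[i]   # the i-th ELEMENT of column n, a plain integer

class(by_bracket)
class(by_dollar)
```

So an assignment such as ndf$n[i] = m_id_default_res[i] tries to store a
whole column (as a data frame) where a single value was intended.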

But your statement (as corrected for the right source name) below does what
I'd intended.
ndf <- m_id_default_res[, c('mid', 'estimate', 'sd', 'loglik', 'aic','bic',
'chisq', 'chisqpvalue', 'chisqdf')]
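
For anyone following along, here is a minimal sketch of that column selection
on a toy data frame. Only the column names come from this thread; every value
below, and the extra column named `extra`, is invented for illustration:

```r
# Toy stand-in for m_id_default_res; all values here are invented.
m_id_default_res <- data.frame(
  mid         = c(171, 206, 218),
  estimate    = c(0.12, 0.08, 0.31),
  sd          = c(0.02, 0.01, 0.05),
  n           = c(10L, 42L, 7L),
  loglik      = c(-20.1, -95.3, -11.7),
  aic         = c(42.2, 192.6, 25.4),
  bic         = c(42.5, 194.3, 25.3),
  chisq       = c(1.3, 4.8, 0.9),
  chisqpvalue = c(0.52, 0.09, 0.64),
  chisqdf     = c(2L, 2L, 2L),
  extra       = c("a", "b", "c")   # hypothetical column we do not want to keep
)

# Names of the columns to keep, as listed in the statement above.
keep <- c("mid", "estimate", "sd", "loglik", "aic", "bic",
          "chisq", "chisqpvalue", "chisqdf")

# Select the wanted columns in one step; no loop is needed.
ndf <- m_id_default_res[, keep]

dim(ndf)
names(ndf)
```

If only one column were selected this way, R would return a plain vector
unless drop = FALSE is added inside the brackets.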

Thanks

Ted


On Fri, Jul 16, 2010 at 12:04 PM, Steve Lianoglou <mailinglist.honey...@gmail.com> wrote:

> Hi,
>
> First: it's kind of hard to play along w/o some reproducible data. To
> that end, you can paste into an email the output of:
>
> dput(moreinfo)
>
> If there are lots of rows in `moreinfo`, just give us the first ~10-20
>
> dput(head(moreinfo, 20))
>
> Anyway:
>
> <snip>
> > At this point, each row in m_id_default_res corresponds to one data.frame
> > produced by fitdist.  When I print it, I get the output I expected.
> > However, I need to store only some of it into my DB.
> >
> > And then, because fitdist produces a data frame that includes a lot of info
> > I don't need to store in the DB, I tried making a new data.frame containing
> > only the info I need as follows:
> > ndf = data.frame()
> > for (i in 1:length(m_id_default_res[,1])) {
> >  ndf$mid[i] = m_id_default_res$mid[i]
> >  ndf$estimate[i] = m_id_default_res$estimate[i]
> >  ndf$sd[i] = m_id_default_res$sd[i]
> >  ndf$n[i] = m_id_default_res[i]
> >  ndf$loglik[i] = m_id_default_res$loglik[i]
> >  ndf$aic[i] = m_id_default_res$aic[i]
> >  ndf$bic[i] = m_id_default_res$bic[i]
> >  ndf$chisq[i] = m_id_default_res$chisq[i]
> >  ndf$chisqpvalue[i] = m_id_default_res$chisqpvalue[i]
> >  ndf$chisqdf[i] = m_id_default_res$chisqdf[i]
> > }
>
> Forget the for loop. How about:
>
> ndf <- m_id_default[, c('mid', 'estimate', 'sd', 'loglik', 'aic',
> 'bic', 'chisq', 'chisqpvalue', 'chisqdf')]
>
> Having just written that, I see something strange in your for loop.
> Specifically this line:
>
> >  ndf$n[i] = m_id_default_res[i]
>
> m_id_default_res is a data.frame, right? Why don't you try to see what
> `m_id_default_res[1]` returns.
>
> I'm not sure that that's what your error message is coming from, but I
> foresee this to be a problem anyway, if I follow your "build up" code
> correctly.
>
> Hope that helps,
>
> --
> Steve Lianoglou
> Graduate Student: Computational Systems Biology
>  | Memorial Sloan-Kettering Cancer Center
>  | Weill Medical College of Cornell University
> Contact Info: http://cbio.mskcc.org/~lianos/contact
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
