Hello,

I don't understand why you are splitting data1 and then unlisting the result.
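
As a rough illustration of why that route ends in a single column, here is a minimal sketch on a made-up four-row data frame (`toy`, `sp`, `flat` and `back` are hypothetical names, not your data). unlist() flattens the list of data frames into one named atomic vector, so as.data.frame() has only one column to build; do.call(rbind, ...) is one way to get a data frame back from the split list.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(Product_name = c("3M", "Avery", "3M", "Avery"),
                  Sales = c(10, 20, 30, 40),
                  stringsAsFactors = FALSE)

sp <- split(toy, toy$Product_name)   # a list of two data frames
flat <- unlist(sp)                   # one named vector; the data frame structure is lost

is.data.frame(flat)                  # FALSE
head(names(flat))                    # names built from list name, column name and position
ncol(as.data.frame(flat))            # 1: the names become row names, the values one column

# Recombining the pieces row-wise keeps the original columns
back <- do.call(rbind, sp)
is.data.frame(back)                  # TRUE
```

The single column plus the row names is probably what View() displayed as two columns.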

If you want to apply a modeling function to each sub-data frame, with the data split by product name, you can follow more or less these steps:

0. Create a dataset

set.seed(9376)    # Make the results reproducible

n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE)
                    )

head(data1)


1. Split the dataset by product name. This gives a list of sub-data frames, one per product.


X <- split(data1, data1$Product_name)


2. Now lapply a modeling function to each sub-data frame.


modelFun <- function(DF){

    lm(Sales ~ Region, data = DF)

}

model_list <- lapply(X, modelFun )
model_smry <- lapply(model_list, summary)
model_smry[[1]]
#
#Call:
#lm(formula = Sales ~ Region, data = DF)
#
#Residuals:
#    Min      1Q  Median      3Q     Max
#-487.41 -196.17    1.76  195.96  498.48
#
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)  437.300    108.147   4.044 0.000355 ***
#Regionb      437.019    167.540   2.608 0.014229 *
#Regionc      102.989    179.341   0.574 0.570217
#Regiond      105.520    152.942   0.690 0.495721
#Regione       -5.638    138.342  -0.041 0.967773
#---
#Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
#Residual standard error: 286.1 on 29 degrees of freedom
#Multiple R-squared:  0.2426,    Adjusted R-squared:  0.1381
#F-statistic: 2.322 on 4 and 29 DF,  p-value: 0.08039
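
If you would rather compare the brands side by side than read three summaries, one possible follow-up is to stack the coefficient tables into a single data frame. This is only a sketch: it rebuilds the simulated data from step 0 so that it runs on its own, and the names `coef_list`, `coef_df` and the columns `Term`, `Estimate` and `P_value` are my own choices for illustration.

```r
set.seed(9376)    # Same simulated data as in step 0

n <- 100
PN <- c("Target Brand", "3M", "Avery")
data1 <- data.frame(Product_name = sample(PN, n, TRUE),
                    Year_of_Record = sample(2011:2018, n, TRUE),
                    Sales = runif(n, 10, 1000),
                    Region = sample(letters[1:5], n, TRUE)
                    )

X <- split(data1, data1$Product_name)
model_list <- lapply(X, function(DF) lm(Sales ~ Region, data = DF))

# For each product, pull the coefficient matrix out of summary()
# and turn it into a small data frame labelled with the product name.
coef_list <- lapply(names(model_list), function(nm) {
    cf <- coef(summary(model_list[[nm]]))
    data.frame(Product_name = nm,
               Term = rownames(cf),
               Estimate = cf[, "Estimate"],
               P_value = cf[, "Pr(>|t|)"],
               row.names = NULL)
})

# Stack the per-product tables into one data frame
coef_df <- do.call(rbind, coef_list)
head(coef_df)
```

Each row of coef_df is one coefficient of one product's model, so the region effects can be sorted or filtered across brands.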

Hope this helps,


Rui Barradas


At 16:54 on 01-06-2018, nguy2952 University of Minnesota wrote:
Hello folks,

I have a big project to work on and the dataset is classified so I am just
going to use my own example so everyone can understand what I am targeting.

Let's take Target as an example: We consider three brands of tape: Target
brand, 3M and Avery. The original data frame has 4 columns: Year of Record,
Product_Name(which contains three brands of tape), Sales, and Region. I
want to create a new data frame that looks like this:

                       Year of Record       Sales     Region
   Target Brand
   3M
   Avery

Here is what I did.

    1. I split the original data frame, which I called data1:

    X = split(data1, Product_name)

    2. Unlist X:

    X1 = unlist(X)

    3. Create a new data frame:

    new_df = as.data.frame(X1)


But, when I used the command View(new_df), I had only two columns: The left
one is similar to TargetBrand.Sales, etc. and the right one is just "X1"

I did not achieve what I wanted.

*A potentially big question from readers:*

Why am I doing this?

*Answer:*

I want to run a multiple regression model later to see among different
regions, what the sales look like for these three brands of tape:

*Does Mid-west buy more house brand than East Coast?*

or

*Does region really affect the sales? Are Mid-West's purchases similar to
those of East Coast and West Coast?*

I need help. Please give me guidance.

Sincerely,
Hugh N


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
