On 01/27/2012 03:12 AM, Raphael Bauduin wrote:
Hi,

I am a beginner with R, and I think the answer to my question will
seem obvious, but after searching and trying without success I've
decided to post to the list.

I am working with data loaded from a CSV file with these fields:
   order_id, item_value
As an order can have multiple items, an order_id may be present
multiple times in the CSV.

I managed to compute the total value and the number of items for each order:

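   # one row per line item: column 1 = order_id, column 2 = item_value
   oli <- read.csv("/tmp/order_line_items_data.csv", header = TRUE)
   # total value and number of items for each order_id
   orders_values <- tapply(oli[[2]], oli[[1]], sum)
   items_per_order <- tapply(oli[[2]], oli[[1]], length)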
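Since the CSV itself is not available, here is a minimal self-contained sketch of the same two tapply() calls, with a small made-up data frame standing in for /tmp/order_line_items_data.csv (the order ids and values are hypothetical). It shows that tapply() groups item_value by order_id and returns one entry per order, with the orders in the same sequence in both results.

```r
# Hypothetical stand-in for /tmp/order_line_items_data.csv:
# one row per line item, so an order_id can appear several times.
oli <- data.frame(
  order_id   = c(1, 1, 2, 3, 3, 3, 4),
  item_value = c(4, 3, 25, 10, 12, 8, 150)
)

# Total value of each order: sum item_value within each order_id
orders_values <- tapply(oli$item_value, oli$order_id, sum)
# Number of items in each order: count the rows within each order_id
items_per_order <- tapply(oli$item_value, oli$order_id, length)

print(orders_values)
print(items_per_order)
```

Both results are named by order_id, so orders_values and items_per_order can be matched up position by position in the later steps.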

I can then display the histogram of the order values:

   hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)

Now on this histogram, I would like to display the average number of
items of the orders in each group (defined with the breaks).
So for the bar of orders with value 0 to 10, I'd like to display the
average number of items of these orders.

Hi Raph,
As this looks a tiny bit like homework, I'll only provide suggestions. You have the value and number of items for each order. What you need to do is to match them in groups. In order to do that, you want a factor that will show the group for each value-items pair. The "cut" function will give you such a factor, using the breaks above. You seem to understand the *apply functions, so you can use one of these to return the mean number of items for each value group. Alternatively, you could use the factor in the "by" function to get the mean number of items.
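The cut-then-apply idea can be sketched as follows. This is a minimal sketch assuming per-order vectors shaped like orders_values and items_per_order above; the numbers are hypothetical. include.lowest = TRUE is an assumption added so that cut() assigns an order worth exactly 0 to the first range, which mirrors how hist() bins values by default.

```r
# Hypothetical per-order results: one entry per order, same order in both
orders_values   <- c(7, 25, 30, 150, 12, 18)
items_per_order <- c(2, 1, 3, 1, 4, 2)

# Same breaks as the hist() call in the question
breaks <- c(10 * 0:20, 800)

# cut() turns each order value into a factor naming its value range
value_group <- cut(orders_values, breaks = breaks, include.lowest = TRUE)

# Mean number of items for the orders in each value range;
# ranges that contain no orders come out as NA
mean_items <- tapply(items_per_order, value_group, mean)
print(mean_items[!is.na(mean_items)])
```

Because value_group is a factor with one level per range, the result has one slot per bar of the histogram, including the empty ones.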

You should now have a factor that can be sent to "table" to get the number of orders in each value range, and a vector of the corresponding mean numbers of items in each value grouping. Why, you could even use the same trick to calculate the mean price of the orders in each value grouping...
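A minimal sketch of that step, again with hypothetical order values: table() on the grouping factor counts the orders per range, and the same tapply() pattern, applied to the order values instead of the item counts, gives the mean order value per range.

```r
# Hypothetical per-order totals, grouped with the same breaks as before
orders_values <- c(7, 25, 30, 150, 12, 18)
value_group <- cut(orders_values, breaks = c(10 * 0:20, 800),
                   include.lowest = TRUE)

# Number of orders in each value range (0 for empty ranges)
orders_per_group <- table(value_group)
# Mean order value in each value range (NA for empty ranges)
mean_value <- tapply(orders_values, value_group, mean)

print(orders_per_group[orders_per_group > 0])
print(mean_value[!is.na(mean_value)])
```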

I would use "barplot" to display all this information, as it is a bit easier to place the mean number of items on the bars (check the return value of barplot).
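A minimal sketch of the barplot approach, using the same hypothetical data as the sketches above. barplot() invisibly returns the x coordinate of the midpoint of each bar, so those coordinates can be passed to text() to write the mean number of items above each non-empty bar. The las, ylim and pos settings are illustrative choices, not requirements.

```r
# Hypothetical per-order results, grouped with the question's breaks
orders_values   <- c(7, 25, 30, 150, 12, 18)
items_per_order <- c(2, 1, 3, 1, 4, 2)
value_group <- cut(orders_values, breaks = c(10 * 0:20, 800),
                   include.lowest = TRUE)

orders_per_group <- table(value_group)
mean_items <- tapply(items_per_order, value_group, mean)

# barplot() invisibly returns the x coordinate of each bar's midpoint;
# extra ylim headroom leaves room for the labels
bar_mids <- barplot(orders_per_group, las = 2,
                    ylim = c(0, max(orders_per_group) + 1),
                    ylab = "Number of orders")

# Label only the bars that actually contain orders
has_orders <- as.vector(orders_per_group > 0)
text(bar_mids[has_orders], as.numeric(orders_per_group)[has_orders],
     labels = round(mean_items[has_orders], 1), pos = 3)
```

Unlike hist(), which places bars by their numeric breaks, barplot() draws one equal-width bar per level, so the wide last range (200 to 800) does not stretch the plot.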

Jim

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
