On 01/27/2012 10:07 PM, Raphael Bauduin wrote:
On Fri, Jan 27, 2012 at 9:51 AM, Jim Lemon <j...@bitwrit.com.au> wrote:
On 01/27/2012 03:12 AM, Raphael Bauduin wrote:
Hi,
I am a beginner with R, and I think the answer to my question will
seem obvious, but after searching and trying without success I've
decided to post to the list.
I am working with data loaded from a CSV file with these fields:
order_id, item_value
As an order can have multiple items, an order_id may be present
multiple times in the CSV.
I managed to compute the total value and the number of items for each
order:
oli <- read.csv("/tmp/order_line_items_data.csv", header=TRUE)
orders_values <- tapply(oli[[2]], oli[[1]], sum)
items_per_order <- tapply(oli[[2]], oli[[1]], length)
I then can display the histogram of the order values:
hist(orders_values, breaks=c(10*0:20,800), xlim=c(0,200), prob=TRUE)
Now on this histogram, I would like to display the average number of
items of the orders in each group (defined with the breaks).
So for the bar of orders with value 0 to 10, I'd like to display the
average number of items of these orders.
Hi Raph,
As this looks a tiny bit like homework, I'll only provide suggestions. You
This is absolutely not homework :-)
I'm learning R to try to get some info out of the data of an e-commerce website.
have the value and number of items for each order. What you need to do is to
match them in groups. In order to do that, you want a factor that will show
the group for each value-items pair. The "cut" function will give you such a
factor, using the breaks above. You seem to understand the *apply functions,
so you can use one of these to return the mean number of items for each
value group. Alternatively, you could use the factor in the "by" function to
get the mean number of items.
You should now have a factor that can be sent to "table" to get the number
of orders in each value range, and a vector of the corresponding mean
numbers of items in each value grouping. Why, you could even use the same
trick to calculate the mean price of the orders in each value grouping...
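The suggestion above can be sketched on a small example. This is only an illustration under stated assumptions: the data frame is made up to stand in for the CSV, and the breaks c(0, 10, 20) and the names value_range, orders_per_range, mean_items_per_range and mean_value_per_range are hypothetical, not taken from the original data.

```r
# Toy data standing in for the CSV (values are made up for illustration)
oli <- data.frame(
  order_id   = c(1, 1, 2, 3, 3, 3, 4),
  item_value = c(4, 3, 15, 5, 5, 8, 12)
)

# Total value and number of items per order, as in the original post
orders_values   <- tapply(oli$item_value, oli$order_id, sum)
items_per_order <- tapply(oli$item_value, oli$order_id, length)

# Factor giving the value range each order falls into (hypothetical breaks)
value_range <- cut(orders_values, breaks = c(0, 10, 20))

# Number of orders in each value range
orders_per_range <- table(value_range)

# Mean number of items, and mean order value, in each value range
mean_items_per_range <- tapply(items_per_order, value_range, mean)
mean_value_per_range <- tapply(orders_values, value_range, mean)

print(orders_per_range)
print(mean_items_per_range)
print(mean_value_per_range)
```

Because the same factor is used for every summary, the three results line up range by range, which is what makes it possible to show them together on one plot.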
I would use "barplot" to display all this information, as it is a bit easier
to place the mean number of items on the bars (if you check the return value
for barplot).
Your suggestions helped me get the info I wanted. I still need to
fine-tune it as I currently generate 2 barplots.
Here's what I've done, in case it can help someone in the future:
# assign to each entry of orders_values the range to which it belongs,
# according to the breaks passed as the second argument
order_value_range <- cut(orders_values, c(10*0:20, 800))
# count the number of orders in each range
orders_number_per_range <- tapply(orders_values, order_value_range, length)
# similar to table(order_value_range), except that empty ranges give NA instead of 0
average_number_of_item_per_order_in_range <- tapply(items_per_order,
                                                     order_value_range, mean)
barplot(average_number_of_item_per_order_in_range,
        ylab="Items number", xlab="Order value")
barplot(orders_number_per_range, ylab="Number of orders", xlab="Order value")
The next step: combine the two barplots in one.
Thanks already for your help!
Hi Raph,
Okay, what you want to do is to draw one barplot, then use the text
function (or boxed.labels in plotrix) to put the values of items per
order over or (better for not distorting the height relationship) on the
bars. In the barplot function, you can get the x positions of the bars
from the return value, and of course, you know the heights of the bars...
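A minimal sketch of that approach, using base graphics only. The numbers and the names orders_per_range, mean_items_per_range and bar_x are made up for illustration; with the real data they would be replaced by the per-range summaries computed earlier in the thread.

```r
# Hypothetical summaries per order-value range (made-up numbers)
orders_per_range     <- c("(0,10]" = 12, "(10,20]" = 30, "(20,30]" = 18)
mean_items_per_range <- c("(0,10]" = 1.2, "(10,20]" = 2.5, "(20,30]" = 3.1)

# barplot() returns the x coordinates of the bar midpoints
bar_x <- barplot(orders_per_range,
                 ylab = "Number of orders", xlab = "Order value")

# Write the mean number of items on each bar, halfway up its height,
# so the bar heights themselves are left undistorted
text(x = bar_x, y = orders_per_range / 2,
     labels = round(mean_items_per_range, 1))
```

Placing the labels at half the bar height keeps them inside the plotting region without having to extend the y axis; to put them above the bars instead, y could be set slightly higher than the bar heights and ylim enlarged to make room.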
Jim
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help