Re: [R] Dataframe Manipulation

2017-09-04 Thread Hemant Sain
Hello Ulrik,
Can you please explain what this code does and how it works? I'm not able
to understand it. If you can explain it, I can reuse it in future with a
little modification.

Thanks


data_help <-
  data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

cat_help %>%
  gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(data_help, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")
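
For anyone following along, here is a rough walk-through of what the two pipelines above do, run on made-up toy data. The values in data_help and cat_help below are assumptions standing in for the real CSV files, and split_items is rewritten with tibble() in place of data_frame(); otherwise the steps mirror the quoted code.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Toy stand-ins for data_help.csv and cat_help.csv (hypothetical values)
data_help <- tibble(
  Items_purchased_on_Receipts = c("banana,milk", "Soap,banana")
)
cat_help <- tibble(
  Fruits     = c("banana", "Mango"),
  Dairy      = c("milk", NA),
  Toiletries = c("Soap", NA)
)

# Split one receipt's comma-separated items into one row per item,
# carrying the receipt's Purchase_ID along.
split_items <- function(items) {
  x <- items$Items_purchased_on_Receipts %>%
    str_split(pattern = ",") %>%
    unlist(use.names = FALSE)
  tibble(Item = x, Purchase_ID = items$Purchase_ID)
}

# Step 1: number the receipts, then apply split_items() to each receipt.
long_receipts <- data_help %>%
  mutate(Purchase_ID = 1:n()) %>%
  group_by(Purchase_ID) %>%
  do(split_items(.))

# Step 2: reshape the category table to long form (one row per
# category/item pair), attach the receipts each item occurs in,
# paste together the items per category and receipt, and spread
# the categories back out into columns.
result <- cat_help %>%
  gather("Foo", "Item") %>%
  filter(!is.na(Item)) %>%
  left_join(long_receipts, by = "Item") %>%
  group_by(Foo, Purchase_ID) %>%
  summarise(Item = paste(Item, collapse = ", ")) %>%
  spread(key = "Foo", value = "Item")

print(result)
```

Because the join is a left_join from the category table, a category item that occurs in no receipt is kept, with a missing Purchase_ID. Switching to inner_join would drop such items instead.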

On 31 August 2017 at 13:17, Ulrik Stervbo  wrote:

> Hi Hemant,
>
> the solution is really quite similar, and the logic is identical:
>
> library(readr)
> library(dplyr)
> library(stringr)
> library(tidyr)
>
> data_help <- read_csv("data_help.csv")
> cat_help <- read_csv("cat_help.csv")
>
> # Helper function to split the Items and create a data_frame
> split_items <- function(items){
>   x <- items$Items_purchased_on_Receipts %>%
>     str_split(pattern = ",") %>%
>     unlist(use.names = FALSE)
>
>   data_frame(Item = x, Purchase_ID = items$Purchase_ID)
> }
>
> data_help <-
>   data_help %>%
>   mutate(Purchase_ID = 1:n()) %>%
>   group_by(Purchase_ID) %>%
>   do(split_items(.))
>
> cat_help %>% gather("Foo", "Item") %>%
>   filter(!is.na(Item)) %>%
>   left_join(data_help, by = "Item") %>%
>   group_by(Foo, Purchase_ID) %>%
>   summarise(Item = paste(Item, collapse = ", ")) %>%
>   spread(key = "Foo", value = "Item")
>
> HTH
> Ulrik
>
> On Wed, 30 Aug 2017 at 13:22 Hemant Sain  wrote:
>
>> Using these two tables, we have to create a third table in this format,
>> where the categories are the column headers and each transaction is a row.
>>
>> On 30 August 2017 at 16:42, Hemant Sain  wrote:
>>
>>> Hello Ulrik,
>>> Can you please check this code once more on the following data set? It
>>> doesn't give me the same output because, unlike the previous demo data
>>> set, the real data set has no quantity, and the splitting is done on the
>>> basis of quantity. Please use the following data set and help me out.
>>> Consider this my final email; I won't bother you again, but my job
>>> depends on it, so please help me.
>>>
>>> Note* the file I'm attaching is very confidential
>>>
>>> On 30 August 2017 at 15:02, Ulrik Stervbo 
>>>  wrote:
>>>
 Hi Hemant,

 Does this help you along?

 table_1 <- textConnection("Item_1;Item_2;Item_3
 1KG banana;300ML milk;1kg sugar
 2Large Corona_Beer;2pack Fries;
 2 Lux_Soap;1kg sugar;")

 table_1 <- read.csv(table_1, sep = ";", na.strings = "",
 stringsAsFactors = FALSE, check.names = FALSE)

 table_2 <-
 textConnection("Toiletries;Fruits;Beverages;Snacks;Vegetables;Clothings;Dairy Products
 Soap;banana;Corona_Beer;King Burger;Pumpkin;Adidas Sport Tshirt XL;milk
 Shampoo;Mango;Red Label Whisky;Fries;Potato;Nike Shorts Black L;Butter
 Showergel;Oranges;grey Cocktail;cheese pizza;Tomato;Puma Jersy red M;sugar
 Lux_Soap;;2 Large corona Beer;;Cheese;Toothpaste")

 table_2 <- read.csv(table_2, sep = ";", na.strings = "",
 stringsAsFactors = FALSE, check.names = FALSE)

 library(tidyr)
 library(dplyr)

 table_2 <- gather(table_2, "Category", "Item")

 table_1 <- gather(table_1, "Foo", "Item") %>%
   filter(!is.na(Item))

 table_1 <- separate(table_1, col = "Item", into = c("Quantity",
 "Item"), sep = " ")

 table_3 <- left_join(table_1, table_2, by = "Item") %>%
   mutate(Item = paste(Quantity, Item)) %>%
   select(-Quantity)

 table_3 %>%
   group_by(Foo, Category) %>%
   summarise(Item = paste(Item, collapse = ", ")) %>%
   spread(key = "Category", value = "Item")

 You need to figure out how to handle words written in different cases
 and how to extract the quantity in a universal way. For the code above, I
 corrected these things by hand in the example data.
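
A minimal base-R sketch of one way to approach both points, assuming the quantity is always the first whitespace-delimited token of each entry and that lower-casing is enough to reconcile the spellings. The vector raw is hypothetical input in the style of the example data, and the names quantity, item and key are chosen here for illustration.

```r
# Hypothetical raw entries in the style of the example data
raw <- c("1KG banana", "300ML milk", "2Large Corona_Beer",
         "2 Lux_Soap", "1kg sugar")

# Take everything up to the first run of whitespace as the quantity
# and the remainder as the item name. Unlike splitting on every
# space, this keeps multi-word item names in one piece.
pattern  <- "^([^[:space:]]+)[[:space:]]+(.*)$"
quantity <- sub(pattern, "\\1", raw)
item     <- sub(pattern, "\\2", raw)

# Build a case-insensitive key to join on; the original spelling
# can be kept in `item` for display.
key <- tolower(trimws(item))

data.frame(quantity, item, key, stringsAsFactors = FALSE)
```

Applying the same tolower(trimws(...)) step to the Item column of the category table, and joining on the resulting key, would make the match independent of case. Entries without a leading quantity would need separate handling.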

 HTH
 Ulrik

 On Wed, 30 Aug 2017 at 10:16 Hemant Sain 
 wrote:

> Hey PIKAL,
> It's not homework, and that is not the real dataset either. I have signed
> an NDA with my company, so I can't share the original data file. I'm
> working on a market basket analysis task, but I'm not able to convert my
> existing data table to the appropriate format so that I can apply the
> Apriori algorithm in R. This is very important for me to get done because
> I'm an intern, and if I don't get it done they are not going to hire me as
> a full-time employee. I tried everything by myself but was not able to get
> it done. Your precious 10-15 minutes can save my upcoming years, so please
> help me through this if you can.
> I want another dataset based on the first two datasets I have mentioned.
>
> Thanks
>
> On 30 August 2017 at 12:49, PIKAL Petr

[R] [R-pkgs] New package: rDotNet

2017-09-04 Thread Jonathan Shore

I’ve published a package on CRAN called ‘rDotNet’.  rDotNet allows R to access 
.NET libraries. From R one can:

* create .NET objects
* call member functions
* call class functions (i.e. static members)
* access and set properties
* access indexing members

The package runs on either Mono on OS X / Linux or the Microsoft .NET VM
on Windows. The source and a description of the package are at:

https://github.com/tr8dr/.Net-Bridge/blob/master/src/R/rDotNet/ 


The CRAN page is:

https://cran.r-project.org/web/packages/rDotNet/index.html 


The package is stable, as it has been in use for some years, but has only now
been packaged up for public use on CRAN. Feel free to contact me with questions
or suggestions on GitHub or by email.

Regards
--
Jonathan Shore




[[alternative HTML version deleted]]

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Merge by Range in R

2017-09-04 Thread Mohammad Tanvir Ahamed via R-help
Hi, 
I have two big data sets.

data_1:
> dim(data_1)
[1] 15820 5

> head(data_1)
   Chromosome  Start End   Feature GroupA_3
1:       chr1 521369  75 chr1-0001    0.170
2:       chr1 750001  80 chr1-0002   -0.086
3:       chr1 81  85 chr1-0003    0.006
4:       chr1 850001  90 chr1-0004    0.050
5:       chr1 91  95 chr1-0005    0.062
6:       chr1 950001 100    chr1-0006   -0.016

data_2:
> dim(data_2)
[1] 470870 5

> head(data_2)
   Chromosome Start   End    Feature GroupA_3
1:       chr1 15864 15865 cg13869341    0.207
2:       chr1 18826 18827 cg14008030   -0.288
3:       chr1 29406 29407 cg12045430   -0.331
4:       chr1 29424 29425 cg20826792   -0.074
5:       chr1 29434 29435 cg00381604    0.141
6:       chr1 68848 68849 cg20253340   -0.458


What I want to do:
Using the "Chromosome", "Start" and "End" columns of the two data sets, I want
to find which rows (specifically, which "Feature" values) of data_2 fall within
each range (between "Start" and "End") of data_1. The "Chromosome" values must
also match between the two data sets.

I have tried the "GenomicRanges" package described in the post
https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops
but I was not successful. Can anyone please help me do this quickly, as the
data is very big?
Thanks in advance.


Regards.
Tanvir Ahamed Stockholm, Sweden |  mashra...@yahoo.com


Re: [R] Merge by Range in R

2017-09-04 Thread jim holtman
Have you tried 'foverlaps' in the data.table package?
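
A small sketch of how that could look, using made-up toy tables in place of data_1 and data_2. The ranges and the "cgA"/"cgB" probe names below are assumptions for illustration; only the column names follow the original post.

```r
library(data.table)

# Toy stand-in for data_1: the ranges (hypothetical values)
data_1 <- data.table(
  Chromosome = c("chr1", "chr1", "chr2"),
  Start      = c(1L, 50001L, 1L),
  End        = c(50000L, 100000L, 50000L),
  Feature    = c("chr1-0001", "chr1-0002", "chr2-0001")
)

# Toy stand-in for data_2: the short probe intervals (hypothetical values)
data_2 <- data.table(
  Chromosome = c("chr1", "chr1", "chr2", "chr2"),
  Start      = c(15864L, 68848L, 200L, 60000L),
  End        = c(15865L, 68849L, 201L, 60001L),
  Feature    = c("cg13869341", "cg20253340", "cgA", "cgB")
)

# foverlaps() needs the second table to be keyed, with the interval
# start and end as the last two key columns.
setkey(data_1, Chromosome, Start, End)

hits <- foverlaps(
  data_2, data_1,
  by.x    = c("Chromosome", "Start", "End"),
  type    = "within",  # probe interval lies entirely inside the range
  nomatch = NULL       # drop probes that fall in no range
)

# Feature is the data_1 range; i.Feature is the data_2 probe.
print(hits[, .(Chromosome, Feature, i.Feature)])
```

Columns from data_2 that clash with names in data_1 come back with an "i." prefix, so each row pairs a range (Feature) with a probe inside it (i.Feature). With type = "any" instead of "within", probes that only partially overlap a range would also be reported.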


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Mon, Sep 4, 2017 at 8:31 AM, Mohammad Tanvir Ahamed via R-help <
r-help@r-project.org> wrote:

> Hi,
> I have two big data set.
>
> data _1 :
> > dim(data_1)
> [1] 15820 5
>
> > head(data_1)
>    Chromosome  Start End   Feature GroupA_3
> 1:       chr1 521369  75 chr1-0001    0.170
> 2:       chr1 750001  80 chr1-0002   -0.086
> 3:       chr1 81  85 chr1-0003    0.006
> 4:       chr1 850001  90 chr1-0004    0.050
> 5:       chr1 91  95 chr1-0005    0.062
> 6:       chr1 950001 100    chr1-0006   -0.016
>
> data_2:
> > dim(data_2)
> [1] 470870 5
>
> > head(data_2)
>    Chromosome Start   End    Feature GroupA_3
> 1:       chr1 15864 15865 cg13869341    0.207
> 2:       chr1 18826 18827 cg14008030   -0.288
> 3:       chr1 29406 29407 cg12045430   -0.331
> 4:       chr1 29424 29425 cg20826792   -0.074
> 5:       chr1 29434 29435 cg00381604    0.141
> 6:       chr1 68848 68849 cg20253340   -0.458
>
>
> What I want to do :
> Based on column name "Chromosome", "Start" and "End" of two data set ,   I
> want to find which row (preciously "Feature") of data_2 is in every range (
> between "Start" and "End") of data_1 ? Also "Chromosome" column element
> should be match between two data set.
>
> I have tried "GenomicRanges" packages describe in the post
> https://stackoverflow.com/questions/11892241/merge-by-range-in-r-applying-loops
> But i was not successful. Can any one please help me to do this fast, as
> the data is very big ?
> Thanks in advance.
>
>
> Regards.
> Tanvir Ahamed Stockholm, Sweden |  mashra...@yahoo.com



[R] JSM 2018 Invited Session Proposals on Statistical Graphics and Data Visualization Due by September 7, 2017

2017-09-04 Thread isabella
Dear Colleagues,

If you work in the statistical graphics and/or data visualization fields,
please consider organizing an invited session for the JSM 2018 conference in
Vancouver, whose theme is “#LeadWithStatistics.”

ASA's Section on Statistical Graphics will sponsor 3 invited sessions at JSM
2018, with a further 1-2 proposals having the potential to be included in the
JSM 2018 conference program through open competition.

Invited session proposals need to be submitted by September 7th, 2017 via the
website: http://ww2.amstat.org/meetings/jsm/2018/submissions.cfm. When
submitting your proposal, please list the ASA Section on Statistical Graphics
as the sponsor of your invited session.

Invited sessions include invited papers and panels:

* Invited paper sessions consist of 2–6 presenters and/or discussants.
* Invited panels have 3–6 panelists providing commentary on a particular topic.

An invited session proposal includes a session title, general description of 
the session, list of participants, and tentative talk titles.

If you are interested in organizing an invited session, you need to select a
session topic and solicit potential speakers. Once you have a sufficient number
of committed speakers, you can submit your proposal online by the September 7,
2017 deadline.

To have the best chance of receiving an invited session slot, you need to: 

* Have solid new work in an important field;
* Know some of your competitors working in the same field;
* Be willing to reach out to your competitors and forge a session with energy 
in it.

For more details, please refer to 
http://ww2.amstat.org/meetings/jsm/2018/invitedsessions.cfm. 

Many thanks, 

Isabella

Isabella R. Ghement, Ph.D. 
JSM 2018 Program Chair for the ASA Section on Statistical Graphics
E-mail: isabe...@ghement.ca




Isabella R. Ghement, Ph.D.
Ghement Statistical Consulting Company Ltd.
301-7031 Blundell Road, Richmond, B.C., Canada, V6Y 1J5
Tel: 604-767-1250
Fax: 604-270-3922
E-mail: isabe...@ghement.ca
Web: www.ghement.ca


[R] Sample size calculation for three-way incomplete block crossover study.

2017-09-04 Thread Jomy Jose
Hi

In R, how can I do a sample size calculation for a three-way incomplete block
crossover study where the within-subject residual standard deviation, the
treatment difference, and the power are given?

Thanks in advance.

Regards
Jose
