Re: [R] Estimating Mean of a Poisson Distribution Based on interval censoring

mohsen hs via R-help Tue, 16 Feb 2016 06:26:31 -0800

Hi Terry,
Thank you for your reply and your time. I really appreciate your time and I 
know that you should be very busy.
Please forgive me for any possible technical mistake as I am not a professional 
in statistic.
R has a package with the name of fitdist that gave me the estimation. It seems 
that lifereg does not support Poisson at list without any specific modification,
(https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifereg_sect019.htm).
Please let me explain my problem which you can find below. Since, you are 
interested to know the problem, I tried to explain the problem clearly. Please 
forgive me if it is too long:( . I really appreciate your time, consideration, 
and feedback.



Many thanks,Mohsen
We collected the data from some devices and they are able to count the data 
within a minute. For example, we had 10 observations in 12:01, 15 in 12:02, and 
so forth. I randomly distributed the data (10 obs) in the first 60 sec, and the 
15 obs in (60 to 120 sec) and count the interarrival of obs. In this case, I 
got a result like this:1, 2002, 4003, 150The first row means I had 200 objects 
that arrived with the difference of 1 second, 400 objects with the difference 
of two seconds (interarrival rate was 2 sec for 400 objects), and so on. To do 
that, I wrote a function to repeat 1, two hundred times, repeat 2, four hundred 
times, and ... Now, I would like to investigate which distribution they follow. 
Since some of the devices were faulty and  for some other reasons such as tails 
of QQ plot, I decided to censor some of the data, and those are 
interarrival(obs)<3 sec and interarrival(obs)>500 sec. 

On a different topic related to difference between R and SAS, I have provided 
you with the mu, sigma, and codes.

In SAS and R, I used interval censoring with the boundaries of 3, and 500 and 
formed a table like this:

        count & t1  & t2           1     & 1   & 3             1     & 1   & 3  
           2     & 2   & 3             3     & 3   & 3            3     & 3   & 
3             4     & 4   & 4             ..    & ..  & ..            500   & 
500 & 500          501   & 500 & 501           ..    & ..  & ..           39011 
& 500 & 39011

count is the observed value and is the response variable, t1 is the bound of 
right censoring and t2 is the bound of left censoring. This is done by the 
following command in SAS
 loss count/ lc=t2 rc=t1;

SAS estimated:
 $\mu=3.67361$ , $\sigma=1.45265$ 
http://support.sas.com/documentation/cdl/en/etsug/63939/HTML/default/viewer.htm#etsug_severity_sect037.htm
      The LOSS statement specifies the left and right boundary of each group as 
the       right-censoring and left-censoring limits, respectively.       lc=t2 
rc=t1;      t1   t2      1    3      1    3      500  600      500  4500
SAS code,/////////////////////////////
                 data df;                                                       
                                                                             
infile "/home/Username/PathAll_TWOMONTH _BothDirection715_2.csv"                
                                                              LRECL=10000000  
DLM=',' firstobs=2;                                                             
                                       input                                    
                                                                                
              timenumn count right left                                         
                                                                     ;          
                                                                                
                                            run;       data t1(drop=count left 
right rename=(timenumn=t1)) ;                                                   
                                                                              
set df;                                                                         
                                                             do i = 1 to count; 
                                                                                
                                          output;                               
                                                                                
                       end;                                                     
                                                                                
    run;           data t2(drop=count left right rename=(timenumn=t2));         
                                                                                
                                             set df;                            
                                                                                
                          do i = 1 to count;                                    
                                                                                
       output;                                                                  
                                                                    end;        
                                                                                
                                                 run;           data 
count(drop=count left right rename=(timenumn=count)) ;                          
                                                                                
                           set df;                                              
                                                                                
        do i = 1 to count;                                                      
                                                                     output;    
                                                                                
                                                  end;                          
                                                                                
                               run;                     data t1;     set t1;    
 /*If t1 < 3 then t1 = 1;     if t1>1000 then t1=1000;*/     if (t1<3 ) then 
t1=1;     if(t1>2) then t1=t1;     if(t1>500) then t1=500;     run;          
data t2;     set t2;     /*If t2 < 3 then t2 = 5;     if t2 >1000 then t2=t2;*/ 
    if(t2<3) then t2=3;     /*if(t2<501) then t2=t2;     if (t2>500 ) then 
t2=500;*/     run;     ;     /*     data count;     set count;     if count 
<100005 then count=1     run;     ;     */     data final;     merge t1 t2 
count;     run;          proc severity data=final print=all plots(histogram 
kernel)=all     criterion=ad;     loss count/ lc=t2 rc=t1;           dist logn; 
/* _predef_; */     run;

| Parameter Estimates |
| Parameter | Estimate | Standard
Error | t Value | Approx
Pr > |t| |
| Mu | 3.67361 | 0.00859 | 427.50 | <.0001 |
| Sigma | 1.45265 | 0.00615 | 236.07 | <.0001 |

////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
I decided to compare the $\mu$ and $\sigma$ values from R and SAS. I got the 
following values from R which is slightly different:
$\mu=3.653001$$\sigma=1.497570$
////////R codelibrary(fitdistrplus)df= read.csv 
("E:/Motorway-Urban/hour/PathAll_TWOMONTH 
_BothDirection715_2.csv")z=rep(df$timenum,time=df$count)y<-zycens <- 
data.frame(left=y,right=y)max=27219ct=maxfor(i in max:28666 ){    
ycens$right[ct]=y[ct]    ycens$left[ct]=500    ct=ct+1}
ct=1;for(i in 1:28666 ){        if( ycens$left[i]<3)    {        
ycens$left[ct]=1            }    
ct=ct+1}fitlnc<-fitdistcens(ycens,"lnorm")Fitting of the distribution ' lnorm ' 
on censored data by maximum likelihood Parameters:        estimatemeanlog 
3.653001sdlog   1.497570
   

    On Tuesday, February 16, 2016 1:38 AM, "Therneau, Terry M., Ph.D." 
<thern...@mayo.edu> wrote:


 For an interval censored poisson or lognormal, use survreg() in the survival 
package.  (Or 
if you are a SAS fan use proc lifereg).  If you have a data set where R and SAS 
give 
different answers I'd like to know about it, but my general experience is that 
this is 
more often a user error.  I am also curious to learn exactly what you mean by 
"interval 
censored poisson".  Exponential with interval time to first event is equivalent 
to 
poisson, which is what I'd guess from "lognormal", but you may mean something 
else.


Terry Therneau
(author of survival)



On 02/14/2016 05:00 AM, r-help-requ...@r-project.org wrote:
> Dear all,
> I appreciate that if you let me know if there is any package implemented in R 
> for Estimating Mean of a Poisson Distribution Based on Interval censoring? 
> And if yes, could you please provide some information about it:)?
> By the way, is there anything for lognormal?I think fitdistcens is not good 
> for this purpose as it gives me different result compared to SAS and only 
> useful for right/left censoring and not interval censoring?(or both left and 
> right together).?
> Kind regards,Mohsen


  
        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Estimating Mean of a Poisson Distribution Based on interval censoring

Reply via email to