So - as you can see, your data can be modelled. Now the interesting question is: what do you do with that knowledge? I know nearly nothing about your domain, but given that the data look log-normal, I am curious about the following:
- Most of the events are in the small-loss category, but most of the damage is done by the rare large losses. Is it even meaningful to guard against a single 1/1000 event? Shouldn't you rather be saying: my contingency funds need to be large enough to allow survival of, say, a fiscal year with 99.9% probability? That is a very different question (a rough simulation sketch of this aggregate view is appended below the quoted thread).
- If a loss occurs, in what time do the funds need to be replenished? Do you need to take a series of events into account?
- The model assumes that the data are independent. This is probably a poor (and dangerous) assumption.

Cheers, B.

On Jul 22, 2015, at 3:56 PM, Ben Bolker <bbol...@gmail.com> wrote:

> Amelia Marsh <amelia_marsh08 <at> yahoo.com> writes:
>
>> Hello! (I don't know if I can raise this query here on this forum, but I
>> had already raised it on the finance forum and have not received any
>> suggestion, so I am now raising it on this list. Sorry for the duplication.
>> The query is about what to do if no statistical distribution fits the data.)
>
>> I am into risk management and deal with operational risk. As a part of the
>> Basel II guidelines, we need to arrive at the capital charge that banks
>> must set aside to counter any operational risk, if it happens. As a part of
>> the Loss Distribution Approach (LDA), we need to collate past loss events
>> and use these loss amounts. The usual process as practised in the industry
>> is as follows -
>
>> Using these historical loss amounts and the various statistical tests (KS
>> test, AD test, PP plot, QQ plot, etc.), we try to identify the best
>> statistical (continuous) distribution fitting this historical loss data.
>> Then, using the estimated parameters of that distribution, we simulate say
>> 1 million loss amounts and, taking the appropriate percentile (say 99.9%),
>> arrive at the capital charge.
>
>> However, many a time the loss data is such that fitting a distribution to
>> it is not possible. Maybe the loss data is multimodal or has significant
>> variability, making the fit impossible. Can someone guide me on how to deal
>> with such data and what can be done to simulate losses from this historical
>> loss data in R?
>
> A skew-(log)-normal fit doesn't look too bad ... (whenever you have positive
> data that are this strongly skewed, log-transforming is a good step)
>
> hist(log10(mydat),col="gray",breaks="FD",freq=FALSE)
> ## default breaks are much coarser:
> ## hist(log10(mydat),col="gray",breaks="Sturges",freq=FALSE)
> lines(density(log10(mydat)),col=2,lwd=2)
> library(fGarch)
> ss <- snormFit(log10(mydat))
> xvec <- seq(2,6.5,length=101)
> lines(xvec,do.call(dsnorm,c(list(x=xvec),as.list(ss$par))),
>       col="blue",lwd=2)
> ## or try a skew-Student-t: not very different:
> ss2 <- sstdFit(log10(mydat))
> lines(xvec,do.call(dsstd,c(list(x=xvec),as.list(ss2$estimate))),
>       col="purple",lwd=2)
>
> There are more flexible distributional families (Johnson, log-spline ...)
>
> Multimodal data are a different can of worms -- consider fitting a finite
> mixture model ...
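To make my first point concrete, here is a rough sketch of the aggregate view, not a recipe. I don't know your actual frequency or severity, so the Poisson frequency, the log-normal severity and all parameter values (lambda, meanlog, sdlog) below are placeholders you would replace with your own fitted numbers.

## Sketch only: 99.9% quantile of the *annual aggregate* loss vs. a single-event quantile.
## Assumes Poisson event counts and log-normal severities; lambda, meanlog and sdlog
## are made-up placeholders -- substitute your own estimates.
set.seed(1)
n_years <- 1e5        # number of simulated fiscal years
lambda  <- 50         # mean number of loss events per year (placeholder)
meanlog <- 8          # log-scale mean of a single loss (placeholder)
sdlog   <- 1.5        # log-scale sd of a single loss (placeholder)

annual_loss <- replicate(n_years, {
  n <- rpois(1, lambda)                              # how many events this year
  if (n == 0) 0 else sum(rlnorm(n, meanlog, sdlog))  # total loss for the year
})

quantile(annual_loss, 0.999)   # funds needed to survive a fiscal year with 99.9% probability
qlnorm(0.999, meanlog, sdlog)  # for comparison: the 99.9% quantile of a *single* loss

Because the annual total mixes many moderate losses with the occasional huge one, these two numbers can differ substantially, which is why the "survive the year" framing changes the question.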
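And on Ben's closing suggestion about multimodal data: the thread doesn't name a package, so take the following as one possible route only. The mixtools package and the two-component choice are my assumptions; 'mydat' is again the vector of historical loss amounts.

## Sketch only: finite mixture of normals on the log scale for multimodal loss data.
library(mixtools)
fit <- normalmixEM(log10(mydat), k = 2)   # EM fit of a 2-component Gaussian mixture
summary(fit)                              # mixing weights, means, sds per component

## Simulate losses from the fitted mixture and back-transform to the original scale:
n_sim <- 1e6
comp  <- sample(seq_along(fit$lambda), n_sim, replace = TRUE, prob = fit$lambda)
sim   <- 10^rnorm(n_sim, mean = fit$mu[comp], sd = fit$sigma[comp])
quantile(sim, 0.999)                      # single-loss 99.9% quantile implied by the mixture

Whether two components are enough is something to judge against the histogram; comparing fits for several k (e.g. by BIC) is one way to decide.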