The Statistical Plan

JoeyDVivre

OK, I've been playing with this data a bit, and here's the plan. I thought I would analyze the data in a pedagogical way, because these are results everyone cares about, and maybe we can learn some statistics that will be useful for other things.

The goals:
1) Determine the MPS (minimum passing score)
2) See how much we can improve on 40/60/80
3) Give everyone a fancy calculator for their score
4) See if we can come up with some insights useful for studying
5) See if we can find any evidence that, for borderline candidates, ethics matters more than its point weight would indicate.
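For reference, here is a minimal sketch of the 40/60/80 baseline that goal 2) is trying to beat. It assumes the rule replaces each section's lowest, middle and highest result band with 40%, 60% and 80% and then averages those using each section's point weight; the section names and point weights are hypothetical.

```python
# Sketch of the 40/60/80 heuristic (assumed form): each section's result band
# is replaced by one representative percentage, then averaged by point weight.
BAND_SCORE = {1: 40.0, 2: 60.0, 3: 80.0}  # band 1 = lowest, band 3 = highest


def rule_40_60_80(bands, weights):
    """Estimated overall percentage under the 40/60/80 heuristic.

    bands   -- dict: section name -> band code 1, 2 or 3
    weights -- dict: section name -> points (or share of points) for that section
    """
    total = sum(weights[section] for section in bands)
    weighted = sum(BAND_SCORE[band] * weights[section] for section, band in bands.items())
    return weighted / total


# Hypothetical candidate and point weights, for illustration only.
example = rule_40_60_80({"ethics": 3, "fsa": 2, "equity": 1},
                        {"ethics": 30, "fsa": 60, "equity": 60})
```

With these made-up weights the estimate is (80*30 + 60*60 + 40*60) / 150 = 56%. Every candidate with the same bands gets the same number, no matter how the sections relate to each other.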

The Plan:

First, the scores on the sections are highly correlated. Simply coding the three result bands as 1, 2 and 3 and correlating those codes yields correlations that are all positive, and almost 0.5 between equity analysis and FSA. Obviously, the 1-2-3 coding is not especially justified.
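The quick correlation check above can be sketched as follows. The rows of `codes` are hypothetical candidates, not real results, and the column order (ethics, FSA, equity) is an assumption. Because 1-2-3 codes pretend the bands are equally spaced, a polychoric correlation (which assumes a latent bivariate normal cut into bins) would be the more defensible version; it is closely related to the Gaussian copula idea in the plan.

```python
import numpy as np


def band_correlations(codes):
    """Pearson correlations between sections when result bands are coded 1, 2, 3.

    codes -- array-like of shape (n_candidates, n_sections)
    """
    codes = np.asarray(codes, dtype=float)
    return np.corrcoef(codes, rowvar=False)  # columns are the variables


# Hypothetical codes: one row per candidate, columns = (ethics, FSA, equity).
codes = [[3, 3, 2],
         [2, 2, 2],
         [1, 2, 1],
         [3, 2, 3],
         [2, 1, 1],
         [1, 1, 2]]
r = band_correlations(codes)
```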

We are going to assume that the percentage score in each section is a beta random variable. Betas are quite useful for this kind of thing, and Wikipedia has a fine entry on them. However, unlike the normal distribution, there is no obvious generalization of the beta to a multivariate distribution (except the Dirichlet, which is inappropriate here because its components are always negatively correlated, while our scores are positively correlated). So, in keeping with what's chic in finance these days, we'll estimate a Gaussian copula. I haven't tried estimating the marginals and the copula simultaneously yet, but I'm going to try. It looks a little numerically tricky because the data are binned, and it's just a tough numerical problem (if you don't think so, I would like to hear from you). If this doesn't work, we'll fit the marginals first and then the copula, which is totally easy. We're going to use the trusty EM algorithm to solve the likelihood equations.
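The "marginals first" fallback can be sketched for a single section: fit a beta distribution by maximum likelihood to the number of candidates in each result band. The band edges (50% and 70%) and the counts are assumptions for illustration (the counts are roughly what a Beta(6, 4) would produce). With only three bands and two beta parameters, the fitted beta simply reproduces the observed band proportions, which makes a handy sanity check.

```python
import numpy as np
from scipy import optimize, stats

# Assumed band edges as fractions correct: [0, 0.5], (0.5, 0.7], (0.7, 1].
EDGES = np.array([0.0, 0.5, 0.7, 1.0])


def binned_beta_nll(log_params, counts, edges=EDGES):
    """Negative log-likelihood of band counts under a Beta(a, b) distribution."""
    a, b = np.exp(log_params)                     # log scale keeps a, b > 0
    probs = np.diff(stats.beta.cdf(edges, a, b))  # probability of each band
    probs = np.clip(probs, 1e-300, None)          # guard against log(0)
    return -np.sum(counts * np.log(probs))


def fit_binned_beta(counts, edges=EDGES):
    """Maximum likelihood (a, b) for one section from its band counts."""
    counts = np.asarray(counts, dtype=float)
    res = optimize.minimize(binned_beta_nll, x0=np.zeros(2), args=(counts, edges),
                            method="Nelder-Mead",
                            options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 5000})
    return np.exp(res.x)


# Hypothetical counts of candidates in the low, middle and high bands.
a_hat, b_hat = fit_binned_beta([254, 476, 270])
```

The same fit would be repeated per section before moving on to the copula.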

If you think about what that gives us, it's a multivariate distribution for the section percentages. You then take your binned scores, find the point of highest density inside the region your bins define, and that's your maximum likelihood score. That means, for example, that if you got a mid-range result on FSA, your estimated FSA score would be higher if you aced equity analysis than if you bombed it.
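A minimal sketch of this "maximum likelihood score" for two sections, FSA and equity, assuming beta marginals joined by a Gaussian copula. The shape parameters, the copula correlation of 0.5 and the band edges are hypothetical stand-ins for fitted values. The code searches the box defined by a candidate's bands for the point of highest joint density.

```python
import numpy as np
from scipy import optimize, stats


def gaussian_copula_logpdf(u, corr):
    """Log-density of a Gaussian copula with correlation matrix corr at u in (0, 1)^d."""
    z = stats.norm.ppf(u)
    _, logdet = np.linalg.slogdet(corr)
    quad = z @ (np.linalg.inv(corr) - np.eye(len(z))) @ z
    return -0.5 * logdet - 0.5 * quad


def joint_logpdf(x, a, b, corr):
    """Log-density of beta marginals Beta(a[i], b[i]) joined by a Gaussian copula."""
    u = np.clip(stats.beta.cdf(x, a, b), 1e-12, 1 - 1e-12)  # keep norm.ppf finite
    return gaussian_copula_logpdf(u, corr) + np.sum(stats.beta.logpdf(x, a, b))


def most_likely_scores(lower, upper, a, b, corr, eps=1e-6):
    """Point of highest joint density inside the box lower <= x <= upper."""
    lo = np.asarray(lower, dtype=float) + eps  # stay off 0 and 1, where logpdf can be -inf
    hi = np.asarray(upper, dtype=float) - eps
    res = optimize.minimize(lambda x: -joint_logpdf(x, a, b, corr), 0.5 * (lo + hi),
                            method="L-BFGS-B", bounds=list(zip(lo, hi)))
    return res.x


# Hypothetical fitted values; sections are (FSA, equity).
a = np.array([6.0, 6.0])
b = np.array([4.0, 4.0])
corr = np.array([[1.0, 0.5], [0.5, 1.0]])

# FSA in the middle band; equity in the top band versus the bottom band.
aced = most_likely_scores([0.5, 0.7], [0.7, 1.0], a, b, corr)
bombed = most_likely_scores([0.5, 0.0], [0.7, 0.5], a, b, corr)
```

With a positive copula correlation, `aced[0]` (the FSA estimate) comes out higher than `bombed[0]` even though the FSA band is identical, which is the effect described above.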

However, we will use everyone's scores to find the MLE of the MPS. Thus, your estimated score will be the maximum likelihood point that lies within your bins and on the correct side (pass or fail) of the estimated MPS.
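Before running the full MLE, a simple consistency check helps. Assuming a candidate passes exactly when the point-weighted percentage is at least the MPS, each pass or fail result bounds the MPS through the best and worst scores that candidate's bands allow. The sketch below computes that feasible interval; it is not the maximum likelihood estimate itself, and the band edges, weights and candidates are all hypothetical.

```python
import numpy as np

# Assumed band edges (fraction correct) for band codes 1, 2, 3; hypothetical.
BAND_EDGES = {1: (0.0, 0.5), 2: (0.5, 0.7), 3: (0.7, 1.0)}


def score_range(bands, weights):
    """Lowest and highest point-weighted score consistent with one candidate's bands."""
    w = np.asarray(weights, dtype=float)
    lo = np.array([BAND_EDGES[code][0] for code in bands])
    hi = np.array([BAND_EDGES[code][1] for code in bands])
    return (w @ lo) / w.sum(), (w @ hi) / w.sum()


def feasible_mps(candidates, weights):
    """Interval (low, high] of MPS values consistent with every candidate's result.

    candidates -- list of (bands, passed) pairs; bands lists one code per section
    Assumes a candidate passes exactly when the weighted score is >= MPS, so
    low >= high means no single MPS explains all the results.
    """
    low, high = 0.0, 1.0
    for bands, passed in candidates:
        smin, smax = score_range(bands, weights)
        if passed:
            high = min(high, smax)  # MPS cannot exceed a passer's best possible score
        else:
            low = max(low, smin)    # MPS must exceed a failer's worst possible score
    return low, high


# Hypothetical point weights for three sections and four candidates.
weights = [30, 60, 60]
candidates = [([3, 3, 3], True), ([2, 2, 2], True),
              ([1, 1, 1], False), ([2, 1, 2], False)]
low, high = feasible_mps(candidates, weights)
```

Here the MPS must lie in (0.3, 0.7]; the MLE would then pick a value inside that interval using the fitted distribution.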

That covers goals 1) to 3), I think.

For 4), I'm going to think about some kind of principal component analysis (PCA) and compare the weights CFAI gives the sections with the PCA weights. I don't quite have this worked out yet.

For 5), I'll bet we need a mountain of data to come up with anything. In the framework above, it would show up as a "kink" in the MPS boundary along the ethics dimension. We'll see how hard this is when we get there.

Suggestions? Cool, huh?
 
You make me want to dust off my old stats textbooks and crunch a few numbers.
 
Because after he's done crunching scores, I'm going to make a market for him in whatever CDS correlation trades he's looking for.
 
haha....I love this place.

Joey: I think this covers the bulk of what everyone wants to know. Thank you, as always, for being a backbone of AF.

Also, anyone know where we can get more data from?



Edited 1 time(s). Last edit at Friday, July 27, 2007 at 12:39PM by ymichael12.
 
maratikus Wrote:
-------------------------------------------------------
> What's the reason for using Gaussian copula?


Got a better copula that you think we should use? We've got to come up with some notion of multivariate beta and, um, the M-step in EM is really easy if we use the Gaussian copula.
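To see why the Gaussian copula keeps this step simple, here is the complete-data version (percentages observed exactly, beta marginals already fitted): map each score to a normal score and average the outer products, rescaled to a correlation matrix. This rescaled average is a common, simple estimator rather than the exact constrained MLE, and with binned data EM would replace the outer products by their conditional expectations from the E-step. The beta parameters and the simulated sample are hypothetical.

```python
import numpy as np
from scipy import stats


def normal_scores(x, a, b):
    """Map percentages to normal scores z = Phi^{-1}(F_beta(x)), column by column."""
    u = np.clip(stats.beta.cdf(x, a, b), 1e-12, 1 - 1e-12)
    return stats.norm.ppf(u)


def copula_corr_from_scores(z):
    """Gaussian-copula correlation estimate from an (n, d) array of normal scores.

    Averages z z^T over candidates and rescales to unit diagonal. In EM the
    outer products would be replaced by their E-step conditional expectations.
    """
    s = z.T @ z / len(z)
    d = np.sqrt(np.diag(s))
    return s / np.outer(d, d)


# Simulate hypothetical candidates from a known copula, then recover it.
rng = np.random.default_rng(0)
true_corr = np.array([[1.0, 0.5], [0.5, 1.0]])
a, b = np.array([6.0, 5.0]), np.array([4.0, 5.0])
latent = rng.multivariate_normal(np.zeros(2), true_corr, size=20000)
x = stats.beta.ppf(stats.norm.cdf(latent), a, b)  # beta-marginal percentages
est_corr = copula_corr_from_scores(normal_scores(x, a, b))
```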
 
The Gaussian copula has weak dependence in the tails (its tail dependence is zero for any correlation below 1). I think the opposite is true for CFA results. It seems like people rarely do extremely well in one section and very poorly in another. What do you think?

The t-copula, for example, has stronger tail dependence.
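The difference can be made concrete with the coefficient of tail dependence, the limiting probability that one section is in its extreme tail given that another one is. For a Gaussian copula with correlation rho < 1 it is zero; for a bivariate t-copula with nu degrees of freedom it is 2 * t_{nu+1}(-sqrt((nu + 1)(1 - rho) / (1 + rho))), where t_{nu+1} is the Student t CDF. A short sketch, with rho and nu chosen only for illustration:

```python
import numpy as np
from scipy import stats


def t_copula_tail_dependence(rho, nu):
    """Upper (= lower) tail-dependence coefficient of a bivariate t-copula."""
    arg = -np.sqrt((nu + 1.0) * (1.0 - rho) / (1.0 + rho))
    return 2.0 * stats.t.cdf(arg, df=nu + 1.0)


# Hypothetical correlation; the Gaussian copula is the nu -> infinity limit (0).
rho = 0.5
lambdas = {nu: t_copula_tail_dependence(rho, nu) for nu in (4, 10, 30)}
```

Smaller nu means more tail dependence, so nu could be estimated along with the correlation to see whether the data really have tighter tails than the Gaussian copula allows. Both copulas are symmetric in their tails, so neither distinguishes the top tail from the bottom one.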
 
That sounds very interesting. We don't have a whole lot of data, though, do we? I guess we can still get a better estimate than we have now.
 
Joey,

One thing I could help with is a principal component analysis of the data. We could get the first two principal components and reduce the space from 9 or 10 dimensions to just two. We can then identify the area in that 2D space that corresponds to a passing grade.
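A minimal sketch of the two-component reduction, assuming each candidate's section scores are available as numbers (for example estimated percentages, or the 1-2-3 codes). The random data are placeholders, not exam results. The signs of the components are arbitrary, and R's `prcomp` with `scale. = TRUE` gives the same components up to sign.

```python
import numpy as np


def first_two_components(scores):
    """Project standardized section scores onto their first two principal components.

    scores -- (n_candidates, n_sections) array
    Returns (projections, loadings, explained_variance_ratio) for two components.
    """
    x = np.asarray(scores, dtype=float)
    x = (x - x.mean(axis=0)) / x.std(axis=0)         # standardize each section
    _, s, vt = np.linalg.svd(x, full_matrices=False)  # rows of vt are components
    loadings = vt[:2]                                 # shape (2, n_sections)
    explained = s**2 / np.sum(s**2)                   # variance share per component
    return x @ loadings.T, loadings, explained[:2]


# Placeholder data: 200 hypothetical candidates, 9 sections sharing a common factor.
rng = np.random.default_rng(1)
common = rng.normal(size=(200, 1))
scores = 0.7 * common + 0.5 * rng.normal(size=(200, 9))
proj, loadings, explained = first_two_components(scores)
```

Plotting `proj` colored by pass/fail would show the passing region, and the first row of `loadings` is what could be compared with the CFAI section weights.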

This could be more realistic than the 40/60/80 rule, since it would account for the correlations between the scores in the different sections.
 
mo34, you can use R for PCA. One thing I don't like about PCA is that the results can be hard to interpret.
 
Maratikus,

In a 2D space, it will be much easier to see what the CFAI is really looking for. I think the 40/60/80 calculation is too simple.
 