
I am working on decision trees for the first time at work and would like some clarity on the following:

1. Should the oversampling/undersampling technique be applied to the entire dataset before splitting it into training and validation subsets, or should it be the other way around, i.e. split the dataset into training and validation first, then apply oversampling to the training dataset while keeping the validation dataset as it is?

2. I am working on problems such as classifying customers with a high cheque-bounce percentage and identifying fraud. Which technique would serve me better: CHAID or a classification tree?

3. Can we have an ordinal categorical target variable for classification trees?
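On question 1, the usual advice is to split first and then oversample only the training partition, so the held-out data keeps the true event rate. A minimal sketch in plain Python (the function and its interface are my own illustration, not from any particular library):

```python
import random

def split_then_oversample(rows, labels, test_frac=0.3, seed=0):
    """Split FIRST, then oversample the rare class in the training
    partition only, so the held-out partition keeps the true event rate.
    Returns (training index list, held-out index list)."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - test_frac))
    train_idx, test_idx = idx[:cut], idx[cut:]

    pos = [i for i in train_idx if labels[i] == 1]  # rare events
    neg = [i for i in train_idx if labels[i] == 0]
    # Oversample events with replacement up to the non-event count;
    # the held-out indices are deliberately left untouched.
    pos_over = [rng.choice(pos) for _ in range(len(neg))] if pos else []
    train_bal = neg + pos_over
    rng.shuffle(train_bal)
    return train_bal, test_idx
```

The point of the ordering is that any validation metric (lift, misclassification rate, etc.) computed on the held-out partition then reflects the population you will actually score.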

A response would be highly appreciated.

thanks

quants_mum

1/(1+((1/original fraction)-1)/((1/oversampled fraction)-1)*[(1/scoring result)-1]);

Without them, the multiplication would be done before subtracting 1, but you want to subtract 1 first and then multiply; that is what the extra brackets indicate.

1/(1+((1/original fraction)-1)/((1/oversampled fraction)-1)*((1/scoring result)-1));

The same, right?!

1/(1+(1/original fraction-1)/(1/oversampled fraction-1)*(1/scoring result-1));

But the graphical one, translated literally to text, is the following (because multiplication takes precedence over subtraction):

1/(1+(1/original fraction-1)/(1/oversampled fraction-1)*(1/scoring result)-1);

Since the 1s would cancel each other out in this case, I am assuming the graphic is wrong?
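The precedence question can be checked numerically. A quick sketch in Python (function names are my own placeholders) comparing the properly bracketed formula with a literal transcription of the graphic:

```python
# Properly bracketed: subtract 1 from (1/score) BEFORE multiplying.
def adjusted(score, original_frac, oversampled_frac):
    return 1 / (1 + (1 / original_frac - 1)
                  / (1 / oversampled_frac - 1)
                  * (1 / score - 1))

# Literal transcription of the graphic: the trailing "-1" applies only
# after the multiplication, so it cancels against the leading "+1".
def graphic_version(score, original_frac, oversampled_frac):
    return 1 / (1 + (1 / original_frac - 1)
                  / (1 / oversampled_frac - 1)
                  * (1 / score) - 1)

# Sanity check: under the correct formula, a score equal to the
# oversampled event rate maps back to the original event rate.
print(adjusted(0.30, 0.0178, 0.30))        # ~0.0178
print(graphic_version(0.30, 0.0178, 0.30)) # noticeably different
```

That sanity check only holds for the properly bracketed version, which supports the reading that the graphic dropped a pair of brackets.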

The thing is, I did oversampling using a 30:70 ratio, as the event rate in my actual population of 20,000 was 355 events, or 1.78%. Now I have built a model using 1,000 observations, which I know might not be that good. But what if I want to use the equation derived from the 1,000 oversampled observations to calculate probabilities for the overall population of 20,000?

One thing is the intercept correction. What else needs to be done?

It would be of great help.
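For reference, the intercept correction mentioned above is the standard prior correction for a logistic model fitted on oversampled data: shift the intercept by the log of the sampling odds ratio and leave the slope coefficients alone. A sketch in plain Python (the function name and form are my own):

```python
import math

def corrected_intercept(b0_oversampled, original_frac, oversampled_frac):
    """Shift the intercept of a logistic model fitted on oversampled
    data back to the original event rate; slopes stay unchanged."""
    offset = math.log((oversampled_frac / (1 - oversampled_frac))
                      * ((1 - original_frac) / original_frac))
    return b0_oversampled - offset
```

Applying this shift on the logit scale is algebraically equivalent to the probability-correction formula discussed earlier in this thread, so doing both would double-correct; use one or the other.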

I am developing a prediction model with a logistic regression using SAS Enterprise Miner. The original sample (N=342) has only 16 observations in the target "1" category, which corresponds to 4.7% (16/342) of the observations. To handle the imbalanced sample and the rare-events issue, in the sample general property panel I set the sample proportion for the level-based option to 50.0. Hence I end up with a sample containing 32 observations (16 observations in the target "1" category and 16 observations in the target "0" category), which I used to develop the prediction model.

However, from the standpoint of my PhD thesis, I must provide references showing that this oversampling methodology is legitimate.

Could you suggest any references to show that this approach is legitimate and accepted by the statistics community?

Thank you so much for your kind cooperation,

Mina