Saturday, August 17, 2019
Predictive Modeling Decision Tree
Predict 'kicks' (bad purchases) using the Carvana – Cleaned and Sampled.jmp file. Create a validation data set with 50% of the data. Use Decision Tree, Regression, and Neural Network approaches to build predictive models. Perform a comparative analysis of the three competing models on the validation data set. Write down your final conclusions on which model performs best, what the best cut-off is, and what the 'value added' from conducting predictive modeling is. Upload the saved file with the assignment.

I created six models for this project: DT1, DT2, Reg1, Reg2, Reg3, and NN. After testing, the predictors I used for "IsBadBuy" in all my models are PurchDate, Auction, VehicleAge, Transmission, WheelType, VehOdo, all of the "MMR" variables, VehBCost, IsOnlineSale, and WarrantyCost. Together, these predictors give me better models (i.e., ROC Area > 0.7).

I used a cut-off of 0.6. I also tried other cut-offs, such as 0.5, 0.7, and 0., but the results either eliminated too many Good Buys or accepted too many Bad Buys. Both situations hurt the business. For example, if we demand more confidence from the model by raising the cut-off, too many cars are predicted as 0 (not a bad buy), which means we may accept more Bad Buys by accident. In the end, I chose 0.6 as my cut-off to balance the two situations.

The best model I chose is Reg2 (the forward regression model), for two reasons. First, Reg2 has the largest ROC Area in the Logistic Fit comparison (saved as "Logistic1~6"), which is 0.478. Second, it has a relatively low number (the second smallest among all models) in the False Negative cell of the Contingency Table. For this second reason, I did not rely on overall accuracy, because I think a false negative damages the business more than a false positive does: accidentally buying a Bad Buy costs the company all the repair and fix-up work.
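JMP builds the contingency tables and ROC areas itself; the sketch below only illustrates, in plain Python, how a cut-off turns predicted probabilities into a contingency table and how an ROC Area can be computed. The labels and probabilities are made-up toy values, not Carvana results, and the rule "predict a bad buy when the probability is at or above the cut-off" is an assumption about how the cut-off is applied.

```python
from typing import Dict, List, Sequence


def classify(probs: Sequence[float], cutoff: float) -> List[int]:
    """Label a car as a bad buy (1) when its predicted probability meets the cut-off (assumed rule)."""
    return [1 if p >= cutoff else 0 for p in probs]


def contingency(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count the four cells of a 2x2 contingency table, where 1 = IsBadBuy."""
    table = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            table["TP"] += 1
        elif a == 0 and p == 1:
            table["FP"] += 1  # good buy rejected
        elif a == 0 and p == 0:
            table["TN"] += 1
        else:
            table["FN"] += 1  # bad buy accepted: the costly mistake
    return table


def roc_auc(actual: Sequence[int], probs: Sequence[float]) -> float:
    """ROC Area: chance a random bad buy scores above a random good buy (ties count half)."""
    pos = [p for a, p in zip(actual, probs) if a == 1]
    neg = [p for a, p in zip(actual, probs) if a == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: actual IsBadBuy labels and one model's predicted probabilities.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
probs = [0.9, 0.65, 0.55, 0.62, 0.3, 0.1, 0.8, 0.45]

for cutoff in (0.5, 0.6, 0.7):
    print(f"cut-off {cutoff}: {contingency(actual, classify(probs, cutoff))}")
print(f"ROC Area: {roc_auc(actual, probs)}")
```

On this toy data, raising the cut-off from 0.5 to 0.6 to 0.7 moves the False Negative count from 0 to 1 to 2 while False Positives fall from 1 to 0, which is the trade-off described above.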
For the value-added calculation, as the contingency tables show (saved as "Contingency 1~6"), the baseline accuracy is 49.89% and the accuracy of Reg2 is 82.49%. Reg2 therefore provides a lift of 82.49 / 49.89 = 1.653.
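The lift arithmetic above can be sketched in a few lines of Python. The `accuracy` helper and the table passed to it are hypothetical illustrations (correct predictions divided by all predictions); the two accuracy figures fed to `lift` are the ones reported from JMP above.

```python
from typing import Dict


def accuracy(table: Dict[str, int]) -> float:
    """Percentage of correct predictions in a 2x2 contingency table."""
    correct = table["TP"] + table["TN"]
    return 100.0 * correct / sum(table.values())


def lift(model_acc_pct: float, baseline_acc_pct: float) -> float:
    """How many times better the model's accuracy is than the baseline's."""
    return model_acc_pct / baseline_acc_pct


# Hypothetical contingency table, only to show the accuracy formula.
print(accuracy({"TP": 40, "FP": 10, "TN": 45, "FN": 5}))  # 85.0

# Reg2 accuracy vs. baseline accuracy, as reported above.
print(round(lift(82.49, 49.89), 3))  # 1.653
```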