Problem
Evaluate multiple analytical tasks in one reproducible workflow: subscription renewal prediction, insurance risk classification, credit risk scoring, and speech text pattern analysis.
Approach
- Logistic regression with glm for renewal propensity.
- Decision tree modeling with rpart for insurance category logic.
- Neural network classification with nnet on normalized features.
- Text mining with tm and document-term matrices for term prevalence insights.
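The text-mining step can be sketched roughly as follows. This is a minimal illustration, not the course solution: the three documents are made-up placeholders, and the helper name clean_corpus is hypothetical. It assumes the tm package is installed and uses only its standard transformations before building a document-term matrix and counting terms.

```r
# Sketch only: the documents below are made-up placeholders,
# not text from the course exercise.
library(tm)

docs <- c(
  "Freedom and liberty for the people.",
  "The people demand liberty!",
  "Taxes, taxes, and more taxes."
)

# Hypothetical helper: one preprocessing function applied to every corpus
# keeps the resulting term counts comparable between runs.
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  corpus <- tm_map(corpus, stripWhitespace)
  corpus
}

corpus <- clean_corpus(VCorpus(VectorSource(docs)))
dtm <- DocumentTermMatrix(corpus)

# Term prevalence: total count per term across all documents,
# most frequent first.
term_counts <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
print(head(term_counts, 5))
```

Lowercasing before stop-word removal matters here, because the stop-word list is lowercase and would otherwise miss capitalized forms such as "The".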
Model stack snapshot
Structured models: glm, rpart, nnet
Text pipeline: tm + DTM preprocessing
Validation style: scored outputs with reproducible scripts
Outcome
- Final score: 100/100 on the Week 6 summative.
- Produced a consistent workflow across structured and semi-structured data.
- Generated interpretable findings and reusable scripts for future work.
Artifacts and proof
Neural network scoring output (nnet, normalized features) — representative sample from Exercise 3 (credit risk classification):
Applicant | Predicted class | High | Moderate | Very Low | Low | DO NOT LEND |
597325 | Very Low (borderline) | — | — | 89.3% | 10.7% | — |
311882 | High / Moderate | 71.1% | 28.9% | — | — | — |
Sample denied | DO NOT LEND | 0% | 0% | 0% | 0% | 100% |
Sample denied | DO NOT LEND | 0% | 0% | 0% | 0% | 100% |
23 applicants scored DO NOT LEND in the full run; all had credit scores below 500. The table shows only a borderline approval and representative denied rows; the full answer key is omitted for public portfolio use.
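Per-class percentages like those in the table can be obtained from nnet by asking predict for raw outputs. The sketch below is an assumption-laden illustration rather than the Exercise 3 script: the applicants are randomly generated, the columns credit_score and income, the three class labels, and the labelling rule are all hypothetical, and the seed is arbitrary. It also shows min-max normalization fitted on the training rows only and then reused for the test rows.

```r
library(nnet)
set.seed(550)  # arbitrary seed so the random data and weights repeat

# Made-up applicants; none of this is course data.
n <- 300
applicants <- data.frame(
  credit_score = round(runif(n, 350, 850)),
  income = round(runif(n, 20000, 150000))
)
# Hypothetical labelling rule, for illustration only.
applicants$risk <- factor(
  ifelse(applicants$credit_score < 500, "DO NOT LEND",
         ifelse(applicants$credit_score < 700, "Moderate", "Low")),
  levels = c("Low", "Moderate", "DO NOT LEND")
)

train_idx <- sample(n, 200)
train <- applicants[train_idx, ]
test <- applicants[-train_idx, ]

# Min-max normalization using training-set minima and maxima only,
# so the test rows are scaled exactly as the model saw during training.
features <- c("credit_score", "income")
mins <- sapply(train[features], min)
maxs <- sapply(train[features], max)
normalize <- function(df) {
  centered <- sweep(as.matrix(df[features]), 2, mins, "-")
  scaled <- sweep(centered, 2, maxs - mins, "/")
  data.frame(scaled, risk = df$risk)
}
normalized_train <- normalize(train)
normalized_test <- normalize(test)

fit <- nnet(risk ~ ., data = normalized_train, size = 5,
            maxit = 500, trace = FALSE)

# type = "raw" returns one probability column per class;
# type = "class" returns the most probable label.
probs <- predict(fit, normalized_test, type = "raw")
predicted <- predict(fit, normalized_test, type = "class")
scored <- data.frame(predicted_class = predicted,
                     round(100 * probs, 1), check.names = FALSE)
print(head(scored, 4))
```

With three or more classes nnet fits a softmax output layer, so each row of probs sums to 1 and can be read as percentages across the classes.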
- Related course hub: DSC 550 — Data Mining
- Public portfolio parent: Dave Wilson · Base Layer Labs
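# Simplified reproducible pipeline skeleton
library(rpart)  # decision trees
library(nnet)   # single-hidden-layer neural networks
set.seed(42)    # nnet starts from random weights; the seed value is arbitrary
model_glm <- glm(renewal ~ ., data = train_df, family = binomial())
model_tree <- rpart(risk_class ~ ., data = train_df)
model_nnet <- nnet(target ~ ., data = normalized_train, size = 5)
What I learned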
- Feature normalization is critical for stable neural network training.
- Root split and variable importance in trees are related but not identical concepts.
- Text mining pipelines need strict preprocessing consistency to make output defensible.
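The difference between the root split and variable importance can be inspected directly on a fitted rpart object. The sketch below uses the kyphosis dataset that ships with rpart as a stand-in, since the insurance data is not part of this summary. The root split is the single variable chosen at the first node, while rpart's importance scores add up each variable's contribution over every split where it is the primary variable, plus adjusted credit where it acts as a surrogate.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the insurance data here.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis, method = "class")

# Root split: the variable used at the first node (row 1 of fit$frame).
root_var <- as.character(fit$frame$var[1])

# Variable importance: aggregated over all primary and surrogate splits,
# so a variable can score highly without being the root split.
importance <- fit$variable.importance

print(root_var)
print(round(importance, 2))
```

Comparing the two outputs shows why they should be reported separately: the root variable answers "what is the first question the tree asks", while the importance vector answers "which variables carry the most weight across the whole tree".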
Public-sanitized note
This summary omits full answer-key tables and local file-path details for public portfolio use.