Week 6 R Pipeline: Portfolio Summary

Problem

Evaluate multiple analytical tasks in one reproducible workflow: subscription renewal prediction, insurance risk classification, credit risk scoring, and speech text pattern analysis.

Approach

Logistic regression with glm for subscription renewal propensity.
Decision tree modeling with rpart for insurance risk classification.
Neural network classification with nnet on normalized features for credit risk scoring.
Text mining with tm and document-term matrices (DTMs) to measure term prevalence in speech text.
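
As a minimal sketch of the first step, the snippet below fits a logistic regression on synthetic data and converts it into renewal propensities. The data frame `renewals_demo`, its columns (`tenure_months`, `support_tickets`, `renewal`), and the 0.5 cutoff are hypothetical stand-ins chosen for illustration, not the coursework data or settings.

```r
# Hypothetical synthetic data; not the coursework data set.
set.seed(1)
n <- 200
renewals_demo <- data.frame(
  tenure_months   = runif(n, 1, 60),
  support_tickets = rpois(n, 2)
)

# Simulate renewals that become more likely with tenure and less likely with tickets.
log_odds <- -1 + 0.05 * renewals_demo$tenure_months - 0.3 * renewals_demo$support_tickets
renewals_demo$renewal <- rbinom(n, 1, plogis(log_odds))

# Logistic regression: binomial family with the default logit link.
model_glm_demo <- glm(renewal ~ ., data = renewals_demo, family = binomial())

# type = "response" returns probabilities rather than log-odds.
propensity <- predict(model_glm_demo, newdata = renewals_demo, type = "response")

# Assumed 0.5 cutoff, for illustration only.
predicted_renewal <- as.integer(propensity >= 0.5)
```

In practice the cutoff would likely be tuned to the cost of a missed renewal rather than fixed at 0.5.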

Model stack snapshot

Structured models: glm, rpart, nnet

Text pipeline: tm + DTM preprocessing

Validation style: scored outputs with reproducible scripts
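
The core idea of the text pipeline, one cleaning function applied identically to every document before terms are counted, can be sketched in base R. This is a stand-in for illustration and not the `tm` workflow itself; the sample sentences and the helper `clean_tokens` are hypothetical.

```r
# Hypothetical sample documents; not the speech text used in the coursework.
docs <- c(
  doc1 = "The economy, the ECONOMY, and jobs.",
  doc2 = "Jobs and growth: growth matters!"
)

# One shared cleaning step so every document is tokenized the same way.
clean_tokens <- function(text) {
  text <- tolower(text)
  text <- gsub("[^a-z ]", " ", text)   # replace punctuation and digits with spaces
  tokens <- strsplit(text, " +")[[1]]  # split on runs of spaces
  tokens[nzchar(tokens)]               # drop empty strings
}

token_list <- lapply(docs, clean_tokens)
vocab <- sort(unique(unlist(token_list)))

# Count each vocabulary term per document, then transpose so that
# rows are documents and columns are terms.
dtm <- t(vapply(
  token_list,
  function(tokens) as.vector(table(factor(tokens, levels = vocab))),
  integer(length(vocab))
))
colnames(dtm) <- vocab

# Term prevalence: total count of each term across all documents.
term_totals <- sort(colSums(dtm), decreasing = TRUE)
```

Because both documents pass through the same `clean_tokens` function, "ECONOMY" and "economy" land in the same column, which is the kind of consistency that makes the counts defensible.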

Outcome

Final score: 100/100 on the Week 6 summative.
Produced a consistent workflow across structured and semi-structured data.
Generated interpretable findings and reusable scripts for future work.

Artifacts and proof

Neural network scoring output (nnet, normalized features) — representative sample from Exercise 3 (credit risk classification):

Applicant	Predicted class	High	Moderate	Very Low	Low	DO NOT LEND
597325	Very Low (borderline)	—	—	89.3%	10.7%	—
311882	High / Moderate	71.1%	28.9%	—	—	—
Sample denied	DO NOT LEND	0%	0%	0%	0%	100%
Sample denied	DO NOT LEND	0%	0%	0%	0%	100%

In the full run, 23 applicants were scored DO NOT LEND; all had credit scores below 500. The table shows only two scored applicants (one borderline) and representative denied rows; the full answer key is omitted for public portfolio use.
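
A table of per-class percentages like the one above can be produced from an `nnet` classifier roughly as follows. This is a sketch under assumptions: the built-in `iris` data stands in for the credit data (which is not public), and the object names `fit`, `probs`, and `score_table` are hypothetical.

```r
library(nnet)

# Stand-in data: the built-in iris set, with features min-max scaled to [0, 1].
scaled <- as.data.frame(lapply(iris[, 1:4], function(x) (x - min(x)) / (max(x) - min(x))))
scaled$Species <- iris$Species

set.seed(42)  # nnet starts from random weights
fit <- nnet(Species ~ ., data = scaled, size = 5, trace = FALSE)

# type = "raw" returns one probability column per class; each row sums to 1.
probs <- predict(fit, newdata = scaled, type = "raw")

# The predicted class is the column with the highest probability.
predicted <- colnames(probs)[max.col(probs, ties.method = "first")]

# One row per case: predicted class plus class percentages.
score_table <- data.frame(predicted_class = predicted, round(100 * probs, 1))
head(score_table, 3)
```

Rows where the top two percentages are close are the ones worth flagging as borderline.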

Related course hub: DSC 550 — Data Mining
Public portfolio parent: Dave Wilson · Base Layer Labs

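# Simplified reproducible pipeline skeleton (data frames and column names are placeholders)
library(rpart)  # decision trees
library(nnet)   # single-hidden-layer neural networks
# glm ships with base R (stats package), so it needs no library() call
model_glm <- glm(renewal ~ ., data = train_df, family = binomial())
model_tree <- rpart(risk_class ~ ., data = train_df)
model_nnet <- nnet(target ~ ., data = normalized_train, size = 5)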
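
The skeleton assumes a `normalized_train` data frame already exists. One possible way to build it is min-max scaling with parameters learned from the training set only, sketched below. The column names (`income`, `age`), the sample values, and the helper `min_max_scale` are hypothetical; the summary does not say which normalization the coursework used.

```r
# Hypothetical raw features; the real column names are not shown in this summary.
raw_train <- data.frame(income = c(20000, 50000, 80000), age = c(25, 40, 55))
raw_test  <- data.frame(income = c(35000, 95000),        age = c(30, 60))

# Learn the minimum and the range from the training set only.
mins   <- vapply(raw_train, min, numeric(1))
ranges <- vapply(raw_train, function(x) max(x) - min(x), numeric(1))

# Apply the same training-set parameters to any data frame with these columns.
min_max_scale <- function(df) {
  as.data.frame(scale(df, center = mins, scale = ranges))
}

normalized_train_demo <- min_max_scale(raw_train)  # every column spans [0, 1]
normalized_test_demo  <- min_max_scale(raw_test)   # values may fall outside [0, 1]
```

Reusing the training minimum and range on new data keeps scoring consistent with training, at the cost of occasional values outside [0, 1].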

What I learned

Feature normalization is critical for stable neural network training.
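A tree's root split and its variable importance ranking are related but not identical: the root split is the single best first partition, while importance aggregates a variable's contribution across the whole tree.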
Text mining pipelines need strict preprocessing consistency to make output defensible.
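
The difference between a root split and variable importance can be inspected directly on an `rpart` object. The sketch below uses the built-in `iris` data as a stand-in for the insurance data, so the variables it reports are illustrative only.

```r
library(rpart)

# Stand-in data: built-in iris, not the insurance data from the coursework.
tree <- rpart(Species ~ ., data = iris, method = "class")

# Root split: the variable chosen at the first node (row 1 of the frame).
root_split <- as.character(tree$frame$var[1])

# Variable importance: aggregates each variable's contribution across all
# splits, including surrogate splits, so a variable that is never chosen as
# a primary split can still rank highly.
importance <- tree$variable.importance

root_split
importance
```

Comparing `root_split` with `names(importance)[1]` shows whether the two notions agree for a given tree; they do not have to.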

Public-sanitized note

This summary omits full answer-key tables and local file-path details for public portfolio use.