📊

Wk 6 R Pipeline — Portfolio Summary

Problem

Evaluate multiple analytical tasks in one reproducible workflow: subscription renewal prediction, insurance risk classification, credit risk scoring, and speech text pattern analysis.

Approach

  • Logistic regression with glm for renewal propensity.
  • Decision tree modeling with rpart for insurance category logic.
  • Neural network classification with nnet on normalized features.
  • Text mining with tm and document-term matrices for term prevalence insights.
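The text-mining step can be sketched as follows. This is a minimal illustration under stated assumptions, not the original exercise script: the three sentences in `docs` are hypothetical stand-ins for the speech text, and the helper name `clean_corpus` is invented here. It shows one preprocessing function applied to a corpus before building a document-term matrix and counting term prevalence.

```r
library(tm)

# Hypothetical stand-in documents; the real speech text is not reproduced here.
docs <- c(
  "We choose freedom, and freedom demands work.",
  "The work of the nation is never finished.",
  "Freedom and the nation endure together."
)

# One preprocessing function (hypothetical name), so every corpus is
# cleaned with exactly the same steps in the same order.
clean_corpus <- function(corpus) {
  corpus <- tm_map(corpus, content_transformer(tolower))
  corpus <- tm_map(corpus, removePunctuation)
  corpus <- tm_map(corpus, removeNumbers)
  corpus <- tm_map(corpus, removeWords, stopwords("english"))
  tm_map(corpus, stripWhitespace)
}

corpus <- clean_corpus(VCorpus(VectorSource(docs)))

# Rows are documents, columns are terms, cells are term counts.
dtm <- DocumentTermMatrix(corpus)

# Term prevalence: total count of each term across all documents.
term_freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
print(term_freq)
```

Keeping the cleaning steps in a single function is one way to make sure two runs over different text are compared on the same footing.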

Model stack snapshot

📊

Structured models: glm, rpart, nnet

Text pipeline: tm + DTM preprocessing

Validation style: scored outputs with reproducible scripts

Outcome

  • Final score: 100/100 on the Week 6 summative.
  • Produced a consistent workflow across structured (tabular) data and unstructured text.
  • Generated interpretable findings and reusable scripts for future work.

Artifacts and proof

Neural network scoring output (nnet, normalized features) — representative sample from Exercise 3 (credit risk classification):

Applicant     | Predicted class       | High  | Moderate | Very Low | Low   | DO NOT LEND
597325        | Very Low (borderline) |       |          | 89.3%    | 10.7% |
311882        | High / Moderate       | 71.1% | 28.9%    |          |       |
Sample denied | DO NOT LEND           | 0%    | 0%       | 0%       | 0%    | 100%
Sample denied | DO NOT LEND           | 0%    | 0%       | 0%       | 0%    | 100%
📊

In the full run, 23 applicants were scored DO NOT LEND; all had credit scores below 500. The table shows two non-denied applicants (one borderline) and representative denied rows only. The full answer key is omitted for public portfolio use.
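A scoring table of this shape (per-class probabilities plus a predicted class) can be produced from an nnet model roughly as sketched below. The credit data is not public, so the built-in `iris` data stands in for it. The 0.90 cutoff used to flag a "borderline" prediction is purely an illustrative assumption and is not taken from the exercise.

```r
library(nnet)

set.seed(42)  # nnet() starts from random weights; the seed value is arbitrary

# Stand-in data: iris replaces the non-public credit data set.
features <- iris[, 1:4]
# Min-max normalize each feature to the range 0 to 1.
features[] <- lapply(features, function(x) (x - min(x)) / (max(x) - min(x)))
train <- cbind(features, Species = iris$Species)

fit <- nnet(Species ~ ., data = train, size = 5, maxit = 200, trace = FALSE)

# type = "raw" returns one probability column per class;
# type = "class" returns the label of the most probable class.
probs <- predict(fit, newdata = train, type = "raw")
pred_class <- predict(fit, newdata = train, type = "class")

# Hypothetical rule: flag a prediction as borderline when the winning
# class probability falls below an assumed cutoff.
borderline_cutoff <- 0.90
top_prob <- apply(probs, 1, max)

scored <- data.frame(
  predicted = pred_class,
  top_prob = round(top_prob, 3),
  borderline = top_prob < borderline_cutoff
)
head(scored)
```

Each row of `probs` sums to 1 because nnet fits a softmax output layer when the response is a factor with more than two levels.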

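# Simplified reproducible pipeline skeleton
library(rpart)  # rpart(): decision trees
library(nnet)   # nnet(): single-hidden-layer neural networks
set.seed(42)    # nnet() starts from random weights, so fix the seed (value is arbitrary)
model_glm <- glm(renewal ~ ., data = train_df, family = binomial())     # renewal propensity
model_tree <- rpart(risk_class ~ ., data = train_df, method = "class")  # insurance risk category
model_nnet <- nnet(target ~ ., data = normalized_train, size = 5)       # classifier on normalized features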
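The skeleton takes `normalized_train` as given. One common way to build such a frame is min-max scaling, sketched below. The original normalization method is not stated, so this is an assumption. The column names, the values, and the helpers `fit_minmax` and `apply_minmax` are hypothetical.

```r
# Hypothetical raw frames; values are made up for illustration only.
raw_train <- data.frame(
  credit_score = c(480, 620, 710, 805),
  income = c(21000, 48000, 67000, 120000)
)
raw_new <- data.frame(
  credit_score = c(550, 790),
  income = c(30000, 95000)
)

# Learn the scaling parameters from the training data only.
fit_minmax <- function(df) {
  list(min = sapply(df, min), max = sapply(df, max))
}

# Apply previously learned parameters column by column.
apply_minmax <- function(df, params) {
  as.data.frame(Map(
    function(col, lo, hi) (col - lo) / (hi - lo),
    df, params$min, params$max
  ))
}

params <- fit_minmax(raw_train)
normalized_train <- apply_minmax(raw_train, params)
# New rows reuse the training min and max, so both frames share one scale.
normalized_new <- apply_minmax(raw_new, params)

print(normalized_train)
print(normalized_new)
```

Reusing the training parameters on new rows keeps scored applicants on the same scale the network was trained on. Values outside the training range will fall below 0 or above 1.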

What I learned

  • Feature normalization is critical for stable neural network training.
  • Root split and variable importance in trees are related but not identical concepts.
  • Text mining pipelines need strict preprocessing consistency to make output defensible.
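The distinction between the root split and variable importance can be inspected directly on a fitted rpart object. The sketch below uses the built-in `iris` data as a stand-in for the insurance data, so its output says nothing about the original exercise. The root split is the single variable chosen at the first node. rpart's importance score sums each variable's split improvement over every node where it is the primary split and adds weighted credit where it serves as a surrogate.

```r
library(rpart)

# Stand-in data: iris replaces the non-public insurance data set.
fit <- rpart(Species ~ ., data = iris, method = "class")

# The first row of fit$frame describes the root node; `var` is its split variable.
root_var <- as.character(fit$frame$var[1])

# Named numeric vector, sorted from most to least important.
importance <- fit$variable.importance

cat("Root split variable:", root_var, "\n")
print(round(importance, 1))

# The top-ranked variable need not be the root split variable.
cat("Top importance variable:", names(importance)[1], "\n")
```

Because surrogate splits contribute to importance, a variable that never appears at the root (or even in the printed tree) can still rank highly.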

Public-sanitized note

This summary omits full answer-key tables and local file-path details for public portfolio use.