01 - Real case study / product analytics / data engineering / ML system design

Product journeys were hiding in order history.

The engagement title was Business Analyst. The real work was product framing, data engineering, analytics, and AI-assisted ML implementation for a Markov-chain recommendation system that could estimate next-product movement, same-SKU reorder likelihood, protocol continuation, and next-best-product recommendations without exposing sensitive customer data to AI.

This page reflects real implementation work. The model design, analytics, data pipeline, and privacy-preserving AI-assisted implementation are real; final production deployment status was not verified after the engagement ended.

Engagement title

Business Analyst

The work extended into pipeline design, modeling, analytics, and technical handoff.

Core system

Markov-chain product prediction and recommendation

One ML system with multiple layers, not a pile of unrelated experiments.

AI delivery pattern

GPT-4 accelerated, privacy preserved

No PII, credentials, raw records, or database access were handed to the model.

Public claim boundary

Offline validated prototype and analytics handoff

The engagement ended before final production verification, so the page does not claim it.

02 - Business Context

This was protocol commerce, not generic ecommerce.

CellCore sold into a practitioner-and-consumer environment where products often belonged to bundles, pathways, and multi-step protocols. A customer who did not reorder the same SKU was not necessarily lost. They might have been moving correctly into the next step.

Practitioner teams

Needed to understand which product or protocol step a patient was likely to need next, and where pathway adherence was failing.

Marketing and CRM

Needed to separate one-time buyers from continuation buyers so campaigns could reflect actual protocol behavior instead of blunt reorder logic.

Merchandising

Needed product affinity and pathway visibility to improve bundles, cross-sells, and protocol merchandising.

Leadership

Needed a business-readable explanation of product movement, churn, protocol progression, and where the customer journey was breaking.

03 - Problem Framing

The hard part was reconstructing real product journeys from messy order history well enough that the business could tell churn, replenishment, and protocol continuation apart.

That is why the build did not start with a black-box recommender. It started with transparent state transitions, explicit basket logic, feature engineering, and model layers that commercial teams could understand and engineers could replay.

04 - AI Delivery Model

GPT-4 helped build it. GPT-4 never got the data.

The work happened before agentic coding stacks were truly usable. GPT-4 accelerated SQL design, Python scaffolding, feature engineering, validation logic, debugging, and documentation, but it was not connected to the live database and did not receive raw records.

Allowed

Schemas, table names, and sanitized examples

Enough structure to generate compatible SQL, feature logic, pseudocode, and validation queries without exposing customer identity.

Allowed

Business rules, protocol mappings, and edge cases

Sharing these rules let GPT-4 act as a fast implementation and explanation partner while product logic and risk stayed human-owned.

Excluded

Raw customer records, credentials, or direct database access

The model never became the processor of record. No names, emails, phone numbers, addresses, or sensitive wellness notes were needed.

05 - Data Architecture

Deterministic transformations before any model ever scored.

The essential job was turning raw orders into normalized basket events, sequence-safe transitions, stable feature tables, and recommendation artifacts that could actually be audited.

Pull paid and fulfilled orders, order lines, product metadata, account type, channel, subscription flags, and protocol mappings into raw staging tables.

Resolve SKU aliases, kits, bundle semantics, cancellations, returns, duplicate events, and timestamp issues before sequence logic runs.

Group same-order line items into baskets so co-purchase behavior is modeled as a basket and not fake intra-order sequence steps.

Create first-order and higher-order transitions by customer, cohort, product, family, and protocol state.

Calculate recency, frequency, monetary value, days since last purchase, pathway depth, practitioner flag, and bundle exposure.

Estimate transition probabilities, reorder propensity, affinity rules, segment labels, ranking weights, and recommendation tables with reason codes.
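The basket and transition steps above can be illustrated with a minimal Python sketch. It is not the engagement's code: the tuple layout, the sample rows, and the function names `build_baskets` and `count_transitions` are hypothetical, and expanding a basket-to-basket move into item-to-item pairs is one possible choice among several.

```python
from collections import Counter, defaultdict

# Hypothetical, already-normalized order lines:
# (customer_key, order_id, order_date, sku). ISO dates sort chronologically as text.
ORDER_LINES = [
    ("c1", "o1", "2023-01-05", "BINDER_A"),
    ("c1", "o1", "2023-01-05", "DRAIN_B"),
    ("c1", "o2", "2023-02-10", "GUT_C"),
    ("c2", "o3", "2023-01-07", "BINDER_A"),
    ("c2", "o4", "2023-03-01", "BINDER_A"),
]


def build_baskets(order_lines):
    """Group same-order lines into one basket per order, in time order per customer."""
    by_order = defaultdict(set)
    for customer, order_id, order_date, sku in order_lines:
        by_order[(customer, order_date, order_id)].add(sku)
    baskets = defaultdict(list)
    for key in sorted(by_order):  # sorts by customer, then date, then order id
        baskets[key[0]].append(frozenset(by_order[key]))
    return dict(baskets)


def count_transitions(baskets):
    """Count first-order transitions between consecutive baskets, never within one."""
    counts = Counter()
    for sequence in baskets.values():
        for current, following in zip(sequence, sequence[1:]):
            for src in current:
                for dst in following:
                    counts[(src, dst)] += 1
    return counts


baskets = build_baskets(ORDER_LINES)
transitions = count_transitions(baskets)
print(sorted(transitions.items()))
```

Because BINDER_A and DRAIN_B share an order, no transition is counted between them; both are treated as the starting state for the later GUT_C order.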

06 - Model Stack And Methods

One system. Six modeling layers. One commercial question: what is most likely to happen next?

The recommendation surface combined interpretable sequence models, affinity logic, segmentation, and deterministic business rules instead of hiding the business problem behind an opaque model.

Model layer

First-order Markov chain

Core next-product sequence model. Given the current state, estimate the most likely next product, family, basket, or protocol step.

Model layer

Higher-order Markov paths

Conditions the next step on the last two or three states so protocol progression is not flattened into one-step transitions.

Model layer

Reorder propensity scoring

Separates true replenishment from pathway continuation so not every SKU change gets mislabeled as churn.

Model layer

Association rules and affinity

Uses support, confidence, and lift to identify which products naturally belong together inside an order or customer journey.

Model layer

Behavioral segmentation

Groups customers into cohorts: practitioner-led buyers, one-time detox buyers, replenishment buyers, bundle entrants, and pathway followers.

Model layer

Weighted recommendation ranking

Combines Markov probability, pathway continuation, reorder propensity, affinity lift, segment modifiers, and hard business rules.
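For the association rules and affinity layer, support, confidence, and lift can be computed directly from baskets. The sketch below uses the standard pairwise definitions; the sample baskets, the `min_support` threshold, and the function name `affinity_rules` are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; each set is the products in one order.
BASKETS = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
]


def affinity_rules(baskets, min_support=0.25):
    """Return pairwise rules lhs -> rhs with support, confidence, and lift."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]      # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # P(rhs | lhs) / P(rhs)
            rules.append(
                {"lhs": lhs, "rhs": rhs, "support": support,
                 "confidence": confidence, "lift": lift}
            )
    return rules


rules = affinity_rules(BASKETS)
for rule in sorted(rules, key=lambda r: r["lift"], reverse=True):
    print(rule)
```

A lift above 1 means the pair appears together more often than independent purchasing would predict; a support floor keeps rare coincidences out of the rule table.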

07 - Formulas And Tradeoffs

Simple, explainable models first.

The goal was not to win an academic benchmark. The goal was to produce something engineers could wire up, leaders could understand, and the business could trust.

First-order transition probability

P(next = j | current = i) = (C_ij + alpha) / (sum_k C_ik + alpha K)

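The core Markov layer. C_ij is the observed count of transitions from state i to state j, alpha is the smoothing constant, and K is the number of candidate next states. Smoothing keeps rare but plausible next states from being zeroed out just because the count is small.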
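A small sketch of the smoothed transition formula, assuming counts are stored as a dictionary keyed by (current, next) pairs. The state names, the counts, and the choice of alpha = 1.0 are hypothetical.

```python
def transition_probabilities(counts, states, alpha=1.0):
    """Return P(next = j | current = i) with additive smoothing.

    counts: dict mapping (i, j) -> observed transition count C_ij
    states: list of all candidate states (K = len(states))
    alpha:  smoothing constant; must be positive so empty rows stay defined
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    k = len(states)
    probs = {}
    for i in states:
        row_total = sum(counts.get((i, j), 0) for j in states)
        denominator = row_total + alpha * k
        probs[i] = {
            j: (counts.get((i, j), 0) + alpha) / denominator for j in states
        }
    return probs


# Hypothetical counts: 8 moves from step 1 to step 2, 2 same-state repeats.
COUNTS = {("STEP_1", "STEP_2"): 8, ("STEP_1", "STEP_1"): 2}
STATES = ["STEP_1", "STEP_2", "NO_PURCHASE"]

probs = transition_probabilities(COUNTS, STATES, alpha=1.0)
print(probs["STEP_1"])
```

With these inputs the unseen move from STEP_1 to NO_PURCHASE keeps a small non-zero probability, and a state with no observed outgoing transitions falls back to a uniform row.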

Higher-order pathway backoff

P_final = lambda2 P2(j|s_t-2,s_t-1) + lambda1 P1(j|s_t-1) + lambda0 P_popular(j)

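Higher-order paths capture protocol continuation, but sparse states require interpolation or backoff. The formula above is the interpolated form: the lambda weights are non-negative and sum to 1, so P_final stays a valid probability.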
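One way to sketch the interpolation is shown below. The lambda values and the probability tables are hypothetical, and moving an unseen context's weight down to the next level is an assumption of this sketch, not something the formula itself prescribes.

```python
def blended_probability(candidate, history, p2, p1, p_popular,
                        lambdas=(0.6, 0.3, 0.1)):
    """Interpolate second-order, first-order, and popularity estimates.

    p2: dict mapping (s_t-2, s_t-1) -> {next_state: probability}
    p1: dict mapping s_t-1 -> {next_state: probability}
    p_popular: dict mapping next_state -> overall popularity share
    """
    lam2, lam1, lam0 = lambdas
    if abs(lam2 + lam1 + lam0 - 1.0) > 1e-9:
        raise ValueError("lambda weights must sum to 1")
    second = p2.get(tuple(history[-2:])) if len(history) >= 2 else None
    first = p1.get(history[-1]) if history else None
    if second is None:  # unseen two-step context: hand its weight down a level
        lam1, lam2 = lam1 + lam2, 0.0
    if first is None:   # unseen one-step context: fall back to popularity only
        lam0, lam1 = lam0 + lam1, 0.0
    return (
        lam2 * (second or {}).get(candidate, 0.0)
        + lam1 * (first or {}).get(candidate, 0.0)
        + lam0 * p_popular.get(candidate, 0.0)
    )


# Hypothetical probability tables for three protocol steps.
P2 = {("STEP_1", "STEP_2"): {"STEP_3": 0.7, "STEP_2": 0.3}}
P1 = {"STEP_2": {"STEP_3": 0.5, "STEP_2": 0.4, "STEP_1": 0.1}}
P_POPULAR = {"STEP_1": 0.5, "STEP_2": 0.3, "STEP_3": 0.2}

seen = blended_probability("STEP_3", ["STEP_1", "STEP_2"], P2, P1, P_POPULAR)
unseen = blended_probability("STEP_3", ["STEP_9", "STEP_2"], P2, P1, P_POPULAR)
print(seen, unseen)
```

Reassigning the weight keeps the blended values summing to 1 across candidates even when the two-step context has never been observed.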

Reorder propensity

P(reorder = 1 | x) = 1 / (1 + exp(-(beta0 + beta1 x1 + ... + betam xm)))

Used specifically for same-SKU replenishment. The features x1 through xm cover timing, prior counts, pathway depth, customer type, and bundle context.
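The logistic form can be evaluated in a few lines. The feature names, coefficients, and intercept below are made up for illustration; fitted values would come from training on historical replenishment outcomes.

```python
import math


def reorder_probability(features, coefficients, intercept):
    """Logistic score: 1 / (1 + exp(-(beta0 + sum(beta_k * x_k))))."""
    z = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients; not taken from the engagement.
COEFFICIENTS = {
    "prior_same_sku_orders": 0.8,
    "days_since_last_purchase": -0.02,
    "is_practitioner_led": 0.5,
}
INTERCEPT = -1.0

recent = reorder_probability(
    {"prior_same_sku_orders": 2, "days_since_last_purchase": 30,
     "is_practitioner_led": 1},
    COEFFICIENTS, INTERCEPT,
)
lapsed = reorder_probability(
    {"prior_same_sku_orders": 2, "days_since_last_purchase": 120,
     "is_practitioner_led": 1},
    COEFFICIENTS, INTERCEPT,
)
print(round(recent, 3), round(lapsed, 3))
```

With a negative coefficient on days since last purchase, the same customer scores lower as the gap grows, which is the behavior a replenishment signal should show.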

Weighted ranking

score = w1 P_markov + w2 P_pathway + w3 P_reorder + w4 lift + w5 segment + w6 rules

The ranker stays lightweight so each recommendation can carry a reason code instead of becoming an unexplained score.
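A sketch of how the weighted score can carry a reason code: each component's weighted contribution is kept, and the largest one is reported alongside the score. The weights, component values, and SKU names are hypothetical, and lift is assumed to be rescaled to a range comparable with the probabilities before it enters the sum.

```python
# Hypothetical weights w1..w6; real values would be tuned and reviewed.
WEIGHTS = {
    "markov": 0.35, "pathway": 0.25, "reorder": 0.15,
    "lift": 0.10, "segment": 0.10, "rules": 0.05,
}


def rank(candidates, weights=WEIGHTS, excluded=frozenset()):
    """Score each candidate and attach the top-contributing component as a reason code."""
    ranked = []
    for sku, components in candidates.items():
        if sku in excluded:  # hard business rule: never recommend these
            continue
        contributions = {
            name: weights[name] * components.get(name, 0.0) for name in weights
        }
        ranked.append({
            "sku": sku,
            "score": sum(contributions.values()),
            "reason_code": max(contributions, key=contributions.get),
            "components": contributions,
        })
    return sorted(ranked, key=lambda row: row["score"], reverse=True)


CANDIDATES = {
    "GUT_C": {"markov": 0.31, "pathway": 0.8, "reorder": 0.05,
              "lift": 0.4, "segment": 0.5, "rules": 1.0},
    "BINDER_A": {"markov": 0.2, "pathway": 0.1, "reorder": 0.7,
                 "lift": 0.2, "segment": 0.5, "rules": 1.0},
    "DISCONTINUED_X": {"markov": 0.9, "pathway": 0.9, "reorder": 0.9,
                       "lift": 0.9, "segment": 0.9, "rules": 1.0},
}

ranked = rank(CANDIDATES, excluded=frozenset({"DISCONTINUED_X"}))
for row in ranked:
    print(row["sku"], round(row["score"], 4), row["reason_code"])
```

Keeping the per-component contributions in the output row is what makes each recommendation auditable after the fact.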

08 - Reconstructed Insights

The key insight was not "buy again." It was "move correctly."

Generic reorder probability looked weak for isolated purchases. The moment a buyer was attached to a practitioner, a protocol, a bundle, or a known pathway, the commercial interpretation changed.

Example transition view

| Current state | Next candidate | Probability |
|---|---|---|
| Step 1 / Energy + Drainage | Step 2 / Gut + Immune | 0.31 |
| Binder category | Drainage support | 0.27 |
| First direct single-SKU order | No purchase in window | 0.62 |

Pathway continuation

The model turned a missing same-SKU reorder into a meaningful pathway signal. That is a better retention interpretation than a raw repeat-purchase chart.

Cohort-specific activation

Practitioner-led buyers, direct one-time buyers, bundle entrants, and replenishment customers should not receive the same recommendation logic.

Quantified affinity

Support, confidence, and lift replaced anecdotal adjacency with measurable product relationships for bundles, merchandising, and education.

Operational output

The recommendation table was designed to feed BI, CRM, ecommerce modules, practitioner dashboards, or email audiences, not to die as a notebook artifact.

09 - Productionization Blueprint

No final production claim. A production-ready blueprint.

The public claim stays disciplined: offline validated model design and analytics handoff, with final production deployment status unverified after the engagement ended.

Production control

Data extraction job

Refresh normalized orders, order lines, product metadata, and account context on a schedule the business can support.

Production control

Model build job

Recompute transition counts, smoothed probabilities, affinity rules, segment labels, and score weights on each run.

Production control

Validation gate

Block publication if coverage drops, data quality fails, or a new artifact drifts materially from the prior known-good run.

Production control

Scoring job

Generate ranked recommendations with score components, reason codes, and exclusion flags so every output is auditable.

Production control

Activation tables

Expose outputs to BI, CRM, ecommerce, and practitioner tools behind feature flags rather than jumping directly into sensitive channels.

Production control

Monitoring and rollback

Track hit rate, conversion, coverage, drift, availability, and complaint signals. Keep the prior activation table ready for rollback.
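The validation gate and rollback controls above could reduce to a single publish decision per run. The sketch below is an assumption-heavy illustration: the artifact layout, the threshold values, and the drift measure (largest absolute change in a shared transition probability) are all hypothetical.

```python
def validation_gate(new, previous, min_coverage=0.9, max_drift=0.15):
    """Decide whether a new model artifact may replace the prior known-good one.

    Each artifact is a dict with:
      coverage: share of scoreable customers that received a recommendation
      data_quality_passed: bool result of upstream data checks
      probs: dict mapping (current, next) -> transition probability
    """
    failures = []
    if new["coverage"] < min_coverage:
        failures.append("coverage_below_threshold")
    if not new["data_quality_passed"]:
        failures.append("data_quality_failed")
    shared = set(new["probs"]) & set(previous["probs"])
    drift = max(
        (abs(new["probs"][key] - previous["probs"][key]) for key in shared),
        default=0.0,
    )
    if drift > max_drift:
        failures.append("drift_vs_known_good")
    return {"publish": not failures, "failures": failures, "max_drift": drift}


KNOWN_GOOD = {
    "coverage": 0.95, "data_quality_passed": True,
    "probs": {("STEP_1", "STEP_2"): 0.31, ("BINDER", "DRAINAGE"): 0.27},
}
CANDIDATE = {
    "coverage": 0.82, "data_quality_passed": True,
    "probs": {("STEP_1", "STEP_2"): 0.55, ("BINDER", "DRAINAGE"): 0.26},
}

decision = validation_gate(CANDIDATE, KNOWN_GOOD)
print(decision)
```

When `publish` is false, the prior activation table simply stays in place, which is the rollback path described above.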

10 - Governance And Privacy

AI accelerated the build. It never became the system of record.

This is one of the strongest things the project demonstrates: aggressive AI-assisted implementation without surrendering privacy, governance, or deployment ownership.

No PII to AI. Only schemas, business rules, synthetic rows, and sanitized fragments were used in the GPT-4 workflow.

Hashed identifiers. Customer and order joins should use hashed or surrogate keys in the modeling tables.

Least privilege access. The scoring pipeline needs order, product, and metadata fields, not identity-bearing customer records.

Human review before sensitive activation. Recommendations should be reviewed before practitioner-facing or customer-facing activation.

Drift and sparse-state risk. Support thresholds, smoothing, backoff, product-map ownership, and scheduled monitoring prevent quiet degradation.

Unverified deployment claims. The public framing stops at offline validated design and analytics handoff.
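The hashed-identifier control can be sketched with Python's standard `hmac` and `hashlib` modules. A keyed hash is used here instead of a plain hash because short or sequential IDs can otherwise be recovered by brute force; the function name `surrogate_key`, the sample IDs, and the inline secret are hypothetical, and a real secret would live in a secrets manager outside the modeling environment.

```python
import hashlib
import hmac


def surrogate_key(raw_id: str, secret: bytes) -> str:
    """Derive a stable, non-reversible join key from a raw identifier."""
    return hmac.new(secret, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder secret for illustration only; never hard-code a real one.
SECRET = b"example-only-secret"

key_a = surrogate_key("customer-1001", SECRET)
key_a_again = surrogate_key("customer-1001", SECRET)
key_b = surrogate_key("customer-1002", SECRET)
print(key_a[:12], key_b[:12])
```

The same raw ID always maps to the same key, so joins across order and feature tables still work, while the modeling tables never hold the original identifier.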

11 - What This Demonstrates

Product management, data engineering, analytics, ML system design, and AI-assisted implementation on one real business problem.

Demonstrates

Product management

Translated a vague business problem into concrete outputs, activation paths, model boundaries, and honest claims language.

Demonstrates

Data engineering

Designed normalization, basketization, sequence construction, feature tables, contracts, and repeatable artifacts.

Demonstrates

Analytics

Turned raw order history into interpretable views of protocol continuation, cohort behavior, pathway drop-off, and product relationships.

Demonstrates

ML system design

Chose transparent models that fit sparse sequential commerce data, and specified validation before reaching for heavier recommenders.

Demonstrates

Privacy-constrained AI delivery

Used GPT-4 to accelerate SQL, Python, feature engineering, debugging, and documentation without exposing PII.

Demonstrates

Enterprise readiness

Specified replay, rollback, feature flags, reason codes, review gates, and clear public claim boundaries.

12 - Method Grounding And Claim Boundary

Real work, explicit limits.

The work is real. The public page stays disciplined about what can be claimed: serious offline model design, analytics, privacy-aware AI-assisted implementation, and a production-ready handoff without pretending final deployment evidence exists.

What this page claims

Model stack, data flow, SQL-style transformation plan, ranking logic, validation approach, and privacy-preserving AI workflow.

What this page does not claim

Verified final production deployment, medical decisioning, or any access pattern that was not present in the work.
