Applied data science & experimentation
Models that inform decisions. Data scientists who test first.
A high offline AUC that never ships still burns runway. We staff data scientists who run power analysis before experiments launch, catch leakage in peer review, build forecasts finance uses for planning, and explain uncertainty to stakeholders who do not care which library you prefer. Recommendations arrive with limitations documented and scoring specs engineers can implement.
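Scope your modeling roadmap
```python
import pandas as pd
from scipy import stats

def recommend(message: str) -> None:
    # Placeholder: swap in however the readout is delivered.
    print(message)

# Assumes `df` (a pandas DataFrame) has one row per user with a `variant`
# column ("control"/"treatment") and a 0/1 `converted` column.
control = df.loc[df.variant == "control", "converted"]
treatment = df.loc[df.variant == "treatment", "converted"]

# Welch's t-test: does not assume equal variance in the two arms.
result = stats.ttest_ind(treatment, control, equal_var=False)
lift = treatment.mean() - control.mean()
print(f"lift={lift:.3%}, p-value={result.pvalue:.4f}")

if result.pvalue < 0.05 and lift > 0:
    recommend("Ship treatment to 100% traffic")
else:
    recommend("Keep control; rerun with larger sample")
```
Core stack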
- Python & scikit-learn
- Statistical modeling
- Experimentation
- Forecasting
- Reproducible notebooks
- SQL feature engineering
5+
Average years in applied data science
Scientists who've influenced product or revenue decisions, not only Kaggle rankings.
Deep-Dive Tech Stack
Applied data science ties experimentation, modeling, SQL features, and handoff into one accountable path. We match scientists who document limits before leadership acts, not medalists who overlook survivorship bias in your cohort.
Python & scikit-learn
pandas and NumPy from exploration through scheduled scoring, with pinned dependencies. Our scientists know when logistic regression beats a deep net, and they document preprocessing so engineers can reproduce training without the original notebook.
Statistical modeling
Regression, classification, survival analysis, and hypothesis tests with explicit assumptions. Underpowered samples, outlier-driven effects, and models that should not ship are flagged in the readout, not buried in appendix slides.
Experimentation
A/B design with pre-registered metrics, power analysis, and guardrails against peeking and p-hacking. Scoped hypotheses often reach a clear ship or no-ship readout in three to six weeks with criteria everyone agreed to before launch.
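As a rough illustration of the power analysis mentioned above, the sketch below estimates the per-arm sample size for a two-sided test on two conversion rates, using the standard normal-approximation formula. The function name `sample_size_per_arm` and the example rates are hypothetical, chosen only to show the arithmetic.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users needed per arm for a two-sided two-proportion z-test."""
    if p_control == p_treatment:
        raise ValueError("The two rates must differ to size an experiment.")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    p_bar = (p_control + p_treatment) / 2          # pooled rate under the null
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_treatment * (1 - p_treatment))
    n = ((z_alpha * null_sd + z_power * alt_sd) / (p_treatment - p_control)) ** 2
    return ceil(n)


# Hypothetical scenario: 10% baseline conversion, hoping to detect a lift to 12%.
n_per_arm = sample_size_per_arm(0.10, 0.12)
print(f"Users needed per arm: {n_per_arm}")
```

Halving the detectable lift roughly quadruples the required sample, which is one reason the minimum effect worth detecting is agreed before launch rather than after the data arrives.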
Forecasting
Seasonality decomposition and backtests aligned to finance planning cycles. Prophet, ARIMA, or lightweight ML models ship with uncertainty intervals stakeholders can plan around, not charts that fail on the next shock.
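To make the idea of a backtest with uncertainty intervals concrete, here is a minimal sketch of a rolling-origin backtest. It uses a seasonal-naive forecast as a stand-in for Prophet or ARIMA, and the series is made up for illustration. The helper names `seasonal_naive` and `rolling_backtest` are hypothetical.

```python
from statistics import mean, quantiles


def seasonal_naive(history: list[float], season: int) -> float:
    """Forecast the next point as the value observed one season ago."""
    return history[-season]


def rolling_backtest(series: list[float], season: int, min_train: int):
    """Walk forward one step at a time; return forecasts and errors."""
    forecasts, errors = [], []
    for t in range(min_train, len(series)):
        pred = seasonal_naive(series[:t], season)  # only data before t is visible
        forecasts.append(pred)
        errors.append(series[t] - pred)
    return forecasts, errors


# Hypothetical series with weekly seasonality (4 weeks of daily values).
series = [100, 102, 98, 105, 110, 130, 125,
          101, 104, 99, 107, 112, 133, 128,
          103, 105, 101, 108, 115, 136, 130,
          104, 107, 102, 110, 116, 139, 133]

_, errors = rolling_backtest(series, season=7, min_train=7)
mae = mean(abs(e) for e in errors)

# Empirical 10th and 90th percentiles of backtest errors give a rough 80% band.
deciles = quantiles(errors, n=10)
lower_err, upper_err = deciles[0], deciles[-1]
next_point = seasonal_naive(series, season=7)
print(f"MAE={mae:.2f}, next={next_point}, "
      f"80% interval=({next_point + lower_err:.1f}, {next_point + upper_err:.1f})")
```

In this toy series every backtest error is positive, because the seasonal-naive forecast misses a steady upward drift. The interval inherits that bias, which is the kind of limitation a readout should state alongside the forecast.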
Reproducible notebooks
Jupyter with conda or uv lockfiles and parameterized papermill runs. Analysis that becomes weekly reporting graduates to dbt or Airflow instead of cells that break on the next pandas upgrade.
SQL feature engineering
Cohort definitions and point-in-time correct joins against Snowflake or BigQuery. Future data in features is caught before offline AUC misleads you about production performance.
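A point-in-time correct join means each training row sees only feature values recorded before its prediction date. The sketch below shows that rule in plain Python on made-up data. The table contents and the helper `feature_as_of` are hypothetical; in a warehouse the same logic would be expressed in SQL.

```python
from bisect import bisect_left
from datetime import date

# Hypothetical feature snapshots per user: (snapshot_date, lifetime_orders),
# sorted by date.
feature_history = {
    "u1": [(date(2024, 1, 1), 2), (date(2024, 2, 1), 5), (date(2024, 3, 1), 9)],
    "u2": [(date(2024, 2, 15), 1)],
}

# Hypothetical label events: (user_id, prediction_date, churned).
labels = [
    ("u1", date(2024, 2, 10), 0),
    ("u2", date(2024, 2, 1), 1),
]


def feature_as_of(user_id: str, as_of: date):
    """Return the latest feature value recorded strictly before `as_of`, else None."""
    history = feature_history.get(user_id, [])
    dates = [snapshot_date for snapshot_date, _ in history]
    idx = bisect_left(dates, as_of)  # number of snapshots dated before as_of
    return history[idx - 1][1] if idx > 0 else None


training_rows = [
    (user_id, as_of, feature_as_of(user_id, as_of), churned)
    for user_id, as_of, churned in labels
]
for row in training_rows:
    print(row)
```

User `u1` gets the February snapshot (5 orders), not the March value that did not exist yet on the prediction date. User `u2` gets no feature at all, because the only snapshot postdates the label. In pandas, `merge_asof` with `allow_exact_matches=False` applies a comparable backward-looking rule.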
Model cards & handoff
Limitation notes, monitoring hooks, and scoring specs for FastAPI or batch jobs. Assumptions and known failure modes are documented before stakeholders act on the output.
XGBoost & gradient boosting
Tabular models with hyperparameter search, class imbalance handling, and calibration for probability outputs used in ranking and risk scoring. Feature importance and SHAP summaries explain why a row scored high before ops or sales acts on the prediction.
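One simple way to check whether probability outputs are calibrated is to compare predicted probabilities with observed outcome rates, bin by bin, and to summarize the gap with a Brier score. The sketch below does this in plain Python on made-up scores. The function names and data are hypothetical, and it is not tied to XGBoost or any particular model.

```python
def brier_score(y_true: list[int], y_prob: list[float]) -> float:
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_table(y_true, y_prob, n_bins: int = 5):
    """Per non-empty bin: (mean predicted probability, observed rate, count)."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    table = []
    for rows in bins:
        if rows:
            mean_pred = sum(p for p, _ in rows) / len(rows)
            observed = sum(y for _, y in rows) / len(rows)
            table.append((round(mean_pred, 3), round(observed, 3), len(rows)))
    return table


# Hypothetical scores and outcomes, purely illustrative.
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.55, 0.65, 0.85, 0.90, 0.95]
y_true = [0, 0, 0, 0, 1, 1, 0, 1, 1, 1]

table = reliability_table(y_true, y_prob)
print(f"Brier score: {brier_score(y_true, y_prob):.4f}")
for mean_pred, observed, count in table:
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
```

A well-calibrated model shows predicted and observed values close together in every bin. With ten rows the table is too noisy to conclude anything; a real check needs enough rows per bin for the observed rates to be stable.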
SHAP & model explainability
Global and local explanations for tree and linear models, bias checks on protected attributes where policy requires, and documentation for auditors and product owners. Black-box deploys get challenged in review when stakeholders need to trust the output.
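For a linear model, a local explanation can be computed by hand: when features are treated as independent, each feature's SHAP value is its coefficient times the gap between the row's value and the background mean. The sketch below illustrates that special case on a made-up model. The coefficients, feature names, and data are hypothetical, and tree models need the `shap` library's own algorithms instead.

```python
from statistics import mean

# Hypothetical linear risk model: score = intercept + sum(coef * feature).
intercept = 0.2
coefficients = {"late_payments": 0.15, "tenure_years": -0.03}

# Hypothetical background data used as the reference population.
background = [
    {"late_payments": 0, "tenure_years": 2},
    {"late_payments": 1, "tenure_years": 4},
    {"late_payments": 2, "tenure_years": 6},
]


def predict(row: dict) -> float:
    return intercept + sum(coefficients[name] * row[name] for name in coefficients)


def linear_attributions(row: dict) -> dict:
    """Per-feature attribution: coef * (value - background mean)."""
    return {
        name: coefficients[name] * (row[name] - mean(r[name] for r in background))
        for name in coefficients
    }


row = {"late_payments": 3, "tenure_years": 1}
attributions = linear_attributions(row)
baseline = mean(predict(r) for r in background)

# Local accuracy: baseline plus attributions reproduces the row's score.
print(attributions)
print(f"baseline={baseline:.3f}, score={predict(row):.3f}")
```

Both features push this row's score above the baseline, late payments more so than short tenure. That per-row breakdown is what lets an ops or sales reviewer see why a case scored high.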
Data science outcomes we optimize for
- Average years in applied data science: 5+. Scientists who've influenced product or revenue decisions, not only Kaggle rankings.
- Typical time to first experiment readout: 3–6 weeks. For scoped hypotheses with clean assignment, tracking, and pre-registered success criteria.
- Documented model assumptions: 100%. Data limits, leakage checks, and known failure modes before recommendations ship.
- Training and scoring pipelines: Reproducible. Pinned dependencies, versioned datasets, and handoff paths engineers can maintain.
Data science staffing, answered plainly
How do you handle time-zone crossovers?
We align overlap for experiment reviews and stakeholder readouts. Written memos with charts, caveats, and next steps cover async work across regions.
Do your scientists deploy models, or only analyze?
They focus on analysis, experimentation, and model design. Deployment handoffs include scoring specs engineers can implement, or we pair with your ML platform team when needed.
How do you prevent data leakage and p-hacking?
Pre-registered metrics, holdout sets, and peer review on experiment design before launch. We document stopping rules and multiple-comparison risks upfront.
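To show what a multiple-comparison risk looks like in practice, the sketch below applies the Holm step-down adjustment to several metrics read out from one experiment. The metric names and p-values are made up, and `holm_adjust` is a hypothetical helper. Holm is one common correction, not necessarily the one used on every engagement.

```python
def holm_adjust(p_values: dict[str, float]) -> dict[str, float]:
    """Holm step-down adjusted p-values, controlling family-wise error."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    adjusted, running_max = {}, 0.0
    for rank, (name, p) in enumerate(ordered):
        candidate = min(1.0, (m - rank) * p)
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[name] = running_max
    return adjusted


# Hypothetical raw p-values for four metrics from a single experiment.
raw = {"conversion": 0.012, "revenue_per_user": 0.030,
       "session_length": 0.041, "retention_d7": 0.200}

adjusted = holm_adjust(raw)
for metric, p_adj in adjusted.items():
    verdict = "significant" if p_adj < 0.05 else "not significant"
    print(f"{metric}: raw={raw[metric]:.3f} adjusted={p_adj:.3f} -> {verdict}")
```

Three of the four metrics clear 0.05 before adjustment, but only `conversion` does afterward. Naming the primary metric and the correction before launch keeps that difference from becoming a negotiation after the results are in.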
Can you work with messy real-world data?
Yes. Missing values, delayed events, and shifting user behavior are normal. We flag data quality limits in every readout, not only in appendix footnotes.
Who owns the notebooks and model artifacts?
You do. All analysis code and documentation live in your repositories and data platforms under your terms.