I. Statistical Inference for Categorical Data

A. Single Proportion ($p$)

Method	Statistic/Interval Formula	Validity/Assumptions	Key Feature
Wald Test (H$_0: p = p_0$)	$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$	$X \ge 5$ and $(n - X) \ge 5$	Requires large samples
Score Test (H$_0: p = p_0$)	$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$	$np_0 \ge 5$ and $n(1-p_0) \ge 5$	Better small sample properties than Wald
Wald CI	$\hat{p} \pm Z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$	Performs well if $n\hat{p}$ and $n\hat{p}(1-\hat{p})$ are very large	Symmetric around $\hat{p}$
Wilson (Score-based) CI	$\frac{(2n\hat{p} + z^2) \pm \sqrt{z^4 + 4nz^2\hat{p}(1-\hat{p})}}{2(n + z^2)}$ (where $z = Z_{1-\alpha/2}$)	Provides better coverage than Wald when $n$ is not large or $p$ is near 0 or 1.	Not symmetric around $\hat{p}$. Wald CI can fall outside $[0, 1]$.
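
The Wald and Wilson intervals in the table can be compared numerically. The following is a minimal sketch using only the Python standard library; the function names `wald_ci` and `wilson_ci` and the data (2 events in 20 trials) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(x, n, alpha=0.05):
    """Wald CI: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wilson_ci(x, n, alpha=0.05):
    """Wilson CI, written exactly as in the table above."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    centre = 2 * n * p_hat + z**2
    spread = sqrt(z**4 + 4 * n * z**2 * p_hat * (1 - p_hat))
    denom = 2 * (n + z**2)
    return (centre - spread) / denom, (centre + spread) / denom


# Hypothetical small-sample data: x = 2 events out of n = 20.
print("Wald:  ", wald_ci(2, 20))
print("Wilson:", wilson_ci(2, 20))
```

With these inputs the Wald lower limit is negative, while the Wilson limits stay inside $[0, 1]$ and are not symmetric around $\hat{p} = 0.1$.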

B. Comparing Two Proportions ($p_1, p_2$)

Comparison Target	Estimate	Hypothesis Test H$_0: p_1 = p_2$
Risk Difference (RD)	$\widehat{RD} = \hat{p}_1 - \hat{p}_2$	Z-test/$\chi^2$ Test: $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \sim N(0, 1)$
Pooled Proportion	$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$	Equivalence: $\mathbf{Z^2 = \chi^2}$ (same assumptions, identical result).
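
The equivalence $Z^2 = \chi^2$ can be checked directly. This is a sketch under assumed inputs: the helper names `two_prop_z` and `pearson_chi2_2x2` and the counts (30/100 vs. 45/100) are hypothetical, and the $\chi^2$ statistic is the Pearson statistic without continuity correction.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion Z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def pearson_chi2_2x2(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction) for the same 2x2 table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    n = n1 + n2
    row_totals = [n1, n2]
    col_totals = [x1 + x2, n - x1 - x2]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


z = two_prop_z(30, 100, 45, 100)
chi2 = pearson_chi2_2x2(30, 100, 45, 100)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value from Z
print(z**2, chi2, p_value)
```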

Confidence Interval Method	Formula/Structure	Conditions
Simple CI for $\mathbf{p_1 - p_2}$	$\hat{p}_1 - \hat{p}_2 \pm Z_{1-\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$	May be unsafe with $<30$ subjects per group or when $\hat{p}_1$ or $\hat{p}_2$ is close to 0 or 1.
Newcombe’s CI (Better Coverage)	Uses Wilson CIs ($l_i, u_i$) for $p_1, p_2$.	Preferred for small samples.
Newcombe Lower Limit	$\hat{p}_1 - \hat{p}_2 - \sqrt{(\hat{p}_1 - l_1)^2 + (u_2 - \hat{p}_2)^2}$
Newcombe Upper Limit	$\hat{p}_1 - \hat{p}_2 + \sqrt{(\hat{p}_2 - l_2)^2 + (u_1 - \hat{p}_1)^2}$
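
A minimal sketch of Newcombe's interval, built from the Wilson limits $(l_i, u_i)$ as in the two rows above. The function names and the counts (56/70 vs. 48/80) are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(x, n, alpha=0.05):
    """Wilson score interval for a single proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    centre = 2 * n * p_hat + z**2
    spread = sqrt(z**4 + 4 * n * z**2 * p_hat * (1 - p_hat))
    denom = 2 * (n + z**2)
    return (centre - spread) / denom, (centre + spread) / denom


def newcombe_ci(x1, n1, x2, n2, alpha=0.05):
    """Newcombe CI for p1 - p2 using the Wilson limits of each group."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_ci(x1, n1, alpha)
    l2, u2 = wilson_ci(x2, n2, alpha)
    diff = p1 - p2
    lower = diff - sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = diff + sqrt((p2 - l2) ** 2 + (u1 - p1) ** 2)
    return lower, upper


# Hypothetical counts: 56/70 events in group 1, 48/80 in group 2.
print(newcombe_ci(56, 70, 48, 80))
```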

C. Measures of Effect (2x2 Table: $a, b, c, d$)

Measure	Formula ($\hat{p}_1 = a/n_1, \hat{p}_2 = c/n_2$)	Log Variance Estimate $\widehat{\mathrm{Var}}(\log \hat{M})$
Risk Difference (RD)	$\hat{p}_1 - \hat{p}_2$	N/A
Relative Risk (RR)	$\widehat{RR} = \frac{a/n_1}{c/n_2}$	$\frac{1}{a} - \frac{1}{n_1} + \frac{1}{c} - \frac{1}{n_2}$
Odds Ratio (OR)	$\widehat{OR} = \frac{ad}{bc}$	$\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}$

Key Insight: If the event is rare, OR $\approx$ RR. OR is the primary measure for case-control (retrospective) studies.
NNT (Number Needed To Treat): $NNT = 1/RD$ (Requires $RD > 0$).
CI for RR/OR: Compute the CI on the log scale, $(l, u) = \log(\hat{M}) \pm Z_{1-\alpha/2} \sqrt{\widehat{\mathrm{Var}}(\log \hat{M})}$, then exponentiate the limits: $(\exp(l), \exp(u))$.
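
The log-scale recipe for RR and OR can be sketched as follows, assuming the 2x2 layout of this section ($n_1 = a + b$, $n_2 = c + d$). The function names and the example table are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def log_scale_ci(estimate, var_log, alpha=0.05):
    """CI computed on the log scale, then exponentiated back."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(var_log)
    return exp(log(estimate) - half_width), exp(log(estimate) + half_width)


def relative_risk(a, b, c, d, alpha=0.05):
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    var_log_rr = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return (rr, *log_scale_ci(rr, var_log_rr, alpha))


def odds_ratio(a, b, c, d, alpha=0.05):
    or_hat = (a * d) / (b * c)
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
    return (or_hat, *log_scale_ci(or_hat, var_log_or, alpha))


# Hypothetical table: a=20, b=80 (group 1); c=10, d=90 (group 2).
print("RR, lower, upper:", relative_risk(20, 80, 10, 90))
print("OR, lower, upper:", odds_ratio(20, 80, 10, 90))
```

Because the interval is symmetric on the log scale, it is asymmetric around the estimate after exponentiation.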

II. Chi-Square Tests

General Validity Rule: Chi-square tests (Goodness-of-Fit, Independence, Homogeneity) are valid based on the Central Limit Theorem (CLT) and require that all Expected Cell Counts ($\mathbf{E_{ij}}$) be $\ge 5$.
Fisher’s Exact Test: Used for 2x2 tables if one or more expected cell frequencies is less than 5.

Test	Null Hypothesis (H$_0$)	$\chi^2$ Statistic	Degrees of Freedom ($df$)
Goodness-of-fit	Specified proportions ($p_1, \dots, p_K$) are true.	$\sum_{k=1}^{K} \frac{(O_k - E_k)^2}{E_k}$	$K - 1$
Independence/Homogeneity (R x C Table)	Variables are independent / Risks are equal ($p_1=p_2$).	$\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$	$(r - 1)(c - 1)$
McNemar (Paired)	$p_{\text{before}} = p_{\text{after}}$. (Used for “before-and-after” designs).	$\chi^2_M = \frac{(b - c)^2}{b + c}$ (where $b, c$ are discordant pairs).	1
Expected Cell Count		$E_{ij} = \frac{n_i m_j}{n}$ (Row total $\times$ Column total / Grand total)
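
The expected counts and test statistics in the table can be computed as below. This is a minimal sketch with hypothetical function names and made-up counts; it returns the statistic and degrees of freedom only and leaves the p-value lookup to a table or a statistics library.

```python
def chi2_independence(table):
    """Pearson chi-square for an R x C table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E_ij = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp_) ** 2 / exp_
        for obs_row, exp_row in zip(table, expected)
        for obs, exp_ in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


def mcnemar_chi2(b, c):
    """McNemar statistic from the two discordant-pair counts (df = 1)."""
    return (b - c) ** 2 / (b + c)


stat, df, expected = chi2_independence([[10, 20], [20, 10]])
print(stat, df, expected)
print(mcnemar_chi2(15, 5))

# Validity check from the general rule: all expected counts should be >= 5.
print(all(e >= 5 for row in expected for e in row))
```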

Examination Cheatsheet (Back Side)

III. Stratified Analysis & Confounding Control

Mantel-Haenszel (MH) Methods: Used to combine results across $K$ strata (confounder levels) and control the confounding variable. Tests H$_0$: Adjusted OR = 1.
MH OR Estimator: $$\hat{OR}_{MH} = \frac{\sum_k a_k d_k / n_k}{\sum_k b_k c_k / n_k}$$
MH RR Estimator: $$\hat{RR}_{MH} = \frac{\sum_k a_k n_{2k} / n_k}{\sum_k c_k n_{1k} / n_k}$$
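
A short sketch of both MH estimators. It assumes each stratum is a tuple $(a_k, b_k, c_k, d_k)$ with $n_{1k} = a_k + b_k$, $n_{2k} = c_k + d_k$ and $n_k = n_{1k} + n_{2k}$, matching the 2x2 layout used earlier; the function names and the strata are hypothetical.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel pooled OR over strata of (a, b, c, d) counts."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


def mh_relative_risk(strata):
    """Mantel-Haenszel pooled RR; n1 = a + b and n2 = c + d per stratum."""
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical data: two strata of a confounder (e.g. two age groups).
strata = [(20, 80, 10, 90), (30, 70, 20, 80)]
print(mh_odds_ratio(strata), mh_relative_risk(strata))
```

With a single stratum the estimators reduce to the crude $ad/(bc)$ and $(a/n_1)/(c/n_2)$.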

IV. Sample Size Determination

Scenario	Margin of Error ($E$) / Power ($1-\beta$)	Required Sample Size ($n$) Formula
One Group Proportion	Estimation with margin $E$ (CI).	$n = p(1-p) \left(\frac{Z_{1-\alpha/2}}{E}\right)^2$. (Use $p=0.5$ for maximum $n$).
Two Group Proportions	Comparing $p_1$ vs. $p_2$ (Equal $n_1=n_2=n$).	$n = \left(\frac{Z_{1-\alpha/2} \sqrt{2p(1-p)} + Z_{1-\beta} \sqrt{p_1(1-p_1) + p_2(1-p_2)}}{p_1 - p_2}\right)^2$ where $p = (p_1 + p_2)/2$.
Paired Proportions (McNemar)	Comparing proportions based on discordant pairs ($p_b, p_c$).	$n = \left(\frac{Z_{1-\alpha/2} \sqrt{p_c + p_b} + Z_{1-\beta} \sqrt{p_c + p_b - (p_c - p_b)^2}}{p_c - p_b}\right)^2$
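
The three sample-size formulas can be evaluated as follows. This is a sketch with hypothetical function names; rounding up with `ceil` is an added convention (not stated in the table), and the two-group result is the size per group.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _z(q):
    """Standard normal quantile."""
    return NormalDist().inv_cdf(q)


def n_one_proportion(margin, p=0.5, alpha=0.05):
    """Sample size to estimate one proportion within +/- margin."""
    return ceil(p * (1 - p) * (_z(1 - alpha / 2) / margin) ** 2)


def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for comparing p1 vs p2 (equal group sizes)."""
    p_bar = (p1 + p2) / 2
    num = (_z(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar))
           + _z(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((num / (p1 - p2)) ** 2)


def n_mcnemar(p_b, p_c, alpha=0.05, power=0.80):
    """Number of pairs for McNemar's test given discordant proportions."""
    num = (_z(1 - alpha / 2) * sqrt(p_c + p_b)
           + _z(power) * sqrt(p_c + p_b - (p_c - p_b) ** 2))
    return ceil((num / (p_c - p_b)) ** 2)


print(n_one_proportion(0.05))        # worst case p = 0.5
print(n_two_proportions(0.6, 0.4))   # per group
print(n_mcnemar(0.10, 0.25))         # number of pairs
```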

V. Analysis of Variance (ANOVA)

A. One-Way ANOVA

Purpose: Test if $k > 2$ population means are equal ($H_0: \mu_1 = \cdots = \mu_k$).
Key Conditions: $k$ independent populations, random samples, and Equal Population Variances ($\sigma^2$).
Variance Decomposition: Total Sum of Squares = Within (Error) + Between (Treatment). $$SST = SS_W + SS_B$$
Calculations & F-Test:

Source	Sum of Squares ($SS$)	$df$	Mean Square ($s^2$)	F-Ratio
Between	$SS_B = \sum_{j} n_j(\bar{y}_j - \bar{y})^2$	$k - 1$	$s^2_b = SS_B / (k-1)$	$F = s^2_b / s^2_w \sim F_{k-1, n-k}$
Within (Error)	$SS_W = \sum_{j} \sum_{i} (y_{ij} - \bar{y}_j)^2$	$n - k$	$s^2_w = SS_W / (n-k)$	
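
The one-way ANOVA table can be reproduced by hand with a few lines of code. This is a minimal sketch; the function name `one_way_anova` and the three small groups are hypothetical, and the p-value is left to an F table or a statistics library.

```python
def one_way_anova(groups):
    """Return (SS_B, SS_W, df_between, df_within, F) for a list of samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between: group sizes times squared deviation of group mean from grand mean.
    ss_b = sum(len(g) * (m - grand_mean) ** 2
               for g, m in zip(groups, group_means))
    # Within: squared deviations of observations from their own group mean.
    ss_w = sum((y - m) ** 2
               for g, m in zip(groups, group_means) for y in g)

    df_b, df_w = k - 1, n - k
    f_ratio = (ss_b / df_b) / (ss_w / df_w)
    return ss_b, ss_w, df_b, df_w, f_ratio


# Hypothetical data: k = 3 groups of 3 observations each.
print(one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

For this data $SS_B = 6$ and $SS_W = 6$, which add up to $SST = 12$, consistent with the decomposition above.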

B. Repeated Measures ANOVA

Design: One sample of $n$ subjects, with $k$ repeated measurements per subject.
Advantage: Increased power by removing random variations between subjects. Accounts for dependency among measurements.
Hypotheses: $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (Treatment means are equal).
Error Degrees of Freedom: $df_{\text{Error}} = (n-1)(k-1)$.
F-Ratio Distribution: $F = s^2_b / s^2_w \sim F_{df_1=k-1, df_2=(n-1)(k-1)}$.
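
A sketch of the repeated measures F-ratio. The notes do not spell out the sums of squares, so this assumes the standard decomposition in which between-subject variation is removed from the error term: $SS_{\text{Error}} = SS_T - SS_{\text{Treatment}} - SS_{\text{Subjects}}$. The function name and the data (3 subjects, 3 measurements each) are hypothetical.

```python
def repeated_measures_anova(data):
    """data[i][j] = measurement j on subject i. Returns (F, df1, df2)."""
    n, k = len(data), len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    treatment_means = [sum(col) / n for col in zip(*data)]
    subject_means = [sum(row) / k for row in data]

    ss_total = sum((y - grand_mean) ** 2 for row in data for y in row)
    ss_treatment = n * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    # Assumed decomposition: what is left after treatments and subjects.
    ss_error = ss_total - ss_treatment - ss_subjects

    df_treatment = k - 1
    df_error = (n - 1) * (k - 1)
    f_ratio = (ss_treatment / df_treatment) / (ss_error / df_error)
    return f_ratio, df_treatment, df_error


# Hypothetical data: rows are subjects, columns are the k = 3 conditions.
print(repeated_measures_anova([[1, 3, 5], [2, 3, 7], [3, 6, 6]]))
```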

VI. Multiple Comparisons (Controlling Familywise Error Rate, FWE)

FWE Definition: The probability of making at least one Type I error when performing $n$ comparisons, each at level $\alpha$. For independent tests, FWE $= 1 - (1 - \alpha)^n$.
Complex Contrast: $C = \sum_{j=1}^k c_j \mu_j$ where $\sum_{j=1}^k c_j = 0$.

Procedure	Primary Use Case	Conservativeness (Power)	Individual Test Level ($\alpha^*$)
Bonferroni Correction	Any endpoint; not limited to pairwise.	Conservative.	$\alpha^* = \alpha / \binom{k}{2} = \frac{2\alpha}{k(k-1)}$ (for pairwise tests).
Tukey’s HSD	Pairwise comparisons only (Gaussian outcomes).	Higher power than Scheffe.	Compares mean difference to critical value HSD.
Scheffe’s Procedure	Complex contrasts (Gaussian outcomes).	Most conservative.	Requires a modified F-test for contrasts.
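
The FWE inflation and the Bonferroni pairwise level can be illustrated numerically. This is a small sketch with hypothetical function names; the FWE formula assumes independent tests, as in the definition above.

```python
from math import comb


def familywise_error(alpha, n_tests):
    """FWE for n_tests independent comparisons, each at level alpha."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_pairwise_alpha(alpha, k):
    """Per-test level for all pairwise comparisons among k groups."""
    return alpha / comb(k, 2)


k = 5                      # hypothetical number of groups
n_pairs = comb(k, 2)       # 10 pairwise comparisons
alpha_star = bonferroni_pairwise_alpha(0.05, k)

print(n_pairs, familywise_error(0.05, n_pairs))   # uncorrected FWE
print(alpha_star, familywise_error(alpha_star, n_pairs))  # after correction
```

After the correction the FWE stays at or below the nominal $\alpha$, which is why the procedure is described as conservative.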
