Probability measures the likelihood of an event: P(A) ranges from 0 (impossible) to 1 (certain). Core
rules govern how probabilities combine.
Addition rule: P(A or B) = P(A) + P(B) − P(A and B)
Multiplication rule: P(A and B) = P(A) × P(B|A)
Independent events: P(A and B) = P(A) × P(B)
Complement rule: P(not A) = 1 − P(A)
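The four rules can be checked by brute-force enumeration. The following is a minimal Python sketch. It assumes a fair six-sided die, with A = "roll is even" and B = "roll is greater than 3", and uses two separate dice for the independence rule. The helper `prob` and these particular events are illustrative choices, not part of the rules themselves:

```python
from fractions import Fraction
from itertools import product

DIE = range(1, 7)  # fair six-sided die: every outcome equally likely


def prob(event, space):
    """Exact probability of `event` (a predicate) over an equally likely sample space."""
    outcomes = list(space)
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def is_even(x):
    return x % 2 == 0


def is_high(x):
    return x > 3


p_a, p_b = prob(is_even, DIE), prob(is_high, DIE)
p_a_and_b = prob(lambda x: is_even(x) and is_high(x), DIE)
p_a_or_b = prob(lambda x: is_even(x) or is_high(x), DIE)

# Addition rule
assert p_a_or_b == p_a + p_b - p_a_and_b
# Multiplication rule: P(B|A) is the probability of B among the outcomes where A holds
p_b_given_a = prob(is_high, [x for x in DIE if is_even(x)])
assert p_a_and_b == p_a * p_b_given_a
# Complement rule
assert prob(lambda x: not is_even(x), DIE) == 1 - p_a

# Independence: A = first die even, B = second die even (two separate dice)
TWO_DICE = list(product(DIE, DIE))
p_first = prob(lambda o: is_even(o[0]), TWO_DICE)
p_second = prob(lambda o: is_even(o[1]), TWO_DICE)
p_both = prob(lambda o: is_even(o[0]) and is_even(o[1]), TWO_DICE)
assert p_both == p_first * p_second

print(p_a_or_b, p_b_given_a, p_both)  # 2/3 2/3 1/4
```

Using `Fraction` keeps every probability exact, so the rules can be compared with `==` rather than with a floating-point tolerance.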
Worked Example 1: Combined Probability - Medical Test
Problem: Two independent tests for a disease each have 95%
sensitivity. If both are used, what is the probability of detecting the disease (at least one positive)?
Complement method
P(both miss) = 0.05 × 0.05 = 0.0025
P(at least one detects) = 1 − 0.0025 = 0.9975 = 99.75%
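Answer: 99.75%. Two independent 95% tests combined give 99.75% sensitivity. The gain depends on the tests missing cases independently of each other. It also has a cost: calling a patient positive when either test is positive raises the false-positive rate. This is why important diagnoses often rely on more than one independent test.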
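The following short Python sketch implements the complement method. It assumes, as the problem states, that the tests miss cases independently of each other. `combined_sensitivity` is a hypothetical helper name:

```python
import math


def combined_sensitivity(sensitivities):
    """P(at least one test is positive | disease), assuming the tests miss independently."""
    # Complement method: P(at least one detects) = 1 - P(every test misses).
    p_all_miss = math.prod(1 - s for s in sensitivities)
    return 1 - p_all_miss


print(round(combined_sensitivity([0.95, 0.95]), 4))  # 0.9975
```

Adding a third 95% test would give 1 − 0.05³ = 0.999875, or 99.9875%.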
2 Bayes' Theorem
Bayes' theorem updates our beliefs when new evidence arrives. It answers: "Given that we observed B, what
is the probability of A?"
P(A|B) = P(B|A) × P(A) / P(B)
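where:
P(A|B) = posterior (updated belief in A after observing B)
P(A) = prior (initial belief in A)
P(B|A) = likelihood (how likely the evidence is if A is true)
P(B) = total probability of the evidence: P(B|A) × P(A) + P(B|not A) × P(not A)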
Worked Example 2: Bayes' Theorem - False Positives
Problem: A disease affects 1% of the population. A test is 99%
sensitive (positive when disease present) and 95% specific (negative when disease absent). If you test
positive, what is the actual probability you have the disease?
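Applying Bayes' theorem
P(positive) = 0.99 × 0.01 + 0.05 × 0.99 = 0.0099 + 0.0495 = 0.0594
P(disease | positive) = 0.0099 / 0.0594 ≈ 0.167
Answer: Only 16.7%! Despite a test that sounds "99% accurate", most positives are false when the disease is rare. The false positives from the 99% healthy majority (0.0495) outnumber the true positives from the 1% who are sick (0.0099). Ignoring this low prevalence (the base rate) and expecting an answer near 99% is known as the base rate fallacy.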
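The same calculation can be written as a Python sketch. `posterior` is a hypothetical helper. The loop over other prevalence values is included only to illustrate how strongly the answer depends on the base rate:

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) from Bayes' theorem."""
    true_pos = sensitivity * prior               # P(+ | disease) × P(disease)
    false_pos = (1 - specificity) * (1 - prior)  # P(+ | healthy) × P(healthy)
    evidence = true_pos + false_pos              # P(+), by the law of total probability
    return true_pos / evidence


print(f"{posterior(0.01, 0.99, 0.95):.1%}")  # 16.7%

# Same test, illustrative prevalence values
for prevalence in (0.001, 0.01, 0.1, 0.5):
    print(f"prevalence {prevalence:>5.1%}: P(disease | +) = {posterior(prevalence, 0.99, 0.95):.1%}")
```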
3 The Normal Distribution
The normal distribution, or bell curve, is arguably the most important distribution in statistics. Many natural quantities approximately follow it, including heights, test scores, and measurement errors. It is fully characterized by its mean (μ) and standard deviation (σ).
68-95-99.7 Rule
68% of data within μ ± 1σ
95% within μ ± 2σ
99.7% within μ ± 3σ
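The percentages in the rule come from the normal CDF. The following small Python check uses the standard identity P(|X − μ| < kσ) = erf(k/√2), which holds for any μ and σ. `prob_within` is a hypothetical name:

```python
import math


def prob_within(k):
    """P(|X - mu| < k*sigma) for normally distributed X (independent of mu and sigma)."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k}σ: {prob_within(k):.2%}")
```

The output is about 68.27%, 95.45% and 99.73%, which the rule rounds to 68, 95 and 99.7.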
4 Hypothesis Testing
A structured framework for making decisions from data. You start with a null hypothesis (H₀, usually "no
effect"), collect data, and determine whether the evidence is strong enough to reject H₀.
H₀: Null hypothesis (no effect/difference)
H₁: Alternative hypothesis (there IS an effect)
p-value: Probability of data at least as extreme as that observed, if H₀ is true
α = 0.05: Common significance level (reject H₀ if p < α)
Common mistake: A p-value of 0.03 does NOT mean "3% chance H₀ is true." It means: "If H₀ were true, there's a 3% chance of seeing data at least this extreme." The distinction matters!
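The source does not name a particular test. The following Python sketch therefore assumes a two-sided z-test, where the p-value is P(|Z| ≥ |z|) for a standard normal Z. The z values and helper names are hypothetical:

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for Z ~ N(0, 1): the p-value of a two-sided z-test under H0."""
    return math.erfc(abs(z) / math.sqrt(2))


def decide(p, alpha=0.05):
    """Reject H0 when p < alpha; otherwise fail to reject (which is not the same as accepting H0)."""
    return "reject H0" if p < alpha else "fail to reject H0"


for z in (1.0, 1.5, 2.17, 3.0):
    p = two_sided_p_value(z)
    print(f"z = {z:4.2f}  p = {p:.4f}  -> {decide(p)}")
```

The code uses the phrase "fail to reject" deliberately. A large p-value means the data are compatible with H₀, not that H₀ has been shown to be true.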
5 Confidence Intervals
A confidence interval gives a range of plausible values for a parameter. A 95% CI means that if we repeated the study many times, about 95% of the resulting intervals would contain the true parameter. The 95% describes the procedure, not any single interval.
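The repeated-study interpretation can be checked by simulation. The following Python sketch assumes normally distributed data with known σ and uses the interval x̄ ± 1.96σ/√n. The true mean, σ, sample size, number of repetitions and seed are arbitrary illustrative values, and `coverage_rate` is a hypothetical name:

```python
import math
import random


def coverage_rate(mu=10.0, sigma=2.0, n=50, reps=2000, z=1.96, seed=0):
    """Fraction of simulated z-intervals (sigma treated as known) that contain the true mean mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        if mean - margin <= mu <= mean + margin:
            hits += 1
    return hits / reps


print(coverage_rate())
```

The printed fraction should land close to 0.95. Any single interval either contains μ or does not.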
Problem: A survey of 400 students finds average study time =
4.2 hours/day with σ = 1.5 hours. Construct a 95% confidence interval for the population mean.
95% CI
Margin = 1.96 × (1.5/√400) = 1.96 × 0.075 = 0.147
CI = 4.2 ± 0.147 = [4.053, 4.347] hours
Answer: We are 95% confident the true average is between 4.05 and 4.35 hours/day. Quadrupling the sample size to 1,600 would halve the margin to ±0.0735 (1.96 × 1.5/√1600), because the margin shrinks in proportion to 1/√n.
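The worked example can be written as a Python sketch. It assumes σ is known, or that n is large enough for the sample standard deviation to stand in for it. `z_confidence_interval` is a hypothetical helper:

```python
import math


def z_confidence_interval(mean, sigma, n, z=1.96):
    """Return (low, high, margin) for mean ± z * sigma / sqrt(n)."""
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin, margin


low, high, margin = z_confidence_interval(4.2, 1.5, 400)
print(f"[{low:.3f}, {high:.3f}]  margin = {margin:.3f}")  # [4.053, 4.347]  margin = 0.147

# Four times the sample size halves the margin
_, _, margin_1600 = z_confidence_interval(4.2, 1.5, 1600)
print(f"n = 1600: margin = {margin_1600:.4f}")
```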
6 Linear Regression
Regression finds the line of best fit through data points, allowing prediction and quantifying
relationships between variables.
ŷ = a + bx, where b = Σ(xᵢ−x̄)(yᵢ−ȳ) / Σ(xᵢ−x̄)²
b = slope, a = ȳ − bx̄ (intercept)
R² = fraction of the variance in y explained by the model (0 to 1)
Example: R² = 0.85 means the model explains 85% of the variation in y.
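The following plain-Python sketch implements the least-squares formulas above and computes R² as 1 − SS_res/SS_tot. `fit_line` and the five data points are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y ≈ a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x̄, ȳ)
    # R² = 1 - SS_res / SS_tot: share of the variance in y explained by the line
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot


a, b, r2 = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"a = {a:.2f}, b = {b:.2f}, R² = {r2:.2f}")  # a = 2.20, b = 0.60, R² = 0.60
```

For these points the line explains 60% of the variation in y. Data lying exactly on a line would give R² = 1.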