Statistics Introduction


This series of tutorials is an introduction to the basic statistics used in machine learning algorithms.

Axioms of Probability

Definitions: let (\(\Omega\), F, P) be a measure space with P(\(\Omega\)) = 1. Then (\(\Omega\), F, P) is a probability space, where \(\Omega\) is the sample space, F is the event space and P is the probability measure.

In Kolmogorov's probability theory, the probability P of some event E must satisfy three axioms.

  1. P(E) \(\in\) \(\mathbb{R}\), P(E) \(\geq\) 0, \(\forall\) E \(\in\) F. This means that the probability of an event must be a non-negative real number, for every event E in the event space.
  2. P(\(\Omega\)) = 1. This means that the probability that some outcome in the sample space occurs is 1 (something in the sample space must happen).
  3. P(\(\cup_{i=0}^{\infty} E_i\)) = \(\sum_{i=0}^{\infty} P(E_i)\) for any countable sequence of mutually exclusive (pairwise disjoint) events \(E_i\). This means that the probability of a union of events that cannot occur together is the sum of their individual probabilities.
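To make the axioms concrete, here is a minimal sketch that checks them on a small finite probability space. The fair six-sided die, the uniform measure, and the helper names `powerset` and `prob` are all assumptions chosen for illustration. On a finite space, axiom 3 reduces to additivity over disjoint events, so the sketch only checks disjoint pairs.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed example: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))  # sample space

def powerset(s):
    """Return all subsets of s; used here as the event space F."""
    items = sorted(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

def prob(event):
    """Uniform probability measure: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

events = powerset(omega)

# Axiom 1: every event has a non-negative probability.
axiom1 = all(prob(e) >= 0 for e in events)

# Axiom 2: the whole sample space has probability 1.
axiom2 = prob(omega) == 1

# Axiom 3 (finite case): probabilities add over disjoint events.
axiom3 = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if not (a & b)  # only pairs with empty intersection
)

print(axiom1, axiom2, axiom3)  # True True True
```

Using `Fraction` rather than floats keeps the equality checks exact.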

Consequences of Kolmogorov's Axioms of Probability

Probability of Empty Set:

P(\(\emptyset\)) = 0


Monotonicity:

If A \(\subseteq\) B then P(A) \(\leq\) P(B)

Numeric Bound:

0 \(\leq\) P(E) \(\leq\) 1
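The empty-set, subset and numeric-bound statements above can be checked by brute force on a small example. This is only a sanity check under assumed data (two fair coin tosses with a uniform measure, and a hypothetical helper `prob`), not a proof.

```python
from fractions import Fraction
from itertools import combinations

# Assumed example: two tosses of a fair coin.
omega = frozenset({"HH", "HT", "TH", "TT"})

def prob(event):
    """Uniform probability measure: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

# Event space: every subset of the sample space.
events = [
    frozenset(c)
    for r in range(len(omega) + 1)
    for c in combinations(sorted(omega), r)
]

# The empty set has probability 0.
empty_ok = prob(frozenset()) == 0

# If A is a subset of B, then P(A) <= P(B).
monotone_ok = all(
    prob(a) <= prob(b)
    for a in events
    for b in events
    if a <= b  # subset test for frozensets
)

# Every probability lies between 0 and 1.
bound_ok = all(0 <= prob(e) <= 1 for e in events)

print(empty_ok, monotone_ok, bound_ok)  # True True True
```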

Probability of Union of Events (Inclusion-Exclusion):

P(A \(\cup\) B) = P(A) + P(B) - P(A \(\cap\) B)

The following can also be shown as a consequence of axiom 3:

P(A \(\cup\) B) = P(A) + P(B \(\setminus\) (A \(\cap\) B)). In other words, the event "A or B" can be described as either A occurring, or B occurring without A. These two cases are mutually exclusive, which is why their probabilities add.
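As a quick numeric illustration of the two union formulas above, the sketch below evaluates both on a fair die. The events A ("even") and B ("greater than 3") and the helper `prob` are assumptions picked so that A and B overlap.

```python
from fractions import Fraction

# Assumed example: one roll of a fair six-sided die.
omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability measure: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})  # the roll is even
B = frozenset({4, 5, 6})  # the roll is greater than 3

# Left-hand side: P(A or B), computed directly from the union.
union = prob(A | B)

# P(A) + P(B) - P(A and B): subtract the overlap counted twice.
inclusion_exclusion = prob(A) + prob(B) - prob(A & B)

# P(A) + P(B \ (A and B)): split the union into two disjoint pieces.
disjoint_split = prob(A) + prob(B - (A & B))

print(union, inclusion_exclusion, disjoint_split)  # 2/3 2/3 2/3
```

Here A \(\cap\) B = {4, 6}, so B without the overlap is just {5}, and both formulas give 1/2 + 1/6 = 2/3.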

Probability of the Complement:

P(\(A^c\)) = P(\(\Omega \setminus A\)) = 1 - P(A)
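As a model for how these consequences follow from the axioms, here is one possible derivation of the complement rule. It uses only axiom 2 and additivity over the two disjoint events \(A\) and \(A^c\), which is the finite case of axiom 3.

```latex
\begin{align*}
A \cap A^c &= \emptyset, \qquad A \cup A^c = \Omega \\
1 = P(\Omega) &= P(A \cup A^c) && \text{(axiom 2)} \\
              &= P(A) + P(A^c) && \text{(additivity for disjoint events)} \\
\Rightarrow \quad P(A^c) &= 1 - P(A)
\end{align*}
```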


This is just a very basic introduction to statistics, and it is the foundation on which further theorems will build. I highly recommend that the reader prove the consequences outlined above. If you have any questions, please leave a comment below.
