# Introduction

This series of tutorials is an introduction to the basic statistics used in machine learning algorithms.

# Axioms of Probability

Definitions: let ($$\Omega$$, F, P) be a measure space with P($$\Omega$$) = 1. Then ($$\Omega$$, F, P) is a probability space, where $$\Omega$$ is the sample space, F is the event space and P is the probability measure.

In Kolmogorov's probability theory, the probability P of some event E must satisfy 3 axioms.

1. P(E) $$\in \mathbb{R}$$, P(E) $$\geq$$ 0, $$\forall$$ E $$\in$$ F. This means that the probability of an event must be a non-negative real number for every event E in the event space (all possible events).
2. P($$\Omega$$) = 1. This means that the probability that some outcome in the sample space occurs is 1 (some event being measured must occur).
3. $$P(\cup_{i=0}^{\infty} E_i) = \sum_{i=0}^{\infty} P(E_i)$$ for any countable sequence of mutually exclusive events $$E_i$$ (that is, $$E_i \cap E_j = \emptyset$$ whenever $$i \neq j$$). This means that the probability of a union of mutually exclusive events is the sum of their individual probabilities.
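To make the axioms concrete, here is a minimal sketch that checks them on a small finite probability space. The fair six-sided die, the uniform measure and the names `omega`, `powerset` and `P` are assumptions chosen for illustration. On a finite sample space, axiom 3 reduces to additivity over pairs of disjoint events, which is what the code checks.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed example: the sample space of a fair six-sided die.
omega = frozenset(range(1, 7))

def powerset(s):
    """Return every subset of s; for a finite sample space this is the event space F."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

def P(event):
    """Uniform probability measure: |E| / |Omega|, kept exact with Fraction."""
    return Fraction(len(event), len(omega))

F = powerset(omega)

# Axiom 1: every event has a non-negative probability.
axiom1 = all(P(E) >= 0 for E in F)

# Axiom 2: the whole sample space has probability 1.
axiom2 = P(omega) == 1

# Axiom 3 (finite case): P(A ∪ B) = P(A) + P(B) whenever A and B are disjoint.
axiom3 = all(
    P(A | B) == P(A) + P(B)
    for A in F
    for B in F
    if not (A & B)
)

print(axiom1, axiom2, axiom3)
```

Using `Fraction` instead of floats keeps the equality checks exact, so the comparison in axiom 3 is not affected by rounding.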

# Consequences of Kolmogorov's Axioms of Probability

Probability of Empty Set:

P($$\emptyset$$) = 0

Monotonicity:

If A $$\subseteq$$ B then P(A) $$\leq$$ P(B)

Numeric Bound:

0 $$\leq$$ P(E) $$\leq$$ 1

Probability of Union of Events (inclusion-exclusion for two events):

P(A $$\cup$$ B) = P(A) + P(B) - P(A $$\cap$$ B)

It can also be shown, as a consequence of axiom 3, that:

P(A $$\cup$$ B) = P(A) + P($$B \setminus (A \cap B)$$). In other words, the event A or B can be described as the event that A occurs or that only B occurs.
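The two expressions for P(A $$\cup$$ B) can be compared numerically. The sketch below again assumes a fair six-sided die with a uniform measure; the events `A` (even roll) and `B` (roll greater than 3) are hypothetical choices made only so that the two events overlap.

```python
from fractions import Fraction

# Assumed example: the sample space of a fair six-sided die.
omega = frozenset(range(1, 7))

def P(event):
    """Uniform probability measure: |E| / |Omega|."""
    return Fraction(len(event), len(omega))

A = frozenset({2, 4, 6})  # even roll
B = frozenset({4, 5, 6})  # roll greater than 3

# Probability of the union computed directly from the set A ∪ B.
union_direct = P(A | B)

# P(A) + P(B) - P(A ∩ B): subtract the overlap that was counted twice.
inclusion_exclusion = P(A) + P(B) - P(A & B)

# P(A) + P(B \ (A ∩ B)): A and "only B" are disjoint, so axiom 3 applies.
disjoint_split = P(A) + P(B - (A & B))

print(union_direct, inclusion_exclusion, disjoint_split)
```

All three values agree because B \ (A $$\cap$$ B) removes exactly the outcomes that would otherwise be counted in both P(A) and P(B).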

Probability of the Complement:

P($$A^c$$) = P($$\Omega \setminus A$$) = 1 - P(A)
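The remaining consequences can be spot-checked in the same way. This is a sketch on an assumed fair six-sided die with hypothetical events `A` and `B`; it illustrates the identities on one example and is not a substitute for proving them.

```python
from fractions import Fraction

# Assumed example: the sample space of a fair six-sided die.
omega = frozenset(range(1, 7))

def P(event):
    """Uniform probability measure: |E| / |Omega|."""
    return Fraction(len(event), len(omega))

A = frozenset({1, 2})
B = frozenset({1, 2, 3})     # chosen so that A is a subset of B
A_complement = omega - A     # the set Omega \ A

# Probability of the empty set is 0.
empty_ok = P(frozenset()) == 0

# Monotonicity: A ⊆ B implies P(A) <= P(B).
monotone_ok = A <= B and P(A) <= P(B)

# Numeric bound: 0 <= P(E) <= 1.
bound_ok = 0 <= P(A) <= 1

# Complement: P(A^c) = 1 - P(A).
complement_ok = P(A_complement) == 1 - P(A)

print(empty_ok, monotone_ok, bound_ok, complement_ok)
```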

# Conclusion

This is just a very basic introduction to statistics and will be the foundation on which further theorems build. I highly recommend that the reader prove the consequences outlined above. If you have any questions, please leave a comment below.
