MathJax

Monday, September 9, 2013

Five-number summary

If you see me write something that looks like this:
 
  1  -[  4  22  70  ]-  296
 
you're looking at a five-number summary. This is a handy way of summarising a numerical data set. Traditionally, the values are

  minimum -[ 25th percentile | median | 75th percentile ]- maximum

The interpretation of the summary above is as follows.
  • The typical data (the values in the middle) lie between 4 and 70 (between the 25th and 75th percentiles).
  • The median, 22, is the value that divides the bottom half and top half of the data. So 50% of the data lie below 22 and 50% of the data lie above 22.
  • The minimum value is 1.
  • The maximum value is 296.
I prefer replacing the minimum with the 1st percentile and the maximum with the 99th percentile since these are more stable (less sensitive to outliers in the data set).

1st perc.  -[  25th perc.  |  median  |  75th perc.  ]-  99th perc.
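The summary is one line of NumPy. A minimal sketch (the function name and the `robust` flag are my own, not standard terminology):

```python
import numpy as np

def five_number_summary(data, robust=False):
    """Return (low, q1, median, q3, high) for a numerical data set.

    With robust=False the end points are the minimum and maximum;
    with robust=True they are the 1st and 99th percentiles, the
    variant preferred above.
    """
    lo, hi = (1, 99) if robust else (0, 100)
    return tuple(np.percentile(np.asarray(data, dtype=float),
                               [lo, 25, 50, 75, hi]))

values = [1, 2, 4, 10, 22, 40, 70, 150, 296]
low, q1, med, q3, high = five_number_summary(values)
print(f"{low:g}  -[  {q1:g}  {med:g}  {q3:g}  ]-  {high:g}")
# prints: 1  -[  4  22  70  ]-  296
```

For this particular data set the printed line reproduces the summary at the top of the post.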

Monday, June 20, 2011

Exercise: Objects and Attributes

There are \(M\) objects, each with a number of attributes or labels. There are \(L\) possible labels and each object is associated with some subset of these labels. The indicator variables
\[
\alpha_{ml} = \left\{\begin{array}{cl}
1 & \textrm{if label } l \textrm{ is associated with object } m \\
0 & \textrm{otherwise}
\end{array}\right.
\]
define the association between objects and labels. The true values of the \(\alpha_{ml}\) are unknown. To infer the values of the \(\alpha_{ml}\), we ask \(N\) people to indicate which labels they believe are associated with each object. The outcomes of these trials are defined by another set of indicator variables,
\[
\beta_{nml} = \left\{\begin{array}{cl}
1 & \textrm{if person } n \textrm{ associated label } l \textrm{ with object } m \\
0 & \textrm{otherwise}
\end{array}\right.
\]
Now, people make mistakes, and we need to account for two types of mistake: false positives and false negatives. A false positive occurs when a person associates a label with an object that is not truly associated with it, i.e. \(\beta_{nml} = 1\) while \(\alpha_{ml} = 0\). A false negative is the opposite scenario: \(\beta_{nml} = 0\) while \(\alpha_{ml} = 1\). We will assume that each person makes each type of error with some fixed but unknown probability,
\begin{eqnarray}
e_n^{\textrm{pos}} &=& P(\beta_{nml}=1 | \alpha_{ml}=0) \quad\textrm{ and}\\
e_n^{\textrm{neg}} &=& P(\beta_{nml}=0 | \alpha_{ml}=1)
\end{eqnarray}

Questions

  1. Come up with an appropriate prior over \(\left\{\alpha_{ml}\right\}\) and derive the posterior after observing \(\left\{\beta_{nml}\right\}\). You will also need priors over \(\left\{e^{\textrm{pos}}_n\right\}\) and \(\left\{e^{\textrm{neg}}_n\right\}\) and should derive their posteriors too.
  2. Use the data file (LINK: \(N\approx20\), \(M\approx100\), \(L\approx100\)) and infer posteriors over \(\alpha_{ml}\), \(\left\{e^{\textrm{pos}}_n\right\}\) and \(\left\{e^{\textrm{neg}}_n\right\}\). Visualise the posteriors over \(\left\{e^{\textrm{pos}}_n\right\}\) and \(\left\{e^{\textrm{neg}}_n\right\}\) and note your observations. Visualise the posterior over \(\left\{\alpha_{ml}\right\}\) for each object.
  3. Comment on whether there are enough data in the file or whether more measurements should be made.
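To make the model concrete before attempting the questions, here is a small generative simulation in NumPy, plus the posterior over \(\alpha_{ml}\) in the simplified case where the error rates are known. The sizes match the exercise, but the sparsity prior \(p\) and the error-rate ranges are arbitrary assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, L = 20, 100, 100            # sizes on the scale of the exercise

# True associations: an assumed sparse Bernoulli prior with P(alpha=1) = p
p = 0.1
alpha = rng.random((M, L)) < p

# Assumed per-person error rates (unknown in the real exercise)
e_pos = rng.uniform(0.01, 0.2, size=N)   # P(beta=1 | alpha=0)
e_neg = rng.uniform(0.01, 0.3, size=N)   # P(beta=0 | alpha=1)

# Observations beta[n, m, l]
u = rng.random((N, M, L))
beta = np.where(alpha[None],
                u >= e_neg[:, None, None],   # true label kept w.p. 1 - e_neg
                u < e_pos[:, None, None])    # false positive w.p. e_pos

# Posterior over alpha_ml when the error rates are known: the entries are
# independent, so the log odds are the prior log odds plus one
# log-likelihood-ratio term per person.
llr = np.where(beta,
               np.log((1 - e_neg) / e_pos)[:, None, None],
               np.log(e_neg / (1 - e_pos))[:, None, None])
logit = np.log(p / (1 - p)) + llr.sum(axis=0)
post_alpha = 1 / (1 + np.exp(-logit))
```

In question 1 the error rates are unknown as well. Beta priors on \(e^{\textrm{pos}}_n\) and \(e^{\textrm{neg}}_n\) are conjugate given \(\left\{\alpha_{ml}\right\}\), so one natural approach (an assumption on my part, not prescribed by the exercise) is a Gibbs sampler alternating between the conditional above and the Beta conditionals on the error rates.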

Wednesday, June 15, 2011

Conjugate Inference: Multivariate Gaussian Likelihood

Domain: \[\vec{x}\in\mathbb{R}^d\]

Parameters: The Gaussian likelihood is parametrised by its mean vector, \(\vec{\mu}\), and precision matrix, \(\Lambda\).
\[\Theta = \{\vec{\mu}, \Lambda\}\]

Likelihood: \[P(\vec{x}|\Theta) = N(\vec{x}|\vec{\mu},\Lambda^{-1})\]

Prior: A normal–Wishart distribution.
\[P(\Theta) = N(\vec{\mu}|\vec{\eta}_0,(\tau_0\Lambda)^{-1})\ W(\Lambda|V_0,\nu_0)\]
The probability density function of the Wishart distribution is
\[W(\Lambda|V,\nu) = \frac{|\Lambda|^{(\nu-d-1)/2}\exp\left(-\frac{1}{2}\textrm{Trace}(V^{-1}\Lambda)\right)}{2^{\nu d/2}|V|^{\nu/2}\Gamma_d(\nu/2)}\]
where
\[\Gamma_d(\nu/2) = \pi^{d(d-1)/4}\ \prod_{i=1}^d \Gamma\left(\frac{\nu-i+1}{2}\right)\]
is the multivariate gamma function.

Posterior:
\[P(\Theta|D) = N(\vec{\mu}|\vec{\eta}_1,(\tau_1\Lambda)^{-1})\ W(\Lambda|V_1,\nu_1)\]
with
\begin{eqnarray}
\vec{\eta}_1 &=& \frac{\tau_0\vec{\eta}_0 + \vec{S}^{(1)}}{\tau_1} \\
\tau_1 &=& \tau_0 + S^{(0)} \\
\nu_1 &=& \nu_0 + S^{(0)} \\
V_1^{-1} &=& V_0^{-1} + S^{(2)} + \tau_0\vec{\eta}_0^{I\!I} - \tau_1\vec{\eta}_1^{I\!I}
\end{eqnarray}
where
\begin{eqnarray}
S^{(0)} &=& |D| \\
\vec{S}^{(1)} &=& \sum_{\vec{x}\in D} \vec{x} \\
S^{(2)} &=& \sum_{\vec{x}\in D} \vec{x}^{I\!I}
\end{eqnarray}
and \(\vec{x}^{I\!I} \equiv \vec{x}\vec{x}^T\) is the outer product of a vector with itself.

Marginal likelihood:
\[P(D) =
\pi^{-S^{(0)}d/2}
\left(\frac{\tau_0}{\tau_1}\right)^{d/2}
\frac{|V_1|^{\nu_1/2}}{|V_0|^{\nu_0/2}}
\ \prod_{i=1}^d \frac{\Gamma\left((\nu_1-i+1)/2\right)}{\Gamma\left((\nu_0-i+1)/2\right)}
\]
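These updates translate directly into NumPy; the sketch below (function name mine) returns the posterior hyperparameters given the data as rows of a matrix:

```python
import numpy as np

def normal_wishart_update(X, eta0, tau0, V0, nu0):
    """Posterior hyperparameters of the normal-Wishart prior
    after observing the rows of X (shape [n, d])."""
    X = np.atleast_2d(X)
    S0 = X.shape[0]                      # S^(0): number of points
    S1 = X.sum(axis=0)                   # S^(1): sum of points
    S2 = X.T @ X                         # S^(2): sum of outer products
    tau1 = tau0 + S0
    nu1 = nu0 + S0
    eta1 = (tau0 * eta0 + S1) / tau1
    V1_inv = (np.linalg.inv(V0) + S2
              + tau0 * np.outer(eta0, eta0)
              - tau1 * np.outer(eta1, eta1))
    return eta1, tau1, np.linalg.inv(V1_inv), nu1
```

A convenient correctness check: updating on the whole data set at once must agree with feeding in the points one at a time, since both are exact Bayesian conditioning.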

Monday, May 9, 2011

Product of Dirichlet Distributions

The Dirichlet distribution is
\[
D(\vec{x}|\vec{\alpha}) =
\frac{\Gamma\left(\sum_{j=1}^d\alpha_j\right)}{\prod_{j=1}^d\Gamma(\alpha_j)}
\prod_{j=1}^d x_j^{\alpha_j-1}
\]
where \(\Gamma(\cdot)\) is the gamma function and \(d\) is the dimensionality of \(\vec{x}\).

A product of Dirichlet distributions is proportional to another Dirichlet distribution.
\[
\prod_{i=1}^n D(\vec{x}|\vec{\alpha}_i) =
Z\times D(\vec{x}|\vec{\alpha}')
\]
where
\[
\vec{\alpha}' - 1 = \sum_{i=1}^n \left[\vec{\alpha}_i - 1\right]
\]
and
\[
Z = \frac
{\prod_{j=1}^d\Gamma(\alpha'_j)}
{\Gamma\left(\sum_{j=1}^d\alpha'_j\right)}
\ \prod_{i=1}^n\left[ \frac
{\Gamma\left(\sum_{j=1}^d\alpha_{ij}\right)}
{\prod_{j=1}^d\Gamma(\alpha_{ij})} \right]
\]
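The identity is easy to verify numerically with SciPy (the particular \(\vec{\alpha}_i\) and the evaluation point \(\vec{x}\) below are arbitrary choices):

```python
import numpy as np
from scipy.stats import dirichlet
from scipy.special import gammaln

rng = np.random.default_rng(0)
n, d = 3, 4
alphas = rng.uniform(1.0, 5.0, size=(n, d))   # the alpha_i
alpha_prime = alphas.sum(axis=0) - (n - 1)    # alpha' - 1 = sum_i (alpha_i - 1)

# log Z from the formula above
logZ = (gammaln(alpha_prime).sum() - gammaln(alpha_prime.sum())
        + sum(gammaln(a.sum()) - gammaln(a).sum() for a in alphas))

x = np.array([0.1, 0.2, 0.3, 0.4])            # any point on the simplex
lhs = sum(dirichlet.logpdf(x, a) for a in alphas)
rhs = logZ + dirichlet.logpdf(x, alpha_prime)
# lhs and rhs agree to floating-point precision
```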

Product of Normal–Gamma Distributions

The normal–gamma distribution is
\begin{eqnarray}
NG(\mu,\lambda|\eta,\tau,\alpha,\beta)
& = & N(\mu|\eta,(\tau\lambda)^{-1})\ G(\lambda|\alpha,\beta) \\
& = &
\frac{\beta^{\alpha}\sqrt{\tau}}{\Gamma(\alpha)\sqrt{2\pi}}
\lambda^{\alpha-\frac{1}{2}}
\exp\left( -\beta\lambda - \frac{1}{2}\tau\lambda(\mu-\eta)^2 \right)
\end{eqnarray}
where \(\Gamma(\cdot)\) is the gamma function.

A product of normal–gamma distributions is proportional to another normal–gamma distribution.
\[
\prod_{i=1}^n NG(\mu,\lambda|\eta_i,\tau_i,\alpha_i,\beta_i)
= Z\times NG(\mu,\lambda|\hat{\eta},\hat{\tau},\hat{\alpha},\hat{\beta})
\]
where
\begin{eqnarray}
\hat{\tau} &=& \sum_{i=1}^n \tau_i \\
\hat{\tau}\hat{\eta} &=& \sum_{i=1}^n \tau_i\eta_i \\
2\hat{\beta} + \hat{\tau}\hat{\eta}^2 &=& \sum_{i=1}^n \left[2\beta_i + \tau_i\eta_i^2\right] \\
\hat{\alpha}-\frac{1}{2} &=& \sum_{i=1}^n \left[\alpha_i - \frac{1}{2}\right]
\end{eqnarray}
and
\[
Z =
\frac{\Gamma(\hat{\alpha})\sqrt{2\pi}}{\hat{\beta}^{\hat{\alpha}}\sqrt{\hat{\tau}}}
\prod_{i=1}^n\left[\frac{\beta_i^{\alpha_i}\sqrt{\tau_i}}{\Gamma(\alpha_i)\sqrt{2\pi}}\right]
\]
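The same kind of numerical check works here, with the normal-gamma log density written out by hand (parameter values below are arbitrary):

```python
import numpy as np
from scipy.special import gammaln

def log_ng(mu, lam, eta, tau, a, b):
    """Log density of NG(mu, lambda | eta, tau, alpha, beta) as written above."""
    return (a * np.log(b) + 0.5 * np.log(tau)
            - gammaln(a) - 0.5 * np.log(2 * np.pi)
            + (a - 0.5) * np.log(lam)
            - b * lam - 0.5 * tau * lam * (mu - eta) ** 2)

rng = np.random.default_rng(0)
n = 3
etas = rng.normal(size=n)
taus = rng.uniform(0.5, 2.0, size=n)
alphas = rng.uniform(1.0, 3.0, size=n)
betas = rng.uniform(0.5, 2.0, size=n)

# Combined parameters from the relations above
tau_h = taus.sum()
eta_h = (taus * etas).sum() / tau_h
beta_h = 0.5 * ((2 * betas + taus * etas ** 2).sum() - tau_h * eta_h ** 2)
alpha_h = (alphas - 0.5).sum() + 0.5

# log Z from the formula above
logZ = (gammaln(alpha_h) + 0.5 * np.log(2 * np.pi)
        - alpha_h * np.log(beta_h) - 0.5 * np.log(tau_h)
        + (alphas * np.log(betas) + 0.5 * np.log(taus)
           - gammaln(alphas) - 0.5 * np.log(2 * np.pi)).sum())

mu, lam = 0.3, 1.7                            # any point with lambda > 0
lhs = sum(log_ng(mu, lam, e, t, a, b)
          for e, t, a, b in zip(etas, taus, alphas, betas))
rhs = logZ + log_ng(mu, lam, eta_h, tau_h, alpha_h, beta_h)
# lhs and rhs agree to floating-point precision
```

Note that \(\hat{\beta} > 0\) is guaranteed: by the Cauchy–Schwarz inequality, \(\sum_i\tau_i\eta_i^2 \ge \hat{\tau}\hat{\eta}^2\), so \(2\hat{\beta} \ge \sum_i 2\beta_i\).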

Monday, May 2, 2011

Conjugate Inference: Gaussian Likelihood

Domain: Here we consider only univariate normal distributions. \[x\in\mathbb{R}\]

Parameters: The Gaussian likelihood is parametrised by its mean, \(\mu\), and inverse variance, \(\lambda\).
\[\Theta = \{\mu, \lambda\}\]

Likelihood: \[P(x|\Theta) = N(x|\mu,\lambda^{-1})\]

Prior: A normal–gamma distribution.
\[P(\Theta) = N(\mu|\eta_0,(\tau_0\lambda)^{-1})\ G(\lambda|\alpha_0,\beta_0)\]
Note that the following parametrisation of the gamma distribution is used:
\[G(x|\alpha,\beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}\]
(There is an alternative parametrisation in terms of the scale parameter \(\beta^{-1}\) rather than the rate parameter \(\beta\).)

Posterior:
\[P(\Theta|D) = N(\mu|\eta_1,(\tau_1\lambda)^{-1})\ G(\lambda|\alpha_1,\beta_1)\]
with
\begin{eqnarray}
\eta_1 &=& \frac{\eta_0\tau_0 + S^{(1)}}{\tau_1} \\
\tau_1 &=& \tau_0 + S^{(0)} \\
\alpha_1 &=& \alpha_0 + \frac{S^{(0)}}{2} \\
\beta_1 &=& \beta_0 + \frac{1}{2}\left( S^{(2)}+\eta_0^2\tau_0 - \eta_1^2\tau_1 \right)
\end{eqnarray}
where
\begin{eqnarray}
S^{(0)} &=& |D| \\
S^{(1)} &=& \sum_{x\in D} x \\
S^{(2)} &=& \sum_{x\in D} x^2
\end{eqnarray}

Marginal likelihood:
\[P(D) =
(2\pi)^{-S^{(0)}/2}
\frac{\sqrt{\tau_0} \beta_0^{\alpha_0} \Gamma(\alpha_1)}
{\sqrt{\tau_1} \beta_1^{\alpha_1} \Gamma(\alpha_0)} \]
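Both the update and the marginal likelihood are short functions in NumPy (names mine). A useful cross-check is the chain rule of probability: the batch marginal likelihood must equal the product of one-point marginal likelihoods computed under sequentially updated hyperparameters.

```python
import numpy as np
from scipy.special import gammaln

def ng_update(xs, eta0, tau0, a0, b0):
    """Posterior hyperparameters (eta1, tau1, alpha1, beta1)."""
    xs = np.atleast_1d(np.asarray(xs, dtype=float))
    S0, S1, S2 = len(xs), xs.sum(), (xs ** 2).sum()
    tau1 = tau0 + S0
    eta1 = (eta0 * tau0 + S1) / tau1
    a1 = a0 + S0 / 2
    b1 = b0 + 0.5 * (S2 + eta0 ** 2 * tau0 - eta1 ** 2 * tau1)
    return eta1, tau1, a1, b1

def log_marginal(xs, eta0, tau0, a0, b0):
    """log P(D), the closed form above."""
    xs = np.atleast_1d(np.asarray(xs, dtype=float))
    eta1, tau1, a1, b1 = ng_update(xs, eta0, tau0, a0, b0)
    return (-0.5 * len(xs) * np.log(2 * np.pi)
            + 0.5 * (np.log(tau0) - np.log(tau1))
            + a0 * np.log(b0) - a1 * np.log(b1)
            + gammaln(a1) - gammaln(a0))

# Chain rule check: P(D) = P(x1) P(x2 | x1) ..., each factor a one-point
# marginal likelihood under the sequentially updated prior.
xs = np.array([0.3, -1.0, 2.2, 0.7, -0.4])
hyper = (0.0, 1.0, 2.0, 2.0)                  # eta0, tau0, alpha0, beta0
batch = log_marginal(xs, *hyper)
seq, h = 0.0, hyper
for xi in xs:
    seq += log_marginal(xi, *h)
    h = ng_update(xi, *h)
# batch and seq agree to floating-point precision
```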

Conjugate Inference: Gaussian Likelihood with Known Variance

Domain: Here we consider only univariate normal distributions.
\[x\in\mathbb{R}\]

Parameters: \[\Theta = \{\mu\}\] where \(\mu\) is the mean of the Gaussian likelihood.

Likelihood: \[P(x|\Theta) = N(x|\mu,v)\] Note that \(v\) is the known variance of the Gaussian likelihood.

Prior: A normal distribution.
\[P(\Theta) = N(\mu|\mu_0,\sigma_0^2)\]

Posterior:
\[P(\Theta|D) = N(\mu|\mu_1,\sigma_1^2)\]
with
\begin{eqnarray}
\mu_1 &=& \sigma_1^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{S^{(1)}}{v} \right) \\
\sigma_1^2 &=& \left( \frac{1}{\sigma_0^2} + \frac{S^{(0)}}{v} \right)^{-1}
\end{eqnarray}
where
\begin{eqnarray}
S^{(0)} &=& |D| \\
S^{(1)} &=& \sum_{x\in D} x \\
S^{(2)} &=& \sum_{x\in D} x^2
\end{eqnarray}

Marginal likelihood:
\[P(D) =
\left(\frac{1}{\sqrt{2\pi v}}\right)^{S^{(0)}}
\frac{\sigma_1}{\sigma_0}
\exp\left( -\frac{1}{2}\left( \frac{\mu_0^2}{\sigma_0^2} - \frac{\mu_1^2}{\sigma_1^2} + \frac{S^{(2)}}{v} \right) \right)\]
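A sketch of this simpler case (function names mine). Here there is an extra sanity check: for a single observation the marginal likelihood must collapse to the predictive \(N(x|\mu_0, \sigma_0^2 + v)\).

```python
import numpy as np
from scipy.stats import norm

def known_var_update(xs, mu0, var0, v):
    """Posterior (mu1, sigma1^2) over the mean, with known variance v."""
    xs = np.atleast_1d(np.asarray(xs, dtype=float))
    var1 = 1.0 / (1.0 / var0 + len(xs) / v)
    mu1 = var1 * (mu0 / var0 + xs.sum() / v)
    return mu1, var1

def log_marginal_known_var(xs, mu0, var0, v):
    """log P(D), the closed form above."""
    xs = np.atleast_1d(np.asarray(xs, dtype=float))
    mu1, var1 = known_var_update(xs, mu0, var0, v)
    return (-0.5 * len(xs) * np.log(2 * np.pi * v)
            + 0.5 * (np.log(var1) - np.log(var0))
            - 0.5 * (mu0 ** 2 / var0 - mu1 ** 2 / var1 + (xs ** 2).sum() / v))

# Sanity check against the one-point predictive N(x | mu0, var0 + v)
mu0, var0, v = 0.5, 2.0, 1.5
lp = log_marginal_known_var(1.2, mu0, var0, v)
ref = norm.logpdf(1.2, loc=mu0, scale=np.sqrt(var0 + v))
# lp and ref agree to floating-point precision
```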