{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Gaussian Process class for non-parametric fitting of systematics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This module was originally developed to fit transit light curves using Gaussian Processes (GPs), modelling the systematics as a stochastic process as described in Gibson et al. (2012), but it is also useful as a general tool for fitting datasets with GPs using arbitrary kernel and mean functions. It was originally part of my general Infer module, but new features have regularly been added since, some of which are described in more recent papers (see e.g. Gibson et al. 2013a, 2013b and Gibson 2014 for details). Please consider citing these if you make use of this code."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# To install the GeePea class, run the following commands in a terminal:\n",
    "#\n",
    "#   $ git clone https://github.com/nealegibson/GeePea\n",
    "#   $ cd GeePea/\n",
    "#   $ python setup.py build\n",
    "#   $ python setup.py install"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## GPs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A Gaussian Process (GP) is a non-parametric method used extensively for regression and classification problems in the machine learning community. For an introduction to GPs, please see the papers mentioned above; for a textbook introduction, see *Gaussian Processes for Machine Learning* (Rasmussen & Williams). For a more general introduction to Bayesian inference, including a relatively gentle introduction to GPs, I recommend *Pattern Recognition and Machine Learning* (Bishop). I also found this tutorial by M. Ebden incredibly useful when learning about GPs: http://www.robots.ox.ac.uk/~mebden/reports/GPtutorial.pdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Started"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First import the necessary modules:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import GeePea\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's create some data. In this case it has a white noise component and a sinusoidal component, which can be interpreted as red (correlated) noise in time-series data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "x = np.linspace(0,1,50)\n",
    "y = np.sin(2*np.pi*x) + np.random.normal(0,0.1,x.size) # sinusoidal systematic ('red') noise + white noise with standard deviation 0.1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Plot the data to see what we have created:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "plt.plot(x,y,'o')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also need to define the initial hyperparameters of the GP. By default, the GP has a zero mean function and uses the squared exponential kernel. For a 1D input, this kernel takes three parameters: a height scale, a length scale, and a white-noise term. More generally, the input does not have to be a single variable: one could supply an array of values for a number of optical state parameters that may be the cause of the red noise, and the GP will use them as inputs when calculating the covariance matrix."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "p = [1,1,0.1] # hyperparameters of the squared exponential kernel: [height scale, length scale, white noise]"
   ]
  },
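  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the role of the three hyperparameters concrete, here is a minimal NumPy sketch of a squared exponential covariance matrix with a white-noise term. It assumes the parameterisation `k(x, x') = h^2 exp(-(x - x')^2 / (2 l^2))`, with `sigma^2` added to the diagonal, where `p = [h, l, sigma]`. GeePea's own kernel functions may scale these parameters differently, so check its source before comparing numbers; the helper `sq_exp_cov` is hypothetical and not part of GeePea.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def sq_exp_cov(x1, x2, p, add_white_noise=False):\n",
    "    # Hypothetical helper (not part of GeePea): squared exponential covariance,\n",
    "    # assuming p = [height scale h, length scale l, white noise sigma]\n",
    "    h, l, sigma = p\n",
    "    x1 = np.asarray(x1, dtype=float)\n",
    "    x2 = np.asarray(x2, dtype=float)\n",
    "    d2 = (x1[:, None] - x2[None, :]) ** 2  # squared distances between all pairs of inputs\n",
    "    K = h ** 2 * np.exp(-d2 / (2.0 * l ** 2))\n",
    "    if add_white_noise:\n",
    "        # white noise is uncorrelated, so it only adds to the diagonal;\n",
    "        # this assumes x1 and x2 are the same set of inputs\n",
    "        K = K + sigma ** 2 * np.eye(x1.size)\n",
    "    return K\n",
    "\n",
    "x = np.linspace(0, 1, 50)\n",
    "K = sq_exp_cov(x, x, [1, 1, 0.1], add_white_noise=True)  # 50 x 50 covariance matrix\n",
    "```\n",
    "\n",
    "Under this parameterisation, the height scale sets the amplitude of the correlated signal, the length scale sets how quickly the correlation between two points decays with their separation in x, and the white-noise term only appears on the diagonal because it is uncorrelated between points."
   ]
  },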
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are ready to define our simple GP as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "gp = GeePea.GP(x,y,p)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can now plot our data along with the GP regression. `plot()` is a method of the GP class; please look at the GP class source files for details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "gp.plot()\n",
    "plt.show()"
   ]
  },
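  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In standard GP regression, the fitted curve is the predictive mean and the uncertainty band comes from the predictive variance; whether `gp.plot()` draws exactly these quantities should be checked in the GeePea source. The sketch below computes both from the textbook equations (e.g. Rasmussen & Williams, chapter 2) using a Cholesky factorisation. It assumes a zero mean function and a squared exponential kernel `h^2 exp(-(x - x')^2 / (2 l^2))` with `sigma^2` added to the diagonal, where `p = [h, l, sigma]`; the helpers `se_kernel` and `gp_predict` are hypothetical illustrations, not GeePea's implementation.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def se_kernel(a, b, h, l):\n",
    "    # squared exponential covariance between 1D input arrays a and b\n",
    "    return h ** 2 * np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * l ** 2))\n",
    "\n",
    "def gp_predict(x, y, x_star, p):\n",
    "    # Hypothetical helper: zero-mean GP predictive mean and variance at x_star,\n",
    "    # assuming p = [height scale h, length scale l, white noise sigma]\n",
    "    h, l, sigma = p\n",
    "    K = se_kernel(x, x, h, l) + sigma ** 2 * np.eye(x.size)  # training covariance incl. white noise\n",
    "    L = np.linalg.cholesky(K)                                # K = L L^T\n",
    "    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))      # alpha = K^-1 y\n",
    "    K_s = se_kernel(x_star, x, h, l)                         # cross-covariance, shape (n_star, n)\n",
    "    mean = np.dot(K_s, alpha)\n",
    "    v = np.linalg.solve(L, K_s.T)\n",
    "    var = h ** 2 - np.sum(v ** 2, axis=0)  # variance of the underlying (noise-free) function\n",
    "    return mean, var\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "x = np.linspace(0, 1, 50)\n",
    "y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)\n",
    "x_star = np.linspace(0, 1, 200)\n",
    "mean, var = gp_predict(x, y, x_star, [1, 1, 0.1])\n",
    "```\n",
    "\n",
    "A band of `mean +/- 2 * np.sqrt(var)` is a common way to display the uncertainty; adding `sigma^2` to `var` would instead give the uncertainty on a new noisy observation rather than on the underlying function."
   ]
  },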
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A useful exercise is to explore how the GP behaves with different hyperparameters: try changing the height scale and length scale to see how the fit changes. In general, however, we should not pick the hyperparameters arbitrarily. The simplest thing we can do is to optimise the GP likelihood with respect to the hyperparameters:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Running Nelder-Mead simplex algorithm... \n",
      "Optimization terminated successfully.\n",
      "         Current function value: -26.617275\n",
      "         Iterations: 90\n",
      "         Function evaluations: 164\n",
      "(Time: 0.107426 secs)\n",
      "Optimised parameters:  [ 1.24923914  0.32448097  0.10339139] \n",
      "\n"
     ]
    }
   ],
   "source": [
    "gp.optimise()"
   ]
  },
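  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output above (a Nelder-Mead simplex run reporting a negative 'Current function value') suggests that `gp.optimise()` minimises the negative log of the GP likelihood. As a hedged illustration rather than GeePea's code, the sketch below writes out the standard log marginal likelihood of a zero-mean GP, `log p(y) = -0.5 y^T K^-1 y - 0.5 log|K| - (n/2) log(2 pi)`, for a squared exponential kernel plus white noise, and minimises its negative with SciPy's Nelder-Mead simplex (`scipy.optimize.fmin`). The kernel parameterisation and the helper name `neg_log_like` are assumptions; because of the random noise and possible differences in parameterisation, the optimised values will not exactly match the output above.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from scipy.optimize import fmin\n",
    "\n",
    "def neg_log_like(p, x, y):\n",
    "    # Hypothetical helper: negative log marginal likelihood of a zero-mean GP,\n",
    "    # assuming p = [height scale h, length scale l, white noise sigma]\n",
    "    h, l, sigma = p\n",
    "    K = h ** 2 * np.exp(-(x[:, None] - x[None, :]) ** 2 / (2.0 * l ** 2))\n",
    "    K = K + sigma ** 2 * np.eye(x.size)\n",
    "    L = np.linalg.cholesky(K)\n",
    "    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^-1 y\n",
    "    log_det = 2.0 * np.sum(np.log(np.diag(L)))           # log|K| from the Cholesky factor\n",
    "    return 0.5 * np.dot(y, alpha) + 0.5 * log_det + 0.5 * x.size * np.log(2 * np.pi)\n",
    "\n",
    "rng = np.random.RandomState(0)\n",
    "x = np.linspace(0, 1, 50)\n",
    "y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)\n",
    "\n",
    "# Nelder-Mead simplex, starting from the initial guess p = [1, 1, 0.1] used earlier\n",
    "p_opt = fmin(neg_log_like, [1, 1, 0.1], args=(x, y), disp=False)\n",
    "```\n",
    "\n",
    "Nelder-Mead needs only function values, not gradients, which keeps the sketch short; gradient-based optimisers are a common alternative when the likelihood gradient is available."
   ]
  },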
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "gp.plot()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "And we have our first GP fit to the data! This notebook is a near copy of an exercise from Neale Gibson's site explaining his GP class in detail. For a complete description and more examples, please visit his site at: http://eso.org/~ngibson/GeePea/index.html"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}