GitHub repository

We have set up a GitHub repository where all the statistical problems presented, along with code to solve them, will be available. The slides and brief explanations will also be uploaded. Check out the MCMC Coffee repository here.

Outcomes from the coffee sessions


Presenting the MCMC Coffee | J. Lillo-Box & A. Plunkett
Parametric bootstrapping to get uncertainties | Elyar Sedaghati

Abstract. This is a follow-up on the Python Coffee from May 5, now exploring parametric methods for fitting data. It is paramount to quote realistic and conservative estimates of the precision of the parameters derived for the model under investigation. I will, very briefly, discuss a few different ways of obtaining these parameter uncertainties, and continue the discussion of applying the bootstrapping technique for a robust determination of those errors. While I'll be using Python, the statistical methods, and the difference between parametric and non-parametric methods, matter for data interpretation regardless of which language you prefer.
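As a flavour of the technique (not the speaker's actual code), here is a minimal parametric-bootstrap sketch on made-up data: fit a model, generate many synthetic datasets from the best fit plus the assumed noise, refit each, and read the parameter uncertainties off the spread of the refitted values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: y = 2x + 1 with Gaussian noise of known sigma
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma, x.size)

# Best-fit model parameters (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)

# Parametric bootstrap: simulate datasets from the fitted model and refit
n_boot = 2000
boot_params = np.empty((n_boot, 2))
for i in range(n_boot):
    y_sim = slope * x + intercept + rng.normal(0.0, sigma, x.size)
    boot_params[i] = np.polyfit(x, y_sim, 1)

# Parameter uncertainties = scatter of the bootstrap fits
slope_err, intercept_err = boot_params.std(axis=0)
```

The "parametric" part is that synthetic datasets are drawn from the fitted model with an assumed noise distribution, rather than by resampling the observed points themselves.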


How to fit a (straight) line | A. Mérand

Abstract. Please come, all of you, to what sounds like the most terrible subject, so boring and so basic that nobody will want to come! I fit straight lines for a living: I am interested in estimating zero-points of the (linear) Period-Luminosity relations of Cepheid pulsating stars. Still, I did not even understand what I will present until recently. I *promise* most of the attendants (if not all) will learn something! If you think you know everything about (straight) line fitting, come at 11:00 for the MCMC coffee and let's discuss this topic together.
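For reference (this is a generic sketch, not the speaker's material), the textbook starting point is weighted least squares with per-point uncertainties, which also yields the parameter covariance matrix for free:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data with per-point uncertainties
x = np.linspace(0.0, 5.0, 30)
yerr = np.full(x.size, 0.3)
y = 1.5 * x - 0.5 + rng.normal(0.0, yerr)

# Weighted least squares via the design matrix A = [x, 1]
A = np.vander(x, 2)                   # columns: x, 1
C_inv = np.diag(1.0 / yerr**2)        # inverse data covariance
cov = np.linalg.inv(A.T @ C_inv @ A)  # parameter covariance matrix
slope, intercept = cov @ (A.T @ C_inv @ y)

# 1-sigma uncertainties from the diagonal of the covariance
slope_err, intercept_err = np.sqrt(np.diag(cov))
```

The off-diagonal term of `cov` encodes the slope-intercept correlation, which is often the "something to learn" in straight-line fitting.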


Descriptive statistics and distribution functions | J. Lillo-Box

Abstract. In this first session of the first proper season of the MCMC Coffee, I will give a very brief introduction to the basic tools that we have to describe our data (mean, median, standard deviation, skewness, kurtosis, etc.). I will review the different shapes in which our data can be distributed, as well as their properties and when they can be applied.
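The quantities listed above are one-liners in Python; a minimal sketch on made-up data, assuming NumPy and SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(10.0, 2.0, 10000)  # made-up Gaussian sample

mean = np.mean(data)
median = np.median(data)
std = np.std(data, ddof=1)       # sample standard deviation
skewness = stats.skew(data)      # ~0 for a symmetric distribution
kurt = stats.kurtosis(data)      # excess kurtosis: ~0 for a Gaussian
```

For a Gaussian sample like this one, skewness and excess kurtosis should both be close to zero; strong departures flag asymmetric or heavy-tailed data.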


Central limit theorem and correlation coefficients | Daniel Asmus

Abstract. In this session, we will continue our adventure deeper into the realms of statistics and explore why/when repeating measurements can be a good idea, i.e. leading to more accurate results, and how we can quantitatively test whether two measured quantities correlate. On our journey, we will encounter the central limit theorem, Kendall's tau, Pearson's and Spearman's correlation coefficients, and possibly some other treasures. But beware of the pirates!
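All three correlation measures mentioned above are available in SciPy; a minimal sketch on made-up correlated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two made-up quantities with a genuine linear relationship plus scatter
x = rng.normal(size=500)
y = 0.8 * x + rng.normal(scale=0.5, size=500)

r_pearson, p_pearson = stats.pearsonr(x, y)       # linear correlation
rho_spearman, p_spearman = stats.spearmanr(x, y)  # rank correlation
tau_kendall, p_kendall = stats.kendalltau(x, y)   # rank concordance
```

Pearson's r measures linear correlation, while Spearman's rho and Kendall's tau work on ranks and are therefore robust to monotonic but non-linear relationships and to outliers.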


Maximum Likelihood Estimations I: synthetic colour-magnitude diagram fitting | Bruno Dias

Abstract. Maximum Likelihood Estimation (MLE) is a common special case of the frequentist philosophy. At the same time, Bayesian philosophy is tied to the likelihood. Therefore, the likelihood-based approach deserves special attention in the MCMC coffee. In this session we will see the application of MLE to the case of synthetic colour-magnitude diagram fitting. This is becoming a very common technique to derive the age, metallicity, distance, and reddening of star clusters. In the past, people assumed the metallicity, distance, and reddening from other sources and fitted an isochrone by eye to constrain the age of a star cluster. During the case study, some theoretical concepts will be briefly revised.
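To illustrate the core idea on something far simpler than a colour-magnitude diagram, here is a hedged sketch of MLE: write down the (negative) log-likelihood of made-up Gaussian data and maximise it numerically.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
data = rng.normal(5.0, 1.5, 2000)  # made-up data with unknown mu, sigma

def neg_log_likelihood(params, x):
    """Negative Gaussian log-likelihood (constant terms dropped)."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + x.size * np.log(sigma)

# Maximise the likelihood = minimise the negative log-likelihood
result = minimize(neg_log_likelihood, x0=[0.0, 1.0], args=(data,),
                  method="Nelder-Mead")
mu_mle, sigma_mle = result.x
```

Synthetic CMD fitting follows the same recipe, just with a far more expensive likelihood: the model predicts star counts across the diagram, and the fit maximises the likelihood of the observed counts.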


Maximum Likelihood Estimations II: hypotheses, distributions, modeling and histograms | Adele Plunkett

Abstract. We will continue our discussion of Maximum Likelihood Estimation (MLE). In this session, I'll explain hypothesis testing and the comparison of distributions. For example, when analyzing your observations, how confident are you that the source you detected is indeed a source, and not background noise? In statistics-speak, can you reject the "null hypothesis"? In another case, how do you know when two measurements/observations indeed imply a difference in the measured quantity? I will explain parametric versus nonparametric tests of sample distributions. Finally, we'll explore how histograms represent a general feature of nonparametric models, and we will learn how to get a histogram "right" by choosing the number and size of bins. Most generally, astronomers should understand how to effectively measure and present their statistical analyses with confidence.
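A minimal sketch of both themes on made-up data: a two-sample Kolmogorov-Smirnov test for distribution comparison, and automatic histogram binning via the Freedman-Diaconis rule.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample_a = rng.normal(0.0, 1.0, 500)
sample_b = rng.normal(0.5, 1.0, 500)  # shifted distribution

# Two-sample KS test: can we reject the null hypothesis that
# both samples are drawn from the same distribution?
ks_stat, p_value = stats.ks_2samp(sample_a, sample_b)

# Histogram with the Freedman-Diaconis rule choosing the bin width
counts, edges = np.histogram(sample_a, bins="fd")
```

A small `p_value` means the null hypothesis (same parent distribution) can be rejected; the `bins="fd"` rule picks a bin width from the interquartile range and sample size, one reasonable answer to "how many bins?".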


Bayesian Inference 1 (Bayes theorem, priors) | J. Lillo-Box & A. Plunkett

Abstract. After learning the principles of Maximum Likelihood Estimation, in the next two MCMC sessions we will learn the very basics of Bayesian analysis. In this first part, we will dissect Bayes' theorem and show some examples that illustrate the power of this statistical approach.
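Bayes' theorem (posterior ∝ likelihood × prior) can be demonstrated in a few lines on a made-up coin-flip problem, evaluating the posterior on a grid:

```python
import numpy as np

# Grid over possible values of a coin's heads probability
theta = np.linspace(0.0, 1.0, 1001)
prior = np.ones_like(theta)  # flat (uniform) prior

# Made-up data: 7 heads in 10 flips; binomial likelihood (up to a constant)
heads, flips = 7, 10
likelihood = theta**heads * (1.0 - theta)**(flips - heads)

# Bayes' theorem: posterior proportional to likelihood times prior
posterior = likelihood * prior
posterior /= posterior.sum() * (theta[1] - theta[0])  # normalise on the grid

theta_map = theta[np.argmax(posterior)]  # maximum a posteriori value
```

With a flat prior the posterior peaks at the maximum-likelihood value (0.7 here); swapping in an informative prior shifts and narrows the posterior, which is exactly the point of the session.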


Bayesian Inference 2 (model selection) | Blake Pantoja

Abstract. -


Identification of planetary signals with posterior samplings | Mikko Tuomi

Abstract. Various sampling techniques are applied when analysing radial velocity time series in the search for planetary signals. We focus on an adaptive Metropolis sampling algorithm with delayed rejection to identify the maxima of a multimodal posterior probability distribution that potentially correspond to such planetary signals. The greatest advantage is that any noise models accounting for stellar activity and instrument noise can be used with equal(*) ease, and their free parameters can be estimated simultaneously with the orbital parameters, avoiding analyses of data residuals that are known to lead to biased results. Although this comes with a computational cost that increases linearly with the number of data points and steeply with the number of free parameters, Moore's law keeps enabling faster and faster computational facilities.
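The adaptive, delayed-rejection variant is beyond a short sketch, but the plain Metropolis kernel at its core fits in a few lines. Here is a hedged toy version sampling a made-up bimodal posterior, which is the regime where multiple planetary-signal modes appear:

```python
import numpy as np

rng = np.random.default_rng(5)

def log_posterior(theta):
    # Made-up multimodal target: equal-weight mixture of two Gaussians
    return np.logaddexp(-0.5 * (theta + 3.0) ** 2,
                        -0.5 * (theta - 3.0) ** 2)

n_steps, step_size = 50_000, 2.5
chain = np.empty(n_steps)
theta = 0.0
log_p = log_posterior(theta)
for i in range(n_steps):
    proposal = theta + rng.normal(0.0, step_size)
    log_p_new = log_posterior(proposal)
    # Metropolis acceptance rule: accept with probability min(1, p_new/p_old)
    if np.log(rng.uniform()) < log_p_new - log_p:
        theta, log_p = proposal, log_p_new
    chain[i] = theta
```

The adaptive variant tunes `step_size` (and the proposal covariance) on the fly, and delayed rejection tries a second, smaller proposal after a rejection; both improve mixing between well-separated modes like the two here.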


Time Series Analysis - searching and evaluating periodic signals in your data | Liz Bartlett

Abstract. In this MCMC I'm going to talk about a popular method for searching for periodic signals in time-domain astrophysical data: the Lomb-Scargle periodogram. I will show examples from different wavelength regimes and areas of astrophysics, and discuss the strengths and weaknesses of this tool, as well as some of the common pitfalls of its use.
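A minimal sketch of the tool on made-up, unevenly sampled data, using SciPy's implementation (which works in angular frequency):

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(6)

# Made-up unevenly sampled sinusoid with period 2.5 time units
t = np.sort(rng.uniform(0.0, 50.0, 200))
true_freq = 2.0 * np.pi / 2.5  # angular frequency
y = np.sin(true_freq * t) + 0.3 * rng.normal(size=t.size)

# Lomb-Scargle periodogram over a grid of trial angular frequencies
freqs = np.linspace(0.1, 10.0, 5000)
power = lombscargle(t, y - y.mean(), freqs)

best_freq = freqs[np.argmax(power)]  # peak should sit near true_freq
```

Two of the classic pitfalls are visible even in this toy: the frequency grid must be fine enough not to miss a narrow peak, and aliases of the true frequency can produce secondary peaks that mislead a naive argmax.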


How to statistically distinguish several distributions? | Marion Villenave

Abstract. I will present, very briefly, a non-parametric method that tests for differences between several independent groups. The Kruskal-Wallis statistical test is based on the ranks of the data from the different groups. I used it in the context of protoplanetary disk evolution to test whether star-forming regions of different ages can be statistically distinguished. This test has the advantage of being able to compare multiple distributions all at once.
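The test is a one-liner in SciPy; a minimal sketch on made-up groups standing in for, say, disk properties in regions of different ages:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Three made-up groups: the first two share a distribution,
# the third is shifted
group_1 = rng.normal(1.0, 0.5, 80)
group_2 = rng.normal(1.0, 0.5, 80)
group_3 = rng.normal(1.6, 0.5, 80)

# Kruskal-Wallis H-test on the ranks of all groups at once
h_stat, p_value = stats.kruskal(group_1, group_2, group_3)
```

A small `p_value` rejects the null hypothesis that all groups share the same median; because the test works on ranks, no Gaussian assumption is needed, which is the "non-parametric" advantage the abstract refers to.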


Correlated noise analysis with Gaussian Processes | Elyar Sedaghati

Abstract. As the precision of observations improves with better instrumentation, our data are becoming more and more dominated by correlated, or red, noise. Incorrect treatment of its presence will lead to biases in both the accuracy and the precision of the derived parameters. Increasingly, the preferred method of treatment is Gaussian Processes, or GPs. This is because they provide a stochastic, non-parametric approach to the treatment of systematic noise, and have been shown to yield more realistic estimates of parameter uncertainties as well as more accurate solutions. Using some dummy data, I will show how ignoring red noise leads to errors and present some basic ideas behind implementing GPs.
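A hedged, NumPy-only sketch of the two GP basics (the talk itself may use a dedicated library): drawing correlated "red" noise from a squared-exponential kernel, and evaluating the GP log-likelihood of that realisation via a Cholesky factorisation.

```python
import numpy as np

rng = np.random.default_rng(8)
t = np.linspace(0.0, 10.0, 100)

def rbf_kernel(t1, t2, amp, scale):
    """Squared-exponential (RBF) covariance kernel."""
    return amp**2 * np.exp(-0.5 * (t1[:, None] - t2[None, :]) ** 2
                           / scale**2)

# Covariance matrix, with a small jitter term for numerical stability
K = rbf_kernel(t, t, amp=1.0, scale=1.5) + 1e-6 * np.eye(t.size)

# Draw one realisation of correlated ("red") noise from the GP prior
red_noise = rng.multivariate_normal(np.zeros(t.size), K)

# GP log-likelihood of the realisation under the same kernel
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, red_noise))
log_like = (-0.5 * red_noise @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * t.size * np.log(2.0 * np.pi))
```

In a real analysis the kernel hyperparameters (`amp`, `scale`) are fitted alongside the physical model parameters by maximising or sampling this log-likelihood, which is how GPs propagate the red noise into realistic parameter uncertainties.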

