
Behavioural data scientist. PhD in Evolutionary Biology + Psychology.

Analysing the clean data

Before we get started: If you’ve been following from the start, you’ll have scraped your own MMA data. Because you’ll have scraped it on a different day to me, you’ll have slightly different data: some fighters will have won fights and others will have lost. Some new fighters will be added to the table and some…

I’ve made a dashboard in Tableau, where I explore publicly available big data on the happiness of Europeans. The dashboard can be found here:

It’s interactive, so if you click a country, it changes plots and tables to zoom in on that country. A screenshot of it is below to give you a feel for it:

I posted it here because I sort of use Medium as a portfolio for my stuff, so here’s an easy way to find it!

Let me know what you think!

Cleaning web scraped data in R

Cleaning data is a huge part of what data scientists do on a daily basis, so it’s good to practice and look for ways to do it efficiently. If you can think of any ways to improve on my code, I’d love to hear them!

Let’s get started

Read in your data using the code below. Note, your file…

A complete analytics project in R

Though I posted the code for the paper on my GitHub, I thought I’d make a multi-part tutorial on how to scrape, clean and analyse the data, so you can see the conclusions for yourself. Part 1 (here) will deal with scraping…

An unappreciated source of insight in our data

Data people love averages. They love means, they love medians. Which countries are the happiest on average? What is the average life expectancy of someone with a rare disease? What is the median income in a country? What is the average price of an engagement ring?

Means and medians are great because they tell us what is generally going on with our data. If your data is normally distributed (and often if it’s not) then an average will tell you most of what you need to know. As a result, most of our statistical tests, like the t-test, ANOVA, etc., are…

A data analysis walkthrough in R

I recently published a paper on arm length and fighting success in mixed martial arts (MMA) fighters. I found that fighters with greater arm length have a very small advantage in fighting, i.e. they win more. One issue I ran into was that I couldn’t rule out the hypothesis that leg length was driving the effect, because leg length data is not routinely collected for MMA fighters. We’d expect fighters with longer arms to have longer legs too, which would also presumably provide an advantage. …

When pre-registering your study, there are many things to consider: sample size, what stats you’ll run, etc. One factor that receives little attention is what you’ll do with outliers. Outliers are a major source of researcher degrees of freedom. They can have large impacts on our results, but it can be highly subjective which data points to include or exclude. So it’s very important to include an outlier plan in your pre-reg.

But where to start? I’ve got you covered! I sat down and tried to list all the possible ways outliers could creep into social science studies. This document…

TL;DR: Sometimes you have so much data you can waste hours exploring without answering the important questions. I share 5 tips on how to analyse large, complex datasets productively by constraining yourself.

The joys and stresses of too much data. Photo is me!

Researchers these days often have access to a lot of data. You might be analysing the European Social Survey, with hundreds of variables and tens of thousands of respondents. Or perhaps you collected tons of data yourself: you might do a study where participants do a huge battery of tests, leaving you with heaps of data per person.

How do you go about analysing this beautiful mess?

Well, I’ll tell you what not to do:


For students, conferences are an important part of academia: an opportunity to meet the names on the papers in the flesh, get your work out there, foster collaborations, and even make friends. But they can also be nerve-wracking: you barely know anyone, and few people know you. You don’t want to make a fool of yourself, especially in front of people who might hire you in a few years. These issues are magnified when you have to go to a conference on your own, without your supervisor or labmates.

I’ve never been to a conference this formal, but it was the best stock photo I could find before I gave up looking

I go to every conference on my own. My supervisor and…

So I’ve seen a few articles arguing this point: that sex differences increase in more gender-equal countries. This has been found for personality, school performance, occupation choice and many other individual differences. Even as someone who studies evolutionary and biological influences on behaviour, I find this counter-intuitive. I’d expect that in a culture of gender equality, those who don’t want to follow a sex-typical life would be free not to, blurring the lines between the sexes. But that’s not what the data seem to suggest.

The Theoretical issue

Thomas Richardson
