Posts

Correlation Doesn’t Equal Causation: Statistics #8

Correlation is a measure of how two variables move together. Here, we’ll also introduce some useful statistical terms you’ve probably heard of, like regression coefficient, correlation coefficient (r), and r^2. But first, we’ll need to introduce a useful way to represent bivariate continuous data: the scatter plot. The scatter plot has been called “the most useful invention in the history of statistical graphics,” but that doesn’t necessarily mean it can tell us everything. Just because two data sets move together doesn’t necessarily mean one CAUSES the other. This gives us one of the most important tenets of statistics: correlation does not imply causation. Here we discuss relationships. No, not why you and your bestie are platonic soulmates, or why your cat just doesn’t seem to like you; we’re talking about data relationships, like how you can use one variable to predict another. Like if you can predict whether people who write in all capital letters are ...
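As a rough illustration of the correlation coefficient (r) and r^2 mentioned in this excerpt, here is a minimal Python sketch of the standard Pearson formula. The function name `pearson_r` and the `hours`/`scores` numbers are made up for the example and do not come from the post.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (the unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: hours studied vs. exam score (invented for illustration).
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]

r = pearson_r(hours, scores)
print("r =", round(r, 3), "r^2 =", round(r ** 2, 3))
```

A value of r near 1 or -1 only says the two lists move together in a straight-line way; it says nothing about whether one causes the other.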

The Shape of Data: Distributions: Statistics #7

When collecting data to make observations about the world, it usually just isn't possible to collect all the data. So instead of asking every single person about student loan debt, for instance, we take a sample of the population, and then use the shape of our samples to make inferences about the true underlying distribution of our data. It turns out we can learn a lot about how something occurs, even if we don't know the underlying process that causes it. Here, we’ll also introduce the normal (or bell) curve and talk about how we can learn some really useful things from a sample's shape, like if an exam was particularly difficult, how often Old Faithful erupts, or if there are two types of runners that participate in marathons. Data visualization and different kinds of frequency plots, like dot plots and histograms, tell us how frequently things occur in data we actually have. But so far in this series, the data we have talked about usually isn’t all the data that exists....
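To make the idea of sampling concrete, here is a small Python sketch that draws a sample from a simulated population and bins it to see its shape. Everything in it is an assumption for illustration: the population is generated from a normal distribution with invented parameters, and `bin_counts` is a hypothetical helper, not something from the post.

```python
import random
import statistics
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated "population" of loan debts (invented numbers, normal-shaped).
population = [random.gauss(30000, 8000) for _ in range(100_000)]

# We usually can't ask everyone, so we take a sample instead.
sample = random.sample(population, 500)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

def bin_counts(values, bin_width):
    """Count how many values fall in each bin of the given width."""
    return Counter(int(v // bin_width) * bin_width for v in values)

counts = bin_counts(sample, 5000)

print("population mean:", round(pop_mean), "sample mean:", round(sample_mean))
# Crude text histogram: one '#' per 5 sampled values in the bin.
for start in sorted(counts):
    print(f"{start:>6} | {'#' * (counts[start] // 5)}")
```

With a reasonably sized random sample, the sample mean and the rough bell shape of the bins tend to land close to those of the full population, which is what lets us make inferences from the sample.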

Plots, Outliers: Data Visualization Part 2: Statistics #6

Dot plot: A dot plot takes a histogram and replaces the solid bars, which use their height to show the frequency, with dots. There’s one dot for each data point contained in the bar, so we can just count the number of dots to find out how many there are. The dot plot for our olive oil data looks like this, unsurprisingly similar to the histogram for that data. This gives us a nice way to explore the general shape of our data, but we still lose information about the individual data values, just like with the histogram. Occasionally we WANT that extra information.

Stem and Leaf plot: A stem and leaf plot is a cousin of the dot plot. It also gives us information about data and their frequencies by stacking objects on top of each other. However, stem and leaf plots use values from the raw data instead of dots. So, we’ll turn our olive oil dot plot into a stem and leaf plot. And no, ...
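A stem and leaf plot as described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `stem_and_leaf` function is a hypothetical helper, it uses the tens digit as the stem and the ones digit as the leaf, and the `data` values are invented (they are not the olive oil data from the post).

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (tens) and list their leaves (ones)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 23 -> stem 2, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

# Invented example data, not taken from the post.
data = [12, 15, 21, 23, 23, 27, 30, 34, 38, 41]

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# Prints:
# 1 | 2 5
# 2 | 1 3 3 7
# 3 | 0 4 8
# 4 | 1
```

Each row is as long as the matching column of dots would be tall, so the shape is preserved, but the original values can still be read back (stem 2 with leaf 7 is 27).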