[06/27/2024] Statistics in Psychological Research 6
Categorizing Variables
Variables can be classified as categorical (emotions), discrete (number of pets), or continuous (commute time in minutes).
Notice that discrete variables have no intermediate values (e.g., we cannot have 1.5 dogs), while continuous variables can have additional observations between any two values (e.g., commute time can be measured as 5.5 minutes).
Variables can also be classified as qualitative (also known as categorical) or quantitative (can be discrete or continuous).
Qualitative variables answer the question of “what kind?” (e.g., what kind of emotion, for the variable emotion).
Quantitative variables answer the questions of “how much?” or “how many?” (e.g., how many siblings, for the variable siblings).
A variable’s level of measurement can be classified as nominal (type of food), ordinal (size of pizza: small, medium, large, etc.), interval (temperature in degrees Fahrenheit), or ratio (number of questions missed on a quiz).
While interval-level and ratio-level measurements both have equally spaced increments, only ratio-level measurements include a meaningful 0. For the ratio measurement number of questions missed on a quiz, a 0 value would indicate no questions missed. A 0 value for the interval measurement temperature in degrees Fahrenheit, however, does not indicate an absence of temperature.
Variables can also be described based on their use in a research study.
In an experimental design, an independent variable, such as the type of study technique, is manipulated and a dependent variable, such as quiz score, is measured. The independent variable can be thought of as the grouping variable and the dependent variable can be thought of as the outcome variable.
In some research studies, there is no manipulated variable, and instead researchers want to analyze how one measured variable, known as a predictor variable, predicts another measured variable, known as an outcome variable. Notice that the terms dependent variable and outcome variable both refer to a measured outcome variable.
Key Vocabulary
Categorical variables (qualitative variables): Variables that have values that are types, kinds, or categories and that do not have any numeric properties.
Continuous variables: Variables with values that can always be further divided, and where it is at least theoretically possible to find an observation with a value that will fall between two other values, no matter how close they are.
Discrete variables: Variables represented by whole numbers and for which there are no intermediate values for two values that are adjacent on the scale.
Dependent variable: A variable that is measured for change or effect after the occurrence or variation of the independent variable in an experiment.
Independent variable: The variable in an experiment that is specifically manipulated to test for an effect on a dependent variable. In regression analysis, an independent variable is likely to be referred to as a predictor variable.
Interval: A scale of measurement where the values are ordered and evenly spaced along some underlying dimension, but there is not a true zero value.
Levels of measurement (scales of measurement): The four levels by which variables are measured: nominal, ordinal, interval, and ratio.
Nominal: A scale of measurement that has unordered categories, which are mutually exclusive.
Operationalize: Precisely defining a variable with specific measurable terms.
Ordinal: A scale of measurement where the values are ordered along some underlying dimension, but not spaced equally along that dimension.
Predictor variable: Used in regression analysis and other models, to measure the strength and direction of its association with an outcome. The term sometimes is used interchangeably with independent variable.
Qualitative variables (categorical variables): Variables that have values that are types, kinds, or categories and that do not have any numeric properties.
Quantitative variables: Variables that have values with numeric properties such that, at a minimum, differences among the values can be described in terms of “more” or “less.”
Ratio: A scale of measurement where the values are ordered, with evenly-spaced increments and a meaningful zero.
Variables: Characteristics of an entity, person, or object that can take on different categories, levels, or values.
Describe Data
The three measures of central tendency, the mean, median, and mode, each provide a way to express information about the middle of a distribution.
The mean is often referred to as the “average” and is the most commonly used measure of central tendency. However, when data are skewed or outliers are present, the mean is pulled in the direction of the tail of the distribution or the outlier, because it depends on the size of every value, and so it may not represent the middle of the data set. The mean can only be used with interval- or ratio-level variables.
The median is the midpoint of a set of ordered scores. Because it is not pulled toward outliers or the tail of skewed distributions, it is a good choice to describe the central tendency of data that are positively skewed (i.e., the tail extends toward the right side of the distribution) or negatively skewed (i.e., the tail extends toward the left side of the distribution). The median can be used with ordinal-, interval-, or ratio-level variables.
The mode is the most frequently occurring score in a set of data, but is used least often. An advantage of the mode is that it can be used for data with any level of measurement, and is the only measure of central tendency that can be used to summarize nominal-level measurements (e.g., you can describe the most frequent eye color in a group, but you could not calculate the mean or median eye color). In a distribution of scores, the mode will be seen as the peak of the distribution.
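As a quick illustration of the three measures, the following minimal Python sketch uses the standard-library statistics module on a small set of hypothetical scores (not course data):

    # Minimal sketch of the three measures of central tendency (hypothetical scores).
    import statistics

    scores = [2, 3, 3, 4, 5, 5, 5, 21]        # the 21 acts as an outlier

    print(statistics.mean(scores))            # 6.0, pulled toward the outlier
    print(statistics.median(scores))          # 4.5, the midpoint of the ordered scores
    print(statistics.mode(scores))            # 5, the most frequently occurring score

Notice how the outlier pulls the mean above the median, which is exactly the behavior described above.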
Variability, or dispersion, is used to express the spread in a set of scores, and is commonly expressed through the range, variance, and standard deviation.
The range is a measure of the difference between the minimum and maximum values in a data set and is a quick way to judge the overall spread of the data. It is heavily influenced by outliers, however, so it is less useful when outliers are present.
Variance is the average squared deviation from the mean for a set of values and provides a way to consider spread from the mean of a set of data. Interpretation of the variance is difficult, however, because it is in squared units.
Standard deviation indicates how narrowly or broadly the sample scores deviate from the mean. It is used more commonly than variance because the value of standard deviation is in the units of the variable, which makes it much easier to interpret.
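The following minimal Python sketch, again using the standard-library statistics module with hypothetical scores, shows the three measures of variability side by side:

    # Minimal sketch of range, variance, and standard deviation (hypothetical scores).
    import statistics

    scores = [4, 6, 7, 8, 10]

    value_range = max(scores) - min(scores)   # 10 - 4 = 6
    variance = statistics.pvariance(scores)   # average squared deviation from the mean (4.0)
    std_dev = statistics.pstdev(scores)       # square root of the variance (2.0), in the original units

    print(value_range, variance, std_dev)

Note that pvariance and pstdev treat the scores as a population; statistics.variance and statistics.stdev divide by n − 1 to give sample estimates instead.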
Key Vocabulary
Bimodal: A distribution that includes two peaks or modes.
Central tendency: A measure that describes the middle of a distribution. Most commonly measured with the mean, median, and mode.
Descriptive statistics: Statistics that summarize a set of data. For example, a measure of central tendency or variability.
Deviation score: The value obtained when you subtract the sample mean from an observation, that is, x − M.
Mean: A measure of the average value in a data set calculated by adding all the scores and dividing by the number of scores in the data set.
Median: A measure of the midpoint (middle value) of a data set when organized numerically.
Mode: A measure of the most frequently occurring value in a data set.
Multimodal: A distribution that includes multiple (usually more than 2) peaks or modes.
Negatively skewed: A distribution in which the tail on the left side of the distribution is longer than the tail on the right side.
Outcome variable: An effect that one wants to predict or explain in nonexperimental research, such as correlational or regression research. The term sometimes is used interchangeably with dependent variable.
Outlier: An observation judged to be so extreme that it may differ in some way from other observations in the sample.
Positively skewed: A distribution in which the tail on the right side of the distribution is longer than the tail on the left side.
Range: The difference between the minimum and maximum values in a data set; subtract the minimum from the maximum to calculate the range.
Standard deviation: A measure of the typical deviation from the mean. It is calculated by taking the square root of the variance.
Variability (Dispersion): The amount of variation, or difference, in a group or population.
Variance: A measure of the average squared deviations from the mean.
Visualize Data
Although numeric values, such as a measure of central tendency or variability, are useful in describing the data, it is often helpful to graph data so that patterns can be visualized.
A frequency distribution represents every observation in the data set and provides a comprehensive picture of the sample results for a variable. Bar graphs, histograms, and frequency polygons are examples of frequency distributions where a nominal-, ordinal-, interval-, or ratio-level variable is plotted on the x-axis and the frequency of that variable in a data set is plotted on the y-axis.
A bar graph is a frequency distribution for nominal- and ordinal-level variables. The values of the variable of interest are arranged on the x-axis (the horizontal line) and the frequency of occurrence of that value in the set of data on the y-axis (the vertical line). With nominal-level variables, the order of the categories on the x-axis is arbitrary. With ordinal-level variables, the order of the values on the x-axis follows the ranking of the values of the variable, but because the values are not separated by equally spaced intervals, the width and spacing of the bars are arbitrary. Because of the arbitrary order of the values on the x-axis of a bar graph, the particular shape of the distribution is not meaningful.
A histogram is a frequency distribution for interval- or ratio-level variables. For a histogram, the shape of the distribution is meaningful because the values on the x-axis are ordered and evenly spaced rather than arbitrary. If the variable is continuous (or if there is a large number of discrete values), then each bar might represent a range of values, or bin, instead of a single value. Histogram bars typically touch one another unless a value or bin has no observations.
A frequency polygon is a frequency distribution of an interval- or ratio-level variable. It is very similar to a histogram but is constructed simply by marking a point at the center of each bin that corresponds to the frequency for that bin, and then drawing a line to connect the points. The frequency polygon may provide an even clearer sense of the shape of the frequency distribution, and also has the advantage that multiple distributions can be plotted on a single set of axes.
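As an illustration, a histogram for a continuous variable could be sketched with matplotlib as follows (the commute times are hypothetical):

    # Minimal sketch of a histogram for a continuous variable (hypothetical commute times, in minutes).
    import matplotlib.pyplot as plt

    commute_times = [5.5, 7, 8, 8, 9, 10, 10, 10, 12, 12, 14, 15, 18, 22, 25]

    plt.hist(commute_times, bins=5, edgecolor="black")  # each bar represents a bin (range of values)
    plt.xlabel("Commute time (minutes)")                # interval/ratio variable on the x-axis
    plt.ylabel("Frequency")                             # frequency of observations on the y-axis
    plt.show()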
A special type of distribution is the normal distribution, also called a normal curve or bell curve. The normal distribution is an abstract theoretical distribution derived from a mathematical formula, in contrast to an empirical frequency distribution derived from data. The normal distribution is useful because it can help us to make predictions.
Unlike a frequency distribution, which plots variables on only the x-axis, bivariate plots plot variables on both the x- and y-axis. These include scatterplots and line graphs.
A scatterplot, where individual data points represent two variables, one plotted on the x-axis, the other on the y-axis, is commonly used with two interval- or ratio-level variables. The distributions of two variables are plotted together so that the relationship, if any, between the two variables can be easily visualized.
A line graph is a data representation that displays data for interval- or ratio-level dependent variables (on the y-axis) as a function of interval- or ratio-level independent variables (on the x-axis). A line graph may also be used to plot summary data (on the y-axis) for groups (on the x-axis) to show change over time, make predictions with a line of best fit, or allow the reader to search for trends in the data.
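As an illustration, a scatterplot for two interval- or ratio-level variables could be sketched with matplotlib as follows (the data are hypothetical):

    # Minimal sketch of a scatterplot for two interval/ratio variables (hypothetical data).
    import matplotlib.pyplot as plt

    hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
    quiz_scores = [55, 60, 58, 70, 72, 75, 80, 85]

    plt.scatter(hours_studied, quiz_scores)   # one point per participant
    plt.xlabel("Hours studied")
    plt.ylabel("Quiz score")
    plt.show()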
Key Vocabulary
Bar graph: A graph in which each value of a nominal- or ordinal-level variable is represented by a bar drawn to a height representative of that value’s frequency in a set of data (or of some other dependent variable).
Frequency distribution: A graph that shows the frequency of occurrence (y-axis) of the values of a variable (x-axis).
Frequency polygon: A graph with a connected series of points in which individual values or ranges of values of an interval- or ratio-level variable are each represented by a point plotted at a height representative of the frequency of those values in a set of data.
Histogram: A graph in which individual values or ranges of values of an interval- or ratio-level variable are represented by a bar drawn to a height representative of the frequency of those values in a set of data.
Line graph: A line graph displays data for interval- or ratio-level dependent variables (on the y-axis), as a function of interval- or ratio-level independent variables (on the x-axis).
Normal distribution (normal curve): A mathematical function that graphs as a particular bell-shaped curve.
Scatterplot: A scatterplot displays the individual data points in a sample for two interval- or ratio-level variables, one plotted on the x-axis, the other on the y-axis.
Correlation
A correlation refers to a particular quantitative measurement (e.g., correlation coefficient) of the direction and strength of the relationship between two or more measured variables.
A Pearson correlation coefficient, which is denoted by the letter r and ranges from −1 to +1, quantifies the linear relationship between two variables in terms of the direction and strength.
The direction of the correlation is conveyed by the sign of the correlation coefficient and can be a positive correlation (variables change in the same direction), a negative correlation (variables change in opposite directions), or a zero correlation (values of one variable are not related to changes in the value of the other variable).
The strength of the correlation coefficient is conveyed by the absolute value of the statistic r, with the upper bound of 1 and the lower bound of 0. Values of r closer to −1 or +1 indicate stronger relationships; values closer to 0 indicate weaker relationships.
A correlation only allows us to conclude that two variables covary to some degree. Correlation does not indicate causation.
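As an illustration, a Pearson correlation coefficient could be computed with scipy.stats.pearsonr, as in this minimal sketch (the data are hypothetical):

    # Minimal sketch of computing a Pearson correlation coefficient (hypothetical data).
    from scipy.stats import pearsonr

    hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
    quiz_scores = [55, 60, 58, 70, 72, 75, 80, 85]

    r, p = pearsonr(hours_studied, quiz_scores)
    print(f"r = {r:.2f}")   # the sign gives the direction; the absolute value gives the strength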
Key Visuals
Overview: Review the following figures. If the figures don’t look familiar or you are puzzled about their meaning, we encourage you to go back through the course and reread the relevant material.
Direction of Correlation
Note. The direction of correlation, characteristics of the scatterplot, and things to look for to identify the type of correlation are provided for positive, negative, and zero correlation scatterplots. For positive correlations, both variables increase or decrease together, the correlation coefficient is a positive value, and data in the scatterplot extend from lower left to upper right. For negative correlations, one variable increases and the other variable decreases, the correlation coefficient is a negative value, and data in the scatterplot extend from the upper left to the lower right. For zero correlations, no relationship is evident between the change of one variable and the change in the other variable, the correlation coefficient is very close to zero, and data in the scatterplot are scattered randomly around a flat line.
Strength of Correlation
Note. The strength of correlation, characteristics of the scatterplot, and things to look for to identify the strength of the correlation are provided for perfect, strong, and weak correlation scatterplots. For a perfect correlation, both variables change at the same rate, the correlation coefficient is either a value of positive one or negative one, and the scatterplot data form a straight line. For a strong correlation, the variables change at a very similar rate with slight spread, a correlation coefficient that is closer to a value of positive or negative one than to zero, and with data that closely, but imperfectly, represent a straight line. For a weak correlation, the variables change at different rates with large spread, a correlation coefficient that is closer to zero than to a value of positive or negative one, and with data that loosely represent a straight line.
Key Vocabulary
Correlation: A particular quantitative measurement of the direction and strength of the relationship between two or more variables.
Correlation coefficient: The sample statistic in a correlational analysis that quantifies the linear relationship between two variables and ranges from −1 to +1.
Negative correlation: A relationship in which two variables change in opposite directions, with one variable increasing and the other decreasing—an inverse relationship.
Pearson correlation coefficient (Pearson product-moment correlation coefficient): A correlation statistic used to measure the linear relationship between variables that have interval- or ratio-level measurements.
Positive correlation: A relationship in which two variables change in the same direction, either both increasing or both decreasing.
Zero correlation: Changes in the values of one variable are not related to changes in the value of the other variable.
Inferential Statistics
Inferential statistics, which are based on probability and sampling theory, are used to make inferences about the population from which a sample was collected. Using samples to make inferences about populations adds uncertainty, known as sampling error, to a statistical conclusion.
The mean and standard deviation of the sample are unlikely to be identical to the mean and standard deviation of the population. Sampling error occurs when sample statistics depart from the corresponding population parameters by some amount due to chance, which is why probability is important. Inferential statistical techniques allow us to make inferences about the population based on probability and the sample.
Probability theory tells us what it means to draw a random sample and how to estimate how likely it is that we might observe a particular value of a variable or statistic for our sample. A probability of exactly 0 means that an event will never occur, and a probability of exactly 1 means that an event will certainly occur.
A single probability, for example, might be the probability of landing on your feet after a backflip or the probability that you will catch COVID after attending a music festival. Other times we are interested in a range of probabilities, in which case we refer to a probability distribution, which is simply a representation of a set of probabilities.
For continuous variables, probabilities are determined by the area under a curve of the probability distribution defined by a mathematical function. The mathematical function that is used depends on the statistical test being used, such as the functions listed in the Continuous Probability Distributions figure. Such distributions are theoretical distributions, because they are based on mathematical functions, rather than empirical frequency distributions, which are based on numerical samples. For probability distributions, the x-axis lists the possible range of values and the y-axis lists the resulting value from the relevant mathematical function, which varies depending on the statistical test. But what we are more interested in is the area under the curve for a range of x-values.
The key point is that the specific shape of the distribution is less important than the idea that we can determine the proportion of the whole area under the curve for any range of values. This area represents the probability of observing any of the x-axis values within the specific range being investigated.
The normal distribution is a special type of theoretical distribution that is useful for making predictions. The normal distribution can be observed in many situations, which makes it useful in hypothesis testing.
Normal distributions are symmetrical around a center point equal to the mean, median, and mode of the distribution. Because they are symmetrical, each half mirrors the other with identical proportions, which means that areas under corresponding regions of the curve are always identical proportions of the whole. The total area under the curve is a proportion of 1.
The region between the mean and 1 standard deviation above the mean has an area that makes up about 34.1% of the total area under the distribution. This same proportion of area is also under the curve between the mean and −1 standard deviation below the mean.
All normal distributions will have the same proportion of the area between 1 standard deviation above and below the mean (about 68.2%), 2 standard deviations above and below the mean (about 95.4%), and 3 standard deviations above and below the mean (about 99.7%). A distribution that is not symmetrical and that does not have this pattern of proportions is not a normal distribution. The Common Features of Normal Distributions figure illustrates these important features.
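These proportions can be checked with a short sketch that uses the standard normal cumulative distribution function, written here with the standard-library math module:

    # Minimal sketch of the areas under the normal curve within 1, 2, and 3 standard deviations.
    from math import erf, sqrt

    def normal_cdf(z):
        """Proportion of the area under the standard normal curve below z."""
        return 0.5 * (1 + erf(z / sqrt(2)))

    for k in (1, 2, 3):
        area = normal_cdf(k) - normal_cdf(-k)   # area within k standard deviations of the mean
        print(f"within ±{k} SD: {area:.3f}")    # about 0.683, 0.954, and 0.997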
The normal distribution is a good predictor of the population if the observed sample data approximate a normal distribution. Any data that approximate a normal distribution can be analyzed in this way.
The normal curve is an important tool for inferential statistics because it forms the basis of many statistical tests. We refer to these tests as parametric tests because they rely on an assumption of normality of the parameters of the population distribution from which the sample was taken. If this assumption is not met, then a different category of statistical tests can be used, nonparametric tests.
Key Visuals
Overview: Review the following figures. If the figures don’t look familiar or you are puzzled about their meaning, we encourage you to go back through the course and reread the relevant material.
Continuous Probability Distributions
Note. The figure shows examples of continuous probability distributions, with the value of a variable plotted on the x-axes and relative frequency plotted on the y-axes. From left to right are examples of a rectangular distribution, one member of the family of t distributions, three members of the family of F distributions, and three members of the family of χ² (i.e., chi-square) distributions. The specific shapes of the distributions are less important than the idea that we can identify a region under any of these curves and describe what proportion that region is of the whole area under the curve as a way of estimating the probability of observing any of the x-axis values within the specified range.
Common Features of Normal Distributions
Note. A normal distribution is symmetrical around its center point, where the mean, median, and mode of the distribution are equal, and the regions under the curve are proportional. When divided into units of standard deviation, the regions within +1 or −1 standard deviation of the mean are each about 0.341 of the total. The areas between 1 and 2 (and between −1 and −2) standard deviations each account for about 0.136 of the total. The areas between 2 and 3 (and between −2 and −3) standard deviations each account for about 0.021 of the total. The areas between 3 and 4 (and between −3 and −4) standard deviations each account for about 0.001 of the total. The area above 4 or below −4 standard deviations accounts for the balance, so that the total area under the curve equals 1.
Key Vocabulary
Inferential statistics: Statistical techniques used to draw conclusions about a population from a sample taken from that population.
Nonparametric test: A type of hypothesis test that does not make any assumptions (e.g., of normality or homogeneity of variance) about the population of interest.
Parametric test: A hypothesis test that involves one or more assumptions about the underlying arrangement of values in the population from which the sample is drawn.
Probability: A value between 0 and 1, inclusive, that represents the long-run relative frequency that a process will yield a particular event.
Probability distribution: A representation of a set of probabilities.
Sampling error: The difference between a sample statistic’s estimate of a population parameter and the actual value of the parameter.
Null Hypothesis Logic
Null hypothesis significance testing (NHST) compares sample data to a null result (i.e., a result in which there is no difference or no relationship).
The first step of NHST is to set up the null hypothesis (H0); this hypothesis assumes there is only random variability in the sample and that there is no difference between groups or relationship between variables.
The alternative hypothesis (with notation H1), on the other hand, indicates that the data do reflect some difference between groups or relationship between variables (i.e., there is more than random variation in the data).
The second step of NHST is to specify which parts of the null probability distribution are unlikely (i.e., have a low probability). The probability associated with the unlikely areas in the null distribution is called α (alpha level), and is usually set at or near .05.
The third step of NHST is to calculate the sample statistic and the associated probability, p, of encountering that statistic (or any statistics that are more extreme) at random from the null distribution. This probability is known as the p value.
The fourth step of NHST is making a statistical decision to reject the null hypothesis or fail to reject the null hypothesis. If p < α (i.e., p < .05, when α = .05), then the decision is to reject the null hypothesis, which indicates that the result is statistically significant and provides support for the alternative hypothesis. If p > α, then the decision is to fail to reject the null hypothesis, and the result is not statistically significant.
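The four steps can be illustrated with a minimal sketch of a one-sample t-test using scipy.stats (the quiz scores and the comparison value of 70 are hypothetical):

    # Minimal sketch of the NHST decision rule with a one-sample t-test (hypothetical data).
    from scipy.stats import ttest_1samp

    quiz_scores = [72, 75, 78, 80, 69, 74, 77, 81]
    alpha = 0.05                                     # step 2: the alpha level
    t_stat, p_value = ttest_1samp(quiz_scores, 70)   # step 3: H0 is that the population mean is 70

    if p_value < alpha:                              # step 4: the statistical decision
        print(f"p = {p_value:.3f}: reject the null hypothesis (statistically significant)")
    else:
        print(f"p = {p_value:.3f}: fail to reject the null hypothesis")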
It is possible that the decisions in null hypothesis significance testing are not correct. These statistical errors are known as Type I error and Type II error.
A Type I error is a false positive (e.g., a false alarm), in which the statistical decision is to reject the null hypothesis, but the null is actually true (i.e., there is no actual difference). We limit the likelihood of a Type I error by setting the alpha level before a study begins to a small portion of the distribution, such as .05. The probability of a Type I error is equal to the alpha level.
A Type II error is a false negative, in which the statistical decision is to retain the null hypothesis, but the null is actually false (i.e., there is an actual difference). The likelihood of a Type II error is equal to β (beta).
Key Visuals
Overview: Review the following figure. If the figure doesn’t look familiar or you are puzzled about the meaning, we encourage you to go back through the course and reread the relevant material.
Type I and Type II Error Table
Note. The table shows the probabilities of making the two correct decisions (i.e., retaining the null when it is true, with probability one minus alpha, and rejecting the null when it is false, with probability one minus beta) and of making the two incorrect decisions (i.e., retaining the null when it is false, with probability beta, and rejecting the null when it is true, with probability alpha). The statistical decision you make and the value of the “truth” in the population intersect to create correct decisions or incorrect decisions. For incorrect decisions, a Type I error is a false positive in which the null hypothesis is rejected when it is true. A Type II error is a false negative in which the null hypothesis is retained when it is actually false.
Key Vocabulary
Alpha level (α): The value, selected by the researcher, that marks an extreme probability of the null distribution in which scores are less likely to occur by chance; this value is also the likelihood of making a Type I error.
Alternative hypothesis (research hypothesis): A hypothesis that contradicts the null hypothesis by stating that the population parameter is less than, greater than, or not equal to the value given in the null hypothesis.
Null hypothesis (H0): The hypothesis that there is no difference between a certain population parameter and another value.
Null hypothesis significance testing (NHST): A set of procedures used to determine whether the differences between two groups or models are statistically significant (i.e., unlikely to arise solely from chance).
p value: The probability of observing a test statistic as extreme as, or more extreme than, the one obtained, assuming the null hypothesis is true.
Statistically significant: A result is statistically significant when it is improbable if the null hypothesis is true.
Type I error: A statistical error in which the hypothesis test decision was to reject the null hypothesis, but the null is actually true; this is considered a false positive, the probability of which is represented by α (alpha).
Type II error: A statistical error in which the hypothesis test decision was to fail to reject the null hypothesis, but the null is actually false; this is considered a false negative, the probability of which is represented by β (beta).
Inferential Tests
An important step for selecting a significance test includes assessing whether the purpose of the study is to analyze differences among means, relationships between variables, or the distribution of frequencies over the values of one or more variables.
Significance tests that analyze differences among means include t-tests and analysis of variance (ANOVA) tests.
Significance tests that analyze relationships between variables include correlation analyses and regression analyses.
Significance tests that analyze the distribution of frequencies of variables include chi-square analyses.
If the study involves analyzing differences among means, then whether a t-test, an ANOVA, or a different test is appropriate will depend on the number of sample means, the number of independent variables, whether the samples are related or not, and whether the statistical assumptions for the test are met.
A t-test is used when the population standard deviation is unknown and tests whether one mean differs from a known value, requiring a one-sample t-test, or two means differ from each other, requiring either a paired-samples t-test or an independent-samples t-test, depending on whether the conditions are related or unrelated.
An ANOVA requires one dependent variable measured at the interval or ratio level and one or more independent variables defined at the nominal level. When there is one independent variable, the analysis is referred to as a one-way ANOVA; when there is more than one independent variable, it is a factorial ANOVA. When the conditions are related (e.g., the same participants are measured in every condition), the ANOVA is referred to as a within-groups design; when the conditions are unrelated, it is a between-groups design.
The t-tests and ANOVAs are both types of parametric tests and only two of many types of significance test. With each, there are statistical assumptions that should be met if the test is to yield reliable results. A common assumption for parametric tests is the assumption of normality. If this assumption is not met, then one option is to use a nonparametric test alternative.
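As an illustration of how these tests of mean differences might be run, the following minimal sketch uses functions from scipy.stats on hypothetical group scores (and assumes the parametric assumptions discussed above are reasonable):

    # Minimal sketch of common difference tests in scipy.stats (hypothetical group scores).
    from scipy.stats import ttest_1samp, ttest_rel, ttest_ind, f_oneway

    group_a = [70, 72, 68, 75, 71]
    group_b = [80, 78, 83, 79, 82]
    group_c = [65, 66, 70, 64, 68]

    print(ttest_1samp(group_a, 70))              # one sample compared to a known value
    print(ttest_rel(group_a, group_b))           # paired samples (related conditions)
    print(ttest_ind(group_a, group_b))           # independent samples (unrelated conditions)
    print(f_oneway(group_a, group_b, group_c))   # one-way between-groups ANOVA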
If the study involves analyzing relationships between variables, then whether a correlation analysis or a regression analysis is appropriate will depend on whether the study is to test if a predictor variable, x, predicts an outcome variable, y, and about the number of predictor variables in the study.
A correlation analysis may be used to test whether a correlation coefficient, r, is significantly different from zero. Pearson’s correlation coefficient is a common correlation statistic that measures the relationship between two variables that are measured at the interval or ratio level, and it ranges between −1 and +1. When the variables are highly correlated, r is close to −1 or +1; when they are poorly correlated, r is close to 0.
A regression analysis is used to investigate how one or more predictor variables can be used in an equation that yields the value of an outcome variable. A bivariate linear regression is used to investigate the linear relationship between a single interval- or ratio-level predictor variable and an interval- or ratio-level outcome variable. A multiple regression is used to investigate the linear relationship between two or more interval- or ratio-level predictor variables and an interval- or ratio-level outcome variable. Nonparametric regression procedures also exist for nominal- and ordinal-level variables.
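As an illustration, a bivariate linear regression could be run with scipy.stats.linregress, as in this minimal sketch (the data are hypothetical):

    # Minimal sketch of a bivariate linear regression (hypothetical data).
    from scipy.stats import linregress

    hours_studied = [1, 2, 3, 4, 5, 6, 7, 8]
    quiz_scores = [55, 60, 58, 70, 72, 75, 80, 85]

    result = linregress(hours_studied, quiz_scores)
    print(f"predicted score = {result.intercept:.1f} + {result.slope:.1f} * hours")   # the regression equation
    print(f"r = {result.rvalue:.2f}, p = {result.pvalue:.3f}")                        # test of the relationship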
If the study involves analyzing the distribution of frequencies of variables, then which chi-square test is selected will depend on the number of variables in the study.
A chi-square goodness-of-fit test is a nonparametric null hypothesis significance test to determine if the frequency distribution of nominal-level data matches that of a set of expected frequencies.
A chi-square test of independence is a nonparametric null hypothesis significance test to determine if the frequency distributions of two variables match.
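Both chi-square tests could be run with scipy.stats, as in this minimal sketch (the frequencies are hypothetical):

    # Minimal sketch of the two chi-square tests (hypothetical frequencies).
    from scipy.stats import chisquare, chi2_contingency

    # Goodness of fit: do observed category frequencies match a set of expected frequencies?
    observed = [18, 22, 20]
    expected = [20, 20, 20]
    print(chisquare(observed, f_exp=expected))

    # Test of independence: are the frequency distributions of two variables related?
    contingency_table = [[30, 10],    # e.g., rows = group, columns = yes/no response
                         [20, 20]]
    print(chi2_contingency(contingency_table))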
Key Visuals
Overview: Review the following figures. If the figures don’t look familiar or you are puzzled about their meaning, we encourage you to go back through the course and reread the relevant material.
Selecting a Statistical Test
Note. This decision tree can be used to select an appropriate statistical test based on the design criteria for a study. The question “What is the type of analysis?” is the first to answer, following the arrows to each new question until a statistical test is reached. Step down the rows of the column titled Study Criteria until the design criteria are met, then step to the column titled Statistical Test to identify an appropriate statistical test.
Nonparametric Alternatives
Note. A table of nonparametric test alternatives. The parametric and nonparametric alternatives include the one-sample t-test and the nonparametric one-sample Wilcoxon signed-rank test; the paired-samples t-test and the nonparametric Wilcoxon signed-rank test; the independent-samples t-test and the nonparametric Mann-Whitney U test; the one-way within-groups ANOVA and the nonparametric Friedman test; the one-way between-groups ANOVA and the nonparametric Kruskal-Wallis test; the factorial within-groups ANOVA and the nonparametric aligned rank transform (ART) ANOVA; the factorial between-groups ANOVA and the nonparametric aligned rank transform ANOVA; the factorial mixed ANOVA and the aligned rank transform ANOVA; the Pearson correlation and the nonparametric Spearman’s rank correlation; the bivariate linear regression and the nonparametric binary logistic regression; and the multiple linear regression and the nonparametric multinomial logistic regression.
Key Vocabulary
Analysis of variance (ANOVA): A statistical procedure in which the variance in data is divided into distinct components: systematic and random variance.
Between-groups design (between-subjects design): A study in which individuals are assigned to only one of several treatments or experimental conditions and each person provides only one score for data analysis.
Bivariate linear regression: A regression analysis in which one predictor, or independent, variable (x) is assumed to be related to one criterion, or dependent, variable (y) in such a manner that the direction and rate of change of one variable is constant with respect to the changes in the other variable.
Chi-square goodness-of-fit: A null hypothesis test that evaluates whether a variable whose values are categorical fits a theoretical distribution.
Chi-square test of independence: A null hypothesis test that evaluates the relationship between two variables whose values are categorical.
Condition (level): A category or level of a variable whose values are manipulated by a researcher. Study participants are then assigned to receive or be exposed to one or more of the different conditions.
Correlational analysis: A statistical procedure to test whether a correlation coefficient is significantly different from zero, which represents the null hypothesis.
F-Distributions: A continuous probability distribution of all possible values of the F-statistic, usually used in ANOVA and other F-tests. It is asymmetric and has a minimum value of zero but no maximum value.
Factorial ANOVA: A statistical procedure with two or more nominal-level independent variables to understand the effect on an interval- or ratio-level dependent variable.
Factorial between-groups ANOVA: A factorial analysis of variance in which the effects of two or more independent variables are seen through the comparison of scores of different participants observed under separate treatment conditions.
Factorial within-groups ANOVA: A factorial analysis of variance in which the effects of two or more independent variables are seen through the comparison of scores of the same participants observed under all the treatment conditions.
Factorial mixed ANOVA: A factorial analysis of variance in which at least one independent variable is a within-groups factor and at least one independent variable is a between-groups factor.
Independent-samples t-test: A null hypothesis significance test that compares the difference between the means of two completely unrelated samples to an expected difference between means of 0 in the population.
Matched-group design (matched-pairs design): A study involving two groups of participants in which each member of one group is matched with a similar person in the other group, that is, someone who matches them on one or more variables that are not the main focus of the study but nonetheless could influence its outcome.
Mixed design: A study that combines features of both a between-groups design and a within-groups design.
Multiple regression: A statistical technique that is used to describe, explain, or predict (or all three) the variance of an outcome or dependent variable using scores on two or more predictor or independent variables.
One-sample t-test: A null hypothesis significance test used to compare a sample mean to a population mean.
One-way ANOVA (one-factor ANOVA): An analysis of variance that evaluates the influence of different levels or conditions of a single independent variable upon a dependent variable.
One-way between-groups ANOVA: An analysis of variance in which individuals are assigned to only one of several treatments or experimental conditions and each person provides only one score for data analysis.
One-way within-groups ANOVA: An analysis of variance in which the effects of treatments are seen through the comparison of scores of the same participant observed under all the treatment conditions.
Paired-samples t-test: A null hypothesis significance test used to compare the mean difference between two related samples; the sample mean difference is compared to an expected mean difference of 0 in the population.
Regression analysis: Any of several statistical techniques that are used to describe, explain, or predict (or all three) the variance of an outcome or dependent variable using scores on one or more predictor or independent variables.
t-Distribution: A theoretical probability distribution that plays a central role in testing hypotheses about population means, among other parameters. Also called student’s t-distribution.
t-Test: A statistical test that is used to test hypotheses about two means.
Two-way ANOVA (two-factor ANOVA): An analysis of variance design that isolates the main effects of two independent variables, a and b, and their interaction effect, a x b, on a dependent variable.
Within-groups design (repeated-measures design): An experimental design in which the effects of treatments are seen through the comparison of scores from the same participant observed under all the treatment conditions.
Significance, Effect Size and Confidence Intervals
Statistical significance (often phrased as “p < .05”) has a very narrow, specific meaning and lets us answer the question, “How likely is this result, assuming that the null hypothesis is true?” Because the interpretations of p values have a limited scope, it is important to consider additional statistical information in your analyses, such as effect size and confidence intervals.
Effect size is a measure of the magnitude of the effect, which is not dependent on sample size (unlike test statistics and p values). Effect size, along with information about the study and the variables, conveys information about the importance of a difference and can help a researcher determine if the difference, or relationship, is meaningful in a practical sense.
A common standardized measure of effect size is Cohen’s d, which measures the distance between two means in standard deviation units.
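As an illustration, Cohen’s d for two independent groups could be computed with the pooled standard deviation, as in this minimal sketch (the group scores are hypothetical):

    # Minimal sketch of Cohen's d using the pooled standard deviation (hypothetical scores).
    from statistics import mean, stdev

    group_a = [70, 72, 68, 75, 71]
    group_b = [80, 78, 83, 79, 82]

    n_a, n_b = len(group_a), len(group_b)
    pooled_sd = (((n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2)
                 / (n_a + n_b - 2)) ** 0.5

    d = (mean(group_b) - mean(group_a)) / pooled_sd   # distance between the means in SD units
    print(f"d = {d:.2f}")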
Confidence intervals are used to overcome some of the difficulties of hypothesis testing by providing an interval estimate instead of only a point estimate.
The confidence interval is described in terms of a percentage, such as a 95% confidence interval or a 99% confidence interval, which is the confidence level. The confidence interval procedure, if repeated a large number of times, will generate an interval that includes the population parameter of interest x% of the time, x being the confidence level.
Remember that a 95% confidence interval will contain the population parameter only about 95% of the time. When you have a particular interval (after it is generated), it either contains the population parameter (p = 1) or it does not (p = 0). Yet we cannot be sure which is the case.
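As an illustration, a 95% confidence interval for a sample mean could be computed as the sample mean plus or minus a margin of error (the critical t value times the standard error), as in this minimal sketch with hypothetical scores:

    # Minimal sketch of a 95% confidence interval for a sample mean (hypothetical scores).
    from statistics import mean, stdev
    from scipy.stats import t

    scores = [72, 75, 78, 80, 69, 74, 77, 81]
    n = len(scores)
    m = mean(scores)
    se = stdev(scores) / n ** 0.5          # standard error of the mean
    t_crit = t.ppf(0.975, df=n - 1)        # critical t for a 95% confidence level
    margin_of_error = t_crit * se

    print(f"95% CI: [{m - margin_of_error:.1f}, {m + margin_of_error:.1f}]")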
The effect size and confidence interval values are typically listed following the p value in an APA-style summary of the results of a null hypothesis significance test.
Key Vocabulary
Cohen’s d: A form of effect size that measures the distance between two means in standard deviation units.
Confidence interval: An interval centered on a statistic of interest, bounded by lower and upper limits that define a range of values within which a population parameter will lie with a certain confidence level, given as a percent.
Confidence level: The percentage of the time that the population parameter will be found within a confidence interval if the sampling procedure is repeated many times.
Effect size: A quantitative measure of the magnitude of an effect that is not dependent on sample size.
Interval estimate: A parameter estimate that defines a range of values within which the parameter may lie.
Margin of error: The absolute value of the distance between the center of a confidence interval and the lower and upper bounds of that interval.
Point estimate: A parameter estimate that is restricted to a single value of the variable of interest.
Statistical Power
Statistical power is most relevant to a researcher before a study is conducted; a researcher often wants to know how to achieve enough power to detect a true difference between groups if one exists.
Calculating statistical power is dependent on the sample size (n), Type I error rate (α), and effect size (d).
As sample size increases (and the spread of the distributions decreases), power increases.
As Type I error increases (i.e., there is a larger rejection region), power increases.
As effect size increases (i.e., there is a larger mean difference between the groups), power increases.
Power can also be calculated as 1 − β.
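As a rough illustration of how these factors combine, the following sketch uses a normal approximation of power for a two-group design with a two-tailed alpha of .05 (the effect size and sample sizes are hypothetical, and the approximation ignores the tiny lower rejection region):

    # Rough normal-approximation sketch of power for a two-group design.
    from math import erf, sqrt

    def normal_cdf(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    def approx_power(d, n_per_group, z_crit=1.96):
        # Probability that the test statistic lands beyond the critical value when the effect is real.
        return normal_cdf(d * sqrt(n_per_group / 2) - z_crit)

    for n in (20, 50, 100):
        print(f"n = {n:3d} per group, d = 0.5: power ≈ {approx_power(0.5, n):.2f}")

Notice that increasing the sample size (or the effect size, or alpha) increases the computed power, just as described in the points above.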
Key Visuals
Overview: Review the following figures. If the figures don’t look familiar or you are puzzled about their meaning, we encourage you to go back through the course and reread the relevant material.
Power and Type II Error
Note. The null distribution, H0, is on the left and the alternative distribution, H1, is on the right, with a vertical line, the critical z value, over the right tail of the null distribution. The critical z value divides the alternative distribution into two regions: the region to the left of the critical value represents β (Type II error), and the region to the right represents statistical power.
Factors That Influence Power
Note. A causal chain is listed for each of the primary factors that affect power, which are sample size, effect size, and alpha level. As sample size increases, variability decreases, the overlap between the null and alternative distributions decreases, and power increases. As effect size increases, the mean distance between the null and alternative distributions increases, the overlap between the null and alternative distributions decreases, and power increases. As alpha increases, the Type I error rate increases, the Type II error rate decreases, and power increases.
Key Vocabulary
Statistical power: The ability to detect a difference, or relationship, between groups if a difference exists.
Alternatives to NHST
Even from the time when psychologists and other scientists were only just beginning to use null hypothesis significance testing (NHST) methods consistently, other psychologists and scientists were objecting that these methods had numerous potential problems.
Among the potential problems were concerns that NHST methods
are a poor substitute for scientific judgment, turning a continuous dimension like probability (p) into a mechanical decision of significant vs. not significant;
use bad logic that is confusing and that leads to frequent misunderstandings and misinterpretations;
test a hypothesis that no one believes anyway, namely that there is literally zero difference between conditions or zero relationship between variables; and
distort the scientific process.
Among the ways that NHST methods distort the scientific process (the last point, immediately above), they
allow smaller samples to have out-sized influence by being more likely to produce false positive results;
similarly allow true small effect sizes to produce misleadingly large, positive results;
allow scientists to use a scattershot approach, conducting a large number of statistical tests without sufficiently controlling for the fact that at least one false positive becomes increasingly likely as the number of tests increases;
allow scientists to take advantage of flexibility in the definition of variables and choice of statistical methods in such a way that makes false reports more likely;
are susceptible to outside influences and biases so that false positive findings are more likely to occur; and
are even more susceptible to these outside influences and biases if the research topic is one that draws greater interest from the scientific community.
In response to concerns about NHST methods, a number of alternatives have been proposed, including
Bayesian inference, which depends on a view of probability in which probabilities represent how existing beliefs (priors) are updated as new information becomes available, resulting in new beliefs (posterior probabilities), instead of as the long-run relative frequencies of members of the set of possible events (termed the frequentist view), as illustrated in the sketch after this list;
estimation methods and estimation graphics that use NHST methods but shift the focus from testing the null hypothesis to estimating the size of an effect;
meta-analysis methods, which summarize, often in the form of a weighted average, all the available studies of a given effect size, employing forest plots and other specialized methods to summarize the relevant studies; and
model building, which emphasizes the specification of the most relevant variables and the mathematical relationships between them to create an equation or equations that define those relationships.
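As an illustration of the Bayesian idea of updating a prior into a posterior, the following minimal sketch uses a beta-binomial model from scipy.stats (the prior and the observed counts are hypothetical):

    # Minimal sketch of Bayesian updating with a beta-binomial model (hypothetical data).
    from scipy.stats import beta

    prior_a, prior_b = 1, 1           # Beta(1, 1): a flat prior belief about a proportion
    successes, failures = 14, 6       # hypothetical new evidence

    post_a = prior_a + successes      # conjugate updating: the posterior is also a Beta distribution
    post_b = prior_b + failures
    posterior = beta(post_a, post_b)

    print(f"posterior mean = {posterior.mean():.2f}")              # the updated belief
    print(f"95% credible interval = {posterior.interval(0.95)}")   # range of plausible values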
Key Visuals
Overview: Review the following figures. If the figures don’t look familiar or you are puzzled about their meaning, we encourage you to go back through the course and reread the relevant material.
Estimation Graphic
Resilience Forest Plot
Note. Forest plots include a list of the studies included in the meta-analysis along with relevant details such as means, sample sizes, and so on, along with a measure of effect size, Hedges’s g in this case. Each of these observations is also plotted on an axis appropriate to the effect size measure, and the weighted average is depicted below the list of included studies, indicated in this case by a diamond. The individual data points represent entire studies that are themselves made up of some number of participants; these data points vary in size to represent the sample size of the study, and the points are accompanied by a bar that represents the 95% confidence interval around the effect size.
Key Vocabulary
Bayesian inference: A form of reasoning depending on the idea that a probability is a degree of belief, rather than a long-run frequency, such that a prior set of beliefs can be updated by new evidence to create a posterior set of beliefs.
Estimation graphics: A graphical representation of an interval estimation in which the focus of the graph is on an effect size, rather than on a null hypothesis.
Estimation methods: An approach to scientific inference that focuses on estimating the size of an effect rather than on testing a null hypothesis.
Forest plot: A type of graph that summarizes the findings of a meta-analysis.
Frequentist: The view of probability in which probabilities are interpreted as hypothetical long-run frequencies.
Meta-analysis: An approach to scientific inference that depends on collecting together relevant studies that estimate a particular effect size and then summarizing those estimates with meta-analytic statistics and graphics such as a forest plot.
Model: In science, a detailed depiction of a phenomenon of interest, usually in mathematical form and usually specifying the important variables involved and the relationships between them.
Posterior probabilities: In Bayesian inference, the probability distribution that describes the pattern of beliefs in possible hypotheses after new evidence has been taken into account.
Prior probabilities: In Bayesian inference, the probability distribution that describes the pattern of beliefs in possible hypotheses before new evidence has been taken into account.
Course Glossary
Alpha level (α): The value, selected by the researcher, that marks an extreme probability of the null distribution in which scores are less likely to occur by chance; this value is also the likelihood of making a Type I error.
Alternative hypothesis (research hypothesis): A hypothesis that contradicts the null hypothesis by stating that the population parameter is less than, greater than, or not equal to the value given in the null hypothesis.
Analysis of variance (ANOVA): A statistical procedure in which the variance in data is divided into distinct components, systematic and random variance.
Bar graph: A graph in which each value of a nominal- or ordinal-level variable is represented by a bar drawn to a height representative of that value’s frequency in a set of data (or of some other dependent variable).
Bayesian inference: A form of reasoning depending on the idea that a probability is a degree of belief, rather than a long-run frequency, such that a prior set of beliefs can be updated by new evidence to create a posterior set of beliefs.
Between-groups design (between-subjects design): A study in which individuals are assigned to only one of several treatments or experimental conditions and each person provides only one score for data analysis.
Bimodal: A distribution that includes two peaks or modes.
Bivariate linear regression: A regression analysis in which one predictor, or independent, variable (x) is assumed to be related to one criterion, or dependent, variable (y) in such a manner that the direction and rate of change of one variable is constant with respect to the changes in the other variable.
Categorical variables (qualitative variables): Variables that have values that are types, kinds, or categories and that do not have any numeric properties.
Central tendency: A measure that describes the middle of a distribution; most commonly measured with the mean, median, and mode.
Chi-square goodness-of-fit: A null hypothesis test that evaluates whether a variable whose values are categorical fits a theoretical distribution.
Chi-square test of independence: A null hypothesis test that evaluates the relationship between two variables whose values are categorical.
Confidence interval: An interval centered on a statistic of interest, bounded by lower and upper limits that define a range of values within which a population parameter will lie with a certain confidence level, given as a percent; for example, a 95% confidence interval around a sample mean.
Confidence level: The percentage of the time that the population parameter will be found within a confidence interval if the sampling procedure is repeated many times.
Cohen’s d: A form of effect size that measures the distance between two means in standard deviation units.
Condition (level): A category or level of a variable whose values are manipulated by a researcher; study participants are then assigned to receive or be exposed to one or more of the different conditions.
Continuous variables: Variables with values that can always be further divided, and where it is at least theoretically possible to find an observation with a value that will fall between two other values, no matter how close they are.
Correlation: A particular quantitative measurement of the direction and strength of the relationship between two or more variables.
Correlational analysis: A statistical procedure to test whether a correlation coefficient is significantly different from zero, which represents the null hypothesis.
Correlation coefficient: The sample statistic in a correlational analysis that quantifies the linear relationship between two variables and ranges from −1 to +1.
Dependent variable: A variable that is measured for change or effect after the occurrence or variation of the independent variable in an experiment.
Descriptive statistics: Statistics that summarize a set of data; for example, a measure of central tendency or variability.
Deviation score: The value obtained when you subtract the sample mean from an observation, that is, x − M.
Discrete variables: Variables represented by whole numbers and for which there are no intermediate values for two values that are adjacent on the scale.
Effect size: A quantitative measure of the magnitude of an effect that is not dependent on sample size.
Estimation graphics: A graphical representation of an interval estimation in which the focus of the graph is on an effect size, rather than on a null hypothesis.
Estimation methods: An approach to scientific inference that focuses on estimating the size of an effect rather than on testing a null hypothesis.
F-Distributions: A continuous probability distribution of all possible values of the F-statistic, usually used in ANOVA and other F-tests. It is asymmetric and has a minimum value of zero but no maximum value.
Factorial ANOVA: An experimental design in which two or more nominal-level variables are simultaneously manipulated or observed in order to study their joint influence (interaction effect) and separate influences (main effects) on a separate interval- or ratio-level dependent variable.
Factorial between-groups ANOVA: A factorial analysis of variance in which the effects of two or more independent variables are seen through the comparison of scores of different participants observed under separate treatment conditions.
Factorial within-groups ANOVA: A factorial analysis of variance in which the effects of two or more independent variables are seen through the comparison of scores of the same participants observed under all the treatment conditions.
Factorial mixed ANOVA: A factorial analysis of variance in which at least one independent variable is a within-groups factor and at least one independent variable is a between-groups factor.
Forest plot: A type of graph that summarizes the findings of a meta-analysis.
Frequency distribution: A graph that shows the frequency of occurrence (y-axis) of the values of a variable (x-axis).
Frequency polygon: A graph with a connected series of points in which individual values or ranges of values of an interval- or ratio-level variable are each represented by a point plotted at a height representative of the frequency of those values in a set of data.
Frequentist: The view of probability in which probabilities are interpreted as hypothetical long-run frequencies.
Histogram: A graph in which individual values or ranges of values of an interval- or ratio-level variable are represented by a bar drawn to a height representative of the frequency of those values in a set of data.
Independent-samples t-test: A null hypothesis significance test that compares the difference between the means of two completely unrelated samples to an expected difference between means of 0 in the population.
Independent variable: The variable in an experiment that is specifically manipulated to test for an effect on a dependent variable; in regression analysis, an independent variable is likely to be referred to as a predictor variable.
Inferential statistics: Statistical techniques used to draw conclusions about a population from a sample taken from that population.
Interval scale: A scale of measurement where the values are ordered and evenly spaced along some underlying dimension, but there is not a true 0 value.
Interval estimate: A parameter estimate that defines a range of values within which the parameter may lie.
Line graph: A graph that displays data for an interval- or ratio-level dependent variable (on the y-axis) as a function of an interval- or ratio-level independent variable (on the x-axis).
Margin of error: The distance between the center of a confidence interval and the lower and upper bounds of that interval.
Matched-group design (matched-pairs design): A study involving two groups of participants in which each member of one group is matched with a similar person in the other group, that is, someone who matches them on one or more variables that are not the main focus of the study but nonetheless could influence its outcome.
Mean: A measure of the average value in a data set, calculated by adding all the scores and dividing by the number of scores in the data set (see the descriptive-statistics sketch following this glossary).
Median: A measure of the midpoint (middle value) of a data set in which the values have been sorted in order from the minimum to the maximum.
Meta-analysis: An approach to scientific inference that depends on collecting relevant studies that estimate a particular effect size and then summarizing those estimates with meta-analytic statistics and graphics such as a forest plot.
Mixed design: A study that combines features of both a between-groups design and a within-groups design.
Mode: The most frequently occurring value in a data set.
Model: In science, a detailed depiction of a phenomenon of interest, usually in mathematical form and usually specifying the important variables involved and the relationships between them.
Multimodal: Having multiple (usually more than two) peaks or modes.
Multiple regression: A statistical technique that is used to describe, explain, or predict (or all three) the variance of an outcome or dependent variable using scores on two or more predictor or independent variables.
Negative correlation: A relationship in which two variables change in opposite directions, with one variable increasing and the other decreasing—an inverse relationship.
Negatively-skewed: A distribution in which the tail on the left side of the distribution is longer than the tail on the right side.
Nominal scale: A scale of measurement that has mutually exclusive, unordered values that represent types, kinds, or categories.
Nonparametric test: A type of hypothesis test that does not make any assumptions (e.g., of normality or homogeneity of variance) about the population of interest.
Normal distribution (normal curve): A mathematical function that graphs as a particular bell-shaped curve.
Null hypothesis (H0): The hypothesis that there is no difference between a certain population parameter and another value.
Null hypothesis significance testing (NHST): A set of procedures used to determine whether the differences between two groups or models are statistically significant (i.e., unlikely to arise solely from chance).
One-sample t-test: A null hypothesis significance test used to compare a sample mean to a population mean (see the t-test sketch following this glossary).
One-way ANOVA (one-factor ANOVA): An analysis of variance that evaluates the influence of different levels or conditions of a single independent variable upon a dependent variable (see the ANOVA sketch following this glossary).
One-way between-groups ANOVA: An analysis of variance in which individuals are assigned to only one of several treatments or experimental conditions and each person provides only one score for data analysis.
One-way within-groups ANOVA: An analysis of variance in which the effects of treatments are seen through the comparison of scores of the same participant observed under all the treatment conditions.
Operationalize: Precisely defining a variable with specific measurable terms.
Ordinal scale: A scale of measurement where the values are ordered along some underlying dimension, but not spaced equally along that dimension.
Outcome variable: An effect that one wants to predict or explain in nonexperimental research, such as correlational or regression research; the term sometimes is used interchangeably with dependent variable.
Outlier: An observation judged to be so extreme that it may differ in some way from other observations in the sample.
p Value: The probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.
Parametric test: A hypothesis test that involves one or more assumptions about the underlying arrangement of values in the population from which the sample is drawn.
Paired-samples t-test: A null hypothesis significance test used to compare the mean difference between two related samples; the sample mean difference is compared to an expected mean difference of 0 in the population.
Pearson correlation coefficient (Pearson product-moment correlation coefficient): A correlation statistic used to measure a linear relationship between variables that have interval- or ratio-level measurements (see the correlation sketch following this glossary).
Point estimate: A parameter estimate that is restricted to a single value of the variable of interest.
Positive correlation: A relationship in which two variables change in the same direction, either both increasing or both decreasing.
Positively-skewed: A distribution in which the tail on the right side of the distribution is longer than the tail on the left side.
Posterior probabilities: In Bayesian inference, the probability distribution that describes the pattern of beliefs in possible hypotheses after new evidence has been taken into account.
Predictor variable: A variable used in regression and other analyses to predict the value of an outcome variable based on the strength and direction of its association with the outcome; the term sometimes is used interchangeably with independent variable.
Prior probabilities: In Bayesian inference, the probability distribution that describes the pattern of beliefs in possible hypotheses before new evidence has been taken into account.
Probability: According to the frequentist view, a value between 0 and 1, inclusive, that represents the long-run relative frequency that a process will yield a particular event; according to the Bayesian view, a value between 0 and 1, inclusive, that represents a degree of belief.
Probability distribution: A representation of the set of probabilities assigned to the possible values of a variable.
Qualitative variables (categorical variables): Variables that have values that are types, kinds, or categories and that do not have any numeric properties.
Quantitative variables: Variables that have values with numeric properties such that, at a minimum, differences among the values can be described in terms of “more” or “less.”
Range: The distance between the minimum and maximum values in a data set; it is calculated by subtracting the minimum from the maximum.
Ratio scale: A scale of measurement where the values are ordered, with evenly spaced increments and a meaningful zero.
Regression analysis: Any of several statistical techniques that are used to describe, explain, or predict (or all three) the variance of an outcome or dependent variable using scores on one or more predictor or independent variables.
Sampling error: The difference between a sample statistic’s estimate of a population parameter and the actual value of the parameter.
Scales of measurement (levels of measurement): The four levels by which variables are categorized: nominal, ordinal, interval, and ratio.
Scatterplot: A graph that displays the individual data points in a sample for two interval- or ratio-level variables, one plotted on the x-axis and the other on the y-axis.
Standard deviation: A measure of the typical deviation from the mean. It is calculated by taking the square root of the variance.
Statistical power: The ability to detect a difference, or relationship, if such a difference or relationship exists.
Statistically significant: A result is statistically significant when it is improbable under the assumption that the null hypothesis is true.
t-Distribution: A theoretical probability distribution that plays a central role in testing hypotheses about population means, among other parameters; also called Student’s t-distribution.
t-Test: A statistical test that is used to test hypotheses about one or two means.
Two-way ANOVA (two-factor ANOVA): An analysis of variance design that isolates the main effects of two independent variables, A and B, and their interaction effect, A × B, on a dependent variable.
Type I error: A statistical error in which the hypothesis test decision was to reject the null hypothesis when the null was actually true; this is considered a false positive, the probability of which is represented by α (alpha).
Type II error: A statistical error in which the hypothesis test decision was to fail to reject the null hypothesis when the null was actually false; this is considered a false negative, the probability of which is represented by β (beta).
Variability (dispersion): The amount of variation, or difference, in the values of a variable measured for a sample or population.
Variables: Characteristics of an entity, person, or object that can take on different categories, levels, or numerical values.
Variance: A measure of the average squared deviations from the mean.
Within-groups design (repeated-measures design): An experimental design in which the effects of treatments are seen through the comparison of scores from the same participant observed under all the treatment conditions.
Zero correlation: A relationship in which changes in the values of one variable are not related to changes in the values of the other variable.
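To make a few of the computational definitions above concrete, the short Python sketches that follow work through small, made-up data sets. All values and variable names are hypothetical, chosen only for illustration, and none come from any study cited in this guide. This first descriptive-statistics sketch computes the mean, median, mode, deviation scores, variance, standard deviation, and range for a single sample.

```python
# Descriptive statistics for a small, hypothetical set of quiz scores.
import statistics

scores = [4, 7, 7, 8, 9, 10, 11]
n = len(scores)

mean = sum(scores) / n                          # add all scores, divide by how many there are
median = statistics.median(scores)              # middle value of the sorted scores
mode = statistics.mode(scores)                  # most frequently occurring value

deviations = [x - mean for x in scores]         # deviation scores: x - M
variance = sum(d ** 2 for d in deviations) / n  # average squared deviation (use n - 1 for the sample estimate)
sd = variance ** 0.5                            # standard deviation: square root of the variance
score_range = max(scores) - min(scores)         # range: maximum minus minimum

print(mean, median, mode)                       # 8.0 8 7
print(variance, sd, score_range)                # about 4.57, about 2.14, 7
```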
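The correlation sketch below computes a Pearson correlation coefficient directly from its definition: the sum of cross-products of deviation scores divided by the square root of the product of the two sums of squared deviations. The hours-studied and quiz-score values are invented for the example.

```python
# Pearson correlation coefficient for two made-up interval/ratio variables.
from math import sqrt

hours = [1, 2, 3, 4, 5]
score = [3, 5, 4, 8, 10]

n = len(hours)
mx = sum(hours) / n
my = sum(score) / n

sp = sum((x - mx) * (y - my) for x, y in zip(hours, score))  # sum of cross-products of deviations
ssx = sum((x - mx) ** 2 for x in hours)                      # sum of squared deviations for x
ssy = sum((y - my) ** 2 for y in score)                      # sum of squared deviations for y

r = sp / sqrt(ssx * ssy)
print(round(r, 3))   # about .92: a strong positive correlation (r always falls between -1 and +1)
```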
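The t-test sketch runs a one-sample t-test by hand, cross-checks it against scipy, and then builds a 95% confidence interval from the margin of error. The sample scores and the null-hypothesis population mean of 7.0 are assumptions made up for the example, and scipy must be installed separately.

```python
# One-sample t-test and 95% confidence interval on hypothetical quiz scores.
import statistics
from math import sqrt
from scipy import stats

scores = [6, 8, 9, 7, 10, 8, 9, 7]
mu0 = 7.0                                   # population mean specified by the null hypothesis

n = len(scores)
m = statistics.mean(scores)                 # sample mean
s = statistics.stdev(scores)                # sample standard deviation (n - 1 in the denominator)
se = s / sqrt(n)                            # estimated standard error of the mean
df = n - 1

t_stat = (m - mu0) / se                     # one-sample t statistic, about 2.16 here

result = stats.ttest_1samp(scores, mu0)     # scipy's version of the same test
print(round(t_stat, 3), round(result.statistic, 3), round(result.pvalue, 3))

t_crit = stats.t.ppf(0.975, df)             # critical t value for a two-tailed 95% interval
margin = t_crit * se                        # margin of error
print(round(m - margin, 2), round(m + margin, 2))   # roughly 6.9 to 9.1 for these scores
```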
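Finally, the ANOVA sketch compares three hypothetical groups with a one-way between-groups ANOVA using scipy's f_oneway. The group labels and scores are invented; a full analysis would also report an effect size and any follow-up comparisons.

```python
# One-way between-groups ANOVA on three small, hypothetical groups.
from scipy import stats

group_a = [5, 6, 7, 6, 5]
group_b = [6, 7, 8, 7, 6]
group_c = [8, 9, 9, 10, 8]

f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)  # F statistic and its p value
print(round(f_stat, 2), round(p_value, 4))
```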