Investigator-initiated trials are conducted to answer a research question or test a hypothesis about a general population. Since it would be too expensive and time-consuming to test the entire population, we can only sample a portion of it and hope that the sample is representative. We can use descriptive statistics to describe the sample, and then use inferential statistics, which we will discuss in this article, to make predictions about the entire population.
Inferential Statistics
As the name implies, inferential statistics allows you to use sample statistics to make predictions about the population. To make accurate predictions, the sample needs to be truly representative of the population. However, since the sample is often orders of magnitude smaller than the population, there will always be some degree of error. This sampling error is the difference between the actual population values, also known as parameters, and the corresponding values measured in the sample. Therefore, it is critical that the sample is randomly selected and unbiased to reduce this error as much as possible. In addition, selecting an adequate sample size (read our previous blog on sample size calculation) will help ensure that your inferences are accurate.
Estimating the Population
A statistic is a measure used to describe the sample, whereas a parameter describes the population. We can use statistics to estimate population parameters using point estimates and interval estimates. A point estimate is a single value, such as a sample mean used to estimate the population mean. An interval estimate, also known as a confidence interval, is a range of values within which the population parameter is expected to fall.
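To make the difference between the two kinds of estimate concrete, here is a minimal sketch in Python. The sample values and the function name mean_with_interval are made up for illustration, and the interval uses a normal (z) approximation, which is an assumption on our part; a t-based interval would be slightly wider for a sample this small.

```python
import math
import statistics

# Hypothetical sample: comfort scores from 12 participants (made-up values).
sample = [7.1, 6.8, 7.4, 6.9, 7.2, 7.0, 6.5, 7.3, 7.6, 6.7, 7.1, 7.0]


def mean_with_interval(values, confidence=0.95):
    """Return (point_estimate, lower, upper) for the population mean.

    Assumption: normal (z) approximation for the critical value.
    """
    n = len(values)
    point = statistics.mean(values)  # point estimate of the population mean
    std_error = statistics.stdev(values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * std_error
    return point, point - margin, point + margin


point, lower, upper = mean_with_interval(sample)
print(f"point estimate: {point:.2f}")
print(f"95% interval estimate: {lower:.2f} to {upper:.2f}")
```

The point estimate is a single number, while the interval estimate adds a margin on either side that shrinks as the sample size grows.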
Confidence Intervals
A confidence interval takes into account sampling error and the variability associated with a statistic. For example, a 95% confidence level means that if you were to conduct the clinical study 100 different times and calculate an interval each time, about 95 of those intervals would be expected to contain the true population parameter. However, it’s important to keep in mind that a confidence interval only gives you a range, not where exactly within that range the actual value lies.
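The repeated-study reading of a 95% confidence interval can be checked with a small simulation. The sketch below is illustrative only: the population mean and standard deviation, the sample size, and the number of simulated studies are all made-up assumptions, and the interval again uses a normal approximation. It draws many samples from a known population and counts how often the computed interval contains the true mean.

```python
import math
import random
import statistics


def interval_for_mean(values, z):
    """Normal-approximation confidence interval for the mean."""
    centre = statistics.mean(values)
    margin = z * statistics.stdev(values) / math.sqrt(len(values))
    return centre - margin, centre + margin


def coverage(true_mean=50.0, true_sd=10.0, n=40, studies=2000, seed=1):
    """Fraction of simulated studies whose 95% interval contains true_mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = statistics.NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = interval_for_mean(sample, z)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / studies


print(f"share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should land close to 0.95. Any single interval either contains the true mean or it does not; the 95% describes the procedure over many repetitions.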
Hypothesis Testing
We can use inferential statistics to test a hypothesis or prediction about a particular population. These statistical tests can be classified as parametric or nonparametric. Parametric tests are generally more statistically powerful, but only when the data meet several assumptions:
- The population is normally distributed
- The sample size is large enough to represent the population
- The variances within each group are similar
If any of these assumptions is not met, then a nonparametric test should be used.
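Many nonparametric tests avoid the normality assumption by working with ranks rather than raw values. As a rough sketch, the code below computes the Mann-Whitney U statistic for two independent groups by hand. The ratings are made-up data and the function names are our own; the sketch stops at the statistic and does not calculate a p-value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent groups."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])  # sum of the ranks held by group A
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Hypothetical comfort ratings (1 to 4) for two lens types.
lens_a = [1, 2, 2, 3, 4]
lens_b = [2, 3, 4, 4, 4]
print(mann_whitney_u(lens_a, lens_b))
```

Because only the ordering of the values matters, the result is unchanged if the ratings are rescaled, which is why this kind of test suits ordinal data.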
Types of Variables
The type of statistical analysis suitable for your study depends on the type of variables contained in your data set. Variables can be described as nominal, ordinal, interval, or ratio.
- Nominal variables, also commonly called categorical variables, are names or labels with no specific order. For example, gender and type of symptom are nominal variables.
- Ordinal variables are similar to nominal variables but have a clear order. For example, a survey questionnaire asks participants to rate their contact lens comfort level (1-Uncomfortable, 2-Neutral, 3-Comfortable, 4-Very Comfortable).
- Interval variables are similar to ordinal variables except that the values are equally spaced apart. Examples of interval variables are measurements of temperature and dioptres.
- Ratio variables are similar to interval variables except that they have a true zero point, so the values cannot fall below zero. Some examples are age, weight, and height.
Comparative Statistics
Inferential statistical tests can be comparative, correlational, or regression-based in nature. Comparative tests are used to compare a point estimate, such as a mean, between two or more groups. The choice of test depends on the data type and whether the data meet the requirements for a parametric test. Most often, a t-test is used to compare two samples, and an analysis of variance (ANOVA) is used to compare three or more. For more information on which test you should use, refer to Table 1 below.
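As an illustration of what a comparative test calculates, the sketch below computes the test statistic for a two-sample t-test by hand. It assumes the pooled-variance (Student's) form, which relies on the two groups having similar variances; the data and the function name pooled_t_statistic are made up. Turning the statistic into a p-value requires the t distribution, which a statistics package will handle for you.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_error, df


# Hypothetical comfort scores for two eye drop formulations.
drop_a = [5.1, 4.9, 5.6, 5.8, 5.2, 5.4]
drop_b = [4.4, 4.8, 4.6, 5.0, 4.3, 4.7]
t, df = pooled_t_statistic(drop_a, drop_b)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The statistic is the difference in means divided by its standard error, so a larger absolute value indicates a difference that is large relative to the variability within the groups.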
Correlational and Regression Analysis
Correlation tests are used when you need to measure how strongly two variables relate to one another. For instance, you can use a correlation test to understand how closely the amount of active ingredient in an eye drop correlates with comfort. Correlation tests, however, do not tell you anything about cause and effect between two variables, only that they are related. If you need to describe or predict how a dependent (outcome) variable changes as an independent variable (predictor) changes, then use a regression analysis. Please refer to Table 1 below for more information.
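To show how the two analyses differ in what they return, here is a small sketch that computes Pearson's r and fits a simple least-squares line by hand. The concentration and comfort values are invented for illustration, and the function names are our own.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares (slope, intercept) for y = intercept + slope * x."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: active ingredient (%) and mean comfort score.
concentration = [0.1, 0.2, 0.3, 0.4, 0.5]
comfort = [5.0, 5.9, 6.4, 7.2, 7.8]

r = pearson_r(concentration, comfort)
slope, intercept = fit_line(concentration, comfort)
print(f"r = {r:.3f}")
print(f"comfort = {intercept:.2f} + {slope:.2f} * concentration")
```

The correlation coefficient summarises the strength of the relationship in a single number between -1 and 1, whereas the regression returns an equation that can be used to predict the outcome for a given predictor value.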
Table 1: Common types of statistical tests for comparison, correlation, and regression analysis
Comparison test | Comparison variable | Parametric
t-test | Means (2 samples) | Yes
ANOVA | Means (3+ samples) | Yes
Mood’s median | Median (2+ samples) | No
Wilcoxon signed-rank | Distributions (2 samples) | No
Mann-Whitney U | Sum of rankings (2 samples) | No
Kruskal-Wallis H | Mean rankings (3+ samples) | No

Correlation test | Variable type | Parametric
Pearson’s r | Interval/ratio | Yes
Spearman’s r | Ordinal/interval/ratio | No
Chi-square test of independence | Ordinal/nominal | No

Regression test | Variable type | Parametric
Simple linear regression | Interval/ratio | Yes
Multiple linear regression | Interval/ratio | Yes
There are a number of programs that you can use to run your statistical analysis, including Excel, Statistica, R, and GraphPad Prism, to name a few. If you need more help in running your statistical analysis or interpreting the data, please contact Sengi.