Data and Statistical Analysis

There are so many statistical software packages that it is hard to choose. Should you use SAS, SPSS, Stata, Statistica, SigmaPlot, R, or even Excel for your data and statistical analysis?

We are big advocates of using R. More than a piece of software, R is a language and environment for data, statistics, and graphics. It can perform just about any statistical technique (linear and nonlinear modelling, classification, clustering, time-series analysis, etc.) and be used to create graphics for research publication.

Any statistical software has a steep learning curve, but R has a reputation for being especially difficult at first. We encourage everyone to stick with it. If you are nervous about coding, try R Commander, a graphical user interface (GUI) for R. When you get more comfortable, you can transition to RStudio (an integrated development environment for R) and swirl (an R package that teaches R interactively within the console).

After the initial bumps, you will find using R greatly speeds up your data and statistical analysis, in addition to its other advantages. In this guide, we will discuss some of these other benefits of using R.

Why Choose R: Cost

One of the chief advantages of using R is the cost. R is free! Although many statistical software packages offer large discounts for students, after graduation you could pay thousands of dollars per year to continue using the software. For example, SAS is around $10,000 per year, while SPSS could cost about $200 per month.

Most statistical software packages also charge per installation. So, if you want to do statistics on the computer in your research lab and on your personal laptop, you will need to pay twice. Not with R. Again, R is free, and it can be installed on Windows, Mac, and Linux. You can install R on any computer, whatever your budget.

Why Choose R: Open Source

Another big advantage of R is that it is open source. This means that anyone can modify and share R. Proprietary software is “closed”, and only the company that owns it can make changes. If an update to proprietary software breaks your data and statistical analysis, you will have to wait for the next update or roll back to an earlier version. With R, the source code is available, so you can fix bugs yourself.

Since R is open source, a lot of effort goes into creating, modifying, and sharing packages. These packages provide useful functions for data and statistical analysis and for graphics. The popular packages caret (for creating predictive models) and ggplot2 (for creating plots) might not have been created if R were not open source. There are currently over 10,000 available packages, and more are being added every day.
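As a minimal sketch of how packages are obtained in practice, the snippet below checks whether the two packages named above are installed. The variable names are illustrative, and the install step is left commented out because it downloads from the internet.

```r
# Packages mentioned in the text
pkgs <- c("caret", "ggplot2")

# Compare against the packages already installed on this machine
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to download and install
}

# Once installed, a package is loaded into a session with library(), e.g.:
# library(ggplot2)
```

Installation is a one-time step per computer; library() is called once per R session.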

Additionally, if you have a new procedure for looking at your data, you can create your own package in R. You would likely be out of luck trying to get any of the proprietary software to incorporate your new analytics procedure. Imagine trying to tell Facebook that they should use a different procedure for showing ads. They would ignore you.

Why Choose R: Popular

More than 2 million people use R. That means that if you get stuck when using R (and you will), you can search for the error message online and usually find a fix. Similarly, if you are trying to do a specific task, search for it: instructions likely already exist, and a package may already provide the functions you need.

The popularity of R also means that it will be around for the long term. It does not make sense to invest time in software that will disappear in a few years, because that time will be wasted. R is very popular with statisticians and is one of the go-to tools for advanced analytics and machine learning.

Why Choose R: Publication Ready Graphics

If you have ever used Excel, then you know that the standard plots are not ready for research publication. They require a lot of modification (colours, axes, font, size, error bars, etc.), and each individual plot has to be modified separately. That takes a lot of time and mouse clicks. With R, a couple of lines of code can create the plots you need with all the customizations. Copy the code, change the variables, and you have a similar plot with minimal effort compared to Excel.
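The "copy the code, change the variables" idea can be sketched in base R as follows. The helper name styled_scatter and the particular styling choices are hypothetical, chosen only for illustration; the iris dataset ships with R.

```r
# Hypothetical helper: one set of styling choices, reusable for any
# pair of numeric columns in a data frame.
styled_scatter <- function(data, x, y) {
  plot(data[[x]], data[[y]],
       xlab = x, ylab = y,            # axis labels taken from column names
       pch = 19, col = "steelblue",   # solid points in a single colour
       cex.lab = 1.2, las = 1,        # larger labels, horizontal tick text
       bty = "l")                     # draw only the left and bottom axes
  invisible(NULL)
}

# Same customizations, different variables
styled_scatter(iris, "Sepal.Length", "Sepal.Width")
styled_scatter(iris, "Petal.Length", "Petal.Width")
```

Because the styling lives in one place, changing it once updates every plot the next time the script is run.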

R also makes it easier to create less common plots, such as a cumulative distribution function or a scatterplot matrix. Below is a scatterplot matrix of the iris dataset, which is included in R and contains measurements of sepal and petal width and length (in cm). This figure was created in R in less than 1 minute with 2 lines of code:

[Figure: scatterplot matrix of the iris dataset]
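The code behind the figure is not given here. One way to draw a comparable scatterplot matrix, assuming base R graphics and an arbitrary choice of colours for the three species, is:

```r
# Map each species to a colour (the colour choice is an assumption)
species_cols <- c("red", "darkgreen", "blue")[as.integer(iris$Species)]

# Scatterplot matrix of the four measurement columns
pairs(iris[, 1:4], col = species_cols, pch = 19)
```

The other plot type mentioned above, an empirical cumulative distribution function, takes a single line, for example plot(ecdf(iris$Sepal.Length)).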


R simplifies research publication by simplifying data and statistical analysis and figure preparation. Compared to other statistical software packages, R is free, open source, will be around a long time, and has robust analysis and graphical capabilities. It deserves your serious consideration.