Choosing R over Python

In this post I’ll set out my view on why you may want to learn to code in R rather than in Python, or at least give R a try. At the end of the post I also share some suggestions for getting started with either language.

 

Some background on my programming history (feel free to skip):

I’ve done bits of programming for quite a while. Although I was never a star programmer, and for a long time couldn’t do anything sophisticated, I have at one point or another written some code in QBASIC, Turbo Pascal, MATLAB, C, C++ and PHP.

I’ve dabbled in a bit of JavaScript, HTML and CSS – even going so far as to write my own media queries in CSS before I knew what responsive frameworks were!

Although not proficient in any of the above, I could do the necessary stuff for coursework and personal websites.

I’m a fair hand at Excel and, some might say, pretty good at it. Yes, it is kind of a programming language, in the sense that it is more like programming than it is not.

After all of this, when it came to wanting to do very cool stuff, I discovered the world of Python and really got into it – Python 3 that is 🙂 . I read many articles, did online courses, listened to every Talk Python To Me episode and dabbled around enough to do some data cleaning and a bit of visualisation. I even bought a Raspberry Pi!

So you can see I am not stumbling into R completely blindly when I make my R-over-Python argument.

This brings me on to R. I somehow ended up following a few people like Jesse Mostipak and Mara Averick on Twitter, and they made R look so fun and welcoming (not that Python isn’t) that I decided to give it a try. Wow, mind blown! After all the bits and pieces I had done in other languages, R with its Tidyverse set of packages really makes a lot of sense to me.

 

Before vs after Tidyverse

When I first started with Python, I had read many comments and articles stating that R had a steeper learning curve than Python, but I think this must be a reference to R before the days of the Tidyverse. The Tidyverse is the name for a collection of packages which all work together really well, with a common syntax across them. I can’t tell exactly when the Tidyverse was released, but I would guess that it has come into common use in the last two years or so, say from 2017 onwards. Last year I noticed some “should we teach base R or Tidyverse first” discussions online, so it’s pretty recent.

My experience is that R has a less-steep learning curve than Python.

 

Data analysis and visualisation

If your main aim in programming is to manipulate and visualise data, then you should give R a try, especially if you are coming from an Excel-heavy background. I find visualisation much easier in R. My first participation in the TidyTuesday challenge really blew my mind: in a few lines of code I was able to do what would have been quite a painful experience in Python.

Again, the reason is the Tidyverse. All the packages for manipulating and analysing data share a common underlying philosophy and syntax, which makes them a real pleasure to work with. In contrast, if you want to make a decent visualisation in Python, you’ll need to string a number of packages together (Pandas, Seaborn & Matplotlib), which are just different enough to make it tough to customise.
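
To make the Python side of that comparison concrete, here is a minimal sketch of a simple bar chart that touches all three packages. The data, the column names and the chart labels are made up purely for illustration, and this is only one of several ways to do it. Notice how each package has its own style: Pandas chains methods on the data frame, Seaborn takes the data frame and column names as function arguments, and Matplotlib customises through methods on an Axes object.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, invented for this illustration only.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [120, 135, 90, 110],
})

# Pandas: summarise by chaining methods on the data frame.
totals = sales.groupby("region", as_index=False)["revenue"].sum()

# Seaborn: a function that takes the data frame plus column names.
fig, ax = plt.subplots(figsize=(6, 4))
sns.barplot(data=totals, x="region", y="revenue", ax=ax)

# Matplotlib: customise through methods on the Axes object.
ax.set_title("Revenue by region")
ax.set_xlabel("Region")
ax.set_ylabel("Revenue")
fig.tight_layout()
```

Calling fig.savefig with a file name of your choice would write the chart to disk.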

 

Software engineering vs data analysis

I think the core difference comes down to the different starting points of the two languages. Python comes at data science from a software engineering perspective, whereas R comes from a data analysis perspective, so it isn’t surprising that R seems more comfortable handling data for analysis and visualisation.

 

Why choose?

Actually, you don’t have to. There isn’t a Python vs R holy war, even though some people may be trying to make one happen. You can use one, or the other, or both. Of course, the more time you spend on one, the more proficient you’ll be at it. My suggestion is that if you are coming from Excel and want to do analysis and visualisation, try R. You will probably love it.

 

Getting started with R

New to R or haven’t found a resource that “clicks” yet?

Have a look at the R4DS (R for Data Science) online book. It’s excellent and free. I didn’t get very far before I was hooked. I haven’t even completed half of it and I can already do some cool stuff.

After that, take part in the TidyTuesday challenge on Twitter. It is very fun and very welcoming to newcomers – be sure to let us know if it’s your first post!

I have started a YouTube channel called “Other People’s Rstats” with screencasts covering R packages, tips and TidyTuesday posts. I highly recommend it :D.

 

Getting started with Python

The first step I suggest, especially for non-programmers, is to go through the online version of the amazing Automate the Boring Stuff with Python book. It includes a code for a massive discount on the Udemy course. This is a great intro to the power of Python to automate things, and it makes learning the language really engaging.
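
To give a flavour of the kind of small automation task I mean, here is a minimal sketch that adds a prefix to every text file in a folder. It is my own illustration, not an example taken from the book. The add_prefix function, the file names and the "2019_" prefix are all hypothetical, and the script works in a temporary folder so it can’t touch any real files.

```python
import tempfile
from pathlib import Path


def add_prefix(folder, prefix):
    """Rename every .txt file in folder so its name starts with prefix."""
    renamed = []
    # sorted() collects the matches first, so renaming inside the loop is safe.
    for path in sorted(Path(folder).glob("*.txt")):
        if path.name.startswith(prefix):
            continue  # already renamed, leave it alone
        target = path.with_name(prefix + path.name)
        path.rename(target)
        renamed.append(target.name)
    return renamed


# Try it on a throwaway folder with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["notes.txt", "todo.txt", "photo.jpg"]:
        (Path(tmp) / name).write_text("example")
    renamed = add_prefix(tmp, "2019_")
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(renamed)
print(remaining)
```

Only the two .txt files are renamed; the .jpg file is left as it was.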

The next step is another great online resource: Kevin Markham’s YouTube series introducing the Pandas library. Pandas makes data wrangling really fun, and I wouldn’t dare do any data wrangling in Python without it. Kevin is a really great instructor.

To immerse yourself in Python, I also recommend listening to the Talk Python To Me podcast.

 

In conclusion

Don’t worry too much about which language you start with. Your decision is not set in stone.

Any learning you do will be valuable and there are many transferable skills if you do decide to switch.