Tuesday, 28 October 2014

aRrrrrrrrgghhh!

A recent excursion into "R" has left me raging, embarrassed and tired.

Many people I respect and like use R in their work and like the way it works. It is backed up by "RStudio" as well as help files and a large user community. After an enormous amount of frustration caused by Pandas for Python I thought I might try R instead.

Let's start with the good points. RStudio is a great way to use R. It is for R what "Spyder" would like (but fails) to be for Python. It is considerably more stable and smooth than Spyder and looks good too. RStudio provides some pretty useful tricks for coding, such as running only the selected text in a script file or stepping through it line by line.

R itself comes with many packages and tools, especially for statistics. This is the main focus of R: statistics for ecologists and biologists. This explains and is explained by the strong connexion with these research communities. Plotting (at least with the standard "plot" function and not the hideous "ggplot") is simple and elegant, much easier than "matplotlib" in Python.

The bad points. If you are new to scripting then R is probably fine; you will learn its "psychology" and work with it. If you have learnt any other language first then R will feel unreasonably and pointlessly quirky and counter-intuitive. Indexing is confusing. Lists have names for their elements, which might set them apart from vectors, except that vectors can have names too. Many hours have been spent by many people developing the packages for R, which provide all that functionality, but why R? It doesn't provide any data structure that is superior to those of other scripting languages. It's not as if "factors" provide a uniquely quick and easy mechanism for sub-setting and analysis; they are certainly no easier than the equivalents in other languages. The help files are many and mostly useless for the newbie. They read and look like poorly written, first-draft man pages.
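As a rough illustration of the sub-setting point, here is a minimal sketch in plain Python (standard library only) of splitting measurements by a categorical label and summarising each group, which is the sort of job factors are used for in R. The sample data and the function names are invented for this example; it is not meant as a like-for-like replacement for any particular R function.

```python
from collections import defaultdict
from statistics import mean

# Invented sample data: (site label, measurement) pairs.
observations = [
    ("forest", 4.0),
    ("meadow", 2.5),
    ("forest", 6.0),
    ("bog", 1.0),
    ("meadow", 3.5),
]


def group_by_label(pairs):
    """Collect the values recorded under each categorical label."""
    groups = defaultdict(list)
    for label, value in pairs:
        groups[label].append(value)
    return dict(groups)


def group_means(pairs):
    """Return the mean of the values for each label."""
    return {label: mean(values) for label, values in group_by_label(pairs).items()}


groups = group_by_label(observations)
means = group_means(observations)

# Sub-setting on a single level of the label is a one-line filter.
forest_only = [value for label, value in observations if label == "forest"]

print(means)
```

Whether this is nicer than the factor-based equivalent is a matter of taste; the sketch only shows that nothing special is needed in the language itself.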

I suppose my biggest complaint about R is "what's the point?" All that time and effort could have been put into another language and the statistics would work just as well. As I have stated, R provides no intrinsic advantage for statistical calculations; it is simply that engaged enthusiasts have written the libraries in R. Learning another language is fine, but in this case the language sits at some uncomfortable angle to most others and provides no clear advantage, which makes learning it frustrating and leaves no real incentive to persevere. I suppose I shall just have to learn to accept Pandas and its idiotic timestamp behaviour and Python's general love of bloating with ever more object types for no good reason whatsoever.

Or just stop bitching about this sort of thing.

2 comments:

  1. I agree with you that R is awkward and clumsy, but the real advantage, as you write, is that many important methods are written as R packages by very competent people. This alone is often a good enough excuse not to invest a lot of time and effort in coding the methods myself. Another aspect is that R allows you to pack everything, including data, into a package, thus contributing to reproducible science.

    1. Certainly, and it is also true that R is useable; it's just that it doesn't provide enough of an incentive to move away from Python and (I hate to say this) Pandas. The data/code package is little incentive, really, when you consider that the same can easily be achieved in other languages or with other methods. I would argue rather that code and data should be kept separate if one wishes to aid reproducibility.
