Tuesday 28 October 2014

aRrrrrrrrgghhh!

A recent excursion into "R" has left me raging, embarrassed and tired.

Many people I respect and like use R in their work and like the way it works. It is backed up by "RStudio" as well as help files and a large user community. After an enormous amount of frustration caused by Pandas for Python, I thought I might try R instead.

Let's start with the good points. RStudio is a great way to use R. It is for R what "Spyder" would like to be (but fails to be) for Python. It is considerably more stable and smooth than Spyder, and it looks good too. RStudio provides some genuinely useful tricks for coding, such as running only the highlighted text in a script file, or stepping through a script line by line.

R itself comes with many packages and tools, especially for statistics. This is the main focus of R: statistics for ecologists and biologists. This focus both explains and is explained by its strong connection with these research communities. Plotting (at least with the standard "plot" function and not the hideous "ggplot") is simple and elegant, and much easier than "matplotlib" in Python.

The bad points. If you are new to scripting, then R is probably fine; you will learn its "psychology" and work with it. If you have learnt any other language first, then R will feel unreasonably and pointlessly quirky and counter-intuitive. Indexing is confusing. Lists have names for their elements, which might set them apart from vectors, except that vectors can have names too. Many people have spent many hours developing the packages that give R all its functionality, but why R? It doesn't provide any data structure that is superior to those of other scripting languages. It's not as if "factors" provide a quick and easy mechanism for subsetting and analysis; they make it no easier than in other languages. The help files are many and mostly useless for the newbie. They read and look like poorly written, first-draft man pages.

I suppose my biggest complaint about R is "what's the point?" All that time and effort could have been put into another language, and the statistics would work just as well. As I have said, R provides no intrinsic advantage for statistical calculations; it is simply that engaged enthusiasts have written the libraries in R. Learning another language is fine, but in this case the language sits at an uncomfortable angle to most others and provides no clear advantage. That makes learning it frustrating, with no real incentive to persevere. I suppose I shall just have to learn to accept Pandas, with its idiotic timestamp behaviour, and Python's general love of bloating itself with ever more object types for no good reason whatsoever.

Or just stop bitching about this sort of thing.