Wednesday 29 January 2014

Importing rasters into Python

I have been prompted to post a little piece on importing rasters into Python using GDAL.
The following code is something I use personally and may not be suited to everyones needs but it should be easy to adapt it.

def rasterImport(file):
    '''Import a raster file and return grid plus meta data. Returns three objects: data, meta, metadata '''
    # Tell the user that the function is working
    print '\nImporting ',file
    # Register all of the GDAL drivers
    gdal.AllRegister()
    # Open the file
    raster = gdal.Open(file)
    if raster is None:
        print 'Could not open ',file,'\n'
        sys.exit(1)
    # Get coordinate system and affine transformation parameters
    projec = raster.GetProjection()
    srs = osr.SpatialReference(wkt=projec)
    transf = raster.GetGeoTransform()
    # Get meta data
    # Get short name and then long name of driver
    driveshrt = raster.GetDriver().ShortName
    driver = gdal.GetDriverByName(driveshrt)
    metadata = driver.GetMetadata()
    metakeys = metadata.keys()
    # Read the file band to a 'matrix' called band
    band = raster.GetRasterBand(1)
    # Access data in rastern band as array
    data = band.ReadAsArray(0,0,cols,rows)
    # gdal_grid creates "upside down" files, hence this
    Yres=yres
    if Yres/np.fabs(Yres)==-1:
        Yres=-1*Yres
        data = np.flipud(data)
    # From here you can 'return' the data to the main routine
    # and skip the rest of this.
    #
    # Get the upper left coordinates and the pixel resolution
    ul_x = transf[0]
    ul_y = transf[3]
    xres = transf[1]
    yres = transf[5]
    # Get image size
    rows = raster.RasterYSize
    cols = raster.RasterXSize
    dims = {'xres':xres,'yres':yres,'rows':rows,'cols':cols}
    # Calculate all corners
    ll_x = ul_x
    ll_y = ul_y + (rows * yres)
    ur_x = ul_x + (cols * xres)
    ur_y = ul_y
    lr_x = ur_x
    lr_y = ll_y
    # Make dictionary of corners
    corners = {'ll':(ll_x,ll_y),'ul':(ul_x,ul_y),'ur':(ur_x,ur_y),'lr':(lr_x,lr_y)}
    # Get nodata value
    nandat = band.GetNoDataValue()
    # Get minimum and maximum value
    mindat = band.GetMinimum()
    maxdat = band.GetMaximum()
    # If the stats aren't available yet calculate the min-max
    if mindat is None or maxdat is None:
            (mindat,maxdat) = band.ComputeRasterMinMax(1)
    dats = [nandat,mindat,maxdat]
    # This next bit sums the values for a volume calculation
    sumvals = data[np.where(np.logical_not(data == nandat))]
    sumval = sumvals.sum()
    numvals = len(sumvals)
    avval = sumval/numvals
    volvals = sumval*(math.fabs((transf[1]*transf[5])))
    # Organise the metadata for export
    meta = {'transform':transf,'corners':corners,'dimension':dims,'dats':dats,'sumval':sumval,'numvals':numvals,'avval':avval,'volvals':volvals}
    # Tell user about the file
    if srs.IsProjected:
        print 'projcs 1: ',srs.GetAttrValue('projcs')
        print 'projcs 2: ',srs.GetAuthorityCode('projcs')
        print 'projcs 3: ',srs.GetAuthorityName('projcs')
    print 'geogcs 1:: ',srs.GetAttrValue('geogcs')
    print 'Sum of pixel values = ',sumval,' in range ',mindat,' to ',maxdat
    print 'From ',numvals,' of ',rows,' X ',cols,' = ',rows*cols,' pixels, or ',((1.0*numvals)/(rows*cols))*100,'%'
    print 'Average then is ',avval
    print 'Pixels size, x:',transf[1],' y:',transf[5]
    print '"Volume" is then the sum * 1 pixel area: ',volvals
    # Return the data matrix plus metadata to main routine
    return data, meta, metadata

That's it. Not the greatest, most efficient code but it does what I want.

Monday 27 January 2014

Hidden files

Tonight's post isn't strictly GIS, but then not much is. This is really just a post about a short but handy script I've written for showing and hiding invisible files.

In case you didn't know OSX has a large number of hidden files even in the folders that you use. You may have encountered them before in the terminal when using the -a flag on ls.

ls and ls -a 

As you can see, there are several dot-files on my desktop that don't show up with ls or in Finder (normally). The single dot is a stand-in for "here" and the double dot is a stand-in for "one up" or the directory containing this directory. These two are common to all (I think) Unix like systems. The ".DS_Store" file is OSX own file for keeping track of what is happening in that folder. These three will exist in all folders created under OSX and ".DS_Store" will be created by OSX when it opens a folder no matter who created it (this can be turned off for searching through servers).
So how do we get Finder to show these files? Well, we can type

defaults write com.apple.finder AppleShowAllFiles TRUE
killall Finder

into the terminal and that will do it. To hide them again do the following:

defaults write com.apple.finder AppleShowAllFiles FALSE
killall Finder

That's a start but it would be nice to not have to remember all that. We could create aliases for both but another way, a way that introduces Bash scripting is the following

#!/bin/sh
echo "Show or hide dot files"
select yn in "show" "hide"; do
    case $yn in
        show ) defaults write com.apple.finder AppleShowAllFiles TRUE; break;;
        hide ) defaults write com.apple.finder AppleShowAllFiles FALSE; break;;
    esac
done
#
killall Finder

Enter the above text as is into a text file, save it as "hidden.sh" in a directory that is on your path e.g. /usr/local/bin or even better, create your own Bash script folder and put that on the path (my post on paths). You then need to change the permissions for the script so that it can be executed (run as a programme). In the terminal move to the folder where you saved the script, the folder that is now on your path and type

chmod 751 hidden.sh

This command changes the permission for you, the owner to 7, the user called "group" to 5 and others to 1. What does that mean, well in short, 7 means "read, write and run", 5 means "read and run" and 1 means "run". See here for more info.

The "hidden.sh" script in action









So that's it, it works.

Thursday 23 January 2014

gdaltransform has a sed ending

What I wanted this post to be about was how to transform coordinates in a csv file from one coordinate system to another using gdaltransform. But in order to get a nicely functioning code I had to jump through some hoops.

gdaltransform is a command line tool for transforming coordinates

gdaltransform in action
Here I transform RT90 2,5g V to SWEREF99 TM. As you can see, I give the source coordinate system and the target coordinate system then hit return. I can then paste/type coordinates separated by spaces and a coordinate triplet is returned (Eastings, Northings, Elevation). Great, but what if I have a whole list of coordinates? No problem:

gdaltransform on a list of coordinates


So  gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 < indata.csv > outdata.csv took the contents of indata.csv and returned outdata.csv. The problem is that it reads the first two space separated columns and return a file with three space separated columns. If you happen to have a data file that contains more columns than this and/or the coordinate columns aren't ordered so that x,y and perhaps z are the first three columns and/or you are a stickler and think that "csv" should mean "csv" and not "ssv" then you will need to do some work.

Lets just say that your coordinates are in columns 5 and 6 of a file called "Summer2004.csv", which mine are. The following will ask awk to skip the first line (the column headers) and then pass the contents of columns 5 and 6 to gdaltransform via "pipe" (the | thing).

awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 

This dumps the space separated triplets to the terminal. So far so good but not ideal. So then we replace the spaces with commas:

awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 | awk '{gsub(" ",",",$0); print $0}'

Again awk to the rescue. But now we want to put that data at the right hand side of our file i.e. add it as columns to the original table. We can use paste for this:

paste -d',' Summer2004.csv <(awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 | awk '{gsub(" ",",",$0); print $0}') > Summer2004_trans.csv

So, paste wants two files (or things to stick together) and we have told it to put a comma between the one and the other. The first thing/file is the source file "Summer2004.csv" then comes that <(awk .......) bit with all the awk gdal awk messing around and then out to "Summer2004_trans.csv".
Notice that the awk gdal stuff is in parentheses and is pointed at the paste command using <
But there is a problem:

Quicklook of csv file with missing column headers

As you can see from this Quicklook of the csv the three columns are in place but have no headers and have filled from the top. So how do we fix this? This is where sed should come riding in to the rescue. On Linux it does:

paste -d',' Summer2004.csv <(awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 | awk '{gsub(" ",",",$0); print $0}' | sed "1i\X,Y,Z")  > Summer2004_trans.csv

This uses sed to insert at line 1 (1i\) the column headers X,Y,Z before the now complete columns of transformed coordinates are then passed to paste to be combined into the result file.

On Mac OSX it isn't that simple. The standard installation of sed on OSX is an older version and the syntax is different. After a lot of searching for a solution to this I found nothing that works.
Except...
Install a newer version of sed. But what if OSX needs the old version? This is a problem. The OSX standard sed is installed under /usr/bin/ as it should be. Now, this should mean that OSX won't mind if we replace it with a version of sed that uses a different syntax, but let's be honest, something is bound to go wrong. Even if OSX won't throw a wobbly, some of your other installations might.
My solution is as follows:
Go to the sed download page and download sed
Then cd into the unzipped directory and run

./configure
make
sudo make install

This will put a brand new (-ish, 2009) copy of sed in your /usr/local/bin/ directory. Then open your .bash_profile and create a new alias

alias sad='/usr/local/bin/sed'

Then you can restart the terminal and try again using sad instead of sed.

paste -d',' Summer2004.csv <(awk -F "," 'NR>1 {print $5 " " $6}' Summer2004.csv | gdaltransform -s_srs EPSG:3021 -t_srs EPSG:3006 | awk '{gsub(" ",",",$0); print $0}' | sad "1i\X,Y,Z")  > Summer2004_trans.csv

All's well that ends well



A quick update on this post for a question posted on StackExchange. Here is a bash script for processing multiple files:


#!/bin/bash
#
for i in $( ls *.csv ); do
    paste -d',' ${i} <(awk -v OFS="," -F " " 'NR>1 {print $1 " " $2}' ${i} | gdaltransform -s_srs EPSG:4326 -t_srs EPSG:32633 | awk '{gsub(" ",",",$0); print $0}' | /usr/local/bin/sed "1i\X,Y,Z")  > utm${i}

done



Monday 13 January 2014

The Python path

After the rather long introduction to the Bash path we can now move on to the Python path.

Like the Bash path the Python path can be set in different ways. The path can be added to from your .bash_profile using a line such as

export PYTHONPATH=$PYTHONPATH:/Library/Frameworks/GDAL.framework/Versions/1.10/Python/2.7/site-packages

This line works in pretty much the same way as the other PATH exports do in your .bash_profile except that Python reads this variable. So you could just add to your Python path using this method; however, a more common way for Python paths to be added is to use *.pth files stored in locations already on the path.

Basically, Python starts by looking in its own set of directories for all the standard tools it comes with, such as "sys", then if it encounters a file with the extension "pth" it tries to read that file and add the path contained therein to the Python path.

Before we go on with the *.pth files lets look at the address shown in the path export above; it ends with "site-packages" which is a useful folder found in many Python libraries and there is even one in the Python installation directory. As noted in the previous post, I use Anaconda which is a bundle that includes the basic Python installation plus many other useful packages/libraries. So, looking at my Python installation we get the following:

The site-packages directory of my Python installation
As you can see from the image above, my Python installation contains a directory called "site-packages". This directory can contain libraries and files that you install or write yourself, the name "site-packages" implying "things local to this site/computer". It may also contain *.pth files, four of which can be seen listed in the image above.
The paths contained in these files are appended to the sys.pth file by Python automatically. According to Python's own documentation you may alter the sys.pth file as well but this file will be overwritten during an update of Python so it is best to not do that.
The AM_functions.pth file contains the following line:

import sys; sys.path.insert(0,'/Users/andrew/Dropbox/5_Python/Functions')

which is perhaps overkill when a simple

/Users/andrew/Dropbox/5_Python/Functions

would suffice. Which ever type you use the address will be added to the path and the files therein will be found by Python. The advantage of the sys.path.insert method is that I can specify where in the path order the address should be inserted. The number "0" after the opening parenthesis tells Python to insert the address at the first location of the path list (Python counts from 0, not from 1).

That should cover most of the basics of paths, there are some specific path issues that I will deal with as appropriate, especially concerning Anaconda and GDAL, but all that fun will have to wait.


Paths, the route of nearly all evil

Generally speaking, if something isn't working on an OSX GIS it's a path issue. You can spend hours, days, months even, trying to get an installation to work only to discover that the path you thought was all set and correct is being overridden by another path setting.
Firstly, what is a path? A path is a list of addresses that the computer looks in when you tell it to run a programme or read a library. For instance, when you type "gdalinfo" into the terminal the computer looks at the first address in the list and looks for an executable file ("programme") called "gdalinfo". If it doesn't find it at the first address it moves on to the next address and so on.

To check which folders are currently "on your path" go to the terminal (you will find this under "Applications/Utilities" or use cmd-spacebar to bring up Spotlight and type "terminal" there), once in the terminal type "echo $PATH"

The system path


Which looks like this:

~ 16:53:37 $ echo $PATH
/Users/andrew/anaconda/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/MacGPG2/bin:/usr/texbin:/Library/Frameworks/GDAL.framework/Programs:/Library/Frameworks/PROJ.framework/Programs:/Library/Frameworks/PROJ.framework/Versions/4/PROJ:/Library/Frameworks/SQLite3.framework/Programs:/Users/andrew/Documents/BashScripts/gslib90/:/Applications/GRASS-6.4.app/Contents/MacOS/:/Users/andrew/Documents/BashScripts/
~ 16:53:45 $

In the path shown above each address is separated by a colon, thus "/Users/andrew/anaconda/bin" is the first address, "/usr/bin" is the second and so on.

From the image above you may be able to work out that "echo $PATH" means "print to the terminal the contents of the variable called PATH" in Bash. If you don't know it yet "Bash" is almost certainly the language you will use to communicate with the terminal in OSX (as well as Ubuntu and other Unix systems). There are other languages, all very similar, but Bash is the default in OSX.

There is another important path to be aware of: the python path. Python is a scripting language used by many programmes to add functions and aid interaction. It is also a very handy programming language in itself that you should consider learning. In the terminal type "python" to launch the Python interpreter.

Python and the Python path

As you can see, after starting Python I first imported a set of tools called "sys" using "import sys".
Then I asked sys to display the path that Python searches through. Like the system path shown by Bash, the path is one long line of text so instead I asked Python to loop through the path address by address so that the path becomes more readable. Type "quit()" to exit the Python interpreter.

Some of you may be wondering what the "anaconda" bit is all about. I can highly recommend Anaconda as a tidy, easy and easily controlled installation of Python and the many packages that you will need for scientific programming and GIS. Anaconda's website

So how do we set the path? Lets start with the system path, the Bash path.
There are three places where addresses are added to the Bash path permanently (i.e. not lost as soon as the terminal is exited). If you are comfortable using the terminal cd to "/private/etc"

Change directory to /private/etc/ and list the contents

Then type "ls" to list the contents of the folder (if you are wondering, I have set an alias in my .bash_profile so that ls runs with the -laF flags by default, e.g. alias ls='ls -laF'). Look through the list to find the two entries as below.

Path files in /private/etc/
If you would prefer to use Finder then go to the Finder menu bar and click on "Go" then enter the address "/private/etc".

Access a folder using Finder
Now scroll down the contents of the folder until you find "paths" and "paths.d".

Open the file called "paths" by right clicking on it in Finder and choosing "Open With TextEdit" or your text editor of choice. From the terminal you can open it using nano, vim or default editor, e.g. "nano paths" or "open paths".

The paths file in /private/etc/
TextWrangler (TextWrangler homepage) shows the file like this. Looking back at the path again:

etc 18:05:32 $ echo $PATH
/Users/andrew/anaconda/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin:/usr/local/MacGPG2/bin:/usr/texbin:/Library/Frameworks/GDAL.framework/Programs:/Library/Frameworks/PROJ.framework/Programs:/Library/Frameworks/PROJ.framework/Versions/4/PROJ:/Library/Frameworks/SQLite3.framework/Programs:/Users/andrew/Documents/BashScripts/gslib90/:/Applications/GRASS-6.4.app/Contents/MacOS/:/Users/andrew/Documents/BashScripts/

we can recognise this list as the addresses listed after the Anaconda bin folder ("bin" stands for "binary" and implies compiled, executable programmes) and before the usr/X11/bin folder.
If you are still confused about the order be patient, all will become clear soon enough, just recognise that the order in which the addresses are listed here is the same as in the PATH variable.

The next thing to look at is the contents of the "paths.d" folder (directory).

The paths.d folder and its contents

So we can see from the image above that the paths.d folder contains 3 files, simple text files, each containing a single address and nothing more. Comparing their order in the folder with their order in the path we can see that the folder contents are read directly after the "paths" file and that the file name determines the order in the PATH.

So that's two places where the path is set. The third is in your .bash_profile. There is, of course, a fourth location, the system wide profile under "/etc/bashrc" but don't mess with this, it doesn't set any paths anyway, but it could in theory. You may even have a "launchd.conf" file under "/etc" in some older OSXs where the path may be meddled with. Way, way too much.

Lets get back to your .bash_profile, which is accessed most easily via the terminal or via TextWrangler and its ability to show and open hidden files. The name ".bash_profile" starts with a dot which means that the file is hidden from Finder and even from the terminal under normal circumstances.

In the terminal type "cd". This will move you to your home directory no matter where you currently are. Then type "ls". Unless you have already set an alias for ls to show hidden files your .bash_profile won't be listed. Enter "man ls" into the terminal. This opens the "man page" for the "ls" command. "man" is short for "manual" and is intended to be a usage guide for each bash command. Use the arrow keys to navigate down the man page. You should be able to find the following:

    -a      Include directory entries whose names begin with a dot (.).

Press "q" to exit the man page. Now enter "ls -a" into the terminal and it will list hidden files as well. If you prefer try "ls -al" to list all files on separate rows rather than one after the other.

If you do not have a .bash_profile you can easily create one using the terminal's own text editor. Enter "nano .bash_profile" into the terminal. If you have a .bash_profile already it will be opened if not, one will be created and opened for you. If the file is empty you may want to enter some text to explain what the file is, start by typing something along these lines:

The first lines of my .bash_profile




Always start a comment line (text that is to be read by humans and not the computer) with a hash, #
To save this, press ctrl-o to "write out" then ctrl-x to exit nano (note that nano uses ctrl not cmd).
You may continue to edit in nano if you wish or open the .bash_profile file with another editor ("open .bash_profile").

In the following example I set the path for the GDAL library as installed by the KyngChaos installer :

PATH=$PATH:/Library/Frameworks/GDAL.framework/Programs
export PATH

What these two lines say is "make the variable path the same as it is now plus the address /Library/Frameworks/GDAL.framework/Programs then export the PATH to the operating system". Approximately. If the first line had read:

PATH=/Library/Frameworks/GDAL.framework/Programs:$PATH

then the /Library/Frameworks/GDAL.framework/Programs address would be placed at the front of the list, i.e. it would be found first when reading the path.

We can also alter and export the path in a slightly more compact way.

export PATH="/Users/andrew/anaconda/bin:$PATH"

And that's it for the bash path. The Python path is a chapter in itself and I shall have to turn to that next time. Right now it's time for tea.