Wednesday, 3 December 2014
Yosemite, no cat-like reflexes found here
Tuesday, 28 October 2014
aRrrrrrrrgghhh!
Many people I respect and like use R in their work and like the way it works. It is backed up by "R Studio" as well as help files and a large, user community. After an enormous amount of frustration caused by Pandas for Python I thought I might try R instead.
Let's start with the good points. R Studio is a great way to use R. It is for R what "Spyder" would like (but fails) to be for Python. It is considerably more stable and smooth than Spyder and looks good too. R Studio provides some pretty useful tricks for coding, such as running only the marked text in a script file or running step by step, line by line.
R itself comes with many packages and tools, especially for statistics. This is the main focus of R: statistics for ecologists and biologists. This explains and is explained by the strong connexion with these research communities. Plotting (at least with the standard "plot" function and not the hideous "ggplot") is simple and elegant, much easier than "matplotlib" in Python.
The bad points. If you are new to scripting then R is probably fine, you will learn its "psychology" and work with it. If you have learnt any other language first then R will feel unreasonably and pointlessly quirky and counter-intuitive. Indexing is confusing, lists have names for their elements, which might set them apart from vectors, except that so do vectors. Many hours have been spent by many people developing the packages for R, which provides all that functionality, but why R? It doesn't provide any data structure that is superior to other scripting languages. It's not as if "factors" provide a quick and easy mechanism for sub-setting and analysis, certainly no easier than in other languages. The help files are many and mostly useless for the newbie. They read and look like poorly written, first draft man pages.
I suppose my biggest complaint about R is "what's the point?" All that time and effort could have been put into another language and the statistics would work just as well. As I have stated, R provides no intrinsic advantage for statistical calculations, it is simply that engaged enthusiasts have written the libraries in R. Learning another language is fine but in this case, the language sits at some uncomfortable angle to most others and provides no clear advantage, making learning frustrating but without any real incentive. I suppose I shall just have to learn to accept Pandas and its idiotic time stamp behaviour and Python's general love of bloating with evermore object types for no good reason whatsoever.
Or just stop bitching about this sort of thing.Tuesday, 9 September 2014
Kebnekaise is still there.
A couple of years ago I measured it at the end of the ablation season using a Trimble GeoXH, accompanied by "Sveriges television". They managed to put together a long report on it for the main national news. I have not got a link for that. This year we made it on to Aftonbladet TV", so not the same punch, oh well. But the media interest for this is huge. Gunhild Rosqvist, my boss at "Tarfala Research Station", receives a never-ending stream of calls in the run up to this odd little job. Why?
This year I surveyed "Nordtoppen" to 2096.8 metres above sea level and "Sydtoppen" to 2097.5 metres above sea level (both in RH2000, using a Trimble R7 rover and base station). The difference is so small that their relationship could be reversed easily, given another high ablation year. So what's the problem? Well, the ridge between the two is no walk in the park and people are going to die when tourists start insisting on going to "Nordtoppen" instead of "Sydtoppen", so maybe that's why there is all this fuss. Maybe. Or maybe "Sydtoppen" losing another 70 cm of ice will be a symbol of climate disaster, doom and gloom that the collective Swedish psyche can cling to so that they may join in with the rest of the world in suffering.
Tuesday, 15 July 2014
Numpy masked array and NaNs
Now, these are very useful but like much of Python suffer from a surfeit of functions, methods, whatever. In short, it can be very confusing and frustrating to work with Numpy as behaviour isn’t consistent. To be fair this fact is admitted to in the help pages.
Lets’s start by importing what we need:
import sys
import os
import numpy as np
import numpy.ma as ma
import scipy.stats as stats
from scipy.stats.stats import nanmean
import textwrap
Basic arrays
- Create “a” then convert it to array “b”.
- Do the same with “c” to “d”.
“a1” and “b1” are masked arrays each created from list objects and numpy arrays respectively. Using the
.mean
method the mean of the masked values can be calculated.And finally, in “a2” a masked array is created by stating which values to mask out.
a = [1,2,3,4,5,6]
b = np.array(a)
c = [0,0,0,1,0,0]
d = np.array(c)
a1 = ma.masked_array(a, mask=c)
b1 = ma.masked_array(b, mask=d)
a1mean = a1.mean()
b1mean = b1.mean()
a2 = ma.masked_values(a,4)
#
# PRINT TO TERMINAL
print textwrap.dedent("""\
a = [1,2,3,4,5,6]
b = np.array(a)
c = [0,0,0,1,0,0]
d = np.array(c)
a1 = ma.masked_array(a, mask=c)
b1 = ma.masked_array(b, mask=d)
a1mean = a1.mean()
b1mean = b1.mean()
a2 = ma.masked_values(a,4)
""")
print "a: ", a
print "b: ", b
print "a1: ", a1
print "mean of a1: ", a1mean
print "a2: ", a2
print "b1: ", b1
print "mean of b1: ", b1mean
print "\n"*3
2D arrays and how they are masked
This time, when calculating the mean we can either call
.mean()
which returns the mean of all unmasked values or we can specify an axis thus: .mean(0)
. This would return the mean of each column. e = np.array([[1,2,3],[4,5,6]])
f = np.array([[0,0,0],[1,0,0]])
e1 = ma.masked_array(e, mask=f)
e2 = ma.masked_values(e,4)
e1mean = e1.mean()
e1mean0 = e1.mean(0)
e1mean1 = e1.mean(1)
#
# PRINT TO TERMINAL
print textwrap.dedent("""\
e = np.array([[1,2,3],[4,5,6]])
f = np.array([[0,0,0],[1,0,0]])
e1 = ma.masked_array(e, mask=f)
e2 = ma.masked_values(e,4)
e1mean = e1.mean()
e1mean0 = e1.mean(0)
e1mean1 = e1.mean(1)
""")
print "e: ", e
print "e1: ", e1
print "e2: ", e2
print "mean of e1: ", e1mean
print "mean of each column: ", e1mean0
print "mean of each row: ", e1mean1
print "\n"*3
Dealing with NaN values
np.nan
(note that the “np” is short for numpy and was defined at the import). Unfortunately the standard mean
function can’t handle NaN’s so you have to use nanmean
which is imported from scipy.stats.stats
(I know, dumb as hell, but there you go). If we try to use .mean
on an array containing np.nan it returns “nan”. nanmean
gives us the mean of each column though and we can’t just take the mean of the means.g = np.array([[1,2,3],[np.nan,5,6]])
gmean = g.mean()
g_nanmean = nanmean(g)
g_nanmean_all = nanmean(g_nanmean)
#
# PRINT TO TERMINAL
print textwrap.dedent("""\
g = np.array([[1,2,3],[np.nan,5,6]])
gmean = g.mean()
g_nanmean = nanmean(g)
g_nanmean_all = nanmean(g_nanmean)
""")
print "g: ", g
print "gmean: ", gmean
print "nanmean: ", g_nanmean
print "g_nanmean_all: ", g_nanmean_all
print "\n"*3
.mean
does what we want. np.ravel()
will flatten it for us. print "The np.ravel() function: "
print "np.ravel(g) to flatten e: "
print g, "\nwhich becomes ", np.ravel(g), "\n"
gravelnmean = nanmean(np.ravel(g))
#
# PRINT TO TERMINAL
print textwrap.dedent("""\
gravelnmean = nanmean(np.ravel(g))
""")
print "gives: ", gravelnmean
np.nan
when the masked_array
seems more flexible? Well, consistency in implementation in numpy is a bit of a problem. Some packages within numpy contain functions that work in different ways, which can be very frustrating to a newcomer. Mask np.nan values to be doubly sure
- from the NaNs in your array (h1)
- from the NaNs in another array (h2)
- from the mask in another masked_array (h3)
h = np.array([[1,2,3],[np.nan,5,6]])
h1 = ma.masked_where(np.isnan(h),h)
h2 = ma.masked_where(np.isnan(g1),h)
h3 = ma.masked_where(ma.getmask(g1), h)
#
# PRINT TO TERMINAL
print textwrap.dedent("""\
h = np.array([[1,2,3],[np.nan,5,6]])
h1 = ma.masked_where(np.isnan(h),h)
h2 = ma.masked_where(np.isnan(g1),h)
h3 = ma.masked_where(ma.getmask(g1), h)
""")
print "h: ", h
print "h1: ", h1
print "h2: ", h2
print "h3: ", h3
Friday, 4 April 2014
Classes, don't know why but here they are
In this script we will define a type of thing, an “object” in parlance of who ever is that gets excited about these matters. This object is a pixel like thing, it has x,y and z coordinates and we could just call it a “point” but I’ve called it a pixel. When this pixel is passed another pixel by the script it calculates the distance between itself and the other Pixel. That’s pretty much it. Of course, there will be some other stuff in there just to illustrate the various bits and pieces that are “objects” and “classes” but I’ll try to make the annotations as helpful as possible.
Let us begin:
#! /usr/bin/python
import sys
import os
#
# Create our class.
class Pixel:
# Set a counter for how many objects of this class we have created.
pixelCount = 0
# This is the initialisation bit. We pass the new object 3 values.
# If we only pass 2 values the 3 is assigned a default value.
def __init__(self, east,north,elev = 0):
# Now we give the values passed to the initialing thingy to internal variables.
self.easting = east
self.northing = north
self.elevation = elev
# Then we increase the class counter by 1.
Pixel.pixelCount += 1
We can now continue to define functions for the class. Of course functions aren’t called “functions”, they are called “methods” because …
#! /usr/bin/python
import sys
import os
# Create our class.
class Pixel:
# Set a counter for how many objects of this class we have created. This is shared by
# all "Pixels"
pixelCount = 0
# This is the initialisation bit. We pass the new object 3 values.
# If we only pass 2 values the 3 is assigned a default value.
def __init__(self, east,north,elev = 0):
# Now we give the values passed to the initialing thingy to internal variables.
self.easting = east
self.northing = north
self.elevation = elev
# Then we increase the class counter by 1.
Pixel.pixelCount += 1
#
# This function doesn't do a great deal other than tell you how many pixels you have.
def counter(self):
print 'Total number of pixels: %s' % Pixel.pixelCount
#
__init__
bit, that is shared by all objects created from this class.Now lets do something more adventurous. Lets calculate the 2D and 3D distances.
#! /usr/bin/python
import sys
import os
# Create our class.
class Pixel:
# Set a counter for how many objects of this class we have created.
pixelCount = 0
# This is the initialisation bit. We pass the new object 3 values.
# If we only pass 2 values the 3 is assigned a default value.
def __init__(self, east,north,elev = 0):
# Now we give the values passed to the initialing thingy to internal variables.
self.easting = east
self.northing = north
self.elevation = elev
# Then we increase the class counter by 1.
Pixel.pixelCount += 1
#
# This function doesn't do a great deal other than tell you how many pixels you have.
def counter(self):
print 'Total number of pixels: %s' % Pixel.pixelCount
#
# This is a function (method if you want to get all technical) that uses the object's
# own variables plus some other thing.
def dist2d(self,other):
# That other thing is another Pixel object, so check that that is what it is.
if not isinstance(other, Pixel):
sys.exit(1)
# A little unnecessary this but lets just pass variable values on.
e1, n1 = self.easting, self.northing
e2, n2 = other.easting, other.northing
# Some calculation (remember Pythagoras?)
edif = e2 - e1
ndif = n2 - n1
dist2 = (edif**2 + ndif**2)**0.5
# Talk to the user.
print 'Eastings: %s - %s = %s' %(e2,e1,edif)
print 'Northings: %s - %s = %s' %(n2,n1,ndif)
print 'Distance in 2D: %s\n' %(dist2)
# Return the calculated value to whatever called the function (method)
return dist2
#
# As before but this time in glorious 3D.
def dist3d(self,other):
if not isinstance(other, Pixel):
sys.exit(1)
e1, n1, z1 = self.easting, self.northing, self.elevation
e2, n2, z2 = other.easting, other.northing, other.elevation
edif = e2 - e1
ndif = n2 - n1
# We don't really need to create too many new variables,
# so lets just use what we have. Get the vertical distance directly from the
# object's self storage.
zdif = other.elevation - self.elevation
dist3 = (edif**2 + ndif**2 + zdif**2)**0.5
print 'Eastings: %s - %s = %s' %(e2,e1,edif)
print 'Northings: %s - %s = %s' %(n2,n1,ndif)
print 'Elevation: %s - %s = %s' %(other.elevation,self.elevation,zdif)
print 'Distance in 3D: %s\n' %(dist3)
return dist3
Now lets initiate some objects and play around a little.
##! /usr/bin/python
import sys
import os
# Create our class.
class Pixel:
# Set a counter for how many objects of this class we have created.
pixelCount = 0
# This is the initialisation bit. We pass the new object 3 values.
# If we only pass 2 values the 3 is assigned a default value.
def __init__(self, east,north,elev = 0):
# Now we give the values passed to the initialing thingy to internal variables.
self.easting = east
self.northing = north
self.elevation = elev
# Then we increase the class counter by 1.
Pixel.pixelCount += 1
#
# This function doesn't do a great deal other than tell you how many pixels you have.
def counter(self):
print 'Total number of pixels: %s' % Pixel.pixelCount
#
# This is a function (method if you want to get all technical) that uses the object's
# own variables plus some other thing.
def dist2d(self,other):
# That other thing is another Pixel object, so check that that is what it is.
if not isinstance(other, Pixel):
sys.exit(1)
# A little unnecessary this but lets just pass variable values on.
e1, n1 = self.easting, self.northing
e2, n2 = other.easting, other.northing
# Some calculation (remember Pythagoras?)
edif = e2 - e1
ndif = n2 - n1
dist2 = (edif**2 + ndif**2)**0.5
# Talk to the user.
print 'Eastings: %s - %s = %s' %(e2,e1,edif)
print 'Northings: %s - %s = %s' %(n2,n1,ndif)
print 'Distance in 2D: %s\n' %(dist2)
# Return the calculated value to whatever called the function (method)
return dist2
#
# As before but this time in glorious 3D.
def dist3d(self,other):
if not isinstance(other, Pixel):
sys.exit(1)
e1, n1, z1 = self.easting, self.northing, self.elevation
e2, n2, z2 = other.easting, other.northing, other.elevation
edif = e2 - e1
ndif = n2 - n1
# We don't really need to create too many new variables,
# so lets just use what we have. Get the vertical distance directly from the
# object's self storage.
zdif = other.elevation - self.elevation
dist3 = (edif**2 + ndif**2 + zdif**2)**0.5
print 'Eastings: %s - %s = %s' %(e2,e1,edif)
print 'Northings: %s - %s = %s' %(n2,n1,ndif)
print 'Elevation: %s - %s = %s' %(other.elevation,self.elevation,zdif)
print 'Distance in 3D: %s\n' %(dist3)
return dist3
#
# Time to create some objects.
# Create object "a" with Easting = 10, Northing = 10, Elevation = 10.
a = Pixel(10,10,10)
# Create object "b" etc.
b = Pixel(50,45,20)
# Now get to work by assign the variable "ans1" the value returned
# by sending Pixel object "b" to the dist2D function (method) of Pixel object "a"
ans1 = a.dist2d(b)
# Same again but in 3D
ans2 = a.dist3d(b)
print 'ans1 and ans2: %s %s\n' %(ans1,ans2)
print 'Call Pixel.pixelCount Number of pixels is %s' % Pixel.pixelCount
print 'Then call a.counter()'
a.counter()
print 'Then call b.counter()'
b.counter()
#
# What about only sending 2 coordinates?
c = Pixel(20,30)
ans3 = a.dist3d(c)
print '\nans3',ans3
print 'Call Pixel.pixelCount Number of pixels is %s' % Pixel.pixelCount
print 'Then call c.counter()'
c.counter()
#
# We can call an objects variables whenever.
print '\nEasting for a: ', a.easting
# We can even assign it some new ones
a.distTo_b = ans2
print 'Distance a to b: ', a.distTo_b
# It doesn't even have to make sense.
a.stupidName = 'Wibble'
print 'New attribute value: ', a.stupidName
Wednesday, 26 March 2014
gdalinfo and statistics
$ gdalinfo SPOT4_HRVIR2_046-209_2_080911_110453.tif
Driver: GTiff/GeoTIFF
Files: SPOT4_HRVIR2_046-209_2_080911_110453.tif
SPOT4_HRVIR2_046-209_2_080911_110453.tfw
Size is 4440, 4180
Coordinate System is:
PROJCS["SWEREF99 TM",
GEOGCS["SWEREF99",
DATUM["SWEREF99",
SPHEROID["GRS 1980",6378137,298.2572221010002,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6619"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4619"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",15],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AUTHORITY["EPSG","3006"]]
Origin = (610600.000000000000000,7560400.000000000000000)
Pixel Size = (20.000000000000000,-20.000000000000000)
Metadata:
AREA_OR_POINT=Point
TIFFTAG_COPYRIGHT=Lantm?teriet
TIFFTAG_RESOLUTIONUNIT=1 (unitless)
TIFFTAG_XRESOLUTION=1
TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 610600.000, 7560400.000) ( 17d39'42.97"E, 68d 8' 9.21"N)
Lower Left ( 610600.000, 7476800.000) ( 17d34'41.45"E, 67d23'12.54"N)
Upper Right ( 699400.000, 7560400.000) ( 19d47'31.12"E, 68d 5'16.24"N)
Lower Right ( 699400.000, 7476800.000) ( 19d38'30.00"E, 67d20'25.87"N)
Center ( 655000.000, 7518600.000) ( 18d40' 6.86"E, 67d44'28.15"N)
Band 1 Block=4440x128 Type=Byte, ColorInterp=Red
Band 2 Block=4440x128 Type=Byte, ColorInterp=Green
Band 3 Block=4440x128 Type=Byte, ColorInterp=Blue
Band 4 Block=4440x128 Type=Byte, ColorInterp=Undefined
$ gdalinfo -mm SPOT4_HRVIR2_046-209_2_080911_110453.tif
Driver: GTiff/GeoTIFF
Files: SPOT4_HRVIR2_046-209_2_080911_110453.tif
SPOT4_HRVIR2_046-209_2_080911_110453.tfw
Size is 4440, 4180
Coordinate System is:
PROJCS["SWEREF99 TM",
GEOGCS["SWEREF99",
DATUM["SWEREF99",
SPHEROID["GRS 1980",6378137,298.2572221010002,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6619"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4619"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",15],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AUTHORITY["EPSG","3006"]]
Origin = (610600.000000000000000,7560400.000000000000000)
Pixel Size = (20.000000000000000,-20.000000000000000)
Metadata:
AREA_OR_POINT=Point
TIFFTAG_COPYRIGHT=Lantm?teriet
TIFFTAG_RESOLUTIONUNIT=1 (unitless)
TIFFTAG_XRESOLUTION=1
TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
INTERLEAVE=BAND
Corner Coordinates:
Upper Left ( 610600.000, 7560400.000) ( 17d39'42.97"E, 68d 8' 9.21"N)
Lower Left ( 610600.000, 7476800.000) ( 17d34'41.45"E, 67d23'12.54"N)
Upper Right ( 699400.000, 7560400.000) ( 19d47'31.12"E, 68d 5'16.24"N)
Lower Right ( 699400.000, 7476800.000) ( 19d38'30.00"E, 67d20'25.87"N)
Center ( 655000.000, 7518600.000) ( 18d40' 6.86"E, 67d44'28.15"N)
Band 1 Block=4440x128 Type=Byte, ColorInterp=Red
Computed Min/Max=0.000,255.000
Band 2 Block=4440x128 Type=Byte, ColorInterp=Green
Computed Min/Max=0.000,255.000
Band 3 Block=4440x128 Type=Byte, ColorInterp=Blue
Computed Min/Max=0.000,255.000
Band 4 Block=4440x128 Type=Byte, ColorInterp=Undefined
Computed Min/Max=0.000,255.000
$ gdal_translate -stats SPOT4_HRVIR2_046-209_2_080911_110453.tif output.tif
Input file size is 4440, 4180
0...10...20...30...40...50...60...70...80...90...100 - done.
$ gdalinfo output.tif
Driver: GTiff/GeoTIFF
Files: output.tif
Size is 4440, 4180
Coordinate System is:
PROJCS["SWEREF99 TM",
GEOGCS["SWEREF99",
DATUM["SWEREF99",
SPHEROID["GRS 1980",6378137,298.2572221010002,
AUTHORITY["EPSG","7019"]],
AUTHORITY["EPSG","6619"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4619"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",15],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AUTHORITY["EPSG","3006"]]
Origin = (610600.000000000000000,7560400.000000000000000)
Pixel Size = (20.000000000000000,-20.000000000000000)
Metadata:
AREA_OR_POINT=Point
TIFFTAG_COPYRIGHT=Lantm?teriet
TIFFTAG_RESOLUTIONUNIT=1 (unitless)
TIFFTAG_XRESOLUTION=1
TIFFTAG_YRESOLUTION=1
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 610600.000, 7560400.000) ( 17d39'42.97"E, 68d 8' 9.21"N)
Lower Left ( 610600.000, 7476800.000) ( 17d34'41.45"E, 67d23'12.54"N)
Upper Right ( 699400.000, 7560400.000) ( 19d47'31.12"E, 68d 5'16.24"N)
Lower Right ( 699400.000, 7476800.000) ( 19d38'30.00"E, 67d20'25.87"N)
Center ( 655000.000, 7518600.000) ( 18d40' 6.86"E, 67d44'28.15"N)
Band 1 Block=4440x1 Type=Byte, ColorInterp=Red
Min=0.000 Max=255.000
Minimum=0.000, Maximum=255.000, Mean=34.493, StdDev=40.536
Mask Flags: PER_DATASET ALPHA
Metadata:
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=34.493035637312
STATISTICS_MINIMUM=0
STATISTICS_STDDEV=40.53643151195
Band 2 Block=4440x1 Type=Byte, ColorInterp=Green
Min=0.000 Max=255.000
Minimum=0.000, Maximum=255.000, Mean=34.832, StdDev=42.388
Mask Flags: PER_DATASET ALPHA
Metadata:
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=34.832346814518
STATISTICS_MINIMUM=0
STATISTICS_STDDEV=42.388441124655
Band 3 Block=4440x1 Type=Byte, ColorInterp=Blue
Min=0.000 Max=255.000
Minimum=0.000, Maximum=255.000, Mean=44.688, StdDev=49.705
Mask Flags: PER_DATASET ALPHA
Metadata:
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=44.688375576533
STATISTICS_MINIMUM=0
STATISTICS_STDDEV=49.70528306057
Band 4 Block=4440x1 Type=Byte, ColorInterp=Alpha
Min=0.000 Max=255.000
Minimum=0.000, Maximum=255.000, Mean=54.075, StdDev=60.230
Metadata:
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=54.075123227294
STATISTICS_MINIMUM=0
STATISTICS_STDDEV=60.229994798131