
The New Open Data Revolution?



For years Tim O'Reilly has been saying that "Data is the Intel Inside". When I first heard this I didn't understand, but gradually I came to see the importance and strength of data, at least from the business perspective. Companies like Google use, process, and transform data into something that provides value.


I've just read "The Day the Universe Changed", and I'm now seeing the statement in a new light. Since O'Reilly's original Web 2.0 manifesto, not only has the value of data become more widely recognized, but data has also become more open. Companies like EveryBlock have led the opening up of public data, and data.gov, a portal for government data, has been created.


What does this opening of data mean? It could be more than any of us can comprehend. I see it as a new opening of knowledge to the public. Just as Gutenberg's printing press solidified "fact" and moved knowledge and awareness away from elders and institutions of authority, open data extends the knowledge available to the public.


With this tsunami of data, the public needs the knowledge to understand it. In the pre-printing-press world, knowledge was spread by word of mouth; there was no choice but to understand and accept the world as we and our community perceived it. The printing press and the written word extended our perception, so that we could picture a world beyond our own life and local social circle, increasing the resolution of our view of the world. Now that resolution is set to increase again.


With the availability of data, governments, analysts, and experts are no longer the solitary keepers of knowledge. When this happens, data literacy becomes more important. The public will need basic statistical knowledge in order for meaningful conversation to occur. As data becomes more open, the number of "experts" will explode, and some of these new "experts" will, not unlike authority figures of the past, take advantage of those who lack basic "data literacy" skills. I suspect schools are already struggling to teach basic math skills, but in order to start these meaningful conversations our students need at least a minimal level of data comprehension.


With the spread of data literacy, we may move from rhetorical debates, where the keys to persuasion are emotional, to debates about data analysis and validity. These data debates are already happening, but until now only among the keepers of the data.


I urge academics and researchers to open up and share their data and data-keeping methods. I sense that a feeling of ownership still persists in the academic community. With the opening of data comes more examination and discussion, and with it we are one step closer to a better understanding of the world around us.


monkut // Feb. 16, 2013 // 10:10 a.m.

Understanding rasters, basic GIS concepts and the python gdal library



I've been learning about how to handle *raster* files and getting comfortable with the python libraries that are available to manipulate them.
(I'm still learning, but hopefully this will help someone like me who is just getting started)

The first thing you need to know about the common *decimal degree* latitude/longitude notation is that it is **NOT** a linear representation, but an *angular* one (defining a specific point on a sphere-like shape). So performing `Cartesian coordinate <http://en.wikipedia.org/wiki/Coordinate_system>`_ operations (where the distance from (1,1) to (1,2) and the distance from (1,1) to (1,0) are equal) is going to give you very *inaccurate* results.
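To see the scale of the error, here's a quick sketch (plain python, haversine formula) showing that one degree of longitude covers very different ground distances depending on latitude:

import math

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points (mean earth radius)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

print(haversine_km(0.0, 0.0, 1.0, 0.0))    # ~111 km: one degree of longitude at the equator
print(haversine_km(0.0, 60.0, 1.0, 60.0))  # ~56 km: the "same" degree at 60N is half the distance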

Next, the common *decimal degree* latitude/longitude values you see use a specific **geographic coordinate system** (commonly *WGS84*). In addition to **geographic** coordinate systems, **projected coordinate systems** are also used.

This quote from the `OGR Projections Tutorial <http://www.gdal.org/ogr/osr_tutorial.html>`_ helped me to understand what's going on here:

>> There are two primary kinds of coordinate systems. The first is **geographic** (positions are measured in *long/lat*) and the second is **projected** (such as UTM - positions are measured in *meters* or *feet*).

>> A **projected** coordinate system (such as UTM, Lambert Conformal Conic, etc) requires an underlying **geographic** coordinate system as well as a definition for the *projection transform* used to translate between linear positions (in meters or feet) and angular long/lat positions.

In my examples here I'll be using **UTM** for the *projected* coordinate system and **WGS84** for the *geographic*. I'll be calling the *projection transform* objects *geo_transform* in my code below; don't confuse them with the six-element *geotransform* array that gdal uses to map pixel positions to map coordinates, which also appears later.

.. note::

    UTM is split into multiple *zones* and may be difficult to work with in places where you will be spanning zones. The "Google" projection, *900913*, may be better for simple manipulations.
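For reference, a minimal sketch of selecting that projection by its modern EPSG code, 3857 (assuming your GDAL build includes the EPSG support files):

from osgeo import osr

web_mercator = osr.SpatialReference()
web_mercator.ImportFromEPSG(3857)  # the "Google"/900913 web mercator projection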

First, some simple coordinate conversion. When manipulating data you're going to need to convert between coordinate systems; here's how to do it with the python `GDAL <http://pypi.python.org/pypi/GDAL/>`_ package:

from osgeo import osr

# helper functions at module level so create_raster() below can reuse them

def get_utm_zone(longitude):
    """Determine the UTM zone number for the given longitude."""
    return int(1 + (longitude + 180.0) / 6.0)

def is_northern(latitude):
    """Determine if the given latitude is in the northern hemisphere (for UTM)."""
    if latitude < 0.0:
        return 0
    return 1

def transform_utm_to_wgs84(easting, northing, zone):
    utm_coordinate_system = osr.SpatialReference()
    utm_coordinate_system.SetWellKnownGeogCS("WGS84")  # set geographic coordinate system to handle lat/lon
    is_northern_hemisphere = northing > 0  # NOTE: assumes northern-hemisphere data; southern northings are also positive (false northing)
    utm_coordinate_system.SetUTM(zone, is_northern_hemisphere)

    wgs84_coordinate_system = utm_coordinate_system.CloneGeogCS()  # clone ONLY the geographic coordinate system

    # create transform component
    utm_to_wgs84_geo_transform = osr.CoordinateTransformation(utm_coordinate_system, wgs84_coordinate_system)  # (from, to)
    return utm_to_wgs84_geo_transform.TransformPoint(easting, northing, 0)  # returns lon, lat, altitude

def transform_wgs84_to_utm(lon, lat):
    utm_coordinate_system = osr.SpatialReference()
    utm_coordinate_system.SetWellKnownGeogCS("WGS84")  # set geographic coordinate system to handle lat/lon
    utm_coordinate_system.SetUTM(get_utm_zone(lon), is_northern(lat))

    wgs84_coordinate_system = utm_coordinate_system.CloneGeogCS()  # clone ONLY the geographic coordinate system

    # create transform component
    wgs84_to_utm_geo_transform = osr.CoordinateTransformation(wgs84_coordinate_system, utm_coordinate_system)  # (from, to)
    return wgs84_to_utm_geo_transform.TransformPoint(lon, lat, 0)  # returns easting, northing, altitude

As you may have noticed, *easting* and *northing* are the *UTM* terms for *x* and *y*.
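For example, round-tripping a point through the two helpers above (Tokyo-area coordinates, which fall in UTM zone 54; note that GDAL 1.x expects lon/lat axis order):

easting, northing, altitude = transform_wgs84_to_utm(139.7671, 35.6812)
# easting/northing are now in meters (roughly 388500, 3950000 for zone 54)
lon, lat, altitude = transform_utm_to_wgs84(easting, northing, 54)
# lon/lat are back to roughly (139.7671, 35.6812)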

Now, why would you want to use *rasters*? Rasters are a way to aggregate data over a given area. I've commonly called this "*binning*", but in raster terms the "*bin*" elements of a raster are referred to as "**cells**" or "**pixels**". The *raster* structure/object model in gdal is like this:

- A *raster* consists of a *dataset*.

- A *dataset* consists of one or more *bands*.

- A *band* contains multiple *cells*/*pixels*/*bins*.

- A *cell*/*pixel*/*bin* is a location with a *value*.
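To make the model concrete, here's a minimal sketch of walking that hierarchy with gdal (the filename is hypothetical):

from osgeo import gdal

dataset = gdal.Open("some_raster.tiff")  # the dataset
band = dataset.GetRasterBand(1)          # one of its bands (1-indexed)
data = band.ReadAsArray()                # the band's cells/pixels as a 2D numpy array
value = data[0][0]                       # the value of the top-left cell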

Now for the code. I think there may be a good way to create a *RasterBuilder* class to handle the raster functions, but I'll leave that to you.


import numpy
from osgeo import gdal
from osgeo import osr
from shapely.geometry import MultiPoint  # used to get the extent/bounds

gdal.AllRegister()  # register all drivers
gdal.UseExceptions()

OUTPUT_FORMAT = "GTiff"  # GDAL driver name for GeoTIFF output
NULL_VALUE = -9999  # value written to cells with no data (pick to suit your data)

def get_point_bounds(points):
    mp_points = MultiPoint(list(points))  # dict keys are (lon, lat) tuples
    (min_lon, min_lat, max_lon, max_lat) = mp_points.bounds
    return min_lon, min_lat, max_lon, max_lat

def get_raster_size(minx, miny, maxx, maxy, cell_width, cell_height):
    """
    Determine the number of rows/columns given the bounds of the point data and the desired cell size
    """
    cols = int((maxx - minx) / cell_width)
    rows = int((maxy - miny) / abs(cell_height))
    return cols, rows

def lonlat_to_pixel(lon, lat, inverse_geo_transform):
    """
    Translate the given lon, lat to the grid pixel coordinates in the data array (zero start)
    """
    # transform to utm
    utm_x, utm_y, utm_z = transform_wgs84_to_utm(lon, lat)

    # apply inverse geo transform
    pixel_x, pixel_y = gdal.ApplyGeoTransform(inverse_geo_transform, utm_x, utm_y)
    pixel_x = int(pixel_x) - 1  # adjust to 0 start for array
    pixel_y = int(pixel_y) - 1  # adjust to 0 start for array

    return pixel_x, abs(pixel_y)  # y pixel is likely a negative value given the geo_transform

def create_raster(points, filename="test.tiff"):
    """
    points == { (lon, lat): value, ...}
    lon/lat values in WGS84
    """
    # create empty raster
    driver = gdal.GetDriverByName(OUTPUT_FORMAT)
    number_of_bands = 1
    band_type = gdal.GDT_Float32

    x_rotation = 0
    y_rotation = 0
    cell_width_meters = 50.0
    cell_height_meters = 50.0

    (min_lon, min_lat, max_lon, max_lat) = get_point_bounds(points)  # retrieve bounds for point data

    srs = osr.SpatialReference()
    srs.SetWellKnownGeogCS("WGS84")  # set geographic coordinate system to handle lat/lon
    srs.SetUTM(get_utm_zone(min_lon), is_northern(max_lat))  # set projected coordinate system to handle meters

    # convert the bounds to UTM (conversions use the transform_wgs84_to_utm helper defined earlier)
    top_left_x, top_left_y, z = transform_wgs84_to_utm(min_lon, max_lat)
    lower_right_x, lower_right_y, z = transform_wgs84_to_utm(max_lon, min_lat)

    cols, rows = get_raster_size(top_left_x,
                                 lower_right_y,
                                 lower_right_x,
                                 top_left_y,
                                 cell_width_meters,
                                 cell_height_meters)
    dataset = driver.Create(filename, cols, rows, number_of_bands, band_type)

    # GeoTransform parameters
    # --> need to know the area that will be covered to define the geo transform
    # top left x, w-e pixel resolution, rotation, top left y, rotation, n-s pixel resolution
    geo_transform = [top_left_x, cell_width_meters, x_rotation, top_left_y, y_rotation, -cell_height_meters]  # cell height must be negative (-) to apply image space to map
    dataset.SetGeoTransform(geo_transform)
    dataset.SetProjection(srs.ExportToWkt())
    inverse_geo_transform = gdal.InvGeoTransform(geo_transform)[1]  # for mapping lat/lon to pixel

    # get the empty raster data array
    band = dataset.GetRasterBand(1)  # 1 == band index value
    data = band.ReadAsArray(0, 0, cols, rows).astype(numpy.float32)

    # populate array values for output
    for (lon, lat), value in points.items():
        # apply value to the array (numpy arrays are indexed [row][column], i.e. [y][x])
        pixel_x, pixel_y = lonlat_to_pixel(lon, lat, inverse_geo_transform)
        data[pixel_y][pixel_x] = value

    # write the updated data array to file
    band.WriteArray(data, 0, 0)
    band.SetNoDataValue(NULL_VALUE)
    band.FlushCache()

    # set dataset to None to "close" the file
    dataset = None
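Putting it together, a usage sketch with a few hypothetical points (lon/lat in WGS84):

points = {
    (139.70, 35.65): 1.0,
    (139.75, 35.68): 2.5,
    (139.80, 35.70): 4.0,
}
create_raster(points, filename="tokyo_sample.tiff")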


monkut // May 2, 2012 // 6:15 p.m.


Using Django 1.4 with GDAL 1.9 and PostGIS-2.0



Recently both Django 1.4 and PostGIS-2.0 were released, and I needed to set up a new development environment.


However, since the release dates of the two were so close, Django 1.4 didn't manage to get PostGIS-2.0 support into the 1.4 release, and the following error occurs:



ERROR: operator class "gist_geometry_ops" does not accept access method "gist"

Luckily this has already been identified as an issue. (See Ticket #16455)


While the patch hasn't been accepted yet, it does appear to work for me; with it applied you can at least use geometry columns in PostGIS-2.0.


Enable PostGIS-2.0



  1. Install Postgres 9.1, then install PostGIS-2.0

  2. Apply the patch to django.contrib.gis.db.backends.postgis.creation described in Ticket #16455

  3. Check/set that the GDAL_DATA environment variable is set to C:\Program Files (x86)\PostgreSQL\9.1\gdal-data

  4. Check/set that the PROJ_LIB environment variable is set to C:\Program Files (x86)\PostgreSQL\9.1\share\contrib\postgis\proj

  5. Check/set that the PROJ_SO environment variable is set to C:\Program Files (x86)\PostgreSQL\9.1\bin\libproj-0.dll



Note


The gdal-data folder is ONLY installed with PostGIS-2.0, and is not found in PostGIS-1.5.
This folder is needed to perform projection translations, and the python GDAL library also needs this information.



Once this is done you should be able to run syncdb for models that contain geometry fields.
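For example, a minimal GeoDjango model with a geometry column (the model and field names are just placeholders):

    from django.contrib.gis.db import models

    class Shop(models.Model):
        name = models.CharField(max_length=100)
        location = models.PointField(srid=4326)  # stored as a PostGIS geometry column

        objects = models.GeoManager()  # enables spatial queries in Django 1.4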


Enable GDAL-1.9.0:


Almost every time I or someone on my team sets up a new development environment, there's an issue getting GDAL to work. In my case I need some of the raster features available in the stock GDAL package which are NOT available in the Django version (see the note below), so I need to install the standard python GDAL package from PyPI.


Getting this to work with Django was a long process of trial and error. Here's what I did (on Win7):



  1. Install GDAL-1.9




  2. Next, create the directory C:\gdallibs.



  3. Then copy all the DLLs that were installed in C:\Python27\Lib\site-packages\osgeo to C:\gdallibs



  4. Add C:\gdallibs to the PATH environment variable.



    • Once added to your PATH, Django should find the libraries.



  5. Django 1.4 does not yet have support for GDAL 1.9.0 and cannot find the 1.9 library. Overwrite the local django.contrib.gis.gdal.libgdal with this updated version: libgdal.py




    • This adds gdal 1.9 to the library search and also allows configuring GDAL_LIBRARY_PATH as an environment variable. (Although I found that was not needed once C:\gdallibs was created and added to the PATH.)




  6. Confirm that *gdal.HAS_GDAL* and *gdal.GEOJSON* are True:



    >>> from django.contrib.gis import gdal
    >>> gdal.HAS_GDAL
    True
    >>> gdal.GEOJSON
    True




Note


You may need to open a NEW console for the updated PATH environment variable to take effect.



monkut // May 2, 2012 // 4:45 p.m.

sqlite, zlib not working on centos



Playing around with CentOS, and I've been working to get python 2.6 installed.
(CentOS defaults to 2.4)

I managed to download and install python 2.6 using:


sudo ./configure
sudo make altinstall

However, when I tried to install additional packages I hit a zlib error.
I remembered that sqlite3 is also an add-on module, checked it, and found that it failed as well.


>>> import zlib
ImportError: No module named zlib

>>> import sqlite3
ImportError: No module named _sqlite3

I found this post, which mentioned zlib-devel; installing it along with sqlite-devel did the job.

So just run:


sudo yum install zlib-devel
sudo yum install sqlite-devel

# change to python directory
cd Python*
sudo ./configure
sudo make altinstall

And now both modules can be loaded properly.
altinstall installs the interpreter as python2.6 instead of replacing the default python.
This lets you keep the 2.4 build that CentOS needs for various system tools.
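To confirm the rebuild picked both modules up, check with the new interpreter (python2.6):

>>> import zlib
>>> import sqlite3

Both should now import without error.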


monkut // June 23, 2010 // 12:37 a.m.

creating pdfs with Japanese text using sphinx/pdflatex



This describes the process for preparing a setup on Windows that can create PDFs containing Japanese text from sphinx's *.tex output.

Installation Steps
====================

1. Create C:\w32tex directory

2. Create C:\w32tex\archivedpackages directory

3. Download "texinst757.zip" from http://www.fsci.fuk.kindai.ac.jp/kakuto/win32-ptex/web2c75.html

4. Unzip "texinst757.zip" into C:\w32tex

5. Download ALL files from the "最小インストール"(Minimal Install) and "標準インストール" (Standard install) lists and place them in C:\w32tex\archivedpackages

6. In addition, download the following packages from the "フルインストール" (Full Install) list:
- ums.tar.gz
- omegaj-w32.tar.gz
- ttf2pt1-w32.tar.bz2
- utf.tar.gz -- not sure if it's needed
- uptex-w32.tar.bz2 -- not sure if it's needed

7. Download and install ghostscript ftp://akagi.ms.u-tokyo.ac.jp/pub/TeX/win32-gs/gs863w32full-gpl.zip

   1. Check the "Use Windows TrueType fonts for Chinese, Japanese and Korean" option and install into C:\w32tex\gs

8. Download and install the IPA fonts from http://lx1.avasys.jp/OpenPrintingProject/openprinting-jp-0.1.3.tar.gz

   1. Drag and drop the *.ttf files found in the archive into the Windows/Fonts directory.
      - this will install the fonts on the PC

9. Run the following command:

texinst757.exe C:\w32tex\archivedpackages

NOTE: This will take some time

10. Add the following to your system path:

- C:\w32tex\bin
- C:\w32tex\gs\gs8.63\bin
- C:\w32tex\gs\gs8.63\lib

11. Download ftp://cam.ctan.org/tex-archive/macros/latex/contrib/titlesec.zip, unzip and copy the titlesec folder to C:\w32tex\share\texmf\latex\

12. Run the following command (you may need to restart your console for the new path settings to take effect)
- This command searches for and registers the fonts/styles (*I think*) that were added in step 11.

texhash

Done!

Converting Sphinx *.tex output to PDF
========================================

Now that you have w32tex installed, you still need to adjust some of your process in order to create a pdf properly.
Note: I'm using sphinx 0.6.1

1. In your conf.py add/uncomment the latex_elements and update to look like this:

latex_elements = {
    'preamble': r'\usepackage{ums}\input jpdftextounicode\pdfgentounicode=1',
    'inputenc': '',
    'fontenc': '',
    'fontpkg': '',
}

2. Build your *.tex file using sphinx.
- My source files are utf8, and I believe sphinx outputs the *.tex to utf8.

3. Convert the output text to cp932 encoding
- Ideally topdftex.exe/pdflatex.exe would support utf8, but I haven't figured that out, so this is currently a workaround step

import codecs
utf8_f = codecs.open("sphinx_output.tex", "rb", "utf8")
cp932_f = codecs.open("sphinx_output.cp932.tex", "wb", "cp932")
cp932_f.write(utf8_f.read())
utf8_f.close()
cp932_f.close()

4. Run topdftex.exe on the converted output, renaming it to the original *.tex name.

topdftex.exe sphinx_output.cp932.tex sphinx_output.tex

5. Run pdflatex.exe on the final output, sphinx_output.tex.
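If you do this often, steps 3-5 can be wrapped in a small script. A sketch, assuming topdftex.exe and pdflatex.exe are on your PATH:

import codecs
import subprocess

# step 3: convert sphinx's utf8 output to cp932
utf8_f = codecs.open("sphinx_output.tex", "rb", "utf8")
cp932_f = codecs.open("sphinx_output.cp932.tex", "wb", "cp932")
cp932_f.write(utf8_f.read())
utf8_f.close()
cp932_f.close()

# steps 4 and 5: convert to pdflatex-ready tex, then build the pdf
subprocess.check_call(["topdftex.exe", "sphinx_output.cp932.tex", "sphinx_output.tex"])
subprocess.check_call(["pdflatex.exe", "sphinx_output.tex"])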

It was quite a struggle to get this far; I hope this helps others.


monkut // May 9, 2010 // 9:40 p.m.