
How to load other-language names from OSM data into postgres



Using Ubuntu 14.04



Background


Recently I'd been wanting a map to display data on, and since I had some time, I decided to build one.


A quick search showed that, sure enough, someone else had the same idea!
https://www.mapbox.com/blog/designing-minimalist-openstreetmap-baselayer/


In a previous project I built an offline map server: I downloaded OpenStreetMap (OSM) data ( http://download.geofabrik.de/asia/japan.html ), loaded it into Mapbox's TileMill ( https://www.mapbox.com/tilemill/ ), and generated the map server's tiles from it.


Looking at the blog post above, they seem to have built theirs the same way, but they don't appear to have published the resulting design on GitHub or anywhere else.
Reading on, it seems they used OSM Bright ( https://github.com/mapbox/osm-bright ) as a template, but that has color settings scattered all over the place,
so I searched around and found that a black-and-white design already exists!


https://github.com/aaronlidman/Toner-for-Tilemill


It appears to be modeled on "toner" ( http://maps.stamen.com/toner/#12/37.7706/-122.3782 ) by Stamen Design, the king of online map design.
It's a bit dark for my intended use, but it needs far less color tweaking, so I'm proceeding with Toner for Tilemill as the base.


Well, as I was tweaking it I noticed something: all the city names are in Roman letters...




Displaying Japanese from OSM data on the map


Looking at the OSM data, Japanese names do appear to be included.


For now, the preparation steps are below.


After downloading the OSM data from http://download.geofabrik.de/asia/japan.html , prepare osm2pgsql.


https://github.com/openstreetmap/osm2pgsql


Pull osm2pgsql from GitHub:
(if git isn't installed yet, install it with sudo apt-get install git)


git clone https://github.com/openstreetmap/osm2pgsql.git


Then copy "default.style" and edit the copy:


cd osm2pgsql
cp default.style osm2pgsql.style


In osm2pgsql.style, below the following line:



node,way name text linear

add this line:



node,way name:ja text linear
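
If you'd rather script the edit than do it by hand, a throwaway sketch along these lines should work (it assumes the two leading columns shown above; adjust the whitespace to match the rest of the file):

# insert the name:ja line right after the existing name line
with open("osm2pgsql.style") as f:
    lines = f.readlines()

with open("osm2pgsql.style", "w") as f:
    for line in lines:
        f.write(line)
        if line.split()[:2] == ["node,way", "name"]:
            f.write("node,way   name:ja      text     linear\n")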


Note


For more on OSM tags, see the following link:
http://wiki.openstreetmap.org/wiki/JA:Map_Features



Once the edits are done, create the db and load the data into it:



sudo -u postgres psql -c "CREATE DATABASE osm_db;"
sudo -u postgres psql -d "osm_db" -c "CREATE EXTENSION postgis;"

sudo -u postgres osm2pgsql -c -G -S ./osm2pgsql.style -d osm_db ~/maps/japan/japan-latest.osm.pbf -C 22000


This loads the necessary data into the DB.
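
To check that the Japanese names actually made it in, here's a minimal sketch using psycopg2 (it assumes osm2pgsql's default planet_osm_point table; note the "name:ja" column has to be double-quoted in SQL):

# -*- coding: utf-8 -*-
import psycopg2

conn = psycopg2.connect(dbname="osm_db", user="postgres")
cur = conn.cursor()
cur.execute("""SELECT name, "name:ja"
               FROM planet_osm_point
               WHERE place = 'city' AND "name:ja" IS NOT NULL
               LIMIT 10;""")
for name, name_ja in cur.fetchall():
    print(u"{} -> {}".format(name, name_ja))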
However, to use it on a map the simple approach would be to add a DB layer inside TileMill.
The catch, I realized, is that Toner for Tilemill already pre-processes city names into SHP files, one per zoom level, so that they display cleanly.


Which means the city names would end up overlapping.


To get the same kind of clean labeling, there is apparently Dymo ( https://github.com/migurski/Dymo/ ).
I tried using it to generate the resulting geojson, but for some reason the file won't load...


Incidentally, I wrote a script that builds Dymo's input CSV from the DB created with osm2pgsql:


https://github.com/monkut/dymo_label_prep


(If the SQL query in that script can be used as a layer, the Japanese label display problem may be solved....)
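
For reference, the general shape of that kind of export is sketched below. The column set is illustrative rather than Dymo's confirmed schema (check Dymo's README and the script above for the real one), and it assumes osm2pgsql's default spherical mercator "way" column:

# -*- coding: utf-8 -*-
# dump city/town label points, preferring Japanese names, to a CSV
import csv

import psycopg2

conn = psycopg2.connect(dbname="osm_db", user="postgres")
cur = conn.cursor()
cur.execute("""SELECT COALESCE("name:ja", name) AS label,
                      ST_Y(ST_Transform(way, 4326)) AS latitude,
                      ST_X(ST_Transform(way, 4326)) AS longitude
               FROM planet_osm_point
               WHERE place IN ('city', 'town') AND name IS NOT NULL;""")

with open("city_labels.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "latitude", "longitude", "point size"])
    for label, latitude, longitude in cur.fetchall():
        writer.writerow([label.encode("utf-8"), latitude, longitude, 12])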



monkut // Aug. 23, 2014 // 10:42 a.m.

The New Open Data Revolution?



For years Tim O'Reilly has been saying that "Data is the Intel Inside". When I first heard this I didn't understand it, but gradually I came to see the importance and strength of data, though from the business perspective. Companies like Google use, process, and transform data into something that provides value.


I've just read "The Day the Universe Changed", and I'm now seeing the statement in a new light. Since O'Reilly's original Web 2.0 Manifesto, not only has the value of data become more widely realized, but data has also become more open. Companies like EveryBlock have led the opening and availability of public data, and even data.gov, a portal for government data, has been created.


What does this opening of data mean? It could be more than any of us can comprehend. I see it as a new opening of knowledge to the public. Just as Gutenberg's printing press solidified "fact" and moved knowledge and awareness away from elders and institutions of authority, open data extends the available public knowledge.


With this tsunami of data, the public needs the knowledge to understand it. In the pre-printing-press world, knowledge was spread by word of mouth; there was no choice but to understand and accept the world as we and our community perceived it. The printing press and the written word extended our perception of the world, so that we could picture a world beyond our own life and local social circle, increasing the resolution of our perception of the world. Now that resolution is set to increase again.


With the availability of data, governments, analysts, and experts are no longer the solitary keepers of knowledge. When this happens, data literacy becomes more important. The public will need basic statistical knowledge in order for meaningful conversation to occur. As data becomes more open, the number of "experts" will explode, and some of these new "experts" will, not unlike authority figures of the past, take advantage of those who lack basic "data literacy" skills. While I suspect schools are still struggling to teach basic math skills, our students need a minimal level of data comprehension in order to start these meaningful conversations.


With the spread of data literacy, we may move from rhetorical debates, where the keys to persuasion are emotional, to debates of data analysis and validity. These data debates are already being had, but until now only by the keepers of the data.


I urge academics and researchers to open and share data and data-keeping methods. I sense that a feeling of ownership still persists in the academic community. Understand that with the opening of data, and with more examination and discussion, we are one step closer to a better understanding of the world around us.


monkut // Feb. 16, 2013 // 10:10 a.m.


Understanding raster, basic GIS concepts and the python gdal library



I've been learning how to handle *raster* files and getting comfortable with the python libraries available to manipulate them.
(I'm still learning, but hopefully this will help someone like me who is just getting started.)

The first thing you need to know about common *decimal degree* latitude/longitude notation is that it is **NOT** a linear representation, but *angular* (defining a specific point on a sphere-like shape). So performing `cartesian coordinate <http://en.wikipedia.org/wiki/Coordinate_system>`_ operations (where the distance from (1,1) to (1,2) and the distance from (1,1) to (1,0) are equal) is **not** going to give you accurate results.
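
A quick way to see the problem: the ground distance covered by one degree of longitude shrinks as you move toward the poles, so cartesian math directly on lon/lat degrees misstates distances. A tiny illustration (the ~111,320 m equatorial figure is approximate):

import math

# approximate ground meters spanned by one degree of longitude at a given latitude
for lat in (0, 35, 60):
    meters = 111320.0 * math.cos(math.radians(lat))
    print("lat {}: ~{} m per degree of longitude".format(lat, int(round(meters))))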

Next, the common *decimal degree* latitude/longitude values you see use a specific **geographic coordinate system** (Commonly *WGS84*). In addition to **geographic**, **projected coordinate systems** are also used.

This quote from the `OGR Projections Tutorial <http://www.gdal.org/ogr/osr_tutorial.html>`_ helped me to understand what's going on here:

>> There are two primary kinds of coordinate systems. The first is **geographic** (positions are measured in *long/lat*) and the second is **projected** (such as UTM - positions are measured in *meters* or *feet*).

>> A **projected** coordinate system (such as UTM, Lambert Conformal Conic, etc) requires an underlying **geographic** coordinate system as well as a definition for the *projection transform* used to translate between linear positions (in meters or feet) and angular long/lat positions.

In my examples here I'll be using **UTM** for the *projected* coordinate system and **WGS84** for the *geographic*. I'll be calling the "projection transform", *geo_transform* in my code below.

.. note::

    UTM is split into multiple *zones* and may be difficult to work with in places where you will be spanning zones. The "Google" projection, *900913*, may be better for simple manipulations.

First, some simple coordinate conversion. When manipulating data you're going to need to do some conversions; here's how to do it with the python `GDAL <http://pypi.python.org/pypi/GDAL/>`_ package:

from osgeo import osr


# helpers kept at module level so the raster code further below can reuse them
def get_utm_zone(longitude):
    """Calculate the UTM zone number containing the given longitude"""
    return int(1 + (longitude + 180.0) / 6.0)


def is_northern(latitude):
    """Determine if the given latitude is in the northern hemisphere for UTM"""
    return 1 if latitude >= 0.0 else 0


def transform_utm_to_wgs84(easting, northing, zone):
    utm_coordinate_system = osr.SpatialReference()
    utm_coordinate_system.SetWellKnownGeogCS("WGS84")  # set geographic coordinate system to handle lat/lon
    utm_coordinate_system.SetUTM(zone, northing > 0)  # northern hemisphere if northing is positive

    wgs84_coordinate_system = utm_coordinate_system.CloneGeogCS()  # clone ONLY the geographic coordinate system

    # create transform component
    utm_to_wgs84_geo_transform = osr.CoordinateTransformation(utm_coordinate_system, wgs84_coordinate_system)  # (source, target)
    return utm_to_wgs84_geo_transform.TransformPoint(easting, northing, 0)  # returns lon, lat, altitude


def transform_wgs84_to_utm(lon, lat):
    utm_coordinate_system = osr.SpatialReference()
    utm_coordinate_system.SetWellKnownGeogCS("WGS84")  # set geographic coordinate system to handle lat/lon
    utm_coordinate_system.SetUTM(get_utm_zone(lon), is_northern(lat))

    wgs84_coordinate_system = utm_coordinate_system.CloneGeogCS()  # clone ONLY the geographic coordinate system

    # create transform component
    wgs84_to_utm_geo_transform = osr.CoordinateTransformation(wgs84_coordinate_system, utm_coordinate_system)  # (source, target)
    return wgs84_to_utm_geo_transform.TransformPoint(lon, lat, 0)  # returns easting, northing, altitude

As you may have noticed, *easting* and *northing* are the *UTM* terms for *x* and *y*.
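
For example, a quick round trip through the two functions above (Tokyo Station, approximate coordinates, which falls in UTM zone 54 north):

# lon 139.7671, lat 35.6812 -> get_utm_zone(139.7671) == 54
easting, northing, altitude = transform_wgs84_to_utm(139.7671, 35.6812)
lon, lat, altitude = transform_utm_to_wgs84(easting, northing, 54)
print("utm: {} {}".format(easting, northing))  # position in meters
print("wgs84: {} {}".format(lon, lat))         # back to decimal degrees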

Now, why would you want to use *rasters*? Well, rasters are used as a way to aggregate data in a given area. I've commonly called this "*binning*", but in raster terms the "*bin*" elements of a raster are referred to as a "**cell**" or "**pixel**". The *raster* structure/object model in gdal is like this (a small read-back sketch follows the list):

- A *raster* consists of a *dataset*.

- A *dataset* consists of one or more *bands*.

- A *band* contains multiple *cells*/*pixels*/*bins*.

- A *cell*/*pixel*/*bin* is a location with a *value*.
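
As a quick illustration of that model, here's a sketch reading a file (such as the test.tiff created below) back through those layers:

from osgeo import gdal

dataset = gdal.Open("test.tiff")  # the raster's dataset
print("bands: {}".format(dataset.RasterCount))

band = dataset.GetRasterBand(1)  # bands are 1-indexed
print("cells: {} x {}".format(band.XSize, band.YSize))

data = band.ReadAsArray()  # the cells/pixels/bins as a numpy array
print("value at [0][0]: {}".format(data[0][0]))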

Now for the code. I think there may be a good way to create a *RasterBuilder* class to handle the raster functions, but I'll leave that to you.

import numpy
from osgeo import gdal
from osgeo import osr
gdal.AllRegister()  # register all drivers
gdal.UseExceptions()
from shapely.geometry import MultiPoint  # used to get extent/bounds

OUTPUT_FORMAT = "GTiff"  # assumed GeoTIFF output driver (not named in the original snippet)
NULL_VALUE = -9999.0     # assumed no-data marker (likewise undefined in the original)


def get_point_bounds(points):
    mp_points = MultiPoint(list(points))  # the dict keys are (lon, lat) tuples
    (min_lon, min_lat, max_lon, max_lat) = mp_points.bounds
    return min_lon, min_lat, max_lon, max_lat


def get_raster_size(minx, miny, maxx, maxy, cell_width, cell_height):
    """
    Determine the number of rows/columns given the bounds of the point data and the desired cell size
    """
    # +1 so points sitting exactly on the max bounds still fall inside the grid
    cols = int((maxx - minx) / cell_width) + 1
    rows = int((maxy - miny) / abs(cell_height)) + 1
    return cols, rows


def lonlat_to_pixel(lon, lat, inverse_geo_transform):
    """
    Translates the given lon, lat to the grid pixel coordinates in data array (zero start)
    """
    # transform to utm
    utm_x, utm_y, utm_z = transform_wgs84_to_utm(lon, lat)

    # apply inverse geo transform; it already yields zero-based pixel indexes
    pixel_x, pixel_y = gdal.ApplyGeoTransform(inverse_geo_transform, utm_x, utm_y)
    return int(pixel_x), int(pixel_y)


def create_raster(points, filename="test.tiff"):
    """
    points == { (lon, lat): value, ...}
    lon/lat values in WGS84
    """
    # create empty raster
    driver = gdal.GetDriverByName(OUTPUT_FORMAT)
    number_of_bands = 1
    band_type = gdal.GDT_Float32

    x_rotation = 0
    y_rotation = 0
    cell_width_meters = 50.0
    cell_height_meters = 50.0

    (min_lon, min_lat, max_lon, max_lat) = get_point_bounds(points)  # retrieve bounds for point data

    srs = osr.SpatialReference()
    srs.SetWellKnownGeogCS("WGS84")  # set geographic coordinate system to handle lat/lon
    srs.SetUTM(get_utm_zone(min_lon), is_northern(max_lat))  # set projected coordinate system to handle meters

    # convert the WGS84 bounds to UTM (using transform_wgs84_to_utm defined earlier)
    top_left_x, top_left_y, z = transform_wgs84_to_utm(min_lon, max_lat)
    lower_right_x, lower_right_y, z = transform_wgs84_to_utm(max_lon, min_lat)

    cols, rows = get_raster_size(top_left_x,
                                 lower_right_y,
                                 lower_right_x,
                                 top_left_y,
                                 cell_width_meters,
                                 cell_height_meters)
    dataset = driver.Create(filename, cols, rows, number_of_bands, band_type)

    # GeoTransform parameters
    # --> need to know the area that will be covered to define the geo transform
    # top left x, w-e pixel resolution, rotation, top left y, rotation, n-s pixel resolution
    geo_transform = [top_left_x, cell_width_meters, x_rotation, top_left_y, y_rotation, -cell_height_meters]  # cell height must be negative (-) to apply image space to map
    dataset.SetGeoTransform(geo_transform)
    dataset.SetProjection(srs.ExportToWkt())
    inverse_geo_transform = gdal.InvGeoTransform(geo_transform)[1]  # for mapping lat/lon to pixel; gdal 1.x returns (success, inverse)

    # get the empty raster data array
    band = dataset.GetRasterBand(1)  # 1 == band index value
    data = band.ReadAsArray(0, 0, cols, rows).astype(numpy.float32)

    # populate array values for output; numpy arrays are indexed [row][column], i.e. [y][x]
    for (lon, lat), value in points.items():
        pixel_x, pixel_y = lonlat_to_pixel(lon, lat, inverse_geo_transform)
        data[pixel_y][pixel_x] = value

    # write the updated data array to file
    band.WriteArray(data, 0, 0)
    band.SetNoDataValue(NULL_VALUE)
    band.FlushCache()

    # set dataset to None to "close" the file
    dataset = None
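
And a hypothetical usage example to tie it together (arbitrary points near Tokyo, made-up values):

# values keyed by (lon, lat) in WGS84; cells are 50m x 50m as set in create_raster()
points = {
    (139.6917, 35.6895): 1.0,
    (139.7000, 35.6950): 2.5,
    (139.7100, 35.7000): 4.0,
}
create_raster(points, filename="test.tiff")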


monkut // May 2, 2012 // 6:15 p.m.

Using Django 1.4 with GDAL 1.9 and PostGIS-2.0



Recently both Django 1.4 and PostGIS-2.0 were released and I needed to set up a new development environment.


However, since the release dates of both were pretty close, it looks like django-1.4 didn't manage to get support for PostGIS-2.0 into the 1.4 release, and the following error occurs:



ERROR: operator class "gist_geometry_ops" does not accept access method "gist"

Luckily this has already been identified as an issue (see Ticket #16455).


While the patch hasn't been accepted yet, it does appear to work for me; at least you can use geometry columns in PostGIS-2.0 once you apply the patch.


Enable PostGIS-2.0



  1. Install Postgres 9.1, then install PostGIS-2.0

  2. Apply the patch to django.contrib.gis.db.backends.postgis.creation described in Ticket #16455

  3. Check/set that the GDAL_DATA environment variable is set to C:\Program Files (x86)\PostgreSQL\9.1\gdal-data

  4. Check/set that the PROJ_LIB environment variable is set to C:\Program Files (x86)\PostgreSQL\9.1\share\contrib\postgis\proj

  5. Check/set that the PROJ_SO environment variable is set to C:\Program Files (x86)\PostgreSQL\9.1\bin\libproj-0.dll (a quick check script follows this list)
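
After steps 3-5, here's a quick sanity check that the variables are actually visible to Python (just a convenience; the paths should match the ones above):

import os

for name in ("GDAL_DATA", "PROJ_LIB", "PROJ_SO"):
    print("{}={}".format(name, os.environ.get(name, "<not set>")))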



Note


The gdal-data folder is ONLY installed with postgis-2.0 and is not found in postgis-1.5.
This folder is needed to perform projection translations, and the python GDAL library also needs this information.



Once this is done you should be able to create/syncdb for models that contain geometry fields.


Enable GDAL-1.9.0:


Almost every time I or someone on my team sets up a new development environment, there's an issue getting GDAL to work. In my case I need some of the raster features available in the stock GDAL package which are NOT available through Django's bundled GDAL bindings (see note). So I need to install the standard python GDAL package from pypi.


Getting this to work with django was a long process of trial and error. Here's what I did (on win7):



  1. Install GDAL-1.9




  2. Next, create the directory C:\gdallibs.



  3. Then copy all the DLLs that were installed in C:\Python27\Lib\site-packages\osgeo to C:\gdallibs.



  4. Add C:\gdallibs to the PATH environment variable.



    • Once added to your PATH, django should find the libraries.



  5. django-1.4 does not yet have support for gdal-1.9.0 and cannot find the 1.9 library. Overwrite the local django.contrib.gis.gdal.libgdal with this updated version: libgdal.py




    • This adds gdal1.9 to the library search and also allows configuration of GDAL_LIBRARY_PATH as an environment variable. (Although I found it was not needed once C:\gdallibs was created and added to the PATH.)




  6. Confirm that gdal.HAS_GDAL and gdal.GEOJSON are True:



    >>> from django.contrib.gis import gdal
    >>> gdal.HAS_GDAL
    True
    >>> gdal.GEOJSON
    True




Note


You may need to open a NEW console for the updated PATH environment variable to take effect.



monkut // May 2, 2012 // 4:45 p.m.

sqlite, zlib not working on CentOS



Playing around with CentOS, I've been working to get Python 2.6 installed.
(CentOS defaults to 2.4.)

I managed to download and install Python 2.6 using:


sudo ./configure
sudo make altinstall

However, when I tried to install additional packages I hit a zlib error.
I remembered that sqlite3 was also an optional module, checked it, and found that it failed as well.


>>> import zlib
ImportError: No module named zlib

>>> import sqlite3
ImportError: No module named _sqlite3

I found this post mentioning zlib-devel, and installing that along with sqlite-devel did the job.

So just run:


sudo yum install zlib-devel
sudo yum install sqlite-devel

# change to python directory
cd Python*
sudo ./configure
sudo make altinstall
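
To confirm the rebuild picked up the new headers, run a check like this under the freshly built interpreter (python2.6):

# both of these imports failed before zlib-devel / sqlite-devel were installed
import sqlite3
import zlib
print("zlib and sqlite3 loaded OK")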

And now both modules can be loaded properly.
altinstall installs the interpreter as python2.6, leaving the default python command untouched.
This lets you keep the 2.4 build that CentOS needs for various system tools.


monkut // June 23, 2010 // 12:37 a.m.