Toss the dwarf or pick up the axe

March 26th, 2007
BTW, a member of the ANSI C committee once told me that the only thing rand is used for in C code is to decide whether to pick up the axe or throw the dwarf, and if that’s true I guess “the typical libc rand” is adequate for all but the most fanatic of gamers. Tim Peters, 21 June 1997

import ogr
import osr
import random

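# Open the shapefile in update mode (the second argument, 1, means writable).
ds = ogr.Open(r'c:\hobu\shapefiles\data.shp', 1)

layer = ds.GetLayer(0)
count = layer.GetFeatureCount()
for i in range(count):
    feature = layer.GetFeature(i)
    # One random offset per feature, so each shape moves as a unit.
    fudge = random.randint(0, 10000)
    geometry = feature.GetGeometryRef()
    gcount = geometry.GetGeometryCount()

    # Shift every vertex of every ring or part by the offset.
    for j in range(gcount):
        g = geometry.GetGeometryRef(j)
        pcount = g.GetPointCount()

        for p in range(pcount):
            x, y = g.GetX(p), g.GetY(p)
            g.SetPoint(p, x + fudge, y + fudge, 0)

    # Write the modified feature back to the layer.
    layer.SetFeature(feature)

# Close the data source so the edits are flushed to disk.
ds.Destroy()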

ArcSDE developments

March 11th, 2007

The last two months have also been all-ArcSDE, all-the-time for me, as I’ve worked on three development efforts that have focused on integrating the ESRI database abstraction technology with Open Source technologies.

PySDE

In 2002-2003, I developed a SWIG’ified wrapper to the ESRI ArcSDE C API called PySDE that wrapped up the library and allowed its use via Python. I’ll be the first to admit that it was ugly as hell, but its development was my first introduction to the Open Source GIS community, and although there were some good ideas in it, to be truly useful and sustainable the thing needed to be rewritten.

I’ve been noticing more and more posts on the ArcSDE forums about folks wanting to script and program the ArcSDE SDK with C# rather than going through ArcObjects. I suppose they’re wanting to do this for more control and performance reasons. My experience with both MapServer and GDAL and their SWIG bindings has been very instructive, and I have started rewriting PySDE to follow GDAL’s approach and layout. This means that there is good potential to support more than just Python. I think that PySDE could still fill a useful niche for folks, and I’m looking for folks who wish to support its further development. Head to http://sde.hobu.biz if you’re interested in finding out more and looking at the current state of the code.

GDAL ArcSDE Driver

Project number two related to ArcSDE was the big one — development of a full GDAL raster driver. I had posted a little bit about this last fall, and I was able to secure some funding to make it happen. GDAL now has a driver checked into subversion that supports overviews, statistics, many different data types (1, 4, 8 bit and so on), coordinate systems, and colormaps. Head to http://www.gdal.org/frmt_sde.html to find out more information about the driver. If you’re interested in testing it out (I have been looking for more testers…), grab a copy of FWTools 1.2.2, check the box for the ArcSDE plugin, and after installing, head to http://hobu.stat.iastate.edu/mapserver/build_output/gdal_SDE.zip to get a new version of the plugin that has seen a number of improvements since the FWTools release. You’ll need the ArcSDE 9.1 SDK on your PATH to be able to use it.

Development of this driver was an excellent learning experience, and I would like to thank Frank Warmerdam for his guidance while I developed it. I hope that the ArcSDE raster driver proves to be very useful, especially for government types hoping to wedge Open Source GIS’s foot in the door in their machine rooms. I think it (along with MapServer’s SDE support and OGR’s SDE vector support) has the potential to be a major glob of glue for organizations not looking to abandon their editing and analysis tools while still looking to things like MapServer and MapGuide to fulfill their web story.

MapServer ArcSDE Joins

The final ArcSDE item I worked on was small, but not so straightforward. I added the ability for MapServer’s ArcSDE driver to do one-to-one joins with another in-database table. Obviously this is very specific and only useful to a very small percentage of the userbase. One area where spatial databases like PostGIS and Oracle (and maybe MySQL if it ever supported geometric algebra and OGC Simple Features operations) have an advantage is that the construction of a query defines what the “layer” is as far as MapServer is concerned. ArcSDE doesn’t have that luxury, and some fire-burning hoop-jumping is required even to do a simple join.

Hobu Pro

With the upcoming move to the new city, I will also be making Hobu, Inc. a full-time endeavor. It will be an exciting and challenging step to take, but I think that now is as good a time as any to try and build it into a sustainable business. The timeframe for going full-time is still in the air a bit, as we’ve not yet closed on our house in the old city, but once we know the house is sold (buy it here!), I will give my notice at Iowa State University and head off across the state to do GIS software development and consulting full-time. The hope is that we’re moved to the new city by the middle of April, but real estate markets as they are, it’s hard to know for sure.

I’m excited to bring it full-time, and your chance to hire Hobu Pro ™ to do your proprietary <-> open source GIS software bridging development dirty work is soon approaching :)

Geographic Projection Web Services redux

January 9th, 2007

A couple of years ago I was playing around with Twisted Python and developed a simple web service for reprojecting data (this is no longer live). It was merely a glorified wrapper around the spatial reference tools in GDAL, but a few folks found it useful and I still receive an email about it here and there.

Lately, I’ve been playing around with Django, mostly as an effort to find a less punishing framework than Zope/Plone. I’m not much of a web person, but I’ve been following some of the recent developments like OpenLayers, GeoJSON, and, of course, the Google. One of the more common and somewhat insane things that I see as part of my daily routine hanging out on #gdal on IRC is somebody showing up asking about projection math and how they can implement it in JavaScript. The Open Source GIS world already has a world-class projections library in Proj.4, but for some reason it makes sense to a few folks to try and re-invent this wheel. I’ve always thought it was kind of silly, and any attempt to do so would probably result in a miserable development effort that retreads the misery that Gerald and Frank endured making Proj.4 in the first place.

JSON

JSON, or JavaScript Object Notation, is rapidly becoming the transport of choice for all of the AJAXian wunderkinds out there. The greatest advantage of JSON is that the parsing overhead of it for JavaScript is zero, and it can practically be eval’d in directly to the client. With these two things in mind, I set about to play around with a JSON-emitting webservice that projects geographic data (points).

The projector lives at http://projection.hobu.biz/json . It takes in these parameters:

Parameter  Value       Description
x          -93.0       The x coordinate (longitude)
y          42.0        The y coordinate (latitude)
inref      EPSG:4326   The input spatial reference of the point
outref     EPSG:26915  The output spatial reference of the point
function   myfunction  The name of the JavaScript function to wrap the output in
id         3           An id to give the data
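
To illustrate what the function parameter is for, here is a minimal sketch of how a service like this might wrap its JSON output in a caller-supplied JavaScript function name. The helper name wrap_in_function and the payload layout are assumptions made up for illustration; this is not the projector's actual code or its exact output format.

```python
import json

def wrap_in_function(payload, function=None):
    """Serialize payload as JSON, optionally wrapped in a JavaScript call.

    Hypothetical helper: the real service's output format is assumed here.
    """
    body = json.dumps(payload, sort_keys=True)
    if function:
        # A page can load this through a <script> tag and have
        # the named function called with the data as its argument.
        return "%s(%s);" % (function, body)
    return body

# Illustrative payload; the field names are borrowed loosely from the service.
point = {"id": 3, "srs": "EPSG:26915", "center": [500000.0, 4649776.224703, 0.0]}
wrapped = wrap_in_function(point, "myfunction")
print(wrapped)
```

Without a function name the same helper returns plain JSON, which is what a non-browser client would most likely want.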

Here’s an example that projects a point in Iowa from EPSG:4326 to EPSG:26915 (decimal degrees WGS 84 to UTM Zone 15):

http://projection.hobu.biz/json/project/?y=43&x=-93&function=myfunction&outref=EPSG%3A26915&inref=EPSG%3A4326
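
A URL like the one above does not have to be typed by hand. The following is a small sketch that builds it with Python 3's urllib.parse (the modern home of the urllib encoding helpers); the function name build_projection_url is made up for illustration, and the code only constructs the string without contacting the service.

```python
from urllib.parse import urlencode

BASE_URL = "http://projection.hobu.biz/json/project/"

def build_projection_url(x, y, inref, outref, function=None):
    """Return a GET URL for the projector; no request is sent."""
    params = [("y", y), ("x", x)]
    if function:
        params.append(("function", function))
    params += [("outref", outref), ("inref", inref)]
    # urlencode percent-escapes the colon in "EPSG:26915" as %3A.
    return BASE_URL + "?" + urlencode(params)

url = build_projection_url(-93, 43, "EPSG:4326", "EPSG:26915", "myfunction")
print(url)
```

The parameter order is kept the same as in the example URL only so the two are easy to compare; a query string does not normally depend on it.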

Using the service with EPSG codes would probably be the most common way to work, but because the service builds on the spatial referencing library in GDAL, there are many more options. For example, we could specify our input projection in Proj.4 format instead:

Proj.4 input example

Or even get really nuts and specify our output projection in OGC WKT:

OGC WKT output example

Finally, we can output a coordinate in a spatial reference that doesn’t have an EPSG code, like USGS Albers:

USGS Albers output example

Python usage

You aren’t limited to using GET requests either (although you could probably construct one easily with urllib). For example, with jsonrpclib for Python, you can request projection of points much like you would with XMLRPC:

>>> import jsonrpclib
>>> rpc = jsonrpclib.ServerProxy('http://projection.hobu.biz/json')
>>> rpc.project('-93.0', '42.0', 'EPSG:4326', 'EPSG:26915')
{u'id': 2, u'result': {u'features': {u'center': [500000.0, 4649776.2247029999, 0.0], u'title': u'title', u'spatialCoordinates': [[500000.0, 4649776.2247029999, 0.0]], u'srs': u'EPSG:26915', u'geometryType': u'point', u'id': u'1'}}}
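
Once a reply like that comes back, pulling the projected coordinate out of it is plain dictionary access. Here is a sketch that uses the sample reply from the session above, pasted in as a literal so it runs without the service; the helper name projected_xy is made up, and it assumes every reply has the same shape as this one.

```python
# Sample reply copied from the session above, written as Python 3 literals.
response = {
    "id": 2,
    "result": {
        "features": {
            "center": [500000.0, 4649776.2247029999, 0.0],
            "title": "title",
            "spatialCoordinates": [[500000.0, 4649776.2247029999, 0.0]],
            "srs": "EPSG:26915",
            "geometryType": "point",
            "id": "1",
        }
    },
}

def projected_xy(reply):
    """Return (x, y, srs) from a reply shaped like the sample above."""
    features = reply["result"]["features"]
    x, y = features["center"][:2]
    return x, y, features["srs"]

x, y, srs = projected_xy(response)
print("%.1f %.1f %s" % (x, y, srs))
```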

If you find it useful, let me know.

GDAL 1.4.0 Released

January 7th, 2007

Head to http://www.gdal.org to get a copy. Here’s the note from Frank and relevant news items that describe some of the changes:

The GDAL development team is pleased to announce the release of
GDAL/OGR 1.4.0.  This new release includes many new features
and bug fixes since the 1.3.2 release nine months ago.  These
are described at:

http://www.gdal.org/NEWS.html

The new release source may be downloaded from:

http://www.gdal.org/dl/gdal-1.4.0.tar.gz
http://www.gdal.org/dl/gdal140.zip

Binaries corresponding to the GDAL/OGR 1.4.0 release can
be found included in the FWTools 1.1.3 release for Windows
and Linux at:

http://www.gdal.org/dl/fwtools/FWTools113.exe (windows)
http://www.gdal.org/dl/fwtools/FWTools-linux-1.1.3.tar.gz (linux)

The GDAL project has also introduced the new gdal-announce
list, hosted by OSGeo.  Those interested in just occasional
notices of GDAL/OGR project progress are encouraged to join
this mailing list.

http://lists.osgeo.org/mailman/listinfo/gdal-announce

GDAL/OGR 1.4.0 - General Changes

Perl Bindings:
  • Added doxygen based documentation.
NG Python Bindings:
  • Implemented numpy support.
CSharp Bindings:
  • Now mostly operational.
WinCE Porting:
  • CPL
  • base OGR, OSR and mitab and shape drivers.
  • GDAL, including GeoTIFF, DTED, AAIGrid drivers
  • Added test suite (gdalautotest/cpp)
Mac OSX Port:
  • Added framework support (--with-macosx-framework)

GDAL 1.4.0 - Overview Of Changes

WCS Driver:
  • New
PDS (Planetary Data Set) Driver:
  • New
ISIS (Mars Qubes) Driver:
  • New
HFA (.img) Driver:
  • Support reading ProjectionX PE strings.
  • Support producing .aux files with statistics.
  • Fix serious bugs with u1, u2 and u4 compressed data.
NITF Driver:
  • Added BLOCKA reading support.
  • Added ICORDS='D'
  • Added jpeg compression support (readonly)
  • Support multiple images as subdatasets.
  • Support CGM data (as metadata)
AIGrid Driver:
  • Use VSI*L API (large files, in memory, etc)
  • Support upper case filenames.
  • Support .clr file above coverage.
HDF4 Driver:
  • Added support for access to geolocation arrays (see RFC 4).
  • External raw raster bands supported.
PCIDSK (.pix) Driver:
  • Support METER/FEET as LOCAL_CS.
  • Fix serious byte swapping error on creation.
BMP Driver:
  • Various fixes, including 16bit combinations, and non-intel byte swapping.
GeoTIFF Driver:
  • Fixed in place update for LZW and Deflated compressed images.
JP2KAK (JPEG2000) Driver:
  • Added support for reading and writing gmljp2 headers.
  • Read xml boxes as metadata.
  • Accelerate YCbCr handling.
JP2MrSID (JPEG2000) Driver:
  • Added support for reading gmljp2 headers.
EHDR (ESRI BIL) Driver:
  • Support 1-7 bit data.
  • Added statistics support.

OGR 1.4.0 - Overview of Changes

OGR SQL:
  • RFC 6: Added support for SQL/attribute filter access to geometry, and
    style strings.
OGRSpatialReference:
  • Support for OGC SRS URNs.
  • Support for +wktext/EXTENSION stuff for preserving PROJ.4 string in WKT.
  • Added Two Point Equidistant projection.
  • Added Krovak projection.
  • Updated support files to EPSG 6.11.
OGRCoordinateTransformation:
  • Support source and destination longitude wrapping control.
OGRFeatureStyle:
  • Various extensions and improvements.
INFORMIX Driver:
  • New
KML Driver:
  • New (write only)
E00 Driver:
  • New (read only)
  • Polygon (PAL) likely not working properly.
Postgres/PostGIS Driver:
  • Updated to support new EWKB results (PostGIS 1.1?)
  • Fixed serious bug with writing SRSes.
  • Added schema support.
GML Driver:
  • Strip namespaces off field names.
  • Handle very large geometries gracefully.
ODBC Driver:
  • Added support for spatial_ref_sys table.
SDE Driver:
  • Added logic to speed things up while actually detecting layer geometry types
PGeo Driver:
  • Added support for MDB Tools ODBC driver on linux/unix.
VRT Driver:
  • Added useSpatialSubquery support.

Five Things

January 3rd, 2007

Sean tagged me, so here goes:

  1. I have cheated death three times.  First, I overturned a farm tractor and had a 500 lb. barrel roll inches past my head.  About two years later, I fell asleep driving and drove off a curve at 60+ mph.  Finally, about a year afterward, I was struck by lightning driving down the freeway at 70 miles per hour (totaled the car but didn’t touch me). 
  2. I am halfway to completing my private pilot’s license (no close calls so far ;)
  3. My first successful Linux installation was on a Compaq iPaq using the handhelds.org distribution about five years ago.
  4. I own an original Palm I handheld. I still would have rather had a Newton.
  5. My first computer was an Apple IIgs.  Graphics and sound, baby!

Mateusz, Frank, Sandro, Jo, and Gary,  you’re it.

2006 Year in Review

December 27th, 2006

General Thoughts

OSGeo bootstrapped itself from the crumbled MapServer Foundation announcement at the end of 2005. Considerable energy and leadership from people like Frank Warmerdam, Jo Walsh, Gary Lang, and Tyler Mitchell have kept it moving forward and working to attain the goals which it set forth to accomplish. I expect that it will continue to gather momentum in 2007, despite Sean’s and others’ continued hating on it, because the benefits it provides are already starting to be realized in terms of visibility, cross-project collaboration, and financial stewardship.

ESRI had their Vista release. I’ve been mostly out of the ESRI loop the past couple of years, due to my involvement with open source stuff and the fact that the ESRI stack is too stovepipe-like for my development group’s taste. From my small business/local government perspective and GIS weblog perspective, it is clear that many are looking at open source/cheaper alternatives to accomplishing the same tasks. Integrated and polished open source solutions will have a real opportunity over the next 18 months (obviously in the web space with Mapguide, MapServer, GeoServer, and OpenLayers, but there are other areas too).

Apple continues to clean house. I so wish I had bought stock after recovering from buying my first Mac three years ago. 2007 should be exciting in the Apple world with a new OS release for Microsoft to attempt to copy and continued hardware releases following Apple’s wise decision to hop on the i386 bandwagon. Hobu, Inc. is completely an Apple shop now as far as workstations and laptops. Apple’s servers are interesting and nice from a management perspective, but their cost does not justify their existence when compared to commodity pizza boxes and a solid Linux distribution.

Review

February 4th

I was in an O’Hare hotel for the birthing of OSGeo.

February 12th

I completed the MySQL driver for OGR. This was my first experience developing for OGR, and it was straightforward. Maybe someday MySQL will realize that they need to beef up their spatial support. Currently for most, it is but a speedbump on the way to implementing PostGIS, I think.

Late February

I released the first version of the MapServer Buildkit. The Buildkit is a big ball of wax that has everything pre-configured for you to build MapServer on Windows with MSVC2003. The Buildkit has been the base on which MS4W has been built, and its use has eliminated my need to release an independent set of MapServer windows binaries for ArcSDE and Oracle Spatial support.

March 5th

My wife accepted a job at the University of Iowa, and I found out I’ll be moving to a new town.

April

I was busy setting up Buildbots for GDAL, MapServer, and GEOS. They have been an excellent way to keep track of developments in these projects, and Mateusz has taken up the torch of helping to implement them for all OSGeo projects that want them.

June 1st

I was at MetroGIS explaining to Clint Brown of ESRI that it makes sense to release the ArcSDE C API because third party developers building software with it will *sell them more software.* It had no effect…

June 26th

LizardTech released their latest SDK with input from Frank Warmerdam and me about the licensing agreement. I also did some testing to make sure it worked as a universal binary. I earned a LizardTech polo shirt for my efforts.

August 29th

I headed off to FOSS4G in Lausanne, Switzerland after honeymooning in Italy for a couple of weeks.

Sept 18th

Returned home from the conference to a broken toilet, flooded house, and wet and trashed computers. Most of the damage was contained to my office, the bathroom above, and the basement below. One silver lining was a Mac Pro to replace my dual G5, but other than that, it’s been pretty miserable.

End of November and Early December

During this timeframe, I was working on numpy bindings for GDAL, I was closing bugs in preparation for GDAL’s release, and I received a commitment for funding to support the development of ArcSDE Raster support for GDAL.

December 10th

I created a new website to revive PySDE. I will provide more detail about that in a future post.

2007 and Beyond

As I noted in my review section, my wife has accepted a job at the other university in Iowa. Telecommuting back to my job at the Center for Survey Statistics and Methodology at Iowa State is not an option, and I will be striking out on my own to try my hand at consulting full time. The exact date has not been set yet, but it will coincide with the repairs to the house in Ames from the flood and our ability to sell it. So far, my prospects look fairly good, but in a couple of months I’ll be looking to take on more. Here’s your chance to get in line to hire me :)

OS X and I got Geoserved

December 16th, 2006

Attempt #1 - Try to install it on OSX Server’s JBoss (45 minutes)

My machine meets all of the requirements of the Install Mac OSX document on the site. I tried following the scant document, even interpolating some necessary things like changing file permissions and ownership when needed, but I was never able to get it working this way.

Attempt #2 - Try to install it directly in a fresh Tomcat (45 minutes)

Next, I thought it might be prudent to try installing it in a fresh Tomcat instance, as this would likely be the most common configuration and installation of Tomcat. I got Tomcat working, grabbed the geoserver.war file, and dumped it into webapps. I restarted Tomcat, went to ./geoserver, and got nothing.

Attempt #3 - Try to install the packaged binaries (geoserver-bin) (1 hour)

After converting the linefeeds of the startup scripts (DOS linefeeds in a shell script that supposedly starts the server are inexcusable) and following the documents for setting, java finally dies with a bunch of java.lang.NullPointerExceptions.
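
The line-ending conversion mentioned above takes only a few lines of Python. This is a sketch rather than what was actually run at the time (a tool like dos2unix does the same job), and the function names are made up for illustration.

```python
def dos_to_unix(data):
    """Convert CRLF line endings to LF in a bytes object."""
    return data.replace(b"\r\n", b"\n")

def convert_file(path):
    """Rewrite the file at path in place with Unix line endings."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(dos_to_unix(data))

# A stand-in for a startup script that was saved with DOS line endings.
sample = b"#!/bin/sh\r\necho starting\r\n"
print(dos_to_unix(sample))
```

Working on bytes rather than text avoids any guessing about the file's encoding; a stray carriage return at the end of the #! line is a common reason such a script fails to start.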

Update: After some more tweaks, pops, and pokes and synthesizing the Geoserver wiki into my neural network I was able to get this approach to work.

My Failures

  • I’m impatient and I don’t like to read documents. Geoserver caters to me because there isn’t that much to read.
  • I’m running OSX, which evidently isn’t a common configuration. Java’s java, and it is supposed to run everywhere though, right?
  • I’m not a java developer, so I don’t have the experience to weather common trip-ups that a developer would have.

Geoserver’s Failures

  • Wikis are horrible for project documentation. They are by nature divergent and the result is conflicting information. I would much rather have a single document that is precisely wrong once than four or five documents that are imprecisely wrong in different ways. As a naive user, I have no way to discriminate between the various bits of information and determine which are right, which are wrong, and which are current.
  • Geoserver’s (this is probably just java in general) packaging is very spartan. It and the documentation assume I actually know what I am doing. The common use case of dumping things on a windows box appears to be covered though, so maybe that covers 99% of the users, and I’m just in a very small minority.

You could argue that I’m an impatient, no-documentation-reading, know-nothing user who hasn’t spent enough time with the project and you would be right. These attributes still shouldn’t prevent me from getting something going and seeing what it is about. Good packaging is hard. Good documentation is hard. Do people have exactly these same frustrations when they start out blind, naked, and dumb with MapServer too?

Update #2 I maybe just didn’t spend enough time and was too impatient, because after three or four hours I was able to get things to work. The problem of not having sexy or straight-forward enough documentation and packaging for open source software can make the barrier quite high. I know that most of the open source GIS projects I’m involved with continue to have this problem as well (my rhetorical question about MapServer notwithstanding). Starting again with an unfamiliar project with fresh eyes highlighted how frustrating this can be. I have no suggestions how to fix it though, as writing documentation, packaging software, and maintaining project websites is not egoboo-generating work. Maybe someday the firehose will point at these projects and these issues will all just go away :)

Open Source software is not always openly developed

October 19th, 2006

The terms “Open Source” and “Free Software” almost always describe the licensing state of a piece of software. For the software projects that they are attached to, they describe a crucial attribute — your rights as a software developer and user to read, modify, build on, and use the source code of a project for your uses. “Open Source” and “Free Software” are also implicitly attached to the concept of an open development process — the software is worked on, improved, and distributed by a self-selected group of developers to the benefit of all — but this is not always the case. This article will describe and discuss the development processes of four projects that I am familiar with in the Open Source GIS domain and will describe the role that I think that OSGeo should provide in ensuring open development for its member projects.

Karl Fogel’s “Producing Open Source Software” is *the* book to read on this topic. It is the handbook of open development: it describes the attributes of a successful, openly developed open source project, and it even covers sticky topics like money entering the project, forking, and governance. If you get a chance, I highly recommend you read through it, if only to become familiar with the attributes of open source projects with respect to development and with cues to help you evaluate them.

ERMapper’s ECW

During the summer of 2005, ERMapper released the software to their wavelet compression/decompression engine called ECW. While the source code is technically open, the licensing terms do not meet any of the requirements of any of the mainstream open/free software licenses, and by all accounts, the development process of ECW is anything but open.

My opinion is that ERMapper is in limbo with respect to opening ECW. They want outside contribution, especially with respect to building and deploying the software and fixing bugs, but they are concerned about their format and its possible divergence from the tons of data already out there in ECW format. The possible licenses you can assume and fall under reflect this ambiguity, and each variant of the license reflects what you as a *user* can do with the software.

As a developer, your only real option at this point is to not bother. According to Fogel’s criteria, ECW meets hardly any of the attributes of a successful open source project:

  • There is no common public source code repository.
  • There is no development mailing list.
  • There is no public bug tracking.
  • There is no public history of development — why the software is the way it is.

I think a project like ECW (or even MrSID) has the potential to be a highly impactful open source project that can reach beyond the geospatial domain. Wavelet compression is becoming an important topic, and participation in a project that utilizes this has the potential to catalyze and seed many other development efforts. Hopefully, ERMapper will see that their worries about format divergence will actually be mitigated by a truly open development effort. Time will tell on this one…

Refractions’ GEOS

GEOS has an interesting history. It started as a port of JTS to the C++ platform by Refractions. It is the geometry engine underneath PostGIS, and it is used by many other open source projects in the ‘C’ camp of open source GIS to provide topology and geometric algebra operations. Licensing-wise, it meets the criteria of open source software, and it is licensed under the LGPL (there is no explicit LICENSE.txt in the source code, but each source file is described as “licensed under the LGPL”).

Here are the attributes of Open Development that GEOS has:

  • A source code repository
  • A public bug tracker
  • Development list
  • Website

GEOS meets most of the criteria listed in Fogel’s book, but in my opinion it is missing a couple of key components to truly qualify as open development (disclaimer: I am a source code committer on the GEOS project). First, it really has no community-oriented governance model. The project leader is ostensibly a maintainer that is paid by Refractions to organize and push forward GEOS’ development. Major developments must be vetted by Refractions (and frequently funded through Refractions) before they can be undertaken. Releases are made to serve Refractions’ business needs (PostGIS or client-funded improvements). These attributes make GEOS risky from the perspective of an individual, independent developer because your investment in GEOS as a developer may be thwarted if it is not in line with the interests of the company that “owns” the project.

A quote from Fogel’s money chapter illustrates this point more elegantly than I can:

However, funding also brings a perception of control. If not handled carefully, money can divide a project into in-group and out-group developers. If the unpaid volunteers get the feeling that design decisions or feature additions are simply available to the highest bidder, they’ll head off to a project that seems more like a meritocracy and less like unpaid labor for someone else’s benefit. They may never complain overtly on the mailing lists. Instead, there will simply be less and less noise from external sources, as the volunteers gradually stop trying to be taken seriously. The buzz of small-scale activity will continue, in the form of bug reports and occasional small fixes. But there won’t be any large code contributions or outside participation in design discussions. People sense what’s expected of them, and live up (or down) to those expectations.

The second, and more important divergence in my opinion, is that the head of GEOS is disconnected from its body. Definition of *how* the software is supposed to work (its design and architectural decisions) actually comes from JTS. Strict adherence to the “C++ port of a Java library” mindset, and GEOS’ insistence on following JTS to the letter instead of in spirit, means that technological advantages that C++ could provide can’t really be taken. There’s no feedback loop so that possible improvements made in GEOS make their way back to JTS. GEOS is the way it is because JTS is that way, and JTS defines what GEOS is. As a developer, you can make only small improvements and changes, and only as long as the parallelism between GEOS and JTS is not broken and the changes are in line with Refractions’ interests. These restraints, in my opinion, cause internal forking efforts and retard the development of GEOS into what it aspires to be: a fast, correct, C++-based open source library for geometry and topological operations.

Frank Warmerdam’s GDAL

GDAL meets most of Fogel’s criteria for an open development project (disclosure: I am a committer and PSC member on the GDAL project). It has a source code repository with history going back almost to the first day that Frank checked code into the project. It has a bug tracker with thousands of bugs (most of which are labeled as “fixed”). It has a very active user community, a documentation website, and an IRC channel.

GDAL has historically followed the Linux model, where most changes go through Frank, and lieutenants are left to be in charge of certain areas of the software. From an open development and governance standpoint, GDAL is currently in transition. While we all believe that Frank has a time stopping machine hidden up in the deep Canadian woods with him that allows him to be so prolific, there are limits to what one person can do, and GDAL is increasingly approaching them.

The move to OSGeo for GDAL has brought about the emergence of a Project Steering Committee, and Frank has relinquished outright dictatorial control of the project to it. The release process is slowly moving to a more community-oriented affair, and members of the GDAL development community are stepping forward to take care of maintenance of larger areas of the software. Sweeping technological changes and addition of features now go through a proposal and committee process to ensure that everyone can be made aware of such developments. Explicit communication (and the implicit history this creates) about these things ensures that developers on the project are roughly moving in the same direction.

UMN’s MapServer

In my opinion, MapServer meets Fogel’s criteria for an open development project (disclosure: I am also a committer and PSC member on the MapServer project). Unlike the other three projects that I described, which are targeted almost exclusively toward other application developers, MapServer’s audience includes both user-oriented web developers and some standard GIS-type application developers. This diversity manifests itself throughout the project, most notably in the governance structure and developer community, which I have already described in a previous post. It also contributes to MapServer’s creeping featuritis that is both its blessing and its curse.

Frank’s MS RFC-1 has become one of the prominent models for project governance in OSGeo, and many of the member projects have copied or modified it to fit their culture and needs. MapServer has a ring (or two) of businesses that use it to their competitive advantage. Funding opportunities that arise from this ring feed back into the software in the form of new features, general maintenance, documentation, and maillist support. Many of the businesses in the ring are represented in the PSC of MapServer. The project soldiers on, despite individual developers coming and going, and it still sees significant growth release over release as measured by software downloads, maillist posts, and bug submissions.

I think that MapServer is a very functional, but slightly imperfect model of open development. There are things described in Fogel’s book we aren’t doing yet, or aren’t so good at, but we’re open to suggestions and highly motivated individuals can have a lot of impact on the project. Technologically, the MapServer project is fairly conservative, and it’s not prone to withstand or put up with large refactorings. Its governance body is not elected by the community in an effort to sidestep the sticky issue of determining *who* the community is and if they have a right to vote on such things. Its diverse audience and diverse developer base pull the project in many directions at once, and its focus, as defined by “MapServer is not a GIS!” leaves an awful lot of room to define what it actually is.

How OSGeo can play a role in ensuring Open Development

Hop on the bus, Gus

An important part of a project’s migration to OSGeo and its incubation is the codification of its governance culture. One goal of this governance is to increase what Sean calls the “bus number,” or the number of developers (or entities) a bus would need to run over to kill the project. I think it is important to make a fine distinction between the bus number of the project and the bus number of the software because they might not always be the same. OSGeo should aspire to ensure that the bus number of its member projects is much greater than one entity or person. In fact, I think this should be a requirement that must be met for incubation: if a project can’t satisfy it, it should be clear that there isn’t enough community support to keep the project viable. The problem, of course, is that actually measuring the bus number of a project is a rather messy endeavor :)

Mo’ money, mo’ problems

Money coming into a project, a subject to which Fogel devotes an entire chapter, can create significant tension within it. In my opinion, an overriding incentive for the initial member projects to form OSGeo was money: money in the form of direct support for the projects through some kind of collective pass-through funding, money for visibility and marketing and the leverage that an organization like OSGeo can provide, and money in the form of member projects sharing infrastructure that is common to all and commonly replicated. OSGeo must be clear, explicit, and careful about how money is distributed (if there is any to distribute). Perception is reality in this instance, and any shady stuff will probably be met with significant backlash.

Insight, foresight, more sight

Another role of OSGeo is to provide infrastructure to mitigate disputes and provide a neutral home for the project that is agnostic with respect to a specific entity. OSGeo could act as the grown-up in the case of intra-project disputes like Apache has done in the past. It aspires to act as the facilitator, in hope of congealing its member projects into a mass of non-overlapping, useful, and integrated technology. Finally, it can act as an educator, introducing the technology of its member projects and providing buoyancy in a way that a single project by itself could not do.

Conclusion

Some final thoughts for those of you who’ve actually read this far. Open source projects, even in the GIS domain, vary widely with respect to their development practices and how open or closed they are. As an open source developer, volunteer, and user, I’m attracted to projects that are openly developed. I will not invest much effort in something where my stake, in effort, is not recognized and respected. Finally, a significant measure of OSGeo’s success or non-success in my mind is how good a job it does at fostering and ensuring open development of its member projects.

Update:

Please head to SlashGeo if you’re interested in commenting on this article. Still haven’t found a good solution to comment spam for my weblog yet.

Dear ESRI EDN:

October 11th, 2006

Dear ESRI,

Please listen to us. We’re your cold, poor, hungry, and sometimes taken advantage of third party developer community. We don’t get to sell $56 million of services and software to the federal government each year. We live with much smaller budgets, much shorter timelines, and much higher expectations. Like any software company, you have some products that are fantastic, some that are mediocre, and some that should never have been sold to the public. One of the products that I think is really fantastic is a piece of spatial database middleware you bought a while back called ArcSDE. Lots of your customers use this product to do lots of really neat things. And I would like to use it to allow my customers to do some really neat things with the software they bought from you.

I’m part of a group of developers of an open source library called GDAL (Geospatial Data Abstraction Library). I like to call it the libc/libc++ of geospatial software. It’s used everywhere, it takes care of the dirty job of raster data translation, and it allows people to write fantastic software.

I would like to write a driver for GDAL that can read ArcSDE raster data so people can do fantastic things with the petabytes of raster data that is stored in ArcSDE throughout the world. To use this driver, people would still have to purchase an ArcSDE license from you. In fact, the ability to use ArcSDE raster data from GDAL might allow you to sell ArcSDE licenses you might not otherwise have sold.

With a shortage of funds and no shortage of id, I set to writing a proposal for implementing the driver for my client. I had heard about the EDN program last summer, and it was my understanding that I would be able to get a temporary (one year) license to be able to use ArcSDE for this type of development. After some investigation, I came across some show-stopping issues with the EDN license that cause a lot of concern.

The first part of the license (which for some reason has PDF copy protection and doesn’t allow me to copy-paste out of it) says:

Licensee shall not reverse engineer, decompile, or disassemble the Software, Data, Web Services, or Documentation, except to the extent that such activity is expressly permitted by applicable law notwithstanding this restriction.

The license doesn’t say if I’m ever released from this restriction. If I sign this agreement, does it mean that I can not ever participate in reverse engineering any ESRI software in perpetuity? A lot of my problem with this restriction is that it depends on what’s being called “reverse engineering,” and in my experience a good portion of software development *is* reverse engineering in the sense that you are trying to figure out what another developer did, why a piece of software works the way it does, and how you as a developer can work with, around, and through it.

The second part of the license, the EDN-specific part, has a few things that give me pause:

Licensee may grant access to server applications developed using the EDN Software Library to Licensee’s customers and internal users for acceptance testing purposes only; provided that Licensee’s customers and internal users will not perform any debugging, configuration, or maintenance.

Publicly disclose results of benchmark testing except with prior written permission of ESRI.

What does this first part mean? For something like the ArcSDE raster driver, my client is other software developers. The only way they can reliably do acceptance testing *is* to jump in and debug what it is doing. I understand not using EDN software for configuration and maintenance (well, maybe), because you don’t want folks using the cut-rate EDN software to do “real work.” My problem with a restriction like this is that it appears to prevent someone like me from using EDN to do *any* work.

With respect to benchmark testing, why are you afraid to have unabashed results of how your software performs out in public?

Broad, complicated (I had to spend an inordinate amount of time tracing through multiple layers of footnotes) licensing agreements like the ESRI general and ESRI EDN ones scare your developers away. Third party developers provide you with value, momentum, and a pool of potential talent to poach from. A large wall of legalese causes casual folks to not bother. If you’re ever looking to garner the momentum, energy, and hype of the GYM neogeographers and masheruppers, you’ll never get it by making them sign twelve pages of stuff like this. It won’t matter if your software is fantastic or not.

Hopefully, someone from your company will stumble across this weblog post and be able to answer a question for me. I don’t have any licensing or maintenance agreements with you, but there is some financial incentive for you to answer because, like most of your third-party products, I think an ArcSDE raster driver for GDAL will sell you more licenses of your software.

As an independent developer, is it possible for me to use an EDN seat to develop a raster driver for GDAL?

Sincerely,

Howard Butler,

Hobu, Inc.

Gone to Italy/Switzerland.

August 29th, 2006

See you at FOSS4G