Here's how easy OC Transpo's Open Data should be

Skip down to the Storify provided by Richard Akerman for context. I'm going to jump to just how easy and cheap OC Transpo's Open Data initiative should be to implement.

Step 1: GPS-enabled buses are sending their location to OC Transpo every 30 seconds. This has already been implemented by the vendor.

Step 2: Dump said messages, raw, onto the Internet. Extracting the updates from the vendor's solution involves some cost, but it should be minimal. There is no need to "mark up" the messages with extra details.

  • If the raw messages are less than 140 characters long then just tweet the damn things. I'm only half kidding about that.
     
  • Append each message to a text file. Only perform whatever minimal encoding is necessary (if any). Don't worry about merging the raw message with metadata about the bus routes.
    http://api.octranspo.com/opendata/YYYY/mm/dd/hh/mm/gps.txt
     
  • HTTP "GET" requests can retrieve static files from the OC Transpo web server. This involves almost no load. The minute-by-minute text files will be relatively small. No need for API keys, usage limits, etc.
     
  • To limit abuse, delete any data that is older than one day. I'll even provide the script:

    # -mtime +0 matches files last modified at least 24 hours ago
    find /var/www/api/opendata -type f -name gps.txt -mtime +0 -exec rm -f {} +
     

  • OC has 1000 buses in the fleet. Assuming all of them are in use and sending two updates per minute, and each update is 200 bytes, each "gps.txt" file can grow to at most 400,000 bytes (about 390 KiB). Outside of peak hours the files will be much smaller. If that's still too big, the granularity can be reduced to 15 seconds: more files, but fewer bytes wasted. This is an easily conquerable problem.
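
For a sense of how little is involved, the publishing side of the list above could be sketched in a few lines of shell. This is only an illustration under stated assumptions, not anything OC Transpo or its vendor ships: `OPENDATA_ROOT`, `gps_path`, `append_update` and `max_file_bytes` are made-up names, the timestamps are assumed to be UTC, and the two "mm" segments in the URL are assumed to mean month and minute.

```shell
#!/bin/sh
# Hypothetical publisher sketch. OPENDATA_ROOT and all function names below
# are assumptions for illustration only.
OPENDATA_ROOT="${OPENDATA_ROOT:-/var/www/api/opendata}"

# Print the path of the current minute's file (UTC assumed), in the form
# $OPENDATA_ROOT/YYYY/mm/dd/HH/MM/gps.txt
gps_path() {
    printf '%s/%s/gps.txt\n' "$OPENDATA_ROOT" "$(date -u +%Y/%m/%d/%H/%M)"
}

# Append one raw message, unmodified, to the current minute's file,
# creating the directory tree on first use.
append_update() {
    file=$(gps_path)
    mkdir -p "$(dirname "$file")"
    printf '%s\n' "$1" >> "$file"
}

# Worst-case size in bytes of one per-minute file, using the figures from
# the text: 1000 buses x 2 updates per minute x 200 bytes per update.
max_file_bytes() {
    echo $(( 1000 * 2 * 200 ))
}
```

Whatever receives the vendor's messages would call something like `append_update "$raw_message"` once per position report; everything after that is the web server handing out static files.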

Step 3: Done.

The key to Open Data strategies is to provide the data to interested amateurs. Some of them will use it to produce unreliable garbage. Some of them are Computer Science students who'd rather work on a real problem for a school assignment instead of "pretend" problems. Inevitably one, or two, or more will create truly valuable applications that fill a niche OC Transpo would never think of, or prioritize. Someone might even clean OC Transpo's clock and produce the best product out there.

Perhaps down the line there could be a fourth step where OC Transpo exposes more data to developers: cancelled trips, detours, etc. Leave that problem for the future. Right now we have the data. Release it and let the community innovate.

Here's the Storify from today's Transit Commission meeting.