Tuesday, June 28, 2016

Intelligent US Address formatting

I recently needed a good solution for parsing US postal addresses into a standard format. After a bit of quick googling I found a neat GitHub project that does most of what I needed. A big shout-out to the Atlanta Journal-Constitution newspaper for doing this.

If you're running Python, all you have to do is 'pip install usaddress' and you're up and running.

I then wrote a quick Python script to convert usaddress output to a standard format, like this:

[Optional Name, though you really shouldn't pass it]
Address line 1
[Optional Address Line 2]

Here's the usaddr.py script (Python 3.x):


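import sys
import usaddress

for addr in sys.argv[1:]:
  try:
    # tag() returns (ordered dict of label -> text, address type); keep the dict
    c = usaddress.tag(addr)[0]
    l = []
    # Labels that start a new output line.
    # NOTE: the original list was lost from this post; these entries are an assumption.
    breakfields = [
      'Recipient',
      'AddressNumberPrefix',
      'AddressNumber',
      'USPSBoxType',
      'OccupancyType',
      'PlaceName',
    ]
    # If the key label starts a line, the listed labels must not start another one
    multikill = {
      'AddressNumberPrefix' : [ 'AddressNumber' ]
    }
    for k,v in c.items():
      if k in breakfields:
        if k in multikill:
          for mk in multikill[k]:
            if mk in breakfields:
              breakfields.remove(mk)
        # Print the line collected so far, then start a new one with this value
        l = [i for i in l if i]
        if l != []:
          print(' '.join(l))
        l = [v]
      else:
        l.append(v)
    # Print whatever is left as the last line
    l = [i for i in l if i]
    if l != []:
      print(' '.join(l))
  except usaddress.RepeatedLabelError as e :
    print('Error: cannot parse address')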

Since I'm running a LAMP stack on one of my servers that also happens to have Python installed, I wrote a quick usaddr.php script to implement an API:

<?php

foreach (explode('&', $_SERVER['QUERY_STRING']) as $chunk) {
  $param = explode("=", $chunk, 2);

  // Only handle well-formed key=value pairs
  if (count($param) === 2) {
    $cmd = urldecode($param[0]);
    $arg = urldecode($param[1]);
    if ($cmd === 'addr') {
      // escapeshellarg() keeps the address a single, safely quoted argument
      $command = 'python /path/to/your/script/usaddr.py ' . escapeshellarg($arg);
      $output = shell_exec($command);
      echo $output;
    }
  }
}


So this meant I could access http://myserver.com/myapi/usaddr.php?addr=[address to reformat] from other apps (like FileMaker) to do address reformatting.
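
If the calling app is itself written in Python, the one thing to get right is URL-encoding the address, since spaces, commas and ampersands would otherwise break the query string. Here is a rough sketch of building the request URL; the build_request_url helper is a made-up name, and the endpoint is just the placeholder URL from above, so substitute your own.

```python
from urllib.parse import urlencode

# Placeholder endpoint copied from the example URL; replace with your server.
API_URL = "http://myserver.com/myapi/usaddr.php"


def build_request_url(address, api_url=API_URL):
    """Return the API URL with the address encoded as the addr parameter."""
    # urlencode() percent-encodes reserved characters and turns spaces into '+',
    # which PHP's urldecode() converts back to spaces.
    return api_url + "?" + urlencode({"addr": address})


url = build_request_url("123 Main St. Suite 100 Chicago, IL")
print(url)

# To actually call the endpoint (needs a reachable server):
#   from urllib.request import urlopen
#   with urlopen(url) as resp:
#       print(resp.read().decode("utf-8"))
```

The network call is left commented out so the snippet runs anywhere; only the URL construction is shown live.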
