Tuesday, June 28, 2016

Intelligent US Address formatting

I recently needed to have a good solution for parsing US postal addresses into a standard format. After a bit of quick googling I found a neat github project that does most of what I needed. A big shout-out to the Atlanta Journal Constitition newspaper for doing this.

If you're running python, all you have to do is 'pip install usaddress' and you're up and running.

I then wrote a quick python script to convert usaddress output to a standard format, like this:

[Optional Name,  though you really shouldn't pass it]
Address line 1
[Optional Address Line 2]
City
State
Zipcode
#

Here's the usaddr.py script (python 3.x):

#!/usr/bin/python

import sys
import usaddress

for addr in sys.argv[1:]:
  try:
    c = usaddress.tag(addr)[0]
    l = []
    
    breakfields = [
      'Recipient',
      'BuildingName',
      'LandmarkName',
      'CornerOf',
      'USPSBoxType',
      'AddressNumber',
      'AddressNumberPrefix',
      'PlaceName',
      'StateName',
      'ZipCode'
    ]
    
    multikill = {
      'AddressNumberPrefix' : [ 'AddressNumber' ]
    }

    for k,v in c.items():
      if k in breakfields:
        breakfields.remove(k)
        if k in multikill:
          for mk in multikill[k]:
            if mk in breakfields:
              breakfields.remove(mk)
        l = [i for i in l if i]
        if l != []:
          print(' '.join(l))
        l = [v]
      else:
        l.append(v)
        
    l = [i for i in l if i]
    if l != []:
      print(' '.join(l))
    print('#')
          
  except usaddress.RepeatedLabelError as e :
    print('Error: cannot parse address')

Since I'm running a LAMP stack on one of my servers that happens to have python also installed, I wrote a quick usaddr.php script to implement an API:

< ?php

foreach (explode('&', $_SERVER['QUERY_STRING']) as $chunk) {
  $param = explode("=", $chunk);

  if ($param) {
    $cmd = urldecode($param[0]);
    $arg = urldecode($param[1]);
        
    if ($cmd === 'addr') {
      $command = 'python /path/to/your/script/usaddr.py ' . escapeshellarg($arg);
      $output = shell_exec($command);
      echo $output;
    }
  }
}

?>

So this meant I could access http://myserver.com/myapi/usaddr.php?addr=[address to reformat] from other apps (like FileMaker) to do address reformatting.

No comments: