Continuing on the library data project, I have a headache and much frustration.  I had a brief interruption for a possible job that turned out to be a no-go, but other than that I’ve been trying to clean up the database.

Eight hours of data cleanup.  Only to discover I made an error in the first 20 minutes that affected everything after it.  Bleah.

The problem is that the data files aren’t consistent from year to year.  Libraries change ID numbers (well, their states change the numbers) for a multitude of reasons.  Libraries change their names.  And addresses.  So before I can do year-by-year analysis of the more than 9,000 public libraries in the database, I have to make sure I can track each library through the years.
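A minimal sketch of how the easy cases could be separated from the hard ones, not a record of what was actually run here. It assumes each year's file can be loaded as a list of records with the hypothetical keys `id`, `state`, and `name`; the function names are made up for illustration. Records that keep both ID and name are linked first, records whose ID changed but whose normalized name is unique within the state are linked second, and everything else is set aside for manual review.

```python
import re


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def link_years(prev_year, next_year):
    """Link next-year records to prev-year records.

    Both arguments are lists of dicts with the assumed keys
    'id', 'state', and 'name'. Returns (matches, unmatched):
    matches maps next-year id -> prev-year id, and unmatched
    lists the next-year records that need a human to look at them.
    """
    by_id = {r["id"]: r for r in prev_year}
    by_name = {}
    for r in prev_year:
        by_name.setdefault((r["state"], normalize(r["name"])), []).append(r)

    # Pass 1: same ID and same (normalized) name.
    matches, pending = {}, []
    for r in next_year:
        prev = by_id.get(r["id"])
        if prev is not None and normalize(prev["name"]) == normalize(r["name"]):
            matches[r["id"]] = prev["id"]
        else:
            pending.append(r)

    # Pass 2: ID differs, but exactly one unused prev-year record
    # in the same state has the same normalized name.
    used = set(matches.values())
    unmatched = []
    for r in pending:
        key = (r["state"], normalize(r["name"]))
        candidates = [c for c in by_name.get(key, []) if c["id"] not in used]
        if len(candidates) == 1:
            matches[r["id"]] = candidates[0]["id"]
            used.add(candidates[0]["id"])
        else:
            unmatched.append(r)
    return matches, unmatched
```

A record that keeps its ID but changes its name is deliberately left unmatched, since a reused ID and a renamed library look the same from the data alone. This only narrows the manual work; it does not remove it.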

In any year, as many as 95% of the libraries keep the same ID and name.  But the irregularity of the changes means there’s no automated way (of which I’m aware) to ensure apples remain apples.  I’m done for today.

Well, not entirely.  I did manage to confirm that of the roughly 33% growth in use since 1992, about two-thirds (or ~22 percentage points) has come since 2000.
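To make the arithmetic behind that split explicit, here is a small sketch. It assumes the growth figure is measured against the 1992 level (indexed to 100) and that the share is a share of the total increase; the function name `split_growth` is hypothetical. Under that reading, the 22 points are 22% of the 1992 level, which works out to a little under 20% when measured against the 2000 level instead.

```python
def split_growth(total_growth_pct, share_after):
    """Split total growth into an early and a late period.

    total_growth_pct: growth over the whole span, as a percent of the start level.
    share_after: fraction of that growth that happened in the late period.
    Returns (early_pct, late_points, late_pct_of_midpoint).
    """
    base = 100.0                          # start level, indexed to 100
    end = base + total_growth_pct         # end level on the same index
    late_points = total_growth_pct * share_after
    mid = end - late_points               # level at the split year
    early_pct = (mid - base) / base * 100
    late_pct_of_midpoint = late_points / mid * 100
    return early_pct, late_points, late_pct_of_midpoint


early, late_points, late_rel = split_growth(33, 2 / 3)
print(round(early, 1), round(late_points, 1), round(late_rel, 1))
```

With the figures above this gives roughly 11% growth before 2000, 22 points after, and about 19.8% growth relative to the 2000 level.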

I also managed to get a first look at the computer correlation: a visual look, not a statistical one.  Adding public computers does not particularly correlate with growth in use.  Well, going from zero to some appears to have a serious impact, but after that the number of computers added shows no apparent relationship to growth.  Some libraries that tripled their user computers saw less than 3% growth in visits per capita over the nine years evaluated.
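For a statistical check to go with the visual one, a Pearson correlation is one simple option. The sketch below is an assumption about how that check might look, not something done here: `pearson_r` is a hypothetical helper, and the inputs would be two equal-length lists, for example percent change in public computers and percent change in visits per capita for each library. Libraries that started with zero computers would need to be handled separately, since their percent change is undefined.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)
```

A value near zero would be consistent with the visual impression; a single coefficient can still hide a threshold effect like the zero-to-some jump, so it complements the plot rather than replacing it.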


