Again, a large part of the day was spent with my daughter. (Toy Story 3 takes on extra poignancy when seen with your just-started-college child. GADS, I sympathized with Mom near the end.)
Still, I did get a chunk of crunching done on the library data, though I’m still in clean-up. I’ve got the 2000 and 2008 records out, and I’m working to make both files contain only the libraries they have in common. That means deleting about a thousand records from each file.
I thought about scripting, but as I mentioned, there are cases where one of the two IDs has changed. In about 20 cases both IDs changed even though the library’s name and address stayed the same. (Thank you, Alabama.) So it’s tedious hand-crunching. The good news is that once I’m done I can create my own independent ID for both record sets, which will let me set up some simple text database searches and extractions.
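For anyone facing the same clean-up, the matching rule described above (same library if either ID agrees, or if name and address agree when both IDs changed) can be scripted for the bulk of the records, leaving only the leftovers for hand review. This is a minimal sketch, assuming each year’s file is loaded as a list of dicts with hypothetical keys `id1`, `id2`, `name`, and `address`; the `LIB00001`-style independent ID is likewise a made-up format.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


def _norm(text: str) -> str:
    """Lower-case and collapse whitespace so formatting differences don't block a match."""
    return " ".join(text.lower().split())


def match_records(old: List[Record], new: List[Record]) -> List[Tuple[str, Record, Record]]:
    """Pair libraries present in both years and give each pair a fresh ID.

    A pair matches on either ID (hypothetical keys "id1"/"id2"); if neither ID
    matches, it falls back to normalized name + address. Unmatched records are
    dropped, so only libraries common to both years survive.
    """
    by_id1 = {r["id1"]: r for r in new}
    by_id2 = {r["id2"]: r for r in new}
    by_place = {(_norm(r["name"]), _norm(r["address"])): r for r in new}

    used = set()  # object ids of `new` records already paired
    pairs = []
    for r in old:
        match: Optional[Record] = by_id1.get(r["id1"])
        if match is None:
            match = by_id2.get(r["id2"])
        if match is None:
            match = by_place.get((_norm(r["name"]), _norm(r["address"])))
        if match is None or id(match) in used:
            continue  # no partner in the other year (or partner already taken)
        used.add(id(match))
        pairs.append((f"LIB{len(pairs) + 1:05d}", r, match))
    return pairs


old = [
    {"id1": "A1", "id2": "S1", "name": "Main Library", "address": "1 Elm St"},
    {"id1": "A2", "id2": "S2", "name": "East Branch", "address": "9 Oak Ave"},
    {"id1": "A3", "id2": "S3", "name": "Closed Branch", "address": "5 Pine Rd"},
]
new = [
    {"id1": "A1", "id2": "S9", "name": "Main Library", "address": "1 Elm St"},  # one ID changed
    {"id1": "B7", "id2": "T7", "name": "East  Branch", "address": "9 Oak Ave"},  # both IDs changed
    {"id1": "A4", "id2": "S4", "name": "New Branch", "address": "2 Ash Ln"},
]
pairs = match_records(old, new)
for new_id, a, b in pairs:
    print(new_id, a["name"], "->", b["id1"])
# LIB00001 Main Library -> A1
# LIB00002 East Branch -> B7
```

Anything the script can’t pair (like "Closed Branch" here) would still go on the hand-check pile.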
Once the records are cleaned, I’m planning to reorder the fields so both files have them in the same order, and to drop any fields that don’t appear in the other year’s data. It’s not that those fields aren’t important; I just can’t use them for this purpose.
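Reordering and trimming the fields is also easy to sketch. This assumes the rows arrive as dicts (as `csv.DictReader` produces) with every row in a file sharing the same keys, and it arbitrarily uses the 2000 file’s column order as the shared order; the field names in the usage lines are invented.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def align_fields(a: List[Record], b: List[Record]) -> Tuple[List[str], List[Record], List[Record]]:
    """Keep only the fields present in both files, in one shared order.

    The shared order follows the first file's columns; fields that exist in
    only one file are dropped.
    """
    if not a or not b:
        return [], [], []
    in_b = set(b[0])
    common = [f for f in a[0] if f in in_b]

    def trim(rows: List[Record]) -> List[Record]:
        # dicts keep insertion order, so every trimmed row lists fields in `common` order
        return [{f: row[f] for f in common} for row in rows]

    return common, trim(a), trim(b)


y2000 = [{"id": "1", "name": "Main", "visits": "5000", "hours": "40"}]
y2008 = [{"name": "Main", "id": "1", "visits": "6100", "wifi": "Y"}]
fields, a, b = align_fields(y2000, y2008)
print(fields)  # ['id', 'name', 'visits']
```

Passing `fields` as the `fieldnames` argument of `csv.DictWriter` would then write both files with identical columns.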
At this rate I might actually be looking for correlations by Wednesday.
As a general reminder (for me more than you), I intend to identify the libraries at the top and bottom in growth of visits per capita. I then intend to identify stats that correlate highly within either group, in the hope of finding things in one group that are not present in the other.
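To make the reminder concrete: one plausible reading of "growth of visits per capita" is the relative change in visits per person between the two survey years, and "high correlation" could be measured with a Pearson coefficient computed separately inside each group. A minimal sketch under those assumptions; the metric, the function names, and the "hours open" stat in the usage lines are all illustrative, not settled choices.

```python
import math
from typing import Sequence


def per_capita_growth(visits_old: float, pop_old: float,
                      visits_new: float, pop_new: float) -> float:
    """Relative change in visits per capita between two years (assumed metric)."""
    old_rate = visits_old / pop_old
    new_rate = visits_new / pop_new
    return (new_rate - old_rate) / old_rate


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# 5.0 visits per person in 2000, 6.0 in 2008
print(round(per_capita_growth(50_000, 10_000, 66_000, 11_000), 3))  # 0.2

# Within one group: does a (hypothetical) stat such as weekly hours track growth?
hours = [40, 52, 60, 68]
group_growth = [0.05, 0.12, 0.18, 0.25]
print(round(pearson(hours, group_growth), 2))
```

Running the same `pearson` call on the top group and the bottom group, stat by stat, would show which stats correlate strongly in one group but not the other.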
Originally I’d planned to pull an arbitrary 250 to 500 libraries from each end. Now I’m thinking about instead seeing how many libraries fall outside one standard deviation of the mean. Since there are close to 8,000 libraries at this point, I may even get to go out to two standard deviations. Again, though, we’ll see how far the actual data lets me go.
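The standard-deviation cutoff is a short calculation once each library has a growth figure. This sketch assumes growth values keyed by library ID in a dict (the IDs and values below are toy data) and uses the population standard deviation, on the assumption that the ~8,000 libraries are the whole set rather than a sample.

```python
import statistics
from typing import Dict, List, Tuple


def split_by_std(growth: Dict[str, float], k: float = 1.0) -> Tuple[List[str], List[str]]:
    """Return (top, bottom): IDs of libraries whose growth lies more than
    k standard deviations above or below the mean growth."""
    values = list(growth.values())
    mean = statistics.mean(values)
    # Population SD, treating the libraries as the whole set; with thousands
    # of records statistics.stdev (sample SD) would be almost identical.
    sd = statistics.pstdev(values)
    top = sorted(lib for lib, g in growth.items() if g > mean + k * sd)
    bottom = sorted(lib for lib, g in growth.items() if g < mean - k * sd)
    return top, bottom


growth = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0, "E": 1.0, "F": -1.0}
print(split_by_std(growth))       # (['E'], ['F'])
print(split_by_std(growth, 2.0))  # ([], [])
```

If growth were roughly normal, about 16% of libraries would land beyond one standard deviation on each side (on the order of 1,300 out of 8,000) and about 2.3% beyond two (roughly 180); skewed real data could shift those counts either way.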