
Using Excel to Prepare Data for Analysis

posted Mar 23, 2018, 5:37 AM by Greg Smith   [ updated Mar 27, 2018, 5:24 PM ]

A month ago I presented a session at an online conference called "Wrangling Library Data: Analytics, Dashboards, and Spreadsheets." My presentation was entitled "An Old Tool with Enduring Value: Using Excel to Prepare Data for Analysis."

The premises of my presentation were pretty simple:

  • Data sets need to be prepared for analysis. The data we’re called on to analyze typically need stuff done to them. Particularized data need to be grouped into meaningful categories. Free text needs to be broken down into keywords. Dirty data need to be cleansed and other irregularities need to be dealt with. In sum, signals need to be distinguished from noise.
  • Data manipulation tools more potent than Microsoft Excel are available, but I find that …
    • I’m already using Excel frequently.
    • I’m often far from exhausting its capabilities.
    • It’s easier for me to learn new tricks in a familiar environment than in an unfamiliar one.

With those thoughts in mind, I set out to share some of the data manipulation tricks that I’ve learned to use, mostly with library-oriented data, over the years. Specifically, I showed how I’ve used Excel formulas to perform the following tasks:

  • deriving time-series categories from date and time stamps
  • pre-coding survey comments based on keywords
  • dealing with messy data points such as call numbers and publisher names
  • removing extraneous punctuation and symbols from textual data
  • identifying keywords that are misspelled or on a stopword list
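To give a flavor of what formulas like these look like, here are a few generic sketches in Excel's formula language. These are illustrative only, not the exact formulas from my utility files; each assumes the raw value sits in cell A2, and the last one assumes a named range called StopWords holding the stopword list:

```
Derive a month-level time-series category from a date/time stamp in A2:
=TEXT(A2, "yyyy-mm")

Pre-code a survey comment in A2 when it contains the keyword "noise"
(SEARCH is case-insensitive; ISNUMBER catches the error when there is no match):
=IF(ISNUMBER(SEARCH("noise", A2)), "Environment", "")

Strip stray periods and commas from a text value in A2:
=TRIM(SUBSTITUTE(SUBSTITUTE(A2, ".", ""), ",", ""))

Flag a keyword in A2 that appears on the stopword list:
=COUNTIF(StopWords, A2) > 0
```

Each of these can be pasted into a helper column and filled down the length of the data, which is essentially how the utility spreadsheets work: paste a column in, and the pre-built formulas do the rest.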

I offered presentation viewers eight utility files—that is, spreadsheets where one can paste a column of data and see it transformed into something more conducive to insightful analysis. In the coming days, I’ll make posts featuring a few of these utilities. The files that I demonstrated, along with my slide deck, can be downloaded here.