An Introduction to OpenRefine
Clean Up Your Messy Data: Munging and Cleansing Using OpenRefine
This workshop focuses on OpenRefine and provides two tutorials that explore how cleaning data, transforming it from one format into another, and extending it with web services and external data can make it more useful for scholarly analysis.
Tutorial 1: Introduction to OpenRefine
Tutorial 2: Taking OpenRefine a Step Further
Each tutorial provides step-by-step instructions and sample data for use during the workshop.
The presentation deck is also available.
Several other tools may or may not come up during the workshop, but they are worth knowing about in the context of today's discussion:
- Trifacta Data Wrangler (Chrome only)
- How to clean your data for DataWrapper
- Mr Data Converter
- RapidMiner
- Dataiku Data Science Studio
- Tableau
- Google Fusion Tables (discontinued)
Additional sources of information about OpenRefine:
- OpenRefine web site
- OpenRefine Documentation for Users
- Using OpenRefine book by Ruben Verborgh, Max De Wilde and Aniket Sawant
- OpenRefine history from Wikipedia
Additional resources from earlier data munging exercises and workshops (archived/legacy):
- Google Solve for X
- Google Research
- Google Scholar
- Google Keep
- Google Public Data Explorer
- Google Developers
- IBM Watson
- Google Groups
- Google Cultural Institute
- Google NGram Viewer
- Google Books
- Google Trends
- Google Trends Visualiser
- Google Correlate
- Gapminder (Hans Rosling: New Insights on Poverty)
- Google Docs Extended/Secrets
- Google BigPicture Group
- Google Analytics
- Google Apps
- Evolution of the Web
- Google Time