Named entity extraction
The techniques we discussed in the Cleanup and Reconciliation parts come in very handy when your data is already in a structured format. However, many fields (most notoriously the description field) contain unstructured text, even though they usually convey a great deal of interesting information. To capture this information in a machine-processable format, you can use named-entity recognition.
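To make "machine-processable" concrete, here is a minimal sketch of what an entity extractor hands back for a free-text description. It deliberately swaps in the simplest possible technique, a gazetteer (dictionary) lookup, so it is not how the extension or the NER services behind it work; those rely on trained models. The function `extract_entities`, the `Entity` type and the sample data are all hypothetical.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class Entity(NamedTuple):
    """One recognized entity: surface text, type label and character offsets."""
    text: str
    label: str
    start: int
    end: int


def extract_entities(description: str, gazetteer: dict[str, str]) -> list[Entity]:
    """Toy NER: find every gazetteer term occurring as a whole word in the text.

    Hypothetical illustration only; real NER services use statistical models
    rather than a fixed list of names.
    """
    entities = []
    for term, label in gazetteer.items():
        # \b anchors keep "Berlin" from matching inside "Berliner"
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, description):
            entities.append(Entity(match.group(), label, match.start(), match.end()))
    # Return entities in the order they appear in the text
    return sorted(entities, key=lambda e: e.start)


if __name__ == "__main__":
    text = "Portrait of Albert Einstein, photographed in Berlin in 1921."
    names = {"Albert Einstein": "PERSON", "Berlin": "LOCATION"}
    for entity in extract_entities(text, names):
        print(entity)
```

The unstructured sentence becomes a list of typed, located entities, which is the kind of structured result that can then be stored in separate columns or reconciled.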
Download our Refine extension
Named-entity recognition has never been easier: thanks to our brand new OpenRefine extension, you can enrich your description fields right from your workspace.
- Download the latest version of the extension and unzip it.
- Copy the unzipped folder to your extensions folder.
- To find your extensions folder, choose Browse workspace directory from the Refine interface and open the extensions folder there (create it if it doesn't exist yet).
- Start or restart Refine.
- Open or create a project.
- Click the Named-entity recognition button, choose Configure API keys... and enter your personal API keys.
- Click the triangle in front of the name of the column you want to analyze and choose Extract named entities...
- Select the services you want to use.
- Click Start extraction.
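The installation steps (unzip the download, create the extensions folder if needed, copy the extension into it) can also be scripted. The sketch below assumes that the downloaded archive already contains the extension's top-level folder and that you pass in the workspace directory you found via Browse workspace directory; the function name `install_extension` is hypothetical.

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def install_extension(zip_path: str | Path, workspace_dir: str | Path) -> Path:
    """Unzip a Refine extension archive into <workspace>/extensions.

    Assumes the archive holds the extension's own top-level folder, so that
    extracting it is equivalent to unzipping and copying that folder by hand.
    """
    extensions_dir = Path(workspace_dir) / "extensions"
    # Create the extensions folder if it doesn't exist yet
    extensions_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(extensions_dir)
    return extensions_dir
```

Running this does not replace the remaining steps: you still need to start or restart Refine before the Named-entity recognition button appears.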