Using Natural Language Processing for Keyword Research

Posted by Owen Powis on 10 Jan, 2018
We’re now using Natural Language Processing (NLP) in Wordtracker to improve our toolset and provide even better results in Inspect. Find out exactly what NLP is, how it works and how it can help you with your keyword and competitor research.

What is Natural Language Processing?

At its core, Natural Language Processing is about communication between people and computers: allowing a person's speech to be properly categorised and understood by a computer, and allowing a computer to communicate effectively with us.

There are already many implementations of NLP in our everyday lives. Home automation programs such as Google Home and Alexa are direct uses of this technology. The only input and output is via voice, and this is only possible because the software is able to interpret what you are saying and - crucially - understand the intention behind it.

This may seem simple if it only means teaching a computer to understand a set phrase such as "Play BBC Radio 4", but humans don’t naturally speak like this. We may ask the same thing in many different ways: "Could you turn on BBC Radio 4", "Put on Radio 4 please" and so on. The aim of Natural Language Processing is to make it possible for a machine to do more than just keyword matching - that is, to actually understand the nature of what you are saying.
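To make the gap between a set phrase and a natural request concrete, here is a minimal Python sketch. It is only an illustration: the phrase list, the patterns and the function names are hypothetical, and a hand-written rule set like this is a stand-in for, not an example of, the statistical models real NLP systems use.

```python
import re

# Hypothetical set phrase a naive system might listen for.
SET_PHRASE = "play bbc radio 4"

def exact_match(utterance: str) -> bool:
    """Naive keyword matching: only the exact set phrase is recognised."""
    return SET_PHRASE in utterance.lower()

# Hypothetical hand-written patterns approximating a "play radio" intent.
PLAY_WORDS = re.compile(r"\b(play|turn on|put on|switch on)\b")
STATION = re.compile(r"\b(?:bbc\s+)?radio\s*(\d)\b")

def guess_intent(utterance: str):
    """Return ("play_radio", station_number) when the request looks like
    a request to play a radio station, otherwise None."""
    text = utterance.lower()
    station = STATION.search(text)
    if PLAY_WORDS.search(text) and station:
        return ("play_radio", int(station.group(1)))
    return None

if __name__ == "__main__":
    for phrase in ["Play BBC Radio 4",
                   "Could you turn on BBC Radio 4",
                   "Put on Radio 4 please"]:
        print(phrase, "|", exact_match(phrase), "|", guess_intent(phrase))
```

Exact matching only recognises the first phrase, while the intent rules treat all three as the same request. Even so, every new way of asking needs another hand-written rule, which is the limitation NLP aims to remove.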

One of the things Natural Language Processing excels at is taking large volumes of unstructured data and categorizing it. Take for example your medical records. In many countries, these might be handwritten or typed into a computer. Even when digitized, these are unstructured, with a free text field that the doctor types your notes into. The doctor can’t, for example, look up how many times you’ve had the flu in the past, or even in the current year, without scrolling back through the notes and looking at the different entries. With NLP you could structure that information so a doctor can see a complete overview of your medical history and understand the number of occurrences of different illnesses, different medications used and other useful data. Structuring the information that is already there can make it many times more powerful.
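The idea of turning free-text notes into something you can query can be sketched in a few lines of Python. Everything here is made up for illustration: the notes, the lookup tables and the function names are hypothetical, and a simple word lookup stands in for the much richer entity recognition a real NLP system would perform.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical free-text entries: (visit date, doctor's note).
notes = [
    (date(2016, 1, 12), "Patient presents with flu symptoms. Prescribed paracetamol."),
    (date(2017, 2, 3), "Sprained ankle; advised rest and ibuprofen."),
    (date(2017, 11, 20), "Influenza again this winter, paracetamol recommended."),
]

# Hypothetical lookup tables mapping words in the notes to one canonical label.
ILLNESSES = {"flu": "influenza", "influenza": "influenza", "sprained": "sprain"}
MEDICATIONS = {"paracetamol": "paracetamol", "ibuprofen": "ibuprofen"}

def words_in(text):
    """Lower-case the note and return the set of distinct words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))

def summarise(notes):
    """Count how many notes mention each illness and each medication."""
    illnesses, medications = Counter(), Counter()
    for _, text in notes:
        words = words_in(text)
        illnesses.update({ILLNESSES[w] for w in words if w in ILLNESSES})
        medications.update({MEDICATIONS[w] for w in words if w in MEDICATIONS})
    return illnesses, medications

def count_illness(notes, illness, year=None):
    """Count notes mentioning an illness, optionally within a single year."""
    return sum(
        1
        for visit_date, text in notes
        if (year is None or visit_date.year == year)
        and any(ILLNESSES.get(w) == illness for w in words_in(text))
    )

if __name__ == "__main__":
    print(summarise(notes))
    print(count_illness(notes, "influenza", year=2017))
```

Once the notes are structured this way, "how many times has this patient had the flu this year?" becomes a single function call instead of a scroll through every entry.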

How Wordtracker is now using Natural Language Processing

The Wordtracker Inspect tool, available with the Keyword tool, is designed to provide you with key insights into not just your own pages but those of your competitors as well. Using Natural Language Processing we can better understand the nature of the content on the page and the words which relate to it. In effect we are looking at the meaning of the content, not just the keywords presented, allowing you to find terms which other tools that use just keyword matching can’t uncover.

We take the content from the page at the URL you enter, scan it and convert it to a text-only format. This gives us a block of text which we run through an NLP service to return key information about the page and the words it contains. This reveals terms which are not contained on the page, but which are highly relevant to it.
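As a rough illustration of the "convert it to a text only format" step, here is a minimal sketch using only Python's standard library. It is not Wordtracker's actual code: the class and function names are hypothetical, and a production system would also need to fetch the page and cope with messier markup.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data)

def html_to_text(html: str) -> str:
    """Return the page's text content as one whitespace-normalised string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())

if __name__ == "__main__":
    page = (
        "<html><head><title>Tea</title><style>p{color:red}</style></head>"
        "<body><h1>Green tea</h1><p>Brewing <b>guide</b></p>"
        "<script>var x = 1;</script></body></html>"
    )
    print(html_to_text(page))
```

The resulting block of plain text is the kind of input that would then be handed on to an NLP service for analysis.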

In effect, it shows the words that summarize the information based on its meaning, not just the terms that match keywords.

How we actually do the processing

Wordtracker isn't an NLP company, so instead we chose to use the expertise of a company that has been doing this for a long time. Open Calais is an NLP project first started by ClearForest all the way back in 1998. It is now run by Thomson Reuters and is generously kept free to use and access. It’s a very powerful tool and also has an excellent API. You can feed in plain text or HTML and the API will return structured JSON data.

If you’re interested in learning more about what Open Calais does and how it works check out the demo here:

It’s a project I've personally had my eye on for a few years and one I've been regularly checking back on over time. So it’s great to have found such a useful application for it within Wordtracker.

Watch this space, as we intend to bring more and more cutting-edge technologies to keyword research.
