Once upon a time, search engines were about providing links to pages.
That time is past.
These days, search engines—and by “search engines,” I mean Google—are about providing answers to questions. Don’t believe me? Try it. Go input “what time does kmart close” into Google. (Or, for those of you on the other side of the Atlantic, try “asda sutton” instead of “kmart.”)
Notice what you get right up there at the top of the page. It’s an answer to your question. And then there’s that bit over to the right—the part that tells you all about K-Mart. A good two-thirds of the content you see above the fold isn’t links to pages. It’s Google providing you with the information you’re looking for, right there in the search results. No second click required.
So what kind of black magic lets Google do this?
Two words: semantic content.
The Chinese Room
About 35 years ago, an American philosopher named John Searle proposed what has come to be known as the Chinese Room Argument. The paper is about artificial intelligence generally, and about the famous Turing Test specifically. (The Turing Test says that if a machine can reliably pass as a human in an online chat, then the machine counts as intelligent.)
Searle thinks the Turing Test doesn’t work, and the Chinese Room Argument is his thought experiment to show why.
The quickie version: Imagine that you’re locked in a room containing lots and lots of books filled with a bunch of markings. On the table in front of you is yet another book filled with some instructions in English. A mail slot in one of the walls is your only access to the outside world. We’re also going to assume that you speak only English. (Americans and Brits do have at least some stereotypes in common.)
Every once in a while, a sheet of paper comes through the mail slot with some markings on it. You then use your instructions to look up the markings that came through the slot, go find a different set of markings in your books, copy those markings onto a new sheet of paper, and then push the results back through the mail slot.
As you’ve probably already deduced by now, the markings are actually Chinese writing, and the things coming in through the mail slot are questions in Chinese. The markings you’re looking up and spitting back out are perfectly coherent answers to those questions. To anyone outside the room, you’d pass the Turing Test for speaking Chinese.
But you don’t understand a word of Chinese.
Searle concludes that the ability to manipulate language based purely on the shapes of symbols (or syntax) is not sufficient for actual understanding (or semantics).
Or, more briefly, syntax is not semantics.
Back to the Web
Google’s search robots live inside the Chinese Room. They’re great at syntax. Google has written a truly impressive rulebook for the robots to use in looking things up. Those rules rely on the markup you’ve used to display your content. (“Markup” is shorthand for all those funny-looking tags that show up when you accidentally hit the “view source” button—the <div>s, the <p>s, the <title>s, and the <article>s, the <section>s and the <figcaption>s.)
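To make the distinction concrete, here’s a tiny, made-up snippet. The tags are the markup; everything between them is the content:

```html
<article>
  <h1>Store Hours</h1>
  <p>We're open until 9pm on weekdays.</p>
</article>
```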
Sadly, Google’s robots are terrible at semantics. All the stuff that’s inside your markup (aka, the things between the tags; aka, your content)…they’re just a bunch of shapes. The rules tell the robots what to do with those shapes. But the robots can no more read those shapes than you can read the symbols in the Chinese room.
Fortunately, there’s a way to cheat.
The robots don’t understand the semantics of what shows up inside your markup. But they do recognize semantic meaning if you put that meaning inside your markup.
You can’t make the robots understand semantics. But you can help Google write a much better rulebook for its robots to follow.
To torture Searle’s metaphor a bit, the paper coming into the room that used to say:
[set of markings]
will now say something like:
Polite greeting: [set of markings]
And your rulebook will go from:
If [set of markings], then go to shelf 4, book 12, page 174 and copy paragraph 4.
to something more like:
If polite greeting: [set of markings], then go to the conversational openers shelf, select the small talk book, and copy polite greeting response: [set of markings].
These changes don’t make Google’s robots any smarter. But they do make your content a little smarter. And, even more importantly, they let the really smart engineers at Google write a much, much better rulebook for the robots.
Toward Semantic Content
Fortunately, a lot of the work we’ll need to do to add semantic meaning to our markup has already been done for us. Google, Microsoft, Yahoo, and others sponsor a project called Schema.org, which houses a standardized set of semantic markup. All the tags you’ll find there are already incorporated into the rulebook that Google’s robots use.
And if your content doesn’t quite fit into any of the existing models…well, you’re still in luck. Schema.org is open source. You can add your very own set of terms, and those will go into the rulebook as well.
Using Schema.org’s semantic markup means that instead of displaying the ISBN of a book as:
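Something like this (an illustrative snippet; the ISBN here is made up):

```html
<p>ISBN: 978-0-0000-0000-0</p>
```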
You would show it as:
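Something like this, using Schema.org’s Book vocabulary in microdata form (again illustrative; the title and ISBN are placeholders):

```html
<div itemscope itemtype="https://schema.org/Book">
  <span itemprop="name">My Latest Book</span>
  <p>ISBN: <span itemprop="isbn">978-0-0000-0000-0</span></p>
</div>
```

The visible text barely changes. What changes is that the `itemprop="isbn"` attribute tells the robots exactly which shapes are the ISBN.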
Mark up your content this way, and when someone asks Google’s robots about the ISBN for your latest book, the robots will return an answer and not just a list of pages.
And while there probably aren’t a lot of people searching for the ISBN for your book, there are plenty of people asking questions that your book answers. You can answer their question, or you can give them a link to a page that answers their question. But as Noz Urbina reminded attendees at his Confab workshop:
People aren’t searching for links. They’re searching for answers to questions.
Sadly, implementing semantic markup isn’t an overnight fix. It’s going to require some help from your IT team. And it’s also going to require structured content. After all, if your book content type has a separate field for the ISBN, it’s pretty easy to write some code that will apply the right semantic tag automatically.
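As a sketch of that idea (the content type and field names here are hypothetical, not from any particular CMS), the “some code” might look like:

```python
# Sketch: auto-wrapping a structured ISBN field in schema.org microdata.
# The "book" dict stands in for a hypothetical structured content type
# that stores the ISBN in its own field.

def render_isbn(book):
    """Wrap a book's ISBN field in a schema.org/Book itemprop."""
    return (
        '<span itemscope itemtype="https://schema.org/Book">'
        'ISBN: <span itemprop="isbn">{isbn}</span>'
        '</span>'
    ).format(isbn=book["isbn"])

book = {"title": "Example Title", "isbn": "978-0-0000-0000-0"}
print(render_isbn(book))
```

Because the ISBN lives in its own field, the template can tag it every time, on every page, with no human remembering to do it.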
We wonks have really good, high-quality answers to all sorts of questions that ordinary people ask every day. But many of those ordinary people will never see our websites because Google’s robots are pulling lower-quality answers from Wikipedia.
The robots would be equally happy to pull those answers from us instead.
We just have to make our content smart enough for them to find it.