Semantic Content

Once upon a time, search engines were about providing links to pages.

That time is past.

These days, search engines—and by “search engines,” I mean Google—are about providing answers to questions. Don’t believe me? Try it. Go input “what time does kmart close” into Google. (Or, for those of you on the other side of the Atlantic, try “asda sutton” instead of “kmart.”)

[Screenshot: Google search results for "what time does kmart close"]

Notice what you get right up there at the top of the page. It’s an answer to your question. And then there’s that bit over to the right, the part that tells you all about Kmart. A good two-thirds of the content you see above the fold isn’t links to pages. It’s Google providing you the information you’re looking for, right there in the search results. No second click required.

So what kind of black magic lets Google do this?

Two words: semantic content.

The Chinese Room

About 35 years ago, an American philosopher named John Searle proposed what has come to be known as the Chinese Room Argument. Searle’s paper is about artificial intelligence generally, and about the famous Turing Test specifically. (The Turing Test says that if a machine can reliably pass as a human in an online chat, then the machine counts as intelligent.)

Searle thinks the Turing Test doesn’t work, and the Chinese Room Argument is his thought experiment to show why.

The quickie version: Imagine that you’re locked in a room containing lots and lots of books filled with a bunch of markings. On the table in front of you is yet another book filled with some instructions in English. A mail slot in one of the walls is your only access to the outside world. We’re also going to assume that you speak only English. (Americans and Brits do have at least some stereotypes in common.)

Every once in a while, a sheet of paper comes through the mail slot with some markings on it. You then use your instructions to look up the markings that came through the slot, go find a different set of markings in your books, copy those markings onto a new sheet of paper, and then push the results back through the mail slot.

As you’ve probably deduced by now, the markings are actually Chinese writing, and the things coming in through the mail slot are questions in Chinese. The markings you’re looking up and spitting back out are perfectly coherent answers to those questions. To anyone outside the room, you’d pass the Turing Test for speaking Chinese.

But you don’t understand a word of Chinese.

Searle concludes that the ability to manipulate language based purely on the shapes of symbols (or syntax) is not sufficient for actual understanding (or semantics).

Or, more briefly, syntax is not semantics.

Back to the Web

Google’s search robots live inside the Chinese Room. They’re great at syntax. Google has written a truly impressive rulebook for the robots to use in looking things up. Those rules rely on the markup you’ve used to display your content. (“Markup” is shorthand for all those funny-looking tags that show up when you accidentally hit the “view source” button: the <div>s, the <p>s, the <title>s, the <article>s, the <section>s, and the <figcaption>s.)

Sadly, Google’s robots are terrible at semantics. All the stuff that’s inside your markup (aka, the things between the tags; aka, your content) is just a bunch of shapes. The rules tell the robots what to do with those shapes. But the robots can no more read those shapes than you can read the symbols in the Chinese Room.

Fortunately, there’s a way to cheat.

The robots don’t understand the semantics of what shows up inside your markup. But they do recognize semantic meaning if you put that meaning inside your markup.

You can’t make the robots understand semantics. But you can help Google write a much better rulebook for its robots to follow.

To torture Searle’s metaphor a bit, the paper coming into the room that used to say:

[set of markings]

will now say something like:

Polite greeting: [set of markings]

And your rulebook will go from:

If [set of markings], then go to shelf 4, book 12, page 174 and copy paragraph 4.

to something more like:

If polite greeting: [set of markings], then go to the conversational openers shelf, select the small talk book, and copy polite greeting response: [set of markings].

These changes don’t make Google’s robots any smarter. But they do make your content a little smarter. And, even more importantly, they let the really smart engineers at Google write a much, much better rulebook for the robots.

Toward Semantic Content

Fortunately, a lot of the work we’ll need to do to add semantic meaning to our markup has already been done for us. Google, Microsoft, Yahoo, and others sponsor a project called Schema.org, which houses a standardized vocabulary for semantic markup. The terms you’ll find there are ones the search engines have agreed to recognize, and many are already incorporated into the rulebook that Google’s robots use.

And if your content doesn’t quite fit into any of the existing models…well, you’re still in luck. Schema.org is open source. You can propose your very own set of terms, and if they’re accepted, those can go into the rulebook as well.

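Using Schema.org’s semantic markup means that instead of displaying the ISBN of a book as:

<p>0123456789</p>

You would show it as:

<span itemprop="isbn">0123456789</span>

(The itemprop only counts when it sits inside an element that says what kind of thing is being described, such as a wrapper carrying itemscope and itemtype="https://schema.org/Book".)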

Mark up your content this way, and when someone asks Google’s robots about the ISBN for your latest book, the robots will return an answer and not just a list of pages.

[Screenshot: Google search results for "what is the isbn for harry potter"]

And while there probably aren’t a lot of people searching for the ISBN for your book, there are plenty of people asking questions that your book answers. You can answer their question, or you can give them a link to a page that answers their question. But as Noz Urbina reminded attendees at his Confab workshop:

People aren’t searching for links. They’re searching for answers to questions.

Sadly, implementing semantic markup isn’t an overnight fix. It’s going to require some help from your IT team. And it’s also going to require structured content. After all, if your book content type has a separate field for the ISBN, it’s pretty easy to write some code that will apply the right semantic tag automatically.
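To give a feel for what “some code” might look like, here is a minimal sketch in Python. It assumes a structured book record with separate name, author, and isbn fields. The record, the field names, and the render_book helper are all made up for illustration, and a real CMS would do this in its own templating layer. The property names (name, author, isbn) and the Book type are real Schema.org terms.

```python
from html import escape

# Hypothetical structured "book" record, as it might come out of a CMS
# that stores each piece of information in its own field.
book = {
    "name": "An Example Book",
    "author": "A. N. Author",
    "isbn": "0123456789",
}


def render_book(record):
    """Wrap each known field of a book record in Schema.org microdata."""
    # The wrapper declares that everything inside describes a Book.
    lines = ['<div itemscope itemtype="https://schema.org/Book">']
    for field in ("name", "author", "isbn"):
        value = record.get(field)
        if value:  # skip fields this record does not have
            # escape() keeps characters like & and < from breaking the HTML
            lines.append(f'  <span itemprop="{field}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


print(render_book(book))
```

Because the ISBN lives in its own field, the code never has to guess which string of digits is the ISBN. That is the payoff of structured content: the semantic label comes along for free every time a book page is published.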

We wonks have really good, high-quality answers to all sorts of questions that ordinary people ask every day. But many of those ordinary people will never see our websites because Google’s robots are pulling lower-quality answers from Wikipedia.

The robots would be equally happy to pull those answers from us instead.

We just have to make our content smart enough for them to find it.

About

Joe Miller is the Director of Digital Media Strategy at Eastern Research Group. He came to ERG from The Century Foundation, where he transitioned the organization to digital-first publishing. Previously he created the digital strategy program while heading the web team at the Congressional Budget Office, worked as a senior staff writer at FactCheck.org, as a writer with the Mack/Crounse Group, and taught as an assistant professor of philosophy at the University of North Carolina—Pembroke and the United States Military Academy. He received his PhD in political philosophy from the University of Virginia, his MA in philosophy from Virginia Tech, and his BA in philosophy from Hampden-Sydney College.

Posted in Opinion
Comments on “Semantic Content”
  1. […] WonkComms, I make the case for smarter content that helps Google answer questions rather than just offer up links to pages. Much of that piece was […]

  2. josephbarnsley says:

    Excellent piece Joe. Content to answer questions is a great challenge
