November 21, 2019

How search and content recommendation algorithms work

This post originally appeared in the November 21, 2019 issue of The Content Technologist with the email subject line "How algorithms work" and a review of SEO tool Morningscore.

You are correct to be skeptical of algorithms. Proprietary algorithms control the content we see and have determined how we interact with the internet.

Our minds have a symbiotic relationship with algorithms, to the point that we’re near-cyborgs. You enter phrases into a search bar in a specific way because the algorithm has taught you that you will get better results by using specific nouns and adjectives. Algorithms change to meet your search behavior. You're right: it’s weird.

We use content recommendation algorithms because they can process an immense amount of information, and if you’re like me, you’re overwhelmed by too much choice. Algorithms organize that vast amount of information for you so you don’t have to exist in the paradox of choice. In the faux MBA jargon that all the Serious Businesspeople speak now, they scale quickly.

The algorithms are proprietary because, the theory goes, they’re less likely to be manipulated if no one outside can see how they work. They’re intellectual property that provides a service no one else can. We can probably argue that, but let’s just accept that intellectual property is also a weird thing. It’s all weird.

Snuggie infomercial dance [gif]
Intellectual property is super strange. Case in point: there is only one Snuggie.

What are organic vs. paid algorithms?

On Google search results pages, there are organic results from Google’s massive index of the entire web. They’re called “organic” or “natural” search, and they’re what we’re concerned with here. You cannot pay and you have never been able to pay to manipulate who appears on top of those search results. They’re based on the Google algorithm, which is the type of algorithm we’re discussing here. SEO describes the industry that optimizes for these types of organic results.

There are also ads on search results pages, which are paid placements that are clearly marked as ads. They have their own algorithm, the Google Ads algorithm. I have a preliminary understanding of how that paid algorithm works but that’s irrelevant here. We’re talking organic, non-paid content recommendation. More clarification can be found here.

How does a content recommendation algorithm, like Google’s SEO algorithm or Facebook’s or Spotify’s algorithm, work?

Why I am qualified to explain content algorithms: I’ve published content on the internet for 21 years and have had a deep obsession with figuring out how to build an audience. I’ve been working deeply in SEO for six years, half of those at a top-tier performance marketing agency that provided amazing resources, colleagues and training in digital analytics and measurement in all channels. I deeply understand and have a healthy but critical view of digital content measurement. I see software demos all the time and ask a lot of questions about proprietary algorithms to help me understand what I’m seeing. I have assisted in building recommendation algorithms for multiple clients. Knowing how content algorithms work is a core part of my strategic business, whether those algorithms are social, search or other sorts of computer-assisted decision-makers.

Why I am not qualified to explain algorithms: I haven’t taken a math class since high school. I have never taken a computer science or statistics course and couldn’t tell you how those types of classes would improve my understanding of algorithms anyway. I know the bare minimum of code and that’s only front-end appearance stuff.

So. There's your grain of salt. Now let’s do this.

We're ready to look at the computer [gif]

How organic content recommendation algorithms are built

  1. Start with a problem that needs to be solved. More often than not, it’s “I can’t find the X that I want and I know it’s out there.”
  2. Break down the problem and make a list: “These criteria will help you find the best X.” In the analytics world, we call those criteria dimensions of the original problem.

    For example, if you want people to find music they can dance to, you’re going to create a dimension called tempo. Right now, we’re going to make a dimension called style, meant to highlight content that is particularly unique and has a strong voice.
  3. Devise some ways to assign numbers to that dimension. In this case, let’s put numbers and structure on the abstract human concept of style so the algorithm can evaluate a webpage:
    • The variety and breadth of vocabulary
    • The cadence or patterns of punctuation and sentence length
    • The number of unique or original phrase combinations
    • The number of outside experts who would also say the webpage was “stylish”*

    The above are our ranking signals. (Are they called ranking signals in other non-Google algorithms? Not going down that rabbit hole right now, but that's what I call them because my launchpad is always search.)

    *Awwww yeah, this is the problematic one! Who's to say who the experts are and why are they important? More about this type of signal, known as Authority, within the next couple of weeks.

  4. Assign those ranking signals weight and order. Which ones are most important to the concept of style? If someone has a breadth of vocabulary or makes up their own words, do they have more style than someone who coins creative uses of the same words? Solely from the words they spoke on their early Food Network shows, did Emeril have more style than Rachael Ray? Your algorithm can now decide!

And then program it into a computer, using your if/then statements and logic and adding varying levels of complexity. I say this like it’s easy, but we all know it’s not.
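To make those steps concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the function names, the way each signal is measured, and especially the weights. It only covers two of the signals above (vocabulary and cadence), and it is nowhere near how Google or anyone else actually does it.

```python
import re

def vocabulary_variety(text: str) -> float:
    """Signal 1 (hypothetical): share of words that are distinct, from 0 to 1."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

def cadence_variation(text: str) -> float:
    """Signal 2 (hypothetical): how much sentence length varies, from 0 to 1."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # a single sentence has no cadence to compare
    return (max(lengths) - min(lengths)) / max(lengths)

# Step 4: the weights are a human judgment call. These numbers are invented.
WEIGHTS = {"vocabulary": 0.6, "cadence": 0.4}

def style_score(text: str) -> float:
    """Combine the signals into one number using the chosen weights."""
    signals = {
        "vocabulary": vocabulary_variety(text),
        "cadence": cadence_variation(text),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())

if __name__ == "__main__":
    print(style_score("Kick it up a notch! Yum."))
```

Change the weights and the same text gets a different score, which is the whole point: the math is mechanical, but the choices behind it are human.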

Emma Stone says, "That's maaaaaath!" on SNL.

You could feed your algorithm datasets like Food Network shows or the archives of Gawker, and it would come up with style, based on your very human considerations of how style operates and how you've weighed and structured them.
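Feeding it a dataset could look something like the sketch below. The two sample texts are placeholders I made up, not real transcripts, and the single stand-in signal (share of distinct words) is an assumption chosen only to keep the example short.

```python
def unique_word_share(text: str) -> float:
    """Stand-in style signal (hypothetical): distinct words divided by total words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def rank_by_style(documents: dict) -> list:
    """Score every document, then sort from most to least 'stylish'."""
    scored = [(name, unique_word_share(text)) for name, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Placeholder dataset; real input would be whole shows or article archives.
dataset = {
    "sample_a": "bam bam bam kick it up",
    "sample_b": "yum oh delish yum yum yum",
}

ranking = rank_by_style(dataset)

if __name__ == "__main__":
    for name, score in ranking:
        print(name, round(score, 2))
```

The ranking that comes out is only as good as the signal that went in. Swap in a different definition of style and the order can flip.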

Congratulations! That's it! It’s oversimplified but that’s it. An algorithm is a mathematical model for solving a human problem.

If algorithms are so easy, how come SEO is a thing?

In mathematical terms, Google has a literal fuckton of ranking signals that continue to evolve. Many of those ranking signals are based on the words users input into Google — search queries — and the results they actively use in Google. How people use technology is constantly evolving, so those human-driven ranking signals evolve for Google as well. To add extra complication, the Google algorithm enables machines to learn from human communication and actions, so it’s a bit of a mess.

To add to that, there are all kinds of technical ranking signals SEO folks are trying to figure out: structure, site speed, security. Seriously, there are a fuckton of very complex ranking factors. But they aren't unknowable.

SEO professionals spend an inordinate amount of time trying to figure out what those ranking signals are and how to show a website off to the algorithm, like a pretty peacock shimmying for a mate.

A nutso black-and-neon bird calls a lady over and does a hoppy little mating dance. [gif]

Google gives clues about how they adjust the search algorithm based on how people are using it, but the algorithm itself is off limits. SEO is an industry built on solving that mystery. Oh yeah, it's weird. But hey! There's always another mystery to solve.

I don't care for a lot of what Google does as a company. But I do fundamentally trust organic search to sort the information that’s available out there. I couldn’t live without this algorithm, even if I didn’t work in the industry.

Are search algorithms manipulated?

Humans make algorithms; computers didn’t just birth them on their own. That would be terrifying.

Humans have biases. So yes, algorithms are manipulated and reflect existing human biases. Google tweaks search results only when they are considered harmful (e.g., placing a suicide hotline at the top of self-harm-related queries).

I’ve worked at and studied media companies. I’ve worked in and studied digital marketing. I have a strong understanding of how both industries work. I’ve never worked at Google, but I have spent a lot of time evaluating ranking signals and results. And I can safely say: There is no “black magic” or shady human manipulation in Google's organic search algorithms. You can’t call up Google and ask for favors the way you can just call up, like, Ukraine. If you have a Google rep, they are working with paid ads or factual errors in local business listings and not organic search.

Big companies can devote more resources to understanding the Google algorithm because they have more resources! But that doesn’t mean big companies will always win... one of the reasons I love the organic algorithm is that ancient but informative websites rank highly all the time because they have the best information!

Last week’s WSJ article on search caused an uproar in the search industry because it didn't make an honest attempt to understand how the algorithms worked. The reporters misquoted experts and forced algorithms into a narrative of conspiracy. They understood “black box” as “black magic,” even though it’s a vastly different metaphor. They let their personal and business biases determine the facts and framing of their story. But, like the Google algorithm, the Wall Street Journal is not required to disclose or admit those biases.

Why do we assign algorithms so much power?

We’ve chosen computers to make these decisions because people have trouble processing more than a few criteria at a time, because computers process immense amounts of information quickly, and because, in the words of Cady Heron/Tina Fey, math is the same in every language. But that doesn’t mean that they’re free of bias or history.

Algorithms are intensely, fundamentally human. Humans decide the criteria of what algorithms consider “good” or “bad,” and they approve the results before sending them off into the world. The problem with algorithms is the human part: the brain is a crazy thing full of thoughts and emotions and omissions and justifications.

We assign algorithms an objectivity that simply doesn’t exist. If it doesn’t exist in people, with all their emotions and perceptions and biases, objectivity certainly doesn’t exist in an algorithm that was made by a person. Using proprietary algorithms to make recommendations in complex social and emotional situations like law enforcement and healthcare is immensely inadvisable for a number of reasons, but humans are excited to get difficult decisions off their plates?

Most content algorithms are not wholly different from the concept of news judgment in journalism or principles of art critique or a company’s declared values or heck, the fundamentals of global democracy. Algorithms are just a framework that evolves over time and operates within the culture where they’re established.

I haven’t addressed more complex factors like machine learning or authority, or how people try to “hack” the Google algorithm. There’s still plenty to talk about. But understanding how algorithms are made is part of today’s media literacy.

