HUGE leak of Google search documents reveals the inner workings of the ranking algorithm


A trove of leaked Google documents has given us unprecedented insight into Google Search and revealed some of the most important elements Google uses to rank content.

What happened. Thousands of documents, which appear to come from Google’s internal Content API Warehouse, were published on March 13 on GitHub by an automated bot called yoshi-code-bot. These documents were shared with Rand Fishkin, co-founder of SparkToro, earlier this month.

  • Read on to find out what we learned from Fishkin, as well as Michael King, CEO of iPullRank, who also reviewed and analyzed the documents (and plans to provide a more in-depth analysis for Search Engine Land soon).

Why we care. We have been given a glimpse into how Google’s ranking algorithm may work, which is invaluable for SEOs who can understand what it all means. In 2023, we got an unprecedented look at Yandex Search’s ranking factors via a leak, which was one of the biggest stories of that year.

This Google document leak? It will probably be one of the biggest stories in the history of SEO and Google Search.

What’s inside. Here’s what we know from the internal documents, thanks to Fishkin and King:

  • Current: The documentation indicates that this information is accurate as of March.
  • Ranking Features: 2,596 modules are represented in the API documentation with 14,014 attributes.
  • Weighting: The documents do not specify how the ranking characteristics are weighted – they simply exist.
  • Twiddlers: These are reranking features that “can adjust a document’s information retrieval score or change a document’s classification,” according to King.
  • Demotions: Content may be demoted for various reasons, such as:
    • A link does not match the target site.
    • SERP signals indicate user dissatisfaction.
    • Product reviews.
    • Location.
    • Exact match domains.
    • Porn.
  • Change history: Google apparently keeps a copy of every version of every page it has indexed. This means that Google can “remember” every change made to a page. However, Google only uses the last 20 changes of a URL when analyzing links.
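
A rough way to picture a Twiddler, going only on King’s description above, is as a function applied to a list of results that already have scores. The sketch below is purely illustrative: the Result class, make_demotion, rerank, the example URLs and the 0.5 factor are all hypothetical and do not come from the leaked documentation, which does not say how any adjustment is weighted.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Result:
    url: str
    score: float  # hypothetical base information-retrieval score


# A twiddler, in this sketch, takes one result and returns an adjusted copy.
Twiddler = Callable[[Result], Result]


def make_demotion(predicate: Callable[[Result], bool], factor: float) -> Twiddler:
    """Build a twiddler that scales the score of matching results (assumed behavior)."""
    def twiddle(result: Result) -> Result:
        if predicate(result):
            return replace(result, score=result.score * factor)
        return result
    return twiddle


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Apply every twiddler to every result, then sort by the adjusted score."""
    adjusted = []
    for result in results:
        for twiddle in twiddlers:
            result = twiddle(result)
        adjusted.append(result)
    return sorted(adjusted, key=lambda r: r.score, reverse=True)


results = [
    Result("exact-match-keywords.example", 0.9),
    Result("brand.example", 0.8),
]
# Hypothetical rule: halve the score of URLs that start with "exact-match".
demote = make_demotion(lambda r: r.url.startswith("exact-match"), 0.5)
ranked = rerank(results, [demote])
print([r.url for r in ranked])
```

In this toy setup the demoted result drops below the other one even though its base score was higher, which is the general effect a reranking step would have.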

Links matter. Shocking, I know. The diversity and relevance of links remain essential, the documents show. And PageRank is still very present in Google’s ranking features. The PageRank of a website’s home page is taken into account for each document.

Successful clicks matter. This shouldn’t be shocking, but if you want to rank well, the documents suggest you need to keep creating great content and user experiences. Google uses various metrics, including badClicks, goodClicks, lastLongestClicks and unsquashedClicks.

Additionally, longer documents may be truncated, while shorter content gets a score (from 0 to 512) based on originality. Scores are also assigned to Your Money Your Life content, such as health and news.

What does all this mean? According to King:

  • “(Y)ou need to drive more successful clicks using a broader set of queries and get more link diversity if you want to continue ranking. Conceptually this makes sense because very strong content will do this. By focusing on driving more qualified traffic towards a better user experience, you will send a signal to Google that your page is worth ranking.”

Documents and testimony from the US antitrust lawsuit against Google have confirmed that Google uses clicks in ranking, particularly with its Navboost system, “one of the important signals” that Google uses for ranking.

Brand matters. Fishkin’s big takeaway? Brand matters more than anything else:

  • “If there was one universal piece of advice I would give to marketers looking to dramatically improve their organic search rankings and traffic, it would be: ‘Create a notable, popular, well-recognized brand in your space, outside of Google search.’”

Entities matter. Authorship lives. Google stores author information associated with content and attempts to determine whether an entity is the author of the document.

Site Authority: Google uses something called “siteAuthority”.

Chrome data. A module called ChromeInTotal indicates that Google uses data from its Chrome browser for ranking.

Whitelists. A few modules indicate that Google is whitelisting certain election and COVID-related domains: isElectionAuthority and isCovidLocalAuthority. That said, we’ve long known that Google (and Bing) have “exception lists” for when “specific algorithms inadvertently impact websites.”

Small sites. Another feature is smallPersonalSite, for a small personal site or blog. King speculated that Google could boost or demote these sites through a Twiddler. However, this remains an open question. Again, we don’t know for sure how heavily these features are weighted.

Other interesting discoveries. According to internal Google documents:

  • Freshness matters – Google looks at dates in the byline (bylineDate), the URL (syntacticDate) and the content of the page (semanticDate).
  • To determine whether or not a document is a core topic of the website, Google vectorizes pages and sites and then compares page embeddings (siteRadius) to site embeddings (siteFocusScore).
  • Google stores domain registration information (RegistrationInfo).
  • Page titles still matter. Google has a feature called titlematchScore that is believed to measure how well a page’s title matches a query.
  • Google measures the average weighted font size of terms in documents (avgTermWeight) and anchor text.
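
The comparison of page vectors against a site-level vector mentioned above (siteRadius, siteFocusScore) can be pictured with a small sketch. The leaked documentation does not say how either attribute is computed, so everything here is an assumption made for illustration: the site vector is taken to be the mean of the page vectors, distance is one minus cosine similarity, and the function names, URLs and two-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: List[List[float]]) -> List[float]:
    """Component-wise mean, used here as an assumed stand-in for a site vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def page_distances(page_embeddings: Dict[str, List[float]]) -> Dict[str, float]:
    """Distance of each page from the site centroid (1 - cosine similarity)."""
    site_vector = centroid(list(page_embeddings.values()))
    return {
        url: 1.0 - cosine_similarity(vector, site_vector)
        for url, vector in page_embeddings.items()
    }


# Toy vectors: two pages on one topic, one page on an unrelated topic.
pages = {
    "/seo-guide": [1.0, 0.0],
    "/seo-tools": [0.9, 0.1],
    "/recipes": [0.0, 1.0],
}
distances = page_distances(pages)
for url, distance in sorted(distances.items(), key=lambda item: item[1]):
    print(url, round(distance, 3))
```

Under these assumptions, the off-topic page ends up farthest from the site centroid, which is the kind of signal a "core topic" check would rely on.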

Quick clarification. There is some debate as to whether these documents were “leaked” or “discovered”. I was told it was likely that the internal documents were accidentally included in a code review and pushed live from Google’s internal codebase, where they were later discovered.

The source. Erfan Azimi, CEO and Director of SEO at digital marketing agency EA Eagle Digital, posted a video claiming responsibility for sharing the documents with Fishkin. Azimi is not employed by Google.
