I’ve Seen Things w/ John Mueller

I recently had the pleasure of hosting a virtual session with John Mueller, Google’s Search Advocate, at SEO for Paws 2, the second SEO charity event (September 25, 2024), an online conference aimed at helping small cat and dog shelters in Ukraine. I also edited John’s session into a separate video.

Known for his sense of humor and insightful advice, John shared some fascinating and sometimes bizarre issues that his team at Google has encountered. Below are some of the specific stories he told and the lessons behind them. Interestingly, some of the notifications the team sent through Google Search Console were created manually.


The Search Relations Team

Before diving into the stories, John provided some background on his team at Google. The Search Relations team focuses on helping websites get the most out of search. They achieve this through:

  • Documentation: Extensive guides and resources for webmasters.
  • Search Policies: Guidelines to ensure fair and effective search results.
  • Videos and Events: Educational content and community engagement.
  • Direct Outreach: Contacting site owners when indexing or crawling teams encounter unusual problems.

Anecdotes

1. The European Consent Redirect

A European website added a Data Privacy Information page to comply with regulations and decided to show it to all users. To do so, they redirected everyone who visited the site, Googlebot included, to a page called “Consent.”

  • Issue: Redirecting all traffic to the consent page caused the entire website to become inaccessible to search engines.
  • Lesson: Be cautious with site-wide redirects; they can prevent search engines from accessing your content.
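A cheap safeguard against this kind of mistake is to audit your redirects and flag any single target that swallows most of the site’s URLs. The Python sketch below assumes you already have a mapping of crawled path → redirect target (for example, from your own crawler); the `find_funnel_targets` helper and its 50% threshold are hypothetical choices, not a standard tool.

```python
from collections import Counter


def find_funnel_targets(redirects: dict[str, str], total_urls: int,
                        threshold: float = 0.5) -> list[str]:
    """Return redirect targets that receive more than `threshold` of all URLs.

    `redirects` maps each crawled path that redirects to its target path;
    `total_urls` counts every crawled URL, redirecting or not. The 0.5
    default is an arbitrary, hypothetical cut-off.
    """
    if total_urls <= 0:
        raise ValueError("total_urls must be positive")
    counts = Counter(redirects.values())
    return sorted(target for target, n in counts.items()
                  if n / total_urls > threshold)


# Hypothetical crawl: every page on a five-URL site bounces to /consent.
crawl = {"/": "/consent", "/about": "/consent",
         "/products": "/consent", "/contact": "/consent"}
print(find_funnel_targets(crawl, total_urls=5))  # → ['/consent']
```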

2. The Empty Burger Wrapper

A burger restaurant displayed an image of an empty burger wrapper on missing pages but returned a 200 OK status code instead of a 404 error.

  • Issue: Search engines indexed these pages, mistaking them for valid content.
  • Lesson: Ensure missing pages return the correct 404 status code to prevent indexing of non-existent content.
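The fix is to keep the friendly image if you like, but send it with a 404 status. Here is a minimal sketch using Python’s standard `http.server`; `KNOWN_PAGES` and the `/empty-wrapper.png` image path are hypothetical stand-ins for the restaurant’s real pages.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical pages that actually exist on the site.
KNOWN_PAGES = {
    "/": "<h1>Burgers</h1>",
    "/menu": "<h1>Menu</h1>",
}

# Hypothetical "empty wrapper" error page: the markup is fine, the status matters.
NOT_FOUND_HTML = '<img src="/empty-wrapper.png" alt="Nothing here!">'


def respond(path: str) -> tuple[int, str]:
    """Return (status, html): 200 for real pages, 404 for everything else."""
    if path in KNOWN_PAGES:
        return 200, KNOWN_PAGES[path]
    return 404, NOT_FOUND_HTML


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Ignore the query string when looking up the page.
        status, html = respond(self.path.split("?", 1)[0])
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, run `http.server.HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()` and request a made-up path: the image still renders, but the response status is 404.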

3. Default Server Page Dilemma

A new server was set up and left serving its default placeholder page.

  • Issue: Search engines crawled and indexed this default page and treated it as duplicate content, since the same placeholder appears on many other sites.
  • Consequence: The website was crawled less frequently, delaying the indexing of actual content.
  • Lesson: Replace default pages with custom content or leave the server empty until ready.
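If you want an early warning, a rough heuristic is to check whether the homepage still carries a stock server title. The titles in this Python sketch are examples of common defaults as I recall them (nginx, Apache on Ubuntu, IIS), not an exhaustive or authoritative list, and `looks_like_default_page` is a hypothetical helper.

```python
import re

# Example titles of stock placeholder pages (illustrative, not exhaustive).
DEFAULT_PAGE_TITLES = (
    "welcome to nginx!",
    "apache2 ubuntu default page",
    "iis windows server",
)


def looks_like_default_page(html: str) -> bool:
    """Heuristically flag a stock web-server placeholder page by its <title>."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    if match is None:
        return False
    # Collapse whitespace and lowercase before comparing.
    title = " ".join(match.group(1).split()).lower()
    return any(default in title for default in DEFAULT_PAGE_TITLES)
```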

4. Robots.txt Snippet Confusion

A server blocked Googlebot’s IP addresses, but only for requests to the robots.txt file.

  • Issue: Google couldn’t fetch the robots.txt file, so it treated the entire site as blocked from crawling.
  • Consequence: Search results showed only the “robots.txt snippet” instead of a real description of the pages.
  • Lesson: Ensure robots.txt is accessible to search engines to avoid unintended blocking.
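Why did blocking a single file have such a big effect? Google documents how it reacts to different robots.txt fetch results, and a network-level failure (which is what an IP block looks like from the crawler’s side) is treated like a server error. The Python function below is my simplified model of that documented behavior; the return labels are invented for illustration, and Google’s real handling adds caching and time limits that are left out here.

```python
from typing import Optional


def robots_txt_policy(status: Optional[int]) -> str:
    """Simplified model of how a robots.txt fetch result affects crawling.

    `status` is the HTTP status code, or None if the request failed at the
    network level (timeout, refused connection, DNS error), as happens when
    the crawler's IP addresses are blocked. Labels are hypothetical.
    """
    if status is None or status == 429 or 500 <= status <= 599:
        return "disallow_all"  # treated as a server error: crawl nothing for now
    if 400 <= status <= 499:
        return "allow_all"     # treated as if no robots.txt file exists
    if 200 <= status <= 299:
        return "parse_rules"   # read the file and obey its rules
    raise ValueError(f"unexpected status {status}; follow redirects first")
```

For the site in this story, `robots_txt_policy(None)` returns `"disallow_all"`, which matches what John described.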

5. JavaScript Canonical

A JavaScript framework-based website added server-side rendering.

  • Issue: Because of a bug, the rel="canonical" tag was output as [object Object] instead of a URL.
  • Consequence: Search engines couldn’t properly index the pages.
  • Lesson: Always test rendered content to ensure such tags are correctly implemented.
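A quick automated check on the rendered HTML would have caught this. The Python sketch below uses the standard `html.parser` module to pull out `rel="canonical"` links and flags anything that isn’t an absolute http(s) URL. That rule is deliberately stricter than necessary (Google’s documentation recommends absolute URLs but also accepts relative ones), and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.hrefs.append(attributes.get("href") or "")


def canonical_problems(rendered_html: str) -> list[str]:
    """Return a list of problems with the page's canonical tag(s); empty if OK."""
    finder = CanonicalFinder()
    finder.feed(rendered_html)
    if not finder.hrefs:
        return ["no canonical tag found"]
    problems = []
    if len(finder.hrefs) > 1:
        problems.append(f"{len(finder.hrefs)} canonical tags found")
    for href in finder.hrefs:
        try:
            parsed = urlparse(href)
        except ValueError:  # e.g. malformed IPv6 brackets
            parsed = None
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"canonical is not an absolute URL: {href!r}")
    return problems
```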

6. Adult Links to Gov Website

An official website had its homepage blocked in robots.txt and had many odd inbound links.

  • Issue: Adult sites linked to it using adult-related anchor text.
  • Consequence: Since Google couldn’t crawl the homepage, it leaned on that anchor text, and search snippets displayed inappropriate content when users searched for the official site.
  • Lesson: Don’t block important pages via robots.txt; it can lead to unintended search appearances.
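Python’s standard `urllib.robotparser` makes it easy to confirm that the pages you care about aren’t disallowed. This sketch assumes you paste in your robots.txt text and list your key URLs yourself; the robots.txt and `example.gov` URLs are hypothetical. Note that this parser only does simple prefix matching and doesn’t implement Google’s `*` and `$` wildcards, so treat it as a first check rather than a faithful replica of Googlebot.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str],
                 user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs in `urls` that `robots_txt` disallows for `user_agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that blocks everything, homepage included.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

print(blocked_urls(ROBOTS_TXT, ["https://example.gov/", "https://example.gov/contact"]))
# → ['https://example.gov/', 'https://example.gov/contact']
```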

7. Accidental Site Removal

A seemingly healthy website wasn’t showing up in Google search results.

  • Issue: The site owner accidentally submitted a site removal request while trying to set up canonicalization between www and non-www versions.
  • Consequence: The entire site was removed from Google’s index.
  • Lesson: Be cautious with site removal tools; they can have sweeping effects.
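For the record, www/non-www canonicalization is normally handled with a permanent (301) redirect to one preferred host, optionally reinforced with rel="canonical", and it doesn’t involve the removal tool at all. A minimal Python sketch of the redirect decision, assuming (hypothetically) that `example.com` is the preferred host and `www.example.com` the alternate:

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical hosts: pick whichever version you want to keep.
PREFERRED_HOST = "example.com"
ALTERNATE_HOST = "www.example.com"


def canonical_redirect(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if `url` is already canonical.

    Path, query, and fragment are kept; any port or user info in the original
    netloc is dropped, which is fine for a simple sketch like this one.
    """
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != ALTERNATE_HOST:
        return None
    return urlunsplit(parts._replace(netloc=PREFERRED_HOST))
```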

8. Server Error Status Code

A self-made website returned an HTTP 500 server error status code for all pages.

  • Issue: Browsers displayed the content, but search engines saw server errors.
  • Consequence: The site wasn’t indexed.
  • Lesson: Ensure your server returns the correct status codes.
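Because browsers render the response body whatever the status code is, this kind of bug stays invisible unless you look at the status itself. The Python sketch below checks status codes with the standard `urllib` module and demonstrates the problem against a hypothetical local server that renders a page but answers 500; `fetch_status`, `MisconfiguredHandler`, and `demo` are illustrative names.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Ignore proxy environment variables so requests go straight to the target.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for `url`, including 4xx and 5xx codes."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx, but the status code is on the exception.
        return err.code


class MisconfiguredHandler(BaseHTTPRequestHandler):
    """Hypothetical server that renders the page fine but sends status 500."""

    def do_GET(self):
        body = b"<h1>Welcome to my site!</h1>"
        self.send_response(500)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def demo() -> int:
    """Serve the misconfigured page on a free local port and check its status."""
    server = HTTPServer(("127.0.0.1", 0), MisconfiguredHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address[:2]
        return fetch_status(f"http://{host}:{port}/")
    finally:
        server.shutdown()
        server.server_close()
```

`demo()` returns 500 even though the page body would look fine in a browser. Pointing `fetch_status` at your own URLs is a quick sanity check.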

9. DNS Blocking Googlebot

Another self-made website wasn’t being indexed by Google.

  • Issue: Their DNS server blocked Googlebot.
  • Consequence: Google couldn’t access the site at all.
  • Lesson: Check DNS configurations to ensure accessibility for search engines.

10. Testing Subdomain Troubles

A site owner used a separate subdomain for testing.

  • Issue: The site redirected everyone to the testing subdomain, which returned an HTTP 400 Bad Request status code.
  • Consequence: Search engines couldn’t index any content.
  • Lesson: Avoid redirecting users to testing environments and ensure proper status codes.
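When redirects are involved, check where a chain ends up and what status the final URL returns, not just the first hop. The Python sketch below works on a hypothetical snapshot of responses (URL → status and Location header) so it can run offline; in practice you would collect those from your own crawler or server logs, and `follow_redirects` is an illustrative name.

```python
from typing import Optional

# Hypothetical snapshot entry: (status code, Location header or None).
Response = tuple[int, Optional[str]]


def follow_redirects(url: str, responses: dict[str, Response],
                     max_hops: int = 10) -> tuple[str, int, int]:
    """Follow 3xx responses from `url`; return (final_url, final_status, hops).

    Raises RuntimeError on a redirect loop or if more than `max_hops`
    redirects would be needed. Raises KeyError if a URL is missing from
    the snapshot.
    """
    seen = {url}
    for hops in range(max_hops + 1):
        status, location = responses[url]
        if not (300 <= status < 400) or location is None:
            return url, status, hops
        url = location
        if url in seen:
            raise RuntimeError(f"redirect loop at {url}")
        seen.add(url)
    raise RuntimeError(f"more than {max_hops} redirects")
```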

11. Infinite Scroll Indexing Issues

A news website implemented infinite scroll with multiple articles loading as users scrolled.

  • Issue: Search engines indexed pages containing content from multiple articles.
  • Consequence: Titles and snippets in search results didn’t match, causing confusion.
  • Lesson: Prevent search engines from triggering infinite scroll to ensure content is indexed correctly.
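Google’s guidance for infinite scroll is, roughly, to make each piece of content available on its own URL that loads without scrolling, and to layer the scrolling experience on top of that. A minimal server-side sketch in Python, assuming a hypothetical `/news/<slug>` URL scheme and articles stored as dictionaries; `render_article_page` is an illustrative name.

```python
import html


def render_article_page(articles: list[dict[str, str]], index: int) -> str:
    """Render articles[index] as a standalone page with its own <title>.

    Each article lives at a hypothetical /news/<slug> URL. The link to the
    next article is a plain <a href>, so crawlers can discover it without
    scrolling or running any infinite-scroll script.
    """
    article = articles[index]
    title = html.escape(article["title"])
    parts = [
        f"<html><head><title>{title}</title></head><body>",
        f"<h1>{title}</h1>",
        f"<p>{html.escape(article['body'])}</p>",
    ]
    if index + 1 < len(articles):
        nxt = articles[index + 1]
        href = "/news/" + html.escape(nxt["slug"])
        parts.append(f'<a href="{href}">Next: {html.escape(nxt["title"])}</a>')
    parts.append("</body></html>")
    return "\n".join(parts)
```

The infinite-scroll script can still fetch the next article as the reader scrolls (updating the address bar with `history.pushState`), while crawlers see one article, one title, and one snippet per URL.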

12. The Mysterious .edu Links

An adult website noticed it had inbound links from .edu domains when checking Search Console.

  • Issue: The links originated from an ancient PDF hosted on a university website.
  • Cause: The PDF was a scanned document with dust speckles that, when OCR (Optical Character Recognition) was applied, were misinterpreted as URLs pointing to the adult site.
  • Consequence: The adult site appeared to have authoritative .edu backlinks.
  • Lesson: None, just a funny example.

13. JavaScript Crypto Miner Woes

A website added a JavaScript crypto miner to their pages.

  • Issue: Googlebot attempted to render the page and run the miner, causing it to run out of memory.
  • Consequence: The content wasn’t loaded or indexed.
  • Lesson: Avoid adding resource-intensive scripts that can hinder page rendering.

14. Stale Pre-Rendering on COVID Site

An unofficial COVID-19 site used server-side rendering and cached pre-rendered content.

  • Issue: Cached content from early March 2020 was never updated.
  • Consequence: Search engines displayed outdated information.
  • Lesson: Regularly update cached content, especially for time-sensitive information.
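The underlying fix is to give pre-rendered pages an expiry time. Here is a minimal Python sketch of a time-limited cache; the `render` callback and `PrerenderCache` class are hypothetical, and a real setup might instead purge the cache whenever the underlying data changes.

```python
import time
from typing import Callable


class PrerenderCache:
    """Cache pre-rendered HTML, but only for `max_age` seconds per URL."""

    def __init__(self, render: Callable[[str], str], max_age: float,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render    # hypothetical function producing fresh HTML
        self._max_age = max_age
        self._clock = clock      # injectable so tests can fake the time
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]      # still fresh: serve the cached copy
        html = self._render(url)  # missing or stale: re-render
        self._entries[url] = (now, html)
        return html
```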

15. YouTube CAPTCHA Block

YouTube began showing a CAPTCHA to Googlebot.

  • Issue: Googlebot was asked “Are you a bot?”, which prevented it from crawling YouTube pages.
  • Consequence: Reduced indexing of YouTube content.
  • Lesson: Ensure that essential resources are accessible to search engines, even on large platforms.
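I don’t know how YouTube resolved this internally, but a common pattern for sites that challenge suspected bots is to exempt verified Google crawlers first. Google documents a two-step DNS check for this: reverse-resolve the requesting IP, confirm the hostname ends in googlebot.com, google.com, or googleusercontent.com, then forward-resolve that hostname and confirm it maps back to the same IP. A Python sketch of that check, with the DNS lookups injectable so it can be tested offline (the function names are hypothetical):

```python
import ipaddress
import socket
from typing import Callable

# Domains Google documents for its crawlers' reverse-DNS hostnames.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def _reverse_dns(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host: str) -> list[str]:
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_verified_google_crawler(
    ip: str,
    reverse_lookup: Callable[[str], str] = _reverse_dns,
    forward_lookup: Callable[[str], list[str]] = _forward_dns,
) -> bool:
    """Return True only if `ip` passes the reverse-then-forward DNS check."""
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # no PTR record, lookup failure, etc.
        return False
    if not host.endswith(GOOGLE_CRAWLER_SUFFIXES):
        return False
    try:
        addresses = {ipaddress.ip_address(a) for a in forward_lookup(host)}
    except (OSError, ValueError):
        return False
    # The forward lookup must map back to the original IP, or it's spoofed.
    return ipaddress.ip_address(ip) in addresses
```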

16. JavaScript Subdomain Blocked

A site used subdomains to serve JavaScript files.

  • Issue: The JavaScript subdomain was blocked by robots.txt.
  • Consequence: Search engines couldn’t render pages properly.
  • Lesson: Allow search engines access to all resources needed for rendering.
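A subtle point here is that robots.txt is scoped to a single host (strictly, to a scheme, host, and port), so scripts on a separate subdomain are governed by that subdomain’s own robots.txt. This Python sketch checks a page’s resource URLs against the right file for each host; `robots_by_host` is a hypothetical mapping you would fill by fetching each host’s robots.txt, and, like any use of `urllib.robotparser`, it only does simple prefix matching rather than Google’s wildcard rules.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def blocked_resources(resource_urls: list[str], robots_by_host: dict[str, str],
                      user_agent: str = "Googlebot") -> list[str]:
    """Return resource URLs that their own host's robots.txt disallows.

    `robots_by_host` maps hostname -> robots.txt text. Hosts without an
    entry are treated as having no robots.txt (everything allowed).
    """
    parsers: dict[str, RobotFileParser] = {}
    blocked = []
    for url in resource_urls:
        host = (urlsplit(url).hostname or "").lower()
        if host not in robots_by_host:
            continue
        if host not in parsers:
            # Parse each host's robots.txt once and reuse it.
            parser = RobotFileParser()
            parser.parse(robots_by_host[host].splitlines())
            parsers[host] = parser
        if not parsers[host].can_fetch(user_agent, url):
            blocked.append(url)
    return blocked
```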

17. The Mexican Restaurant Mystery

A Mexican restaurant returned an HTTP 400 Bad Request status code for all pages.

  • Issue: Googlebot couldn’t access any content.
  • Humorous Note: John’s team noted, “Googlebot is not able to get black beans from your website.”
  • Lesson: Correct status codes are crucial for site accessibility.

18. Geo-IP Redirects Gone Wrong

An international website used geo-IP redirects and a country picker.

  • Issue: The country picker was blocked in robots.txt, and all requests were redirected there.
  • Consequence: Search engines couldn’t access any regional content.
  • Lesson: Ensure essential navigation pages are crawlable.

Final Thoughts

It’s fascinating to peek behind the curtain and see the unique challenges that can arise in the world of SEO. A big thank you to John Mueller for sharing these insightful and entertaining anecdotes. His dedication to educating and engaging with the community continues to make the world of SEO a better place.