“Specifically, we’ll start crawling HTTPS equivalents of HTTP pages, even when the former are not linked to from any page,” Google said in its announcement. “When two URLs from the same domain appear to have the same content but are served over different protocol schemes, we’ll typically choose to index the HTTPS URL.”
Google has been a frontrunner in promoting the Electronic Frontier Foundation and Tor Project’s HTTPS Everywhere extension, which rewrites web requests to HTTPS on sites that support encryption but don’t use it by default. It was also among the first to offer HTTPS by default for Gmail and most of its online services.
It also responded quickly to revelations in the Snowden documents that the National Security Agency was tapping connections between Google’s overseas data centers, moving to encrypt those critical links.
Yesterday’s announcement does hinge on a handful of conditions that the HTTPS page must meet, Google said:
- It doesn’t contain insecure dependencies.
- It isn’t blocked from crawling by robots.txt.
- It doesn’t redirect users to or through an insecure HTTP page.
- It doesn’t have a rel="canonical" link to the HTTP page.
- It doesn’t contain a noindex robots meta tag.
- It doesn’t have on-host outlinks to HTTP URLs.
- The sitemap lists the HTTPS URL, or doesn’t list the HTTP version of the URL.
- The server has a valid TLS certificate.
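
Site owners who want a rough self-check can test a few of these conditions against a page’s HTML. The sketch below is illustrative only and is not Google’s method: the names `PageAudit` and `audit_html` are hypothetical, and it covers just three items from the list (insecure dependencies, a canonical link pointing to HTTP, and a noindex robots meta tag). The robots.txt, redirect, sitemap and certificate conditions would need separate checks.

```python
from html.parser import HTMLParser


def is_plain_http(url):
    """True when the URL explicitly uses the insecure http:// scheme."""
    return url.strip().lower().startswith("http://")


class PageAudit(HTMLParser):
    """Collects a few of the listed problems from one HTTPS page's HTML."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <script async>), so default to "".
        a = {name: (value or "") for name, value in attrs}

        # Insecure dependency: a subresource fetched over plain HTTP.
        if tag in ("script", "img", "iframe") and is_plain_http(a.get("src", "")):
            self.problems.append("insecure dependency: " + a["src"])

        if tag == "link":
            rel = a.get("rel", "").lower().split()
            href = a.get("href", "")
            if "canonical" in rel and is_plain_http(href):
                self.problems.append("canonical points to HTTP: " + href)
            if "stylesheet" in rel and is_plain_http(href):
                self.problems.append("insecure dependency: " + href)

        # A robots meta tag that asks search engines not to index the page.
        if tag == "meta" and a.get("name", "").lower() == "robots":
            if "noindex" in a.get("content", "").lower():
                self.problems.append("noindex robots meta tag")


def audit_html(html):
    """Return a list of human-readable problems found in the HTML string."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return parser.problems
```

An empty list from `audit_html` means only that none of these three problems was spotted in the markup, not that the page satisfies every condition Google listed.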
“Although our systems prefer the HTTPS version by default, you can also make this clearer for other search engines by redirecting your HTTP site to your HTTPS version and by implementing the HSTS header on your server,” Google said.
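
For readers wondering what that advice looks like in practice, here is a minimal sketch of the redirect-plus-HSTS pattern written as Python WSGI middleware. It rests on several assumptions: the name `force_https` and the one-year `max-age` value are illustrative choices, the server is assumed to report the request scheme accurately in `wsgi.url_scheme` (which may not hold behind a TLS-terminating proxy), and the port stripping is a simplification that does not handle IPv6 literals. Many sites would configure this in the web server itself rather than in application code.

```python
# Illustrative value: tell browsers to use HTTPS only for one year.
HSTS_VALUE = "max-age=31536000"


def force_https(app):
    """Wrap a WSGI app: redirect HTTP to HTTPS, add HSTS on HTTPS responses."""

    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") != "https":
            # Rebuild the same URL on the https:// scheme.
            # Simplification: drop any explicit port (breaks IPv6 literals).
            host = environ.get("HTTP_HOST", "localhost").split(":")[0]
            path = environ.get("PATH_INFO", "/")
            query = environ.get("QUERY_STRING", "")
            location = "https://" + host + path + ("?" + query if query else "")
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]

        def start_with_hsts(status, headers, exc_info=None):
            # The HSTS header is only added to responses served over HTTPS.
            headers = list(headers) + [("Strict-Transport-Security", HSTS_VALUE)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_hsts)

    return middleware
```

Wrapping an existing application is then a one-line change, for example `application = force_https(application)`, where `application` stands in for whatever WSGI callable the site already exposes.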