While on a routine bout of CS meandering, I discovered a strange phenomenon on the Google Scholar page of a VP at Google Research. Expecting a standard academic homepage, I was instead greeted by a brief period of loading and a warning about a non-HTTPS site, after which I arrived at an “Online Pharmacy” offering male enhancement products with the venerable slogan “Reliable online pharmacy”...sometimes. You’d land on a different site each time you clicked “Homepage”, with slightly varying logos and text.
Deepening the intrigue was the fact that, when the “Homepage” hyperlink was copied and pasted into the address bar, the academic site would load as expected with no issues. And when one clicked the hyperlink again from the Google Scholar page after doing this, there was no sign of the fraudulent pharmacy site. This behavior narrowed the root cause to something that is remedied, or at least masked, by browser caching of the proper site. Because the scam URL changed with each click, it couldn’t be the result of a DNS-based attack. That left a fault on Google Scholar’s side, something causing the site to bounce, or a compromise of the site’s hosting. Tracing the navigation from Google Scholar to the pharmacy site with browser inspection tools yielded no insights, and the referral chain seemed unremarkable.
Loading the scam site took about three times as long as loading the intended page. Signs pointed to server-side mischief. The nameserver, hosted by GoDaddy, appeared innocuous, and requesting the URL directly returned a normal response. What could the difference possibly be, then, that resulted in going to the pharmacy site? As it turns out, it was Google Scholar, but not in the way you might think. Here is what the raw request and response look like when Google Scholar is sent as the referrer:
Trying [IP]...
Connected to [realsite.com].
GET / HTTP/1.1
Host: [realsite.com]
Referer: https://scholar.google.com/
HTTP/1.1 200 OK
Date: Mon, 05 Feb 2024 08:11:15 GMT
Server: Apache
X-Powered-By: PHP/5.6.40
Vary: Referer,Accept-Encoding
Upgrade: h2,h2c
Connection: Upgrade
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
91
<html><head><meta http-equiv="refresh" content="0; url=https://[fraudsite.com]/search.html?key=cialis&t=p181_basrel&dummy=best"></head></html>
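The comparison above can be automated. What follows is a minimal Python sketch, not the exact procedure I used: it requests the same URL with and without a Referer header and reports whether a meta-refresh redirect appears only in the referred response. The function names (`meta_refresh_target`, `fetch`, `referer_cloaking_check`), the User-Agent string, and the regular expression are all my own illustrative choices, and the pattern only covers the simple quoted form of the tag seen in the transcript.

```python
import re
import urllib.request
from typing import Callable, Optional

# Matches a simple <meta http-equiv="refresh" content="N; url=..."> tag.
# Assumption: attributes appear in this order and the content value is quoted.
META_REFRESH = re.compile(
    r'<meta[^>]*http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
    r'content\s*=\s*["\']\s*\d+\s*;\s*url\s*=\s*([^"\'>]+)',
    re.IGNORECASE,
)


def meta_refresh_target(html: str) -> Optional[str]:
    """Return the URL a meta-refresh tag points to, or None if there is none."""
    match = META_REFRESH.search(html)
    return match.group(1).strip() if match else None


def fetch(url: str, referer: Optional[str] = None) -> str:
    """Fetch a page body, optionally sending a Referer header."""
    headers = {"User-Agent": "Mozilla/5.0"}  # hypothetical browser-like value
    if referer:
        headers["Referer"] = referer
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def referer_cloaking_check(
    url: str,
    referer: str,
    fetcher: Callable[[str, Optional[str]], str] = fetch,
) -> dict:
    """Compare meta-refresh targets with and without the Referer header."""
    plain = meta_refresh_target(fetcher(url, None))
    referred = meta_refresh_target(fetcher(url, referer))
    return {
        "plain": plain,
        "referred": referred,
        # Suspicious when a redirect shows up only for referred visitors.
        "suspicious": referred is not None and referred != plain,
    }
```

Calling `referer_cloaking_check("http://[realsite.com]/", "https://scholar.google.com/")` against the site in question should flag the mismatch, since only the referred request returns the refresh tag. A `Vary: Referer` header in the response, as in the transcript, is another hint that the server tailors its output to the referrer.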
We can presume the Google researcher did not want his website redirecting to a Cialis storefront. The compromised server was reported to GoDaddy. The redirect specifically targeted visitors arriving from Google Scholar and wouldn't be evident to anyone whose request lacked that referrer, which is why pasting the URL directly loaded the real site. This is a textbook example of a cloaking attack, wherein a “‘bait-and-switch’ technique [is] used to hide the true nature of a Web site by delivering blatantly different semantic content to different user segments” (Voelker, 2011).
Thanks to cybersecurity doyens Stefan Savage and Alex Gantman for their insights.