Law.com reported today on the decision by Judge Robert Kelly, Jr., in Healthcare Advocates Inc. v. Harding Earley Follmer & Frailey, No. 05-3524 (E.D. Pa. July 20, 2007). In that case, trademark owner Healthcare Advocates sued the law firm that had represented a rival company in prior litigation. The claim was that the firm had accessed archived versions of Healthcare Advocates' Web site that were available on the Internet Archive's "Wayback Machine." Healthcare Advocates filed suit against the law firm and certain of its individual attorneys, as well as the Internet Archive (which had been dismissed from the litigation earlier). It did so when it learned that the law firm had viewed archived versions of its Web site on the Wayback Machine and made printouts of those Web pages for use in the underlying litigation. Healthcare Advocates alleged that it had deployed a "robots.txt" file on its Web site to prevent access to the archived versions on the Wayback Machine, but that the law firm bypassed that "technological measure" and succeeded in accessing the archived pages.
Judge Kelly dismissed Healthcare Advocates' claims under copyright law, the anticircumvention provisions of the Digital Millennium Copyright Act (DMCA), the Computer Fraud and Abuse Act, and state common law. Among other things, he ruled that the law firm's access to and use of the archived Web pages to investigate the allegations of Healthcare Advocates' underlying lawsuit against the firm's client was permissible fair use. But the fair use finding on the copyright claims did not dispose of the DMCA anticircumvention claims or the Computer Fraud and Abuse Act claims.
A prominent feature of the DMCA anticircumvention portion of the opinion is its discussion of the role of the mundane and little-discussed "robots.txt" file in the archiving of Web sites. The court held that, in this case at least, a robots.txt file constitutes a "technological measure that effectively controls access to a work" within the meaning of the DMCA. What is a robots.txt file? It's a simple text file, placed on a Web site, that tells Web crawlers which parts of the site they have permission to crawl and archive. A robots.txt exclusion file previously figured prominently in the reported opinion in Field v. Google, 412 F. Supp. 2d 1106 (D. Nev. Jan. 19, 2006). There, a Web site owner's failure to configure his robots.txt file to exclude search engine caching of his copyrighted content supported a ruling that the owner was estopped from asserting a copyright claim against the search engine.
As the Healthcare Advocates v. Harding opinion relates in great detail, the Internet Archive periodically "crawls" the Internet, harvests screenshots of Web sites, and saves the results in the computer database that constitutes the Wayback Machine. The archived results show views of a Web site on a number of discrete dates in the past. They can be retrieved by entering the site's URL in a search box on the Wayback Machine Web site.
By placing a properly formatted and located robots.txt file on a Web site, the owner can control the crawling and archiving activities of the Wayback Machine. The same file also governs other search engines that abide by the "robots.txt exclusion standard." The term "standard" may be a little strong, however, to the extent that it implies a formally adopted standard. According to this Web page authored by Martijn Koster, the robots.txt exclusion standard "represents a consensus on 30 June 1994 on the robots mailing list." That mailing list is now defunct, but the consensus has persisted. Wikipedia and other sources, including Google, refer to this Web page as the authoritative source for the robots.txt exclusion standard.
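To make the "standard" concrete, here is a minimal sketch of a robots.txt file and of how a compliant crawler might read it, using Python's standard-library `urllib.robotparser`. The file contents, the example.com URLs, and the `build_parser`/`may_crawl` helper names are hypothetical. Treating "ia_archiver" as the Internet Archive's crawler name is an assumption made for illustration. The file itself enforces nothing. It only works if the crawler chooses to consult it before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The first record asks a crawler identifying
# itself as "ia_archiver" (assumed here to be the Internet Archive's
# user-agent) to stay out of the entire site. The second record applies
# to every other robot and excludes only the /private/ directory.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied directly instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user-agent may fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    p = build_parser(ROBOTS_TXT)
    checks = [
        ("ia_archiver", "http://www.example.com/index.html"),
        ("SomeOtherBot", "http://www.example.com/index.html"),
        ("SomeOtherBot", "http://www.example.com/private/notes.html"),
    ]
    for agent, url in checks:
        print(agent, url, may_crawl(p, agent, url))
```

A crawler that ignores the file can still fetch every page. This is part of why the court's treatment of robots.txt as a "technological measure" is notable.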
As related in the Healthcare Advocates opinion, the Wayback Machine also honors a robots.txt file retroactively. In other words, by placing a robots.txt file on its site at any time, a Web site owner can block access to pages that the Wayback Machine had already crawled and archived. See the Wayback Machine's explanation of retroactive blocking here.
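Retroactive blocking works because the permission check happens when an archived page is requested, not only when it was crawled. The following is a minimal sketch of how an archive might implement that idea. It is not the Internet Archive's actual code. The `Snapshot` type, the `replay` function, and the "ia_archiver" user-agent are all assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent name the archive checks for (an assumption).
ARCHIVE_AGENT = "ia_archiver"


@dataclass
class Snapshot:
    captured_on: str  # date of the crawl, e.g. "2002-06-01"
    html: str         # page content saved at that time


def replay(snapshots: list[Snapshot], url: str, current_robots_txt: str) -> list[Snapshot]:
    """Return the archived snapshots of url that may be shown today.

    The decision uses the robots.txt the site serves now, not the one in
    force when each snapshot was crawled. Adding an exclusion later
    therefore hides pages that were archived earlier, while the stored
    snapshots themselves are left untouched.
    """
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    if not parser.can_fetch(ARCHIVE_AGENT, url):
        return []  # blocked: the site currently excludes the archive's crawler
    return snapshots
```

With an empty robots.txt, `replay` returns the stored snapshots. Once the site adds `User-agent: ia_archiver` / `Disallow: /`, the same call returns an empty list, even though nothing in the archive has been deleted.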
And that's exactly what Healthcare Advocates Inc. did, deploying a robots.txt exclusion file on its Web site shortly after filing the complaint in the underlying litigation.
So how was the Harding Earley firm able to access the Wayback Machine archive after Healthcare Advocates deployed the robots.txt file? Healthcare Advocates claimed that the law firm "hacked" the Wayback Machine database to get around the robots.txt "digital padlock" it had deployed on its Web site. But the court found otherwise. It relied on the consistent testimony of the experts presented by both the defendants and the plaintiff. That testimony showed that on the days the law firm accessed the archive, the Wayback Machine's retroactive blocking functionality was not working because of a malfunction in the servers that controlled it.
Relying on dictionary definitions of the statutory terms "avoid" and "bypass," the court concluded that "[t]hese words, as well as the remainder of the words describing circumvention, imply that a person circumvents a technological measure only when he affirmatively performs an action that disables or voids the measure that was installed to prevent them from accessing the copyrighted material." According to the court, when the law firm accessed the archive, there was no protective measure to circumvent. The law firm "could not avoid or bypass any protective measure, because nothing stood in the way of them viewing these screenshots."
The court's finding that the blocking functionality was malfunctioning was fatal not only to Healthcare Advocates' DMCA anticircumvention claim but also to its Computer Fraud and Abuse Act claim: "The Harding firm got lucky, because the servers were malfunctioning, but getting lucky is not equivalent to exceeding authorized access."
Judgment for the defendants.