Upper Tribunal Administrative Appeals Chamber

[2025] UKUT 319 (AAC)

Date: 11-Jun-2025

The FTT’s findings of fact

15. Having heard evidence given by Clearview’s General Counsel, Mr Mulcaire, at the FTT hearing, the FTT made the following findings of fact about the Service:

“28. The creation of the Database is, as described by the witness [Clearview’s General Counsel], achieved by the:
a. copying (which is often referred to as “scraping”) of photographic images which have been published to the world at large on the public internet, i.e. without privacy controls being circumvented to copy the image;
b. copying of additional information which relates to the photographic image such as a static URL (Footnote: ¹), a link to the social media profile and the name of the profile if the image was sourced from a social media profile;
c. the separation of those images that do not contain an image of a face from those containing images of faces (the former being discarded) (Footnote: ²);
d. sending of the additional information to be stored in a proprietary database called SpeedyDB;
e. creation of a set of vectors for each facial image using [Clearview]’s machine learning facial recognition algorithm;
f. sending of the facial vectors to be stored in a database called Neural Network Data Base (NNDB). Vectors of faces that are similar to each other will be stored closer within the digital space than vectors of faces that are very different to each other. This clustering facilitates the efficient provision of search results to clients. The process of clustering similar vectors together was referred to as “indexing” during the proceedings;
g. sending the Stored Image itself to be stored in a cloud database of images hosted by a third-party service provider;
h. the retention of any image uploaded by a client in order to perform a search on the system (the “Probe Image”) together with information that relates to the search such as its date and time. The Probe Images are not accessible to [Clearview] employees.
29. The scraping process uses automated programmes that visit publicly available websites and copy the images they find regardless of whether they contain an image of a face. These programmes are known as “scrapers” and the task of visiting websites as “crawling”. A scraper may be website-specific, that means it is specifically tailored to visit one website and copy the images from that one site more effectively. An open scraper will crawl numerous websites as it copies the images from each site. The [Clearview] open web scraper collects the most images, the website specific scrapers are not deployed at all times.
30. [Clearview] operates the open web crawler in-house but also uses contractors to provide scraped images.
31. Website-specific crawlers are used for sites that host a lot of images and are likely to be of interest to [Clearview]’s clients.
32. Websites may contain instructions within them that instruct web crawlers not to access them, such as robot.txt files. [Clearview]’s in-house open web crawlers will not scrape images from websites that have robot.txt files that do not authorise access by search engines. However, they also use results from external (outsourced) scrapers that are targeted at a single website; these scrapers do not abide by the instructions given by the robot.txt files. Such instruction will not prevent access without being accompanied by a preventative measure such as password protection.
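By way of illustration only, and not as part of the FTT’s findings, the kind of check a compliant crawler makes against a site’s crawler instructions (conventionally a file named robots.txt) might be sketched as follows. This is a minimal sketch using Python’s standard `urllib.robotparser` module; the file contents, the user-agent name `example-bot` and the URLs are all assumptions made for the example and say nothing about how [Clearview]’s crawlers are actually written.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # Parse the rules from text already in hand (no network access needed).
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: every crawler is asked to stay out of /private/.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in (
        "https://example.com/private/photo.jpg",
        "https://example.com/public/photo.jpg",
    ):
        print(url, may_crawl(EXAMPLE_ROBOTS_TXT, "example-bot", url))
```

A check of this kind is purely advisory: a crawler that skips it can still fetch the page, which is consistent with the finding that such instructions do not prevent access unless accompanied by a measure such as password protection.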
33. Scrapers can be designed to evade privacy controls, such as those that protect some types of private social media accounts but scrapers used by [Clearview] are not programmed to do this. So, if a page is password protected, [Clearview]’s scrapers (both in-house and external) will not be able to access that page.
34. [Clearview] used to provide, to UK residents, a mechanism whereby a member of the public can request that their images are no longer used/stored by [Clearview] for the Service. This protection relied on positive action being taken by the member of the public.
35. [Clearview]’s web crawlers are prevented by their internal instructions from accessing tens of thousands (Footnote: ³) of adult websites. Neither do they copy content from some large social media platforms such as Snapchat and TikTok. This is because of technical reasons, for example certain social media platforms use a programming language called JavaScript which presents technical challenges.
36. A web crawler can be tasked to save the entirety of the web pages it visits. The web crawler used to compile traditional internet search engines or internet archives will do so, however [Clearview]’s scrapers copy only the image and additional information, not the entire page.
37. The additional information that is collected with an image will depend on the source of the image and what has been attached to it. These pieces of additional information are forms of data collectively known as “metadata”. [Clearview]’s scrapers will also collect the following types of metadata with each copied image:

a static URL, (the internet source of the image);

any text snippet that accompanies the image on its internet source page (e.g., the title of an image);

a link to the associated social media profile if the image was sourced from a social media profile;

the name of that profile and the text of the profile’s description field;

any HTML meta element information which provides structured information about the source page;

any HTML “hover text” (also referred to as “hidden text”) associated with the image that appears when a mouse cursor hovers over that image;

the file extension of the image file;

the Multipurpose Internet Mail Extension (or “MIME”) of the image file (which indicates the nature and format of a document, file, or assortment of bytes);

a checksum hash of the image file (that is a digital data fingerprint of the image);

the image file’s width, height and file size;

any available exchangeable image file data (“EXIF”), which may include camera-specific information, such as shutter speed, model details, flash settings, colour, space, date, and time.
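For readers unfamiliar with these terms, several of the listed items (file extension, MIME type, checksum hash and file size) can be derived from nothing more than an image’s URL and its raw bytes. The following is a minimal sketch, assuming Python’s standard library; the function name `basic_image_metadata`, the dictionary keys and the choice of SHA-256 as the checksum are assumptions for illustration, since the findings do not say which hash or field names are used.

```python
import hashlib
import mimetypes
from pathlib import PurePosixPath
from urllib.parse import urlparse


def basic_image_metadata(url: str, data: bytes) -> dict:
    """Derive simple file-level metadata from an image URL and its bytes."""
    path = PurePosixPath(urlparse(url).path)
    # Guess the MIME type from the file name; None if the extension is unknown.
    mime, _encoding = mimetypes.guess_type(path.name)
    return {
        "static_url": url,                               # internet source of the image
        "extension": path.suffix.lower(),                # e.g. ".jpg"
        "mime": mime,                                    # e.g. "image/jpeg"
        "checksum": hashlib.sha256(data).hexdigest(),    # digital fingerprint (SHA-256 assumed)
        "size_bytes": len(data),                         # file size in bytes
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for real image content.
    print(basic_image_metadata("https://example.com/gallery/photo.jpg", b"abc"))
```

Width, height and EXIF fields are omitted here because reading them requires decoding the image itself, which the standard library does not do.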

38. [Clearview]’s scrapers only collect geolocation data, i.e. where a photograph was taken, if that image has retained the information within the EXIF data. This is because EXIF data is usually stripped away in the uploading process from the member of the public to the social media platform or other host site from which it is scraped. [Clearview] estimates that, in January 2022, 2% of the images on the database were accompanied by geolocation EXIF data based on a search of 3 billion images in the database. A previous estimate of 10% provided by the CEO in June 2020 was arrived at without such a search being carried out. It is also possible that a client can identify the location at which an image was taken from information stored in the webpage if they access the source of the image.

39. [Clearview] has the capacity to identify and block the utilisation of images taken in particular locations if such information is specified within the EXIF data of the image. The company can also place a “geo-fence” around a location to prevent the creation of facial vectors from any images scraped from that location as revealed in retained EXIF data. Any such images are discarded after collection by the web crawlers. This is clear from the steps taken by [Clearview] after what was referred to as the “Illinois Settlement” of 4 May 2022 in which [Clearview] voluntarily:

Blocked all photos in the database that were geolocated in Illinois from being searched;

Constructed a ‘geofence’ around Illinois;

Decided that it will not collect facial vectors from images that contain metadata associated with Illinois; and

Decided that it will not collect facial vectors from images stored on servers that are displaying Illinois IP addresses or websites with URLs containing keywords such as “Chicago” or “Illinois”.
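Purely as an illustration of the concept, a geo-fence of the kind described can be thought of as a filter applied before any facial vectors are created. The sketch below, in Python, is an assumption-laden simplification: the rectangular bounding box is only a rough approximation of Illinois (it also covers slivers of neighbouring states), the names `ILLINOIS_BOX`, `BLOCKED_KEYWORDS` and `is_geofenced` are hypothetical, and the IP-address check mentioned in the findings is not modelled.

```python
from typing import Optional, Tuple

# Rough, illustrative bounding box: (lat_min, lat_max, lon_min, lon_max).
ILLINOIS_BOX = (36.9, 42.6, -91.6, -87.0)
# URL keywords taken from the examples given in the findings.
BLOCKED_KEYWORDS = ("chicago", "illinois")


def is_geofenced(gps: Optional[Tuple[float, float]], url: str) -> bool:
    """Return True if an image should be excluded from vector creation.

    gps is the (latitude, longitude) from retained EXIF data, or None
    if the image carries no geolocation.
    """
    if any(keyword in url.lower() for keyword in BLOCKED_KEYWORDS):
        return True
    if gps is None:
        return False
    lat, lon = gps
    lat_min, lat_max, lon_min, lon_max = ILLINOIS_BOX
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max


if __name__ == "__main__":
    print(is_geofenced((41.88, -87.63), "https://example.com/a.jpg"))   # point in Chicago
    print(is_geofenced((51.50, -0.12), "https://example.com/b.jpg"))    # point in London
    print(is_geofenced(None, "https://example.com/chicago/c.jpg"))      # keyword in URL
```

Because most images have had their EXIF data stripped before they are scraped, a coordinate-based filter of this kind can only act on the minority of images that retain geolocation, which is why the keyword and server-location measures are listed alongside it.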

40. [Clearview]’s Database contains billions of images. The size grows according to the number of images copied by the scrapers. In October 2022 it was estimated that the Database included over 20 billion images and increasing as new images are scraped. We were provided with an estimate of a growth rate of 75 million images per day.

41. Indexing is related to the value of the vectors created. Each facial vector is represented by a long list of numbers that represent coordinates in a coordinate plane which is the final output of a multi-layered algorithmic process. Vectors that derive from similar faces will have similar coordinates nearer together in the coordinate plane, and therefore will be saved nearer to each other. The database is not arranged to enable identification of a person’s relatives or ethnicity. The algorithm focuses on what makes a person unique across different images and does not result in a significant family clustering effect. No index is kept of other objects in the Stored Image. The vectors created by [Clearview] are not transferable to another system, even though there are superficial similarities to software used to unlock phones or tablets and to other proprietary facial recognition systems. So, you could not take the vectors and input them to a phone or any other system to provide an image of the face in the photograph.
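The idea that similar faces produce vectors that sit “nearer together” can be illustrated with a simple distance calculation. The following is a minimal sketch in Python, assuming Euclidean distance as the measure of nearness and using made-up three-number vectors; the findings describe real vectors only as “a long list of numbers” and do not say which distance measure [Clearview]’s system uses.

```python
import math
from typing import Sequence


def vector_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.dist(a, b)


# Hypothetical vectors: two photographs of the same face and one of a different face.
same_face_photo_1 = [0.10, 0.80, 0.30]
same_face_photo_2 = [0.12, 0.79, 0.31]
different_face = [0.90, 0.10, 0.50]

if __name__ == "__main__":
    print(vector_distance(same_face_photo_1, same_face_photo_2))  # small distance
    print(vector_distance(same_face_photo_1, different_face))     # much larger distance
```

Storing vectors so that near neighbours can be found quickly is what the proceedings called “indexing”; the sketch shows only the comparison, not the storage.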

42. If one of [Clearview]’s clients wishes to use the Service, they will upload a facial image of an individual to [Clearview]’s system, this is known as a Probe Image. The system will create vectors for the face in the Probe Image. These vectors are then compared to the vectors created from the Stored Images using a machine learning facial recognition algorithm with a view to delivering a match or matches to the client. The results of that comparison are delivered to the client as search results that show the Probe Image alongside thumbnails of any Stored Images that the system has identified as having sufficient similarity to it. The number of results is capped at 120 for each search due to technical reasons.

43. The search results will include an assessment of the degree of similarity between each of the Stored Images returned by the search and the Probe Image, they will be presented in order of degree of similarity but no assessment of the accuracy of the matches is provided, the system does not indicate that the person in the Probe Image has been identified nor give a numerical percentage of confidence. The degree of similarity is represented by a coloured circle; a green circle indicates very close likeness between the vectors, whereas an amber circle would indicate a less strong likeness. The system does not say whether the images are of the same person, that decision is left to the client.

44. On a test by the US National Institute of Standards and Technology, a globally recognised test for facial recognition accuracy, [Clearview]’s service achieved 99%+ accuracy statistics. The algorithm is designed to require a high level of confidence before matching a Stored Image to Probe Image and returning it as a result of a search. Thus, it will not return the best match if the quality of the match is not high enough to satisfy that level of confidence, even if it is the best match from within the Stored Images. In those circumstances there will be no matches returned by the system.
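The combined effect of findings 42 to 44 (results ordered by similarity, capped at 120, and nothing returned when even the best candidate falls short of the required confidence) can be sketched as a thresholded search. This is an illustrative sketch in Python only: the Euclidean distance measure, the threshold value `MAX_DISTANCE`, the function name `search` and the sample vectors are all assumptions, and only the cap of 120 comes from the findings.

```python
import math
from typing import Dict, List, Sequence, Tuple

MAX_RESULTS = 120    # cap per search, as stated in the findings
MAX_DISTANCE = 0.5   # hypothetical threshold standing in for the "high level of confidence"


def search(
    probe: Sequence[float],
    stored: Dict[str, Sequence[float]],
    max_distance: float = MAX_DISTANCE,
    max_results: int = MAX_RESULTS,
) -> List[Tuple[str, float]]:
    """Return (image_id, distance) pairs close enough to the probe, nearest first."""
    scored = [(image_id, math.dist(probe, vec)) for image_id, vec in stored.items()]
    # Discard anything beyond the threshold, even if it is the best available match.
    hits = [item for item in scored if item[1] <= max_distance]
    hits.sort(key=lambda item: item[1])
    return hits[:max_results]


if __name__ == "__main__":
    stored_vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
    print(search([1.0, 0.0], stored_vectors))        # "a" then "b"; "c" is too far away
    print(search([1.0, 0.0], {"c": [0.0, 1.0]}))     # empty list: best match is not good enough
```

The second call shows the behaviour described in finding 44: the only stored vector is the best available match, but it is still withheld because it does not meet the threshold.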

45. The search results allow the client to select any of the thumbnails of the Stored Images. This will allow the client to see that image enlarged on screen together with the additional information including the URL. By using the URL the client may visit the internet page from which a Stored Image was copied/scraped.

46. The client will see three buttons in the search results for each image that when clicked on function as follows:

“Download image” will download the image to the client’s computer;

“Copy site URL” will copy the URL into the client’s clipboard so that they may enter it into another document/system;

“Open site URL” will open that URL in a new internet tab.”