Kate Bertash

Technology, Privacy, & Art
Photo by Stephen Francis from Pexels

Fashion vs Anti-Surveillance Function: Can Stylish Masks Block Facial Recognition?

November 15, 2020

When it comes to privacy, it seems not all masks are sewn equal. As we sit deep in COVID's 3rd wave in the United States, we're getting renewed reminders that wearing masks is not only critical for public health and safety, but has also inadvertently brought added protection to our privacy, especially in protest settings.

Are some fashion face masks better at preventing the detection of a face by a facial recognition system than others? How can we tell?

The Hunch:

As someone who designs surveillance-confounding fabrics, I've felt blessed that, as studies testing the efficacy of existing systems have found, a facial-recognition-confounding mask is mostly just any mask properly worn on your face. And we already know from experiments run by NIST that color and coverage appear to matter:

“Wearing face masks that adequately cover the mouth and nose causes the error rate of some of the most widely used facial recognition algorithms to spike to between 5% and 50%, a study by the US National Institute of Standards and Technology (NIST) has found. Black masks were more likely to cause errors than blue masks.”

So we’re off to a good start with these results regarding black masks vs blue. But can I corroborate these results myself? What other colors or patterns might matter, especially if I’d like to pick a fashionable mask to match my favorite outfits?

Can we gather other information that might help people who aren’t computer vision researchers to infer what’s going on, so we can make decisions about how to better our chances of protecting ourselves from image recognition systems as we shop online?

Setting Up The Experiment:

I went to the website of one of my favorite custom clothing makers and looked at their beautiful and varied mask selection. I gathered up about 400 photos of different styles of mask: some that covered only the nose and mouth, some that covered the neck, some contoured, others pleated, in many colors, and featured on models of different races and genders.

[A sampling of the mask product photos gathered for the test set]

One benefit of working with a dataset of product images for retail clothing is that these photos are extremely consistent. For each of the models, the lighting, poses, and styling are similar, if not identical. Typically only the mask varies, so many factors stay consistent from image to image. This should ideally help individual styles of mask stand out.

I ran these 400 images through the OpenFace facial detection model in Python, following this tutorial. I chose to use a pre-trained model rather than train my own, to better simulate the kinds of pre-trained systems we are most likely to encounter out in public spaces. Open source systems can be similar to, or even the exact same as, the ones implemented by makers of both policing systems and commercial technologies.
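For anyone who wants to try a similar batch test at home, here is a minimal sketch of the detection loop. It is not the exact pipeline from the tutorial I followed: it substitutes OpenCV's bundled Haar cascade face detector for the pre-trained OpenFace model, and the folder and output filenames are placeholders.

```python
# A minimal sketch of the batch detection step, not the tutorial's exact
# pipeline: OpenCV's bundled Haar cascade detector stands in for the
# pre-trained model, and the folder/CSV names are placeholders.
import csv
import glob
import os

import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

results = []
for path in sorted(glob.glob("mask_product_photos/*.jpg")):  # placeholder folder
    image = cv2.imread(path)
    if image is None:
        continue
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results.append({"file": os.path.basename(path), "faces_found": len(faces)})

with open("detection_results.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["file", "faces_found"])
    writer.writeheader()
    writer.writerows(results)

detected = sum(1 for r in results if r["faces_found"] > 0)
print(f"Faces detected in {detected} of {len(results)} images")
```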

In tagging my test image set, I took note of general qualities of each image: the style of mask, which model it was on, and whether the mask was light- or dark-toned, which I defined as more than 50% of the pattern being lighter or darker than mid-tone, or halfway between black and white.
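As a rough sketch, that light-or-dark tagging rule looks something like this in code, assuming you have cropped out a patch of just the mask fabric; the crop filename is hypothetical.

```python
# Rough sketch of the light-vs-dark tagging rule: a mask counts as
# "light-toned" if more than 50% of its pixels are brighter than mid-tone
# (halfway between black and white). Assumes a crop of just the mask
# fabric; the filename is hypothetical.
import numpy as np
from PIL import Image

def mask_tone(crop_path, mid_tone=128):
    gray = np.array(Image.open(crop_path).convert("L"))  # 0 = black, 255 = white
    light_fraction = (gray > mid_tone).mean()
    return ("light" if light_fraction > 0.5 else "dark"), light_fraction

tone, fraction = mask_tone("crops/mask_fabric_sample.jpg")  # hypothetical crop
print(tone, round(float(fraction), 2))
```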

The Results:

Of the 400 images run through a pre-trained OpenFace model, a face was detected 25% of the time. The poor performance of this model in detecting faces with masks on reflects the same kinds of problems cited in the NIST study above.

This had us off to a good start, but, confirming my hunch, some colors and patterns of masks appeared consistently more permissive of facial detection than others. I'm going to show some below and let you see if you, as the human, can find a pattern among them.

[A sampling of the product photos in which OpenFace detected a face]
If you’re guessing that the masks that were lighter in color, and generally more likely to match a lighter-skinned person’s skin tone were more consistently detected by OpenFace, you were right.

In my experiment, models wearing lighter-toned masks made up 78% of the images that registered as containing a face.

Another 5% of the face-detected set were false positives: cases where something that is not a face, like a piece of hair, a fold of a shirt, or a small piece of fabric pattern, was detected as one. Models in dark-toned masks were least likely to be detected by OpenFace, making up a scant 13% of all images that registered as containing a face at all.

So, what might be going on here?

The Hunch, Part 2 (Electric Boogaloo):

Let’s revisit an important concept in color theory that can help us abstract what might be happening here. Colors are described as having a pure “hue” but adding black, white, or grey to that color creates different tints, tones, and shades as you can see below:

[Diagram of a pure hue combined with white, grey, and black to create tints, tones, and shades]

Many models of image recognition rely on finding contrast in an image to help define where a feature is, which then can be used to try and identify features similar enough to the training set to confirm that a new image contains (in this case) a face. Let’s look at what happened with some models photographed in the same pose, and same style mask, but different colors:

[Product photos of models in the same pose and mask style, in a range of colors]

Certainly these masks are lighter in color, but we should check in particular why the blue, green, and red masks still weren't considered light enough in tone to be detected as faces. We know black masks are very effective at blocking recognition, but it would be helpful to have a notion of why other colors can work too. So to check out the tones of these colors, we're going to turn the whole image black and white.

[The same product photos, desaturated to grayscale]

When the image is desaturated, we can see that some colors register as closer to our skin tone, or degree of lightness versus darkness, even if they are not the same hue. We could guess that when the entire face area is close enough to the tone the facial recognition system expects to see, it may feel confident enough to detect a face even if features that register to us as humans, like a nose or a mouth, are missing.
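As a rough sketch of that guess, you could compare the average gray value of a patch of exposed skin against a patch of mask fabric in the desaturated photo. The filename and patch coordinates below are hypothetical; in practice you would pick the patches by eye for each image.

```python
# Sketch of the "does the mask tone match the skin tone?" check on a
# desaturated photo. The filename and patch coordinates are hypothetical.
import numpy as np
from PIL import Image

gray = np.array(Image.open("model_photo.jpg").convert("L"), dtype=float)

cheek = gray[120:160, 200:240]   # hypothetical patch of exposed skin
mask = gray[220:300, 170:290]    # hypothetical patch of mask fabric

difference = abs(cheek.mean() - mask.mean())  # on a 0-255 scale
print(f"Tone difference: {difference:.1f} (smaller = closer match, easier to detect)")
```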

This has important implications for what we know to be the existing racial biases in the training sets and models used for facial recognition technologies. If most of an image is made up of a tone the system expects to see on the typically white faces in its training set, the detection system may be more likely to see a face there as well.

As I look to try similar at-home mini experiments in the future, I hope to find images of wider varieties of models, with a wider range of skin tones, ages, and hair styles, not just different color masks. It’s my hope that I could use these images to better assess what kind of masks may better work to block recognition of different kinds of people. If you find a data set of masked models that are a better representation of the public than the typically mono-racial and majority male ones we find online, and that may help in this kind of anti-surveillance research, please reach out at the contact page.

To help you choose the mask most likely to work for you, especially since we can see how personal this is, you can do the following experiment at home.

How to pick a mask that may help you block facial recognition:

tl;dr version: Odds are you are looking for a mask that is not a lighter color, since these image recognition systems, particularly as implemented in surveillance and commercial products, are racially biased toward detecting lighter-skinned faces.

The color you choose, regardless of hue (even if it’s blue, green, etc) should be different enough from your personal skin tone that the tones do not match in grayscale.

A great way to check whether a color matches your skin tone is to take a photo of your face with the mask on with your phone, open the image, then use your phone’s editing software to turn it black and white. In most phones this will be in the “Saturation” setting, where you can drag the toggle to black and white.

Remember: you're looking to ensure that when both your skin and the mask are in grayscale, they don't match too closely. This isn't a 100% guarantee that no system can ever read your face. But you can double-check your effort by opening Instagram, Snapchat, or even your phone's camera to see if the built-in facial-detection functions have trouble identifying your masked face.
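If you'd rather do this check on a computer than in your phone's photo editor (the iOS steps are shown in the screencaps below), a couple of lines of Python with the Pillow library will desaturate a photo the same way; the filename here is just a placeholder.

```python
# Desaturate a masked selfie so you can compare your skin tone and mask
# tone in grayscale. The filename is a placeholder.
from PIL import Image

selfie = Image.open("masked_selfie.jpg")
selfie.convert("L").save("masked_selfie_grayscale.jpg")  # "L" = single-channel grayscale
```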

Screencaps of how to desaturate a photo in iOS

If you want extra insurance for your facial recognition-blocking intentions at the next protest you attend: add some sunglasses and a hat. Always great to be able to protect your health, your skin, and your privacy at the same time.

Tags: facial recognition, algorithmic bias, computer vision, fashion, face masks
Photos by Robin Sloan as featured in “Your Phone Wasn’t Built for the Apocalypse” by Ian Bogost, 2020

Eyes on the Apocalypse: the Struggle of Humans and Computers to See Wildfire Skies

October 18, 2020

In Ian Bogost’s Atlantic piece “Your Phone Wasn’t Built for the Apocalypse” he describes the difficulty in photographing the accurate color of the otherworldly orange skies blanketing the West Coast after devastating wildfires. This phenomenon is also a reminder that human eyes weren't built to fully account for the apocalypse either, as our brain works very hard to correct for the different colors of light cast on our world.

Many West Coast residents found themselves having issues capturing an "accurate" photo of the intensity of the sky, as their phone cameras kept attempting to automatically correct the color to something more reflective of the usual hues expected in a human world. This is due to cameras correcting for something called white balance, where "color casts," the color of the light shining on scenes and objects, are removed so that objects that would appear white in our world also appear white in the photo.

As Bogost describes, “Camera sensors are color-blind—they see only brightness, and engineers had to trick them into reproducing color using algorithms… Most cameras now adjust the white balance on their own, attempting to discern which objects in a photo ought to look white by compensating for an excess of warm or cool colors.” As frustrating as this aspect of digital cameras can be, it highlights a very interesting challenge that living beings also have to handle: seeing under varying environmental conditions.
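To make the idea a bit more concrete, here is a minimal sketch of one classic auto-white-balance heuristic, the "gray world" assumption, which rescales each color channel so the overall image averages out to neutral gray. Real phone cameras use far more sophisticated methods, and the filename is a placeholder.

```python
# Minimal sketch of "gray world" white balance: scale each channel so its
# average matches the overall average, on the theory that an average scene
# should be neutral gray. Phone cameras use much fancier versions of this.
import numpy as np
from PIL import Image

image = np.array(Image.open("orange_sky.jpg").convert("RGB"), dtype=float)  # placeholder

channel_means = image.reshape(-1, 3).mean(axis=0)   # average R, G, B
gray_mean = channel_means.mean()                    # target neutral value
balanced = image * (gray_mean / channel_means)      # rescale each channel
balanced = np.clip(balanced, 0, 255).astype(np.uint8)

Image.fromarray(balanced).save("orange_sky_corrected.jpg")
```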

Human vision has its own version of "white balance," where (as we understand it so far) the brain dials how much of certain wavelengths of light it chooses to perceive so that the color spectrum still makes sense even if, say, the light cast on a scene is tinted orange or yellow. This human ability to identify colors correctly regardless of the color of light cast on them is referred to as color constancy.

This effect can be seen in illusions created to show the persistence of the brain in attempting to perceive the full spectrum, like this edited photo of a strawberry tart. Would you believe that there are actually no red or reddish pixels in this image? Because of the apparent blue tint over the berries, our brain decides that, by comparison, the grey pixels in this image must be red.
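If you're skeptical, you can check a claim like this on the image below yourself with a few lines of Python. The filename is a placeholder, and "reddish" here is loosely defined as the red channel clearly dominating the other two.

```python
# Quick check of the "no reddish pixels" claim: count pixels where the red
# channel clearly dominates green and blue. The filename is a placeholder,
# and the +20 margin is an arbitrary definition of "clearly."
import numpy as np
from PIL import Image

rgb = np.array(Image.open("strawberry_tart_illusion.png").convert("RGB"), dtype=int)
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

reddish = (r > g + 20) & (r > b + 20)
print(f"Reddish pixels: {int(reddish.sum())} of {reddish.size}")
```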

Image of strawberry tart edited to appear as if a bluish green tint is on the grey pixels of berries, making them appear reddish.

The illusion gets even stronger when looking at the same image edited to simulate multiple other colors of light. Indeed in each of these images, the pixels of the strawberry are still grey.

Color constancy demonstrations with strawberry images by Akiyoshi Kitaoka - full-sized pdf image

While a blown-out white-balance function can be a negative quality in camera technology we're trying to use to accurately document the color of light after a wildfire, this ability in the human brain is incredibly useful. It ensures we can see color and differentiate objects regardless of the light cast on them at different times of day, or under unusual conditions like the orange sky of a natural disaster.

If color constancy intrigues you, there are several more examples you can test out for yourself in this fascinating talk by researcher Beau Lotto on what optical illusions have to teach us about how we see.

So while we find flaws like an overactive white balance in our cameras to be frustrating, examining how the same behavior can be found even in biological vision might help us appreciate the intentionality and care we have to use in trying to “objectively” observe our changing world.

Tags: human vision, optics, color, optical illusions
A person with glasses and glitter on their face, progressively pixellated to lower resolution, photo by Maria Eduarda Tavares

Three Things I Didn't Know About How Computers See

July 20, 2020

In working on projects and advocacy to help protect our privacy from surveillance, I realized pretty quickly it would help me to learn more about the fundamentals of the technology that underpins some of the fastest-growing areas of AI’s effect on our rights: computer vision, or how computers see.

I've decided that as I work to learn more about how computers see, I want to share what I find that is new to me, or familiar concepts I didn't know played such an important role in such a little-understood technology. I think if you're like me and you care about these issues, you likely feel that knowing how this technology actually works, instead of treating these systems as an all-powerful, omniscient black box, can help us be both practical and tactical in our work to combat surveillance.

This piece is going to restate some things that are fairly obvious to the person who works with artificial intelligence or computer vision for a living. But I realized as I worked to become literate in these concepts so I could find exploits for them that almost all of the foundations of this technology were easier to understand than I assumed they would be. 

Before working hands on with computer vision myself, I presumed such things were the domain of technological wizardry that requires a PhD in computer science to understand. It turns out the workings of these systems were simple enough that I could not only understand them with no such degrees, but develop an understanding deep enough to inform my own point of view on our questions around how to regulate these technologies to protect our privacy.

1. Computer vision is functionally just math.

Science fiction has constructed in our cultural imagination images of artificial intelligence: supercomputers, androids with very human-like eyes, maybe seeing through the kind of cameras we use in our daily lives on our phones or in our homes. We imagine the output of these cameras as an image processed as a whole scene, with context, the way it appears to us on our computer screens, like a photograph or video. That is, after all, how we humans see.

As I’ll get into in a later post, how we see is actually more interesting and complicated than even this “camera” and “photograph” analogy. But regardless of our “looking through a camera” style of human experience, computers actually “see” pixel by pixel, row by row, experiencing this whole line of bits of a picture as a long list of numbers.

How do computers do this when pixels appear to us as colors, and numbers are numbers? Well, each pixel on, say, our computer screen is made of very small electronic cells with a tiny bunch of colored lights packed together in sets. This might be 3 component colors for what's called an RGB (Red, Green, Blue) screen. Other color schemas use more or different components, like CMYK (Cyan, Magenta, Yellow, Key/black) in printing.

This is what the close-up of an RGB screen often looks like:

Close up of pixels on a screen showing the red, green, and blue components

As we can see, these pixels are each made of 3 lights: red, green, and blue. So when we want to tell the screen which lights to keep lit, a file in the computer holds 8 bits of information per component in a list, telling it which lights have to be on and how brightly, to create the impression of a particular color on the color wheel. These lists of numbers also tell each pixel when to light up or not over time, and in what order.

When looked at from a sufficient distance from the screen, our human eyes combine those 3 or more components into something closer to the color we expect to see, like orange or blue.
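As a small illustration, here are made-up example triples for an orange and a blue, and how a solid block of either reads as a single color once the individual lights blur together, like in the close-up below; the exact values are only approximations.

```python
# Each pixel is just three 8-bit numbers (0-255) telling the red, green,
# and blue lights how brightly to glow. These triples are only rough
# examples of an orange and a blue.
from PIL import Image

orange = (255, 140, 0)   # lots of red, some green, no blue
blue = (30, 60, 230)     # a little red, some green, lots of blue

# Seen from far enough away, a grid of these triples reads as solid color.
Image.new("RGB", (100, 100), orange).save("orange_swatch.png")
Image.new("RGB", (100, 100), blue).save("blue_swatch.png")
```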

Close up view of lit up RGB pixels showing an orange color over close up view of RGB pixels showing a blue color

Pretty neat optical illusion, right? But since computers do not have biological eyes, they don't get to enjoy the fruits of their light-show labor. They (for now) only "see" those 8-bit numeric values for each pixel in a given picture, like the pixels that make up this pigeon:

Picture of a pigeon

A computer knows a picture of a pigeon by the numeric value for the color and intensity of the very first pixel, then the pixel next to it, then the next, and so on through each row, until all of the values have been read off into one long list of numbers. When I give a computer a picture of, say, this pigeon to look at, I have really given it a list of numbers with the word "pigeon" on it.
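As a sketch of what that looks like in practice (the filename is a placeholder, and I'm using grayscale to keep it to one number per pixel):

```python
# What "a list of numbers with the word 'pigeon' on it" looks like: read
# the photo row by row, pixel by pixel, into one long flat list of values.
import numpy as np
from PIL import Image

pixels = np.array(Image.open("pigeon.jpg").convert("L"))  # placeholder filename
flat = pixels.flatten()

print(pixels.shape)   # e.g. (480, 640): rows and columns of pixels
print(flat[:10])      # the first ten numbers in the long list
label = "pigeon"      # the only thing telling the computer what the list "is"
```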

Picture of a pigeon wearing a “bread necklace”

Unfortunately for computers, a different image of the same pigeon with a bread necklace will have some numbers added to the list that represent the piece of bread, making things a bit more complicated for them. This means this second list will be different from the first even though this one is, as I have again told the computer, also a picture of a pigeon.

If a computer wants to compare these two pictures, it can do some surprisingly basic math (our good old friends addition and subtraction) to say: are the contents of these two lists the same? Certainly in this case they will not be, as there is a very bread-shaped difference between the numbers. And so the computer says no, these are not the exact same image.
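Here's a sketch of that comparison, assuming both photos have exactly the same dimensions so their lists line up; the filenames are placeholders.

```python
# The "are these two lists the same?" check really is just subtraction.
import numpy as np
from PIL import Image

pigeon = np.array(Image.open("pigeon.jpg").convert("L"), dtype=int)
pigeon_with_bread = np.array(Image.open("pigeon_bread_necklace.jpg").convert("L"), dtype=int)

difference = np.abs(pigeon - pigeon_with_bread)
print("Exact same image?", bool((difference == 0).all()))
print("Pixels that differ:", int((difference > 0).sum()))  # the bread-shaped difference
```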

But what if despite being different pictures, and two different lists, I want the computer to tell me whether they both might still contain a pigeon despite the addition of bread? How does that work?

2. The math used in image recognition can be done without a computer.

Computer vision is very dependent on the notion of what a computer “expects to see” based on its prior experience with other images, and some rules we try to give it to develop its own logic of what makes two pictures the same or different.

Image recognition is how the computer tries to tell whether the list of numbers that represents the pixels of one image is similar enough to the list that represents the pixels of another. For our two pigeons, not all of the numbers are the same or in the same order, but the numbers representing the pigeon's head and wings can probably be found in a group near each other in both lists. So this might give us a clue that they both contain a pigeon. How do I find areas of the list that are important enough to compare to the lists that make up other pictures?

The process of comparing the two lists of numbers that represent two images is certainly made easier with the aid of a computer, but I was surprised to find out that the math used to compare these lists could hypothetically be done by a person. This kind of calculation can even be completed with the same kind of algebra that many of us learned in high school.

Let’s take a look at one way we try to teach computers how to see, identifying Haar-like features:

Left image: an “ideal Haar-feature” example, a 4 by 4 block of squares showing 8 white squares on the left, next to 8 black squares on the right. Right image: a sample real set of pixels shown as a 4x4 block showing 16 squares with greyscale values from 0.1 - 0.9, with lighter values on the left, and darker values on the right.

These two images represent a kind of math used in some facial recognition systems, for example, to tell where there's a feature in the picture we might need to note, like eyes, a nose, or a hairline, so we can expect to see it in any matching pictures. "Features" are designated (in this particular method) as areas of high contrast between light and dark pixels. Looking for this difference in "pixel intensity" in a kind of greyscale saves time; we don't have to compare all the different colors that might come up across our different images.

You might imagine that the math used to find features in a picture would be complex. But finding a feature with this method just means clipping a section of a picture, adding up the values of the light pixels, adding up the values of the dark pixels, and then subtracting the light total from the dark total.

A “feature” in this method is any section of the picture where the difference between the dark and light intensity pixels is as close to 1 as we prefer. I then have an ideal number value for a feature I would “expect to see” if looking for them in another image.
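Here is that arithmetic written out for a 4x4 window like the illustration above. The values are made up, and I'm using the convention from that example where 0.0 is white and 1.0 is black, so a perfect light-to-dark edge scores exactly 1.

```python
# Haar-feature arithmetic by hand: average the pixels under the dark half,
# average the pixels under the light half, and subtract. Values are made up
# to resemble the illustration (0.0 = white, 1.0 = black); averaging keeps
# the result on a 0-to-1 scale.
import numpy as np

window = np.array([
    [0.1, 0.2, 0.8, 0.9],
    [0.2, 0.1, 0.9, 0.8],
    [0.1, 0.3, 0.7, 0.9],
    [0.2, 0.2, 0.8, 0.8],
])

light_half = window[:, :2]   # left half, where the ideal feature expects light pixels
dark_half = window[:, 2:]    # right half, where it expects dark pixels

feature_value = dark_half.mean() - light_half.mean()
print(f"Feature strength: {feature_value:.2f}  (1.0 would be a perfect light/dark edge)")
```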

There are many ways to calculate, using math as advanced as calculus or as foundational as arithmetic, whether these features and intensities are similar in new images I’m evaluating. Some algorithms are very dependent on finding and matching these high-contrast areas of images, while others lean on other ideas of what math we could use to see an “expected” feature based on pixel color, nearness to other pixels that suggest an edge of an object or its texture, or distance from other pixels of a different feature.

For example, this Tuna Scope app not only looks for certain intensity of pixels in a photo, but also color and placement to be able to tell if a cross-section of a tuna tail is as close as possible to what it would expect to see in high grade tuna.

Screenshot of the Tuna Scope image processing app for tuna tail section grading, indicating that the demo photo has a “B” grade.

But our world often gives us more reason to worry about the impact of image recognition on people than on sushi. So let's look at a low-resolution picture of someone's face. It would be more of a pain to hand-calculate, say, Haar features on this image and compare them to other similarly-sized photos than it was for our 4x4 pixel sample. But we could still do it with pen and paper, given enough time.

Pixelated face of Abraham Lincoln, showing intensity values for each pixel from light to dark.

Recognize this person? While I'm sure a computer would have a more difficult time matching this former president to other images with a high degree of certainty, as humans we don't have a hard time seeing how something even this low-resolution could, with the right context, be personally identifying.

As you can imagine, lower-resolution, fuzzy pictures contain fewer pixels to help the computer make comparisons, often resulting in mistaken matches. Indeed, even on high-resolution images with lots of pixels to compare, facial recognition in many current applications is found to be less effective at matching Black people, elderly or very young people, and often women. The math being used on all pictures is hypothetically the same, but the pictures we gave the computer to teach it what to expect in an image often contain mostly white people, and mostly men. This impacts how easy or hard it is for the computer to "expect to see" a person who doesn't closely match the people in the training photos.

Table comparing how race and gender affects how much the threshold for two pictures being alike must be lowered to successfully register as a match in certain facial recognition systems, from Algorithmic Justice League’s Facial Recognition Primer.

These examples are why, say, the common-sense policy of banning facial recognition use by governments must be written carefully into the text of those laws. If I can technically do "facial recognition" by hand, is the policy I wrote phrased as if I were just attempting to ban someone from doing a certain kind of math? Could that then be harder to enforce than I expect, require unforeseen exceptions, or let a bad actor side-step regulations on this technicality?

We have historical evidence that trying to use policy to ban math that technically anyone can do, like encryption, doesn't work very well. It certainly doesn't tend to get at the problems we're actually trying to solve around the use and abuse of image recognition technology.

Any laws we write, from my perspective, must be durable to how “use math to compare two numbers representing a real world thing” technologies operate. They must preserve people’s rights to privacy and due process regardless of what math I use to calculate anything about a person or their life, no matter if it is simple or complex math. Most importantly, those rights should be preserved whether this math in practice is riddled with errors or perfectly correct. 

We would hope any computer system built to do those high-stakes calculations on our behalf should also be able to talk to us and give us a reasonable idea of how it came to a conclusion that say, a picture of a person committing a crime allegedly matches a different person’s driver license photo.

And that leads me to the third very important thing I learned about how computers see:

3. We’re not really sure what parts of a picture computers are looking at when they match images to each other.

Yes, you read that right. When I first started running image recognition processes on my computer to create clothing that triggers automated license plate readers, I was sure my first task would be to figure out what the creators of that system had designated as important to the algorithms that decide whether a picture is indeed of a license plate. Surely they told the algorithm to prioritize certain license plate fonts, shapes of letters, and how far apart they sit on a standard plate.

After all, don’t people who build facial recognition tell the computer to look for something that has features of say: eyes, a mouth, a nose all in the right distance from one another?

As it turns out, those creators definitely do not tell the computer what to look at specifically, but instead try to train it to figure out what to expect based on prior experience. Creators of these systems show a computer lots of license plates or faces, sometimes millions of them. They hope that by learning the statistical relationships among those sample pictures, the computer will build a model of which features it should expect to see in any new pictures.

In researching my projects, I found out that not only do the researchers who create these image recognition systems not tell the computer what to look for during that training process, but being able to tell which features helped a computer decide two images are the same is itself an open area of research.

Examples of image classification errors from Wolfram Research’s Image Identification Project

This is because computers can't yet talk back to us to tell us what they were looking at and why, often because the volume of calculations is so enormous and high-frequency that it wouldn't make much sense for a human to read. So there are entire fields of research trying to figure out how to get these systems to show or tell us what on earth they were thinking when they decided two images matched each other.

Indeed, this is one of the major reasons issues like racism in facial recognition systems are so frustrating. Computers cannot tell us directly why they mismatch Black people's faces more frequently than other people's, or what logic led to those erroneous conclusions. These areas of research are, as a result, both timely and critical as governments and corporations push ahead and install these error-prone systems in every aspect of our society.

If you’re not a computer scientist, as I am not, from where you sit image recognition systems probably carry with them a veneer of futuristic technology, a black box filled with mystery. You might have the same assumption I did that under the hood, they must be using components that someone might have to be a genius to understand.

I am sure this impression of complexity is true of many of the technologies we interact with, but it seems image recognition is not one of them. As I work to grow my understanding of computer vision systems, I look forward to continuing to share things I come across that might make it easier for us to decide where image recognition should or should not fit into our world.

Tags: computer vision, optics, image recognition, facial recognition, algorithmic bias
