OpenAI’s state-of-the-art machine vision AI is fooled by handwritten notes

Researchers at the machine learning lab OpenAI have discovered that their state-of-the-art computer vision system can be defeated by tools no more sophisticated than a pen and a pad of paper. As the image above illustrates, simply writing down the name of one object and sticking the note on another can be enough to trick the software into misidentifying what it sees.

“We refer to these attacks as typographic attacks,” OpenAI’s researchers write in a blog post. “By exploiting the model’s ability to read text robustly, we find that even photographs of handwritten text can often fool the model.” They note that such attacks are similar to “adversarial images” that can fool commercial machine vision systems, but are far simpler to produce.

Adversarial images pose a real danger to machine vision systems. Researchers have shown, for example, that they can trick the software in Tesla’s self-driving cars into changing lanes without warning simply by placing certain stickers on the road. Such attacks pose a serious threat to a variety of AI applications, from the medical to the military.

But the danger posed by this particular attack is, at least for now, nothing to worry about. The OpenAI software in question is an experimental system called CLIP that is not deployed in any commercial product. Indeed, the nature of CLIP’s unusual machine learning architecture has created the weakness that makes this attack possible.

‘Multimodal neurons’ in CLIP respond to photographs of an object as well as sketches and text.
Image: OpenAI

CLIP is intended to explore how AI systems can learn to identify objects without close supervision, by training on large databases of image and text pairs. In this case, OpenAI used around 400 million image-text pairs scraped from the internet to train CLIP, which was unveiled in January.
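OpenAI also released CLIP’s code and pretrained weights publicly. As a rough illustration of how the model matches an image against free-form text labels, here is a minimal sketch using the open-source clip Python package; the image file and the candidate labels are placeholders for illustration, not part of OpenAI’s experiments.

```python
# Minimal sketch of zero-shot classification with the open-source CLIP package
# (github.com/openai/CLIP). The image file and candidate labels below are
# placeholders for illustration only.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("some_photo.jpg")).unsqueeze(0).to(device)
labels = ["an apple", "a piggy bank", "a chainsaw"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # CLIP embeds the image and each caption, then scores every pairing;
    # the label with the highest score is the model's "guess".
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.1%}")
```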

OpenAI researchers published a new paper this month describing how they opened up CLIP to see how it performs. They discovered what they call ‘multimodal neurons’: individual components in the machine learning network that respond not only to images of objects but also to the associated text. One reason this is exciting is that it appears to mirror how the human brain reacts to stimuli, where single brain cells have been observed responding to abstract concepts rather than specific examples. OpenAI’s research suggests that AI systems may be able to internalize such knowledge in the same way humans do.
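The paper probes individual units inside the network; as a looser, hedged illustration of the same cross-modal behaviour, the sketch below simply checks that CLIP’s image embeddings for a photo, a drawing, and a handwritten note of the same concept land close together. The file names are hypothetical.

```python
# Loose illustration of cross-modal association (not the neuron-level probing
# from OpenAI's paper): CLIP image embeddings for a photo, a drawing, and a
# handwritten note of the same concept tend to be close together.
# The file names are placeholders.
import itertools
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

paths = {
    "photo": "piggy_bank_photo.jpg",
    "drawing": "piggy_bank_drawing.jpg",
    "handwriting": "piggy_bank_handwritten.jpg",
}

with torch.no_grad():
    feats = {}
    for name, path in paths.items():
        img = preprocess(Image.open(path)).unsqueeze(0).to(device)
        f = model.encode_image(img)
        feats[name] = f / f.norm(dim=-1, keepdim=True)  # unit-normalise

# Cosine similarity between each pair of embeddings (higher = more similar).
for a, b in itertools.combinations(feats, 2):
    print(f"{a} vs {b}: {(feats[a] @ feats[b].T).item():.3f}")
```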

In the future, this could lead to more sophisticated vision systems, but for now such approaches are in their infancy. While any person can tell you the difference between an apple and a piece of paper with the word ‘apple’ written on it, software like CLIP cannot. The very ability that lets the program link words and images at an abstract level creates this unique weakness, which OpenAI describes as the “fallacy of abstraction.”
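To make the ‘apple versus a note that says apple’ point concrete, here is a hedged sketch of a typographic attack along the lines OpenAI describes: the same zero-shot scoring is run on a clean photo and on a photo of the same object with a handwritten label stuck to it, and the prediction can flip. The image files and label set are hypothetical.

```python
# Hedged sketch of a typographic attack in the spirit of OpenAI's examples:
# score the same candidate labels against a clean photo and a photo where a
# handwritten note naming a different object has been stuck on.
# Both image files are placeholders.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["an apple", "a piggy bank"]
text = clip.tokenize(labels).to(device)

def classify(path):
    image = preprocess(Image.open(path)).unsqueeze(0).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)[0]
    return labels[int(probs.argmax())], probs.max().item()

# Clean photo of an apple vs. the same apple with a note reading "piggy bank".
for path in ["apple.jpg", "apple_with_handwritten_note.jpg"]:
    label, p = classify(path)
    print(f"{path}: {label} ({p:.1%})")
```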

Another example of a typographic attack. Do not trust the AI to put your money in the piggy bank.
Image: OpenAI

Another example the lab gives is the neuron in CLIP that identifies piggy banks. This component responds not only to pictures of piggy banks but also to strings of dollar signs. As in the example above, that means you can fool CLIP into identifying a chainsaw as a piggy bank if you overlay it with ‘$$$’ strings, as if it were half price at your local hardware store.

The researchers also found that CLIP’s multimodal neurons encode exactly the kind of biases you might expect to find when data is sourced from the internet. They note that the neuron for ‘Middle East’ is also associated with terrorism, and they discovered a neuron that fires for both dark-skinned people and gorillas. This replicates an infamous error in Google’s image recognition system, which tagged Black people as gorillas. It is yet another example of how different machine intelligence is from our own, and why pulling apart such systems to understand how they work is necessary before we trust our lives to AI.
