Facebook AI Gets Better at Describing Photos for Visually Impaired Users

The social network rolled out an update to its automatic alternative text (AAT) technology.
This story originally appeared on PCMag
In an effort to better accommodate users who are blind or visually impaired, Facebook this week updated its automatic alternative text (AAT) technology.
The feature, introduced in 2016 (and granted the Helen Keller Achievement Award from the American Foundation for the Blind in 2018), relies on object recognition to generate descriptions of photos on demand.
Blind and visually impaired (BVI) users have long relied on individuals to tag images with alternative text, or on screen readers to mechanically describe pictures in their News Feed. The next generation of Facebook's AAT, however, makes scrolling through social media a much richer experience.
"The latest iteration … represents multiple technological advances that improve the photo experience for our users," according to a Facebook AI blog post. The team expanded tenfold the number of concepts AAT can reliably detect and identify, promising more photos with more detailed descriptions, including activities, landmarks, types of animals, and more.
If someone navigating their feed, for instance, stops at a photo of friends posing in front of a famous Italian tourist attraction, the audio caption might say something like "May be a selfie of two people, outdoors, the Leaning Tower of Pisa."
In an apparent industry first, Facebook even makes it possible to include details of the position and relative size of elements in a picture. So instead of describing the contents as "May be an image of five people," the site can specify that there are two people in the center and three on the sides. Or, rather than describing a landscape as "May be a house and a mountain," it can determine that the mountain is the primary object based on its relative size.
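Facebook hasn't published the underlying code, but the idea can be sketched. The following minimal Python illustration (not Facebook's implementation; the detection format, thresholds, and function names are assumptions) shows how coarse position phrases and a "primary object" could be derived from object-detection output:

```python
# Minimal sketch (not Facebook's code): deriving position and
# prominence phrases from hypothetical object-detection output.
# Each detection is (label, x_center, area_fraction), where x_center
# is normalized to [0, 1] and area_fraction is the share of the
# image the object covers.

def position_phrase(x_center: float) -> str:
    """Bucket a normalized x-coordinate into a coarse position."""
    if x_center < 0.33:
        return "on the left"
    if x_center > 0.66:
        return "on the right"
    return "in the center"

def describe(detections: list[tuple[str, float, float]]) -> str:
    """Build a hedged caption, leading with the largest object."""
    # The "primary" element is simply the one covering the largest
    # fraction of the frame.
    ranked = sorted(detections, key=lambda d: d[2], reverse=True)
    parts = [f"{label} {position_phrase(x)}" for label, x, _ in ranked]
    return "May be an image of " + ", ".join(parts)

print(describe([
    ("a mountain", 0.5, 0.40),  # dominant: largest area fraction
    ("a house", 0.2, 0.10),
]))
# -> May be an image of a mountain in the center, a house on the left
```

Note how ordering the output by area lets the caption lead with the mountain rather than the house, which is the behavior the blog post describes.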
"Taken together, these advancements help users who are blind or visually impaired better understand what's in photos posted by their family and friends—and in their own photos—by providing more (and more detailed) information," the blog said.
When it launched nearly five years ago, the first version of AAT used human-labeled data to train a neural network; the completed model could recognize 100 common concepts like "tree," "mountain," and "outdoors," and identify faces (with opt-in consent). "But we knew there was more that AAT could do," Facebook said, "and the next logical step was to expand the number of recognizable objects and how we describe them."
Now trained on weakly supervised data in the form of billions of public Instagram images and their hashtags, automatic alternative text is more accurate and culturally and demographically inclusive, able to perceive more than 1,200 concepts. "We want to give our users who are blind or visually impaired as much information as possible about a photo's contents—but only correct information," the company added.
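The blog post doesn't detail the pipeline, but weak supervision from hashtags can be sketched roughly as below; the tiny concept vocabulary, synonym mapping, and function name are purely illustrative assumptions, not Facebook's actual system:

```python
# Rough sketch of weak supervision from hashtags (illustrative only;
# not Facebook's pipeline). Noisy user hashtags are folded into a
# canonical concept vocabulary to produce multi-label training
# targets without human annotation.

CONCEPTS = {"dog", "beach", "sunset", "selfie"}  # tiny stand-in vocabulary
SYNONYMS = {"doggo": "dog", "pup": "dog", "sunsets": "sunset"}  # hypothetical

def weak_labels(hashtags: list[str]) -> set[str]:
    """Map raw hashtags to canonical concepts; unknown tags are dropped."""
    labels = set()
    for tag in hashtags:
        word = tag.lstrip("#").lower()
        word = SYNONYMS.get(word, word)  # fold variants into one concept
        if word in CONCEPTS:
            labels.add(word)
    return labels

# One (image, hashtags) pair becomes a multi-label target; the
# "weak" part is that hashtags are noisy and incomplete labels.
print(weak_labels(["#Doggo", "#beach", "#nofilter"]))  # -> {'dog', 'beach'}
```

Training on billions of such noisily labeled public images, rather than a small hand-labeled set, is what allows the concept vocabulary to grow from 100 to more than 1,200.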
Facebook subsidiary Instagram in 2018 took steps to become more accessible, embracing object recognition technology that automatically identifies items in a photo and creates an audible description. Users are also encouraged to write up to 100 characters of alt text detailing what's in their images.