Imagine everyday activities such as driving, walking, and reading being difficult or even impossible. An estimated 7 million adults in the United States have visual impairments, according to the National Federation of the Blind.
Devi Parikh has developed a method that could transform the quality of life for blind or low-vision individuals, earning a National Science Foundation (NSF) Faculty Early Career Development (CAREER) Award for her Visual Question Answering (VQA) research.
Parikh, an assistant professor in Virginia Tech's Bradley Department of Electrical and Computer Engineering, will use images to teach a computer to respond to any question that might be asked about them.
VQA provides a new model through which humans can interface with visual data, and it lends itself to applications such as software that lets blind users get quick answers about their surroundings.
Parikh and her team are building a deep, rich database that will mold a machine's ability to respond accurately and naturally to visual images. Given an image and a question about the image, the machine's task is to automatically produce an answer that is not only correct, but also concise, free form, and easily understood.
"To answer the questions accompanying the images, the computer needs an understanding of vision, language, and complex reasoning," said Parikh, the leader of the Computer Vision Lab at Virginia Tech.
Given an image of a road and the question, "Is it safe to cross the street?", the machine must judge the state of the road the way a pedestrian would and answer "yes" or "no" depending on traffic, weather, and the time of day. Or, when presented with an image of a baby gleefully brandishing a pair of scissors, the machine must identify the baby, understand what it means to be holding something, and have the common sense to know that babies shouldn't play with sharp objects.
More examples of situations that could prompt questions for a VQA system:
- In real-world situations -- "What temperature is this oven set to?"
- Aiding security and surveillance analysts -- "What kind of car did the suspect drive?"
- Interaction with the machine itself -- "Is my laptop in the bedroom?"
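The interaction pattern these examples describe is a simple one: a system takes an image and a free-form question and returns a short, natural answer. The sketch below illustrates that interface with hand-written rules over a fake scene description standing in for vision and reasoning; every name here is hypothetical and is not Parikh's actual system, whose whole point is to *learn* this mapping rather than hardcode it.

```python
# Toy sketch of the VQA interface: image + free-form question -> short answer.
# A real system learns this mapping from data; here, hand-written rules over
# a fake scene description stand in for vision, language, and reasoning.

scene = {
    "objects": ["baby", "scissors"],   # what a vision model might detect
    "relations": [("baby", "holding", "scissors")],
}

def answer_question(scene, question):
    """Return a concise free-form answer for a handful of question patterns."""
    q = question.lower()
    if q.startswith("is ") and "safe" in q:
        # Common-sense check: a sharp object held by a baby is unsafe.
        for subj, rel, obj in scene["relations"]:
            if subj == "baby" and rel == "holding" and obj == "scissors":
                return "no"
        return "yes"
    if q.startswith("what is the baby holding"):
        for subj, rel, obj in scene["relations"]:
            if subj == "baby" and rel == "holding":
                return obj
    return "unknown"

print(answer_question(scene, "Is it safe for the baby?"))   # -> no
print(answer_question(scene, "What is the baby holding?"))  # -> scissors
```

The brittleness of the rules is the point: each new question pattern needs new hand-written logic, which is why Parikh's approach trains on a large dataset instead.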
The computer must learn these lessons one question at a time -- a lengthy, painstaking, and detailed process.
With help from Amazon Mechanical Turk, an online marketplace for work, Parikh and her team will use the NSF award to continue collecting a large dataset of images, questions, and answers, which will teach the computer how to understand a visual scene. The publicly available dataset contains more than 250,000 images, 750,000 questions (three for each image), and about 10 million answers.
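A record in a dataset like this pairs an image with several crowd-sourced questions, each answered by multiple workers, and a predicted answer can then be scored by its agreement with the human answers. The sketch below uses an illustrative record shape (the field names are hypothetical, not the dataset's exact schema) together with the agreement-based scoring rule published with the VQA benchmark, under which an answer counts as fully correct if at least three annotators gave it.

```python
from collections import Counter

# Hypothetical shape of one dataset record: an image ID plus crowd-sourced
# questions, each with ten human answers (field names are illustrative).
record = {
    "image_id": 42,
    "questions": [
        {"question": "What is the baby holding?",
         "answers": ["scissors"] * 7 + ["shears"] * 2 + ["a toy"]},
    ],
}

def consensus_accuracy(predicted, human_answers):
    """Score a predicted answer against multiple human answers.

    Agreement-based rule from the VQA benchmark: full credit if at
    least 3 annotators gave the same answer, partial credit otherwise.
    """
    matches = Counter(human_answers)[predicted.lower()]
    return min(matches / 3.0, 1.0)

q = record["questions"][0]
print(consensus_accuracy("scissors", q["answers"]))  # 7 of 10 agree -> 1.0
print(consensus_accuracy("shears", q["answers"]))    # 2 of 10 agree -> ~0.67
```

Collecting ten answers per question, rather than one, is what makes a consensus rule like this possible: free-form answers rarely match a single "gold" string exactly.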
"Answering any possible question about an image is one of the holy grails of semantic scene understanding," said Parikh. "VQA poses a rich set of challenges, many of which are key to automatic image understanding, and artificial intelligence in general."
Teaching computers to understand images is a complex undertaking, especially if the goal is to enable the computer to provide a natural-language answer to a specific question. VQA is directly applicable to many situations where humans and machines must collaborate to understand pictures and images.
"This work can serve as a gentle springboard to computer vision and artificial intelligence in general," said Parikh, who has committed to improving the computer vision curriculum at Virginia Tech by introducing an emphasis on presentation and writing skills in her new Advanced Computer Vision course.
Parikh considers VQA a gateway subject into the field. Like science fiction, VQA captures the imagination of both technical and non-technical audiences, said Parikh.
Parikh's CAREER grant, which is the NSF's most prestigious award and is given to junior faculty members who are expected to become academic leaders in their fields, will bring her one step closer to fulfilling her long-term goal to enable machines to understand content in images and communicate as effectively as humans.
Parikh, who earned her Ph.D. at Carnegie Mellon University, joined Virginia Tech in January 2013.
She is a recipient of the Army Research Office Young Investigator Program award, the Allen Distinguished Investigator Award in Artificial Intelligence from the Paul G. Allen Family Foundation, and three Google Faculty Research Awards.
Dedicated to its motto, Ut Prosim (That I May Serve), Virginia Tech takes a hands-on, engaging approach to education, preparing scholars to be leaders in their fields and communities. As the commonwealth’s most comprehensive university and its leading research institution, Virginia Tech offers 240 undergraduate and graduate degree programs to more than 31,000 students and manages a research portfolio of $513 million. The university fulfills its land-grant mission of transforming knowledge to practice through technological leadership and by fueling economic growth and job creation locally, regionally, and across Virginia.