An article posted on ExtremeTech is interesting, not only for its implications for unmanned systems, but also as an example of how a technology developed for one purpose can be used for something else entirely.
As noted in this blog, aging populations worldwide have generated interest in “social welfare” unmanned systems. These robots will assist the elderly with bathing, taking medications, cooking, and other Activities of Daily Living (ADL).
Programming every single ADL step is tedious, so the obvious option is to have the robots “learn” them. As described in the article below, artificial intelligence currently needs large data sets that have been labeled or otherwise processed in order to “infer functions,” i.e., learn.
The folks behind the RoboWatch project claim that they can enable robot learning by having the machines watch numerous YouTube instructional videos. Presumably, no labeling or processing is required.
ExtremeTech points out that this development could be combined with work on “real-time video summarization,” a technique designed to automate surveillance by detecting behaviors deemed suspicious. RoboWatch analyzes videos for the universal steps essential to a process, while “video summarization” identifies anomalous actions correlated with criminal behavior. It is easy to see how the two methods could work together.
In other words, a method for teaching robots how to cook eggs could one day be used to identify terrorist suspects. This example shows just how difficult it is to predict the impact of innovations.
What happens if RoboWatch’s learning method is applied to non-instructional videos? What if robots try to “learn” by watching action movies? Sexually explicit films? Political debates? People have trouble distinguishing reality from the fiction they see on television. How will this affect robots?
Will the future ushered in by advanced robot learning techniques be a dangerous one? I have no idea, but I have a feeling things are going to get weird.
ExtremeTech article below:
Astute followers of artificial intelligence may recall a moment from three years ago, when Google announced it had birthed unto the world a computer able to recognize cats using only videos uploaded by YouTube users. At the time, this represented something of a high-water mark in AI. To get an idea of how far we have come since then, one has only to reflect on recent advances in the RoboWatch project, an endeavor that is teaching computers to learn complex tasks using instructional videos posted on YouTube.
That innocent “learn to play guitar” clip you posted on your YouTube video feed last week? It may someday contribute to putting Carlos Santana out of a job. That’s probably pushing it; it’s more likely that thousands of home nurses and domestic staff will be axed long before guitar gods have to compete with robots. A recent groundswell of interest in bringing robots into the marketplace as caregivers for the elderly and infirm, fueled in part by graying populations throughout the developed world, has created a need to teach robots simple household tasks. Enter the RoboWatch project.
Most advanced forms of AI currently in use rely on a branch of machine learning called supervised learning, which requires large datasets to be “trained” on. The basic idea is that, when provided with a sufficiently large database of labeled examples, the computer can learn what differentiates the items within the training set, and later apply that classifying ability to new instances it encounters. The one drawback of this form of artificial intelligence is that it requires large databases of labeled examples, which are not always available or demand substantial human curation to create.
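For readers who want to see that workflow concretely, here is a minimal sketch of supervised learning. It is not from the article; it assumes Python with scikit-learn, and its bundled digits dataset simply stands in for any curated, labeled corpus:

```python
# Minimal supervised-learning sketch: fit a classifier on labeled
# examples, then apply it to instances it has never seen. scikit-learn's
# bundled digits dataset stands in for any curated, labeled corpus.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()  # ~1,800 images of handwritten digits, each labeled 0-9
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# "Training" means fitting a function that maps features to known labels.
model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)

# The classifier generalizes to unseen examples -- but only because a
# human labeled every training example first. That labeling effort is
# exactly the bottleneck described above.
print(f"accuracy on unseen digits: {model.score(X_test, y_test):.2f}")
```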
RoboWatch is taking a different tack, using what’s called unsupervised learning to discover the important steps in YouTube instructional videos without any previous labeling of data. Take, for instance, a YouTube video on omelet making. Using the RoboWatch method, the computer successfully parsed the video on omelet creation and cataloged the important steps without having first been trained on labeled examples.
[Figure: color-coded activity steps and automatically generated captions, all created by the RoboWatch algorithm for making an omelet.]
It was able to do this by looking at a large number of instructional omelet-making videos on YouTube and creating a universal storyline from their audio and video signals. As it turns out, most of these videos contain certain identical steps, such as cracking the eggs, whisking them in a bowl, and so on. When presented with enough footage, the RoboWatch algorithm can tease out which parts of the process are essential and which are arbitrary, creating a kind of archetypal omelet formula. It’s easy to see how unsupervised learning could quickly enable a robot to gain a vast assortment of practical household know-how while keeping human instruction to a minimum.
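It takes surprisingly little machinery to illustrate that intuition. The toy sketch below is not RoboWatch’s actual algorithm; it uses synthetic feature vectors as stand-ins for real audio and video descriptors, and simple k-means clustering in place of the project’s method. The idea is the same: pool unlabeled segments from many videos of one task, cluster them, and keep the clusters that recur across most videos at a consistent point in the timeline.

```python
# Toy illustration of RoboWatch-style step discovery (not the project's
# actual algorithm): cluster unlabeled segments pooled from many videos
# of one task, then keep clusters that recur across most videos at a
# consistent point in the timeline -- the "universal" steps.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
N_VIDEOS, SEGS_PER_VIDEO, N_STEPS, DIM = 20, 12, 5, 16

# Synthetic stand-in for audio/visual features: every video shows the
# same 5 steps in order, with noise and ~20% arbitrary filler segments.
prototypes = rng.normal(size=(N_STEPS, DIM))
features, video_id, position = [], [], []
for v in range(N_VIDEOS):
    for s in range(SEGS_PER_VIDEO):
        step = min(s * N_STEPS // SEGS_PER_VIDEO, N_STEPS - 1)
        base = prototypes[step] if rng.random() > 0.2 else rng.normal(size=DIM)
        features.append(base + 0.1 * rng.normal(size=DIM))
        video_id.append(v)
        position.append(s / SEGS_PER_VIDEO)  # normalized time within video
features, video_id, position = map(np.asarray, (features, video_id, position))

# Cluster all segments from all videos together, with no labels at all.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(features)

# A cluster counts as a universal step if it appears in at least 80% of
# the videos and happens at a consistent time; sorting by mean position
# recovers the archetypal storyline.
storyline = sorted(
    (position[labels == c].mean(), c)
    for c in np.unique(labels)
    if len(np.unique(video_id[labels == c])) >= 0.8 * N_VIDEOS
    and position[labels == c].std() < 0.15
)
print("universal steps, in storyline order:", [c for _, c in storyline])
```

The filler segments get discarded precisely because they do not recur across videos, which mirrors how the algorithm separates the essential from the arbitrary.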
The RoboWatch project follows similar advances in video captioning pioneered at Carnegie Mellon University. Earlier this year, we reported on a project headed by Dr. Eric Xing, which seeks to use real-time video summarization to detect unusual activity in video feeds. This could lead to surveillance cameras with the built-in ability to detect suspicious activity. Putting these developments together, it’s clear unsupervised learning models using video footage are likely to pave the way for the next breakthrough in artificial intelligence, one that will see robots entering our lives in ways that are likely to both scare and fascinate us.
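To make the surveillance half of that pairing concrete, here is an equally hedged sketch; it is not the CMU group’s method. An IsolationForest, a standard off-the-shelf anomaly detector in scikit-learn, learns the shape of “routine” footage from unlabeled feature vectors and then flags clips that fall outside that pattern:

```python
# Hedged sketch of anomaly detection on a video feed (not the CMU
# group's actual method): learn what "routine" clips look like from
# unlabeled feature vectors, then flag clips outside that pattern.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
routine = rng.normal(loc=0.0, size=(500, 16))  # everyday activity
unusual = rng.normal(loc=4.0, size=(5, 16))    # rare, out-of-pattern behavior

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(routine)  # no labels: the model just learns the data's shape

# In a live system each incoming clip would be scored as it arrives;
# predict() returns +1 for in-pattern clips and -1 for anomalies.
clips = np.vstack([routine[:5], unusual])
for i, flag in enumerate(detector.predict(clips)):
    print(f"clip {i}: {'FLAGGED as suspicious' if flag == -1 else 'routine'}")
```

Note that neither sketch needs a human to label a single frame, which is exactly what makes the combination of these techniques so powerful, and so unsettling.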