“It turns out a big part of getting things done is making a phone call,” intones Sundar Pichai, Google’s CEO, as he introduces Google Duplex.
“We think AI can help with this problem.”
What millennium is Pichai living in that “a big part of getting things done is making a phone call”?
Last time I made an appointment I did it online, not by making a phone call. As Ian Bogost says, writing for The Atlantic, “One of the ironies of modern life is that everyone is glued to their phones, but nobody uses them as phones anymore.”
And no, Google’s AI didn’t just pass the Turing Test! The Turing Test requires an AI to successfully imitate a human in conversation without restriction on the subject matter, while Google’s Assistant imitated a human in a single, very constrained domain: setting up a haircut appointment.
This is not to say that Google’s achievement isn’t an important one for artificial intelligence.
Google’s own blog points to the actual achievements of this system, which don’t include an implementation of artificial general intelligence:
Still, even with today’s state of the art systems, it is often frustrating having to talk to stilted computerized voices that don’t understand natural language. In particular, automated phone systems are still struggling to recognize simple words and commands. They don’t engage in a conversation flow and force the caller to adjust to the system instead of the system adjusting to the caller.
Google Duplex is “a new technology for conducting natural conversations to carry out ‘real world’ tasks over the phone.” It addresses closed domains only, which Google says are “narrow enough to explore extensively…. It cannot carry out general conversations.” So, no passing of the Turing Test, which requires that a human interrogator be fooled by a machine in unconstrained conversation.
The real breakthrough Duplex represents is that it apparently does “understand natural language” in real (though limited) conversations with humans. This feat is difficult for two reasons: (1) recognizing natural human speech is harder than recognizing speech a human deliberately directs at a machine, and (2) understanding natural language requires tracking and analyzing the context of a conversation. The same words and sentences can mean different things depending on the context in which they are embedded.
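To make the context problem concrete, here is a toy sketch, my own illustration and in no way Google’s implementation, of why the very same utterance needs dialogue state to be interpreted (the `interpret` function and the state labels are invented for this example):

```python
# Toy illustration (not Duplex's actual design): the same words mean
# different things depending on the dialogue state tracked so far.

def interpret(utterance: str, state: dict) -> str:
    """Resolve an ambiguous reply using the conversation's context."""
    if utterance == "How about 10?":
        if state.get("topic") == "appointment_time":
            return "proposed time: 10:00"
        if state.get("topic") == "party_size":
            return "proposed size: 10 people"
    return "unclear"

# Same utterance, two contexts, two different meanings.
print(interpret("How about 10?", {"topic": "appointment_time"}))
print(interpret("How about 10?", {"topic": "party_size"}))
```

A system with no memory of the conversation so far has no way to choose between these readings, which is exactly what the old automated phone systems got wrong.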
Duplex goes further than simply understanding what the human says and formulating a response that depends on the state of the conversation. It also uses techniques like inserting “um” to simulate human speech and make the exchange feel more natural to the person on the other end of the line.
I’m impressed, but still, like Gary Marcus, underwhelmed: