AI bluffs the best
An artificially intelligent system has trounced the world’s top poker players for the first time.
In a significant milestone in the rise of AI, a system called Libratus has racked up more than $US1.7 million ($2.2 million) worth of chips against four top professional poker players in a 20-day marathon poker tournament in the US.
Machines have already beaten human masters in chess, checkers, and most recently in the ancient game of Go, but poker is a different game.
Texas Hold ‘Em poker is an imperfect information game – betting strategies play out over dozens of hands, and involve guessing, operating blind and bluffing.
“The best AI's ability to do strategic reasoning with imperfect information has now surpassed that of the best humans,” said Tuomas Sandholm, professor of computer science at Carnegie Mellon University.
Dr Sandholm created Libratus in collaboration with PhD student Noam Brown.
“The computer can't win at poker if it can't bluff,” said Frank Pfenning, head of computer science at CMU.
“Developing an AI that can do that successfully is a tremendous step forward scientifically and has numerous applications.
“Imagine that your smartphone will someday be able to negotiate the best price on a new car for you. That's just the beginning,” he said.
Libratus used three different systems working together, relying largely on a form of AI known as reinforcement learning – essentially extreme trial-and-error.
Libratus was able to learn poker from scratch, playing millions of games against itself at an almost constant speed.
It then applied an algorithm called counterfactual regret minimisation, playing at random before going back over literally trillions of hands of poker to work out how it could have done better.
This allowed the system to challenge humans in ways they had never seen – playing a much wider range of bets than normal and randomising its bets to stop rivals from guessing what cards it held.
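For readers who want to see the regret idea in miniature, the sketch below applies regret matching, the update rule at the core of counterfactual regret minimisation, to rock-paper-scissors rather than poker. It is an illustration under stated assumptions, not Libratus's code: the game, the function names, the starting bias and the iteration count are all chosen here for the example.

```python
# Illustrative sketch only: regret matching in self-play on rock-paper-scissors.
# Libratus applied counterfactual regret minimisation to poker; this toy game
# and every name below are assumptions made for the example.

# Payoff to a player choosing the row action against the column action
# (order: rock, paper, scissors). The game is symmetric, so both players
# can read their payoffs from the same table.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action is regretted yet, so mix uniformly.
    return [1.0 / len(regrets)] * len(regrets)


def action_values(opponent_strategy):
    """Expected payoff of each action against the opponent's current mix."""
    return [
        sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(3))
        for a in range(3)
    ]


def self_play(iterations=20000):
    """Return each player's average strategy after repeated self-play."""
    # Player 0 starts with an arbitrary bias toward rock so that play
    # does not begin at the balanced solution.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(3):
                # Regret: how much better action a would have done
                # than the mix that was actually played.
                regrets[player][a] += values[a] - expected
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]
```

Calling self_play() returns the two players' average strategies, which should settle close to one-third for each action, the unexploitable mix for this game. The same principle, looking back at how much better each alternative would have done and shifting play toward it, is what the poker version scales up to trillions of hands.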
During the actual competition, a second system analysed the state of play and focused the attention of the earlier systems.
This allowed Libratus to come up with an “end-game solver” – detailed in a new research paper by Sandholm and Brown – that meant it did not have to go back over all the possible scenarios it had seen in the past, but could run through just the important ones.
The third system was designed specifically to stop human players from finding patterns in the machine’s play.
The algorithm identified any patterns formed over a day’s practice in the lab and removed them.
By constantly randomising and obscuring its processes, the computer learned to bluff.
It is interesting to wonder what might happen if this superhuman bluffing ability were applied to financial trading or international diplomacy.
Libratus is an upgrade on a previous AI system named Claudico.
In a 2015 battle against Claudico, human players racked up more than $US700,000 ($923,000) over 80,000 hands, winning almost every day of competition.
In the 2017 match with Libratus, the four human players won just five out of 20 days.