This paper proposes an optimal policy learning method for multi-player poker games based on Actor-Critic reinforcement learning. In multi-player games, however, it remains unclear whether playing an equilibrium policy is wise, and it is infeasible to theoretically verify whether a given policy is optimal. Therefore, designing an effective method for learning optimal policies has greater practical significance.
Poker has long been considered a challenging problem in both artificial intelligence and game theory because it is characterized by imperfect information and uncertainty, properties shared by many real-world problems such as auctions, pricing, cybersecurity, and operations.