Machine Learning and Data Science PhD Student Forum Series (Session 46): Markov Persuasion Processes and Efficient Reinforcement Learning

Abstract:
In today's economy, it has become important for Internet platforms to consider the sequential information design problem in order to align their long-term interests with the incentives of platform users. To study this problem in a sequential decision-making setting, a dynamic model of Bayesian persuasion called the Markov persuasion process is introduced. In this model, an informed sender observes an external parameter of the world and advises an uninformed receiver about which actions to take over time. The sender cannot take actions directly, so he exploits his information advantage, disclosing information selectively to persuade the receiver to act in the way that benefits the sender most.

In this talk, we first consider the computational problem of designing an optimal signaling scheme for the sender, which is tractable when the receiver is myopic but intractable when the receiver is far-sighted. We then consider the online version of the problem, learning an optimal scheme when the prior distribution of the external parameter, the sender's utility function, and the transition dynamics are unknown to the sender. Applying the optimism-pessimism principle, we introduce an algorithm that designs robust signaling strategies while preserving sublinear regret.
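
To give a feel for what a signaling scheme is, the sketch below solves the textbook one-step Bayesian persuasion problem with a binary state and a binary action. It is not the algorithm presented in the talk: it has no transition dynamics and no learning, and the function name, the threshold rule for the receiver, and the numbers in the example are all assumptions made for illustration. The sender always recommends acting in the favorable state and recommends acting in the unfavorable state with probability q, chosen as large as possible while the receiver still finds it worthwhile to follow the recommendation.

```python
from fractions import Fraction


def optimal_binary_scheme(prior_good, threshold):
    """One-step persuasion with two states and two actions (illustrative only).

    prior_good: prior probability of the state in which the receiver wants to act.
    threshold:  the receiver acts iff the posterior of that state is >= threshold.

    Returns (q, value), where q is the probability of recommending "act" in the
    unfavorable state and value is the total probability that the receiver acts.
    """
    if prior_good >= threshold:
        # The receiver already acts under the prior, so always recommend "act".
        return Fraction(1), Fraction(1)
    # Obedience constraint, made tight:
    #   prior_good / (prior_good + (1 - prior_good) * q) = threshold
    q = prior_good * (1 - threshold) / ((1 - prior_good) * threshold)
    value = prior_good + (1 - prior_good) * q
    return q, value


# Hypothetical numbers: the favorable state has prior 3/10, and the receiver
# acts only if the posterior is at least 1/2.
q, value = optimal_binary_scheme(Fraction(3, 10), Fraction(1, 2))
print(q, value)
```

With these hypothetical numbers the receiver would never act under the prior alone, yet a carefully randomized recommendation makes him act with probability twice the prior of the favorable state. A Markov persuasion process repeats this kind of interaction over time, with the state evolving in response to the receiver's actions.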