机器学习与数据科学博士生系列论坛(第四十三期)—— Two-sided Matching Markets with Stochastic Multi-armed Bandits

摘要:
The model of two-sided matching markets has a wide range of applications including labor employment, house allocation, and college admission. It has been studied for several decades since the seminal work of Gale and Shapley, 1962. A key requirement of any solution to the matching problem is stability, which ensures the market participants have no incentive to abandon the current partner. Conventional economic literature usually assumes the full preferences of players are known a priori, which is not realistic in many real-world applications.
To incorporate the uncertainty of players' preferences, recent works consider a framework of multi-player multi-armed bandit problem, where the players and the arms form the two sides of the market, and each side has preferences over the other side, which may be unknown to the agents and need to be learned through iterative interactions.
In this talk, we would first introduce the definition of stable matching and the classic Deferred Acceptance algorithm. The connection of two-sided matching and the multi-armed bandit problem is then presented and the notion of stable regret will be introduced. Finally, we would investigate several recent works which develop practical algorithms for online matching markets under either centralized or decentralized settings.