A fundamental challenge in imperfect-information games is that states do not have well-defined values. As a result, depth-limited search algorithms used in single-agent settings and perfect-information games do not apply. This paper introduces a principled way to conduct depth-limited solving in imperfect-information games by allowing the opponent to choose among a number of strategies for the remainder of the game at the depth limit. Each one of these strategies results in a different set of values for leaf nodes. This forces an agent to be robust to the different strategies an opponent may employ. We demonstrate the effectiveness of this approach by building a master-level heads-up no-limit Texas hold’em poker AI that defeats two prior top agents using only a 4-core CPU and 16 GB of memory. Developing such a powerful agent would have previously required a supercomputer.


Am I correct in assuming that the blueprint, if it was computed in just 700 hours, was very coarse-grained?

So the depth-limited real-time solving just works well even with this coarse-grained blueprint? Seems super powerful if that's correct.

Have you written about how you guys performed information abstraction in poker?

When doing Monte Carlo rollouts to estimate the values of playing the strategies after a subgame leaf node, are you sampling player strategies or just chance nodes? Does this need to be done for every iteration of CFR?
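For what it's worth, the generic Monte Carlo pattern where you sample both the chance node and the fixed continuation strategies looks like this toy sketch. The probabilities and payoffs below are entirely made up for illustration; whether the actual system samples strategies or averages over them exactly is precisely the question being asked.

```python
import random

def rollout_value(num_samples: int, rng: random.Random) -> float:
    """Monte Carlo estimate of a leaf value in a toy continuation:
    chance deals a high card (p = 0.5) or a low card, then the
    opponent's fixed continuation strategy bets 70% of the time with
    the high card and 20% with the low card; we win 1 when they don't
    bet and lose 1 when they do (made-up payoffs)."""
    total = 0.0
    for _ in range(num_samples):
        high = rng.random() < 0.5           # sample the chance node
        bet_prob = 0.7 if high else 0.2     # fixed continuation strategy
        bets = rng.random() < bet_prob      # sample the strategy too
        total += -1.0 if bets else 1.0
    return total / num_samples

rng = random.Random(0)
est = rollout_value(100_000, rng)
# Exact expected value: 0.5*(0.3 - 0.7) + 0.5*(0.8 - 0.2) = 0.1;
# the estimate converges to it as num_samples grows.
print(est)
```

The alternative to sampling the strategies would be to sample only the chance outcome and then compute the exact expectation over the players' mixed strategies, which trades variance for per-sample cost.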

When the opponent chooses an action that is not in the blueprint, you add it to the subgame. To estimate the values of the strategies after this subgame leaf node, you need to map it to a node in the blueprint. The size of the pot in the new subgame node will be different from the size of the pot in the blueprint node that it is mapped to. When evaluating the values of the strategies, is this new pot size "passed down" and used when calculating the payoffs at the terminal nodes of the blueprint?

For example, the pot is 4 big blinds and the blueprint has a 2-big-blind bet and a 4-big-blind bet. You want to add a 5-big-blind bet to your subgame. You'd map it to the 4-big-blind node in the blueprint, and the strategies after the subgame leaf node would be the same strategies as after the 4-big-blind bet. When calculating the values of the strategies, the 5-big-blind bet may have different values than the 4-big-blind bet, because the pot is larger at the terminal nodes. (Say after the bet there is one more bet of half pot and then showdown. The pot after the 5-big-blind bet would be 4 + 5 + 4.5, whereas the pot after the 4-big-blind bet would be 4 + 4 + 4.)

Edit: actually, I think I understand now. You calculate the values as a percentage of the pot in the blueprint and then multiply that by the size of the pot in the subgame.
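If that reading is right, the scaling is a one-liner. This is a sketch under that assumption, reusing the commenter's 4-big-blind example; `scale_leaf_value` is a hypothetical helper name, and the 0.25 pot-fraction value is made up for illustration.

```python
# Hedged sketch: convert a blueprint leaf value, stored as a fraction
# of the blueprint pot, into big blinds for the subgame's larger pot.

def scale_leaf_value(value_fraction_of_pot: float,
                     subgame_pot: float) -> float:
    """Scale a pot-fraction value from the blueprint to the subgame pot."""
    return value_fraction_of_pot * subgame_pot

# Blueprint node: pot is 4 + 4 + 4 = 12 bb after the 4 bb bet line;
# suppose the stored value there is +0.25 of the pot (i.e. +3 bb).
value_fraction = 0.25

# Subgame node: the off-tree 5 bb bet line makes the pot 4 + 5 + 4.5 = 13.5 bb.
print(scale_leaf_value(value_fraction, 13.5))  # 3.375
```

Storing values as pot fractions rather than absolute chip amounts is what makes the blueprint node reusable across mapped bet sizes in the first place.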

I have watched the long presentation at Microsoft too, but I have some questions.

1) Practically, does depth-limited mean that you solve the preflop and flop but don't calculate the turn and river?

2) Is there any difference between Modicum and DeepStack other than Modicum solving to a depth limit while DeepStack calculates to the end of the game?

3) Is a new strategy for the opponent a new bet size or a new range that he plays?

In poker, how do you estimate the values for bet sizes that aren't in the blueprint strategy? The paper gave a simple and clean example where we had the value vs. a 0.5 pot-size bet and the value vs. a 1 pot-size bet, and it took the average. Do you use linear regression for any size? So if the blueprint strategy has values for 0.25, 0.5, and 1, and we're trying to estimate the value of a 1.5 pot-size bet, do you just use linear regression over all of the sizes we have? Is it important to have many sizes at each node to get accurate estimated values?
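One simple scheme consistent with the paper's averaging example (which is just the halfway case) is piecewise-linear interpolation between the nearest blueprint bet sizes, rather than a regression over all of them. This is a sketch of that idea, not necessarily what the authors do; all the sizes and values are made up.

```python
# Hedged sketch: estimate the value against an off-tree bet size by
# linearly interpolating between the two nearest blueprint bet sizes.
# Sizes outside the known range are clamped to the nearest endpoint;
# extrapolating instead is an alternative design choice.

def interp_value(bet_size: float, sizes: list, values: list) -> float:
    """Piecewise-linear interpolation over sorted blueprint bet sizes."""
    if bet_size <= sizes[0]:
        return values[0]
    if bet_size >= sizes[-1]:
        return values[-1]
    for (s0, v0), (s1, v1) in zip(zip(sizes, values),
                                  zip(sizes[1:], values[1:])):
        if s0 <= bet_size <= s1:
            t = (bet_size - s0) / (s1 - s0)
            return v0 + t * (v1 - v0)

sizes = [0.25, 0.5, 1.0]      # pot-fraction bet sizes in the blueprint
values = [1.0, 1.5, 2.5]      # hypothetical values in big blinds

print(interp_value(0.75, sizes, values))  # 2.0, halfway between 0.5 and 1.0
print(interp_value(1.5, sizes, values))   # 2.5, clamped to the largest size
```

Note that the 1.5 pot-size bet in the question falls outside the blueprint's range, so any scheme has to decide between clamping and extrapolating there, and having more sizes per node shrinks the gaps the interpolation has to bridge.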

Can P2 choose between a mix of his leaf-node policies?

I'd love to see and hear more about Modicum, and about AI in general. The more people know, the more value for your field.

These videos are incredibly helpful towards understanding your work! Thank you very much for that! 🙂

You should have more views. Thanks for this.