Demis Hassabis: So AlphaGo always tries to maximize its probability of winning rather than the size of the winning margin. So whenever it has a decision to make, it will always try to pick what it thinks is the more certain path to victory, with less risk. So often in positions, the tradeoff we see AlphaGo making is between the margin of victory and how certain it is of winning. David, if you want to add anything to that.
David Silver: So… it’s a very interesting question. The way AlphaGo works is, as Demis said, it maximizes the probability of winning the game. This means that we program a goal into AlphaGo. That goal matches what we really want it to do, which is to try to win games of Go. You could imagine other objectives being applied, such as maximizing the margin of victory, but this is not the objective that we chose for AlphaGo. So if you really focus on victory, then it leads to these behaviors where AlphaGo will try to win, and in doing so it may give up a number of points in favor of reducing any risk it perceives, even if that risk seems to be very small.
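The difference between the two objectives described above can be illustrated with a toy calculation. This is only a sketch and is not AlphaGo's code: the `Candidate` class, the move names, and the outcome probabilities are all invented for illustration. It shows how a move with a small but near-certain margin wins under a "maximize win probability" objective, while a riskier move with a larger average margin wins under a "maximize margin" objective.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A hypothetical candidate move with made-up outcome estimates."""
    name: str
    # Each entry is (probability, final margin in points); a margin > 0 is a win.
    outcomes: List[Tuple[float, float]]


def win_probability(c: Candidate) -> float:
    # Total probability mass on outcomes where the final margin is positive.
    return sum(p for p, margin in c.outcomes if margin > 0)


def expected_margin(c: Candidate) -> float:
    # Probability-weighted average of the final margin.
    return sum(p * margin for p, margin in c.outcomes)


# Invented numbers: "safe" gives up points to reduce risk, "greedy" does the opposite.
candidates = [
    Candidate("safe", [(0.95, 1.5), (0.05, -0.5)]),
    Candidate("greedy", [(0.80, 10.5), (0.20, -5.5)]),
]

# The objective the speakers describe: pick the most certain path to victory.
by_win = max(candidates, key=win_probability)
# The alternative objective they mention but did not choose: maximize the margin.
by_margin = max(candidates, key=expected_margin)

print("maximize win probability ->", by_win.name)
print("maximize expected margin ->", by_margin.name)
```

With these invented numbers, the win-probability objective selects the "safe" move even though it wins by far fewer points on average, which is the behavior described in the answers above.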