Multi-agent systems in artificial intelligence and their interpretation in terms of game theory


A multi-agent system (also called a multiple-agent system) is an operating environment in which two or more rational agents interact.


The scope of multi-agent systems is increasingly linked to the application area of artificial intelligence (AI). Consequently, the question of how to handle and govern situations in which different artificial intelligence systems operate in the same environment becomes relevant. Three examples: a) autonomous trading, in which various independent computational systems – generally operating on behalf of different organizations – autonomously decide the volumes to buy or sell within a certain market; b) autonomous cars that, on behalf of different users, travel the same roads to reach their destinations safely and efficiently; c) autonomous robots used in logistics to move goods inside a warehouse.

The question thus arises of how a set of autonomous AI agents can operate simultaneously in the same context (physical or virtual). Undesirable interactions can occur: for example, when autonomous cars circulate simultaneously in a city, each of them will try to occupy the common space and to reach its destination along the same roads according to a criterion of safety and efficiency.

To simplify the analysis, we assume that the multi-agent system consists of only two autonomous AI agents, which we indicate as A and B.

Regarding the nature of the interactions between the two AI agents, we typically consider two strategies: “Cooperate” and “Don’t Cooperate”. If neither agent cooperates, as in the autonomous-car example, an inefficient result is generated: traffic jams (transaction costs in terms of lost time), accidents, overcrowding of common areas, and so on.

On a two-dimensional map, we assume that two robot agents move in pursuit of the same goal. We assume that the two autonomous agents are rational in the sense of neoclassical economic theory, that is, they pursue the aim of optimizing their objective function (maximizing utility, maximizing profits, minimizing costs, and so on).

Context factors, some of which are institutional and/or established by the legislator/regulator, are decisive. Typically, there is a privacy obligation (GDPR, Regulation (EU) 2016/679): the two autonomous AI agents cannot reveal their data, that is, they cannot communicate. It follows that they cannot agree on a joint strategy which would lead to the optimal result (i.e. the best for both, which has the characteristic of being Pareto-efficient).

Through a cross-fertilization between disciplines, we find ourselves in a classic “prisoner’s dilemma” situation, in the context of Game Theory. Following this methodology, the two rational and autonomous AI agents are called players A and B, and the results obtained by each – indicated with a numerical value, generally a monetary value or a level of utility (in cardinal utility theory) – are called payoffs. The type of interaction between them generates a specific pair of payoffs.

We also assume that the two players make their moves – that is, interact – simultaneously (“one-shot game”) and cannot communicate with each other.

The story behind the “prisoner’s dilemma”, with its associated payoffs, is well known. The police do not have enough evidence to convict two suspects of a crime they have committed; therefore – holding them in two separate cells so that they cannot communicate – the police offer each of them the following alternative strategies (where the payoffs are years in prison):

The two strategies are therefore: C = Confess, which is the non-cooperative strategy towards the accomplice; NC = Do not confess, which is the cooperative strategy between the two inmates.

The game is represented in normal or strategic form through a matrix (2 × 2) whose number of rows and columns is given by the number of strategies available to the player. In each cell, the first payoff refers to player A, the second to B.

For each of the two, the aim is to minimize his own sentence: this is the optimizing strategy. Since neither prisoner knows which strategy the accomplice will choose, confessing is the dominant strategy, and the rational outcome of the game for both is Confess (C, C). Both will be sentenced to 5 years in prison.

Game Theory predicts that there is only one equilibrium: the one in which the two accomplices do not cooperate with each other, and therefore both confess (C, C). Since the payoff pair resulting from their interaction is (5, 5), the solution is inefficient, albeit rational from the point of view of each player.

In fact, both would have been better off adopting the cooperative strategy of not confessing (NC, NC): they would have served only 1 year in prison, on the lesser firearms charge.
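The logic above can be verified with a minimal sketch in code. The diagonal payoffs (5, 5) and (1, 1) come from the text; the off-diagonal values (0 and 10 years) are the usual textbook assumption, not stated in the article.

```python
# Prisoner's dilemma in normal form. Payoffs are years in prison
# (lower is better), indexed by (A's strategy, B's strategy).
# C = Confess (non-cooperative), NC = Not confess (cooperative).
# Off-diagonal payoffs 0 and 10 are assumed textbook values.
PAYOFFS = {
    ("C", "C"): (5, 5),
    ("C", "NC"): (0, 10),
    ("NC", "C"): (10, 0),
    ("NC", "NC"): (1, 1),
}
STRATEGIES = ("C", "NC")

def best_response_a(b):
    """A's strategy that minimizes A's sentence, given B's choice."""
    return min(STRATEGIES, key=lambda a: PAYOFFS[(a, b)][0])

def best_response_b(a):
    """B's strategy that minimizes B's sentence, given A's choice."""
    return min(STRATEGIES, key=lambda b: PAYOFFS[(a, b)][1])

def nash_equilibria():
    """Pure-strategy profiles where each strategy is a best response."""
    return [(a, b) for a in STRATEGIES for b in STRATEGIES
            if a == best_response_a(b) and b == best_response_b(a)]

print(nash_equilibria())  # [('C', 'C')] — the unique, inefficient equilibrium
```

Whatever the accomplice does, confessing yields a shorter sentence, so (C, C) is the only equilibrium even though (NC, NC) would leave both better off.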

From the point of view of the design and implementation of AI systems, the simplest – but also the most inefficient – solution would be to leave the interactions between these systems uncoordinated and ungoverned, with obvious consequences for reliability and performance (Amigoni, 2020). That is, to adopt a non-cooperative strategy.

Forms of coordination that lead to cooperation between AI systems operating in the same environment therefore appear necessary (Amigoni, 2020).

To achieve this, additional techniques are introduced, known in AI multi-agent systems as multi-agent path planning or multi-agent path finding.

Among the numerous approaches proposed to tackle multi-agent path finding, an exogenous mechanism is used below: the introduction of “social conventions”, that is, coordination “rules”.

In the case of autonomous cars, the problem is therefore to plan routes for all agents so that, when the autonomous cars follow these routes, all reach their destinations from their initial positions without collisions, while a certain objective function is optimized, such as using the shortest route (Amigoni, 2020).
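A common way to make this concrete is prioritized planning, where a priority ordering acts as the “social convention”: the higher-priority agent plans its route first, and the other must plan around it. The following sketch illustrates the idea on a small grid; the grid size, agent positions, and goals are illustrative assumptions, not taken from the article.

```python
# Prioritized multi-agent path finding on a grid (illustrative sketch).
# The priority ordering is the "social convention": agent A plans first,
# agent B must avoid A's reserved cells over time.
from collections import deque

def bfs(start, goal, free, reserved=None, max_t=50):
    """Shortest path by BFS over (cell, time) states.

    `reserved` maps time step -> set of cells occupied by
    higher-priority agents at that step. Both vertex conflicts
    (same cell, same time) and swap conflicts are forbidden.
    """
    reserved = reserved or {}
    frontier = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while frontier:
        cell, t, path = frontier.popleft()
        if cell == goal:
            return path
        if t >= max_t:
            continue
        x, y = cell
        for nxt in [(x+1, y), (x-1, y), (x, y+1), (x, y-1), (x, y)]:
            if nxt not in free or (nxt, t + 1) in seen:
                continue
            if nxt in reserved.get(t + 1, set()):      # vertex conflict
                continue
            if (nxt in reserved.get(t, set())
                    and cell in reserved.get(t + 1, set())):  # swap conflict
                continue
            seen.add((nxt, t + 1))
            frontier.append((nxt, t + 1, path + [nxt]))
    return None

# A 4x4 open grid; the two agents want to swap corners.
free = {(x, y) for x in range(4) for y in range(4)}

# Agent A has priority: plan it first, ignoring B.
path_a = bfs((0, 0), (3, 0), free)
# Reserve A's cells over time, then plan B around them.
reserved = {t: {cell} for t, cell in enumerate(path_a)}
path_b = bfs((3, 0), (0, 0), free, reserved)

print(path_a)  # A goes straight along the bottom row
print(path_b)  # B detours (or waits) to avoid A
```

Here B pays the coordination cost (a longer route), just as the lower-priority driver waits at an intersection; the convention trades a small individual cost for collision-free routes overall.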

Here too, the introduction of rules of behavior and their outcome can be analyzed using Game Theory. The game is again represented in normal or strategic form through a (2 × 2) matrix.

According to the rules, i.e. the Highway Code, the player coming from the right has the right of way: in our example, player B.

The strategies available to each player are: F = Stop; P = Pass.

The convention, if each player/driver respects the rules of the road, is that the agent coming from the right passes first. The resulting payoff pair is accordingly (-2, 0): A stops and bears a small waiting cost, while B passes.
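This intersection game has two pure-strategy equilibria, and the role of the convention is to select one of them. A small sketch makes this explicit: the payoff pair (-2, 0) for “A stops, B passes” comes from the article, while the other entries (a crash at -10 each, a mutual standstill at -3 each) are illustrative assumptions chosen only to preserve the game’s structure.

```python
# Intersection game in normal form. F = Stop, P = Pass; payoffs
# are (A, B), higher is better. Only the (-2, 0) entry is from
# the article; the other values are illustrative assumptions.
INTERSECTION = {
    ("F", "F"): (-3, -3),    # both stop: deadlock (assumed)
    ("F", "P"): (-2, 0),     # A stops, B passes (from the article)
    ("P", "F"): (0, -2),     # mirror case (assumed symmetric)
    ("P", "P"): (-10, -10),  # both pass: collision (assumed)
}
STRATEGIES = ("F", "P")

def is_equilibrium(a, b):
    """No player can gain by unilaterally switching strategy."""
    ua, ub = INTERSECTION[(a, b)]
    best_a = max(INTERSECTION[(x, b)][0] for x in STRATEGIES)
    best_b = max(INTERSECTION[(a, y)][1] for y in STRATEGIES)
    return ua == best_a and ub == best_b

equilibria = [(a, b) for a in STRATEGIES for b in STRATEGIES
              if is_equilibrium(a, b)]
print(equilibria)  # two pure equilibria: [('F', 'P'), ('P', 'F')]

# The convention "priority to the right" (B comes from the right)
# selects exactly one of the two equilibria:
selected = ("F", "P")
assert selected in equilibria
```

Unlike the prisoner’s dilemma, this game has more than one equilibrium, so the problem is not cooperation but coordination: the Highway Code’s rule does the selecting that the players, unable to communicate, cannot do themselves.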

It is interesting to note that the coordination mechanism studied in Game Theory explains the rationale (philosophical, economic, and evolutionary) for the birth of social institutions and, today – with the pervasiveness of artificial intelligence in our daily lives – also provides one of the solutions to multi-agent path planning in the increasingly widespread context of AI multi-agent systems.