The GVG-AI Framework


Sample Controllers

This page contains information about the sample controllers included with the framework.

Sample Random Controller:
This is, as simple as it gets, a completely functional controller for GVG-AI. It simply returns a random action to be executed in the next game cycle:

package random; //The package name is the same as the username in the web.

import java.util.ArrayList;
import java.util.Random;

import core.game.StateObservation;
import core.player.AbstractPlayer;
import ontology.Types;
import tools.ElapsedCpuTimer;

public class Agent extends AbstractPlayer {

    protected Random randomGenerator;

    //Constructor. It must return in 1 second maximum.
    public Agent(StateObservation so, ElapsedCpuTimer elapsedTimer)
    {
        randomGenerator = new Random();
    }

    //Act function. Called every game step, it must return an action in 40 ms maximum.
    public Types.ACTIONS act(StateObservation stateObs, ElapsedCpuTimer elapsedTimer) {

        //Get the available actions in this game.
        ArrayList<Types.ACTIONS> actions = stateObs.getAvailableActions();

        //Determine an index randomly and get the action to return.
        int index = randomGenerator.nextInt(actions.size());
        Types.ACTIONS action = actions.get(index);

        //Return the action.
        return action;
    }
}
Sample One Step Look-Ahead Controller:

The Sample One Step Look-Ahead controller implements a simple agent that evaluates the states reached within one move from the current state. The controller tries all available actions in the current state (via a call to advance), and evaluates the states found after applying each one of these actions. The action that led to the state with the highest reward is the one that will be executed. From Agent.java:

package sampleonesteplookahead;
public class Agent extends AbstractPlayer {

    public Agent(StateObservation stateObs, ElapsedCpuTimer elapsedTimer) {}

    public Types.ACTIONS act(StateObservation stateObs, ElapsedCpuTimer elapsedTimer) {
        Types.ACTIONS bestAction = null;
        double maxQ = Double.NEGATIVE_INFINITY; //Variable to store the max reward (Q) found.
        SimpleStateHeuristic heuristic =  new SimpleStateHeuristic(stateObs);
        for (Types.ACTIONS action : stateObs.getAvailableActions()) { //For all available actions.
            StateObservation stCopy = stateObs.copy();  //Copy the original state (to apply action from it)
            stCopy.advance(action);                     //Apply the action. Object 'stCopy' holds the next state.
            double Q = heuristic.evaluateState(stCopy); //Get the reward for this state.

            //Keep the action with the highest reward.
            if (Q > maxQ) { 
                maxQ = Q;
                bestAction = action;
            }
        }
        //Return the best action found.
        return bestAction;
    }
}

The state evaluation is performed by the class SimpleStateHeuristic when its method evaluateState is called. The following code shows how this method (from SimpleStateHeuristic.java) evaluates the given state. Note that it uses some of the methods described in the Forward Model, querying for the positions of other sprites in the game.

public double evaluateState(StateObservation stateObs) {

    //Position of the avatar:
    Vector2d avatarPosition = stateObs.getAvatarPosition();                                   
    //Positions of all NPCs in the game:
    ArrayList<Observation>[] npcPositions = stateObs.getNPCPositions(avatarPosition);
    //Positions of all portals in the game:
    ArrayList<Observation>[] portalPositions = stateObs.getPortalsPositions(avatarPosition);

    //First, evaluate win/lose condition, which takes preference.
    double won = 0;
    if (stateObs.getGameWinner() == Types.WINNER.PLAYER_WINS) {
        won = 1000000000;
    } else if (stateObs.getGameWinner() == Types.WINNER.PLAYER_LOSES) {
        return -999999999;
    }

    //Count how many NPCs are in the game, and keep the distance to the closest one.
    double minDistance = Double.POSITIVE_INFINITY;
    int npcCounter = 0;
    if (npcPositions != null) {
        //Each one of the arrays from 'npcPositions' corresponds to a different type of NPC.
        for (ArrayList<Observation> npcs : npcPositions) {
            if(npcs.size() > 0)
            {
                minDistance = npcs.get(0).sqDist;   //This is the (square) distance to the closest NPC.
                npcCounter += npcs.size();
            }
        }
    }

    //If there are no portals, return a score based on the game score plus the information from the NPCs.
    if (portalPositions == null) {
        double score = 0;
        if (npcCounter == 0) {
            score = stateObs.getGameScore() + won*100000000;
        } else {
            score = -minDistance / 100.0 + (-npcCounter) * 100.0 + stateObs.getGameScore() + won*100000000;
        }

        return score;
    }

    //We have portals. Get the number of portals and distance to the closest one.
    double minDistancePortal = Double.POSITIVE_INFINITY;
    Vector2d minObjectPortal = null;
    for (ArrayList<Observation> portals : portalPositions) {
        if(portals.size() > 0)
        {
            minObjectPortal   =  portals.get(0).position; //This is the closest portal
            minDistancePortal =  portals.get(0).sqDist;   //This is the (square) distance to the closest portal
        }
    }

    //Return the reward of the state based on the score of the game and the portals information.
    double score = 0;
    if (minObjectPortal == null) {
        score = stateObs.getGameScore() + won*100000000;
    }
    else {
        score = stateObs.getGameScore() + won*1000000 - minDistancePortal * 10.0;
    }

    return score;
}
Sample Genetic Algorithm Controller:

The sample GA controller implements an online (rolling horizon) genetic algorithm to decide the next move to make. At every game step, a (new) small population of individuals is evolved during the 40 ms available. Each individual represents a sequence of actions, and its fitness is calculated with a heuristic that evaluates the state reached after applying all these actions. The move returned is the first action of the best individual found.

The complete code for this agent is contained in a single class: Agent.java (although it uses a heuristic class, also in the framework: WinScoreHeuristic.java). Here, we highlight some interesting parts of this agent.

First, the function that evaluates an individual, from Agent.java:

/**
 * StateObservation state: state of the game that can be copied multiple times and advanced with a given action.
 * WinScoreHeuristic heuristic: heuristic used to evaluate a given state of the game.
 * int[] policy: genome of an individual of the GA; each value is an action.
 */
private double simulate(StateObservation state, WinScoreHeuristic heuristic, int[] policy) throws TimeoutException {

    //First, check that we have not run out of time. We set 35 ms as maximum time, to avoid overtiming and being
    //disqualified. The exception is caught in the caller of this function, to deal with the end of the algorithm.
    long remaining = timer.remainingTimeMillis();
    if (remaining < 35) 
        throw new TimeoutException("Timeout");

    //Create a copy of the current state to simulate actions on it.
    state = state.copy();
    int depth = 0;
    for (; depth < policy.length; depth++) { //Go through every action in the individual.

        //Get the action according to the value in the genome 
        //action_mapping is a class variable that maps integers to values of type Types.ACTIONS.
        Types.ACTIONS action = action_mapping.get(policy[depth]);

        //Advance the state with the next action.
        state.advance(action);

        if (state.isGameOver()) break; //If the game is over, no need to keep applying actions of this individual.
    }

    //Calculate the score of the state reached after applying all actions of the individual, using the heuristic.
    double score = Math.pow(0.9, depth) * heuristic.evaluateState(state);
    return score;
}
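
The evolutionary loop itself lives in the act function of Agent.java and is not reproduced on this page. Purely as a hedged illustration of the rolling horizon scheme described above, such a loop could be organised along the following lines (this is a sketch, not the framework's code: the constants POP_SIZE and SIM_DEPTH and the helpers randomGenome, selectBest and nextGeneration are assumptions made for this example):

public Types.ACTIONS act(StateObservation stateObs, ElapsedCpuTimer elapsedTimer) {

    timer = elapsedTimer;                                  //Class variable also used by simulate().
    WinScoreHeuristic heuristic = new WinScoreHeuristic(stateObs);

    //Create a small random population of action sequences (POP_SIZE and SIM_DEPTH are illustrative).
    int[][] population = new int[POP_SIZE][];
    double[] fitness = new double[POP_SIZE];
    for (int i = 0; i < POP_SIZE; i++)
        population[i] = randomGenome(SIM_DEPTH);           //Hypothetical helper: random genome of action indices.

    int[] best = population[0];
    try {
        while (true) {                                     //Evolve until simulate() throws TimeoutException.
            for (int i = 0; i < POP_SIZE; i++)
                fitness[i] = simulate(stateObs, heuristic, population[i]);

            best = selectBest(population, fitness);        //Hypothetical helper: fittest individual so far.
            population = nextGeneration(best, population); //Hypothetical helper: next generation around it.
        }
    } catch (TimeoutException e) {
        //Time is (almost) up: fall through and return the first action of the best individual found.
    }

    return action_mapping.get(best[0]);
}

The actual agent differs in its details, but the control flow (evaluate individuals with simulate, keep the best, stop on timeout, return its first action) follows the description above.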

Here is the code of the heuristic that evaluates a given state, used at the end of the function described above, from WinScoreHeuristic.java. If the game was won (resp. lost), it returns a huge positive (resp. negative) number as fitness. Otherwise, it returns the score of the game at that state.

public class WinScoreHeuristic extends StateHeuristic {

    public WinScoreHeuristic(StateObservation stateObs) {}

    //The StateObservation stateObs received is the state of the game to be evaluated.
    public double evaluateState(StateObservation stateObs) {
        boolean gameOver = stateObs.isGameOver();       //true if the game has finished.
        Types.WINNER win = stateObs.getGameWinner();    //player loses, wins, or no winner yet.

        if(gameOver && win == Types.WINNER.PLAYER_WINS)       return 10000000.0;  //We won, this is good.
        if(gameOver && win == Types.WINNER.PLAYER_LOSES)      return -10000000.0; //We lost, this is bad.

        return stateObs.getGameScore(); //Neither won nor lost, let's get the score and use it as fitness.
    }
}

Sample MCTS Controller:

The sample MCTS controller is programmed in three different classes:

  1. Agent.java: Main class of the agent, which extends the class AbstractPlayer. It implements the constructor and the act function.
  2. SingleMCTSPlayer.java: Implements MCTS and holds a reference to the root node, an object of class SingleTreeNode.
  3. SingleTreeNode.java: Implements each MCTS node, with the typical MCTS operations (uct, expand, rollout, backpropagation, etc.); a sketch of the UCT step is shown below.
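
The code of these classes is not listed on this page. Purely as an illustration of the UCT step implemented in SingleTreeNode.java, selecting a child with the UCB1 formula could look roughly like the sketch below (the fields children, nVisits and totValue, and the constants K and EPSILON, are assumptions for this example rather than the framework's exact API):

//Sketch of UCT child selection (UCB1). Field and constant names are illustrative.
public SingleTreeNode uct() {
    SingleTreeNode selected = null;
    double bestValue = Double.NEGATIVE_INFINITY;

    for (SingleTreeNode child : this.children) {
        double exploit = child.totValue / (child.nVisits + EPSILON);  //Average reward of the child.
        double explore = K * Math.sqrt(Math.log(this.nVisits + 1) / (child.nVisits + EPSILON));
        double uctValue = exploit + explore;                          //UCB1 = exploitation + exploration.

        if (uctValue > bestValue) {
            bestValue = uctValue;
            selected = child;
        }
    }
    return selected;
}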

Analogously, Sample OLMCTS also implements a Monte Carlo Tree Search controller, but in an open loop fashion. This means that the states are not stored in the tree, which behaves better in non-deterministic scenarios, as the sketch below illustrates. The classes are analogous to those of the Sample MCTS controller.
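
To make the open loop idea concrete, one iteration could rebuild the state as in the short sketch below, instead of reading a stored StateObservation from the node (the members isLeaf, uct and actionFromParent are assumptions for this illustration):

//Sketch of one open loop iteration: the tree stores actions, not states. The state is
//rebuilt from a fresh copy every iteration, so non-deterministic effects are re-sampled.
StateObservation state = stateObs.copy();    //Copy of the current (real) game state.
SingleTreeNode node = rootNode;
while (!node.isLeaf() && !state.isGameOver()) {
    node = node.uct();                       //Descend the tree with UCT...
    state.advance(node.actionFromParent);    //...re-applying the stored action to the copied state.
}
//From here on, expansion, rollout (from 'state') and backpropagation proceed as in Sample MCTS.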


