A Versatile Genetic Algorithm System in Ada: Implementation and Applications
Max Edwards
This section introduces the basic genetic algorithm concepts, which are required in order to understand and use the Ada system presented in the next chapter.
Broadly speaking, genetic algorithms are computer programs that attempt to solve a problem by simulating "natural selection". This is the fundamental mechanism of Charles Darwin's Theory of Evolution1, which can be summarised as follows:
• In nature, a population of individual organisms constantly compete against each other for resources such as food, water, territory, shelter from the elements and predators, potential mates, and protection for their offspring.
• How well an individual competes in this way, that is to say its fitness, directly determines the likelihood of it's children being present in the next generation.
• An individual's ability to compete will be passed on to its children - this was Darwin's fundamental realisation. It is achieved via the organism's chromosomes, tiny packets of DNA along which millions of genes describe in minute detail the precise mechanisms for building an individual.
• Therefore, genes which cause an individual (the phenotype) to perform well in its natural habitat will become progressively more and more common in successive generations, so that the species as a whole will gradually become ideally suited to its habitat.
Genetic algorithms try to take advantage of this process by encoding potential solutions to a problem (the artificial phenotypes) as chromosomes, usually in the form of a string of digits encoding the solution's parameters, and maintaining a population of such chromosomes which are made to compete against each other. The breeding simulation is driven by certain key operators, algorithms which emulate specific aspects of the natural evolution process.
In order to simulate competition between chromosomes, some method of assessing a solution's performance is required, so that a judgement can be made as to which of the current population should be allowed to have offspring in the following generation of individuals. Such an assessment, which in the natural selection analogy plays the part of the unforgiving environment, is essentially a fitness function in the optimisation sense. Therefore genetic algorithms are inherently optimising strategies, and it is only possible to use them to solve problems which can be expressed as a fitness function acting on a potential solution.
1 First presented in Darwin's The Origin of Species, and later explored in fascinating detail in Richard Dawkins' The Selfish Gene.
Having said this, genetic algorithms have been widely used in applications not usually considered to be in the optimisation domain, such as image recognition, game playing, and other machine learning and "artificial intelligence" applications. It must be remembered though that in order for any computer system to learn to do anything competently, be it playing Noughts-and-Crosses or recognising handwriting, it must develop a set of rules defining its behaviour, and it is this fact which enables genetic algorithms to be used in such applications. A GA is able to learn by evolving a good set of rules, and assessing them in its fitness function, by for example using them in a game and seeing how long they are able to keep from losing. It is this re-stating of the problem, in the form of the optimisation model, that allows GAs to be used in a particular application, and unless a method is found of expressing a particular problem in this way there is no hope of using a genetic algorithm to solve it.
It cannot be stressed enough that genetic algorithms are optimisers through-and-through, sub-optimal ones at that, and in any problem where even a crude algorithmic method is to hand a GA will always be a bad choice - however much in vogue they may currently be. In certain tough applications though, where the only methods otherwise available are sub-optimal or heuristic approaches, they can perform with remarkable agility and robustness.
The basic genetic algorithm breeding process is as follows:
1. Initialise the population in some way; usually, a random population is generated, but in more advanced usage the population may be seeded with specific individuals. Each individual is assigned a fitness rating with the problem-specific fitness function.
2. Next, a selection operator is employed, in order to choose which individuals are allowed to place offspring into the next generation. Selection must give precedence to stronger individuals, and "weed out" poorer ones.
3. Then the offspring are generated, by taking a pair of selected parents and combining them in a process called crossover, an operator which creates a pair of children each bearing a resemblance to both parents. These children are passed to another operator which gives them the chance to mutate, i.e. have some of their genes spontaneously altered, which is for two reasons:
• it is a well documented phenomenon in natural crossover, and we wish to emulate the process as much as possible;
• more importantly, it serves the purpose of keeping the "gene pool" diverse, and allowing as yet unheard of solutions to emerge.
4. Finally, the children are inserted into the population, their fitnesses calculated, and the whole breeding cycle starts again at step 2.
In this way, as solutions compete for a place for their offspring in the next generation, the overall trend in the population is towards better and better solutions. This, at least, is the theory; in practice, there are many complicated issues that arise, and it is extremely easy to build a genetic algorithm that is basically a glorified and very inefficient random search. The problem is further complicated by the fact that a multitude of variations on the basic genetic algorithm have been proposed, all of which have the potential to improve its performance, but in many combinations lead to spectacularly poor results.
Genetic Algorithms have been applied in many areas, including
• game-playing (Bagley 1967)
• cell simulation (Rosenburg 1967)
• image processing (Cavicchio 1970)
• pattern recognition (Englander 1985)
• gas pipeline control (Goldberg 1983)
• solution of non-linear equations (Shaefer 1985)
• training neural networks (Harp 1989)
• task scheduling (Syswerda 1991)
Current areas of interest are wide ranging, but tend to focus on methods of improving the basic genetic algorithm design described in 2.2.3. Some possibilities for doing this lie in the following aspects of the basic algorithm, which were concentrated on during the research phase of this project:
• crossover and mutation methods, especially in combinatorial problems; in a travelling salesperson problem of n cities, the intuitive approach is to encode possible routes as "permutation strings" using each number from 1 to n exactly once. The problem then arises of how to perform crossover between two such permutation strings, so as to give offspring which resemble both of their parents whilst retaining the permutation property. Many possible permutation crossover algorithms have been proposed, such as Partially Matched Crossover, Order Crossover, and Cycle Crossover (see Goldberg 1989, p170), and some of these are explored in 4.5.
• breeding and selection strategies; for example, the elitist strategy allows a small number of the fittest members of the current population to survive unaltered to the next generation. Another option is elitist re-seeding, where a percentage of the population is periodically replaced by variations on the best individual so far found. In addition, consideration must be given to the method of choosing parents to be breed together to form the new population, and whether offspring which are less fit than their parents should be allowed to replace them. (See 4.6 and 4.8)
• fitness-value scaling; this is almost always required, since the values returned by the fitness function are always totally objective, judging as it does the worth of a particular solution in the context of the whole problem. What is actually required is some kind of relative rating ; usually, we require that the most fit individual has exactly twice the average fitness in the current population, since this gives it twice as much "breeding potential" as an average individual. Simple Linear Scaling (Goldberg 1985) is the most basic of the scaling methods in common use, but more exotic ones have been suggested (see 4.7 for further details).
The reader interested in exploring the subject of genetic algorithms, in more detail than is used in this chapter, is referred to the bibliography towards the end of this report.