Learning and Complex Behavior

Ch. 2

Some simulations using bio-behaviorally informed Selection Networks to implement reinforcement

Computer Simulations of Some Conditioning Phenomena 

Download the simulation program by left-clicking  


If left-clicking does not start to download the file, copy the link into your browser.

The program is compressed to reduce the size of the file. Use UnRAR (or UnZIP) to uncompress (“unpack”) the program. If you do not have WinZIP or WinRAR installed on your machine, UnRAR is available as a free download. To download UnRAR, left-click your mouse on  


Once UnRAR is installed, double left-click the downloaded file SelNet-Simulation-Demo.rar to extract it. The resulting folder will be placed in the same location as the downloaded program unless you specify otherwise. Install the program wherever you wish, for example on the Desktop or in the root directory of the C drive (C:\). 

Open the folder “SelNet-Simulation-Demo” and start the simulation program by left-clicking “SelnetDemo.exe”. The speed at which the simulation runs is determined by the Delay option and the speed of your computer. Before running a simulation trial, choose either NO delay (the simulation runs as fast as your machine permits) or YES (the simulation is slowed so that you can follow its course more closely). This option may be changed before each run of the simulation.

Two different types of conditioning procedures are available after left-clicking the Simulate option.  


     In this simulation, a neural network with the architecture of a Selection Network is trained according to an operant procedure. The simulation goes through three successive phases.

Phase 1: When the operant output unit is activated by stimulating one of the environmental input units, a diffuse reinforcement signal occurs. Note that the output unit corresponding to reinforcer-elicited activity (shown as cr in the lower graph) is conditioned before the output unit corresponding to the operant (shown as r in the upper graph). No effort has been made to simulate the true rate of conditioning, which depends on the values of the various parameters. (See Help option in the simulator for detailed information about the parameters.) The values of parameters in biobehaviorally constrained simulations are ultimately determined by independent experimental research.

Phase 2: During this phase, activity of the operant unit of the same network is no longer reinforced when it is activated by stimulating an environmental input unit. Note that CR activity extinguishes before R activity extinguishes. This is because CR activity implements internal conditioned reinforcement of connections within the network for some time after the environmental reinforcer no longer occurs.

Phase 3: During the final phase, reinforcers are again presented when the operant output unit (R) is activated. Note that reconditioning of the R unit occurs more rapidly than original conditioning. This is because extinction has not completely eliminated the increases in connection weights between units that were established during original conditioning, some of which enable internal conditioned reinforcement. Selection Networks “remember” some of what they have learned, even after extinction when operant responding no longer occurs. 
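The faster reconditioning in Phase 3 can be illustrated with a deliberately simplified sketch. This is not the Selection Network learning rule and not code from the simulation program: it assumes a single connection weight that grows on reinforced trials and decays on unreinforced trials, and the rates, the criterion, and all function names are hypothetical. It shows only that a weight left partly intact by extinction can reach the criterion in fewer trials the second time.

```python
ACQ_RATE = 0.2    # hypothetical growth rate on reinforced trials
EXT_RATE = 0.1    # hypothetical decay rate on unreinforced trials
CRITERION = 0.9   # hypothetical weight counted as "conditioned"


def run_phase(weight, reinforced, n_trials):
    """Return the weight after each trial of one phase."""
    history = []
    for _ in range(n_trials):
        if reinforced:
            weight += ACQ_RATE * (1.0 - weight)   # grow toward 1
        else:
            weight -= EXT_RATE * weight           # decay toward 0
        history.append(weight)
    return history


def trials_to_criterion(history):
    """Trial number on which the weight first reaches CRITERION, or None."""
    for trial, weight in enumerate(history, start=1):
        if weight >= CRITERION:
            return trial
    return None


phase1 = run_phase(0.0, reinforced=True, n_trials=30)          # conditioning
phase2 = run_phase(phase1[-1], reinforced=False, n_trials=10)  # extinction
phase3 = run_phase(phase2[-1], reinforced=True, n_trials=30)   # reconditioning

print("trials to criterion, phase 1:", trials_to_criterion(phase1))
print("trials to criterion, phase 3:", trials_to_criterion(phase3))
print("weight remaining after extinction:", round(phase2[-1], 3))
```

In this toy version the weight remaining after Phase 2 is below the criterion but above zero, which is why Phase 3 starts with a head start. The actual simulation involves many connections and internal conditioned reinforcement, which this sketch does not attempt to model.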

Differential-Nondifferential reinforcement 

     This simulation has two phases with the same network architecture as the preceding simulation.

Phase 1: In nondifferential reinforcement, either of two input units is stimulated in random order and activation of the operant unit is followed by a reinforcer no matter which input unit was stimulated. Usually the R unit is activated by stimulation of either input unit, but sometimes only one input unit becomes effective (that is, becomes a discriminative stimulus, or SD).

Phase 2: In differential reinforcement, the reinforcer is given only when the R output unit is activated by stimulating one input unit (S1), shown in blue. When the other input unit (S2), shown in red, is stimulated, the reinforcer is never given, even if the R output unit is activated. Note particularly:

     (a) S1 and S2 usually both acquire control of responding during the nondifferential phase.

     (b) Sometimes only one stimulus acquires control during the nondifferential phase.

     (c) Only one stimulus (S1) usually maintains control of responding during the differential procedure, and responding to the other stimulus (S2) extinguishes.

     (d) Variation occurs from one simulation to the next.
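The two reinforcement contingencies described above can be written out explicitly. The sketch below is not taken from the simulation program; the function names, the phase labels, and the seed parameter are illustrative assumptions. It encodes only the experimenter-defined rule for when a reinforcer follows activation of the R unit, together with the random ordering of S1 and S2 trials.

```python
import random


def reinforcer_delivered(phase, stimulus, r_active):
    """Experimenter-defined contingency for one trial (hypothetical helper).

    phase:     "nondifferential" or "differential"
    stimulus:  "S1" or "S2", the input unit stimulated on this trial
    r_active:  True if the R output unit was activated
    """
    if phase not in ("nondifferential", "differential"):
        raise ValueError(f"unknown phase: {phase!r}")
    if not r_active:
        return False                 # no operant activity, no reinforcer
    if phase == "nondifferential":
        return True                  # reinforced after either stimulus
    return stimulus == "S1"          # differential: only after S1


def trial_order(n_trials, seed=None):
    """S1 and S2 presented in random order (seed is an assumed convenience)."""
    rng = random.Random(seed)
    return [rng.choice(("S1", "S2")) for _ in range(n_trials)]


for stimulus in trial_order(6, seed=0):
    # Assume, for illustration only, that R is activated on every trial.
    print(stimulus,
          reinforcer_delivered("nondifferential", stimulus, True),
          reinforcer_delivered("differential", stimulus, True))
```

The rule is conditional on R being activated: on any trial where the R unit is silent, no reinforcer is delivered in either phase.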

When the “wrong” stimulus acquires control of the operant in the nondifferential phase, sometimes the “correct” stimulus never acquires control during the differential phase. Why might this sometimes occur with real learners? (Hint: Is there sometimes a difference between the experimenter-defined contingency and the learner-experienced contingency?)