Unit 6: Selecting Productions on the Basis of Their
Utilities and Learning these Utilities
Occasionally, we have had cause to set parameters of productions so that one production will be preferred over another in the conflict resolution process. Now we will examine how production rule utilities are computed and used in conflict resolution. We will also look at how these utilities are learned.
6.1 The Theory
There can be multiple productions that match the buffers’ current contents and the issue arises of which production to select to fire. Each production has associated with it a utility, which reflects how much the production is expected to contribute to achieving the model's current objective. The utility of a production i is defined as:
Ui = PiG
- Ci
This is calculated from three quantities:
Pi: The expected probability that production i firing will lead to a successful completion of the current objective. The objective is considered complete when a production that is marked as being either a success or a failure fires.
Ci: Expected cost of achieving that objective. Cost is measured in time and C is an estimate of the time from when the production is selected until the objective is finally completed.
G: Value of the objective. Given that the units of cost are measured in time, so is the value of the objective. G is typically set to 20 seconds, which is the default value.
For example, if Pi = .9, G = 20, Ci = 3 the utility of production i is __.
The values of P and C can be set for each production, or as we will describe they can be learned from experience. The value of G is a global value that is set as the parameter :g with the sgp command.
Among the productions that match, ACT-R will select the production with the highest utility. However, the equation above actually only gives the expected utility. Like activations, utilities have noise added to them so the full equation becomes
Ui = PiG
- Ci + e
The noise, e, is controlled by the utility noise parameter s which is set with the parameter :egs. The noise is distributed according to a logistic distribution with a mean of 0 and a variance of
![]()
As with activations for chunks, there is also a threshold which specifies the minimum utility necessary for a production to fire. The utility threshold is set with the :ut parameter.
If there are a number of productions competing with expected utility values Uj the probability of choosing production i is described by the formula

where the summation is over all the competing productions (those that match the current buffer contents) including i and the utility threshold.
6.2 Building Sticks
Example
We will illustrate these ideas with an example from problem solving. Lovett (1998) looked at participants solving the building-sticks problem illustrated in the figure below. This is an isomorph of Luchins waterjug problem that has a number of experimental advantages. Participants are given an unlimited supply of building sticks of three lengths and are told that their objective is to create a target stick of a particular length. There are two basic strategies they can select – they can either start with a stick smaller than the desired length and add sticks (like the addition strategy in Luchins waterjugs) or they can start with a stick that is too long and “saw off” lengths equal to various sticks until they reach the desired length (like the subtraction strategy). The first is called the undershoot strategy and the second is called the overshoot strategy. Subjects show a strong tendency to hillclimb and choose as their first stick a stick that will get them closest to the target stick.

You can go through a version of this by opening the model bst-nolearn in your unit6 folder. By evoking the command (do-set) you will be presented with a pair of problems:
? (do-set)
(UNDER OVER)
It returns a list of the solutions you initially tried on each of the problems, and in this version of the task there are only two problems. As it turns out both of these problems can only be solved by the overshoot strategy. However, the first one looks like it can be solved more easily by the undershoot strategy. The exact lengths of the sticks in pixels are:
A = 15 B = 200 C = 41 Goal = 103
The difference between B and the goal is 97 pixels while the difference between C and the goal is only 62 pixels – a 35 pixel difference of differences. However, the only solution to the problem is B – 2C – A. The same solution holds for the second problem:
A = 10 B = 200 C =
29 Goal = 132
But in this case the difference between B and the goal is 68 pixels while the difference between C and the goal is 103 pixels – a 35 pixel difference of differences in the other direction. You can run the model on these problems and it will tend to choose under for the first and over for the second but not always. One can run it multiple times by calling the function collect-data with its argument being the number of runs. The following is the outcome of 100 trials:
?
(collect-data 100)
(25 73)
where the two numbers in the list returned are the number of times overshoot was chosen on the first problem and the second problem respectively.
The model for the task involves a good number of productions for encoding the screen and selecting sticks. However, the behavior of the model is really controlled by four production rules that make the decision to apply the overshoot or undershoot strategy.
(p
decide-over
=goal>
isa
try-strategy
state
choose-strategy
strategy nil
under
=under
< over
(!eval! (- =under 25))
==>
=goal>
state
prepare-mouse
strategy over
+visual-location>
isa
visual-location
kind
oval
screen-y 60)
(p
force-over
=goal>
isa
try-strategy
state
choose-strategy
- strategy over
==>
=goal>
state
prepare-mouse
strategy over
+visual-location>
isa
visual-location
kind
oval
screen-y 60)
(p
decide-under
=goal>
isa
try-strategy
state
choose-strategy
strategy nil
over
=over
< under
(!eval! (- =over 25))
==>
=goal>
state
prepare-mouse
strategy under
+visual-location>
isa
visual-location
kind
oval
screen-y 85)
(p
force-under
=goal>
isa
try-strategy
state
choose-strategy
- strategy under
==>
=goal>
state
prepare-mouse
strategy under
+visual-location>
isa
visual-location
kind
oval
screen-y 85)
The key information is in the slots over, which encodes the pixel difference between the stick b and the goal, and under, which encodes the difference between the goal and stick c. These values have been computed by prior productions that encode the problem. If one of these differences is more than 25 pixels less than the other, then decide-under or decide-over can fire to choose the strategy. In all situations, the other two productions, force-under and force-over, can apply. Thus, if there is a clear difference in how close the two sticks are to the goal there will be three productions (one decide, two force) that can apply and if there is not then just the two force productions can apply. The choice among the production rules is determined by their relative utilities which we can see in the Procedural Memory Viewer window, or by using the spp command:
?
(spp force-over force-under decide-over decide-under)
Parameters for production Force-Over:
:Chance
1.000
:Effort
0.050
:P
0.500
:C
0.050
:PG-C
9.950
Parameters for production Force-Under:
:Chance
1.000
:Effort
0.050
:P
0.500
:C
0.050
:PG-C
9.950
Parameters for production Decide-Over:
:Chance
1.000
:Effort
0.050
:P
0.650
:C
0.050
:PG-C 12.950
Parameters for production Decide-Under:
:Chance
1.000
:Effort
0.050
:P
0.650
:C
0.050
:PG-C 12.950
(Force-Over
Force-Under Decide-Over Decide-Under)
The only differences among the productions are the values of P which were set by the spp command in the Commands window.
(spp
decide-over :p .65)
(spp
decide-under :p .65)
(spp
force-over :p .5)
(spp
force-under :p .5)
The P parameters for the force productions are .50 while they are a more optimistic .65 for the decide productions. With G set at the default value of 20 this leaves a difference of 3 between the PG-C values for the decide and choose productions.
Lets consider how these productions apply in the case of the two problems in the model. Since the difference between the under and over differences is 35 pixels, there will be one decide and two force productions that match the buffers. Let us consider the probability of choosing each production according to the equation.

The parameter s is set at 3 and the utility threshold is set to -100 (we want the probability that none of the productions are over the threshold to be essentially 0). First, consider the probability of the decide production:

Similarly, the probability of the two force productions can be shown to be .248. Thus, there is a .248 probability that a force production will fire that has the model try to solve the problem in the direction other than it appears.
6.3 Parameter
Learning
So far we have only considered the situation where the production parameters are static. However, they will change as experience is gathered about the relative costs of different methods and their relative probabilities of success. The probability of success of a production is calculated as
P =
where Successes and Failures are the number of experienced successes and failures. A success or failure occurs when a production explicitly tagged as a success or a failure fires. In the bst models there is one production that recognizes failure and starts over again and another production that recognizes success. One has the failure flag set to t and the other has the success flag set to t by a spp command:
(p
pick-another-strategy
=goal>
isa
try-strategy
state
wait-for-click
=manual-state>
isa
module-state
modality free
=visual-location>
isa
visual-location
> screen-y 100
==>
=goal>
state choose-strategy)
(p read-done
=goal>
isa
try-strategy
state
read-done
=visual>
isa
text
value
"done"
==>
+goal>
isa
try-strategy
state
start)
(spp
read-done :success t)
(spp
pick-another-strategy :failure t)
When such a production fires all the productions that have fired since the last marked production fired are credited with a success or failure.
A similar equation governs the learning of the cost:
C
= ![]()
where Efforts is the accumulated time over all the successful and failed applications of this production rule. The time for a particular success or failure credited to a production that is not the one marked as a success or failure is the difference in time between that production’s selection and the selection time of the marked production. The time credited to a marked production is its effort – the amount of time it takes to fire.
Productions have initial values of the parameters Efforts, Successes, and Failures at the beginning of a run. By default each production rule is created with Efforts = .05 seconds (the cost of one firing), Successes = 1, and Failures = 0. This means that the default value of P is 1 and the default value of C is .05 second. However, as we will see in the next section it is often necessary to set these to non-default values to reflect prior experience or biases. These prior values can be set with the spp command as shown below:
(spp
decide-over :failures 7 :successes 13 :efforts 100)
(spp
decide-under :failures 7 :successes 13 :efforts 100)
(spp
force-over :failures 10 :successes 10 :efforts 100)
(spp
force-under :failures 10 :successes 10 :efforts 100)
It is also possible to set the initial values for all of the productions by omitting a production name in the call to spp:
(spp
:efforts 500 :successes 100)
6.4. Learning in the
Building Sticks Task
Lovett did an experiment with a building sticks task. The following are the percent choice of overshoot for each of the problems in the training set from Lovett & Anderson (1996):
Lovett, M. C., & Anderson, J. R. (1996). History of success and current context in problem solving: Combined influences on operator selection. Cognitive Psychology, 31, 168-217.
a
b c Goal
%OVERSHOOT
15
250 55 125
20
10
155 22 101
67
14
200 37 112
20
22
200 32 114
47
10
243 37 159
87
22
175 40 73
20
15
250 49 137
80
10
179 32 105
93
20
213 42 104
83
14
237 51 116
13
12
149 30 72
29
14
237 51 121
27
22
200 32 114
80
14
200 37 112
73
15
250 55 125
53
The majority of these problems look like they can be solved by undershoot and in some cases the pixel difference is greater than 25. However, the majority of the problems can only be solved by overshoot. The first and last problems are interesting because they are identical and look strongly like they are undershoot problems. It is the only problem that can be solved either by overshoot or undershoot. Only 20% of the participants solve the first problem by overshoot but after the sequence of problems this rises to 53% for the last problem.
The model bst-learn is the one that simulates this experiment. This is the same as the model in bst-nolearn except that the learning mechanism is enabled (the :pl parameter is t) and all of the stimuli are presented by do-set. When the learning is on, we do not set the values of P and C directly. Instead, we set the parameters of the critical productions to have prior values of Successes, Failures and Efforts to produce the desired initial values of P and C:
?
(spp force-over force-under decide-over decide-under)
Parameters for production Force-Over:
:Chance
1.000
:Effort
0.050
:P 0.500
:C
5.000
:PG-C
5.000
:Successes
(10)
:Failures
(10)
:Efforts
(100)
:Success
nil
:Failure
nil
Parameters for production Force-Under:
:Chance
1.000
:Effort
0.050
:P
0.500
:C
5.000
:PG-C
5.000
:Successes
(10)
:Failures
(10)
:Efforts
(100)
:Success
nil
:Failure
nil
Parameters for production Decide-Over:
:Chance
1.000
:Effort
0.050
:P
0.650
:C
5.000
:PG-C
8.000
:Successes
(13)
:Failures
(7)
:Efforts
(100)
:Success
nil
:Failure
nil
Parameters for production Decide-Under:
:Chance
1.000
:Effort
0.050
:P
0.650
:C
5.000
:PG-C
8.000
:Successes
(13)
:Failures
(7)
:Efforts
(100)
:Success
nil
:Failure
nil
(Force-Over Force-Under Decide-Over Decide-Under)
The following is the performance of the model on a 100 simulation run:
? (collect-data
100)
CORRELATION: 0.733
MEAN
DEVIATION: 19.701
Trial
1 2
3 4 5
6 7 8
9 10 11
12 13 14
15
25
46 52 63
83 32 60
73 38 36
32 35 67
65 37
DECIDE-OVER
: 0.6578
DECIDE-UNDER:
0.6600
FORCE-OVER : 0.6457
FORCE-UNDER
: 0.4743
Also printed out are the average values of P for the critical productions after each run through the experiment over these 100 runs. As can be seen, the two decide productions retain their estimates of about 65% success and the force-under production retains its estimate of about 50% success. However, the system has learned that the force-over production is more generally successful -- about 65%. Here are the actual production parameters after one run through the experiment:
?
(spp force-over force-under decide-over decide-under)
Parameters for production Force-Over:
:Chance
1.000
:Effort
0.050
:P
0.636
:C
5.156
:PG-C
7.571
:Successes (21.0)
:Failures (12.0)
:Efforts (170.163)
:Success
nil
:Failure
nil
Parameters for production Force-Under:
:Chance
1.000
:Effort
0.050
:P
0.500
:C
4.873
:PG-C
5.127
:Successes (13.0)
:Failures (13.0)
:Efforts (126.69200000000001)
:Success
nil
:Failure
nil
Parameters for production Decide-Over:
:Chance
1.000
:Effort
0.050
:P
0.650
:C
5.000
:PG-C
8.000
:Successes
(13)
:Failures
(7)
:Efforts
(100)
:Success
nil
:Failure
nil
Parameters for production Decide-Under:
:Chance
1.000
:Effort
0.050
:P
0.636
:C
4.951
:PG-C
7.776
:Successes (14.0)
:Failures
(8.0)
:Efforts (108.928)
:Success
nil
:Failure nil
The values for the force productions had been 10 successes and 10 failures at the beginning of the run. In the case of force-over the system has experienced 11 more successes and only 2 more failures leading to totals of 21 and 12 and the more optimistic estimate of P of .636. In the case of force-under the system has experienced 3 additional successes and 3 additional failures leaving the estimate of P unchanged. The values for the decide productions had been 13 successes and 7 failures. In this run the decide-over production was never tried and so its values are unchanged. The decide-under production had been tried twice with one success and one failure leaving the values at 14 successes and 8 failures and a slightly reduced P value of .636.
6.5 Learning in a
Probability Choice Experiment
Your assignment is to develop a model for a "probability matching" experiment run by Friedman et al (1964). The difference between this assignment and earlier ones is that you are responsible for almost all of the code for the model, including the code which presents the experiment. The experiment to be implemented is very simple. The basic procedure, which is repeated for 48 trials, is:
1. The participant is presented with a screen saying "Choose"
2. The participant either types H for heads or T for tails
3. The screen is cleared and presents as feedback the correct answer, either "Heads" or "Tails" for 1 second.
Friedman et al arranged it so that heads was the correct choice on 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, and 90% of the trials (independent of what the participant had done). For your experiment you will only be concerned with the 90% condition. Thus, your experiment will be 48 trials and “Heads” will be the correct answer 90% of the time. We have averaged together the data from the 10% and 90% conditions (flipping responses) to get an average proportion of choice of the dominant answer in each block of 12 trials. These proportions are 0.72, 0.78, 0.82, and 0.84. This is the data that your model is to fit. Note, this is not the percentage of correct responses – the correctness of the response does not matter. Your model must begin with a 50% chance of saying heads. Then, rapidly adjust its probabilities so that it averages close to 72% over the first block of 12 trials, and increases to about 84% by the final block. You will run the model through the experiment many times (resetting before each experiment) and average the data of those runs for comparison. As an aspiration level, this is the performance of the model that I wrote, averaged over 100 runs:
?
(collect-data 100)
CORRELATION: 0.959
MEAN
DEVIATION: 0.026
Original
Current
0.720
0.679
0.780
0.785
0.820
0.801
0.840
0.817
In achieving this, the parameters I worked with were the noise in the utilities (set by the :egs parameter) and the initial number of successes and failures that I gave to the productions that chose heads and tails -- just as this is what I did in the case of the building sticks model.
The starting model you are given for this task, choice, contains only the functions necessary to run a person through one trial and to collect a key press response using the “trial at a time” experiment writing style. When either a person or the model presses a key, the string representing that key will be saved in the global variable *response*, and the function do-trial-person will run one trial returning the key that was pressed. You will have to write a similar function to run the model through one trial, which should be named do-trial-model. You also need to write a function called collect-data that takes one parameter and runs the experiment that many times and prints out the average results of the runs and the correlation and deviation of the average data to the experimental data. You also must write the model for the task that fits the data.
My suggestion would be to first write the do-trial-model function and a model that does the task (without trying to fit the data), and make sure that works correctly. Next write a function to run a block of 12 trials and test that to make sure the model works correctly between trials. Then write a function to iterate over 4 blocks for running one pass of the experiment and test that. After that is working write the collect-data function to run the experiment multiple times. Only then should you be concerned with actually fitting the model to the data, once you are sure everything else works.
To write the experiment for the model to interact with you will need to use a few ACT-R functions that were discussed in the experiment description files. Those functions will be described again here, and the models should provide plenty of examples of their use.
The reset function initializes ACT-R. It returns the model to the initial state as specified in the model file. It is the programmatic equivalent of pressing the Reset button in the environment.
The function pm-install-device takes one parameter which should be a window. That parameter tells ACT-R/PM with which window the model will be interacting. Everything in that window can be seen by the model, and all of the model’s motor actions (key presses and mouse clicks) will affect that window.
The pm-proc-display function is called to make the model “look” at the window. The model only encodes the screen when requested with a call to pm-proc-display. Thus, for the model to notice a change to the window pm-proc-display must be called after the change has occurred. This function performs the buffer stuffing of the visual-location buffer if it is empty and triggers the re-encoding if the model is attending an item.
To run the model, use the pm-run function. It has one required parameter, the maximum amount of time to run the model, and one optional keyword parameter called :full-time. If :full-time is specified as t then the model will run for the entire time specified. Otherwise, the model will stop immediately when there is nothing more it can do, which may be prior to the end of the specified running time.
In addition to these functions there are the correlation and mean-deviation functions that you will need to use. These calculate the correlation and mean-deviation between two lists of numbers.
Here is the function that runs the model through the paired associate task from unit 4:
(defun
do-experiment-model (size trials)
(let ((result nil)
(window
(open-exp-window "Paired-Associate Experiment" :visible nil)))
(reset)
(pm-install-device
window)
(dotimes (i trials)
(let ((score 0.0)
(time 0.0)
(start-time))
(dolist (x (permute-list (subseq
*pairs* (- 20 size))))
(clear-exp-window)
(add-text-to-exp-window :text (car x) :x 150 :y 150)
(setf *response* nil)
(setf *response-time* nil)
(setf start-time (pm-get-time))
(pm-proc-display)
(pm-run 5.0
:full-time t)
(when (equal (second x)
*response*)
(incf score 1.0)
(incf time (- *response-time*
start-time)))
(clear-exp-window)
(add-text-to-exp-window :text (second x) :x 150 :y 150)
(pm-proc-display)
(pm-run 5.0
:full-time t))
(push (list (/ score size) (and (>
score 0) (/ time (* score 1000.0))))
result)))
(reverse result)))
It is more complicated than the function you will need for this assignment because it is recording response times and averaging the data over multiple runs which your do-trial-model function will not be doing. It also calls reset which you should not do in your do-trial-model function because you want the model to continue to learn from trial to trial. You should only call reset at the start of each pass through the whole experiment. However, it does have a similar sequence of operations. If we ignore the averaging and response times the function opens a window, presents an item of text, runs the model, clears the screen, displays another item of text and then runs the model again (the code highlighted in red). The provided do-trial-person function provides the structure for the displaying of the items in the choice task:
(defun
do-trial-person ()
(let ((window (open-exp-window "Choice
Experiment" :visible t)))
(add-text-to-exp-window :text
"choose" :x 50 :y 100)
(setf *response* nil)
(while (null
*response*)
(allow-event-manager window))
(clear-exp-window)
(add-text-to-exp-window :text (if (<
(random 1.0) .9) "heads" "tails")
:x 50 :y 100)
(sleep 1.0)
*response*))
What you must do is write the do-trial-model function that presents the display as do-trial-person does, but has the appropriate interaction with ACT-R (the code colored green is not the interaction necessary for ACT-R to do the task).
It is also possible to write the experiment using an event-based style as discussed in the unit4 experiment documentation. That will require a little more work to program because it does not analogize as neatly to an existing model. If you would like to write the experiment in that way you should look at the Zbrodoff model as an example instead of the paired model as described above. In fact, the different ways to write the experiment will actually have an impact on the data fitting for this model because they will have slightly different timing on the events which will affect the efforts experienced by the productions. For the paired associate task the style of the experiment was not an issue because the lengths of the trials were fixed, but in this case, because the trials transition on the response of the model, the event-based experiment will provide a more veridical timing sequence because the events of the experiment are not impacted by components of the model other than its response. However, either solution is acceptable for the assignment.