and
John R. Anderson
Department of Psychology
Carnegie Mellon University
Pittsburgh, PA USA 15213
Declarative transfer from one domain to another can be observed in a systematic decrease in the time spent reading an instructional text and processing help during problem solving. Two experiments in the programming domain tested the hypothesis that subjects who have been introduced to a first programming language develop a representation of basic programming concepts that helps them integrate new declarative knowledge from a second programming language. It is shown that the effect on reading is greater for the pages that are conceptually close across texts, and for subjects who have fully mastered the basic concepts in the first language. A regression model of reading shows an effect on processes that are responsible for the analysis of novel words and examples, while general strategic reading processes remain unaffected. The increased reading speed is not accompanied by a greater understanding of the text. Effects of a common programming interface and transfer of procedural knowledge appear to be negligible on the kind of problems considered. This study supports the distinction between procedural and declarative transfer.
This paper is concerned with the transfer of knowledge that occurs when subjects who have learned and mastered a complex domain of knowledge learn a different one (Cormier & Hagman, 1987; Singley & Anderson, 1989). Typical transfer results suggest that the main effect of transfer is a positive one where some of the knowledge acquired in the first domain can be used in the learning of a second domain. Such positive transfer has been reported in the case of highly procedural tasks such as text editing (Harvey, 1990; Harvey & Rousseau, in press; Polson, 1987, 1988; Singley & Anderson, 1985, 1988). However, transfer between problem solving tasks (Catrambone & Holyoak, 1989; Hayes & Simon, 1977) in more complex information processing domains such as computer programming is less clear and failures to transfer are often reported (see Salomon & Perkins, 1987; Scholtz & Wiedenbeck, 1989).
Part of the problem seems to be that, unlike most laboratory studies, which consider simple and well-defined tasks requiring the mastery of a limited number of concepts, transfer of knowledge between complex information-processing domains depends on the acquisition of a large variety of knowledge. To make any firm predictions about transfer one needs to do a task analysis that adequately represents this variety of knowledge. Certainly, it is necessary to understand the role of conceptual (or declarative) knowledge as well as procedural knowledge in learning a complex skill. Recent research has largely focused on the development and transfer of procedural knowledge. As a result, the experimental settings provide few opportunities to test for the transfer of previously acquired declarative knowledge. The aim of this work is to demonstrate that the declarative representation of knowledge plays an important role in transfer and that its effect is specific and predictable. These properties will be shown to be particularly useful in explaining transfer between two programming languages at points where there are apparently no common procedural skills.
Other authors have also recognized the need for declarative representations of knowledge in transfer. Bovair and Kieras (1991; Kieras & Bovair, 1986) assume that knowledge is first acquired in a declarative form but they represent it as if it were a set of production rules. Royer (1979, 1986) proposes a hierarchical schema model of transfer involving two levels of schemas. The first level corresponds to general (declarative) information while the second level involves specific procedural strategies. Similarly, Dixon and Gabrys (1991) distinguished between conceptual (declarative) and operational (procedural) similarities in emphasizing the role of prior knowledge in learning a complex device. Moreover, although they emphasize the role of rules, Gick and Holyoak (1987) suggest that a list of features describing an object may exist in cases where precise rules cannot be defined, and such lists may play an important role in transfer. Brooks and Dansereau (1987) propose a taxonomy of transfer situations based on a distinction between content knowledge and skills, which correspond respectively to declarative and procedural knowledge. Wu and Anderson (in preparation) suggest that some (declarative) algorithmic representation of a problem solution develops while learning a programming language and is transferred when the same problem is encountered in another language.
According to Brooks and Dansereau (1987) and Singley and Anderson (1989), there are at least two ways in which prior declarative knowledge gained in one domain may facilitate transfer in another domain. First, the old declarative knowledge may provide a general framework for embedding and elaborating more detailed "new" knowledge. Second, the old knowledge may provide a convenient analogy which can guide the acquisition of procedural information. To clearly distinguish between these two ways, different names have been given to them (Brooks & Dansereau, 1987; Singley & Anderson, 1989). The impact of declarative knowledge on the acquisition of new knowledge and on the creation of new production rules have respectively been called declarative-declarative and declarative-procedural (or analogical) transfer by Singley and Anderson (1989), and content-to-content and content-to-skills transfer by Brooks and Dansereau (1987).
Declarative-declarative transfer mainly relates to the way information is gathered from the environment. The hypothesis is simply that the old declarative knowledge is used to integrate new incoming information. This is supported by research on schemas and mental models (e.g., Kieras & Bovair, 1984), which suggests that assimilating new knowledge is easier when an integrating structure has been previously acquired. This is also supported by work on advance organizers (Ausubel, 1963, 1978), which serve to activate the previously learned declarative representation by providing cues to the learner about the new incoming material.
Predicting the magnitude of this change is, however, not straightforward since many factors contribute to determining the reading speed of a scientific text (see Bovair & Kieras, 1991 for a review). Thibadeau et al. (1982) have precisely modeled the gaze duration of each word of a scientific text using a regression model with a number of regressors associated with the words. They observed an increase of about 686 ms in gaze duration for each new word found in the text. So, in the case where two domains share identical concepts, a simple model would predict that subjects who have been introduced to an instructional text in the first domain would save about 686 ms each time they encounter the same concepts in the other domain, as compared to control subjects. Consequently, the magnitude of transfer can be expected to be a function of the number of new words introduced by the second text.
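The simple model described above amounts to a linear saving per shared concept. The following is only a minimal sketch of that arithmetic, assuming the 686 ms cost applies unchanged to every shared new word; the function name and the count of ten shared words are hypothetical and do not come from the study.

```python
# Extra gaze duration per new word reported by Thibadeau et al. (1982), in seconds.
NEW_WORD_COST_S = 0.686


def predicted_saving_s(shared_new_words: int) -> float:
    """Predicted reading-time saving (s) for a transfer subject.

    Assumption: each new word already met in the first domain saves the
    full per-word cost; nothing else about the text changes.
    """
    if shared_new_words < 0:
        raise ValueError("shared_new_words must be non-negative")
    return shared_new_words * NEW_WORD_COST_S


# Hypothetical example: a text that shares ten new words with the first domain.
print(f"{predicted_saving_s(10):.2f} s")
```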
Our research uses similar tutoring environments for both Lisp and Prolog that attempt to ensure mastery of both the learning and transfer domains. This important condition for observing transfer will thus be satisfied. Also, the declarative instructions and the procedural representations of the programming skills underlying the mastery of these programming languages are available from the tutors. This reduces the somewhat arbitrary process of writing these descriptions.
We have a number of specific predictions about transfer between these two languages:
(1) An effect should be observed on reading times. For instance, the expository text in Figure 1 about lists and list processing can be a part of both Lisp and Prolog manuals with only minimal changes. Someone who has taken a Prolog course will have read this page and will have developed a representation of it, so that she will read this page faster during her subsequent Lisp course. To a lesser extent, pages which are not identical but which are closely related or specific may also be read faster since they may also share identical new words.
(2) We also expect an effect on declarative-procedural transfer. For instance, analogical transfer seems to be strongly involved in learning programming (Anderson, 1987). The knowledge of conceptual elements underlying the basic vocabulary may favor the process of solving problems in the other language. We may thus expect problems which introduced these conceptual elements to be easier to solve following transfer. Specifically, we predict fewer errors and less time spent in error states.
(3) Except for these two cases, very little transfer of procedural knowledge is expected since the basic primitives in LISP and Prolog are different. For instance, the function to extract the first element of a list in LISP, called car, is realized using an operator called the tail operator in Prolog. While they achieve the same effect, they cannot be realized by the same production rule in the two languages. In fact, none of the production rules underlying the programming skills in actual tutors are identical (Anderson, 1993). Thus, we expect relatively little transfer in terms of time to correctly write code although there might be some savings due to the common interface shared by the tutors for both languages. This prediction of lack of procedural transfer rests on the fact that we are looking at beginning programming where there is relatively little algorithm design. At more complex levels of programming, one sees transfer at the algorithmic level (Anderson, Conrad, Corbett, Fincham, Hoffman, & Wu, 1993).
Design. The experiment involved two phases (Figure 2). During the first phase, subjects of the transfer group were introduced to the Prolog programming language and to the Prolog tutoring system. During the second phase, subjects of both the control and transfer groups were introduced to the Lisp tutoring system and to the Lisp programming language.
Subjects were introduced to three Prolog and three Lisp lessons. Each lesson was completed on a different day. Duration of the lesson was determined by the time it took for the subject to master the material, and lasted half an hour to three hours depending on the subject. Each lesson was composed of either two or three sections. Each section included a section of the text to read, a questionnaire about the text, and a set of problem solving exercises with the tutor.
In Prolog, the following topics were covered: facts, queries, constants and variables, conjunctions of goals, matching and binding process and their efficiency, the use of the inequality operator, rules, general arithmetic concepts (arithmetic expressions, notations, precedence of operators), temporary variables and assignment, arithmetic expressions, lists, and list processing. In Lisp, the following topics were covered: an introduction to Lisp, constants and variables, lists, list processing, evaluation including functional evaluation, defining new functions, general arithmetic concepts, arithmetic functions, and temporary variables and assignment.
Procedure. For each Lisp lesson presented, the subjects first read the instructional text about the programming language. At the end of each section of the text, the subjects were asked whether or not they were ready to take a questionnaire about what they had just read. If they felt they were not, they could go back to the text and study it again. On the other hand, if they felt they were, a set of four-alternative multiple-choice questions was presented to them, one question at a time. Time to answer was registered by the computer. Presentation of the Prolog lessons was identical except that no questions were presented. After having answered the questions, the subjects then moved to the tutor to solve a set of problem-solving exercises. The tutor registered all of the user's actions along with their time stamps and their corresponding production rules using the tutor's model tracing facility.
The Instructional Texts. The Lisp and Prolog texts used are those used in introductory programming courses given to undergraduate students at CMU, with one difference. The number of common pages shared by the Lisp and Prolog texts has been increased by the inclusion of identical pages about general topics such as lists, list processing, and arithmetic in both texts (e.g., Figure 1). The other, non-identical pages are categorized as either related or specific to Lisp or Prolog. A page is classified as related when it concerns a topic which could be true of both Lisp and Prolog but which contains few elements specific to a given language. On the other hand, a page is considered specific when it concerns topics which are simply not true in the other language. For instance, facts, queries, and rules are specific to Prolog.
The texts were entered into the computer in a HyperCard stack and presented to the subjects in a separate window. Subjects read the text one page at a time, at their own pace, and could move freely from one page to another. The time they spent reading each page was monitored by the computer.
The Declarative Questionnaire. The declarative questionnaire contained four-alternative forced-choice questions and was administered after the declarative texts but before the practice with the tutor. The questions were selected to be as close to the text as possible. They were about the meaning of some words, expressions, or definitions. Some were simple evaluation questions where the result of a Lisp expression had to be given as the answer. In all, 51 questions were presented.
The Lisp and Prolog tutors. The tutors used in this study attempt to focus on problem solving while simplifying the interaction with the computer. Both the Lisp and Prolog tutors used a similar windowed environment, a set of carefully designed exercises corresponding to the basic programming skills presented in the declarative instructions, declarative help, tracing of the student model, and a remediation plan that attempted to bring the students to mastery of the skill.
The tutors work as follows. The set of production rules that subjects must master by the end of the lessons is referred to as the student model. The tutors use this student model to trace the skills which have been mastered and present problems to practice unmastered skills. So, for each section of the text, subjects must solve a set of required problems which develop some of the skills. If, after this set of problems, some of the productions still fail to be fully mastered, the subjects are presented with some remedial problems selected so as to practice those rules. Once mastery of all the productions is achieved, the subjects are ready to enter the next section in the curriculum.
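The remediation step described above can be illustrated with a short sketch. This is not the tutors' actual implementation: the mastery estimates, the 0.95 threshold, the rule names, and the problem pool are all hypothetical, chosen only to show how remedial problems might be selected for rules that are not yet mastered.

```python
from typing import Dict, List, Set


def unmastered_rules(mastery: Dict[str, float], threshold: float = 0.95) -> Set[str]:
    """Production rules whose estimated mastery is below the threshold.

    Assumption: mastery is a number in [0, 1] per rule and 0.95 is an
    illustrative cut-off; the source does not state the actual criterion.
    """
    return {rule for rule, level in mastery.items() if level < threshold}


def select_remedial(
    mastery: Dict[str, float],
    remedial_pool: Dict[str, Set[str]],
    threshold: float = 0.95,
) -> List[str]:
    """Names of remedial problems that exercise at least one unmastered rule."""
    weak = unmastered_rules(mastery, threshold)
    return sorted(name for name, rules in remedial_pool.items() if rules & weak)


# Hypothetical student model after the required problems of one section.
mastery = {"code-car": 0.99, "code-cdr": 0.60}
pool = {"problem-a": {"code-car"}, "problem-b": {"code-car", "code-cdr"}}
print(select_remedial(mastery, pool))  # an empty list would mean the section is complete
```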
Computers. Presentation of the instructional texts, the declarative questionnaire, and the tutor was done on Mac II computers equipped with double-page monitors.
However, the transfer effect is not generalized to all pages. Indeed, the reading behavior of the subjects in the control and transfer groups appears to depend on the pages read. First, there is a significant interaction between condition and page [F(66, 1804) = 1.36, p < 0.03]. Moreover, differences for identical (q = 5.23, p < 0.01), related (q = 5.76, p < 0.01), and specific pages (q = 5.46, p < 0.01) have been found using a posteriori tests. The largest effect is on identical and the smallest one on the specific pages, as hypothesized. This can be seen in Figure 3 which compares the performance of the subjects on the three types of pages.
An information processing model of reading. An information processing model of reading has been developed. Its aim is to formally locate the information processes that have been affected by transfer. The modeling approach is based on Thibadeau et al.'s reading model. As said earlier, they have precisely modeled the gaze duration of each word of a scientific text using a regression model with a number of regressors associated with the words. In the present model, the time spent reading a page is modeled using a regression model with a number of regressors associated with the page. Our model is built around two sets of processes. For each set of processes, a number of variables have been included in the model to test its effect. The first set of processes relates to comprehension mechanisms that search for the meaning of new words and try to understand examples. The number of new words and the number of examples found in a page have been included in the model. A novel word has been defined as a word which is likely to be unknown to the subjects or a word which has a new definition in the context of programming. For instance, the word "list" is not per se a new word, but it is considered here as a new word since it has a specific meaning in programming. The set of words considered as new in the three lessons is atoms, list, embedded lists, car, cdr, cdrs, reverse, quoted, quote, unquote, unquoted, combiners, cons, consed, append, defun, let, and Lisp. The operational definition of an example has been based on the idea that many of the examples found in the Lisp text have the following form:
(cdr '(a b c)) returns (b c).
Consequently, one way to objectively count the number of examples present in a given page is to count the number of "returns" in that page. For instance, assume that a given page contains the following text about the Lisp extractors car and cdr:
(cdr '(a b c)) returns (b c)
(car '(a b c)) returns a
(car (cdr '(a b c))) returns b
Such a page contains three different examples.
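The counting rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' scoring procedure: the function name is hypothetical, and counting whole-word occurrences of "returns" is an assumption about how the rule was applied (any prose use of the word would also be counted).

```python
import re


def count_examples(page_text: str) -> int:
    """Count examples in a page as the number of whole-word 'returns'."""
    return len(re.findall(r"\breturns\b", page_text))


# The sample page about the Lisp extractors car and cdr.
page = """(cdr '(a b c)) returns (b c)
(car '(a b c)) returns a
(car (cdr '(a b c))) returns b"""

print(count_examples(page))
```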
The second set of processes serves as an index of change in the strategic reading processes. It is composed of variables that mirror the ways the text is scanned by the readers. We assume that the reader first scans all the characters in a given page and integrates them into meaningful units (characters, words, lines, sentences, and paragraphs). Consequently, the number of characters in a page and the number of words, lines, phrases, and paragraphs have been included in the model. Also, subjects may tend to increase their reading speed within a lesson or within a given section. So, the serial position of a page in a section and its relative position in a given lesson have been included in the model.
Having these two sets of variables, one must be able to determine whether the coefficient of a particular variable was the same for both groups, or whether it is different for each group. To test this, each variable has been included into the model either alone or as a member of a group-by-variable interaction term. If a given coefficient was found to be significant it was kept in the model, otherwise it was removed. A significant group-by-variable interaction coefficient suggests that this coefficient is different for each group and that this variable contributes to explaining the observed transfer effect. On the other hand, if it is the coefficient of a single variable that was found to be significant, it suggests that this coefficient is the same for both groups and thus, contributes to explaining the reading behavior of the subjects in general but, is not useful to explaining the transfer effect.
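To make the role of a group-by-variable interaction term concrete, here is a minimal sketch of how one row of regressors could be built. The dummy coding (1 for control, 0 for transfer), the function name, and the page values are assumptions made for illustration; the source does not specify how the groups were coded.

```python
from typing import Dict, List, Sequence


def design_row(
    page: Dict[str, float],
    is_control: bool,
    shared: Sequence[str],
    by_group: Sequence[str],
) -> List[float]:
    """Build one row of regressors for a (group, page) cell.

    `shared` variables enter alone, so their coefficient is common to both groups.
    `by_group` variables enter only as group-by-variable products, so their
    coefficient is the extra time for the group coded 1 (control, by assumption).
    """
    group = 1.0 if is_control else 0.0
    row = [1.0]  # intercept
    row += [page[name] for name in shared]
    row += [group * page[name] for name in by_group]
    return row


# Hypothetical page: 900 characters, 2 new words.
page = {"n_chars": 900.0, "new_words": 2.0}
print(design_row(page, True, ["n_chars"], ["new_words"]))
print(design_row(page, False, ["n_chars"], ["new_words"]))
```

With this coding, the two rows differ only in the interaction column, which is why a significant interaction coefficient points to a variable that separates the groups.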
The regression model has been fitted to the 134 mean cells of the experiment (2 groups x 67 pages/group). Results are shown in Figure 4. The variables which have entered the model are the number of characters (N of characters) and the number of lines (N of lines) in a page, the number of novel words (New words) and the number of examples (N of examples), the serial position of a page in a section (Position in section), and the serial position of a page in a lesson (Position in lesson). All these variables have been found to make a significant contribution to the model.
Four variables (N of characters, N of lines, Position in section, and Position in lesson) have entered the model as single variables while two variables (New words, N of examples) have entered it as interaction terms. This suggests that the time spent understanding the novel words (New words) and the examples (N of examples) found in the text is responsible for the transfer effects found in this experiment. Indeed, interpretation of the regression weights suggests that the subjects in the control group spend an additional 1.15 s for each novel word encountered in the text as compared to the transfer group. This result is higher than the 686 ms per novel word reported by Thibadeau et al. (1982). Similarly, subjects in the control group spend an additional 4.02 s for each example encountered in a page as compared to the transfer group.
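As a worked illustration of these two weights, the sketch below computes the extra reading time the model attributes to a control subject on a given page. Only the 1.15 s and 4.02 s weights come from the text; the function name and the page with two novel words and three examples are hypothetical.

```python
# Interaction weights reported for the control group relative to the transfer group.
EXTRA_S_PER_NEW_WORD = 1.15
EXTRA_S_PER_EXAMPLE = 4.02


def control_extra_time_s(new_words: int, examples: int) -> float:
    """Extra seconds the model predicts for a control subject on one page."""
    return EXTRA_S_PER_NEW_WORD * new_words + EXTRA_S_PER_EXAMPLE * examples


# Hypothetical page with 2 novel words and 3 examples.
print(f"{control_extra_time_s(2, 3):.2f} s")
```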
The other reading processes do not seem to have been affected by the transfer of knowledge, as all of the remaining variables which have entered the model did so as single variables and not as interaction terms. For instance, the 0.052 weight of the N of characters variable suggests that the subjects of both groups spend about 52 ms per character, which is, once again, relatively close to the 32 ms reported by Thibadeau et al. (1982). Similarly, subjects of both groups spend about 2.21 s at the end of a given line. Subjects also tend to decrease their reading speed as they progress through the lesson, but they increase it as they progress within a section of the text. Since the coefficients observed are the same for both groups, it must be concluded that subjects of both groups have adopted the same reading strategy.
The accuracy of this simple model which does not require a complex semantic analysis of the content of the texts is somewhat surprising. It explains about 70% of the variance of the means and reproduces the main irregularities found in the reading times.
Performance on Questionnaire. The previous analyses suggested that the subjects in the transfer group have developed a representation of the examples and of the new words in the text that enabled them to read the text faster. This suggests that this representation improves the efficiency of processes responsible for the acquisition of new knowledge. But is this faster reading of the text as efficient as the slower reading of the control group? The accuracy data obtained from the declarative questionnaire can be used to answer this question. An analysis of the accuracy data suggests that there was no difference in the accuracy of answering the declarative questionnaire. On average, subjects in the transfer group successfully answered 56.6% of the questions as compared to 55.5% for the control group [F(1, 28) = 0.02, MSe = 0.045, p < .88].
This suggests that subjects in the transfer group preferred to read the instructional text more rapidly instead of increasing their understanding of the content. This preference for speed over accuracy can also be inferred from the time they took to read and answer the questions of the questionnaire. Indeed, even though subjects in the transfer group are not more accurate, an analysis of variance done on the response time to the correct answers shows that the transfer group is faster to correctly answer those questions [F(1, 28) = 4.91, MSe = 715.03, p < 0.03]. On average, transfer and control groups respectively took 15.32 s and 19.47 s to correctly answer a question.
Problem-Solving Performance. Since the tutor gives students remediation and many problems may be necessary to reach mastery, one way transfer could be observed is in the number of problems solved by each group. However, there does not seem to be a difference between the two groups. On average, subjects in the control and transfer groups solved about the same number of problems [71.2 versus 69.9 problems; F(1, 28) = 0.33, MSe = 17.70, p < 0.57].
On the other hand, a large and significant effect on the mean time per problem was observed. On average, subjects in the control and transfer groups spend about 250.22 s and 185.37 s per problem respectively [F(1, 28) = 5.76, MSe = 136779, p < .023]. This represents an advantage of 64.85 s per problem, which is a saving of about 25.9% of the time spent by the control group. Cumulated over the 25 required problems, this 64.85 s advantage means a saving of almost half an hour (27 minutes) in favor of the transfer group.
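The savings quoted above follow directly from the two group means. A short sketch of the arithmetic, using only figures given in the text (the variable names are illustrative):

```python
control_s = 250.22      # mean time per problem, control group (s)
transfer_s = 185.37     # mean time per problem, transfer group (s)
required_problems = 25  # number of required problems

saving_s = control_s - transfer_s                    # advantage per problem
saving_pct = 100 * saving_s / control_s              # relative to the control group
total_min = saving_s * required_problems / 60        # cumulated over required problems

print(f"{saving_s:.2f} s per problem, {saving_pct:.1f}%, {total_min:.0f} min in total")
```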
Two hypotheses can be stated to explain the faster problem-solving times of the transfer group. The first relates to the transfer of identical rules and the other to the building of new rules. As said earlier, transfer of identical rules is assumed to be automatic and errorless, while building a new production rule in ACT involves analogical problem solving which may be unsuccessful. Consequently, it is important to know whether the differences between the groups come from successful or unsuccessful interactions. Moreover, we distinguished among three different types of unsuccessful interactions. First, there are help-and-error interactions. They are characterized by the fact that subjects have both made some errors and requested help. Second, there are error-only interactions. They suggest that the subjects have entered a problem-solving episode, have been able to find a solution to the problem by themselves, and have built some new production rule without requesting any help from the tutor. Third, there are help-only interactions. Since subjects request some help, it is assumed that the subjects failed to build a new production rule by themselves and that further declarative information had to be acquired from the tutor. So, in accordance with the general theory, direct procedural-procedural transfer would be observed as a greater number or a shorter duration of successful interactions. Effects in error-only interactions would be indicative of more efficient problem-solving capabilities and can be related to declarative-procedural transfer. On the other hand, differences in the duration of help-only interactions would be an indication that both procedural-procedural and declarative-procedural transfer have failed and should be interpreted as an instance of declarative-declarative transfer.
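The four interaction types described above can be expressed as a simple classification over the number of errors and help requests in an interaction. This is a sketch only: the function name and the idea of using raw counts as inputs are assumptions, not a description of how the tutor logs were actually coded.

```python
def classify_interaction(n_errors: int, n_help: int) -> str:
    """Classify one interaction by whether errors and help requests occurred."""
    if n_errors < 0 or n_help < 0:
        raise ValueError("counts must be non-negative")
    if n_errors == 0 and n_help == 0:
        return "successful"
    if n_errors > 0 and n_help > 0:
        return "help-and-error"
    if n_errors > 0:
        return "error-only"
    return "help-only"


# Hypothetical interactions given as (errors, help requests).
for counts in [(0, 0), (2, 1), (1, 0), (0, 3)]:
    print(counts, classify_interaction(*counts))
```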
This interpretation of successful and unsuccessful interactions did not envisage that procedural transfer could have resulted in the application of wrong rules. As said earlier, a side effect of procedural transfer is that a previously learned rule may have been instantiated in a context where a better rule should have been developed. Even if Prolog and Lisp do not share common rules, it may happen that, for some problems, the condition of some rules may be instantiated and the (wrong) transferred rule may be applied. In such a case, procedural transfer may lead to more errors for the transfer group. As a result, negative transfer effects where the control group outperformed the transfer group would have been observed. However, since the effects observed in the present study are largely positive, this interpretation of unsuccessful interactions has been discarded.
On the other hand, the transfer group takes 5.64 s to process each help message while the control group requires 12.14 s. Moreover, processing each error takes 6.24 s and 4.68 s for the control and transfer groups respectively. These differences are significant since two alternative models, which include help or error as a single predictor instead of an interaction term, were each found to be less accurate than the proposed model [F(2, 20892) = 20.82, MSe = 530.44, p < 0.01; F(2, 20892) = 121.52, MSe = 530.44, p < 0.01]. This suggests that the transfer advantage is due to faster processing of the help messages and errors by the transfer group. Figure 6 shows the amount of time subjects spend in the four types of interactions.
On the other hand, these faster processing times were not accompanied by greater accuracy in the answers. Figure 7 shows the distribution of successful, help-only, error-only, and help-and-error interactions for the control and transfer groups. There are no significant differences between the groups. Indeed, using the frequency of occurrence of each type of interaction cumulated over each subject as the dependent variable, an ANOVA reveals that neither the group effect [F(1, 28) = 0.001, MSe = 17988.215, p < 0.96] nor the group-by-type-of-interaction effect [F(3, 84) = 1.02, MSe = 3290.07, p < 0.39] is significant.
This experiment tested a number of assertions about the role of declarative knowledge in transfer between computer programming languages which are conceptually close but which apparently share no common procedural elements. The experiment supports the previously stated idea that when learning a domain of expertise, a declarative representation of this domain develops. At transfer time, the effect of this representation has been shown to be specific, limited, but predictable. Its role has been shown to be twofold. It helps integrate new declarative information, and it helps in building new procedural knowledge. Two kinds of evidence support this conclusion.
First, this has been inferred from an increase in the reading speed of the subjects. The increase has been found to be systematic. The information-processing model describing some of the reading processes of the subjects suggests that the effect is not generalized to all reading processes but appears to be specific to the reading processes which are responsible for the analysis of new programming words and the examples present in the text. This is consistent with the idea that the new words and examples are already represented in the memory of the subjects who have learned Prolog. When they encountered similar words and examples in the Lisp context, those subjects did not spend extra reading time in order to comprehend them.
Second, the tracing of the activities done by the subjects while solving various programming problems also suggests that the subjects in the transfer group spend less time solving problems. The effect was due to the fact that they were able both to process the help messages and to recover from errors more rapidly.
Experiment 2 will attempt to replicate the results of Experiment 1 and will test some supplementary hypotheses concerning the bases for the results in that experiment. For instance, one hypothesis is that the subjects' weak level of knowledge in programming has prevented the occurrence of more extensive declarative-procedural transfer or the procedural transfer of some abstract rules. Experiment 2 will overcome this limitation by considering more experienced programmers and by doubling the relative amount of practice of the base and transfer domains. It is generally accepted that experts in a domain have greater problem-solving or abstracting abilities. Consequently, the use of more experienced programmers may increase the amount of observed declarative-procedural or procedural transfer.
It will also be possible to test some other hypotheses regarding the transfer of declarative knowledge. For instance, it will be possible to test whether or not the effects observed on the first day of transfer in Experiment 1 are also observed on a second day of practice with the same material. According to the theory, once a skill has been proceduralized and fully mastered, there is little need to refer back to the instructions. Consequently, if practice on the first day is sufficient to produce complete mastery of the skills, the declarative representation should be of little help on a second day of practice with the same material.
Subjects. Twenty-eight subjects with prior experience in programming participated in the experiment and were randomly assigned to one of the two groups. All subjects had taken, at least, a one semester course in a conventional computer language such as Pascal, Fortran, or Basic.
Design. The design of the experiment is similar to that of Experiment 1, except that the amount of practice in both Lisp and Prolog has been doubled. The subjects in both groups had to study a given lesson twice before being allowed to pass to the next lesson. Each study of a given lesson was done on a different day. Subjects were required to read the instructional texts and redo the exercises with the tutor up to mastery on both study days. Consequently, a subject of the transfer group accumulates a total of 12 study days of programming as compared with 6 study days for the control group. Figure 8 presents the design of the experiment.
The subjects received a pretest for each section of a lesson and received a posttest at the end of a given lesson. In each section, the pretest questionnaire was administered after reading the text but before solving the exercises with the tutor. The same questionnaire was used both as pretest and posttest. Those tests were administered on all study days.
Procedure. The procedure used in Experiment 2 was also very similar to the one used in Experiment 1. For each Lisp lesson, the subjects first read the instructional text about the programming language. At the end of each section of the text, the subjects were prompted to determine whether or not they were ready to take a questionnaire about what they had just read. If they felt they were not, they could go back to the text and study it again. Otherwise, a set of four-alternative forced-choice questions was presented to them, one question at a time. The time taken to read and answer each question was recorded by the computer. Presentation of the Prolog lessons was identical. After having answered the questions, the subjects moved to the tutor to solve a set of problem solving exercises. At the end of each lesson, a posttest was administered to the subjects.
By day 2, the transfer effect seems to have completely disappeared. For instance, the overall difference between the groups is 1.1 s per page but the advantage is for the control group, an unexpected direction. Furthermore, the condition by page interaction fails to be significant [F(66, 1695) = 0.91 , MSe = 133.08, p < 0.68]. Figure 9 presents an overview of day 1 and day 2 data.
To further investigate the reading behavior of the subjects, the same information processing model as used in Experiment 1 has been fitted to these data on a day by day basis. Figure 10a presents the results for the first day. The model explains 64% of the variance among the means. Regression coefficients suggest that the subjects in the transfer group spend about 1.5 s and 2.57 s less time on each novel word and each example, respectively, than the control group. While the coefficient associated with the number of examples fails to reach significance, it replicates the direction of the significant effect in the previous experiment.
The subjects also show the same pattern of times on what has been called the basic text scanning processes (N of characters and N of lines), even if the coefficient associated with the number of characters fails to be significant on day 1. They also show the same strategic reading behavior as the subjects in Experiment 1. They tend to decrease their reading speed as they progress through the text, but this trend is somewhat compensated by the fact that they increase their reading speed within a given section.
Figure 10b presents the fit of the reading model to the data on day 2. The reading model remains an appropriate description of the reading behavior, explaining 70% of the variance of the means. The subjects, who have mastered the procedures corresponding to the text, show a somewhat different reading behavior. Instead of intensively searching for important parts of the text, they seem simply to scan each page rapidly, spending an equal amount of time per page. This is supported by a number of facts. First, the effect of novel words on reading times becomes small and actually negative. A negative coefficient supports the idea that the transfer effect observed on the first day has disappeared. Second, the coefficients of every variable, except the number of characters, have decreased. This suggests a more perfunctory scanning of the text which is less affected by the structure of that text.
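The shrinking group effect can be illustrated with a short sketch that plugs the coefficients of Figures 10(a) and 10(b) into a linear reading-time model. This is only an illustration under stated assumptions: the group variable is assumed to be coded 1 for the control group and 0 for the transfer group (a coding that is consistent with the reported direction of the effect but is not stated in the text), and the page used in the example is hypothetical.

```python
# Coefficients copied from Figures 10(a) and 10(b).
DAY1 = {"constant": -6.39, "group_x_new_words": 1.495, "group_x_examples": 2.565,
        "n_chars": 0.014, "n_lines": 2.543,
        "pos_in_section": -0.876, "pos_in_lesson": 0.804}
DAY2 = {"constant": -3.409, "group_x_new_words": -0.223, "group_x_examples": 1.222,
        "n_chars": 0.028, "n_lines": 0.713,
        "pos_in_section": -0.288, "pos_in_lesson": 0.394}


def predicted_reading_time(coef, page, is_control):
    """Predicted mean reading time (s) for one page under the linear model."""
    # ASSUMPTION: group is dummy-coded 1 = control, 0 = transfer.
    g = 1.0 if is_control else 0.0
    return (coef["constant"]
            + g * coef["group_x_new_words"] * page["new_words"]
            + g * coef["group_x_examples"] * page["examples"]
            + coef["n_chars"] * page["n_chars"]
            + coef["n_lines"] * page["n_lines"]
            + coef["pos_in_section"] * page["pos_in_section"]
            + coef["pos_in_lesson"] * page["pos_in_lesson"])


def group_gap(coef, page):
    """Control minus transfer predicted time (s) for the same page."""
    return (predicted_reading_time(coef, page, True)
            - predicted_reading_time(coef, page, False))


# HYPOTHETICAL page: 3 novel words, 1 example, 900 characters, 20 lines.
page = {"new_words": 3, "examples": 1, "n_chars": 900, "n_lines": 20,
        "pos_in_section": 2, "pos_in_lesson": 5}

print(f"day 1 gap: {group_gap(DAY1, page):.2f} s")
print(f"day 2 gap: {group_gap(DAY2, page):.2f} s")
```

Under these assumptions the predicted group difference for such a page is about 7 s on day 1 and well under 1 s on day 2, since only the two group interaction terms differ between the groups.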
Declarative questionnaires. Administration of the Lisp declarative questionnaire before and after each section of a lesson on each day provides supplementary data to investigate the learning behavior of the subjects. Results (Figure 11) show that subjects in both groups can learn from the pretest to the posttest [F(1, 26) = 11.32, MSe = 0.009, p < 0.002] and from day to day [F(1, 26) = 15.72, MSe = 0.002, p < 0.001]. However, even though the practice has been doubled in the present experiment, no transfer effect on the ability of the subjects to correctly answer the questions has been found. For instance, no overall difference between the groups [F(1, 26) = 0.001, MSe = 0.0185, p < .967] has been found in the total scores. Moreover, the differences between the groups at pretests or posttests, or at day 1 and day 2, also failed to be significant as indicated by non-significant group by testing condition [F(1,26) = 0.08, MSe = 0.0009, p < 0.78], group by day [F(1,26) = 2.95, MSe = 0.002, p < 0.097] and group by testing condition by day interactions [F(1,26) = 0.46, MSe = 0.0008, p < 0.50].
The failure to find significant differences between the groups cannot be attributed to the idea that the transfer group had not fully mastered the Prolog material at transfer time. Results from the declarative questionnaire on Prolog administered to the transfer group suggest that this group mastered about 94% of the Prolog material.
So, the same speed-accuracy trade-off as observed in Experiment 1, between the time spent studying the instruction and understanding of the content of the text, still seems to be present in this experiment. Subjects in the transfer group prefer to read the instructional texts more rapidly instead of increasing their understanding of the text.
Indeed, the transfer group is faster than the control group to answer the questions [F(1,26) = 4.59, MSe = 24.31, p < 0.04]. On average, the transfer group takes 9.50 s to answer a question as compared to 11.50 s for the control group. Even though the answering speed increases from the pretests to the posttests [F(1,26) = 273.93, MSe = 4.38, p < .001] and from day to day [F(1,26) = 171.17, MSe = 5.64, p < 0.001], the advantage of the transfer group remains the same, as none of the group by day [F(1,26) = 1.95, MSe = 5.64, p < 0.17], group by testing condition [F(1,26) = 2.28, MSe = 4.38, p < 0.14], and group by day by testing condition interactions [F(1,26) = 3.03, MSe = 2.94, p < 0.09] was found significant.
Problem solving. The hypothesis raised by Experiment 1 that the subjects' weak level of knowledge in programming has prevented the occurrence of more extensive declarative-procedural or procedural transfer has been tested but not supported. Indeed, analysis of problem solving suggests that increasing practice and the subjects' level of knowledge tend to decrease the amount of observed declarative-procedural transfer. First, both groups solved about the same number of problems in order to reach the learning criteria, as the group [F(1,26) = 0.50, MSe = 39.15, p < 0.48] and group*day variables [F(1, 26) = 0.84, MSe = 11.06, p < .37] failed to be significant. This suggests that transfer has no impact on the amount of remediation needed. Control and transfer groups have respectively solved 37.0 and 35.0 problems per subject on day 1 and 31.1 and 30.7 problems on day 2. On the other hand, the improvement from day to day has been found significant [F(1,26) = 32.30, MSe = 11.06, p < 0.001], indicating that the amount of remediation needed to master the skills has significantly decreased as a result of practice.
Also, transfer had no significant impact on problem solving times. To ensure that the same problems are compared for both groups, only the required problems have been included in this analysis. On day 1, the mean times to solve a problem were 102.7 s and 120.5 s for the transfer and control groups respectively. This advantage represents a saving of about 17.8 s, which is about 14.8% less than the time taken by the control group, but this advantage failed to be significant as a main effect [F(1,24) = 1.72, MSe = 32428.2, p < 0.2013]. This suggests that a greater amount of initial knowledge does not increase the amount of declarative-procedural transfer. Analysis of day 2 suggests that the transfer effect is even weaker once the knowledge has been proceduralized. Indeed, on day 2, the mean times per problem are 65.4 s and 69.5 s for the transfer and control groups respectively, and the difference also fails to be significant [F(1,24) = 0.41, MSe = 7421.2, p < 0.53].
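The savings figures can be checked with a few lines of arithmetic. The sketch below uses only the mean times reported above. The helper name `saving` is illustrative, and the day 2 percentage is derived here from the reported means rather than quoted from the text.

```python
def saving(control_s, transfer_s):
    """Return (absolute saving in s, saving as % of the control group's time)."""
    diff = control_s - transfer_s
    return diff, 100.0 * diff / control_s


# Mean times per problem (s) as reported in the text.
day1_abs, day1_pct = saving(control_s=120.5, transfer_s=102.7)
day2_abs, day2_pct = saving(control_s=69.5, transfer_s=65.4)

print(f"day 1: {day1_abs:.1f} s saved ({day1_pct:.1f}% of control time)")
print(f"day 2: {day2_abs:.1f} s saved ({day2_pct:.1f}% of control time)")
```

On these means the relative advantage of the transfer group falls from roughly 15% on day 1 to roughly 6% on day 2, in line with the claim that the effect weakens once the knowledge has been proceduralized.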
We have also provided supplementary evidence for the idea that examples are a key factor in developing a skill: learning a prior programming language facilitates the processing of examples. It has already been shown that learners perform better in a procedural task when the explanatory text is elaborated with examples instead of having no elaboration (Reder, Charney, & Morgan, 1986), that the building of procedural knowledge is supported by analogical reasoning based on examples extracted from the instructions (Anderson, 1987), and that good and poor students use examples to produce either efficient or less efficient self-explanations that are assumed to subsequently lead to the building of inference rules used in problem solving (Chi, Bassok, Lewis, Reimann, & Glaser, 1989). The present paper adds to this work on the role of examples in skill development by suggesting that learners develop a representation of the examples found in the text and that this representation not only supports problem solving and inference mechanisms, but also supports the acquisition of new knowledge by being activated when reading the instructions in a new domain.
However, the effect of the declarative representation is not as general as might be expected. We have shown that the declarative representation had an effect only on the reading times of specific words, while general and strategic reading processes remain unaffected. The effect on problem solving is also limited. It seems mainly localized in reading the help messages and error messages provided by the tutor. Moreover, procedural transfer appears to be negligible from Prolog to Lisp. Such a conclusion was expected since the procedural representations of these two languages as found in the tutoring environments do not share a set of common rules.
This research has a number of implications for the design of computer systems, although some of these implications are already part of the common sense of the field. For instance, this research suggests that interfaces for different applications should foster the development of declarative representations that are as similar as possible. This can be seen as an extension of the idea that consistency of use of interfaces is an important factor for ease of learning and ease of use. This is because consistent interfaces are assumed to share common procedural representations (Polson, 1988). Since it has been shown that having common declarative representations has an impact on reading, another conclusion of this research is that all reading material, including vocabulary, should be organized consistently and used consistently in instruction, examples, error messages and help messages. Also, it is reasonable to think that when mastering a complex domain, the lack of appropriate documentation may prevent any declarative transfer and consequently increase the time needed to master the domain.
An important conclusion of this research is that once knowledge is proceduralized, there is no further need for declarative instruction. In a transfer situation this conclusion suggests that there is no need to consider the role of declarative instruction in situations where a fair amount of procedural transfer is expected. So, we may believe that the importance of declarative representations decreases as the overlap of procedural representations increases. An instance of this situation would be to transfer from an English version of an iconic interface to a similar iconic interface operating in a foreign language. Since the iconic interface is language free, the procedural representations will be activated regardless of the language of the instructions and of the other declarative messages.
A more complex situation is the one that has been faced in the present paper. It is indeed somewhat challenging to predict transfer in domains where there is apparently little or no overlap between the procedural representations but where some relationship can be found in the declarative representations of these domains. The present paper has demonstrated that there is some transfer of the basic vocabulary and examples, but it is conceivable that the learner also develops relatively accurate representations of other aspects of the situation that might be useful in some contexts. For instance, the learner may also develop knowledge of the way information is organized in a particular domain. This declarative knowledge might be useful in a context where the task requires an active search for information. This occurs in a domain like accounting where one has to verify the financial statements of very different institutions and industries. Here, the procedures to review the statements are very abstract and their applications involve specific declarative knowledge about the given industry. Moreover, evaluation of the statements may involve a comparison with similar institutions or industries and this information is constantly changing. Acquiring this knowledge may require a constant search for information that might be complex to perform. Knowledge of the structure of information sources then becomes very important in order to reduce the complexity of the search.
Anderson, J. R. (1983). The architecture of cognition. Cambridge, MA: Harvard University Press.
Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem solutions. Psychological Review, 94, 192-210.
Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ: Erlbaum.
Anderson, J. R., Conrad, F., Corbett, A. T., Fincham, J., Hoffman, D., & Wu, Q. (1993). Computer programming and transfer. In J. R. Anderson (Ed.), Rules of the Mind. Hillsdale, NJ: Erlbaum.
Ausubel, D. P. (1963). The psychology of meaningful verbal learning. New York: Grune & Stratton.
Ausubel, D. P. (1978). In defense of advance organizers: A reply to critics. Review of educational research, 48, 251-257.
Bovair, S., & Kieras, D. E. (1991). Toward a model of acquiring procedures from text. In Rebecca Barr, Michael L. Kamil, Peter B. Mosenthal, P. David Pearson (Eds.), Handbook of reading research, Volume II (pp. 206-229). New York: Longman.
Bovair, S., Kieras, D. E., & Polson, P. G. (1990). The acquisition and performance of text-editing skill: A cognitive complexity analysis. Human-Computer Interaction, 5, 1-48.
Brooks, L. W., & Dansereau, D. F. (1987). Transfer of information: An instructional perspective. In Stephen M. Cormier and Joseph D. Hagman (Eds.), Transfer of learning: Contemporary research and applications (pp. 121-150). New York: Academic Press.
Catrambone, R., & Holyoak, K. J. (1989). Overcoming contextual limitations on problem solving transfer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 1147-1156.
Chi, M. T. H., Bassok, M., Lewis, M. W., Reimann, P., & Glaser, R. (1989). Self-explanations: How students study and use examples in learning to solve problems. Cognitive Science, 13, 145-182.
Cormier, S. M., & Hagman, J. D. (Eds.). (1987). Transfer of learning: Contemporary research and applications. New York: Academic Press.
Dixon, P., & Gabrys, G. (1991). Learning to operate complex devices: Effects of conceptual and operational similarity. Human Factors, 33, 103-120.
Fong, G. T., & Nisbett, R. E. (1991). Immediate and delayed transfer of training effects in statistical reasoning. Journal of experimental psychology: General, 120, 34-45.
Gick, M. L., & Holyoak, K. J. (1983). Schema induction and analogical transfer. Cognitive psychology, 15, 1-38.
Gick, M. L., & Holyoak, K. J. (1987). The cognitive basis of knowledge transfer. In Stephen M. Cormier and Joseph D. Hagman (Eds.), Transfer of learning: Contemporary research and applications (pp. 9-46). New York: Academic Press.
Gray, W. D., & Orasanu, J. M. (1987). Transfer of cognitive skills. In Stephen M. Cormier and Joseph D. Hagman (Eds.), Transfer of learning: Contemporary research and applications (pp. 183-215). New York: Academic Press.
Harvey, L. (1990). Systèmes de productions et interaction humain-ordinateur. Thèse de doctorat, Université Laval, Québec.
Harvey, L., & Rousseau, R. (in press). Text-editing skills: Predicting the effects of semantic and syntactic mappings. Human-Computer Interaction, in press.
Hayes, J. R., & Simon, H. A. (1977). Psychological differences among problem isomorphs. In J. Castellan, D. B. Pisoni, and G. Potts (Eds.), Cognitive theory (Vol. 2, pp. 21-44). Hillsdale, NJ: Erlbaum Associates.
Johnson, W., & Kieras, D. E. (1983). Representation-saving effects of prior knowledge in memory for simple technical prose. Memory and Cognition, 11, 456-466.
Kieras, D. E., & Bovair, S. (1984). The role of a mental model in learning to operate a device. Cognitive Science, 8, 255-273.
Kieras, D. E., & Bovair, S. (1986). The acquisition of procedures from text: A production system analysis of transfer of training. Journal of Memory and language, 25, 507-524.
Laird, J. E., Newell, A., & Rosenbloom, P. S. (1987). SOAR: An architecture for general intelligence. Artificial Intelligence, 33, 1-64.
Polson, P. G. (1987). A quantitative theory of human-computer interaction. In J. M. Carroll (Ed.), Interfacing thought: Cognitive aspects of human-computer interaction (pp. 184-235). Cambridge, MA: Bradford Books/MIT Press.
Polson, P. (1988). The consequences of consistent and inconsistent user interfaces. In Raymonde Guindon (Ed.), Cognitive science and its applications for human-computer interaction (pp. 59-108). Hillsdale, NJ: Erlbaum Associates.
Reder, L. M., Charney, D. H., & Morgan, K. I. (1986). The role of elaborations in learning a skill from an instructional text. Memory & Cognition, 14, 64-78.
Royer, J. M. (1979). Theories of the transfer of learning. Educational Psychologist, 14, 53-69.
Royer, J. M. (1986). Designing instruction to produce understanding: An approach based on cognitive theory. In G. D. Phye & T. Andre (Eds.), Cognitive classroom learning: Understanding, thinking, and problem solving (pp. 83-113). Orlando, FL: Academic Press.
Salomon, G., & Perkins, D. N. (1987). Transfer of cognitive skills from programming: When and how? Journal of educational computing research, 3, 149-169.
Scholtz, J., & Wiedenbeck, S. (1989). Learning second and subsequent programming languages: a problem of transfer. Technical report #80 of Department of Computer Science and Engineering. University of Nebraska.
Singley, M. K., & Anderson, J. R. (1985). The transfer of text-editing skill. International Journal of Man-Machine Studies, 22, 403-423.
Singley, M. K., & Anderson, J. R. (1988). A keystroke analysis of learning and transfer in text-editing. Human-Computer Interaction, 3, 223-274.
Singley, M. K., & Anderson, J. R. (1989). The transfer of cognitive skill. Cambridge, MA: Harvard University Press.
Thibadeau, R. H., Just, M. A., & Carpenter, P. A. (1982). A model of the time course and content of reading. Cognitive science, 6, 157-203.
Wu, Q., & Anderson, J. R. (in preparation). Problem solving transfer among programming languages.
A typical page that can be part of both Lisp and Prolog instructional manuals
In this chapter, we will introduce what is one of the most useful data structures in prolog, the list. The list can be defined as an ordered sequence of elements separated by commas, and enclosed by a pair of square brackets([ and ]). The following are examples of lists:
The second basic entity, the list, is an important structure that gives Lisp much of its power because it allows the grouping of symbols. A list is a sequence of expressions enclosed in a pair of parentheses. The following are examples of lists:
Group       Phase 1 (Prolog)    Phase 2 (Lisp)
_____________________________________________________
Control     ------              Lessons 1,2,3
Transfer    Lessons 1,2,3       Lessons 1,2,3
_____________________________________________________
Results of fitting an information processing model on mean reading times. Overall fit of the model (R2=0.70, number of data points = 134).
Variable               Coefficients   t Statistics   Probabilities
Constant                  -3.89          -0.91           0.36
Group x New Words          1.149          4.18           0.001
Group x Examples           4.022          2.95           0.004
N of Characters            0.052          4.25           0.000
N of Lines                 2.205          3.43           0.001
Position in section       -1.31          -3.52           0.001
Position in lesson         0.89           5.91           0.000
Estimation of help and errors processing times (R2=0.265, number of data points = 20,896)
Variable                Coefficients   t Statistics   Probabilities   Standard error of estimates
Constant                   10.76          44.07           0.001           0.244
Group (correct code)
    Control                -0.17          -0.50           0.620           0.352
    Transfer                0.00
Help*group
    Control                12.14          38.73           0.001           0.314
    Transfer                5.64          19.70           0.001           0.286
Errors*group
    Control                 6.24          28.40           0.001           0.187
    Transfer                4.68          34.33           0.001           0.165
Mean number of successful, help only, error only, and error and help interactions for the control and transfer groups.
____________________________________________________________________________
Groups      Successful   Help-Only   Error-Only   Error-and-Help
Control       456.4         39.5        90.8          107.0
Transfer      483.5         25.7        86.9           92.3
____________________________________________________________________________
Experimental design of Experiment 2
Each lesson is presented on a different day.
______________________________________________________
Group       Phase 1 (Prolog)             Phase 2 (Lisp)
Control     ------                       Lessons 1, 1, 2, 2, 3, 3
Transfer    Lessons 1, 1, 2, 2, 3, 3     Lessons 1, 1, 2, 2, 3, 3
______________________________________________________
Figure 9
Figure 10(a)
Results of fitting an information processing model on first day mean reading times. (R2=0.64, number of data points = 134).
Variable               Coefficients   t Statistics   Probabilities
Constant                  -6.39          -1.55           0.123
Group x New Words          1.495          5.56           0.000
Group x Examples           2.565          1.29           0.197
N of Characters            0.014          1.23           0.219
N of Lines                 2.543          4.13           0.001
Position in section       -0.876         -2.44           0.016
Position in lesson         0.804          5.41           0.000
Figure 10(b)
Results of fitting an information processing model on second day mean reading times. (R2=0.70, number of data points = 134).
Variable               Coefficients   t Statistics   Probabilities
Constant                  -3.409         -2.16           0.032
Group x New Words         -0.223         -2.17           0.032
Group x Examples           1.222          1.62           0.109
N of Characters            0.028          6.30           0.000
N of Lines                 0.713          3.02           0.003
Position in section       -0.288         -2.10           0.038
Position in lesson         0.394          6.92           0.000
                      Conditions
Group      Day     Pre     Post     Total
_________________________________________
Control    1       .81     .84      .82
           2       .86     .87      .865
_________________________________________
Transfer   1       .81     .85      .83
           2       .85     .85      .85
_________________________________________
Figure 12(a)
Estimation of help and errors processing times on first day (R2=0.51, number of data points = 11,102)
Variable                Coefficient    t Statistics   Probabilities   Standard error of estimates
Constant                   11.54          22.27           0.001           0.387
Group (correct code)
    Control                -0.76          -1.06           0.291           0.721
    Transfer                0.000
Help*group
    Control                18.93          24.81           0.001           1.442
    Transfer               18.47          15.16           0.001           1.110
Errors*group
    Control                23.98          39.20           0.001           0.842
    Transfer               15.42          29.44           0.001           0.632
Figure 12(b)
Estimation of help and errors processing times on second day (R2=0.42, number of data points = 8,220)
Variable                Coefficient    t Statistics   Probabilities   Standard error of estimates
Constant                    8.992         23.21           0.001           0.518
Group (correct code)
    Control                -0.60          -1.10           0.620           0.721
    Transfer                0.00
Help*group
    Control                10.27           7.13           0.001           0.763
    Transfer               25.89          23.38           0.001           1.218
Errors*group
    Control                33.18          39.32           0.001           0.612
    Transfer               18.26          28.89           0.001           0.524