Mastering 'The Tolstoy Effect': a research exercise in linguistic philosophy

The research design and methodology are outlined in order to contextualise the key aspects of the process for developing the Temple Index of Functional Fluency (TIFF), namely the harnessing of the creativity of many people and the collaborative style of working with ongoing piloting and evaluation. The social nature of this process is shown to demand linguistic exploration of shared meanings at every stage. The results of the data analyses are presented to demonstrate how the quantitative data provided the essential framework for the qualitative search for the evidence of meaningfulness that supports and illustrates the validity and reliability of TIFF. This search for evidence can be seen to illuminate, and be illuminated by, transactional analysis ego state theory and is characterised by 'The Tolstoy Effect'.


Introduction
Functional Fluency is the art and skill of interpersonal effectiveness. Interpersonal effectiveness is fundamentally about communication and therefore the finding of ways to share meanings through language; hence the title of this paper.
The matters summarised above in the Abstract have been found to have great importance in terms of the positive parallel process between the nature of the research journey and the process of using TIFF for personal development work with clients. Genuine collaboration and sharing of ideas were key characteristics of the former and are the chief factors that fuel the dynamics of the latter. They are key to the empowerment that TIFF offers clients, and evidence of the veracity of McLuhan's (1964) famous phrase, "The medium is the message". This paper takes the form of a structured reflection on the research process to highlight the connections between these matters and their significance for the claim that TIFF is a research-based personal development tool.

Stages of the research
In order to achieve the research aims, i.e. to find out a) if the Functional Fluency model could be validated as the basis for the development of TIFF and b) how well TIFF would work as a personal development tool, it was vital that the model construction was as thorough as possible to provide a sound basis. The development of any measure of human attributes, TIFF included, requires a particular set of stages (Lanyon & Goodstein 1997):
• Definition of constructs. What psychological constructs need to be measured and why must they be unidimensional? Task: conceptualise the constructs.
• Manifestation of the constructs. What behaviours would evidence them? Task: generate behavioural indicators.
• Construction of the measure. What sort of questions would elicit useful answers? What sort of scaling would provide useful information? Task: decide how the test will work.
• Scoring method. How will the results be relevant and useful? Task: pilot and test the workability.
• Statistical analysis. What do the results show? Task: analyse the results, testing for validity and reliability.
• Presentation of Results. How can the results be used? Task: evaluate the usefulness of the measure.
Table 1 shows how the Functional Fluency research study followed this pattern. Stage 1 was particularly important for laying a reliable foundation of unidimensional constructs that were defined both precisely and richly. Choosing the words for these definitions demanded accurate differentiation between subtle meanings and deep understanding of the nature of the constructs of the Functional Fluency model. The collating of many people's ideas, and the designing and using of a logical framework for making the choices, took many months, but was well worth it for the eventual gathering of the Pilot data.
International Journal of Transactional Analysis Research Vol 4 No 1, January 2013 www.ijtar.org page 86
The 302 participants were all people whose work was with people. The 20 sub-groups came from a variety of contexts within the overall population. The aims were to make a data collection in order to do relevant statistical analyses to demonstrate reliability and validity, investigate the effect of personal variables and refine all materials. Findings were cross-checked with triangulation studies wherever possible.
The set of stages for the Functional Fluency doctoral study included a seventh that indicated the follow-up studies undertaken after the main study.

The Methodological Process
The methodological process was designed to proceed in a series of separate stages, each one unfolding from the one before. Each stage evolved according to the findings from each data collection. MacGilchrist, Myers and Reed (1997), Kemmis and McTaggart (1982) and Elliott (1981) have proposed similarly reflexive processes. What these processes have in common is a spiralling flow of action that includes having a general plan to embark on, a study of some sort, then monitoring and analytical procedures from which emerge revised ongoing plans for the next stage. Such a pattern is repeated in order to maintain relevant and well-considered decision-making in the cause of the creation of a high-quality project. In this way the study is informed by both theory and empirical evidence, rather than only being driven by the theory (Denscombe 1998). The Functional Fluency Index (FFI) study was also enriched by the involvement of many people who contributed their knowledge, experience and creativity and thus widened the perspective of the developing instrument. This contributed to its eventual efficacy and its acceptability to a very wide range of users.
The FFI Project Outline above gives an overview of the Project agenda. At every stage of the project process careful attention was given to issues of reliability and validity, following principles cited by Neuman (1994 p 129):

1. Clear conceptualisation of constructs.
Construct conceptualisation for this study, a vital foundation, in fact took place over some years prior to the start of the project. This entailed the development and trialling of the Functional Fluency model used as the basis of the study (Temple 1990). Similarly the pool of descriptors used in stage one of the study was generated and developed over a period of years using a wide variety of sources.

2. Use of multiple indicators.
Using information from the initial Descriptor Sort Exercise, it was decided to use twelve indicators to measure each of the nine constructs in the theoretical model. "Reliability tends to increase as the number of items in a combination increases" (Nunnally 1978 p 67).

3. Precise levels of measurement.
Indicators were measured at ordinal level, using six categories of refinement.

4. Use of pilot tests.
Major pilot tests were a key aspect of the project design. As well as this, the principle of piloting was put into practice whenever feasible to improve the effectiveness of instructions, layout and design of exercises.

Nunnally (1978) claims that "To the extent to which measurement error is slight, a measure is said to be reliable" (p191). The efforts cited above were intended to reduce systematic error or bias that would produce skewed results. Attention was therefore paid both to the actual structure and matter of the measure (the content) and to the process of creating the measure stage by stage. Random errors can never be completely eliminated, but the attention paid to consistency of organisation and appropriateness of development processes was intended to increase the stability of the final instrument.
Attention was paid throughout the study to the measurement validity, to make as good a fit as possible between the constructs and the operationalised indicators thereof. As Neuman (1994) writes, "Measurement validity refers to how well the conceptual and operational definitions mesh with each other" (p130) in the cause of the creation of what he calls a "true measure". Different aspects of measurement validity were given particular attention during different stages of the study:
• Face validity. The design of the Descriptor Sort Exercise set out to show that the constructs of the model had high face validity, by seeking the rate of consensus amongst a high number and wide range of expert judges as to their description.
• Content validity. As above, the range of consensus of the judges gave a wide and comprehensive description of each construct.
• Construct validity. The factor analysis in part two of the study showed how consistent the multiple indicators were for each construct. Convergent and divergent validities were also demonstrated at this stage.
High internal validity was aimed for through thorough validation exercises and careful item development in order to reduce systematic error.
The FFI is a measurement based on a rational-theoretical strategy of construction (Lanyon & Goodstein 1997). It is, to use their terminology (p 58), "congruent with a particular theoretical view" of human psychological functioning, that of Transactional Analysis, and is designed to assess concepts within that theory. The creation and selection of test stimuli adhered consistently to the demands of the theoretical model, while using a combination of rational and intuitive strategies in their development. This was consistent with what Lanyon and Goodstein (1997) call "state of the art method" (p 119), in which a measurement instrument employs a rational and/or theoretical basis for initial item development, and follows up with the use of empirical and factor-analytic methods for the process of item refinement.
The behaviour pattern profiling of the FFI was achieved by means of a self-report questionnaire. It measured the usage of the various modes of behaviour featured in Temple's (1999) expanded model of human functioning. This Functional Fluency model acted, according to Nunnally (1978), as "an internally consistent plan for seeking a good scaling of an attribute". Nunnally continues, "Having a plan increases the probability of finding an acceptable measure" (p 31).

The 'Tolstoy Effect'
"All happy families resemble one another, but each unhappy family is unhappy in its own way".
The opening sentence of 'Anna Karenina' by Leo Tolstoy, 1875

Throughout each of the project stages there were linguistic issues to tussle with. The Descriptor Sort exercise in stage one demanded of participants that they choose which Modes the descriptors best described. All had preparation for doing the task, but of course the choice depended on the meanings attributed by the participants to the words, and their understandings of the nature/meaning of each of the Modes. The sorting of the huge matrix created by the exercise was another demand for linguistic discernment that enabled deeper understanding of the Modes. Interestingly, in every case it seemed harder to choose descriptors for the positive Modes than for the negative Modes.
At this stage and again in stages two and three, generating behavioural indicators and creation of test items, there was a clear difference in understanding and expression of the characteristics of the positive Modes compared with those of the negative Modes. The phenomenon was highlighted again at the data analysis stage, dealing with the hundreds of TIFF pilot profiles. This was the point at which the phenomenon was named the "Tolstoy Effect".
In particular the patterns of human behaviour shown in the pilot profiles provided evidence of the Tolstoy Effect from several perspectives. In each case the positive Modes of behaviour exhibited greater integrated wholeness, while the negative Modes manifested more variance and therefore greater fragmentation. The coherent blending of the positive Modes made it hard to differentiate between their various separate aspects, whereas with the negative Modes the various separate aspects seemed to stand out more clearly. There was therefore a clear pattern of differentiation between the positive and the negative Modes in terms of how easy or difficult it was to identify, understand and express their respective characteristics.
Tolstoy's creative assertion conveys a depth of meaning that speaks poignantly to the human condition. It is a fact that when things are going well there is little motivation to examine the way that this is happening -"If it ain't broke, don't mend it" as we say. It is only when something goes wrong that people want to find out how and why. The urge to diagnose the negative is strong; there may be evolutionary imperatives at work. Noticing and interpreting something untoward in the natural or social environment may always have been an important survival reflex. When this is a very strong cultural habit, however, negative reinforcement is common, but positive reinforcement often gets left out. This was a crucially important factor to understand for the development of TIFF, in which the intention was to reinforce and make familiar the positive options for behaviour by describing and explaining them in as much detail as the negative options. Appreciation for doing well, and understanding of how to do it, is a serious motivator for behavioural change which was planned as a priority for the use of TIFF.

Data Analysis - Aims and Objectives
Firstly, the overall aim of piloting the FFI Questionnaire with a large enough appropriate sample to provide suitable data for construction of a norm was fully achieved.
There were 302 respondents in the Pilot sample, all human service practitioners of one sort or another. This meant that all could be expected to manifest above average emotional literacy and thus score higher on the positive Modes and lower on the negative Modes. This would be one way to test the validity of TIFF. There were 20 groups within the sample with a wide variety of professional focus. This in turn meant that TIFF could be tested by seeing how the Average Group Profiles varied and whether the variations made sense and could be explained by using TA theory and/or social reality norms.
The main objectives of the range of analyses undertaken were to investigate the results in order to:
• Illuminate how the instrument operationalises the theory behind the model.
• Present evidence of how the instrument portrays respondents' characteristics.
• Examine the effectiveness of the instrument in order to identify ways to improve it.
These objectives, though closely linked, were oriented in different directions. The data provided a central source of information illuminating both the world of theoretical ideas and the world of concrete reality.
The data analysis was a "systematic, in-depth inquiry" of the sort claimed by Gregory (2000 p 156) to deliver scientific answers through the diligent efforts of researchers to "distinguish the pattern into which facts (phenomena) fall and their succession, and as every science does, look for hypotheses that give coherence to the pattern (de Chardin 1970)".

Construction of the Profiles and the Pilot Norm
In order to facilitate these considerations and promote coherence of the analytical inquiry, a decision was made to collate and present the test results using an adjusted version of the profile format used in the instrument feedback materials for respondents. This helped to make the patterns contained in the results immediately visible and comparable, and meant that a comprehensive range of statistics could be displayed simultaneously. The same format was used for both individuals' results and for group average results. To these ends the following Group Profiles were created:
• Average Total Pilot Profile (N=302)
• Average Form A Profile (N=177) and Average Form B Profile (N=125)
• Average Pilot Group Profiles: 3 top-scoring, 3 middling-scoring and 3 bottom-scoring
• Average Profiles for 2 Gender Groups, 6 Age Groups and 3 Levels of Professional Responsibility Groups
Individual respondents' profiles were also examined and compared when relevant to explore scoring significances, for instance when there were exceptional patterns or anomalies.
The outcome of the systematic qualitative exploration of the quantitative data was that there was considerable evidence to demonstrate the aptness and coherence of ego state theory and its consistency with the Functional Fluency model theory. Group by group and person by person, there was also substantial evidence for claiming the accuracy of the data with regard to how they portrayed the characteristics of the respondents. As intended, the bonus of such a detailed exercise was the information needed for instrument refinement.
The summary descriptive statistics of central tendency and dispersion were used to create the Averages of the Group Profiles in order to express the sum of the scores obtained from the 302 respondents in the Pilot Study. After careful consideration, the mean was judged to be the most suitable choice for expression of central tendency, or average. Key to having confidence in the data so produced, as above, was the creation of the Average Total Pilot Profile and the results of its rigorous testing.
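As an illustration of how such Average Group Profiles might be assembled, the following is a minimal sketch only. It assumes that each respondent's profile is a list of nine Mode scores and that each respondent carries a single group label; the function names and data layout are hypothetical and are not taken from the study.

```python
import statistics
from collections import defaultdict


def average_profile(profiles):
    """Return (means, sample SDs) per Mode for a list of individual profiles.

    Each profile is a sequence of Mode scores in a fixed order (assumed layout).
    """
    columns = list(zip(*profiles))  # one column of scores per Mode
    means = [statistics.fmean(col) for col in columns]
    # A sample SD needs at least two respondents; report 0.0 for a single case.
    sds = [statistics.stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, sds


def group_profiles(records):
    """Group (label, profile) pairs and average each group's profiles."""
    by_group = defaultdict(list)
    for label, profile in records:
        by_group[label].append(profile)
    return {label: average_profile(ps) for label, ps in by_group.items()}
```

Calling `group_profiles` with one record per respondent gives the central tendency and dispersion for every group in the same format, so that group averages can be laid side by side in the way the profile format above allows.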

Comparison of the Pilot Data with That of a Theoretical Population Answering at Random
A Monte Carlo method was used to create a theoretical ('phantom') population of 10,000 cases. Computer-generated random scoring on all the 108 variables produced a theoretically random Profile of the nine Modes, the Average Phantom Profile. Using the same scoring mechanisms as in the FFI Pilot, results were produced to show the distribution of the 10,000 phantom FFIs, for comparison with the Pilot results. The figure below shows the difference in the respective means and the amount of scoring overlap.
This exercise provided firm evidence that the questionnaires were producing a genuine result rather than a random one. It can be seen from the above figure that the overlap of Pilot results with the Phantom Population is very small with the FFI mean falling outside the range of the Phantom Population scores, thus indicating that the actual Pilot population was not answering randomly. A t-test for the equality of means showed that the means of the Pilot FFI and the Phantom population FFI were different at p<.001. The fact that the Pilot population's scores clustered round the mean of 2.42 rather than the 1.54 of the randomly generated Phantom Population demonstrated that the phenomenon concerned the Pilot population's characteristics and was not simply a regression effect.
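The phantom-population comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: item responses are assumed to be coded 1 to 6 and drawn uniformly at random, a Mode score is assumed to be the mean of its twelve items, and Welch's variant of the t-test is used because the text does not say which variant was applied. The FFI scoring mechanism itself is not given in the text, so the sketch does not attempt to reproduce the reported means of 2.42 and 1.54.

```python
import math
import random
import statistics

N_MODES = 9           # nine Modes in the model (from the text)
ITEMS_PER_MODE = 12   # twelve indicators per construct (from the text)
N_CATEGORIES = 6      # six ordinal categories (from the text); 1..6 coding is assumed


def phantom_case(rng):
    """One simulated respondent answering all 108 items uniformly at random."""
    return [[rng.randint(1, N_CATEGORIES) for _ in range(ITEMS_PER_MODE)]
            for _ in range(N_MODES)]


def mode_means(case):
    """Hypothetical scoring rule: a Mode score is the mean of its twelve items."""
    return [statistics.fmean(items) for items in case]


def phantom_profile(n_cases=10_000, seed=0):
    """Average Mode scores over a randomly answering population."""
    rng = random.Random(seed)
    per_case = [mode_means(phantom_case(rng)) for _ in range(n_cases)]
    return [statistics.fmean(col) for col in zip(*per_case)]


def welch_t(a, b):
    """Welch's t statistic for the difference between two sample means."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / standard_error
```

Under these assumptions every Mode in the phantom profile settles near the scale midpoint of 3.5, which is the flat pattern that genuine responses would be expected to depart from. `welch_t` could then be applied to a list of pilot scores and a list of phantom scores computed with the same scoring rule.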

Exploring the Pilot Data
Two examples that demonstrate vividly how the Pilot data was explored follow next. Both reveal the theory behind the model and indicate the psychometric potential of TIFF.
First are the mode frequencies. The charts in Figure 3 show the distributions around the mean. They indicate reasonably normal distributions, thus further endorsing the choice of the means to express results. The mode frequencies of the set of nine Functional Fluency modes of behaviour are laid out in the standard pattern of the model. They depict both quantitative and qualitative aspects of the Modes and demonstrate various aspects of theoretical validity of the model, for instance:
• The five positive Modes are clearly differentiated from the four negative Modes by their relative positions on the x axis with positives to the right and negatives to the left.
• The positive Modes show less variability than the negative Modes, with the exception of SPONTANEOUS Mode which demonstrates its idiosyncratic nature as the manifestation of people's uniqueness. This characteristic was demonstrated elsewhere (another example is below in the cross correlations). An explanation is that the positive modes of behaviour blend together in use. They are learned, with the exception of SPONTANEOUS Mode, which is, however, integrated with the other four in order to 'respond' realistically to situations. Negative mode use demonstrates 'reactions' that lack integration with positive modes, especially ACCOUNTING.
• The squatter shape of the SPONTANEOUS and IMMATURE charts shows their wider variability, as might be expected of the natural element of the Self Actualisation category of social behaviour.
• The simple pairing of the Social Responsibility Modes as roles can be seen in contrast to the complex group of Self Actualisation Modes that relate to the spiral of human development.
• The greater variability of the DOMINATING (Criticising) and MARSHMALLOWING Modes could be explained by saying that this is due to their ego state source being Parent contaminations of Adult.
The second example comprises the results of the cross correlations of all the Modes with each other using Pearson's r. The most significant aspects of the correlational pattern are illustrated in Table 4. Points of theory illustrated in the patterns are noted in the commentary.
It was possible to take each Mode in turn and track the pattern of correlations with the other Modes, in order to illuminate theoretical implications of the model and demonstrate coherence and consistency. For example, MARSHMALLOWING had a small correlation with other negative Modes and almost no correlation with any positive Modes. On the other hand, IMMATURE had a small correlation with MARSHMALLOWING, a slightly larger one with COMPLIANT/RESISTANT, an even larger one with DOMINATING but negative correlations with the three positive Modes STRUCTURING, NURTURING and ACCOUNTING.
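The cross-correlation exercise can be sketched in a few lines. The following is an illustration only: it assumes each Mode's scores are held as a list with one entry per respondent, the dictionary layout and function names are hypothetical, and the t statistic shown is the standard test of a Pearson coefficient against zero rather than anything specified in the text.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(scores):
    """scores: {mode_name: [score per respondent]} -> {(mode_a, mode_b): r}."""
    names = list(scores)
    return {(a, b): pearson_r(scores[a], scores[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

With nine Modes, `correlation_matrix` returns 36 pairwise coefficients, so each Mode's pattern can be tracked in turn. As a purely illustrative figure, `t_for_r(0.15, 302)` is about 2.6, which shows how a coefficient of small magnitude can still reach conventional statistical significance in a sample of 302.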
This statistical analysis was important in demonstrating further aspects of the relative independence of the nine Modes, which was in addition to the conceptual independence of the Modes demonstrated by the results of the initial Descriptor Sort exercise at the start of the project. N.B. the statistical significances showed up as high because of the large sample. Of theoretical and practical significance were the actual values (relatively low though they were) of the Pearson's r coefficients, and the patterns they revealed, which also gave some further evidence of the theoretical validity of the model. For instance:
• There were no high correlations between Modes. In terms of practical significance, this gave some evidence of the relative independence of the constructs, high dependence being an extreme form of relatedness.
• The highest correlations (between 0.4 and 0.6) were found between the cluster of the five positive Modes, giving some evidence of their mutual integration.
• The highest correlation of all was between STRUCTURING and NURTURING (0.59). They are the twin aspects of 'positive parenting' (Illsley Clarke 1979, Baumrind 1991).
• The only other correlation over 0.4 was between DOMINATING and IMMATURE, suggesting that immaturity is connected with negative manifestation of authority.
• The negative correlations in particular indicated key theoretical points, namely that DOMINATING contrasts with NURTURING, and IMMATURE contrasts with the positive Modes of STRUCTURING, NURTURING and ACCOUNTING.

Under the influence of Parent or Child contaminations of Adult, Accounting may be inhibited, limited, distorted or irrelevant (not usefully usable in that moment) and the ensuing 'reaction' will use one of the negative modes of behaviour. If the person has actually slipped right out of Adult into a Parent or Child ego state then the Accounting will belong to the ego state slipped into, and the resulting automatic transferential 're-enaction' will give behavioural evidence of the ego state slippage.

Linguistic philosophy in action
Using the terms 'response', 'reaction' and 're-enaction' consistently in this way as 'technical terms' has proved useful for aiding understanding of the way the Functional Fluency model maps social behaviour. Models convey messages simply through their design and terminology (Allen 2002). The development of TIFF, using the validated model (Temple 2004), supported the use of a variety of language registers to name, describe and explain how the model works and how to make use of the TIFF results (Appendices 1 & 2). The registers are flexible and range from the formal to the colloquial (even slang). What matters is that they are all accurate and consistent with the meanings within the Functional Fluency cognitive map. The application of linguistic philosophy throughout the research project supported the way the theory has translated into practice and is accessible to a wide variety of people. As with other TA conceptual maps, people latch on to the Functional Fluency model with enthusiasm and, as Alison Gopnik explains (2009), they have a natural human urge to use it to make sense of their experience.

Data analysis as enrichment of the research process
In retrospect, it seems that the enrichment and value from the data analysis grew from the combination of quantitative and qualitative analyses undertaken in order to meet the need to combine attention to theoretical matters with attention to pragmatic issues. Charles Desforges (Desforges 2000) stated in a lecture, "There is enormous synergy in working on the dimensions of both practical use and fundamental understanding" (p 13). He was referring to the phenomenon named 'Pasteur's Quadrant' (Stokes 1997) which points up the differences between research focussed solely on either fundamental understanding or the practicalities of use, and research such as Louis Pasteur's which had a dual focus, encompassing both.
Comparative and inferential statistics on the descriptive data enabled the qualitative explorations that revealed information about how personal details and professional contexts affected the scores. Triangulation exercises then helped to deepen understanding of how TIFF worked as a tool for personal development, in preparation for learning how to use it. The further range of analyses included Coefficient of Variation Analysis, Reliability Analysis using Cronbach's Alpha, and Factor Analysis, all of which, as well as giving more evidence of validity and reliability, provided subtle indications of refinement needs with respect to the test items. These indications were added to those already gathered. Often the same need was simply confirmed. 'Rogue items' were identified clearly by this sort of detective work.
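Two of the analyses named above, Cronbach's Alpha and the Coefficient of Variation, have standard definitions that can be sketched briefly. This is a minimal illustration rather than the study's procedure: the data layout (one list of scores per item, aligned by respondent) is assumed, and the "alpha if item deleted" check is offered only as one common way in which a 'rogue item' might be flagged; the text does not say how such items were actually identified.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of item-score lists, one list per item, aligned by respondent.
    """
    k = len(items)
    item_variance_sum = sum(statistics.variance(item) for item in items)
    totals = [sum(responses) for responses in zip(*items)]  # scale total per respondent
    return k / (k - 1) * (1 - item_variance_sum / statistics.variance(totals))


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the mean."""
    return statistics.stdev(values) / statistics.fmean(values)


def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn (assumed rogue-item check)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]
```

An item whose removal raises alpha noticeably is one that does not hang together with the rest of its scale, which is the kind of indication that could feed into item refinement.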
Another type of analysis, that of the detailed evaluations collected from the Pilot respondents, was undertaken in order to help improve all the other aspects of the TIFF self-report questionnaire.