Client Assessment in Transactional Analysis – A Study of the Reliability and Validity of the Ohlsson, Björk and Johnsson Script Questionnaire © 2011

A script questionnaire and associated checklist developed by Ohlsson, Björk & Johnsson (1992) was used by the author and two professional colleagues to independently assess ten clients of a year-long transactional analysis therapy group conducted by the author. Ratings based on written responses at the start of therapy were compared to ratings based on videotaped interviews conducted by the author six years after termination of therapy. Moderately high inter-assessor reliability was found but intra-assessor reliability was low for the independent assessors; agreement increased for script components ‘primary injunction from father’, ‘racket feeling’, ‘escape hatch’, ‘driver from father’ and ‘driver from mother’.


Literature Review
The TA Concept of Script
"The ultimate goal of transactional analysis is the analysis of scripts, since the script determines the destiny and identity of the Individual" (Berne, 1958, p. 737). Berne (1961) emphasised how scenes and experiences from early family drama are played out in everyday life in a specific and concrete way, similar to theatre dramaturgy, and argued that the task of therapy is to liberate the individual from the compulsion to relive the early script-bound scenes and thus start a new independent route in life. Although he defined script essentially as an "unconscious life plan for the individual based on decisions made in early childhood" (Berne, 1966, p. 300), he was not interested in therapies with long processes of transference and countertransference to raise awareness of unconscious material. His method was allied with the client's functioning in the present, where the focus was mostly on processing the early messages that the client could explicitly remember. He continued to develop the concept, culminating in a definition published posthumously (Berne, 1972) of script as "an on-going programme, developed in early childhood under parental influence, which directs the individual's behaviour in the most important aspects of his life" (p. 418).
Berne's approach was further developed by his colleagues and successors (English, 1972; Goulding & Goulding, 1976, 1979; Steiner, 1967; Woollams, 1973). Steiner (1967) added the script matrix as a diagram showing how the ego states of the Child are impacted upon by injunctions, counterinjunctions, drivers and programme from the ego states of the parents. Readers unfamiliar with TA concepts are referred to Tilney (1998) for a glossary.
Steiner's matrix emphasised clinical usefulness, as the client's messages can be entered directly into it. Other diagrams by Berne (1966), Goulding & Goulding (1979) and Woollams & Brown (1978) were more detailed and aimed at clarifying the theoretical, developmental psychological aspect. Following an idea by Karpman (1966), Steiner (1967) complemented his visual matrix with a checklist where other script components were listed. Berne (1972) provided a script questionnaire comprising 220 questions; this was followed by questionnaires with fewer questions from authors such as James (1977), McCormick (1971) and Holloway (1973a).
Based on different versions of script questionnaires/checklists, Ohlsson, Björk & Johnsson (1992) designed, from their clinical experience, a script questionnaire with 43 questions (Appendix A) and a script checklist (Appendix B) including a script matrix with a checklist. These have been the work material for this study. Because of the various meanings given to the word 'script', it is suggested that the term as used in this paper refers to all of the items in this checklist and that, ideally, when talking about a person's script, the observer is referring to the whole checklist rather than to one or a few of the items.
Tomkins (1995a), originator of affect theory, posits nine early innate biological affects that are the foundation of our motivation to survive. When the little child communicates affects, the parents modulate these to an 'acceptable' level (Nathanson, 1992). Tomkins (1995b) makes clear that affects differ from emotions and feelings; the former are biology whereas the latter are linked to historical development and are interconnected with the individual's unique thoughts and memories, for which Tomkins (1978) also uses the term script.

Comparable theories
Like Berne, Tomkins uses concepts and metaphors from the theatre, suggesting that feelings are organised on two levels as scenes and scripts. The scene is the basic unit, where the feeling is attached to an object (person), or a theme and an event with a beginning and an end. Tomkins' script refers to guiding principles for how the scenes are organised, and thus how specific or emotional experiences will be predicted, understood and controlled. As with TA theory, scripts can be adequate or destructive.
The cognitive theory concepts of schema (Perris, 1996) and RIGS (Representation of Interactions that have been Generalised; Stern, 1991) have great similarities with Tomkins' (1978) script. They are all about individual-specific structures and patterns formed in childhood, which have subsequently guided the individual through life for good or bad. One difference is that Perris emphasises cognition while Berne, Stern and Tomkins underline the emotional interaction in early relationships and the ability to create and develop an internal object world.
TA script theory can also be linked to the psychoanalytical view on neurosis as an intra-psychic conflict (Fenichel, 1945; Haak, 1982). Small children come into conflict with the environment when they are frustrated in getting their operational needs satisfied. The conflict is pushed away, becomes unconscious and then fixated as a need at the time of the conflict. When, at times of crisis later in life, the individuals want to regain their inner balance, they regress to the point of fixation. The ego resolves the conflict by creating a symbolically designed compromise formation, the neurotic symptom. This is the solution Berne called the early decision, which is the basis for script formation. In a number of studies, TA has been compared with other treatment methods (Goodstein, 1971; Ohlsson, 2010; Novey, 1999; Shaskan, Moran & Moran, 1981) where the script application of TA therapy resulted in positive outcomes.

Diagnosis
The problem with TA diagnoses is that there is no standardisation or precision in the concepts and therefore it is uncertain whether the diagnosis has relevance (validity) in relation to the treatment process. As with most therapies, TA diagnoses are not regularly tested to achieve consistency between TA and non-TA practitioners. However, the communicability to the client and the usefulness are considered satisfactory without confirmation in a research context. Widdowson (2010) has shown that many TA therapists use the DSM-IV or ICD-10 diagnostic system in addition to their TA diagnosis. ICD-10 is vaguely classified, while the DSM-IV has clear behavioural criteria and can serve as a symptom classification instrument. Stewart (1996) found that DSM and ICD classifications are not appropriate for practitioners because of contrasting opinions of how health problems should be described and because of their narrow focus on the client's symptoms. Diagnoses do not usually follow a formally structured methodology and therapists also draw their conclusions from the informal process-oriented dialogue with the client (Cornell, 2008), in which the therapist emphasises the observation of oneself, one's feelings, memories and thoughts, so-called counter-transference (Novellino, 1984; Hargaden & Sills, 2002). The diagnosis is then used initially in a wider sense. The psychodynamically developed OPD-2, Operationalized Psychodynamic Diagnostics (2008), has been identified as an appropriate and well-developed diagnostic instrument, well tested in a series of reliability and validity studies. It would be important for TA practitioners to link to other systematic classifications and pragmatically create congruence between the systems.
The knowledge that it is possible to describe poor health in more than one way is basically fruitful and can compensate for the risk that the diagnosis has the negative effect of becoming a self-fulfilling prophecy, especially for those who believe that a diagnosis always implies an organic basis and a disease. An attempt to combine diagnostic descriptions based on TA and DSM has been made by Stewart & Joines (2002), including a classification of different personality adaptations. It has become widespread among TA practitioners but has not been researched in detail.

Aims of the study and questions posed
The aim of this study was to make client assessments, using interviews with a script questionnaire, by identifying central key conflicts in accordance with TA script theory and to examine the reliability of those analyses. The TA script theory can be viewed as a methodological theory and as an intervening variable.
The following research questions were posed:

1. Is there agreement between script analyses made on two separate occasions, on the same client and made by the same assessor (intra-assessor reliability)?

2. Is there agreement between script analyses made on two separate occasions, on the same client and by different assessors (inter-assessor reliability)?

Ethical permission
The research was conducted under the provisions of Protocol 104-2 (Forskningsetikkommittén, 2002).

Methodology
The study subjects were 10 clients who had sought therapy voluntarily and attended a one-year TA therapy group of 24 sessions, each lasting two and a half hours, with the author as psychotherapist. They responded to the 43-question script questionnaire and checklist (Appendices A & B) at T1 (start of therapy) and T2 (six years later). At T1 they answered the written questionnaire themselves on the basis of instructions given by the author at the first session and submitted the completed questionnaires at the next therapy session. At T2, the author acted as interviewer, using the same questions and instructions as at T1. These interviews were videotaped.
The final material consisted of nine completed script questionnaires and ten videotaped script interviews. Analyses were made on both occasions by the author and by two independent assessors separately; all three were licensed psychotherapists and formally educated transactional analysts (TSTA-P Teaching and Supervising Transactional Analyst in the Psychotherapy field) with extensive experience as trainers and psychotherapists.
A total of 57 individual analyses were completed in which 26 different script components were assessed at each analysis. A series of tables are included. Assessors coded 1st, 2nd and 3rd drivers from five possible drivers, and made choices from 12 possible injunctions (Goulding & Goulding, 1976), three potential positions on the drama triangle (Karpman, 1968), four life positions (Berne, 1972) and three variants of escape hatches (Holloway, 1973b). Other components were formulated freely. Each client was described with a document that assembled all the data from the assessments on the two occasions (see example Table 1).
Based on each client's version of Table 1, versions of Table 2 were created to show reliability of inter-assessor and intra-assessor agreement. The summary of these results is shown in Table 3 and illustrated graphically in Figure 1. In order to calculate the percentage agreement, full agreement between the three assessors was scored 3, partial agreement 2 and no agreement 0, and a hyphen was used to indicate missing assessment items. The percentage agreement was calculated as a simple and direct measure of reliability with no adjustment for random agreement in the coding. This adjustment was made at a later stage (Tables 5-8) when the kappa coefficients according to Fleiss (1971) were calculated for a sample of primary script components. Tables 9-10 focus on intra-assessor reliability.
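The scoring rule described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the reading of 'partial agreement' as two of three assessors giving the same code, the exclusion of an item when any assessor's code is missing, and the denominator (three points per scored item) are all assumptions made for illustration, and the example codings are invented.

```python
from collections import Counter

MISSING = "-"  # the paper marks missing assessment items with a hyphen


def item_score(codes):
    """Score one script component coded by three assessors.

    Returns 3 (all three agree), 2 (two of three agree), 0 (no agreement),
    or None when the item is treated as missing.
    """
    if any(code == MISSING for code in codes):
        return None  # assumption: skip the item if any code is missing
    largest_group = Counter(codes).most_common(1)[0][1]
    return {3: 3, 2: 2}.get(largest_group, 0)


def percent_agreement(items):
    """Percentage agreement over all scored items (assumed maximum: 3 each)."""
    scores = [s for s in (item_score(codes) for codes in items) if s is not None]
    if not scores:
        return float("nan")
    return 100.0 * sum(scores) / (3 * len(scores))


# Invented example codings for four components (assessors A, B, C)
example_items = [
    ("Be perfect", "Be perfect", "Be perfect"),  # full agreement -> 3
    ("Please", "Please", "Be strong"),           # partial agreement -> 2
    ("Hurry up", "Try hard", "Be strong"),       # no agreement -> 0
    ("-", "Please", "Please"),                   # missing -> excluded
]
print(round(percent_agreement(example_items), 1))  # 55.6
```

Because this measure is not corrected for chance, components with few possible categories will tend to show higher percentages than open categories even when assessors code at random.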

Reliability considerations
Sources of error with humans as measuring instruments are numerous and create well known reliability problems (Armelius & Armelius, 1985). In this study these problems were addressed by using comparisons of assessments from well-trained and experienced transactional analysts (inter-assessor reliability) and assessments on several occasions (test-retest reliability or intra-assessor reliability). The complexity of the rating procedure contributed to reducing the reliability, whereas providing direct observations of the script interviews on the second assessment gave assessors access to significant phenomenological data as if they had been there.
As the therapist conducted the video interviews himself, a clear, confident and trusting situation was created for the client. The six-year interval meant results would be influenced by the client's maturity, development and possibly by other treatments; however the long gap would decrease the client's memory of previous answers given.
Therapist adherence to methodology has been linked to important positive outcomes by Luborsky et al. (1985) but the TA therapy provided in this study did not follow a specific manualised treatment procedure (adherence), and the theoretical and operational definitions of script and its different components are qualitative and multidimensional. Clinical practice in TA requires a constantly modified observational process, making it more difficult to be confident of assessor reliability in statistical terms. A logical-deductive approach was used, whilst being aware of subjective and qualitative elements in the definitions and observations that were used.

Validity considerations
Cook & Campbell (1979) discuss problems that may occur with different types of validity. The operationalisation of the theoretical definitions of the concepts is rooted in clinical practice so construct validity is complex. Content validity has never been tested empirically, but has been assessed according to face validity by the different TA therapists. The interviews and assessments indicated that face validity was good, as motivation, trust and knowledge of the script questionnaire were high among interviewers and interviewees. The therapy room where the interviews took place and the direct contact between the therapist/interviewer and the client may in this context be regarded as an authentic environment with good ecological validity (Shadish, Cook & Campbell, 2002). In the video the assessors could see how the clients reacted and responded to the interview questions. This on-line validation was built into the interview dialogue and has been used in other studies such as family therapy (Gustl et al., 2007; Sundell, Hansen, Andrée-Löfholm et al., 2006).
In a mainly qualitative study, it becomes important to describe how data have been collected and processed in a systematic manner (internal validity). The script interview in the study was compiled by the assessors and used in a clinical context over a 25-year period, so may be regarded as relevant and reliable for its intended purpose.
In clinical research the 'truth' is highly linked to practical implications so we needed to take into account the therapeutic movement or process. Kvale (1987), Polkinghorne (1983) and Malterud (1998) report communicative and pragmatic validity as two relevant criteria; these were reflected through a careful and detailed description of how the key elements of the research took place so the reader has the opportunity to consider the transferability of the approach to similar situations (external validity).

Results
Tables 1 and 2 are presented here as examples of how results were summarised and worked with.
Inter-assessor reliability
The summary in Table 3 indicates that there are small variations between the two occasions. At T1 the average agreement is 59% and at T2 it is 53%.

The total script
The assessors' agreements for the analysis of each client's total script are shown in Figure 1. The difference in client assessments is at most 24% on both occasions. There is a variation in reliability of 49-73% at T1 and 41-60% at T2. The similar matching between the assessors on the two assessment occasions for each client is acceptable. The assessors do not show any significant difference in the agreement of client assessments over time.

Individual script components
An estimation of each script component separately (Table 3) shows that the coherence of assessments of the various components is mixed. For example, the agreement at T1 varies from 0% (the specifics of Games) to 85% (Life position) and at T2 from 0% (Counterinjunction 2 from mother) to 90% (Real feeling 1).
Script components with fixed defined categories like Driver, Injunction, Game/Drama-triangle, Life position and Escape hatch, have a higher percentage coherence compared to open categories. Especially low accordance is found in the coding of specified Games and different Counterinjunctions. The open categorisation of Racket feeling and Real feeling is an exception and has relatively high accordance.
The most significant primary components (Counterinjunction 1, Driver 1, Injunction 1) have slightly higher coherence than the secondary and tertiary ones (e.g. Counterinjunction 2, Driver 3). This is apparent in the examination of the primary components in Tables 4-7.
The agreement between the two occasions is generally lower if one considers the individual components compared with assessments of the total script.

Primary script components
In a second examination of the material the focus was on the script components that were most obvious and most evident in the clients and, thus, were first observed (Counterinjunctions, Driver, Injunction 1, etc.). These 11 primary components (Table 4) were a starting point for a new reliability calculation based on both percentage agreement and kappa ratio. Fleiss' kappa (1971) was used, which in contrast to Cohen's kappa is a statistical reliability measure to assess inter-assessor reliability between more than two assessors. The significance of the kappa value is determined both by the strength of the kappa quotient and by the number of categories. The kappa coefficient (κ) is adjusted for chance agreement, unlike the percentage agreement (%), and is therefore a more conservative measure of consistency.
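As an illustration of the chance-corrected measure described above, the following is a minimal Python sketch of the standard Fleiss' kappa computation for categorical codes from a fixed number of assessors. The function name, the input layout (one list of codes per client) and the example data are assumptions for illustration; this is not the calculation script used in the study.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for categorical ratings.

    ratings: one list per subject (e.g. client), holding the category
    chosen by each rater. Every subject must have the same number of raters.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    counts = [Counter(r) for r in ratings]

    # Observed agreement: mean proportion of agreeing rater pairs per subject
    p_i = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Expected agreement from the overall share of each category
    totals = Counter()
    for cnt in counts:
        totals.update(cnt)
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals.values())

    if p_e == 1.0:
        return float("nan")  # only one category ever used: kappa undefined
    return (p_bar - p_e) / (1.0 - p_e)


# Invented example: three assessors code one component for four clients
example = [
    ["A", "A", "A"],
    ["A", "A", "B"],
    ["B", "B", "B"],
    ["A", "B", "B"],
]
print(round(fleiss_kappa(example), 2))  # 0.33
```

In this invented example the raw pairwise agreement is 67%, but because two equally common categories give an expected chance agreement of 50%, kappa is only 0.33. This shows why kappa values are typically lower than the corresponding percentage agreement.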
The interpretation of the significance of the Fleiss kappa ratio has been made by Landis and Koch (1977). The distribution of the study's kappa quotas on the basis of their significance intervals is summarised in Table 5. A ranking of script components has been made for T1 (Table 6) and T2 (Table 7).
According to Wood (2007), in the research context there seems to be a general view that the kappa ratio should preferably be 0.60-0.70, but that in certain cases, such as psychiatric diagnoses, a value of 0.40 and above may be acceptable. Nine categories are above 0.40 at T1 and six at T2. At T1 'Injunction from father', 'Racket feeling', 'Escape hatch' and 'Drivers from father' lie between 0.62 and 0.72, while at T2 only 'Real feeling' and 'Game/Drama triangle' attain such values (0.66-0.69). 'Counterinjunction 1' from mother and father has a low value on both occasions (0.15 to 0.39). The largest difference in the ratio between the two sessions relates to 'Injunction from father', with a value of 0.72 at T1 and 0.29 at T2. The total average for all of the components has a kappa ratio of 0.48.
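The interpretation bands of Landis and Koch (1977) referred to above can be expressed as a small lookup. The sketch below uses their published labels (poor, slight, fair, moderate, substantial, almost perfect); the function name and the handling of boundary values are assumptions, and the kappa values in the example are those quoted in the text.

```python
def landis_koch_label(kappa):
    """Return the Landis & Koch (1977) strength-of-agreement label."""
    if kappa < 0.0:
        return "poor"
    # Upper bound of each band; a value on a boundary falls in the lower band
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa values quoted in the text
quoted = {
    "Injunction from father, T1": 0.72,
    "Injunction from father, T2": 0.29,
    "Average of all components": 0.48,
}
for name, value in quoted.items():
    print(f"{name}: {value:.2f} -> {landis_koch_label(value)}")
```

On these bands the overall average of 0.48 counts as moderate agreement, and the drop for 'Injunction from father' corresponds to a move from substantial to fair agreement.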

Intra-assessor reliability
The assessors made two analyses of each client at different times. Tables 8a, 8b and Figure 2 show that the overall agreement between an assessor's two script analyses is 67% for one of the assessors (C) and considerably lower, 33% and 39% respectively, for the other two (A and B). Looking at the overall agreement based on each client, differences of 25-30% are found. Client 2 had the highest accordance (63%) between the two assessments, while Client 9 had the lowest (31%). Even an examination of the specific percentage numbers gives a picture of wide variation (20-70%) in the coherence of assessor analyses on the two occasions. Overall, it can be concluded that factors related to both the client and the assessor affect the result when assessments are made with a relatively long period in between (six years).
Ranking the results of Table 8a into Table 8b shows that the assessors have maximum coherence for clients 2 and 3, and lowest coherence for client 9.

Discussion
The aim of the study was to assess whether a diagnostically reliable script analysis can be made using a script questionnaire. This was done by examining, with the help of two interviews, the assessors' ability to agree on client assessments. The focus was partly tied to how well the assessments match for each assessor over time (intra-assessor reliability) and partly to agreement in their analyses of the clients' total scripts and the individual components of the scripts (inter-assessor reliability). With those two measures of reliability, an indication was given of how well the script analysis on the basis of script questionnaires serves as an assessment instrument.
Intra-assessor reliability
The results show that assessors A and B, without any detailed knowledge of the client, made different assessments on the two occasions. Assessor C, who is the therapist and author, had much higher agreement in his two perceptions of the clients' script, which indicates that a knowledge about the client may result in more consistency in analysis although it could also mean that the assessor failed to pick up on changes. In line with Orlinsky & Howard (1986) the large discrepancy between the reliability of different client assessments may indicate that personal variables of the client and/or assessor can play a major role in the assessment.
One explanation for the relatively low coherence is that the client has changed over time. The goal and ambition of therapy is to help change the client's script. Hence, in a successful therapy the script should not be coherent over time. Conversely, responses to the script questions could remain similar even if the client has changed. Most of the questions concern memories of historical events and can be expected to give similar responses, regardless of the time factor. Another possible factor is that client assessment is unreliable because of validity problems.
Inter-assessor reliability
When we combine all assessors' script analyses at both times and compare them with each other, the result is almost acceptable in relation to the literature. The overall agreement is 56% and relatively evenly distributed for each client. Given the difficulties with assessments over time as discussed, the overall reliability is surprisingly good. One influencing factor may be that the three assessors have worked together for a long time and have created a similar frame of reference in assessing clients. This convergence is likely to also affect the assessors' assessments over time, but becomes clearer from a general context.
When the reliability of the assessors' analysis of individual script components is examined, a considerable variation in the values is found, with the fixed categories giving better coherence than the open ones. Reliability increases significantly when examining only the 11 primary script components. More than half of those have moderate to substantial agreement and, overall, this more restrictive analysis obtains a higher reliability than the analysis of all 26 components. This is not surprising in any way but shows the difficulty of increasing the level of detail in the assessments whilst making an accurate analysis. It also shows that the gap decreases when going from the specific components to the total overall script.