Syed Ejaz Alam (PMRC Research Centre, Jinnah Postgraduate Medical Centre, Karachi)
Describing Quantitative Relationships
Scientific studies often require a description of the relationship between two variables. Usually in such circumstances we think of one variable as being influenced by the other. It is conventional to denote the dependent variable, i.e. the one being influenced, by "Y" and the independent variable by "X". We are interested in describing the association between X and Y. To do this we have to measure X and Y jointly on a series of subjects1. The simplest way of describing the relationship between X and Y is by a graph called a scatter diagram. To construct a scatter diagram, the level of Y is plotted against the level of X for each subject. The resulting scatter of points indicates how Y varies with differing levels of X. Although the scatter diagram is very useful for gaining a visual impression of the relationship, a more quantitative description is often needed. Two kinds of statistical techniques are used to further specify the relationship between X and Y:
The Regression Equation
The regression approach is appropriate when our main purpose is to develop a predictive model, i.e. a device that will enable us to predict Y for a given level of X (see example).
The regression equation has the form: Y = a + bX,
where a = the intercept, i.e. the value of Y when X is zero, and
b = the slope, i.e. the change in Y resulting from a change in X of one unit.
The constants a and b are found by the least squares procedure2.
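As an illustration of the least squares procedure, the following minimal sketch computes a and b from paired observations. The function name and the five data points are hypothetical and chosen only for the example; they are not taken from any study discussed here.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line Y = a + bX that minimises the sum of
    squared vertical distances between the points and the line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products and sum of squares of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; slope is undefined")
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_fit(xs, ys)
print(f"Y = {a:.2f} + {b:.2f} X")
```

The fitted line always passes through the point (mean of X, mean of Y), which is why the intercept can be obtained from the slope and the two means.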
The Correlation Coefficient
The correlation coefficient, usually denoted by r, is an index of the extent to which two variables are associated. It can take on values between +1.0 and -1.0, depending on the direction and strength of the association. A correlation coefficient of zero indicates that the two variables are not linearly related.
The following can serve as a general guide to interpreting the magnitude of the correlation coefficient:
Magnitude of r    Degree of association
0.8 to 1.0        Strong
0.5 to 0.8        Moderate
0.2 to 0.5        Weak
0 to 0.2          Negligible
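The guide above can be sketched in code as follows. The function names and the data are hypothetical. Because the bands in the guide share their end points, the sketch assumes that a boundary value (for example exactly 0.8) belongs to the stronger category; that choice is ours and is not stated in the guide.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


def describe_association(r):
    """Map the magnitude of r onto the verbal guide.
    Assumption: boundary values fall in the stronger category."""
    size = abs(r)  # the sign gives direction only, not strength
    if size >= 0.8:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "negligible"


# Hypothetical data, for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)
print(f"r = {r:.3f} ({describe_association(r)})")
```

Note that the guide is applied to the absolute value of r, so r = -0.6 and r = +0.6 are both described as moderate.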
Making the Scatter Diagram
In making the scatter diagram (Figure 1) to show the heights and pulmonary anatomical dead spaces in the 15 children, Dr. Green set out the figures in columns 1, 2 and 3. It is helpful to arrange the observations3, as he has done, in serial order of the independent variable when one of the two variables is clearly identified as independent. The corresponding figures for the dependent variable can then be examined in relation to the increasing series for the independent variable. In this way we get the same picture, but in numerical form, as appears in the scatter diagram. The calculation of the correlation coefficient is as follows, with X representing the values of the independent variable (in this case height) and Y representing the values of the dependent variable (in this case anatomical dead space).
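For reference, the usual product-moment definition of r, which we assume is the calculation intended here, can be written with the bars denoting the means of the observed X and Y values and the sums running over all pairs of observations:

```latex
r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}
         {\sqrt{\sum (X - \bar{X})^{2} \, \sum (Y - \bar{Y})^{2}}}
```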
The correlation coefficient of 0.846 indicates a strong positive correlation between the size of the pulmonary anatomical dead space and the height of the child. However, to test the deviation of r from 0, or nil correlation, it is better to use the t test in the following calculation:
t = r × √[(n - 2) / (1 - r²)]
The t table is entered at n - 2 degrees of freedom. For example, the correlation coefficient for Dr. Green's figures was 0.846 and the number of pairs of observations was 15. Applying the above formula, we have
t = 0.846 × √[(15 - 2) / (1 - 0.846²)] = 5.72
Entering the t table3 at 15 - 2 = 13 degrees of freedom we find that, at t = 5.72, p < 0.001. So the correlation coefficient may be regarded as highly significant.
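The value t = 5.72 can be reproduced with a few lines of code, assuming the usual test statistic for a correlation coefficient, t = r√((n - 2)/(1 - r²)). The function name is hypothetical; the inputs r = 0.846 and n = 15 are those quoted above. The p value itself still has to be read from a t table, since the sketch computes only the statistic.

```python
import math


def t_from_r(r, n):
    """t statistic for testing whether r differs from zero,
    to be referred to a t table at n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least three pairs of observations")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


r, n = 0.846, 15
t = t_from_r(r, n)
degrees_of_freedom = n - 2
print(f"t = {t:.2f} on {degrees_of_freedom} degrees of freedom")
# prints: t = 5.72 on 13 degrees of freedom
```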
The Regression Equation
Y = a + bX
With this equation we can find a series of values of Y, the dependent variable, that correspond to each of a series of values of X, the independent variable. The letters a and b have to be calculated from the data. The letter a signifies the distance above the baseline at which the regression line cuts the vertical Y-axis; the letter b (the regression coefficient) signifies the amount by which a change in X must be multiplied to give the corresponding average change in Y. In this way it represents the degree to which the line slopes upwards or downwards. Once the correlation coefficient has been computed, the regression coefficients are easy to work out.
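One way to see why the regression coefficients follow easily from the correlation coefficient is the standard identity b = r × (SD of Y) / (SD of X), with a then obtained from the two means. The sketch below assumes this identity; the function name and the summary figures are hypothetical and are not Dr. Green's data.

```python
def regression_from_r(r, mean_x, mean_y, sd_x, sd_y):
    """Return (a, b) for Y = a + bX from the correlation coefficient
    and the means and standard deviations of X and Y."""
    if sd_x <= 0:
        raise ValueError("the standard deviation of X must be positive")
    b = r * sd_y / sd_x       # slope: r rescaled into units of Y per unit of X
    a = mean_y - b * mean_x   # the line passes through the point of means
    return a, b


# Hypothetical summary statistics, for illustration only
a, b = regression_from_r(r=0.5, mean_x=10.0, mean_y=20.0, sd_x=2.0, sd_y=4.0)
print(f"Y = {a:.1f} + {b:.1f} X")
# prints: Y = 10.0 + 1.0 X
```

Both standard deviations must be computed with the same divisor (n or n - 1); the ratio is then unaffected by the choice.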
The line representing the equation Y = -82.4 + 1.033X is shown superimposed on the scatter diagram in Figure 2.
The way to draw the line is to take three values of X, one on the left side of the scatter diagram, one in the middle, and one on the right, and substitute these in the equation.
If X = 110, Y = -82.4 + (1.033 × 110) = 31.2
If X = 140, Y = -82.4 + (1.033 × 140) = 62.2
If X = 170, Y = -82.4 + (1.033 × 170) = 93.2
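The three substitutions can be checked with a short sketch. The coefficients -82.4 and 1.033 are those used in the calculations above; the constant and function names are hypothetical.

```python
INTERCEPT = -82.4   # a, as used in the worked substitutions
SLOPE = 1.033       # b, the regression coefficient


def predict_dead_space(height):
    """Predicted anatomical dead space (Y) for a given height (X)."""
    return INTERCEPT + SLOPE * height


for height in (110, 140, 170):
    print(f"X = {height}: Y = {predict_dead_space(height):.1f}")
# prints:
# X = 110: Y = 31.2
# X = 140: Y = 62.2
# X = 170: Y = 93.2
```

Two points are enough to fix a straight line; a third serves as a check that the points have been calculated and plotted correctly.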
References
1. Morton, R. F. and Hebel, J. R. A Study Guide to Epidemiology and Biostatistics. University Park Press, Baltimore, 1983, pp. 81-84.
2. Chaudhry, S. M. Introduction to Statistical Theory, Part I. Markazi Kutub Khana, Urdu Bazar, Lahore, 1975, pp. 178-182.
3. Swinscow, T. D. V. Statistics at Square One. British Medical Association, 1978, pp. 62-70 and 78.