INTRO TO RESEARCH METHOD

## DATA ANALYSIS

¨ *Statistics* is a set of procedures for describing, synthesizing, analyzing, and interpreting quantitative data. One thousand scores, for example, can be represented with a single number.

¨ *Choice of appropriate statistical techniques* is determined to a great extent by the design of the study and by the kind of data to be collected.

¨ The choice of statistical techniques is largely determined by the research hypothesis to be tested.

¨ A simple statistic is often more appropriate than a more complicated one.

### Types of Descriptive Statistics

*The first step in data analysis* is to describe, or summarize, the data using descriptive statistics. *Descriptive statistics* permit the researcher to meaningfully describe a large number of scores with a small number of indices. If such indices are calculated for a sample drawn from a population, the *resulting values* are referred to as statistics; if they are calculated for an entire population, they are referred to as parameters.

*Graphing Data*

The shape of the distribution may not be self-evident, especially if a large number of scores are involved. *The most common method of graphing research data* is to construct a frequency polygon. *The first step in constructing a frequency polygon* is to list all the scores and tabulate how many subjects received each score. Once the scores are tallied, the steps are as follows: place all the scores on the horizontal axis at equal intervals; place the frequencies on the vertical axis, starting with zero; for each score, find the point where the score intersects with its frequency of occurrence and make a dot; connect all the dots with straight lines.
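The tallying step can be sketched in a few lines of Python. This is only an illustration: the scores are made up, the function name `frequency_polygon_points` is hypothetical, and whole-number scores are assumed so that the horizontal axis can be filled in at equal intervals.

```python
from collections import Counter

def frequency_polygon_points(scores):
    """Return one (score, frequency) pair per score value from lowest to highest."""
    counts = Counter(scores)  # tally how many subjects received each score
    # Include every value between the lowest and highest score so the horizontal
    # axis has equal intervals; a score nobody received gets a frequency of zero.
    return [(s, counts.get(s, 0)) for s in range(min(scores), max(scores) + 1)]

scores = [3, 4, 4, 5, 5, 5, 6, 6, 8]  # hypothetical data
points = frequency_polygon_points(scores)

# Print a rough text version; connecting these points with straight lines
# on graph paper (or in a plotting tool) gives the frequency polygon.
for score, freq in points:
    print(f"{score:>2} | {'*' * freq}")
```

Each pair in `points` is one dot on the graph: the score gives the horizontal position and the frequency gives the height.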

### Measures of Central Tendency

*Measures of central tendency* give the researcher a convenient way of describing a set of data with a single number. The number resulting from computation of a measure of central tendency represents the average or typical score attained by a group of subjects. *Each index of central tendency is appropriate for a different scale* of measurement: the mode is appropriate for nominal data, the median for ordinal data, and the mean for interval or ratio data.

Ø __The Mode__

*The mode is* the score that is attained by more subjects than any other score. *The mode is not established through calculation*; it is determined by looking at a set of scores or at a graph of scores and seeing which score occurs most frequently. There are *several problems associated with the mode*, and it is therefore of limited value and seldom used. *For one thing*, a set of scores may have two (or more) modes; a set with two modes is referred to as bimodal. *Another problem* is that the mode is an unstable measure of central tendency; equal-sized samples randomly selected from the same accessible population are likely to have different modes.

*When nominal data are involved*, however, the mode is the only appropriate measure of central tendency.

Ø __The Median__

The median is that point in a distribution above and below which are 50% of the scores; in other words, the median is the midpoint. The median does not take into account each and every score; it ignores, for example, extremely high scores and extremely low scores.

Ø __The Mean__

*The mean is* the arithmetic average of the scores and is the most frequently used measure of central tendency. By the very nature of the way in which it is computed, the mean takes into account, or is based on, each and every score. *It is appropriate when* the data represent either an interval or ratio scale, and it is a more precise, stable index than both the median and the mode. *In situations* in which there are one or more extreme scores, however, the median will be the *best index of typical performance.*
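As a small illustration of the three indices, the sketch below uses Python's standard `statistics` module on made-up scores. The data are hypothetical; the second list adds one extreme score to show why the median can be the better index of typical performance.

```python
import statistics

scores = [70, 72, 75, 75, 78, 80, 81]   # hypothetical test scores
with_extreme = scores + [150]            # the same scores plus one extreme score

for label, data in (("original", scores), ("with extreme score", with_extreme)):
    modes = statistics.multimode(data)   # a list, because a set of scores may have several modes
    median = statistics.median(data)     # the midpoint of the ordered scores
    mean = statistics.mean(data)         # the arithmetic average, based on every score
    print(f"{label}: mode(s)={modes}, median={median}, mean={mean:.2f}")

# A bimodal set of scores has two modes.
bimodal = statistics.multimode([1, 1, 2, 2, 3])
print("bimodal example:", bimodal)
```

With these numbers the single extreme score moves the mean from about 75.9 to about 85.1, while the median moves only from 75 to 76.5 and the mode does not move at all.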

### Measures of Variability

Two sets of data that are very different can have identical means or medians. Thus, there is a need for a measure that indicates how spread out the scores are, that is, how much variability there is. While the standard deviation is by far the most often used, the range is the only appropriate measure of variability for nominal data, and the quartile deviation is the appropriate index of variability for ordinal data. As with measures of central tendency, measures of variability appropriate for nominal and ordinal data may be used with interval or ratio data, even though the standard deviation is generally the preferred index of variability.

Ø __The Range__

*The range is* simply the difference between the highest and the lowest score in a distribution and is determined by subtraction. Like the mode, the range is not a very stable measure of variability, and its chief advantage is that it gives a quick, rough estimate of variability.

Ø __The Quartile Deviation__

*The quartile deviation is* one-half of the difference between the upper quartile (the 75th percentile) and the lower quartile (the 25th percentile) in a distribution. The quartile deviation is a more stable measure of variability than the range, and it is appropriate whenever the median is appropriate.

Ø __The Standard Deviation__

Like the mean, its counterpart measure of central tendency, the standard deviation is the most stable measure of variability and takes into account each and every score. If you know the mean and the standard deviation of a set of scores, you have a pretty good picture of what the distribution looks like. If the distribution is relatively normal, then the interval from the mean minus three standard deviations to the mean plus three standard deviations encompasses just about all the scores, over 99% of them.
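The three measures of variability can be compared on one small data set. The sketch below is a hedged illustration with made-up scores. Two assumptions are worth flagging: quartiles can be computed by several conventions, and this sketch simply uses the default method of `statistics.quantiles`; and the standard deviation is computed with N in the denominator (`statistics.pstdev`), treating the scores as the whole group being described.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Quartile deviation: half the distance between the upper and lower quartiles.
# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), and Q3.
q1, _median, q3 = statistics.quantiles(scores, n=4)
quartile_deviation = (q3 - q1) / 2

# Standard deviation, dividing by N (assumption: descriptive use, not estimation).
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)

print("range:", score_range)
print("quartile deviation:", quartile_deviation)
print("mean:", mean, "standard deviation:", sd)
print("mean ± 3 SD:", mean - 3 * sd, "to", mean + 3 * sd)
```

Here the range is 7, the quartile deviation is 1.25, and the standard deviation is 2, so every score falls well inside the mean plus or minus three standard deviations.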

__The Normal Curve__

Many variables yield a normal curve if a sufficient number of subjects are measured.

Ø __Normal Distributions__

If a variable is normally distributed, that is, does form a normal curve, then several things are true. *First*, 50% of the scores are above the mean and 50% of the scores are below the mean. *Second*, the mean, the median, and the mode are the same. *Third*, most scores are near the mean, and the farther from the mean a score is, the fewer the number of subjects who attained that score. *Fourth*, the same number, or percentage, of scores is between the mean and plus one standard deviation (X̄ + 1 SD) as is between the mean and minus one standard deviation (X̄ − 1 SD), and similarly for X̄ ± 2 SD and X̄ ± 3 SD.

Many variables form a normal distribution, including physical measures, such as height and weight, and psychological measures, such as intelligence and aptitude. Since research studies deal with a finite number of subjects, and often not a very large number, research data only more or less approximate a normal curve.
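The percentages under an ideal normal curve can be checked with `statistics.NormalDist` from the Python standard library. This is a sketch for a theoretical curve only; the helper name `share_within` is hypothetical, and real research data will only approximate these figures.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD of the mean: {share_within(k):.2%}")

# Symmetry: half of the scores lie below the mean.
print("below the mean:", standard_normal.cdf(0))
```

About 68% of scores fall within one standard deviation of the mean, about 95% within two, and over 99% within three.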

Ø __Skewed Distributions__

When a distribution is not normal, it is said to be skewed. *A distribution which is skewed is* not symmetrical, and the values of the mean, the median, and the mode are different. In a skewed distribution, there are more extreme scores at one end than the other. If the extreme scores are at the lower end of the distribution, the distribution is said to be *negatively skewed*; if the extreme scores are at the upper, or higher, end of the distribution, the distribution is said to be *positively skewed*. In both cases, the mean is “pulled” in the direction of the extreme scores. For a negatively skewed distribution the mean (X̄) is always lower, or smaller, than the median (md); for a positively skewed distribution the mean is always higher, or greater, than the median. Usually, in a negatively skewed distribution the mean and the median are lower, or smaller, than the mode, whereas in a positively skewed distribution the mean and the median are higher, or greater, than the mode.
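The ordering of the three indices in skewed distributions can be seen with two tiny made-up data sets, one the mirror image of the other. The data and the helper name `summarize` are hypothetical and serve only as an illustration.

```python
import statistics

positively_skewed = [1, 2, 2, 2, 3, 3, 4, 10]   # one extreme score at the high end
negatively_skewed = [1, 7, 8, 8, 9, 9, 9, 10]   # one extreme score at the low end

def summarize(scores):
    """Return (mode, median, mean) for a list with a single mode."""
    return statistics.mode(scores), statistics.median(scores), statistics.mean(scores)

pos_mode, pos_median, pos_mean = summarize(positively_skewed)
neg_mode, neg_median, neg_mean = summarize(negatively_skewed)

print("positive skew: mode", pos_mode, "< median", pos_median, "< mean", pos_mean)
print("negative skew: mean", neg_mean, "< median", neg_median, "< mode", neg_mode)
```

In the first list the mean is pulled upward past the median and the mode; in the second it is pulled downward below both.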

__Measures of Relationship__

*Degree of relationship is* expressed as a correlation coefficient, which is computed from the two sets of scores. If two variables are highly related, a correlation coefficient near +1.00 (or −1.00) will be obtained; if two variables are not related, a coefficient near .00 will be obtained.

Ø __The Spearman Rho__

If the data for one of the variables are expressed as ranks instead of scores, *the Spearman rho* is the appropriate measure of correlation. It is thus appropriate when the data represent an ordinal scale (although it may be used with interval data) and is used when the median and quartile deviation are used. If only one of the variables to be correlated is in rank order (for example, class standing at time of graduation), then the other variable to be correlated with it must also be expressed in terms of ranks. *The Spearman rho* is interpreted in the same way as the Pearson *r* and produces a coefficient somewhere between −1.00 and +1.00.

If more than one subject receives the same score, then their ranks are averaged.
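The ranking step, including averaged ranks for tied scores, can be sketched as follows. This is an illustration under stated assumptions: the data are made up, the helper names `average_ranks` and `pearson_r` are hypothetical, and the coefficient is obtained by computing an ordinary product-moment correlation on the two sets of ranks, which is one common way to get rho when ties are present.

```python
from math import sqrt

def average_ranks(scores):
    """Rank scores so the highest gets rank 1; tied scores share the mean of their ranks."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(scores):
        first = ordered.index(value) + 1            # first rank position held by this score
        tied = ordered.count(value)                 # number of subjects with this score
        rank_of[value] = first + (tied - 1) / 2     # mean of positions first .. first+tied-1
    return [rank_of[value] for value in scores]

def pearson_r(x, y):
    """Product-moment correlation, computed from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

class_standing = [1, 2, 3, 4, 5]         # already expressed as ranks (1 = best)
test_scores = [95, 88, 88, 70, 65]       # hypothetical scores; two subjects tie at 88

score_ranks = average_ranks(test_scores)  # the tied subjects share rank 2.5
rho = pearson_r(class_standing, score_ranks)
print("ranks:", score_ranks)
print(f"rho = {rho:.3f}")
```

The two subjects who both scored 88 would have held ranks 2 and 3, so each receives the average, 2.5.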

Ø __The Pearson r__

The Pearson *r* is the most appropriate measure of correlation when the sets of data to be correlated represent either interval or ratio scales. Like the mean and the standard deviation, the Pearson *r* takes into account each and every score in both distributions; it is also the most stable measure of correlation. Since most educational measures represent interval scales, the Pearson *r* is usually the appropriate coefficient for determining relationship. An assumption associated with the application of the Pearson *r* is that the relationship between the variables being correlated is a linear one.
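A worked computation may help here. The sketch below uses the common raw-score (computational) form of the product-moment correlation, built from the sums ΣX, ΣY, ΣXY, ΣX², and ΣY². The data are made up and the function name `pearson_r_raw` is hypothetical.

```python
from math import sqrt

def pearson_r_raw(x, y):
    """Pearson r from raw-score sums: sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - (sum_x * sum_y) / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

x = [1, 2, 3, 4, 5]   # hypothetical scores on the first measure
y = [2, 4, 5, 4, 5]   # hypothetical scores of the same subjects on the second measure

r = pearson_r_raw(x, y)
degrees_of_freedom = len(x) - 2   # N - 2 for the Pearson r
print(f"r = {r:.4f}, df = {degrees_of_freedom}")
```

For these five subjects the sums are ΣX = 15, ΣY = 20, ΣXY = 66, ΣX² = 55, and ΣY² = 86, which gives r = 6 / √60, or about .77.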

__Measures of Relative Position__

*Measures of relative position* indicate where a score is in relation to all other scores in the distribution. A major advantage of such measures is that they make it possible to compare the performance of an individual on two or more different tests.

Ø __Percentile Ranks__

*A percentile rank indicates* the percentage of scores that fall below a given score. Percentiles are appropriate for data representing an ordinal scale, although they are frequently computed for interval data.
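Following the definition given here (the percentage of scores that fall below a given score), a percentile rank can be computed directly by counting. The data and the function name `percentile_rank` are hypothetical; some textbooks also count half of the scores tied with the given score, which this sketch does not do.

```python
def percentile_rank(scores, score):
    """Percentage of scores in the distribution that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]   # hypothetical test scores

pr_85 = percentile_rank(scores, 85)
print(f"A score of 85 has a percentile rank of {pr_85:.0f}")
```

Six of the ten scores are below 85, so a score of 85 has a percentile rank of 60.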

Ø __Standard Scores__

*A standard score* is a measure of relative position which is appropriate when the data represent an interval or ratio scale. A *z* score expresses how far a score is from the mean in terms of standard deviation units. If a set of scores is transformed into a set of *z* scores, the new distribution has a mean of 0 and a standard deviation of 1. *The major advantage* of the *z* score is that it allows scores from different tests to be compared. *The only problem with z scores* is that they involve negative numbers and decimals. *A simple solution* is to transform *z* scores into Z scores. To do this, you simply multiply the *z* score by 10 and add 50. *Stanines* are standard scores that divide a distribution into nine parts.
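The two transformations can be sketched in a few lines. The data and the function names `z_score` and `to_big_z` are hypothetical, and the standard deviation is computed with N in the denominator (`statistics.pstdev`), which is an assumption about how the group is being described.

```python
import statistics

def z_score(x, mean, sd):
    """Distance of a score from the mean, in standard deviation units."""
    return (x - mean) / sd

def to_big_z(z):
    """Transform a z score into a Z score: multiply by 10 and add 50."""
    return 10 * z + 50

scores = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical scores
mean = statistics.mean(scores)            # 5
sd = statistics.pstdev(scores)            # 2.0 (N in the denominator)

zs = [z_score(x, mean, sd) for x in scores]
big_zs = [to_big_z(z) for z in zs]
print("z scores:", zs)
print("Z scores:", big_zs)

# Comparing one student's performance on two different (hypothetical) tests:
z_test_a = z_score(80, mean=70, sd=10)    # 1 SD above the mean on test A
z_test_b = z_score(65, mean=50, sd=5)     # 3 SDs above the mean on test B
print("test A:", z_test_a, "test B:", z_test_b)
```

Although 80 is the higher raw score, the student's relative position is better on test B, which is the kind of comparison standard scores make possible. The Z scores carry the same information without negative numbers.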

§ Calculation for Interval Data

*Symbols commonly used in statistics*:

X = any score; $\sum$ = the sum of (add them up); $\sum X$ = the sum of all the scores; $\bar{X}$ = the mean, or arithmetic average; $(\sum X)^2$ = the square of the sum (add up all the scores and square the sum, or total); $\sum X^2$ = the sum of all the squares (square each score and add up all the squares); N = total number of subjects; n = number of subjects in a particular group.

*Formulas are as follows*:

§ The formula for the mean is $\bar{X} = \frac{\sum X}{N}$

§ The formula for the standard deviation is $SD = \sqrt{\frac{SS}{N}}$, where $SS = \sum X^2 - \frac{(\sum X)^2}{N}$

§ The formula for the Pearson *r* is $r = \frac{\sum XY - \frac{(\sum X)(\sum Y)}{N}}{\sqrt{\left[\sum X^2 - \frac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \frac{(\sum Y)^2}{N}\right]}}$

§ The formula for degrees of freedom for the Pearson *r* is $df = N - 2$

§ The formula for a *z* score is $z = \frac{X - \bar{X}}{SD}$

§ The formula for a Z score is $Z = 10z + 50$

*RESEARCH REPORT*

__WRITING THE REPORT__

Everything in the main body of the report up to the results section can actually be written before the experiment is conducted. The rest is merely a matter of writing down what happened, analyzing these happenings, and drawing conclusions.

__FORMAT OF THE RESEARCH REPORT__

The research report, whether it be a thesis, dissertation, or shorter term paper or report, usually follows a fairly standardized pattern. The usual sequence of topics is as follows:

A. *Preliminary Section or Front Matter*, consisting of (1) Title Page, (2) Acknowledgement (if any), (3) Table of Contents, (4) List of Tables (if any), (5) List of Figures (if any).

B. *Main Body of the Report*, consisting of (1) Introduction: Statement of the problem (specific questions to be answered, hypotheses to be tested); Significance of the problem; Purposes of the study; Assumptions, limitations, and delimitations; Definition of important terms, (2) Review of Related Literature or Analysis of Previous Research, (3) Design of the Study: Procedures used; Sources of data; Methods of gathering data; Description of data-gathering instruments used, (4) Presentation and Analysis of Data: Text; Tables; Figures, (5) Summary and Conclusions: Restatement of the problem; Description of procedures used; Principal findings and conclusions; Recommendations for further research.

C. *Reference Section*, consisting of (1) Bibliography, and (2) Appendix.

__PRELIMINARY SECTION__

The first page of the report is *the title page*. *The form* usually includes: (1) the name of the topic, (2) the name of the author, (3) the relationship of the report to a course or degree requirement, (4) the name of the institution where the report is to be submitted, and (5) the date of presentation.

*The title* should communicate as briefly and directly as possible the precise nature of what the report is about. It should contain key words that will be recognized by others who might be interested in the research, because abstracting services index reports by the key words in their titles. The title should be written after the report is written. If a title occurs to the researcher earlier, he or she need not discard it, but should examine it carefully after the report is written to make sure it conveys what it is expected to convey.

*An acknowledgement page* is included if the writer has received unusual assistance in the conduct of the study. If used, acknowledgements should be simple and restrained.

*The table of contents* serves an important purpose in providing an outline of the contents of the report.

__THE ABSTRACT__

The abstract should include a brief summary of the key points of the report. It is usually limited to 100 or 150 words. It should contain as its essential ingredients the statement of the hypothesis, the statement of the research prediction, and a brief statement of the results. In addition, a very brief statement of why the research is worth doing may sometimes be important.

__THE MAIN BODY OF THE REPORT__

This section may be divided into five divisions, that is:

1. *An introduction to the area of consideration*; a clear __statement of the problem__ with specific questions to be answered or hypotheses to be tested is presented first. The __hypothesis__ should be clearly stated immediately before the methods section of the report. This statement can be couched in purely conceptual terms at this point or, if it seems natural to do so, it can employ operationally defined terms. A consideration of the __significance of the problem__ and its historical background is also appropriate. Specific purposes of the study are described, and all __assumptions, limitations, and delimitations__ are recognized. All important terms are carefully defined, so that the reader may understand the concepts underlying the development of the investigation.

2. *Review of the important literature*; previous research studies are abstracted, and significant writings of authorities in the area under study are reviewed. This part provides a background for the development of the present study and brings the reader up to date. It gives evidence of the investigator’s knowledge of the field. A brief summary, indicating areas of agreement or disagreement in findings, or gaps in existing knowledge, should be included.

3. *Design of the study*; all the important variables in the study should be operationally defined, including control and moderator variables as well as the dependent and independent variables. The size of the samples and how they were selected are carefully described, as are the sources and methods of gathering data, the reliability of instruments selected or constructed, and the statistical procedures used in the analysis.

4. *The presentation and analysis of data*; through textual discussion and tabular and graphic devices, the data are critically analyzed and reported. __Tables and figures__ are used to clarify significant relationships. They are constructed and titled to be self-explanatory and are relatively simple. If complex tables are developed, they should be placed in the appendix.

5. *Summary;* after a brief statement of the problem and a description of the procedures used in the investigation, the findings and conclusions are presented. __Findings__ are statements of factual information based upon the data analysis. __Conclusions__ are answers to the questions raised, or the statements of acceptance or rejection of the hypotheses proposed. This is often a very short section, but it can be lengthened considerably by being combined with the conclusions section of the report. It may be appropriate in concluding this part of the report to indicate promising side-problems that have been uncovered and to suggest areas or problems for further investigation. The summary section is the most used part of the research report. Readers who scan research literature to find significant studies examine this section before deciding whether or not further examination of the report is worthwhile.

__REFERENCE SECTION__

1. *Bibliography;* located at the end of the main body of the report, it lists in alphabetical order the references used by the writer in preparing the report. In a short bibliography, books, pamphlets, monographs, and periodical references may be combined in the same list. If the number of references is large, the bibliography may be divided into sections for books, periodicals, and special documents. Ordinarily, a selected bibliography is preferable to an exhaustive list.

2. *Appendix;* tables and data that are important but not essential to the understanding of the report, copies of cover letters used, and printed forms of questionnaires, tests, and other data-gathering devices may be placed in the appendix.

§ __Footnotes__ serve a number of purposes: enabling writers to substantiate their presentation by citations of other authorities, giving credit to sources of material that they have quoted or paraphrased, and providing the reader with specific sources that he or she may use to verify the authenticity and accuracy of material used. Footnotes are found at the bottom of the page.

§ __Tables and Figures__; *a table* is a systematic method of presenting statistical data in vertical columns and horizontal rows, according to some classification of subject matter. Tables enable the reader to comprehend and interpret masses of data rapidly, and to grasp significant details and relationships at a glance. Good tables are relatively simple, concentrating on a limited number of ideas. Text references should identify tables by number, rather than by such expressions as “the table above” or “the following table”.

*A figure* is a device that presents statistical data in graphic form. The term __figure__ is applied to a wide variety of graphs, charts, maps, sketches, diagrams, and drawings. When skillfully used, figures present aspects of data in a visualized form that may be clearly and easily understood. Figures should not be intended as substitutes for textual description, but should be included to emphasize certain significant relationships.

*Tables and figures* should be used sparingly; too many will overwhelm the reader.

REFERENCES

1. John W. Best, Research in Education (4th Edition), 1981.

2. Edward L. Vockell, Educational Research, 1983.

