Text Analysis: Case Study

In 1949, Dr. Rudolf Flesch published The Art of Readable Writing, in which he proposed a measure of text readability known as the Flesch Index. This index is based on the average number of syllables per word and the average number of words per sentence in a piece of text. Index scores usually range from 0 to 100, and they indicate readable prose for the following grade levels:

Flesch Index    Grade Level of Readability

0–30            College
50–60           High School
90–100          Fourth Grade
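As a rough illustration, the three bands in the table can be encoded as a small lookup. The name `readability_band` is hypothetical, and the sketch assumes only the ranges listed above; scores that fall between them return None because the table does not label them.

```python
# Bands copied from the table above; the gaps between them are left unlabeled.
BANDS = [
    (0, 30, "College"),
    (50, 60, "High School"),
    (90, 100, "Fourth Grade"),
]

def readability_band(index):
    """Return the grade-level label for a Flesch Index, or None if unlisted."""
    for low, high, label in BANDS:
        if low <= index <= high:
            return label
    return None

print(readability_band(95))  # Fourth Grade
print(readability_band(40))  # None
```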

In this case study, we develop a program that computes the Flesch Index for a text file.

Write a program that computes the Flesch Index and grade level for text stored in a text file.

Analysis

The input to this program is the name of a text file. The outputs are the number of sentences, words, and syllables in the file, as well as the file’s Flesch Index and Grade Level Equivalent.

During analysis, we consult experts in the problem domain to learn any information that might be relevant to solving the problem. For our problem, this information includes the definitions of sentence, word, and syllable. For the purposes of this program, these terms are defined as follows:

Word        Any sequence of non-whitespace characters.

Sentence    Any sequence of words ending in a period, question mark, exclamation point, colon, or semicolon.

Syllable    Any word of three characters or fewer; or any vowel (a, e, i, o, u) or pair of consecutive vowels, except for a final -es, -ed, or -e that is not -le.

Note that these definitions are approximations. Some words, such as doubles and kettles, end in -es but will be counted as having one syllable, and an ellipsis (...) will be counted as three sentences, because each of its periods is treated as a sentence ending.
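To make the syllable definition concrete, here is a minimal sketch that follows it literally. The function name `count_syllables` is hypothetical, and two choices are assumptions rather than part of the definition: punctuation is stripped before counting, and every word is credited with at least one syllable.

```python
import re

# One vowel, or a pair of consecutive vowels, counts as one syllable.
VOWEL_GROUP = re.compile(r"[aeiou]{1,2}")

def count_syllables(word):
    """Count syllables in one word using the definition above."""
    # Assumption: ignore case and any non-letter characters.
    letters = re.sub(r"[^a-z]", "", word.lower())
    if len(letters) <= 3:
        return 1
    # Drop a final -es, -ed, or -e, unless the word ends in -le.
    stem = letters
    if letters.endswith(("es", "ed")):
        stem = letters[:-2]
    elif letters.endswith("e") and not letters.endswith("le"):
        stem = letters[:-1]
    # Assumption: a word never has fewer than one syllable.
    return max(len(VOWEL_GROUP.findall(stem)), 1)

for sample in ["doubles", "kettles", "readable", "the"]:
    print(sample, count_syllables(sample))
```

Under these rules, doubles and kettles each come out as one syllable, which matches the approximation described in the note above.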

Flesch’s formula to calculate the index F is the following:

F = 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words)

The Flesch-Kincaid Grade Level Formula is used to compute the Equivalent Grade Level G:

G = 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59
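The two formulas translate directly into code. The sketch below wraps each in a function; the names `flesch_index` and `grade_level` are hypothetical, and the counts (27 words, 5 sentences, 53 syllables) are used only as an illustration. Both functions assume non-zero word and sentence counts and would raise ZeroDivisionError otherwise.

```python
def flesch_index(words, sentences, syllables):
    """Flesch Index F from the three counts."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def grade_level(words, sentences, syllables):
    """Flesch-Kincaid Grade Level G from the three counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

# Illustrative counts only.
words, sentences, syllables = 27, 5, 53
print(round(flesch_index(words, sentences, syllables), 2))  # 35.29
print(round(grade_level(words, sentences, syllables), 2))   # 9.68
```

Longer sentences and longer words both lower F and raise G, so the two scores move in opposite directions as text gets harder to read.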

Design

This program will perform the following tasks:

1. Receive the filename from the user, open the file for input, and input the text.

2. Count the sentences in the text.

3. Count the words in the text.

4. Count the syllables in the text.

5. Compute the Flesch Index.

6. Compute the Grade Level Equivalent.

7. Print these two values with the appropriate labels, as well as the counts from tasks 2–4.

Implementation

# Take the inputs

fileName = input("Enter the file name: ")
# The with statement closes the file once the text has been read
with open(fileName, 'r') as inputFile:
    text = inputFile.read()

# Count the sentences

sentences = text.count('.') + text.count('?') + text.count(':') + text.count(';') + text.count('!')

# Count the words

words = len(text.split())

# Count the syllables

syllables = 0
vowels = "aeiouAEIOU"

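for word in text.split():
    # Count every vowel in the word
    for vowel in vowels:
        syllables += word.count(vowel)
    # Discount a final -es, -ed, or -e
    for ending in ['es', 'ed', 'e']:
        if word.endswith(ending):
            syllables -= 1
    # A final -le still counts as a syllable
    if word.endswith('le'):
        syllables += 1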

# Compute the Flesch Index and Grade Level

index = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
level = round(0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59)

# Output the result

print("The Flesch Index is", index)
print("The Grade Level Equivalent is", level)
print(sentences, "sentences")
print(words, "words")
print(syllables, "syllables")

Output
Enter the file name: test.dat
The Flesch Index is 35.28733333333335
The Grade Level Equivalent is 10
5 sentences
27 words
53 syllables
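Two edge cases are not handled by the script above: an ellipsis inflates the sentence count, and a file with no words or no sentence terminators causes a division by zero. The sketch below shows one possible way to address both. The names `count_sentences` and `words_per_sentence` are hypothetical, and treating a run of terminators as a single sentence ending is an assumption that departs from the case study's definition.

```python
import re

# A run of terminators ("...", "?!") is treated as one sentence ending.
SENTENCE_END = re.compile(r"[.?!:;]+")

def count_sentences(text):
    """Count sentence endings, collapsing consecutive terminators."""
    return len(SENTENCE_END.findall(text))

def words_per_sentence(text):
    """Return the average words per sentence, or None if there are no sentences."""
    sentences = count_sentences(text)
    if sentences == 0:
        return None
    return len(text.split()) / sentences

print(count_sentences("Wait... what?! Yes."))                # 3
print(words_per_sentence("One two. Three four five six!"))   # 3.0
print(words_per_sentence(""))                                # None
```

Returning None leaves the caller to decide how to report a file that cannot be scored, rather than printing a misleading index.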
