Text Analysis: Case Study
In 1949, Dr. Rudolf Flesch published The Art of Readable Writing, in which he proposed a measure of text readability known as the Flesch Index. This index is based on the average number of syllables per word and the average number of words per sentence in a piece of text. Index scores usually range from 0 to 100, and they indicate readable prose for the following grade levels:
Flesch Index    Grade Level of Readability
0–30            College
50–60           High School
90–100          Fourth Grade
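The table can be read as a simple lookup. The sketch below is only an illustration: the function name readability_band is hypothetical, and it covers only the three ranges listed above, returning None for scores that fall in the gaps between them.

```python
# Hypothetical helper: maps a Flesch Index score to the label in the table.
# Only the three ranges shown in the table are covered (an assumption of
# this sketch); any other score returns None.
def readability_band(index):
    bands = [
        (0, 30, "College"),
        (50, 60, "High School"),
        (90, 100, "Fourth Grade"),
    ]
    for low, high, label in bands:
        if low <= index <= high:
            return label
    return None


print(readability_band(55))
print(readability_band(40))
```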
In this case study, we develop a program that computes the Flesch Index for a text file.
Write a program that computes the Flesch Index and grade level for text stored in a text file.
Analysis
The input to this program is the name of a text file. The outputs are the number of sentences, words, and syllables in the file, as well as the file’s Flesch Index and Grade Level Equivalent.
During analysis, we consult experts in the problem domain to learn any information that might be relevant in solving the problem. For our problem, this information includes the definitions of sentence, word, and syllable. For the purposes of this program, these terms are defined as follows:
Word: Any sequence of non-whitespace characters.
Sentence: Any sequence of words ending in a period, question mark, exclamation point, colon, or semicolon.
Syllable: Any word of three characters or fewer counts as one syllable; otherwise, each vowel (a, e, i, o, u) or pair of consecutive vowels counts as one syllable, except for a final -es, -ed, or -e that is not -le.
Note that these definitions are approximations. Some words, such as doubles and kettles, end in -es but will be counted as having one syllable, and an ellipsis typed as three periods (...) will be counted as three sentences.
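To make the syllable rule concrete, here is one possible reading of it as a function. This is a sketch, not the program's actual counting code: the name syllables_in_word is hypothetical, the function expects a bare word with no attached punctuation, and it returns at least one syllable for every word, which is an assumption of this sketch.

```python
import re


# Hypothetical helper: one reading of the syllable definition above.
def syllables_in_word(word):
    word = word.lower()
    # Any word of three characters or fewer counts as one syllable.
    if len(word) <= 3:
        return 1
    # Drop a final -es, -ed, or -e, unless the -e is part of -le.
    if word.endswith(("es", "ed")):
        stem = word[:-2]
    elif word.endswith("e") and not word.endswith("le"):
        stem = word[:-1]
    else:
        stem = word
    # Each vowel, or pair of consecutive vowels, counts as one syllable.
    groups = re.findall(r"[aeiou]{1,2}", stem)
    # Assumption: every word has at least one syllable.
    return max(1, len(groups))


for w in ["doubles", "kettles", "readable", "index"]:
    print(w, syllables_in_word(w))
```

Under this reading, doubles and kettles each come out as one syllable, which shows how rough the approximation is.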
Flesch’s formula to calculate the index F is the following:
F = 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words)
The Flesch-Kincaid Grade Level Formula is used to compute the Grade Level Equivalent G:
G = 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59
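As a worked example of the two formulas, take a made-up text with 100 words, 5 sentences, and 150 syllables (illustrative numbers, not measured from any real file). The function names flesch_index and grade_level are hypothetical.

```python
# Hypothetical helpers that apply the two formulas above directly.
def flesch_index(words, sentences, syllables):
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def grade_level(words, sentences, syllables):
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Illustrative counts: 20 words per sentence, 1.5 syllables per word.
f = flesch_index(100, 5, 150)
g = grade_level(100, 5, 150)
print(round(f, 3))  # 59.635
print(round(g, 2))  # 9.91
```

With 20 words per sentence and 1.5 syllables per word, F = 206.835 − 20.3 − 126.9 = 59.635, which falls in the High School band of the table, and G = 7.8 + 17.7 − 15.59 = 9.91.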
Design
This program will perform the following tasks:
1. Receive the filename from the user, open the file for input, and input the text.
2. Count the sentences in the text.
3. Count the words in the text.
4. Count the syllables in the text.
5. Compute the Flesch Index.
6. Compute the Grade Level Equivalent.
7. Print these two values with the appropriate labels, as well as the counts from tasks 2–4.
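The tasks above map naturally onto small functions. The following is a minimal sketch of that decomposition, assuming hypothetical function names (count_sentences, count_words, count_syllables, analyze) that are not part of the case study. It takes the text as a string instead of reading a file, so it can be tried without any input file.

```python
# Hypothetical decomposition of tasks 2-6 into functions.
def count_sentences(text):
    # A sentence ends in one of these five marks.
    return sum(text.count(mark) for mark in ".?!:;")


def count_words(text):
    # A word is any run of non-whitespace characters.
    return len(text.split())


def count_syllables(text):
    # Simple per-vowel count with the final -es, -ed, -e adjustment.
    vowels = "aeiouAEIOU"
    total = 0
    for word in text.split():
        total += sum(word.count(vowel) for vowel in vowels)
        for ending in ("es", "ed", "e"):
            if word.endswith(ending):
                total -= 1
        if word.endswith("le"):
            total += 1
    return total


def analyze(text):
    sentences = count_sentences(text)
    words = count_words(text)
    syllables = count_syllables(text)
    index = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    level = round(0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59)
    return {"sentences": sentences, "words": words,
            "syllables": syllables, "index": index, "level": level}


result = analyze("Dogs bark. Cats nap all day!")
for label, value in result.items():
    print(label, value)
```

Keeping each count in its own function makes it possible to check the counts on short strings before running the program on a whole file.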
Implementation
# Take the inputs
fileName = input("Enter the file name: ")
# The with statement closes the file once the text has been read
with open(fileName, 'r') as inputFile:
    text = inputFile.read()
# Count the sentences
sentences = text.count('.') + text.count('?') + text.count(':') + text.count(';') + text.count('!')
# Count the words
words = len(text.split())
# Count the syllables
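syllables = 0
vowels = "aeiouAEIOU"
for word in text.split():
    # Count every vowel in the word
    for vowel in vowels:
        syllables += word.count(vowel)
    # Discount a final -es, -ed, or -e
    for ending in ['es', 'ed', 'e']:
        if word.endswith(ending):
            syllables -= 1
    # A final -le still counts as a syllable
    if word.endswith('le'):
        syllables += 1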
# Compute the Flesch Index and Grade Level
index = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
level = round(0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59)
# Output the result
print("The Flesch Index is", index)
print("The Grade Level Equivalent is", level)
print(sentences, "sentences")
print(words, "words")
print(syllables, "syllables")
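One edge case the script does not handle: if the file contains no words, or no sentence-ending punctuation, the divisions raise ZeroDivisionError. A minimal sketch of a guard follows. The function name scores is hypothetical, and returning None for such input is a design choice of this sketch, not something the case study prescribes.

```python
# Hypothetical guard around the two formulas.
def scores(words, sentences, syllables):
    # The ratios are undefined without at least one word and one sentence.
    if words == 0 or sentences == 0:
        return None
    words_per_sentence = words / sentences
    syllables_per_word = syllables / words
    index = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    level = round(0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59)
    return index, level


print(scores(0, 0, 0))
print(scores(100, 5, 150))
```

The caller can then print a message such as "not enough text to score" when the result is None instead of letting the program stop with a traceback.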