dxalxmur.com

Python Script to Read and Display Word Document Content


Chapter 1: Introduction to Document Reading

This section walks through a Python script that reads a Word document and prints its paragraph count along with the text of each paragraph. It is a useful starting point for tasks such as document statistics and text analysis.

Python code for reading Word documents

The following snippet uses the docx module, which is installed as the python-docx package (pip install python-docx), to open a Word document. Calling docx.Document with the file path loads the document into memory. The script prints the total number of paragraphs, obtained with len(doc.paragraphs), and then loops over the paragraphs with enumerate, which yields each paragraph together with a counter starting at 1. Each line of output is built with an f-string.

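import docx  # provided by the python-docx package

# Load the Word document into memory
doc = docx.Document("script.docx")

print(f"Number of paragraphs: {len(doc.paragraphs)}")

# Print each paragraph with a counter starting at 1
for count, para in enumerate(doc.paragraphs, start=1):
    print(f"{count}: {para.text}")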
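Since the introduction mentions document statistics, here is a minimal sketch of how the paragraph texts could be turned into a few simple counts. The function name paragraph_stats and the keys of the returned dictionary are assumptions made for illustration; they are not part of python-docx. The function only relies on each paragraph having a text attribute, which python-docx paragraphs provide.

```python
def paragraph_stats(paragraphs):
    """Return simple counts for an iterable of paragraph-like objects.

    Hypothetical helper: each item only needs a ``text`` attribute,
    as python-docx ``Paragraph`` objects have.
    """
    texts = [p.text for p in paragraphs]
    # Ignore paragraphs that are empty or contain only whitespace
    non_empty = [t for t in texts if t.strip()]
    # Count whitespace-separated words in each remaining paragraph
    word_counts = [len(t.split()) for t in non_empty]
    return {
        "paragraphs": len(texts),
        "non_empty": len(non_empty),
        "words": sum(word_counts),
        "longest": max(word_counts, default=0),
    }
```

With python-docx installed, it could be called as paragraph_stats(docx.Document("script.docx").paragraphs). Splitting on whitespace is only a rough word count, but it is usually enough for a quick overview.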

Chapter 2: Retrieving and Modifying Paragraph Content

The next snippet also opens a Word document (.docx) with the python-docx package. It reports the number of runs in selected paragraphs, prints the text of each run, and then saves the document to a new file path. A run is a stretch of text within a paragraph that shares the same character formatting.

The code starts by opening the document at the defined path using docx.Document(doc_path). It then counts the paragraphs with len(doc.paragraphs) and prints the result.

Next, it loops over the selected paragraph indices (here 0 and 2, the first and third paragraphs) and fetches each paragraph object with doc.paragraphs[p_idx]. The document must therefore contain at least three paragraphs, otherwise the lookup raises an IndexError. The script counts the runs in the paragraph with len(para.runs) and prints this number.

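In the inner loop, it iterates through each run within the paragraph and prints its text via run.text. Finally, the document is saved with doc.save("Modified_Document.docx"). Because the script only reads the runs, the saved file has the same content as the original; any change assigned to run.text before this call would be written to the new file.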

import docx

doc_path = "script.docx"
doc = docx.Document(doc_path)

print("Number of paragraphs:", len(doc.paragraphs))

# Inspect the first and third paragraphs
for p_idx in [0, 2]:
    para = doc.paragraphs[p_idx]
    print("Number of runs:", len(para.runs))

    # Print the text of each run in this paragraph
    for run in para.runs:
        print(run.text)

# Save a copy under a new name; the original file is left untouched
doc.save("Modified_Document.docx")
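The script above saves the document without changing it. As a rough sketch of what an actual modification might look like, the helper below replaces a substring inside each run of the selected paragraphs. The name replace_in_runs and its parameters are hypothetical and not part of python-docx; the only library behavior it relies on is that run.text can be read and assigned. It also skips indices that fall outside the paragraph list instead of raising an IndexError.

```python
def replace_in_runs(paragraphs, indices, old, new):
    """Replace ``old`` with ``new`` in every run of the chosen paragraphs.

    Hypothetical helper. Returns the number of runs that were changed.
    """
    paragraphs = list(paragraphs)
    changed = 0
    for idx in indices:
        if not 0 <= idx < len(paragraphs):
            continue  # the document has fewer paragraphs than expected
        for run in paragraphs[idx].runs:
            if old in run.text:
                # Assigning to run.text changes the text of this run only
                run.text = run.text.replace(old, new)
                changed += 1
    return changed
```

With python-docx installed, this could be called as replace_in_runs(doc.paragraphs, [0, 2], "draft", "final") just before doc.save(...), where "draft" and "final" are placeholder strings. One limitation of this approach: a phrase that Word has split across two runs will not be matched, because each run is checked on its own.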

Chapter 3: Extracting Heading Paragraphs

The following code prints only the heading paragraphs of a Word document, that is, the paragraphs whose style name starts with "Heading" (for example "Heading 1" or "Heading 2"). It again relies on the python-docx package and works in four steps:

  1. Importing the `docx` Module: The import docx line gives access to the functionality of the python-docx library.
  2. Opening the Word Document: A doc object is created by calling docx.Document with the file name ("script.docx"). This loads the document so its content can be read and processed.
  3. Counting Paragraphs: The total number of paragraphs in the document is calculated with len(doc.paragraphs) and printed.
  4. Iterating Through Paragraphs: A loop visits each paragraph in the document and checks whether para.style.name begins with "Heading". If it does, the text of that paragraph is printed.

In summary, this code reads the total number of paragraphs in a specified Word document and prints all heading paragraphs.

import docx

doc = docx.Document("script.docx")

print("Paragraphs:", len(doc.paragraphs))

# Print only the paragraphs that use a heading style
for para in doc.paragraphs:
    if para.style.name.startswith("Heading"):
        print(para.text)
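Building on the same style-name check, the following sketch collects the headings into a simple outline with their levels. The function name heading_outline is an assumption for illustration, as is the idea of reading the level from the number at the end of the style name (such as the 2 in "Heading 2"). It only needs each paragraph to expose style.name and text, as python-docx paragraphs do.

```python
def heading_outline(paragraphs):
    """Return a list of (level, text) pairs for heading paragraphs.

    Hypothetical helper: the level is parsed from style names such as
    "Heading 2"; it is None when the name carries no number.
    """
    outline = []
    for para in paragraphs:
        name = para.style.name
        if not name.startswith("Heading"):
            continue
        # Whatever follows "Heading" should be the level number
        suffix = name[len("Heading"):].strip()
        level = int(suffix) if suffix.isdigit() else None
        outline.append((level, para.text))
    return outline
```

With python-docx installed, heading_outline(doc.paragraphs) would return the headings in document order, which could then be indented by level to print a table of contents.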
