The Python ecosystem offers several tools for working with Microsoft Word documents, and python-docx is one of the most popular. This third-party library lets you create, modify, and extract information from Word documents (.docx) using Python. In this article, we'll explore what python-docx can do and demonstrate how to create a .docx document from scratch.
Getting Started
Before we dive into the functionality, you need to install the python-docx library. You can do this using pip:
pip install python-docx
With the library installed, you can now start creating and manipulating Word documents.
Creating a Word Document
The first step in working with python-docx is creating a new Word document. This is done by initializing a Document object.
from docx import Document

# Create a new Document
doc = Document()
Adding a Title
You can add a title to your document by using the add_heading method. Its level argument accepts values from 0 through 9: level 0 applies the Title style, while levels 1 through 9 apply the corresponding Heading style. The default is level 1.
# Add a title
doc.add_heading('Document Title', level=1)
Adding Paragraphs
Adding paragraphs is straightforward with the add_paragraph method. This method returns a Paragraph object that you can further manipulate if needed.
# Add a paragraph
doc.add_paragraph('This is the first paragraph of the document.')
Formatting Text
python-docx allows you to apply basic text formatting, such as bold, italic, and underline. This is done using Run objects, which represent contiguous runs of text with the same formatting.
# Add a paragraph with formatted text
paragraph = doc.add_paragraph()
run = paragraph.add_run('This text is bold and italic.')
run.bold = True
run.italic = True
Changing Font Size and Color
To change the font size and color of text, you use the Font object associated with a Run.
from docx.shared import Pt
from docx.shared import RGBColor

# Add a paragraph with custom font size and color
paragraph = doc.add_paragraph()
run = paragraph.add_run('This text is 24 pt and blue.')
run.font.size = Pt(24)
run.font.color.rgb = RGBColor(0, 0, 255)
Creating a Blockquote
To create a blockquote, you can adjust the indentation and apply a different font style to a paragraph.
from docx.shared import Inches, Pt

# Add a blockquote
blockquote = doc.add_paragraph('This is a blockquote paragraph. It is typically indented and styled differently.')
blockquote.paragraph_format.left_indent = Inches(0.5)
blockquote.paragraph_format.space_before = Pt(12)
blockquote.paragraph_format.space_after = Pt(12)
blockquote.runs[0].italic = True
Text Alignment
You can align text within a paragraph using the alignment attribute of the Paragraph object. Available alignment options include left, center, right, and justified.
from docx.enum.text import WD_ALIGN_PARAGRAPH

# Add a paragraph with centered text
paragraph = doc.add_paragraph('This paragraph is centered.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Add a paragraph with right-aligned text
paragraph = doc.add_paragraph('This paragraph is right-aligned.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT

# Add a paragraph with left-aligned text
paragraph = doc.add_paragraph('This paragraph is left-aligned.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT

# Add a paragraph with justified text
paragraph = doc.add_paragraph('This paragraph is justified.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY
Adding Lists
You can add both ordered and unordered lists using the add_paragraph method with appropriate styles.
# Add an unordered list
doc.add_paragraph('First item in unordered list', style='List Bullet')
doc.add_paragraph('Second item in unordered list', style='List Bullet')

# Add an ordered list
doc.add_paragraph('First item in ordered list', style='List Number')
doc.add_paragraph('Second item in ordered list', style='List Number')
Inserting Images
Inserting images into your document is simple with the add_picture method. You can also specify the width and height of the image; if you give only one of them, the other is scaled to preserve the aspect ratio.
from docx.shared import Inches

# Add an image
doc.add_picture('path/to/image.jpg', width=Inches(2))
Creating Tables
python-docx makes it easy to create tables and populate them with data.
# Create a table
table = doc.add_table(rows=3, cols=3)

# Populate the table
for row in table.rows:
    for cell in row.cells:
        cell.text = 'Cell content'
Saving the Document
Once you have added all your content, you can save the document using the save method.
# Save the document
doc.save('example.docx')
Full Example Script
Here's a complete script that demonstrates all the features covered above:
#!/usr/bin/env python3
from docx import Document
from docx.shared import Inches, Pt, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH

# Create a new Document
doc = Document()

# Add a title
doc.add_heading('Document Title', level=1)

# Add a paragraph
doc.add_paragraph('This is the first paragraph of the document.')

# Add a paragraph with formatted text
paragraph = doc.add_paragraph()
run = paragraph.add_run('This text is bold and italic.')
run.bold = True
run.italic = True

# Add an unordered list
doc.add_paragraph('First item in unordered list', style='List Bullet')
doc.add_paragraph('Second item in unordered list', style='List Bullet')

# Add an ordered list
doc.add_paragraph('First item in ordered list', style='List Number')
doc.add_paragraph('Second item in ordered list', style='List Number')

# Add an image
doc.add_picture('path/to/image.jpg', width=Inches(2))

# Create a table
table = doc.add_table(rows=3, cols=3)

# Populate the table
for row in table.rows:
    for cell in row.cells:
        cell.text = 'Cell content'

# Add a paragraph with centered text
paragraph = doc.add_paragraph('This paragraph is centered.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Add a paragraph with right-aligned text
paragraph = doc.add_paragraph('This paragraph is right-aligned.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT

# Add a paragraph with left-aligned text
paragraph = doc.add_paragraph('This paragraph is left-aligned.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT

# Add a paragraph with justified text
paragraph = doc.add_paragraph('This paragraph is justified.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY

# Add a paragraph with custom font size and color
paragraph = doc.add_paragraph()
run = paragraph.add_run('This text is 24 pt and blue.')
run.font.size = Pt(24)
run.font.color.rgb = RGBColor(0, 0, 255)

# Add a blockquote
blockquote = doc.add_paragraph('This is a blockquote paragraph. It is typically indented and styled differently.')
blockquote.paragraph_format.left_indent = Inches(0.5)
blockquote.paragraph_format.space_before = Pt(12)
blockquote.paragraph_format.space_after = Pt(12)
blockquote.runs[0].italic = True

# Save the document
doc.save('example.docx')
Advanced Features
Adding Sections
Word documents can have multiple sections, each with its own page layout settings. You can add sections using the add_section method.
from docx.enum.section import WD_SECTION

# Add a new section
section = doc.add_section(WD_SECTION.NEW_PAGE)
Page Breaks
To insert a page break, you can use the add_page_break method.
# Add a page break
doc.add_page_break()
Setting Margins
You can set the margins of a section using the Section object.
from docx.shared import Inches

# Set margins for the section
section.top_margin = Inches(1)
section.bottom_margin = Inches(1)
section.left_margin = Inches(1)
section.right_margin = Inches(1)
Adding Headers and Footers
Headers and footers can be added to sections. You can insert text, images, and even tables in headers and footers.
# Add a header
header = section.header
header_paragraph = header.paragraphs[0]
header_paragraph.text = "Header text"

# Add a footer
footer = section.footer
footer_paragraph = footer.paragraphs[0]
footer_paragraph.text = "Footer text"
Adding Page Numbers to the Footer
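python-docx does not offer a high-level API for page numbers, so you add them by inserting a PAGE field into the footer as raw XML elements: a w:fldChar that begins the field, a w:instrText that holds the instruction, and a w:fldChar that ends it.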
from docx.oxml import OxmlElement
from docx.oxml.ns import qn

# Add page number to the footer
footer_paragraph = footer.paragraphs[0]
run = footer_paragraph.add_run()

fldChar1 = OxmlElement('w:fldChar')
fldChar1.set(qn('w:fldCharType'), 'begin')

instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve')
instrText.text = 'PAGE'

fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'end')

run._r.append(fldChar1)
run._r.append(instrText)
run._r.append(fldChar2)
Complete Advanced Example
Here's a complete script that includes advanced features like adding sections, setting margins, and adding headers, footers, and page numbers.
#!/usr/bin/env python3
from docx import Document
from docx.shared import Inches, Pt, RGBColor
from docx.enum.section import WD_SECTION
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.enum.text import WD_ALIGN_PARAGRAPH

# Create a new Document
doc = Document()

# Add a title
doc.add_heading('Document Title', level=1)

# Add a paragraph
doc.add_paragraph('This is the first paragraph of the document.')

# Add a paragraph with formatted text
paragraph = doc.add_paragraph()
run = paragraph.add_run('This text is bold and italic.')
run.bold = True
run.italic = True

# Add an unordered list
doc.add_paragraph('First item in unordered list', style='List Bullet')
doc.add_paragraph('Second item in unordered list', style='List Bullet')

# Add an ordered list
doc.add_paragraph('First item in ordered list', style='List Number')
doc.add_paragraph('Second item in ordered list', style='List Number')

# Add an image
doc.add_picture('Mastering python-docx A Guide to Creating Word Documents with Python.jpg', width=Inches(2))

# Create a table
table = doc.add_table(rows=3, cols=3)

# Populate the table
for row in table.rows:
    for cell in row.cells:
        cell.text = 'Cell content'

# Add a paragraph with centered text
paragraph = doc.add_paragraph('This paragraph is centered.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Add a paragraph with right-aligned text
paragraph = doc.add_paragraph('This paragraph is right-aligned.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT

# Add a paragraph with left-aligned text
paragraph = doc.add_paragraph('This paragraph is left-aligned.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT

# Add a paragraph with justified text
paragraph = doc.add_paragraph('This paragraph is justified.')
paragraph.alignment = WD_ALIGN_PARAGRAPH.JUSTIFY

# Add a paragraph with custom font size and color
paragraph = doc.add_paragraph()
run = paragraph.add_run('This text is 24 pt and blue.')
run.font.size = Pt(24)
run.font.color.rgb = RGBColor(0, 0, 255)

# Add a blockquote
blockquote = doc.add_paragraph('This is a blockquote paragraph. It is typically indented and styled differently.')
blockquote.paragraph_format.left_indent = Inches(0.5)
blockquote.paragraph_format.space_before = Pt(12)
blockquote.paragraph_format.space_after = Pt(12)
blockquote.runs[0].italic = True

# Add a new section
section = doc.add_section(WD_SECTION.NEW_PAGE)
paragraph = doc.add_paragraph('This is a new page.')

# Set margins for the section
section.top_margin = Inches(1)
section.bottom_margin = Inches(1)
section.left_margin = Inches(1)
section.right_margin = Inches(1)

# Add a header
header = section.header
header_paragraph = header.paragraphs[0]
header_paragraph.text = "Header text"

# Add a footer
footer = section.footer
footer_paragraph = footer.paragraphs[0]
footer_paragraph.text = "Footer text\n"
footer_paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Add page number to the footer
run = footer_paragraph.add_run()

fldChar1 = OxmlElement('w:fldChar')
fldChar1.set(qn('w:fldCharType'), 'begin')

instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve')
instrText.text = 'PAGE'

fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'end')

run._r.append(fldChar1)
run._r.append(instrText)
run._r.append(fldChar2)

# Save the document
doc.save('advanced_example.docx')
Conclusion
python-docx is a powerful library for creating and manipulating Word documents programmatically. This guide covered the basics of document creation, including adding text, formatting, lists, images, and tables. We also touched on advanced features like sections, page breaks, margins, headers, footers, and adding page numbers. With these tools, you can automate document generation, create templates, and enhance productivity in your projects. Feel free to explore the official documentation for python-docx to discover more features and customize your document processing tasks even further.