Word Extraction Features
Use FileVerbs to extract structured data from Word documents: metadata, comment threads, embedded images, and plain text (with optional OCR). These actions suit content processing and workflow automation, and each returns its result as JSON, plain text, or a ZIP archive, depending on the action.
WordExtractMetadata
Extract metadata such as title, author, creation date, and last modified timestamp from a Word document. The output is returned as structured JSON.
{
  "action": "wordextractmetadata",
  "parameters": {
    "fileIds": ["your_file_id_here"]
  }
}
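Every action in this section uses the same request shape: an action name plus a parameters object holding fileIds and, for some actions, options. As a minimal sketch, the Python helper below assembles that body. The name build_request is hypothetical and not part of FileVerbs, and how the body is submitted (endpoint, authentication) is not covered here.

```python
import json


def build_request(action, file_ids, options=None):
    """Build a request body shaped like the JSON examples in this section.

    build_request is an illustrative helper, not a FileVerbs function.
    """
    parameters = {"fileIds": list(file_ids)}
    if options:
        # Only include "options" when the action actually takes them.
        parameters["options"] = dict(options)
    return {"action": action, "parameters": parameters}


body = build_request("wordextractmetadata", ["your_file_id_here"])
print(json.dumps(body, indent=2))
```

The same helper covers the other actions by changing the action string, for example build_request("wordextractcomments", ["your_file_id_here"]).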
WordExtractComments
Extract all comment threads, including each comment's content, author, and timestamp. Useful for audit trails or collaboration insights. Output is a structured JSON list.
{
  "action": "wordextractcomments",
  "parameters": {
    "fileIds": ["your_file_id_here"]
  }
}
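Once the JSON list is available, it can be summarized for an audit trail. The sketch below groups comment text by author. The exact output schema is not documented here, so the field names "author", "content", and "timestamp" and the sample data are assumptions made purely for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample output; the real field names may differ.
sample_output = """[
  {"author": "Alice", "content": "Please clarify section 2.", "timestamp": "2024-01-05T10:00:00Z"},
  {"author": "Bob", "content": "Agreed.", "timestamp": "2024-01-05T11:30:00Z"},
  {"author": "Alice", "content": "Typo in the title.", "timestamp": "2024-01-06T09:15:00Z"}
]"""


def comments_by_author(raw_json):
    """Group comment text by author (assumed keys: "author", "content")."""
    grouped = defaultdict(list)
    for comment in json.loads(raw_json):
        author = comment.get("author", "unknown")
        grouped[author].append(comment.get("content", ""))
    return dict(grouped)


for author, texts in comments_by_author(sample_output).items():
    print(f"{author}: {len(texts)} comment(s)")
```

Adjust the key names to match the actual output before relying on this.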
WordExtractImages
Extract all embedded images from the Word document. Output is a ZIP file containing individual image files in original formats (PNG, JPEG, etc.).
{
  "action": "wordextractimages",
  "parameters": {
    "fileIds": ["your_file_id_here"]
  }
}
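Because the result is a ZIP archive, it can be inspected with standard tooling. The following sketch uses Python's zipfile module to list the files in an archive held in memory. It assumes the ZIP bytes have already been downloaded; the helper name list_images and the small archive built for the demonstration are illustrative only.

```python
import io
import zipfile


def list_images(zip_bytes):
    """Return (filename, size_in_bytes) for each file in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()
        ]


# Build a tiny in-memory archive to stand in for a downloaded result.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("image1.png", b"fake-png-bytes")
    demo.writestr("image2.jpeg", b"fake-jpeg")

for name, size in list_images(buffer.getvalue()):
    print(name, size)
```

To write the images to disk instead, zipfile.ZipFile.extractall can be called on the same archive object.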
WordToTextWithOcr
Extract plain text from Word documents, including text detected in images using OCR. Set wordHasImage to true if the document contains scanned or embedded image content. Output is a plain .txt file.
{
  "action": "wordtotextwithocr",
  "parameters": {
    "fileIds": ["your_file_id_here"],
    "options": {
      "wordHasImage": true
    }
  }
}
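One way to decide the wordHasImage value before building the request is to inspect the file locally. A .docx file is a ZIP package that stores embedded images under word/media/, so the sketch below checks for entries in that folder. The helper names docx_has_images and ocr_request are hypothetical, the check only applies to .docx (not legacy .doc) files, and it will not detect images that are linked rather than embedded.

```python
import io
import zipfile


def docx_has_images(path_or_file):
    """Return True if the .docx package has entries under word/media/."""
    with zipfile.ZipFile(path_or_file) as package:
        return any(name.startswith("word/media/") for name in package.namelist())


def ocr_request(file_id, has_images):
    """Build the wordtotextwithocr body shown above (illustrative helper)."""
    return {
        "action": "wordtotextwithocr",
        "parameters": {
            "fileIds": [file_id],
            "options": {"wordHasImage": has_images},
        },
    }


# Minimal in-memory stand-in for a .docx with one embedded image.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", "<w:document/>")
    package.writestr("word/media/image1.png", b"\x89PNG")
buffer.seek(0)

has_images = docx_has_images(buffer)
request = ocr_request("your_file_id_here", has_images)
print(request)
```

For a real document, pass its file path to docx_has_images instead of the in-memory buffer.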
Common Use Cases
- 📄 Indexing and archiving documents with metadata extraction
- 💬 Analyzing feedback or reviews via extracted comments
- 🖼️ Collecting all images from product specs or brochures
- 🔎 Extracting scanned content from forms and reports using OCR