FileVerbs API Documentation - Word Extraction

Word Extraction Features

Use FileVerbs to extract structured data from Word documents. The actions below return metadata and comments as JSON, document text as a plain .txt file, and embedded images as a ZIP archive, which makes them suited to content processing, workflow automation, and turning unstructured files into usable formats.

WordExtractMetadata

Extract metadata such as title, author, creation date, and last modified timestamp from a Word document. The output is returned as structured JSON.

{
  "action": "wordextractmetadata",
  "parameters": {
    "fileIds": ["your_file_id_here"]
  }
}
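
Every request body on this page has the same shape: an action name plus a parameters object holding fileIds and, for some actions, options. The following is a minimal sketch of a helper that assembles such a body. The function name build_request is hypothetical, and the sketch only constructs the JSON; how the body is sent (endpoint, authentication) is not covered on this page and is deliberately left out.

```python
import json


def build_request(action, file_ids, options=None):
    """Assemble a request body in the shape shown in the examples on this page.

    build_request is a hypothetical helper, not part of FileVerbs itself.
    """
    if not file_ids:
        raise ValueError("at least one file ID is required")
    parameters = {"fileIds": list(file_ids)}
    if options:
        # Only some actions (for example wordtotextwithocr) take options.
        parameters["options"] = dict(options)
    return {"action": action, "parameters": parameters}


body = build_request("wordextractmetadata", ["your_file_id_here"])
print(json.dumps(body, indent=2))
```

The same helper can produce the bodies for the other actions by changing the action string and, where needed, passing options.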

WordExtractComments

Extract all comment threads, including each comment's content, author, and timestamp. Useful for audit trails or collaboration insights. The output is a structured JSON list.

{
  "action": "wordextractcomments",
  "parameters": {
    "fileIds": ["your_file_id_here"]
  }
}
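
Once the comment list is available, a common next step is to group comments by author. The sketch below illustrates this on a hypothetical sample. The exact response schema is not documented on this page, so the field names "author", "content", and "timestamp" are assumptions and may need adjusting to the real output.

```python
import json
from collections import defaultdict

# Hypothetical sample output. The field names "author", "content", and
# "timestamp" are assumptions, not confirmed by the FileVerbs documentation.
SAMPLE_COMMENTS = """
[
  {"author": "Alice", "content": "Please cite the source.", "timestamp": "2024-01-05T10:00:00Z"},
  {"author": "Bob", "content": "Added a reference.", "timestamp": "2024-01-05T11:30:00Z"},
  {"author": "Alice", "content": "Looks good now.", "timestamp": "2024-01-06T09:15:00Z"}
]
"""


def comments_by_author(raw_json):
    """Group comment texts by author, preserving the order in the list."""
    grouped = defaultdict(list)
    for comment in json.loads(raw_json):
        author = comment.get("author", "unknown")
        grouped[author].append(comment.get("content", ""))
    return dict(grouped)


for author, texts in comments_by_author(SAMPLE_COMMENTS).items():
    print(f"{author}: {len(texts)} comment(s)")
```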

WordExtractImages

Extract all embedded images from the Word document. The output is a ZIP file containing the individual image files in their original formats (PNG, JPEG, etc.).

{
  "action": "wordextractimages",
  "parameters": {
    "fileIds": ["your_file_id_here"]
  }
}
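
Because the result is a ZIP archive, it can be inspected with Python's standard zipfile module. The sketch below lists the image entries in an archive held in memory. The file names inside the real archive are not specified on this page, so the demo archive built by make_demo_zip is purely illustrative, and the extension set is an assumption about which formats may appear.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed set of image extensions; extend it if other formats appear.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff"}


def list_images(zip_bytes):
    """Return the sorted names of image entries in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if PurePosixPath(name).suffix.lower() in IMAGE_EXTENSIONS
        )


def make_demo_zip():
    """Build a small in-memory archive standing in for a real result."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("image1.png", b"placeholder bytes")
        archive.writestr("image2.JPEG", b"placeholder bytes")
        archive.writestr("notes.txt", b"not an image")
    return buffer.getvalue()


print(list_images(make_demo_zip()))
```

To write the images to disk instead of listing them, zipfile.ZipFile.extractall can be called on the same archive object with a target directory.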

WordToTextWithOcr

Extract plain text from Word documents, including text recognized in images through OCR. Set wordHasImage to true if the document contains scanned or embedded image content. The output is a plain .txt file.

{
  "action": "wordtotextwithocr",
  "parameters": {
    "fileIds": ["your_file_id_here"],
    "options": {
      "wordHasImage": true
    }
  }
}
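
One way to choose the wordHasImage value is to check the document locally before building the request. A .docx file is a ZIP package that stores embedded images under word/media/, so the check below looks for entries there. This is a sketch under stated assumptions: the helper names are hypothetical, the check applies only to .docx (not legacy .doc) files, and whether FileVerbs expects the flag to be set this way is not confirmed on this page.

```python
import io
import zipfile


def docx_has_images(docx_bytes):
    """Return True if the .docx package has any entry under word/media/."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return any(
            name.startswith("word/media/") and not name.endswith("/")
            for name in package.namelist()
        )


def build_ocr_request(file_id, has_image):
    """Build a wordtotextwithocr body in the shape shown above."""
    return {
        "action": "wordtotextwithocr",
        "parameters": {
            "fileIds": [file_id],
            "options": {"wordHasImage": bool(has_image)},
        },
    }


def make_demo_docx(with_image):
    """Build a minimal in-memory stand-in for a .docx package (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/document.xml", "<w:document/>")
        if with_image:
            package.writestr("word/media/image1.png", b"placeholder bytes")
    return buffer.getvalue()


flag = docx_has_images(make_demo_docx(with_image=True))
print(build_ocr_request("your_file_id_here", flag))
```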

Common Use Cases

  • 📄 Indexing and archiving documents with metadata extraction
  • 💬 Analyzing feedback or reviews via extracted comments
  • 🖼️ Collecting all images from product specs or brochures
  • 🔎 Extracting scanned content from forms and reports using OCR