Data Authoring Guidelines

Best Practices for Creating Documents for Our Internal Knowledge Base

Date Updated: June 10, 2025

<aside>

About:

Data Authoring is the human-driven process of creating new, original content that is optimized for machine readability. It is the crucial first step of the knowledge pipeline, where subject matter experts, technical writers, and content creators produce high-quality source documents.

The goal of data authoring is not just to create content, but to create it in a structured way that prevents data loss and interpretation errors during automated processing.

Core Activities: Writing, designing, structuring, and formatting new documents.
Primary Actors: Subject matter experts, content creators, and anyone contributing knowledge.
Key Characteristics:
- Generation: It involves making something new from scratch.
- Guideline-Driven: Follows the official Data Authoring Guidelines to ensure consistency and quality.
- Proactive: Aims to prevent data quality issues at the source, rather than fixing them later.
</aside>

1. Introduction

The quality of the source documents ADAM learns from is critical to how accurate and helpful its responses are.

By following these guidelines when creating documents (e.g., in Microsoft Word or Google Docs), you help our automated data pipeline process the information cleanly and efficiently. This leads to fewer errors, faster knowledge updates, and a smarter AI for everyone.

2. General Principles

Simplicity is Key: Use simple, single-column layouts. Avoid complex text wrapping, multiple columns, and floating text boxes.
Consistency: Use the same formatting for headers, footers, and body text throughout the document.
Clarity: Write in clear, direct language. Use short sentences and paragraphs where possible.

3. Text and Structure

Properly structuring your document is the most effective way to ensure it is understood correctly by the AI.

Use Heading Styles: Use your editor's built-in heading styles (Heading 1, Heading 2, etc.) to define the document's hierarchy. Do not simply make text bold or larger to signify a heading.
- Example: Use "Heading 1" for the main title, "Heading 2" for major sections, and so on.
Use Lists: For itemized information, use the built-in bulleted or numbered list functions.
Headers and Footers: Keep headers and footers minimal. Avoid placing critical information or large images in these areas, as they are often ignored or misinterpreted during processing.
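To illustrate why heading styles matter more than visual formatting, here is a minimal sketch of how a processing step might rebuild a document outline. It assumes a hypothetical pipeline that sees each paragraph as a style name plus its text; the `Paragraph` and `Section` types and the `build_outline` function are illustrative only and are not part of ADAM's actual pipeline. Only paragraphs styled "Heading N" open a new section. A paragraph that is merely bold stays ordinary body text.

```python
import re
from dataclasses import dataclass, field

# Hypothetical model of a paragraph as a pipeline might receive it.
@dataclass
class Paragraph:
    style: str          # style name, e.g. "Heading 1" or "Normal"
    text: str
    bold: bool = False  # visual formatting only; ignored when building the outline

@dataclass
class Section:
    level: int
    title: str
    body: list = field(default_factory=list)
    children: list = field(default_factory=list)

HEADING_STYLE = re.compile(r"^Heading (\d)$")

def build_outline(paragraphs):
    """Nest sections by heading level; all other text goes to the current section."""
    root = Section(level=0, title="(document)")
    stack = [root]
    for p in paragraphs:
        match = HEADING_STYLE.match(p.style)
        if match:
            level = int(match.group(1))
            # Close any open sections at the same or a deeper level.
            while stack[-1].level >= level:
                stack.pop()
            section = Section(level=level, title=p.text)
            stack[-1].children.append(section)
            stack.append(section)
        else:
            # Bold or enlarged "Normal" text is still just body text.
            stack[-1].body.append(p.text)
    return root

# Usage: the bold "Captions" line looks like a heading but is not styled as one.
doc = [
    Paragraph("Heading 1", "Data Authoring Guidelines"),
    Paragraph("Heading 2", "Tables"),
    Paragraph("Normal", "Use real tables."),
    Paragraph("Normal", "Captions", bold=True),
    Paragraph("Normal", "Always add a caption."),
    Paragraph("Heading 2", "Lists"),
]
outline = build_outline(doc)
```

In this sketch, "Captions" ends up inside the body of the "Tables" section rather than as its own section, which is the kind of misreading that real heading styles avoid.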

4. Tables

Tables are a common source of data extraction errors. Following these rules is essential.

Use Real Tables: Crucially, tables must be created using the native table function of your editor (e.g., Insert > Table). Do NOT use screenshots or images of tables. The pipeline can read data from a real table but cannot reliably read it from an image.
Provide Captions: Always add a descriptive caption to your tables (e.g., "Table 1: Quarterly Sales Figures"). Refer to the table by its caption in your text (e.g., "...as shown in Table 1.").
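The two rules above can be illustrated with a minimal sketch. It assumes a hypothetical pipeline that extracts a native table as a caption plus rows of cell text (an image of a table yields no rows at all). The function names `table_to_records` and `find_table_references`, and the sample rows, are invented for illustration. The sketch shows two things: the header row can label every value, and a consistent caption such as "Table 1" lets the text be linked back to its table.

```python
import re

# Hypothetical extracted form of a native table (sample values are illustrative).
caption = "Table 1: Quarterly Sales Figures"
rows = [
    ["Quarter", "Region", "Sales"],
    ["Q1", "North", "120"],
    ["Q2", "North", "135"],
]

def table_to_records(rows):
    """Use the first row as headers and turn each remaining row into a dict."""
    header, *body = rows
    return [dict(zip(header, row)) for row in body]

def find_table_references(text, caption):
    """Return the sentences in text that mention the caption's label, e.g. 'Table 1'."""
    label = caption.split(":", 1)[0].strip()  # "Table 1"
    # Word boundaries stop "Table 1" from matching inside "Table 10".
    pattern = re.compile(rf"\b{re.escape(label)}\b")
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences if pattern.search(s)]

records = table_to_records(rows)
text = "Sales rose in Q2, as shown in Table 1. Table 10 covers returns."
refs = find_table_references(text, caption)
```

Here `refs` contains only the sentence that cites Table 1. Without a caption, the pipeline has no label that connects that sentence to the table's data.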