<aside>
Objective: To provide a standardized procedure for converting batches of external or legacy PDFs (e.g., from partners, scanned archives) into text-searchable files that are ready for ingestion into the ADAM knowledge base pipeline.
Applies To:
Any PDF document that was not created following SOP-101.
Responsible Role:
Data Steward / Content Manager
</aside>
This SOP covers the end-to-end process: collecting a batch of raw PDF files, processing them with the company's OCR tool, checking the output quality, and handing the results off to the next stage of the data pipeline.
Objective: To organize source files into a dedicated workspace before processing.
Step 1. Collect Source PDFs
Gather all the PDF files for a single ingestion batch (e.g., "Q2 Partner Research Papers," "Archived Grow Guides").
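If the batch is scattered across several folders, a small script can gather it. The sketch below is a hypothetical helper, not part of the company's tooling: `collect_source_pdfs` is an illustrative name, and treating a file as a PDF only when its name ends in `.pdf` and its first bytes are the `%PDF-` header is an assumed check that filters out mislabeled files.

```python
from pathlib import Path


def collect_source_pdfs(source_dirs: list[Path]) -> list[Path]:
    """Hypothetical helper: return every PDF under the given folders, sorted and de-duplicated.

    Assumption: a file counts as a PDF only if its extension is ".pdf"
    (case-insensitive) AND its first five bytes are the "%PDF-" header.
    """
    found: set[Path] = set()
    for source in source_dirs:
        # rglob("*") walks the folder recursively, including subfolders.
        for path in source.rglob("*"):
            if not path.is_file() or path.suffix.lower() != ".pdf":
                continue
            with path.open("rb") as fh:
                if fh.read(5) == b"%PDF-":
                    # resolve() makes the same file reached twice collapse to one entry.
                    found.add(path.resolve())
    return sorted(found)
```

Files that fail the header check are simply skipped, so review them by hand before deciding whether they belong in the batch.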
Step 2. Create a Working Folder
On your local machine or a designated network drive, create a new folder. Name this folder using the convention: YYYY-MM-DD_Batch-Description.
Example: 2025-06-23_Partner-Grow-Guides
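The naming convention can be applied and checked mechanically. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the rule that the Description part consists of letters and digits joined by single hyphens is an assumption inferred from the example, not something this SOP defines.

```python
import re
from datetime import date
from pathlib import Path

# Assumed rule for the Description part: alphanumeric words joined by single hyphens.
FOLDER_PATTERN = re.compile(r"(\d{4}-\d{2}-\d{2})_([A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)")


def working_folder_name(batch_date: date, description: str) -> str:
    """Hypothetical helper: build a YYYY-MM-DD_Batch-Description name."""
    words = re.findall(r"[A-Za-z0-9]+", description)  # drop spaces and punctuation
    if not words:
        raise ValueError("description must contain at least one letter or digit")
    return f"{batch_date.isoformat()}_{'-'.join(words)}"


def is_valid_folder_name(name: str) -> bool:
    """Check the pattern and that the date part is a real calendar date."""
    match = FOLDER_PATTERN.fullmatch(name)
    if match is None:
        return False
    try:
        date.fromisoformat(match.group(1))
    except ValueError:  # e.g. month 13 or day 32
        return False
    return True


def create_working_folder(parent: Path, batch_date: date, description: str) -> Path:
    """Create the working folder under `parent`; raises FileExistsError if it already exists."""
    folder = parent / working_folder_name(batch_date, description)
    folder.mkdir(parents=True, exist_ok=False)
    return folder
```

For example, `working_folder_name(date(2025, 6, 23), "Partner Grow Guides")` returns `"2025-06-23_Partner-Grow-Guides"`. Refusing to reuse an existing folder keeps two batches from being mixed in one input directory.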
Step 3. Stage the Files
Move all the collected source PDFs from Step 1 into this new working folder. This folder is now your input directory.
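Staging by hand works for small batches. For larger ones, the sketch below shows one way to move files without silently overwriting anything. `stage_files` is a hypothetical helper; checking every name before moving a single file, and comparing names case-insensitively because the default file systems on macOS and Windows treat `A.pdf` and `a.pdf` as the same file, are assumptions about what a safe move should guarantee.

```python
import shutil
from pathlib import Path


def stage_files(pdfs: list[Path], working_folder: Path) -> list[Path]:
    """Hypothetical helper: move each PDF into the working folder and return the new paths.

    All checks run before any file is moved, so a failed call leaves the sources untouched.
    """
    targets = [working_folder / pdf.name for pdf in pdfs]
    # Case-insensitive comparison: "A.pdf" and "a.pdf" collide on many file systems.
    lowered = [target.name.lower() for target in targets]
    duplicates = sorted({name for name in lowered if lowered.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate file names in batch: {duplicates}")
    existing = [target.name for target in targets if target.exists()]
    if existing:
        raise FileExistsError(f"already in working folder: {existing}")
    for pdf, target in zip(pdfs, targets):
        shutil.move(str(pdf), str(target))
    return targets
```

A name clash usually means two sources supplied different documents under the same file name; rename one deliberately rather than letting the move decide which copy survives.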
Objective: To run the OCR tool on the staged files using the standard configuration.