This page explains the end-to-end process at a high level.
This workflow ensures that all data, whether authored from scratch or processed from existing documents, is clean, structured, and validated before being integrated into the knowledge base.
The diagram below shows the current process for each data type.
graph TD;
A[Start] --> B{Data Type?};
B -- New Content --> C[📄 Author New Doc in GDocs];
B -- Legacy/Existing PDF --> D[📄 Process with OCR Package];
C --> E[🧪 Test & Validate Content];
D --> E;
E --> F[✅ Ready for Import];
F --> G[🎉 Imported to Knowledge Base];
This diagram illustrates the improved workflow after implementing the proposed Unified Ingestion Engine. The key change is that all data sources are funneled into a single engine that produces standardized JSON for review.
flowchart TD;
subgraph 1 ["Source Documents"]
direction TB
B[External PDF];
C["Internal Documents<br/>(Exported from GDrive)"];
end
subgraph 2["Proposed: Unified Ingestion Engine"]
direction TB
D[Collector] --> E[Extractor]
E --> F[Cleaner]
F --> F1[Normalizer]
F1 --> F2[Validator / Scorer]
end
F2 --> G(("Standardized JSON<br/>(for review)"))
G --> 3
subgraph 3[Downstream Process]
direction TB
H["🧪 Test and<br/>Validate Content"] --> I["✅ Ready for Import"]
I --> J[🎉Imported to Knowledge Base]
end
B --> D
C --> D
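To make the proposed engine more concrete, here is a minimal sketch of how the five stages in the diagram (Collector, Extractor, Cleaner, Normalizer, Validator / Scorer) might be chained to produce one JSON record per document. This is an illustration only: the function names, the `Document` fields, the source labels, the JSON keys, and the scoring rule are all assumptions, since the proposal does not specify them. Real extraction (for example OCR for PDFs) is replaced by a placeholder.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record passed between stages; field names are assumptions."""
    source: str                      # e.g. "external_pdf" or "internal_gdrive"
    raw: str                         # text as collected
    text: str = ""                   # working text, filled in by later stages
    score: float = 0.0               # set by the validator
    issues: list = field(default_factory=list)


def collect(source, raw):
    # Collector: wrap incoming content with its source label.
    return Document(source=source, raw=raw)


def extract(doc):
    # Extractor: placeholder. Real OCR or export parsing would go here.
    doc.text = doc.raw
    return doc


def clean(doc):
    # Cleaner: drop control characters but keep tabs and newlines.
    doc.text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", doc.text)
    return doc


def normalize(doc):
    # Normalizer: collapse runs of whitespace into single spaces.
    doc.text = re.sub(r"\s+", " ", doc.text).strip()
    return doc


def validate(doc):
    # Validator / Scorer: toy rule, assumed for illustration only.
    if not doc.text:
        doc.issues.append("empty_text")
    doc.score = 0.0 if doc.issues else 1.0
    return doc


STAGES = (extract, clean, normalize, validate)


def ingest(source, raw):
    """Run one document through every stage and return standardized JSON."""
    doc = collect(source, raw)
    for stage in STAGES:
        doc = stage(doc)
    record = {
        "source": doc.source,
        "text": doc.text,
        "score": doc.score,
        "issues": doc.issues,
    }
    return json.dumps(record, ensure_ascii=False)
```

Under this sketch, both source types go through the same `ingest` call, for example `ingest("external_pdf", pdf_text)` and `ingest("internal_gdrive", exported_text)`. Because every document leaves the engine in the same JSON shape, the downstream test-and-validate step would only need to handle one format.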
Click here for further details on the Unified Ingestion Engine.