The Situation

A mid-size management consulting firm came to us with a familiar problem: 8 years of SharePoint accumulation. 47 team sites. 3 different folder naming conventions. An unknown number of duplicate documents. And 200 consultants spending an average of 45 minutes per day searching for internal documents.

Phase 1: Content Audit

We ran an automated content audit across all 47 SharePoint sites, classifying documents by type, project, and freshness. The results were eye-opening: 142,000 total documents, of which 38% were duplicates and 22% were orphaned (no assigned owner).
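To make the audit step concrete, here is a minimal sketch of how orphaned and stale documents might be counted once file metadata has been exported. The `DocRecord` fields, the two-year staleness threshold, and the function names are illustrative assumptions, not the tooling used on this engagement.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocRecord:
    """Hypothetical metadata row exported from a document library."""
    path: str
    last_modified: date
    owner: Optional[str]  # None or "" means no owner is recorded


def is_stale(doc: DocRecord, today: date, stale_after_days: int = 730) -> bool:
    # Assumed rule: anything untouched for two years counts as stale.
    return (today - doc.last_modified).days > stale_after_days


def audit_summary(docs: list[DocRecord], today: date) -> dict[str, int]:
    """Count total, orphaned, and stale documents in one pass over the metadata."""
    return {
        "total": len(docs),
        "orphaned": sum(1 for d in docs if not d.owner),
        "stale": sum(1 for d in docs if is_stale(d, today)),
    }
```

A real audit would also classify by document type and project, which depends on the metadata the source sites actually expose.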

Phase 2: Deduplication & Normalization

We built a deduplication pipeline using file hashing (to catch exact duplicates) and semantic similarity (to catch near-duplicates). For each set of duplicates, we kept the most recent version and archived the rest. Then we normalized formats, standardized metadata, and tagged each document by project, client, and knowledge domain.
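The exact-duplicate half of that pipeline can be sketched as follows. This is an illustration under assumptions: the `Doc` type and `dedupe_exact` function are hypothetical, SHA-256 is one reasonable hash choice, and content is held in memory for brevity. Near-duplicate detection would need an embedding model and a similarity threshold, which are not shown here.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Doc:
    """Hypothetical in-memory document; a real pipeline would stream from storage."""
    path: str
    content: bytes
    modified: datetime


def dedupe_exact(docs: list[Doc]) -> tuple[list[Doc], list[Doc]]:
    """Group byte-identical documents by hash; keep the newest, archive the rest."""
    groups: dict[str, list[Doc]] = defaultdict(list)
    for doc in docs:
        digest = hashlib.sha256(doc.content).hexdigest()
        groups[digest].append(doc)

    keep: list[Doc] = []
    archive: list[Doc] = []
    for group in groups.values():
        # Newest first, so index 0 is the version to keep.
        group.sort(key=lambda d: d.modified, reverse=True)
        keep.append(group[0])
        archive.extend(group[1:])
    return keep, archive
```

Hashing only catches files that match byte for byte, so a re-saved copy with changed embedded metadata would fall through to the near-duplicate stage.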

Phase 3: AI-Powered Search

With clean data, we deployed an AI search interface. Consultants could ask natural language questions and get cited answers with source links. Domain-specific chunking strategies ensured that proposals, deliverables, and memos were each split in a way suited to their structure.
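As a rough illustration of what a domain-specific chunking strategy can look like, the sketch below packs paragraphs into chunks whose size limit depends on document type. The character limits in `CHUNK_CHARS` and the paragraph-based splitting are assumptions chosen for the example, not the settings used in this project.

```python
# Hypothetical per-type limits: short memos get small chunks,
# long deliverables get larger ones.
CHUNK_CHARS = {"proposal": 1200, "deliverable": 2000, "memo": 600}
DEFAULT_CHUNK_CHARS = 1000


def chunk_document(text: str, doc_type: str) -> list[str]:
    """Pack whole paragraphs into chunks no longer than the limit for doc_type.

    A single paragraph longer than the limit is kept intact as its own
    oversized chunk rather than being cut mid-paragraph.
    """
    limit = CHUNK_CHARS.get(doc_type, DEFAULT_CHUNK_CHARS)
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be indexed alongside its source link so that answers can cite the originating document.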

Results

  • 82% reduction in average search time (45 min → 8 min/day)
  • 3x faster proposal creation for new consultants
  • ROI payback in under 3 months
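
The search-time figure follows directly from the numbers quoted above. The short calculation below reproduces it and, assuming every one of the 200 consultants matches the average, estimates the aggregate time saved per day. It does not attempt the ROI figure, since project cost is not stated here.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


before_min, after_min, consultants = 45, 8, 200

pct = reduction_pct(before_min, after_min)
# Assumes all consultants see the average saving.
saved_hours_per_day = (before_min - after_min) * consultants / 60

print(f"{pct:.1f}% reduction, {saved_hours_per_day:.1f} consultant-hours saved per day")
# prints "82.2% reduction, 123.3 consultant-hours saved per day"
```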