
MedLearn SharePoint Q&A Agent

Restricted-folder document querying (POC)

A proof-of-concept to let authorised MedLearn staff query specific SharePoint folders — SOPs, how-tos, and technical notes — using natural language, and receive answers with source citations. Requested by Alex Furr (DEO Lead), initially scoped for five named users.

Status: POC, in development
Requester: Alex Furr, DEO Lead, Digital Education Office
Team: Adrian Cowell
Tech stack: Python, Microsoft Graph API, Microsoft Entra ID (OAuth2), RAG / vector embeddings, WordPress plugin
Users: 5 named Imperial staff (DEO team)
GitHub: github.com/adrianImperial/medlearn-Sharepoint

The Challenge

The MedLearn and DEO teams maintain a growing body of operational knowledge — SSH commands, deployment procedures, platform SOPs, infrastructure how-tos — spread across SharePoint folders. Finding a specific piece of information requires knowing which folder to look in, navigating SharePoint’s interface, and reading through documents manually.

The request is simple in concept: point an agent at those folders and ask it questions. The implementation involves careful decisions around permissions, data governance, and where processing can take place — particularly given Imperial’s M365 environment and information security requirements.

What It Will Do

When complete, the agent will:

  • Index documents from specified SharePoint folders only — no access beyond the defined scope
  • Extract and chunk text from docx, pdf, pptx, and plain-text/Markdown (txt/md) files
  • Answer natural-language queries (e.g. “What is the SSH command to connect to MedLearn prod?”) with verbatim supporting snippets and links back to the source file
  • Respect each user’s existing SharePoint permissions via delegated Microsoft Entra ID authentication — no user can retrieve documents they couldn’t already access
  • Log queries and cited sources for audit purposes
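As a rough illustration of the extract-and-chunk step, the sketch below pulls text out of docx and pptx files using only the standard library (both formats are ZIP packages of XML) and splits it into overlapping character windows. The function names, the 800/100 character window sizes, and the choice of character-based rather than token-based chunking are assumptions for illustration; PDF extraction would need a third-party library and is not shown.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used inside Office Open XML (docx/pptx) packages.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml")


def extract_docx_text(data: bytes) -> str:
    """Return the text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(t.text or "" for t in para.iter(f"{W_NS}t"))
        if text.strip():
            lines.append(text)
    return "\n".join(lines)


def extract_pptx_text(data: bytes) -> str:
    """Return the text of each slide in a .pptx file, in slide order."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = [n for n in zf.namelist() if SLIDE_RE.fullmatch(n)]
        # Sort numerically so slide10.xml comes after slide2.xml.
        names.sort(key=lambda n: int(SLIDE_RE.fullmatch(n).group(1)))
        slides = []
        for name in names:
            root = ET.fromstring(zf.read(name))
            paras = []
            for p in root.iter(f"{A_NS}p"):
                text = "".join(t.text or "" for t in p.iter(f"{A_NS}t"))
                if text.strip():
                    paras.append(text)
            slides.append("\n".join(paras))
    return "\n\n".join(slides)


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into windows of `size` characters that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(piece)
        if start + size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

The overlap means a command or sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps when the answer must quote it verbatim.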

A second surface — a Microsoft Teams bot or Copilot Studio agent — is in scope as a follow-on once the core service is validated.

Architecture

Auth

Delegated Microsoft Entra ID authentication (OAuth2 authorisation code flow). Each user signs in, and the agent calls Microsoft Graph with that user's access token, so it inherits their SharePoint permissions.
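A minimal sketch of the sign-in half of the delegated flow, assuming a server-side (confidential) app registration and PKCE. It builds the Microsoft identity platform v2.0 authorize URL and the form body for redeeming the returned code at the token endpoint. The tenant ID, client ID, redirect URI, and exact Graph scopes are hypothetical placeholders; in practice a library such as MSAL for Python would normally handle these details.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode

# Hypothetical placeholders: substitute the real tenant, app registration and redirect URI.
TENANT_ID = "00000000-0000-0000-0000-000000000000"
CLIENT_ID = "11111111-1111-1111-1111-111111111111"
REDIRECT_URI = "https://medlearn.example/sharepoint-qa/callback"
AUTHORITY = f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0"
# Assumed delegated Graph scopes; offline_access requests a refresh token.
SCOPES = ["openid", "profile", "offline_access", "Files.Read.All", "Sites.Read.All"]


def make_pkce_pair() -> tuple[str, str]:
    """Return (code_verifier, code_challenge) using the S256 method (RFC 7636)."""
    verifier = secrets.token_urlsafe(64)  # 86 URL-safe characters, within the 43-128 limit
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge


def authorization_url(state: str, challenge: str) -> str:
    """URL to redirect the user's browser to for sign-in and consent."""
    params = {
        "client_id": CLIENT_ID,
        "response_type": "code",
        "redirect_uri": REDIRECT_URI,
        "response_mode": "query",
        "scope": " ".join(SCOPES),
        "state": state,  # random value, checked on the callback to prevent CSRF
        "code_challenge": challenge,
        "code_challenge_method": "S256",
    }
    return f"{AUTHORITY}/authorize?{urlencode(params)}"


def token_request_body(code: str, verifier: str, client_secret: str) -> dict[str, str]:
    """Form fields to POST to {AUTHORITY}/token to redeem the authorization code."""
    return {
        "client_id": CLIENT_ID,
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "code_verifier": verifier,
        "client_secret": client_secret,
        "scope": " ".join(SCOPES),
    }
```

The access token returned by the token endpoint is what the service would send as `Authorization: Bearer <token>` on Graph calls, which is how each request runs with the signed-in user's own permissions rather than a service account's.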

Ingestion

The ingestion service uses the Microsoft Graph API to enumerate the target folders and download file content, then extracts the text, chunks it, generates embeddings, and stores each chunk with its metadata (filename, SharePoint URL, last modified).
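The enumeration step might look like the following sketch, which walks a folder through the Graph `children` endpoint, recurses into subfolders, follows `@odata.nextLink` paging, and keeps the metadata the index needs. The drive and folder IDs, the supported-extension list, and the injected `fetch` callable (used so the walk can be tested without network access) are assumptions, not part of the project.

```python
import json
import urllib.request
from collections.abc import Callable, Iterator

GRAPH = "https://graph.microsoft.com/v1.0"
# Assumed to match the POC file-type list.
SUPPORTED = (".docx", ".pdf", ".pptx", ".txt", ".md")


def graph_get(url: str, token: str) -> dict:
    """GET a Graph URL as the signed-in user (delegated access token)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def walk_folder(drive_id: str, folder_id: str, fetch: Callable[[str], dict]) -> Iterator[dict]:
    """Yield metadata for every supported file under a folder, recursing into subfolders."""
    url = f"{GRAPH}/drives/{drive_id}/items/{folder_id}/children"
    while url:
        page = fetch(url)
        for item in page.get("value", []):
            if "folder" in item:
                yield from walk_folder(drive_id, item["id"], fetch)
            elif "file" in item and item["name"].lower().endswith(SUPPORTED):
                yield {
                    "id": item["id"],
                    "filename": item["name"],
                    "web_url": item["webUrl"],
                    "last_modified": item["lastModifiedDateTime"],
                    # Raw bytes for text extraction come from the /content endpoint.
                    "content_url": f"{GRAPH}/drives/{drive_id}/items/{item['id']}/content",
                }
        # Large listings are paged; follow @odata.nextLink until Graph omits it.
        url = page.get("@odata.nextLink")
```

For example, `walk_folder(drive_id, folder_id, lambda url: graph_get(url, access_token))` would list the supported files in one target folder using the signed-in user's token.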

Retrieval & Answer

The query is embedded and matched against the vector store. The top-ranked chunks are passed to an LLM with a strict citation requirement. If confidence is low (for example, no retrieved chunk is a close match), the agent says so and returns the most relevant documents instead of an answer.
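A sketch of the retrieval step under stated assumptions: brute-force cosine similarity over an in-memory list stands in for a real vector store, each chunk carries a `file_id` and `web_url` from ingestion, and a hypothetical `can_access` callback removes chunks from files the current user cannot open. The 0.75 threshold and `k=4` are arbitrary illustrative values.

```python
import math
from collections.abc import Callable


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunks: list[dict], can_access: Callable[[str], bool],
             k: int = 4, min_score: float = 0.75) -> dict:
    """Rank chunks, drop those the user cannot open, and decide whether to answer."""
    scored = sorted(((cosine(query_vec, c["vector"]), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    # Permission trimming: the index is shared, so filter per user at query time.
    visible = [(s, c) for s, c in scored if can_access(c["file_id"])]
    strong = [c for s, c in visible[:k] if s >= min_score]
    if strong:
        # Confident: these chunks go to the LLM as cited context.
        return {"confident": True, "chunks": strong}
    # Low confidence: no generated answer, just the closest distinct documents.
    docs = list(dict.fromkeys(c["web_url"] for _, c in visible))
    return {"confident": False, "documents": docs[:k]}
```

Filtering by `can_access` matters because the stored index is shared across users while SharePoint permissions are per user. One way to implement the callback is a Graph request for the item made with the user's delegated token, treating a 403 or 404 response as no access.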

MedLearn UI

A protected WordPress page with a chat interface. It calls the retrieval service and renders answers with cited filenames and SharePoint links. Access is SSO-gated to authorised users only.
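SSO gating for a fixed set of five users can reduce to an allow-list check on the signed-in user's token claims. This sketch assumes the ID token has already been validated (signature, issuer, audience, expiry), for example by a JWT library, and checks the Entra ID `tid` (tenant) and `oid` (user object ID) claims; the ID values shown are hypothetical placeholders.

```python
# Hypothetical placeholders: the real Imperial tenant ID and the five confirmed users' object IDs.
IMPERIAL_TENANT_ID = "00000000-0000-0000-0000-000000000000"
AUTHORISED_OIDS = frozenset({
    "aaaaaaaa-0000-0000-0000-000000000001",
    "aaaaaaaa-0000-0000-0000-000000000002",
    "aaaaaaaa-0000-0000-0000-000000000003",
    "aaaaaaaa-0000-0000-0000-000000000004",
    "aaaaaaaa-0000-0000-0000-000000000005",
})


def is_authorised(claims: dict) -> bool:
    """Allow only allow-listed users from the expected tenant.

    `claims` must come from a token that has ALREADY been validated;
    this function performs no signature or expiry checks itself.
    """
    return claims.get("tid") == IMPERIAL_TENANT_ID and claims.get("oid") in AUTHORISED_OIDS
```

Keying the list on `oid` rather than email or `preferred_username` avoids problems if a user's sign-in name changes, since `oid` is stable for a user within a tenant.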

POC Scope & Constraints

To keep the first build fast and governable, the POC is intentionally constrained:

  • Folders: 2–3 specific SharePoint folder URLs (to be confirmed by requester)
  • Users: exactly 5 named Imperial staff — access is not open
  • Auth model: delegated (per-user sign-in, not a service account)
  • File types: docx, pdf, pptx, txt/md — no OCR on scanned PDFs in v1
  • Answer format: verbatim snippets + SharePoint file link + last-modified date
  • Audit log: user ID, query, file URLs cited — stored server-side
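One simple way to store the audit log is an append-only JSON Lines file, one record per query, holding the fields listed above plus a UTC timestamp. The file format, field names, and helper names are assumptions for illustration; a database table would serve equally well.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def audit_line(user_id: str, query: str, cited_urls: list[str],
               when: Optional[datetime] = None) -> str:
    """Serialise one audit record as a single JSON Lines entry."""
    when = when or datetime.now(timezone.utc)
    record = {
        "timestamp": when.isoformat(),
        "user_id": user_id,
        "query": query,
        "cited_urls": cited_urls,
    }
    return json.dumps(record, ensure_ascii=False) + "\n"


def append_audit(log_path: Path, line: str) -> None:
    """Append a record; one JSON object per line keeps the log greppable."""
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line)
```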

Pending from requester

Exact SharePoint folder URLs, confirmation of 5 authorised user accounts, and confirmation that external LLM use (outside M365) is approved by ICT/InfoSec. These are required before the indexing service can be built.

Next Steps

  1. Confirm SharePoint folder URLs and authorised user list with Alex Furr
  2. Confirm ICT/InfoSec position on external LLM use vs Azure OpenAI
  3. Register app in Microsoft Entra ID (Imperial tenant) — delegated permissions
  4. Build ingestion service (Graph → text extraction → embeddings → vector store)
  5. Build retrieval + answer layer with citation enforcement
  6. Build WordPress chat UI page (SSO-gated)
  7. Pilot with 5 users, gather feedback, assess Teams surface viability
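Citation enforcement could work roughly as follows: number the retrieved chunks in the prompt, then reject any answer that has no `[n]` citation, cites a source that was not supplied, or puts in double quotes a snippet that does not appear verbatim in a cited chunk. The prompt wording, the `NOT FOUND` convention, and the use of straight double quotes to mark snippets are all hypothetical choices, not a confirmed design.

```python
import re
from typing import Optional

CITATION_RE = re.compile(r"\[(\d+)\]")
QUOTE_RE = re.compile(r'"([^"]+)"')


def build_prompt(question: str, chunks: list[dict]) -> str:
    """Number each retrieved chunk so the model can cite it as [n]."""
    sources = "\n\n".join(
        f"[{i}] {c['filename']} ({c['web_url']})\n{c['text']}"
        for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer using ONLY the numbered sources below. Put supporting text in double "
        "quotes, copied verbatim, and cite each claim as [n]. If the sources do not "
        "contain the answer, reply exactly: NOT FOUND\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def check_citations(answer: str, chunks: list[dict]) -> Optional[list[dict]]:
    """Return the cited chunks, or None if the answer fails the citation checks."""
    if answer.strip() == "NOT FOUND":
        return None
    cited = [int(n) for n in CITATION_RE.findall(answer)]
    # Reject uncited answers and citations to sources that were never supplied.
    if not cited or any(n < 1 or n > len(chunks) for n in cited):
        return None
    used = [chunks[n - 1] for n in dict.fromkeys(cited)]
    # Every quoted snippet must appear verbatim in at least one cited chunk.
    for quote in QUOTE_RE.findall(answer):
        if not any(quote in c["text"] for c in used):
            return None
    return used
```

When `check_citations` returns `None`, the service would fall back to the low-confidence behaviour described under Retrieval & Answer and return the most relevant documents instead of the model's text.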