Home Workshop Azure Infrastructure
Azure Infrastructure

Complete Azure Infrastructure
Data Mining Workshop

Every service, every connection, every cost — mapped for FAA’s Azure-native environment.

Architecture

3-Layer Architecture on Azure

Extract → Structure → Analyse. All three layers run within FAA’s Azure environment.

Layer 1: Extract

MS Graph API Mail.Read / OAuth 2.0
Azure Blob Storage Excel / CSV staging
Microsoft Forms Export Webform responses
Manual CSV Digitised records

Layer 2: Structure

Azure VM D4s v3 4 vCPU / 16 GB RAM
Python 3.11 Pipeline scripts
Qdrant Vector DB Semantic search
SQLite Unified prospect schema
Node.js 22 Dashboard API

Layer 3: Analyse

Workshop Dashboard Pipeline analytics
HubSpot Sandbox 6-year CRM import
Claude / OpenAI / Azure OpenAI AI classification API
Azure Services

Azure Services Required

All services run within FAA’s existing Azure subscription. No new tenant required.

Azure Virtual Machine

Critical

Standard D4s v3 — 4 vCPU / 16 GB RAM. Ubuntu 22.04 LTS. Region: Southeast Asia (Singapore). ~USD 180/mo.

Required from Day −14. SSH access needed by x.1 Foundation team for remote software installation.

Azure Managed Disk

Important

P10 Premium SSD — 128 GB. ~USD 20/mo. Stores workshop data, Qdrant vectors, and SQLite database.

Upgrade to P15 (256 GB, ~USD 30/mo) if processing more than 15,000 emails.

Azure Virtual Network + NSG

Important

Default VNet. NSG rules: Inbound SSH port 22 (restrict to FAA IP), Outbound HTTPS port 443 only. No additional cost.

IT confirmed Strict IT Policy. x.1 will provide exact whitelist before Day 1.

Azure AD / Entra ID App Registration

Critical

App name: FAA-DataMining-Workshop. Permissions: Mail.Read + Files.Read.All (Application). Global Admin consent required. ~USD 0 (included with M365).

IT co-lead Sir Roel confirmed MS Graph API experience. App registration takes approximately 30 minutes.

Microsoft 365 Tenant

Existing subscription. MS Graph API data source. Approximately 4 individual mailboxes + 1 shared inbox confirmed. No additional cost.

Email history goes back to FAA founding per IT prep call.

Azure Blob Storage

Important

LRS, Hot tier. ~USD 2/mo for 100 GB. Optional but recommended for FAA Excel file staging and workshop data backups.

Provides an audit-friendly staging area before data enters the VM pipeline.

Azure OpenAI Service — optional (Scenario C)

Deploy gpt-4o-mini in Southeast Asia region. Data stays within FAA’s Azure tenant. Requires 1–3 day approval from Microsoft after applying at aka.ms/oai/access.

Recommended for compliance — AI processing stays within the Microsoft ecosystem FAA already trusts.
Software Stack

Software Stack — Installed by x.1 Foundation

FAA IT only needs the VM provisioned. x.1 handles everything below via SSH.

Python 3.11
Node.js 22 LTS
Qdrant Vector DB (Docker)
SQLite 3
Docker + Compose
Caddy Reverse Proxy
MS Graph Python SDK
Claude SDK
OpenAI SDK
Ollama (optional, Scenario D)
x.1 Foundation handles all software installation via SSH. FAA IT only needs the VM provisioned and SSH port 22 open to the x.1 team IP.
Data Flow

Data Flow — What Moves Where

Five steps. Personal data stays on the Azure VM. External APIs receive anonymised text only.

  1. FAA M365 → MS Graph API (OAuth 2.0) → Azure VM Email threads, calendar data read into the pipeline
  2. FAA OneDrive / SharePoint → MS Graph Files.Read → Azure VM Excel and CSV prospect records transferred to the VM
  3. Azure VM → AI API → Azure VM Anonymised text → structured prospect records. Names and email addresses replaced with [PERSON_1], [EMAIL_1] etc. before any external API call.
  4. Azure VM (SQLite + Qdrant) → Workshop Dashboard (browser) Pipeline analytics and conversion charts served locally to workshop participants
  5. Azure VM → HubSpot Sandbox API → HubSpot Free Account Structured historical 6-year prospect data imported to FAA’s own HubSpot account
Privacy guarantee: No raw personal data leaves the Azure VM except for AI classification calls. All external API calls receive anonymised text. Scenarios C (Azure OpenAI) and D (Ollama) have zero external data transfer.
Setup Sequence

Setup Sequence — Before Day 1

Six milestones across three weeks. IT’s workload is front-loaded in weeks −3 and −2.

1. Provision Azure VM

FAA IT — Week −3

Provision Azure VM Standard D4s v3 in Southeast Asia, attach P10 Premium SSD disk, configure NSG with SSH inbound + HTTPS outbound rules.

2. Azure AD App Registration

FAA IT — Week −3

Register FAA-DataMining-Workshop app in Azure AD / Entra ID. Add Mail.Read and Files.Read.All Application permissions. Grant Global Admin consent.

3. Share Credentials Securely

FAA IT — Week −2

Securely share with x.1 Foundation: VM public IP, x.1 SSH public key added to the VM, and App Registration credentials (Tenant ID, Client ID, Client Secret).

4. x.1 Installs Software Stack

x.1 Foundation — Week −2

Install full software stack via SSH: Python 3.11, Node.js 22, Qdrant Docker containers, SQLite, Caddy. Configure and run MS Graph API end-to-end test.

5. Joint Validation Call

FAA IT + x.1 — Week −1

Live email read test, data anonymisation review, mailbox scope confirmation. Confirm which mailboxes are in scope and sign off on the data flow.

6. Day 0 Final Checks

All — Day 0

Workshop room internet speed test, machine count verification (8 confirmed), display setup. Final dashboard smoke test before participants arrive.

Network

Firewall Whitelist

All outbound. Port 443 HTTPS only — except the inbound SSH rule for x.1 access.

Endpoint Port Purpose Required For
graph.microsoft.com 443 MS Graph API All scenarios
api.anthropic.com 443 Claude API Scenario A
api.openai.com 443 OpenAI API + embeddings Scenario B
*.cognitiveservices.azure.com 443 Azure OpenAI Service Scenario C
none — local Ollama self-hosted Scenario D
api.hubspot.com 443 HubSpot import All scenarios
FAA IT confirmed Strict IT Policy firewall. x.1 Foundation will provide exact IP ranges and FQDNs 7 days before the workshop for your IT team to whitelist.

Explore the related resources for this workshop: