Home Workshop IT Team
IT Team Track

IT Team Preparation
Data Mining Workshop

Three setup tasks. Most of the workshop is automated after Day 2.
This page gives you everything you need before the x.1 team arrives.

Technical Overview

3-Layer Architecture

Extract → Structure → Analyse. Each layer builds on the previous. IT’s role is primarily in Layer 1 setup.

Layer 1: Extract

MS Graph API OAuth 2.0 / Mail.Read
File Parser Excel / CSV / Word
Webform Export Microsoft Forms
Manual Input Digitised records

Layer 2: Structure

SQLite / PostgreSQL Unified schema
Qdrant / ChromaDB Vector embeddings (RAG)
Claude API AI classification
Python 3.11 Pipeline scripts

Layer 3: Analyse & Present

Pipeline Dashboard HTML / JS reports
HubSpot Free CRM import via API
Claude API Report generation

Full Technology Stack

MS Graph API (Mail.Read)
Claude API (Anthropic)
or Azure OpenAI (tenant)
Azure VM D4s v3
Python 3.11
SQLite / PostgreSQL
Qdrant / ChromaDB
HubSpot Free
Ubuntu 22.04 LTS
IT Tasks

Three Things IT Needs to Set Up

These are the only tasks that require FAA IT involvement. Everything else is handled by the x.1 engineering team.

1. Azure AD App Registration

Required

Register an Azure Active Directory application with Mail.Read permission to allow the pipeline to read the relevant mailboxes via MS Graph API.

Time: ~45 minutes. Full steps in the section below.

2. Azure VM Provisioning

Required

Provision a Standard D4s v3 virtual machine running Ubuntu 22.04 LTS in the FAA Azure tenant. The x.1 team will handle all software installation remotely via SSH.

Spec: 4 vCPU, 16 GB RAM, 128 GB SSD. Open port 22 for SSH access. ~USD 180/month.

3. Claude API Key

Required

Create an Anthropic account on console.anthropic.com under an FAA-owned email. Add USD 50–150 in credits. Generate an API key and share it with the x.1 team securely.

Model usage: Claude Haiku for bulk classification (cheap), Claude Sonnet for report generation.

4. Data Outsourcing Agreement

Required — Before Day 1

Arrange for an authorised FAA signatory (CEO, DPO, or Legal) to review and sign the Data Outsourcing Agreement (DOA) prepared by x.1 Foundation. This is a legal requirement under the Philippines Data Privacy Act (RA 10173) before any personal data can be processed by the x.1 team.

Also: When creating the Anthropic account, sign Anthropic’s Data Processing Addendum to cover the Claude API sub-processing chain. → Full Data Privacy overview
Azure AD Setup

Azure AD App Registration — Step by Step

This gives the pipeline OAuth 2.0 access to read the relevant mailboxes. Admin consent is required.

  • Navigate to portal.azure.com and sign in with your FAA Azure admin account
  • In the search bar, type App registrations and select it from the results
  • Click New registration. Name it: FAA-DataMining-Workshop. Account types: Accounts in this organizational directory only. No redirect URI needed. Click Register.
  • From the app overview, copy the Application (client) ID and Directory (tenant) ID — you will need these later
  • Go to API permissions in the left sidebar. Click Add a permission → Microsoft Graph → Application permissions
  • Search for Mail.Read and select it. Also add User.Read.All if you want to include user profile data. Click Add permissions.
  • Click Grant admin consent for [FAA tenant] and confirm. The status column should show a green tick for both permissions.
  • Go to Certificates & secrets → Client secrets → New client secret. Set expiry to 6 months. Copy the Value immediately — it is only shown once.
  • Share the following with the x.1 team via a secure channel (Signal, encrypted email, or shared vault): Tenant ID, Client ID, Client Secret
Scope of access: The Mail.Read application permission grants read-only access to all mailboxes in your tenant. You can restrict this to specific mailboxes by adding a Mail.Read application access policy in Exchange Online PowerShell. The x.1 team will advise if you want to scope it to only sales@, info@, admissions@.

Option C: Azure OpenAI Service — Data Stays in FAA Tenant

Recommended for compliance
  • In Azure Portal → Azure OpenAICreate (requires Cognitive Services contributor role)
  • Select region: Southeast Asia for lowest latency
  • After approval (24–72 hours from Microsoft), go to Azure AI Studio → Model deployments
  • Deploy: gpt-4o-mini (sufficient for extraction) and text-embedding-3-small
  • Generate API key in Azure Portal → note the endpoint URL and key
  • Configure on workshop VM: same workflow as OpenAI SDK but pointing to Azure endpoint

Same cost as OpenAI API (~$5/mo) — but all AI processing stays within FAA’s Azure subscription. Recommended for compliance-conscious organisations. Note: requires advance planning due to the 24–72 hour Azure OpenAI deployment approval process.

Data Security

Where Your Data Goes — and Where It Does Not

All data stays in FAA’s Azure tenant

The pipeline runs on an Azure VM within FAA’s own Azure subscription. Extracted emails and structured data are stored on that VM only — they do not leave the FAA Azure environment. At the end of the workshop, the VM can be decommissioned and all data deleted.

Claude API receives text only — no PII in prompts

When email content is sent to the Claude API for classification, it is anonymised first: names and email addresses are replaced with placeholders before the prompt is submitted. Anthropic receives text fragments only. No personally identifiable information is included in API calls after the anonymisation step.

HubSpot receives only what FAA approves

The HubSpot import is controlled and selective. Only the structured prospect records that FAA approves for the demo are pushed to HubSpot via its API. This uses FAA’s own HubSpot account (free tier) — not an x.1 account.

x.1 team has read access only

The x.1 engineering team is granted SSH access to the Azure VM and read-only OAuth access to the mailboxes. They cannot write to mailboxes, cannot access FAA financial systems, and cannot access anything outside the scope defined in the Azure AD app registration.

Data Outsourcing Agreement (DOA) — RA 10173 Compliance

The Philippines Data Privacy Act (RA 10173) requires a formal Data Outsourcing Agreement between FAA (as Personal Information Controller) and x.1 Foundation (as Personal Information Processor) before any personal data is processed. x.1 Foundation will prepare the DOA draft. An authorised FAA signatory must sign it before Day 1. Digital signature is accepted under Philippine e-commerce law.

View full Data Privacy & Compliance overview

Network Requirements

Outbound HTTPS Only

The pipeline requires outbound HTTPS access from the Azure VM to three external endpoints. No inbound rules beyond SSH are required.

Endpoint Purpose Protocol / Port Direction
graph.microsoft.com MS Graph API — read mailboxes HTTPS / 443 Outbound
api.anthropic.com Claude API — AI classification & reports HTTPS / 443 Outbound
api.hubapi.com HubSpot API — CRM import (Day 9 only) HTTPS / 443 Outbound
*.cognitiveservices.azure.com Azure OpenAI API — AI processing (Scenario C only) HTTPS / 443 Outbound
x.1 team IP (TBD) SSH access for remote setup and monitoring SSH / 22 Inbound
Azure NSG: If the Azure VM has a Network Security Group attached, ensure that outbound rules allow HTTPS to 0.0.0.0/0 (default) or to the specific IPs/domains above. The x.1 team can provide their static IP for the SSH inbound rule before Day 1.

Scenario D (self-hosted Ollama): If FAA selects the self-hosted LLM option, only the MS Graph API outbound endpoint is required — no external LLM API endpoints needed. All AI inference runs locally on the VM.
IT Timeline

When IT Is Needed

After Day 2, the pipeline runs automatically. IT’s active involvement is front-loaded.

Before Day 1 — Pre-Workshop Setup

~2 hours total
  • Azure AD app registration (steps above)
  • Azure VM provisioned (Ubuntu 22.04, D4s v3)
  • Claude API key created and shared with x.1 team
  • SSH access granted to x.1 team IP
IT Liaison

Day 1–2 — Discovery & Setup

~2 hours/day
  • Joint session with x.1 team: verify Azure app permissions
  • Confirm mailbox list (sales@, info@, admissions@, etc.)
  • Transfer Excel/CSV files to VM shared folder
  • Verify MS Graph API token generation works end-to-end
IT + x.1 Team

Day 3–8 — Automated Pipeline

On-call only
  • Pipeline runs automatically on the Azure VM
  • IT monitors VM health (CPU / memory) — 5 min/day check
  • Available for ad-hoc questions if x.1 team hits an access issue
Mostly automated

Day 9–10 — Finalisation & Presentation

Optional attendance
  • HubSpot import (x.1 team handles, IT observes optionally)
  • Day 10 findings presentation — IT attendance recommended
  • Post-workshop: decision on whether to keep or decommission VM
Everyone — optional for IT
Planning Resources  —  Full Azure Infrastructure Blueprint  ·  Interactive Cost Calculator (compare all 4 AI scenarios with live Azure pricing)

Questions about the setup? The x.1 technical team is available before Day 1 to walk through anything.

Nik Metaxa-Schwarten
x.1 Foundation — Technical Lead
nik@x1-foundation.org
Back to main workshop proposal