x.1 Foundation × First Aviation Academy

Data Mining Workshop

First Aviation Academy · Subic Bay, Philippines

A structured approach to six years of prospect data
x.1 Foundation · Pro Bono

Executive Summary

Turning 6 Years of Data Into Actionable Intelligence

The Goal

Demonstrate that AI-powered data mining can transform 6 years of scattered, unstructured data into a complete, actionable sales intelligence system — with immediate, tangible value for FAA.

The Deliverable

A fully populated historical dashboard with pipeline analytics, prospect insights, communication patterns, and a live HubSpot demo — built from FAA’s own real data.

The Model

A pro bono workshop under the x.1 Foundation mandate to empower Philippine organisations. FAA provides the data and infrastructure; x.1 provides the expertise and labour.

For Leadership: This workshop is a first step to introduce the supporting power of AI — not to replace people, but to empower them with intelligence they never had access to. The focus is to demonstrate real-life value using your own data.
The Challenge

6 Years of Valuable Data — Trapped in Silos

Since operations began, FAA has accumulated a rich history of prospect interactions. The problem isn’t a lack of data — it’s that the data is scattered across incompatible formats and systems.

Email Communications

Thousands of prospect emails across multiple mailboxes. Inquiries, follow-ups, negotiations, rejections — all untagged, unsorted, impossible to search by stage or outcome.

Outlook / Exchange

Excel Spreadsheets

Multiple spreadsheets maintained by different staff members with varying column names, formats, and levels of completeness. Some have 200 rows, some 2,000.

Multiple Files / Authors

Webform Submissions

Microsoft Forms / website contact forms with structured data (name, email, program interest, country) — the most consistent source, but disconnected from follow-up records.

Microsoft Forms / Website

Hand-Written Lists

Notes from school visits, career fairs, and walk-in inquiries. Valuable leads captured on paper but never digitized or connected to digital records.

Paper / Notebooks

Social Media Messages

Facebook Messenger is a primary inquiry channel in the Philippines. Conversations happen in DMs but are never captured in any database.

Messenger / Instagram

Phone & Walk-In Logs

Phone inquiries and walk-in visits captured informally — sometimes in a logbook, sometimes in a colleague’s memory. Critical context lost over time.

Informal Records

The Hidden Cost: Without connecting these data sources, FAA cannot answer fundamental questions: How many leads did we receive last year? What is our conversion rate? Why do prospects drop off? Which channels produce the best students?

The Opportunity

Seven Analysis Layers the Workshop Could Produce

Each of these becomes possible once the scattered data is unified, structured, and analysed with AI.

A

Communication Habits

Discover response time patterns, follow-up cadence, peak inquiry seasons, and communication gaps. Learn how FAA talks to prospects and where the process breaks down.

Scope note: Email and webform data are accessible via API. Facebook Messenger and WhatsApp communications require separate access arrangements and are subject to further discussion — they may not be part of the initial workshop scope.

MS Graph API Claude AI
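To illustrate the response-time analysis, here is a minimal Python sketch. It assumes each message has already been reduced to a thread id, a direction, and a timestamp; the field names and the `inbound` / `outbound` labels are hypothetical and not taken from FAA's data.

```python
from datetime import datetime
from statistics import median


def first_response_hours(messages):
    """Hours from the first inbound message to the first later outbound reply, per thread."""
    by_thread = {}
    for m in sorted(messages, key=lambda m: m["sent_at"]):
        by_thread.setdefault(m["thread_id"], []).append(m)

    result = {}
    for thread_id, msgs in by_thread.items():
        inbound = next((m for m in msgs if m["direction"] == "inbound"), None)
        if inbound is None:
            continue  # thread started by FAA, nothing to measure
        reply = next(
            (m for m in msgs
             if m["direction"] == "outbound" and m["sent_at"] > inbound["sent_at"]),
            None,
        )
        # None marks an inquiry that never received a reply
        result[thread_id] = (
            None if reply is None
            else (reply["sent_at"] - inbound["sent_at"]).total_seconds() / 3600
        )
    return result


# Hypothetical sample data
messages = [
    {"thread_id": "t1", "direction": "inbound", "sent_at": datetime(2024, 3, 1, 9, 0)},
    {"thread_id": "t1", "direction": "outbound", "sent_at": datetime(2024, 3, 1, 15, 0)},
    {"thread_id": "t2", "direction": "inbound", "sent_at": datetime(2024, 3, 2, 10, 0)},
    {"thread_id": "t2", "direction": "outbound", "sent_at": datetime(2024, 3, 4, 10, 0)},
    {"thread_id": "t3", "direction": "inbound", "sent_at": datetime(2024, 3, 5, 8, 0)},
]
hours = first_response_hours(messages)
answered = [h for h in hours.values() if h is not None]
print(hours, median(answered))
```

Unanswered threads are kept as `None` rather than dropped, so communication gaps stay visible next to the response times.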
B

Prospect Personas & Archetypes

Identify recurring prospect profiles: the serious career changer, the price-sensitive parent, the foreign agent pipeline, the walk-in dreamer. Understand who enquires, through which channel, and why.

Note: PTC’s team has already developed prospect personas in a previous session. This layer will analyse the actual historical data and compare those findings against the existing personas — validating, refining, or replacing them based on evidence.

AI Clustering RAG
C

Cross-Source Matching

Connect email addresses and names across webforms, emails, spreadsheets, and enrollment records. Map a prospect’s complete journey from first inquiry to certification (or drop-off).

Entity Resolution SQL
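A minimal Python sketch of how matching by email and name could work. The record fields and the matching rule (exact normalised email first, then normalised name for records without an email) are assumptions for illustration; name-only matches can merge different people and would need human review in a real run.

```python
import re
from collections import defaultdict


def norm_email(email):
    return (email or "").strip().lower()


def norm_name(name):
    # Lowercase, drop punctuation, sort tokens so "Santos, Maria" equals "Maria Santos"
    tokens = re.sub(r"[^\w\s]", " ", (name or "").lower()).split()
    return " ".join(sorted(tokens))


def resolve(records):
    """Group records that appear to describe the same prospect."""
    groups = defaultdict(list)
    name_to_key = {}
    # Pass 1: records with an email address form the anchor groups
    for r in records:
        email = norm_email(r.get("email"))
        if email:
            groups[email].append(r)
            name = norm_name(r.get("name"))
            if name:
                name_to_key.setdefault(name, email)
    # Pass 2: records without an email attach to a group by name, if one exists
    for r in records:
        if norm_email(r.get("email")):
            continue
        name = norm_name(r.get("name"))
        groups[name_to_key.get(name, "name:" + name)].append(r)
    return dict(groups)


# Hypothetical sample records from three sources
records = [
    {"source": "webform", "name": "Maria Santos", "email": "Maria.Santos@example.com"},
    {"source": "email", "name": "Santos, Maria", "email": "maria.santos@example.com "},
    {"source": "spreadsheet", "name": "maria santos", "email": ""},
    {"source": "webform", "name": "Juan Cruz", "email": "juan@example.com"},
]
groups = resolve(records)
for key, members in groups.items():
    print(key, [m["source"] for m in members])
```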
D

Pipeline Dashboards

Build historical pipeline views with conversion rates, stage durations, bottleneck analysis, and trend lines. For the first time, a consolidated view of the complete historical funnel.

Analytics Pipeline Engine
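The funnel arithmetic behind such a dashboard can be sketched in a few lines of Python. The stage names below are hypothetical placeholders; the real stages would come from FAA's existing pipeline definition.

```python
STAGES = ["inquiry", "application", "assessment", "enrolled"]  # hypothetical stage names


def funnel(furthest_stage_by_prospect):
    """Return (stage, prospects who reached it, conversion from the previous stage)."""
    index = {stage: i for i, stage in enumerate(STAGES)}
    reached = [0] * len(STAGES)
    for stage in furthest_stage_by_prospect.values():
        # A prospect who got to stage i also passed through every earlier stage
        for i in range(index[stage] + 1):
            reached[i] += 1
    rows = []
    for i, stage in enumerate(STAGES):
        previous = reached[i - 1] if i > 0 else None
        rate = None if not previous else reached[i] / previous
        rows.append((stage, reached[i], rate))
    return rows


# Hypothetical sample: furthest stage reached by ten prospects
sample = {f"p{n}": "inquiry" for n in range(5)}
sample.update({"p5": "application", "p6": "application", "p7": "application"})
sample.update({"p8": "assessment", "p9": "enrolled"})
rows = funnel(sample)
for row in rows:
    print(row)
```

The first stage has no conversion rate because there is no earlier stage to compare against.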
E

Prospect Drop-off Analysis

Understand why and when prospects disappear. Identify the stages with highest drop-off, the common last-touch patterns, and whether leads are truly lost or just forgotten.

Attrition Analysis AI Analysis
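One simple way to separate "lost" from "forgotten" is to look at who sent the last message. The Python sketch below is a hypothetical heuristic, not the workshop's actual method: if the prospect wrote last and nobody answered, the lead is labelled `forgotten`; if FAA wrote last, the prospect `went_quiet`. Field names and ISO date strings are assumptions.

```python
from collections import Counter


def classify_dropoffs(threads, enrolled_ids):
    """Label each non-enrolled prospect by who sent the last message."""
    labels = {}
    for prospect_id, messages in threads.items():
        if prospect_id in enrolled_ids or not messages:
            continue
        # ISO date strings sort chronologically, so max() finds the last touch
        last = max(messages, key=lambda m: m["sent_at"])
        labels[prospect_id] = (
            "forgotten" if last["direction"] == "inbound" else "went_quiet"
        )
    return labels


# Hypothetical sample threads
threads = {
    "p1": [
        {"direction": "inbound", "sent_at": "2023-05-01"},
        {"direction": "outbound", "sent_at": "2023-05-02"},
    ],
    "p2": [
        {"direction": "inbound", "sent_at": "2023-06-10"},
        {"direction": "outbound", "sent_at": "2023-06-11"},
        {"direction": "inbound", "sent_at": "2023-06-15"},
    ],
    "p3": [
        {"direction": "inbound", "sent_at": "2023-07-01"},
        {"direction": "outbound", "sent_at": "2023-07-03"},
    ],
}
labels = classify_dropoffs(threads, enrolled_ids={"p3"})
summary = Counter(labels.values())
print(labels, summary)
```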
F

Live HubSpot Demo

Import the cleaned, structured data into a dedicated parallel HubSpot instance. A hands-on environment for experiencing exactly what a production CRM looks like when populated with your own real historical data.

HubSpot Sync API
G

Communication Learnings

Extract best practices from successful enrollments. Which email templates worked? Which follow-up timing converted? What language resonated? Build a playbook from your own history.

AI Extraction Playbook
Technical Architecture

How the Data Mining System Works

A three-layer approach: Extract → Structure → Analyse

Layer 1: Extract

MS Graph API: Read mailboxes with OAuth tokens. AI categorises, tags, and organises emails into folders.
File Parser: Parse Excel, Word, CSV, and PDF files. Normalise column names, merge duplicates.
Webform Export: Structured form data from Microsoft Forms and website submissions.
Manual Input: Digitise handwritten notes and logbooks via AI OCR or manual entry.
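As a sketch of the column-name normalisation step, the Python below maps differing spreadsheet headers onto one canonical set. The alias table is hypothetical, and the example assumes the sheets have already been exported to CSV text; reading `.xlsx` files directly would need an additional library.

```python
import csv
import io

# Hypothetical aliases; the real list would be built from FAA's actual files
ALIASES = {
    "email": {"email", "e-mail", "email address", "e mail"},
    "name": {"name", "full name", "prospect name"},
    "program": {"program", "programme", "course", "program interest"},
}
LOOKUP = {alias: canon for canon, aliases in ALIASES.items() for alias in aliases}


def canonical(header):
    """Map a raw column header to its canonical name (unknown headers pass through)."""
    key = " ".join(header.strip().lower().replace("_", " ").split())
    return LOOKUP.get(key, key)


def read_normalised(csv_text):
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {canonical(k): (v or "").strip() for k, v in row.items()}
        for row in reader
    ]


# Two hypothetical sheets kept by different staff members
sheet_a = "Full Name,E-mail,Course\nMaria Santos,maria@example.com,PPL\n"
sheet_b = "prospect_name,Email Address,Program Interest\nJuan Cruz,juan@example.com,CPL\n"
rows = read_normalised(sheet_a) + read_normalised(sheet_b)
print(rows)
```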

Layer 2: Structure

SQL Database: Unified relational schema: prospects, communications, stages, outcomes. Entity resolution by email + name.
RAG Knowledge Base: Email bodies and documents embedded as vectors. Enables semantic search: “Show me all emails about medical clearance delays.”
AI Classification: Claude API reads each email/document, assigns: stage, intent, sentiment, program interest, action required.
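AI classification output is only useful if it is checked before it enters the database. The Python sketch below validates a model reply that is assumed to arrive as JSON with the five fields listed above; the allowed stage and sentiment values are hypothetical, and no API call is made here.

```python
import json

# Hypothetical label sets; the real ones would follow FAA's pipeline stages
ALLOWED = {
    "stage": {"inquiry", "application", "assessment", "enrolled", "dropped"},
    "sentiment": {"positive", "neutral", "negative"},
}
REQUIRED = ("stage", "intent", "sentiment", "program_interest", "action_required")


def parse_classification(raw_reply):
    """Parse and validate one classification reply; raise ValueError if it is unusable."""
    data = json.loads(raw_reply)
    missing = [field for field in REQUIRED if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for field, allowed in ALLOWED.items():
        if data[field] not in allowed:
            raise ValueError(f"unexpected {field}: {data[field]!r}")
    if not isinstance(data["action_required"], bool):
        raise ValueError("action_required must be true or false")
    return {field: data[field] for field in REQUIRED}


# Hypothetical model reply
reply = (
    '{"stage": "inquiry", "intent": "fee question", "sentiment": "neutral", '
    '"program_interest": "PPL", "action_required": true}'
)
label = parse_classification(reply)
print(label)
```

Rejecting malformed replies early keeps unexpected stage names out of the pipeline metrics.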

Layer 3: Analyse & Present

Pipeline Dashboard Historical funnel view with conversion rates, stage durations, bottlenecks, and trend analysis.
HubSpot Demo Cleaned data imported into a dedicated HubSpot instance for hands-on CRM experience.
AI Reports: Communication protocols, persona analysis, prospect attrition reports, and best-practice playbooks, all generated from your own data.

Technology Stack

MS Graph API
Claude API (Anthropic)
VM Infrastructure TBD
SQLite / PostgreSQL
Qdrant (RAG)
Node.js
Python
HubSpot Free

Infrastructure decision pending: The workshop system can run on either an Azure VM (if FAA already has an Azure environment) or a Linux VPS (lower cost, no existing cloud account required). This will be confirmed during the IT preparation call based on FAA’s current setup and preferences.

Workshop Deliverables

What the Workshop Produces

Every deliverable below is built during the workshop's 10 working days and handed over to FAA in full, including all code, data, and documentation. No ongoing dependency on x.1 Foundation.

Complete Historical Database

All 6 years of prospect data — from every source — unified in a single, queryable SQL database. Every prospect matched across email, webform, and spreadsheet by email address and name.
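A minimal sketch of what such a unified schema might look like, using Python's built-in `sqlite3`. The four table names follow the architecture section; every column name is an assumption, and the real schema would be agreed during the workshop.

```python
import sqlite3

# Hypothetical columns; only the four table names come from the proposal
SCHEMA = """
CREATE TABLE prospects (
    id INTEGER PRIMARY KEY,
    full_name TEXT,
    email TEXT UNIQUE
);
CREATE TABLE communications (
    id INTEGER PRIMARY KEY,
    prospect_id INTEGER NOT NULL REFERENCES prospects(id),
    source TEXT NOT NULL,      -- e.g. email, webform, spreadsheet
    sent_at TEXT NOT NULL,     -- ISO 8601 date
    summary TEXT
);
CREATE TABLE stages (
    prospect_id INTEGER NOT NULL REFERENCES prospects(id),
    stage TEXT NOT NULL,
    entered_at TEXT NOT NULL
);
CREATE TABLE outcomes (
    prospect_id INTEGER PRIMARY KEY REFERENCES prospects(id),
    outcome TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO prospects (id, full_name, email) VALUES (?, ?, ?)",
    (1, "Maria Santos", "maria@example.com"),
)
conn.executemany(
    "INSERT INTO communications (prospect_id, source, sent_at, summary) VALUES (?, ?, ?, ?)",
    [
        (1, "webform", "2021-02-01", "Program inquiry"),
        (1, "email", "2021-02-03", "Follow-up on fees"),
    ],
)
# One prospect's timeline across sources, oldest first
timeline = conn.execute(
    "SELECT source, sent_at FROM communications WHERE prospect_id = ? ORDER BY sent_at",
    (1,),
).fetchall()
print(timeline)
```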

Pipeline Analytics Dashboard

Full historical pipeline with conversion rates per stage, time-in-stage distributions, seasonal patterns, lead source effectiveness, and year-over-year comparisons.
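Time-in-stage figures can be derived from the dates on which each prospect entered each stage. The Python sketch below assumes a list of (prospect, stage, date) events; the stage names and dates are made up for illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def days_in_stage(events):
    """events: (prospect_id, stage, entered_on). Returns {stage: [days spent, ...]}."""
    by_prospect = defaultdict(list)
    for prospect_id, stage, entered_on in events:
        by_prospect[prospect_id].append((entered_on, stage))

    durations = defaultdict(list)
    for history in by_prospect.values():
        history.sort()
        # Time in a stage is the gap until the next stage was entered;
        # the final stage is open-ended and gets no duration
        for (start, stage), (end, _next_stage) in zip(history, history[1:]):
            durations[stage].append((end - start).days)
    return dict(durations)


# Hypothetical sample events
events = [
    (1, "inquiry", date(2022, 1, 10)),
    (1, "application", date(2022, 1, 24)),
    (1, "enrolled", date(2022, 3, 1)),
    (2, "inquiry", date(2022, 2, 1)),
    (2, "application", date(2022, 2, 5)),
]
durations = days_in_stage(events)
print({stage: median(days) for stage, days in durations.items()})
```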

Prospect Persona Analysis

AI-generated persona profiles based on actual communication patterns: who enquires, through which channel, what they ask, and how long they take to decide. Compared against PTC’s existing persona model to validate, adapt, or refine it.

Prospect Attrition Report

A detailed analysis of where and why prospects drop off — stage by stage. Last-touch patterns, common objections, and a clear picture of the 92 who didn’t make it for every 100 who enquired.

Live HubSpot Parallel Demo

FAA’s real historical data loaded into a dedicated parallel HubSpot instance — showing exactly what the prospect pipeline, deal timeline, and contact records look like in a professional CRM. A working reference environment alongside the live HubSpot account.

Full ownership. All code, data, API keys, and documentation stay with FAA. No vendor lock-in, no subscription to x.1 Foundation, no ongoing dependency.
Future Options

What Becomes Possible Next

The workshop builds the foundation. The following capabilities are not part of this workshop scope — but become natural next steps once the data infrastructure is in place.

Communication Best-Practices Playbook

Translating the data insights into a structured playbook: response templates, follow-up cadences, objection-handling scripts — derived from what FAA’s own history shows works. The workshop generates the raw findings; the playbook is the consulting step that turns them into operational procedures.

Full HubSpot Production Migration

Merging the validated historical data from the workshop demo instance into FAA’s live HubSpot account — giving the active CRM a complete prospect history from day one. Requires careful data mapping, field alignment, and change management to avoid overwriting live records.

RAG-Powered Communication Archive

All email communications embedded in a searchable AI knowledge base. Ask questions like: “What did we discuss with Maria Santos about medical clearance?” and get instant, cited answers. Requires significant compute infrastructure and ongoing API costs — valuable, but not part of the initial workshop scope.

Live AI Prospect Scoring

Applying the workshop models to new incoming inquiries in real time — scoring each prospect’s likelihood of enrolment based on profile, channel, and communication patterns identified in the historical data.

Additional Channel Integrations

Extending the data scope to include Facebook Messenger, WhatsApp Business, and other inquiry channels currently not covered in the workshop. Each requires separate access arrangements, API setup, and data compliance review.

Workshop Schedule

A Structured Journey from Raw Data to Actionable Intelligence

The workshop runs across approximately two to three weeks — mostly via online calls and remote collaboration, with concentrated hands-on sessions for the technical phases. Three milestone calls anchor the process: kickoff, midterm review, and final presentation. The pace adapts to team availability throughout.

Pre-Workshop: Discovery & Alignment Call

Before Day 1 — Online • ~60 min

An informal online session to present the proposal, walk through the IT preparation checklist together, and align on scope before any commitment is made.

  • Walk through the proposal and answer open questions
  • Review the IT prep checklist together (MS Graph access, data sources, hardware)
  • Decide on infrastructure approach: Azure VM (recommended), Azure AI Services add-on, or local server
  • Agree on workshop scope, timeline, and participant availability
  • Outcome: green light + IT begins preparation
Leadership · IT · x.1 Foundation

Kickoff Call — All Parties

Day 1 Morning — Online • ~45 min

Official start of the workshop. Everyone meets, roles are confirmed, and the timeline is locked.

  • Introduce the full team: IT, Leadership, Sales, x.1 Foundation
  • Confirm data sources and access credentials are in place
  • Agree on communication cadence — flexible and adapted to the IT team’s readiness and overall availability throughout the workshop
  • Walk through the five deliverables and agree on success criteria
  • Email access approval: If individual salespeople maintain their own mailboxes with prospect communications, each person must provide documented written approval before any mailbox is accessed — this is a compliance requirement and will be prepared as a formal sign-off document
Leadership · IT · Sales / Admissions · x.1 Foundation

Phase 1: Infrastructure & Access Setup

Day 1 Afternoon – Day 2 — Remote Screen-Share

x.1 Foundation and IT work together via screen-share to stand up the workshop environment and verify all data access is functional. All infrastructure decisions and access arrangements are documented and signed off.

  • Infrastructure (FAA confirmed: Azure):
    • Azure VM (Recommended) — Standard D4s v3 in FAA’s own Azure tenant. Keeps all infrastructure within the Microsoft ecosystem. ~USD 180/month.
    • Azure AI Services (add-on) — Azure OpenAI Service within FAA’s own tenant. AI processing stays inside FAA’s Azure subscription. Recommended for compliance.
    • Local machine — fallback if neither cloud option is available; requires a capable workstation staying online during extraction.
  • Install Python environment, database engine, and extraction tools
  • IT registers the MS Graph app in Microsoft Entra ID (formerly Azure AD); credentials remain within FAA infrastructure at all times
  • Verify mailbox access: test read on each approved mailbox in scope
  • Inventory all data sources: shared drives, Excel files, webform exports
  • HubSpot dual-access setup:
    • Existing FAA HubSpot — read-only access configured to understand current pipeline stages, contact fields, and deal structure. No data will be written or modified.
    • New free HubSpot sandbox — a separate, dedicated HubSpot account created for the workshop. All mined and structured data will be imported here for demonstration. This is the environment the team will explore and evaluate.
  • Initial review of existing pipeline stages — open discussion on whether the current stage structure is accurate or may benefit from refinement
IT (lead) · x.1 Foundation (technical guide)

Phase 2: Data Extraction

Day 3–5 — Automated Pipeline + Async Check-ins

Largely automated. x.1 runs the extraction pipeline; IT monitors and resolves any access issues. Brief daily async check-in (15 min call or message) to confirm progress.

  • Pull all emails via MS Graph API — all approved mailboxes in scope
  • AI categorisation: tag each email by stage, intent, and prospect identity
  • Parse all Excel/Word prospect files into normalised tables
  • Export and clean webform submission data
  • Daily async status update to IT and Leadership
  • Note: Digitisation of scanned documents or handwritten records is out of scope for this workshop. As a demonstration, one representative example may be processed to show the approach — a full digitisation effort would be scoped as a separate future engagement.
x.1 Foundation (automated pipeline) · IT (on-call for access issues)
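The email pull in this phase comes down to paging through the Microsoft Graph messages endpoint. A minimal Python sketch follows; the mailbox address and the selected fields are assumptions, and the HTTP call is passed in as a function (`get_json`, a hypothetical helper) so the paging logic can run offline. Graph returns the next page URL in `@odata.nextLink`.

```python
from urllib.parse import urlencode

GRAPH = "https://graph.microsoft.com/v1.0"


def first_page_url(mailbox, page_size=50):
    """Build the first request URL for a mailbox's messages."""
    params = {
        "$select": "id,subject,from,receivedDateTime,conversationId",
        "$top": page_size,
    }
    return f"{GRAPH}/users/{mailbox}/messages?{urlencode(params, safe='$,')}"


def iter_messages(mailbox, get_json):
    """Yield every message, following @odata.nextLink until no page is left.

    get_json(url) must return the decoded JSON body; in a real run it would
    send an authenticated GET request.
    """
    url = first_page_url(mailbox)
    while url:
        page = get_json(url)
        yield from page.get("value", [])
        url = page.get("@odata.nextLink")


# Offline stand-in for the Graph API: two hypothetical pages
fake_pages = {
    first_page_url("sales@example.com"): {
        "value": [{"id": "1"}, {"id": "2"}],
        "@odata.nextLink": "https://graph.microsoft.com/v1.0/next-page",
    },
    "https://graph.microsoft.com/v1.0/next-page": {"value": [{"id": "3"}]},
}
ids = [m["id"] for m in iter_messages("sales@example.com", fake_pages.__getitem__)]
print(ids)
```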

Phase 3: Structure, Unify & Classify

Day 6–8 — Automated + Validation Sessions

AI resolves identities across sources and builds the unified database. IT, Sales, and Leadership all participate in reviewing outputs to confirm the data reflects operational reality.

  • Entity resolution: match the same prospect across email, webforms, and spreadsheets
  • Build unified SQL database with full prospect timeline per person
  • AI classification: assign each prospect to a pipeline stage based on evidence
  • IT validation session: spot-check 20–30 known prospects to confirm technical accuracy
  • Sales review: confirm that stage assignments reflect what the team remembers about real cases
  • Leadership review: validate that the emerging picture of the historical pipeline matches strategic understanding — flag any discrepancies
  • Generate first-pass pipeline metrics for the midterm review call
x.1 Foundation (AI pipeline) · IT (technical validation) · Sales (operational validation) · Leadership (strategic review)

Midterm Review Call

End of Day 8 — Online • ~60 min

A structured review of first findings before the final analysis phase begins. Leadership sees early results; the team aligns on any scope adjustments.

  • Present first pipeline metrics: prospect counts, stage distribution, drop-off rates
  • Discuss data quality findings and any gaps in the historical record
  • Confirm or adjust the five deliverables based on what the data actually contains
  • Agree on priority focus areas for Day 9–10
  • Preview the HubSpot sandbox environment with early data
  • Collect interim feedback from all participants on process and emerging findings
Leadership · IT · Sales / Admissions · x.1 Foundation

Phase 4: Analysis & Deliverable Production

Day 9–10 — x.1 Working Sessions

x.1 Foundation generates all final outputs. Sales is on-call for context questions. Deliverables are assembled and reviewed before the final presentation.

  • Generate Pipeline Analytics Dashboard with all historical data
  • Run AI persona clustering: identify recurring prospect archetypes from the data
  • Produce Attrition Report: stage-by-stage drop-off with root cause patterns
  • Finalise HubSpot sandbox with full historical dataset imported
  • Assemble all five deliverables into a structured handover package
  • Prepare final presentation and success measurement summary
x.1 Foundation (lead) · Sales (on-call for context)

Final Presentation & Sign-Off

Day 10 — Online • ~90 min

All five deliverables presented to the full team. Since IT has been a co-owner throughout, this session focuses on reviewing, validating, and agreeing on the outputs — not on a one-way handover.

  • Walk through all five deliverables with the full team
  • Live demo of the HubSpot sandbox on FAA’s own historical data
  • Review persona analysis findings against PTC’s existing personas — validate, refine, or challenge
  • Present attrition findings and key drop-off patterns
  • Confirm that all code, data, and documentation are complete and accessible within FAA’s own infrastructure
  • Structured feedback round from all participating members: Leadership, IT, and Sales
  • Measure workshop success against agreed criteria: pipeline data completeness, stage accuracy, persona validity, and team readiness to use the outputs
  • Discuss optional next steps: HubSpot production migration, playbook development, additional channel integrations
Leadership · IT · Sales / Admissions · x.1 Foundation
Participant Key

Leadership: strategic decisions, final approval
IT: infrastructure, access, validation (intensive throughout)
Sales / Admissions: business context, operational validation
x.1 Foundation: technical execution (remote throughout)

Fully remote-friendly & Agile by design. All milestone calls are online. Technical phases run via screen-share and async communication, so no travel is required. The workshop follows an Agile approach: iterative, feedback-driven, and adaptive to team availability. The total calendar span is typically 2–3 weeks. Every access arrangement, approval, and scope decision is formally documented and signed off at each phase.

Requirements

What FAA Needs to Provide

x.1 Foundation provides the expertise and labour. FAA provides the data access and infrastructure. Every item below is formally documented and approved before work begins.

CEO Approval & Support

Critical

Executive sponsorship for the workshop. The CEO’s endorsement signals to both IT and Sales that this initiative has organisational backing and sets the tone for compliance with access requirements.

Why: Access to email data and company files requires executive authorisation. Individual salespeople must also provide written approval for access to their own mailboxes — a formal document will be prepared for this.

Workshop Infrastructure

Critical

A server environment to run the data mining pipeline. All data stays within infrastructure controlled by FAA — nothing is processed on x.1 Foundation servers. Since FAA operates within the Microsoft Azure ecosystem, Azure VM is the recommended option:

Option A — Azure VM (recommended for FAA): Standard D4s v3 (4 vCPU, 16 GB RAM, 128 GB SSD) in FAA’s own Azure tenant. Keeps all infrastructure within the familiar Microsoft ecosystem. Estimated cost: ~USD 180/month.
Option C — Azure AI Services (add-on to Option A): Deploy Azure OpenAI Service within FAA’s own Azure tenant. AI processing stays inside FAA’s subscription — no data sent to Anthropic or OpenAI. Recommended for compliance-conscious organisations. Requires 24–72 hour Azure OpenAI approval from Microsoft.
Linux VPS alternative (not recommended for FAA): Hetzner CX32 or similar (~USD 20/month) offers equivalent performance at lower cost and is x.1 Foundation’s preferred environment. Not recommended here since FAA IT is Azure-native — keeping infrastructure in Azure avoids introducing an unfamiliar platform.

Claude API Key (Anthropic)

Critical

An Anthropic API key registered under FAA’s own account, with sufficient budget to process thousands of emails and documents through AI classification and analysis.

Estimated budget: USD 50–150 for the workshop period (depends on email volume). Claude Haiku for high-volume classification; Claude Sonnet for deeper analysis and persona generation.

MS Graph API Access & Approvals

Critical

Read-only OAuth access to the relevant mailboxes via Microsoft Graph API. For shared mailboxes (info@, sales@, admissions@), IT configures access. For individual salesperson mailboxes, written approval from each person is required before access is granted.

Setup by IT: Register an app in Microsoft Entra ID (formerly Azure AD) with the Mail.Read permission. The x.1 team guides the full configuration. A formal access approval document will be prepared for each salesperson to sign.
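For orientation, this is roughly what the token request looks like once the app is registered. The Python sketch assumes an app-only (client credentials) setup with Mail.Read granted as an application permission; whether FAA uses that or a delegated flow is for IT to decide. The tenant id, client id, and secret are placeholders, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id, client_id, client_secret):
    """Build (without sending) an OAuth 2.0 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # .default requests the permissions already granted to the app registration
        "scope": "https://graph.microsoft.com/.default",
    }).encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values only; real credentials stay inside FAA's infrastructure
request = build_token_request("example-tenant-id", "example-client-id", "example-secret")
print(request.get_method(), request.full_url)
```

In a real run the secret would be read from a protected store on the workshop server, never hard-coded.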

HubSpot — Dual Account Setup

Important

Two HubSpot environments are needed for the workshop:

1. Existing FAA HubSpot — read-only: API access configured to read current pipeline stages and contact structure. No data will be modified or written.
2. New free HubSpot sandbox: A separate, dedicated free HubSpot account created on Day 1. All mined data is written here for demonstration. This environment is used to show what a fully populated historical CRM looks like — a decision on migrating this data to the live HubSpot instance is a separate future step.
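Loading the sandbox could use HubSpot's CRM batch endpoint for contacts (`POST /crm/v3/objects/contacts/batch/create`). The Python sketch below only prepares the request bodies. The input field names are hypothetical, and the batch size of 100 is an assumption to verify against HubSpot's current limits.

```python
def contact_batches(prospects, batch_size=100):
    """Split prospects into request bodies for HubSpot's contacts batch-create endpoint."""
    inputs = [
        {
            "properties": {
                "email": p["email"],
                "firstname": p["first_name"],
                "lastname": p["last_name"],
            }
        }
        for p in prospects
    ]
    return [
        {"inputs": inputs[i:i + batch_size]}
        for i in range(0, len(inputs), batch_size)
    ]


# Hypothetical sample: 250 cleaned prospect records
prospects = [
    {"email": f"prospect{n}@example.com", "first_name": "Test", "last_name": f"Prospect{n}"}
    for n in range(250)
]
batches = contact_batches(prospects)
print([len(b["inputs"]) for b in batches])
```

Each body would be sent only to the sandbox account, in line with the read-only rule for the existing FAA HubSpot.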

Data File Collection

Important

All available Excel files, Word documents, exported webform data, and any other records related to prospect tracking accumulated over the past 6 years.

Format: As-is. We will clean and normalise everything. Files are uploaded to the workshop server, not shared externally.

IT Team Participation

Critical

IT has an intensive, active role throughout the entire workshop — not just in setup. The IT team handles infrastructure provisioning, API configuration, data access, validation sessions, and is a co-owner of the final outputs.

Liaison required: A named IT liaison coordinates between the IT team and x.1 Foundation for the duration of the workshop. Full IT team involvement is expected during Phase 1, Phase 3, and the Final Presentation. Availability during Phases 2 and 4 is on-call.

Estimated time commitment: IT Liaison — ~3–4 hours/day during active phases. Full IT team — milestone calls plus validation sessions.

Total Investment Summary

Scenario                    | VM (USD/mo) | AI API (USD/mo) | Total (USD/mo) | Data Safety
A: Claude API               | $180        | ~$10            | ~$200          | External (Anthropic)
B: OpenAI API               | $180        | ~$5             | ~$205          | External (OpenAI)
C: Azure AI (Recommended)   | $180        | ~$5             | ~$205          | Microsoft tenant
D: Own Model (Max Safety)   | $720        | $0              | ~$740          | Zero external transfer
x.1 Foundation facilitation & engineering: Pro Bono (~120 hours) — FAA covers infrastructure only.

All scenarios include HubSpot Sandbox (free tier) and MS Graph API (included with M365). One-time cost: OpenAI Embeddings $0.50 (Scenarios A–C) or $0 (Scenario D). Pricing: Azure Southeast Asia, March 2026.

x.1 Foundation mandate: All consulting, engineering, and facilitation work is provided pro bono. The foundation’s purpose is to empower Philippine organisations through technology — this workshop is a direct expression of that mandate. FAA only covers infrastructure costs, which remain in FAA’s own accounts.
About This Proposal

Who We Are

About the organisations delivering this workshop.

x.1 Foundation

Non-Profit

Philippine non-stock, non-profit corporation (SEC CN202003705) with a mandate to empower Filipino organisations and individuals through technology.

  • Pro bono workshop facilitation
  • AI engineering & data architecture
  • Knowledge transfer & capacity building
  • No sales incentive — advocacy only

Role in this workshop: Strategy, AI engineering, data mining, analysis, and all deliverables — provided entirely pro bono.

xex.one Limited

For-Profit Consulting

A Hong Kong–registered executive communications consulting firm serving C-suite executives, keynote speakers, and their teams for over 30 years.

  • Executive consulting services
  • ISO 26000 & social impact consulting
  • Technology-enabled process implementation
Important: This data mining workshop is entirely pro bono. x.1 Foundation provides all expertise and labour at no charge. The only costs to FAA are infrastructure (server + Claude API) which remain in FAA’s own accounts. Any future CRM implementation, AI support, or consulting engagement would be a separate, optional arrangement.
Next Steps

Getting Started

Four clear steps to move from proposal to workshop. Each step produces a signed document or confirmed decision before the next begins.

1

CEO & IT Briefing

Share this proposal with the CEO and key decision makers — including the IT lead. We are happy to present a 15-minute executive briefing via Zoom or on-site at Subic Bay. IT must be present: they are a critical partner from day one, not a support function.

2

Approve Infrastructure & APIs

Formally approve the infrastructure choice (Azure VM or Linux VPS) and the following API accounts, each registered in FAA’s own name and documented:

  • Claude API (Anthropic) — AI processing budget
  • Microsoft 365 / MS Graph API — mailbox read access
  • HubSpot — read access to existing instance + new free sandbox
3

Sign Approval Documents

x.1 Foundation prepares a documentation package for sign-off before work begins:

  • Workshop scope & objectives agreement
  • Data access authorisation (CEO sign-off)
  • Individual mailbox access consent (per salesperson)
  • Infrastructure & API setup checklist (IT sign-off)

The workshop follows an Agile methodology: iterative, feedback-driven, and adaptive. All decisions and approvals are documented at each phase to ensure a fully auditable process.

4

Schedule IT Prep Call & Kickoff

Once approvals are in place, we schedule the Pre-Workshop Discovery Call with IT and Leadership, followed by the full Kickoff Call. IT begins infrastructure setup and the workshop starts. Results follow in approximately 10 working days.

This is the first step to introduce the supporting power of AI — how to empower people, not replace them.

Nik Metaxa-Schwarten
Executive Director, x.1 Foundation
nik@x1-foundation.org