Alex Peczon

Hey, I'm Alex

Data software engineer. I like turning messy APIs, logs, reviews, surveys, and spreadsheets into pipelines people can actually use.

Favorite language: Python

Where I moved data around for real

Work that sits somewhere between data engineering, applied ML, and product automation.

Future Tilt

Software Engineer
Jul 2025 to Present
San Francisco
  • AI template builder: developed a React and FastAPI platform that converted campaign plans into editable Klaviyo-compatible marketing emails, using AI-generated copy and reusable email components to produce branded first drafts from designer-maintained assets in Google Drive, reducing email preparation time by 40%.
  • Built an AWS Lambda and BigQuery alerting system that compared month-to-date campaign and flow revenue against the same date range from the prior year, posting daily Slack reports to surface seasonal underperformance across ecommerce clients.
React FastAPI BigQuery AWS Lambda Klaviyo API Slack API Google OAuth Docker
Future Tilt

Superlinked (Series A)

SIE Demo Software Developer (Contract)
Mar 2026 to May 2026
San Francisco Bay Area · Hybrid
  • Developed a paid launch demo for Superlinked's SIE engine during early access, working with Valentin Marek and Eric Taylor to showcase explainable wine recommendations.
  • Built a RAG wine recommender that used Vivino data, small inference models, OCR, and text embeddings to surface similar wines with clearer reasoning.
  • Shipped a React UI and containerized Python monorepo in Docker, keeping OCR and embedding modules cleanly separated for documentation users.
Superlinked SIE React Python Docker RAG Small Models

MAGICS Lab

NLP Research Assistant
Mar 2025 to May 2026
San Francisco
  • Built parallelized ETL pipelines in Go across 5 virtual machines, using DuckDB for analytical queries, to generate entity-aware NLP datasets from 20,000+ news articles.
  • Implemented a modular explainable ABSA framework that outperformed VADER on SemEval 2014 Restaurant by 5 percentage points (78.58% vs. 73.57% accuracy), with transparent reasoning traces.
Go Python NLP DuckDB SemEval VADER ABSA

Alaris Security (Pre-seed)

Junior Fullstack Engineer
Aug 2025 to Nov 2025
San Francisco
  • Resolved frontend and data consistency issues across an internal React platform, eliminating duplicate data displays, fixing UI rendering bugs, and establishing reusable patterns for frontend-to-backend communication.
  • Consolidated 50+ frontend database queries into typed tRPC procedures backed by Drizzle ORM, improving maintainability and reducing redundant data access across a Next.js + React monorepo.
Next.js React tRPC Drizzle ORM TypeScript PostgreSQL
WeWork build room
Office view
Top view

Future Tilt

Software Engineering Intern
Jun 2025 to Aug 2025
San Francisco
  • Engineered a campaign synchronization service using AWS Lambda, BigQuery, and webhook-based APIs to coordinate campaign data across Trello, Klaviyo, and internal systems, reducing campaign maintenance overhead by 50%.
AWS Lambda BigQuery Klaviyo API Trello API Webhooks Automation

Candle Stories

Production Assistant
Apr 2025 to Aug 2025
San Francisco
  • Supported documentary shoots, equipment handling, and on-set logistics. Less data pipeline, more real-world pipeline.
Production Logistics

USF Strategic Enrollment Management

Data Analyst / Web Intern
Jul 2024 to Jul 2025
San Francisco
  • Conducted year-over-year exploratory analysis on 500,000+ prospective-student records from SLATE, identifying macro enrollment trends and analyzing acceptance and declination rates by geography; found that recruitment events where students from the same region met had significantly higher conversion rates.
  • Automated recurring website content updates for prospective students by building Python and Jinja2 templating workflows over a legacy SLATE-hosted system running raw HTML and React, eliminating manual copy-paste updates across enrollment marketing pages.
Python SQL Pandas SLATE Jinja2 React Admissions Analytics

iD Tech Camps (Stanford)

Machine Learning Instructor
Jun 2024 to Aug 2024
Stanford, California
  • Taught project-based Python and machine learning lessons to high school students at Stanford, covering neural networks, NumPy, Pandas, and Keras.
Python PyTorch Keras NumPy Pandas

UC Merced · SATAL

Data Analyst Intern
Aug 2023 to May 2024
Merced
  • Used survey data and student feedback to improve enrollment support, course experiences, and faculty decision making.
  • Built a survey response normalization pipeline using Pandas and OpenAI-assisted categorization to bucket thousands of open-ended Qualtrics responses into structured themes, turning unstructured student feedback into analyzable datasets for faculty action.
  • Analyzed survey responses tied to student grade outcomes across lab sections and lectures. Split fieldwork across a team of 6 to run weekly focus groups and large-scale surveys, then compiled the findings into weekly reports for 5+ faculty members, contributing to measurable improvements in student outcomes and faculty relationships.
  • Presented our research methodology at the Fresno State Exemplary Practices in Higher Education Conference.
Pandas Qualtrics OpenAI Survey Analysis Research Methods

Acme Builders Incorporated

Construction Worker → Accounting Assistant
May 2021 to Dec 2023
Oakland · On site · Part time
  • Built internal data systems in Python with NumPy and Pandas to clean, organize, and standardize records across departments.
  • Updated, organized, and archived company documents to support payroll cycles, budgeting, and reliable business data management.
  • Used OCR workflows to reduce manual document sorting and make scanned account records easier to organize.
Python Pandas NumPy OCR Business Data Accounting Construction

Projects

These are mostly passion projects that I made with friends.

Live
Stars

nextsteamgame.com

A semantic Steam recommender for 80,000+ games built around the idea that games should match by what they are, not only by player overlap. The pipeline filters up to 2,000 reviews per game, classifies useful review pools with ModernBERT, extracts identity tags and focus vectors, canonicalizes noisy generated tags, then precomputes candidate relationships so users can rerank recommendations by soundtrack, setting, mechanics, narrative, pacing, and vibe.

Long term PostgreSQL ChromaDB Qdrant ModernBERT FastAPI Docker
Superlinked
Series A

Superlinked Wine Recommender

A wine recommender developed with the Superlinked team during early access to their SIE engine. It uses document processing, vector embeddings, and small model inference to explain why a result appears, whether the match came from fizz, cherry notes, body, acidity, or other wine attributes.

Long term Superlinked SIE Vector Search OCR Small Models Chroma PostgreSQL
2nd Place

Maldemic Simulator

We built Maldemic to help close the gap between researchers and the public. Disease models can feel locked behind papers and equations, so we turned SIR dynamics and Markov chain mobility into a 3D globe people can watch, question, and reason about. Python computes the stochastic population transitions, then Godot makes the spread visible for public education.

Long term Python NumPy SciPy Godot Markov Chains SIR
Hackathon

Next Chapter

A hackathon project built to make retirement questions feel less foggy. Users can ask things like "Can I retire in the Philippines?" or "How much should I start saving?" and the system answers with retrieved context and visible data instead of pretending a prompt is a financial plan.

RAG LLMs FinTech Personal Finance AI for Good
Open Source

Antidote Intelligence

An open-source ML security project that treats training data as the place where model risk often starts. The system uses a multi-agent analysis pipeline to inspect dataset content, generate hypotheses, and surface examples worth investigating before bad data becomes expensive behavior.

Long term Python OpenAI ML Security Data Quality Agent Pipeline
In Progress

Dreamville

A gamified Canvas LMS tracker that pulls assignments into a game loop, then scores urgency from completion patterns and difficulty signals. The useful part is turning school workflow data into a next action system students can act on without another dashboard yelling at them.

Long term Godot Go Canvas API Regression Workflow Data
Hackathon

Hyper Rosen

A Godot hackathon experiment in systems that can keep expanding. Swirled Perlin noise places planets, wave function collapse handles city placement, and procedural rules create enemies and asteroids, making the project feel like a small galaxy generated from reusable data rules.

Godot Hackathon Procedural Generation Perlin Noise Wave Function Collapse
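A minimal Python sketch of the planet-placement idea; the project itself was built in Godot. It uses classic 2D Perlin gradient noise, treats "swirl" as rotating each sample point by an angle proportional to its distance from the galaxy center, and puts a planet on each local noise peak above a threshold. The swirl formula, the scale, twist, and threshold values, and the peak rule are assumptions, not the project's code.

```python
import math
import random

# Classic 2D Perlin noise with 8 gradient directions and a quintic fade curve.
GRADIENTS = [(1, 1), (-1, 1), (1, -1), (-1, -1), (1, 0), (-1, 0), (0, 1), (0, -1)]


def make_permutation(seed):
    rng = random.Random(seed)
    p = list(range(256))
    rng.shuffle(p)
    return p + p  # doubled so p[p[x] + y + 1] never runs past the end


def fade(t):
    return t * t * t * (t * (t * 6 - 15) + 10)  # 6t^5 - 15t^4 + 10t^3


def perlin(x, y, perm):
    x0, y0 = math.floor(x), math.floor(y)
    xf, yf = x - x0, y - y0        # position inside the lattice cell
    xi, yi = x0 & 255, y0 & 255    # wrap cell coordinates into the table
    u, v = fade(xf), fade(yf)

    def corner(dx, dy):
        # Dot product of this corner's pseudo-random gradient with the offset vector.
        gx, gy = GRADIENTS[perm[perm[xi + dx] + yi + dy] & 7]
        return gx * (xf - dx) + gy * (yf - dy)

    top = corner(0, 0) + u * (corner(1, 0) - corner(0, 0))
    bottom = corner(0, 1) + u * (corner(1, 1) - corner(0, 1))
    return top + v * (bottom - top)


def swirl(x, y, cx, cy, twist):
    # Rotate a point around (cx, cy) by an angle that grows with its distance.
    dx, dy = x - cx, y - cy
    angle = twist * math.hypot(dx, dy)
    c, s = math.cos(angle), math.sin(angle)
    return cx + c * dx - s * dy, cy + s * dx + c * dy


def place_planets(width, height, seed=0, scale=0.15, twist=0.05, threshold=0.2):
    perm = make_permutation(seed)
    cx, cy = width / 2, height / 2
    field = {}
    for gx in range(width):
        for gy in range(height):
            sx, sy = swirl(gx, gy, cx, cy, twist)
            field[(gx, gy)] = perlin(sx * scale, sy * scale, perm)
    planets = []
    for (gx, gy), value in field.items():
        around = [field.get((gx + dx, gy + dy), -math.inf)
                  for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
        # A planet sits on each local noise peak that clears the threshold.
        if value > threshold and value > max(around):
            planets.append((gx, gy))
    return planets


if __name__ == "__main__":
    print(place_planets(40, 40, seed=7))
```

Because the same seed always produces the same permutation table, a galaxy can be regenerated from a single number instead of being saved cell by cell.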
GDC
GDC Jam

Cake Walk

A fast game jam pitch: make a tiny character readable, charming, and playable in a single day. We built and demoed Cake Walk at GDC Festival of Gaming with Keriya Son on 3D, Angie Peczon on art, Eric Taylor on shaders, and Ilce Perez on music.

Godot Game Jam 3D Shaders Team Project
First Project, 2022

Old Man Climbs

A small vertical climber built over a weekend for a UC Merced game jam in 2022. It is here less as a technical flex and more as the first shipped artifact: a reminder that finishing a small loop teaches more than endlessly planning a bigger one.

Godot Game Jam 2022
Obsidian

Quick Autocorrect

A small community plugin for reducing friction while writing in Obsidian. It catches repeated misspellings, applies quick corrections, and keeps a personal dictionary for words Obsidian should stop fighting you on: a tiny version of the same pattern I like, cleaning a messy text stream into something easier to use.

Long term TypeScript Obsidian Plugin Text UX

NutriFinder

A small dietary search project with a practical pitch: pull in messy menu and nutrition information, normalize it enough to filter, and give people a cleaner way to decide what they can eat.

React Flask Python Search Filters

Spiral Visualizer

A compact teaching visualization for spiral growth using queued directions. The pitch is simple: when a system changes step by step, showing the state often teaches faster than another paragraph of explanation.

Python Matplotlib Queues
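A small sketch of spiral growth driven by a queue of directions. It assumes a square spiral on an integer grid: the current heading sits at the front of a `deque`, each turn rotates it to the back, and the arm length grows by one every two turns. The Matplotlib plotting step is omitted, and `spiral_points` is a hypothetical name.

```python
from collections import deque


def spiral_points(n):
    """First n points of a square spiral on a grid, starting at the origin."""
    if n <= 0:
        return []
    # Queue of directions: right, up, left, down. Rotating it gives the next turn.
    directions = deque([(1, 0), (0, 1), (-1, 0), (0, -1)])
    x, y = 0, 0
    points = [(x, y)]
    run, turns = 1, 0
    while len(points) < n:
        dx, dy = directions[0]
        for _ in range(run):
            if len(points) == n:
                break
            x, y = x + dx, y + dy
            points.append((x, y))
        directions.rotate(-1)  # move the current direction to the back of the queue
        turns += 1
        if turns % 2 == 0:
            run += 1  # each pair of turns lengthens the arm by one step
    return points


if __name__ == "__main__":
    print(spiral_points(10))
```

Printing the queue after each turn is a cheap way to show students the state change that drives the drawing.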

Blog

Hey hey! Didn't think many people would see this haha. These are basically leftover thoughts from projects I made: recommendation systems, explainable AI, data poisoning, and the parts that did not fit cleanly on a resume.

Comic strip reference for editorial layout
Since I'm feeling newspaper-y here I put a Garfield strip (it's in the public domain).
Case Study May 2026

How I Built a Semantic Recommendation Engine for 80,000 Steam Games

NextSteamGame is a Steam recommendation project built around a simple complaint: most game recommenders know that two games are related, but they rarely explain why. Player-overlap signals are useful, but they flatten intent. Someone may like Persona 5 for the jazz fusion soundtrack and modern Tokyo setting; another person may like it for social simulation and dungeon crawling. Those are different reasons, and a good recommendation system should let users separate them.

Games as vectors

I think games can be represented as weighted profiles: not just genre, but the parts that actually make the game feel like itself.

Music · Setting · Systems · Narrative · Vibe

Persona 5 Royal: jazz fusion, modern urban fantasy, social simulation, dungeon crawling, stylish UI.

Micro-tags normal genres miss
confidant system · persona fusion system · day-night cycle · stylish art direction · modern Tokyo setting · oppression and rebellion · social link · dungeon exploration · jazz fusion soundtrack · character-driven narrative

The problem

Most recommendation systems lean on the pattern "players who liked X also liked Y." That works well for popular games, but it struggles with niche tastes and gives weak explanations. I wanted a system that could represent a game as a shape: soundtrack, setting, systems, narrative, vibe, and the small micro-tags that genre labels leave behind.

The pipeline

  • Collect Steam metadata, appids, genres, tags, descriptions, release data, and storefront artwork.
  • Pull up to 2,000 reviews per game, then remove spam and low-signal reviews with regex filters, word diversity scoring, quality heuristics, and descriptive phrase detection.
  • Classify useful reviews with ModernBERT into pools for gameplay, art, soundtrack, systems depth, narrative, and general description.
  • Generate semantic identity data: focus vectors, mechanics, narrative, vibe, structure loop, signature tags, niche anchors, music tags, and micro-tags.
  • Canonicalize noisy generated tags with heuristics, fuzzy matching, embedding similarity, and vector search so tags like fast action, quick action, and high-speed combat can be grouped without losing useful distinctions.
  • Precompute candidate relationships offline, then let the live FastAPI and React app apply user-controlled reranking at runtime.
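A minimal sketch of the first filtering pass. It covers only two of the signals listed above: regex spam patterns and word diversity (the share of distinct words in a review). The specific patterns and the 8-word and 0.5 thresholds are assumptions, not the project's actual filters. The quality heuristics and descriptive phrase detection are left out.

```python
import re

# Illustrative spam patterns (assumptions, not the project's real list).
SPAM_PATTERNS = [
    re.compile(r"(.)\1{9,}"),     # ten or more repeats of one character
    re.compile(r"https?://\S+"),  # links in reviews are usually promotion
]
WORD = re.compile(r"[a-z']+")


def word_diversity(text):
    """Share of distinct words (type-token ratio); low values mean repetition."""
    words = WORD.findall(text.lower())
    return len(set(words)) / len(words) if words else 0.0


def keep_review(text, min_words=8, min_diversity=0.5):
    if any(p.search(text) for p in SPAM_PATTERNS):
        return False
    if len(WORD.findall(text.lower())) < min_words:
        return False
    return word_diversity(text) >= min_diversity


reviews = [
    "the combat is fast, the soundtrack goes hard, and the boss fights feel like rhythm puzzles",
    "good game good game good game good game",
    "10/10",
    "best game ever" + "!" * 12,
]
kept = [r for r in reviews if keep_review(r)]
print(kept)
```

Cheap checks like these run before any model call, so ModernBERT only sees reviews that have a chance of carrying descriptive signal.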

Why the architecture is cool

The key design choice is splitting expensive semantic work from cheap interactive reranking. Computing every similarity at runtime would be wasteful, so candidate relationships are built offline. When a user searches, the app retrieves candidates, applies the user's weights, and reranks recommendations based on the profile dimensions they care about.
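A toy sketch of that offline/runtime split. It assumes each game is stored as a small vector per profile dimension. The offline step computes per-dimension cosine similarities and keeps the strongest candidates; the runtime step is only a weighted sum over those stored numbers. The vectors, the two placeholder games, and the plain-Python similarity loop (standing in for the real vector search) are all hypothetical.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def precompute_candidates(profiles, per_game=50):
    """Offline: per-dimension similarity from every game to every other game."""
    table = {}
    for game, dims in profiles.items():
        rows = []
        for other, other_dims in profiles.items():
            if other == game:
                continue
            shared = dims.keys() & other_dims.keys()
            rows.append((other, {d: cosine(dims[d], other_dims[d]) for d in shared}))
        # Keep only the strongest candidates overall so runtime work stays small.
        rows.sort(key=lambda r: sum(r[1].values()), reverse=True)
        table[game] = rows[:per_game]
    return table


def rerank(table, game, weights, top_k=5):
    """Runtime: a cheap weighted sum over precomputed similarities."""
    total = sum(weights.values()) or 1.0
    scored = [
        (sum(w * sims.get(d, 0.0) for d, w in weights.items()) / total, other)
        for other, sims in table[game]
    ]
    scored.sort(reverse=True)
    return [(other, round(score, 3)) for score, other in scored[:top_k]]


profiles = {  # hypothetical 2D vectors per dimension
    "Persona 5 Royal": {"soundtrack": [1.0, 0.0], "mechanics": [0.0, 1.0]},
    "Game A (jazz-heavy)": {"soundtrack": [0.9, 0.1], "mechanics": [1.0, 0.0]},
    "Game B (dungeon-heavy)": {"soundtrack": [0.0, 1.0], "mechanics": [0.1, 0.9]},
}
table = precompute_candidates(profiles)
print(rerank(table, "Persona 5 Royal", {"soundtrack": 1.0}))
print(rerank(table, "Persona 5 Royal", {"mechanics": 1.0}))
```

Changing the weights reorders the same precomputed candidates without touching any embeddings, which is what keeps the interactive part cheap.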

From review to recommendation

A raw review like "the combat is fast, the soundtrack goes hard, and the boss fights feel like rhythm puzzles" becomes structured signals: fast combat, high-energy soundtrack, boss-focused structure, rhythm-like timing, and mechanical precision. Those signals can then be weighted independently by the user.

What it demonstrates

  • 80,000+ Steam games indexed
  • Up to 2,000 reviews analyzed per game
  • Semantic vectors, identity tags, and canonicalized genre/tag relationships
  • 30,000+ users, with discovery spanning 8,000+ unique games
  • A retrieval design cheap enough to run on constrained cloud infrastructure

What I learned

Review text is noisy, so filtering before embeddings matters. LLM-generated tags are useful, but raw generated tags need canonicalization. Most importantly, recommendations feel better when users can inspect and control the reason behind a match instead of accepting a mystery list.
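A minimal sketch of the cheapest layer of tag canonicalization, using greedy clustering. It normalizes casing and punctuation, then attaches each tag to the first more-frequent canonical tag whose `difflib.SequenceMatcher` ratio clears a threshold. The 0.85 threshold is an assumption. String similarity alone will not merge true synonyms like fast action and high-speed combat; that is what the embedding-similarity and vector-search steps in the pipeline are for.

```python
import re
from collections import Counter
from difflib import SequenceMatcher


def normalize(tag):
    # Lowercase, treat hyphens/underscores as spaces, collapse whitespace.
    return " ".join(re.sub(r"[-_]+", " ", tag.lower()).split())


def canonicalize(raw_tags, threshold=0.85):
    """Map each raw tag to a canonical form via greedy fuzzy clustering."""
    counts = Counter(normalize(t) for t in raw_tags)
    canonical = []  # canonical tags chosen so far, most frequent first
    resolved = {}   # normalized tag -> canonical tag
    for tag, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        match = next((c for c in canonical
                      if SequenceMatcher(None, tag, c).ratio() >= threshold), None)
        if match is None:
            canonical.append(tag)
            match = tag
        resolved[tag] = match
    return {raw: resolved[normalize(raw)] for raw in raw_tags}


tags = ["fast action", "Fast-Action", "jazz soundtrack", "jazz sound track", "fast action"]
print(canonicalize(tags))
```

Processing the most frequent tags first means the common spelling becomes the canonical one, and rarer variants fold into it.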

Recommendation Systems Vector Search ModernBERT FastAPI Semantic Retrieval Steam
Friends + Games GDC Game Jam

Cake Walk: A One-Day Game Jam at GDC

Cake Walk at GDC
Cake Walk started as a tiny joke and became a playable floor demo by the end of the day.

Cake Walk was a one-day game jam project I made at GDC with friends. The whole thing was intentionally small: make a little cake cross the street, make it readable, make it charming, and ship something people could actually try.

The shape of the day

Everyone had a lane. We split up character work, art, shaders, music, and gameplay, then kept cutting scope until the core loop was visible. That is the best part of game jams: you cannot hide behind architecture for too long. Either the thing plays or it does not.

Cake Walk group photo
The real artifact was less the game and more the tiny production pipeline we built under pressure.

Why it matters

I keep these projects on the site because they show a different kind of engineering. Hackathons are messy, but they force prioritization, communication, and taste. You learn how much polish can come from a few good decisions when the team is moving fast.

GDC Game Jam Godot Team Project Friends
Friends + Games Hackathon Notes

Hyper Rosen: A Tiny Galaxy From a Hackathon Weekend

Hyper Rosen hackathon photo
Hyper Rosen was one of those weekend builds where the idea was bigger than the time limit, which is kind of the whole point.
The same silent gameplay capture from the project card, dropped into the newsletter so the build feels alive instead of only described.

Hyper Rosen was a hackathon game I made with friends. The pitch was simple and probably too ambitious: build a procedural space game where planets, cities, enemies, and asteroid fields come from generation rules instead of hand placement.

What we tried

The fun part was treating the game like a small systems experiment. Swirled noise placed planets, procedural rules filled out the galaxy, and wave-function-collapse-style logic helped with city layout. It was not polished in the normal product sense, but it had that good hackathon feeling where every hour made the world a little more alive.
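A minimal sketch of wave-function-collapse-style layout on a grid, assuming three hypothetical city tiles and symmetric adjacency rules. Each cell starts with every tile possible. The loop collapses a lowest-entropy cell (fewest remaining options), propagates constraints to its neighbors, and restarts from scratch on a contradiction instead of backtracking. This is a simplified tiled variant for illustration, not the Godot code from the hackathon.

```python
import random

# Hypothetical city tiles and symmetric adjacency rules (which tiles may touch).
ALLOWED = {
    "road": {"road", "house", "park"},
    "house": {"road", "house"},
    "park": {"road", "park"},
}


def neighbors(x, y, w, h):
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < w and 0 <= y + dy < h:
            yield x + dx, y + dy


def collapse_city(w, h, seed=0, max_restarts=20):
    rng = random.Random(seed)
    for _ in range(max_restarts):
        # Every cell starts in "superposition": all tiles still possible.
        grid = {(x, y): set(ALLOWED) for x in range(w) for y in range(h)}
        ok = True
        while ok:
            open_cells = [c for c, opts in grid.items() if len(opts) > 1]
            if not open_cells:
                return {c: next(iter(opts)) for c, opts in grid.items()}
            # Observe: collapse a lowest-entropy cell (fewest options) to one tile.
            fewest = min(len(grid[c]) for c in open_cells)
            cell = rng.choice([c for c in open_cells if len(grid[c]) == fewest])
            grid[cell] = {rng.choice(sorted(grid[cell]))}
            # Propagate: shrink neighbors to tiles compatible with what remains.
            stack = [cell]
            while stack and ok:
                x, y = stack.pop()
                compatible = set().union(*(ALLOWED[t] for t in grid[(x, y)]))
                for n in neighbors(x, y, w, h):
                    narrowed = grid[n] & compatible
                    if not narrowed:
                        ok = False  # contradiction: restart from scratch
                        break
                    if narrowed != grid[n]:
                        grid[n] = narrowed
                        stack.append(n)
    raise RuntimeError("no consistent layout found")


if __name__ == "__main__":
    city = collapse_city(8, 6, seed=3)
    for y in range(6):
        print(" ".join(city[(x, y)][0] for x in range(8)))
```

Restarting is cruder than backtracking, but with permissive rules (here a road can touch anything) it rarely triggers on small grids.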

Why I still like it

I like projects like this because they make constraints obvious. You learn what actually matters when the deadline is close: readable movement, a loop people can understand, and enough visual feedback that the system feels real even if half of it is held together by deadline energy.

One day

Long term, I still want to make a full Mario Galaxy-style procedural game from this idea: tiny planets, playful gravity, generated worlds, and a sense that the level is wrapping around you. That is probably a post-college version of the project, though. The kind you build when breakfast is no longer mostly oats and coffee.

Hackathon Godot Procedural Generation Friends Game Dev
Update October 2025

Turns Out We Weren't Crazy About Data Poisoning

In December 2024, I built Antidote Intelligence around a simple belief: training data is infrastructure, and poisoned examples can become model behavior if nobody inspects the dataset early enough.

Anthropic, the UK AI Security Institute, and The Alan Turing Institute later published a large-scale poisoning study that makes that concern feel a lot less speculative. Their result: in their experimental setup, as few as 250 malicious documents were enough to introduce a backdoor across models from 600M to 13B parameters.

250 documents: enough to backdoor the tested models in Anthropic's denial-of-service setup

Why this reinforced the project

A lot of people think poisoning only matters if an attacker controls a meaningful percentage of the training set. The Anthropic result challenges that. Their finding suggests the absolute number of poisoned documents can matter more than the percentage of the corpus, at least for the narrow backdoor they tested.
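A quick bit of arithmetic shows why a fixed count changes the picture. The corpus sizes below are hypothetical, not the study's. The point is that the same 250 documents shrink as a share of the data in direct proportion to corpus size (0.025%, 0.0025%, 0.00025% here), so a threat model built on percentages would increasingly underestimate them as corpora grow.

```python
POISONED_DOCS = 250  # the fixed count reported in the study


def poisoned_share(corpus_docs, poisoned=POISONED_DOCS):
    """Percent of a corpus that a fixed number of poisoned documents represents."""
    return 100 * poisoned / corpus_docs


for corpus_docs in (1_000_000, 10_000_000, 100_000_000):  # hypothetical sizes
    print(f"{corpus_docs:>11,} documents -> {poisoned_share(corpus_docs):.5f}% poisoned")
```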

What Antidote was trying to do

Antidote was not trying to solve all model security. It was a dataset inspection tool: look at examples before they become model behavior, generate hypotheses about suspicious content, and make data quality visible enough for a human to investigate.

The larger lesson

This is the same theme as my recommender and ABSA work: AI systems can be powerful without becoming completely opaque. If model behavior depends on messy upstream data, then inspection, provenance, and explainable intermediate artifacts are not extras. They are part of the system.

Read Anthropic's research post.

Data Poisoning ML Security Dataset Inspection AI Safety Antidote Intelligence
Research Note October 2025

Building Explainable ABSA Without Hiding the Reasoning

AeVAA is a research project about a question I keep coming back to: machine learning and AI are powerful, but can we build systems where the important reasoning stays inspectable?

Aspect-based sentiment analysis usually tries to predict whether a sentence is positive, negative, or neutral toward a target. That is useful, but it often hides the path from text to judgment. AeVAA takes a different route: split the problem into modules, keep intermediate artifacts, and use survey-derived formulas to explain how sentiment moves between entities.

$$\Sigma(x)_k = \sigma(s_k, i_k, r_k)$$
Sentiment as a function of local score, interaction, and relation context.

The core idea

Instead of asking one model for one answer, AeVAA builds a trace. It extracts clauses, resolves entities, identifies relationships, constructs a graph, and then calculates valence-aware sentiment over that graph. The model can still use black-box components, but the system around them exposes what each component contributed.

Why this matters

Document-level sentiment can miss the point. In a sentence like "the person was bad, but the child was good," the total sentiment is not enough. The meaningful question is who the sentiment is aimed at and why it changed. That becomes even more important for media bias, long-form narrative, and texts where framing matters.
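A toy sketch of that example. It uses a regex clause splitter, a subject-predicate pattern, and a four-word lexicon in place of AeVAA's actual modules (constituency parsing, coreference, relation extraction, and survey-fitted formulas). Every name here is hypothetical. The point is only the shape of the output: a per-entity score plus a trace of each intermediate stage, where a single document-level score would collapse to neutral.

```python
import re
from dataclasses import dataclass, field

# Toy lexicon standing in for a learned or survey-fitted scorer (hypothetical).
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "awful": -1.0}
CLAUSE_SPLIT = re.compile(r",?\s+but\s+|;\s*")
SUBJECT_PREDICATE = re.compile(r"^(?:the\s+)?(\w+)\s+(?:is|was)\s+(\w+)")


@dataclass
class Trace:
    steps: list = field(default_factory=list)

    def record(self, stage, output):
        self.steps.append((stage, output))  # keep every intermediate artifact
        return output


def entity_sentiment(sentence):
    trace = Trace()
    clauses = trace.record("clauses", CLAUSE_SPLIT.split(sentence.lower().strip()))
    pairs = trace.record("entities", [
        m.groups() for m in map(SUBJECT_PREDICATE.match, clauses) if m
    ])
    scores = trace.record("scores", {entity: LEXICON.get(word, 0.0) for entity, word in pairs})
    return scores, trace


scores, trace = entity_sentiment("the person was bad, but the child was good")
print(scores)                # per-entity view
print(sum(scores.values()))  # a document-level sum collapses to neutral
for stage, output in trace.steps:
    print(stage, output)
```

When a score looks wrong, the trace shows whether the clause split, the entity match, or the scorer caused it.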

What we built

  • A modular pipeline for constituency clause extraction, entity/coreference resolution, relation and modifier extraction, graph construction, and sentiment aggregation.
  • A human annotation study with 36 participants and 3,900+ sentiment judgments across action, association, ownership, and temporal aggregation cases.
  • Survey-fitted formulas for action, target, association, ownership, and aggregate sentiment dynamics.
  • Explanatory traces that show where errors came from instead of only reporting a final label.

Results

The fitted formulas explained roughly half of the variance in pilot sentiment judgments. On SemEval 2014, AeVAA reached 78.58% restaurant accuracy and 68.52% laptop accuracy. It did not beat state-of-the-art DeBERTa systems, but that was not the point of the prototype. The point was to show that a modular, inspectable ABSA system can produce plausible results and make debugging easier.

The bigger theme

I like projects that score well without becoming total black boxes. The goal is not to reject ML; it is to use ML where it helps, then design the surrounding system so people can inspect the evidence, the intermediate state, and the reason a result appeared.

Explainable AI ABSA NLP Human Annotation SemEval Research
Earlier Post Feb 2025

Maldemic: A Pandemic Model You Can Watch

Maldemic is a stochastic disease spread simulator that turns SIR equations and city mobility into a live 3D globe. I like projects where the math becomes something you can inspect with your eyes.

Data flow

Python computes population movement with a Markov matrix, updates local susceptible/infected/recovered states, then passes the evolving state into a Godot visualization.

Technical shape

  • Markov chain mobility between cities
  • SIR disease dynamics for local spread
  • Population cleanup to keep totals consistent
  • 3D globe rendering for real time visual feedback
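A minimal NumPy sketch of one simulation step in that order: mobility first, then local SIR updates. It assumes a row-stochastic matrix `P` where row i holds the probabilities that city i's residents end up in each city, binomial draws for new infections and recoveries, and multinomial draws for movement. Because the draws are integer splits, totals are conserved by construction, standing in for a separate cleanup pass. The transmission rate, recovery rate, matrix, and populations are hypothetical, and the Godot rendering is omitted.

```python
import numpy as np


def move(counts, P, rng):
    """Markov mobility: split each city's people across destinations using row P[i]."""
    out = np.zeros_like(counts)
    for i, c in enumerate(counts):
        out += rng.multinomial(c, P[i])  # integer split, so no one is created or lost
    return out


def step(S, I, R, P, beta, gamma, rng):
    # 1) Movement between cities via the Markov matrix.
    S, I, R = move(S, P, rng), move(I, P, rng), move(R, P, rng)
    # 2) Local SIR update inside each city (chain-binomial style).
    N = S + I + R
    frac = np.divide(I, N, out=np.zeros(len(N)), where=N > 0)  # 0 for empty cities
    new_inf = rng.binomial(S, 1.0 - np.exp(-beta * frac))
    new_rec = rng.binomial(I, gamma)
    return S - new_inf, I + new_inf - new_rec, R + new_rec


rng = np.random.default_rng(0)
P = np.array([[0.90, 0.05, 0.05],  # row i: where city i's residents go each step
              [0.10, 0.80, 0.10],
              [0.05, 0.05, 0.90]])
S = np.array([990, 1000, 1000])
I = np.array([10, 0, 0])
R = np.zeros(3, dtype=np.int64)
for _ in range(60):
    S, I, R = step(S, I, R, P, beta=0.3, gamma=0.1, rng=rng)
print(S, I, R, (S + I + R).sum())
```

Each returned `(S, I, R)` triple is exactly the per-city state a renderer would need to color the globe for that tick.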

The project won 2nd place at BLOOM Hackathon and received grant support for neural network forecasting work.

NumPy SciPy Godot 3D Markov Chains SIR Model Simulation
Earlier Post December 2024

Data Poisoning Detection at Continue DX

Continue DX presentation header
Continue DX demo: inspecting training data before it becomes model behavior.

I demoed Antidote Intelligence at Continue DX, showing a content-aware data poisoning detection system for ML training datasets. The basic pitch: before we argue about the model, let's look harder at the data we fed it.

Why this matters

Training data quality is one of those problems that hides until it becomes expensive. Bad examples, poisoned content, or subtle distribution weirdness can leak into model behavior long before anyone notices.

What I built

  • Multi-agent review pipeline for suspicious dataset content
  • Hypothesis generation and validation around poisoned examples
  • Content-aware checks instead of only metadata-based filtering
  • Reports aimed at making data issues inspectable, not magical

I am interested in this space because it treats data quality as infrastructure. The model gets the attention, but the dataset is where a lot of the story starts.

ML Security Data Quality Training Data Agent Pipeline
Journal Entry August 2024

USF, Sentiment, and Moving Into the City

View from my USF dorm
The view from my dorm at USF. This was the point where school started feeling connected to the city instead of separate from it.

This one is more of a journal entry than a project breakdown.

I transferred to USF in 2024 after UC Merced because I wanted to be in San Francisco. Merced felt too far away from the people and companies I wanted to learn from. I wanted to make a name for myself, be around real builders, and learn from people actually working in tech.

Before I even got to USF, I applied to more than 100 jobs. That search eventually turned into a Data Analyst / Web Intern role, where I worked on enrollment analytics, SLATE data, admissions event cleanup, and prospective-student web updates. It was not glamorous, but it taught me something important: useful software usually starts as messy data, weird processes, and people who need better tools.

Where sentiment came in

Later, at the MAGIC Lab, I worked on AeVAA, an explainable sentiment project. The technical version is about aspect-based sentiment analysis, graphs, coreference, relation extraction, and survey-derived formulas. The personal version is simpler: I was trying to understand how a system could make a judgment without hiding the reasons.

That thread shows up in a lot of my projects. Recommenders should explain why a game matches. Sentiment systems should explain who a sentence is about and why the score moved. Data poisoning tools should show what suspicious examples look like before they become model behavior.

USF campus
USF became the place where those ideas started turning into real projects instead of just things I was reading about.

The pattern

I like AI systems, but I do not like when the answer is the only artifact. The projects I keep coming back to are the ones where the intermediate state matters: vectors, tags, traces, formulas, records, examples, and the evidence behind a prediction.

USF was where that started to become a theme instead of a coincidence.

USF Journal Sentiment Analysis Research San Francisco
Journal Entry August 2023

UC Merced Game Dev Club and My First Data Internship

UC Merced Game Dev Club event
UC Merced Game Dev Club, back when I was trying to get more students to actually start making games.

In 2023, I was the secretary of the UC Merced Game Dev Club. A lot of the work was not glamorous: planning, messaging people, getting rooms, keeping events moving, and making sure students felt like they could show up even if they had never shipped anything before.

We hosted a successful showcase where people brought in games they had been building, talked through what worked, and got to see other students care about the same weird problems: controls, art, music, level design, scope, and how to make a tiny idea feel playable.

Game jams and mixers

We also hosted a game jam that produced some genuinely cool student projects. The best part was watching people form teams quickly and make something real under a deadline. I also helped host a mixer for students who wanted to get started with game development but did not know who to work with yet.

That year also overlapped with the start of my Data Analyst internship at SATAL UC Merced, where I used survey data and student feedback to improve enrollment support, course experiences, and faculty decision making. Looking back, both roles were about the same thing: turning scattered student energy into something organized enough that people could act on it.

SATAL Fresno State presentation poster
We ended up presenting this SATAL work at Fresno State, showing how student feedback could become faculty-facing evidence instead of disappearing into end-of-term forms.

That presentation mattered to me because it made the internship feel real. We were not just cleaning survey data for a class assignment; we were turning student perspectives into something instructors could discuss, revise around, and bring back into their courses.

UC Merced Game Dev Club Game Jam SATAL Journal
Journal Entry Summer 2022

ACME Builders: Construction, Payroll Scans, and Starting College

ACME Builders building
ACME Builders, around the time I was just starting college and trying to find any useful way to spend the summer.

In 2022, I had just started college and I could not get an internship yet. I was still early, still figuring out what counted as experience, and honestly just trying to keep moving instead of waiting around for someone to hand me a clean first opportunity.

So I worked with ACME Builders. Some of it was construction work. Some of it was office work: scanning accounting documents, payroll records, and old archives so they were easier to store and reference. It was not software engineering, but it gave me a closer look at the kind of messy operational work that every business quietly depends on.

Why I still count it

Looking back, this was one of the first places I saw how much value exists in boring process cleanup. Paper records, payroll files, old folders, and construction logistics are not glamorous, but someone still has to make them usable. That lesson shows up later in my data work: useful systems often start by organizing the unorganized.

At the time, it was mostly a way to pass the summer and stay busy. But it also taught me that not every important experience looks like a polished internship. Sometimes the early work is just learning how real businesses keep track of things.

ACME Builders Construction Accounting Payroll Archives Journal