## T.J. Gaffney

Here is my extended resume and some projects that I've worked on. Email any questions to gaffney.tj@gmail.com.

## Work Experience

##### Sr Machine Learning Engineer - Reddit (June 2021 - October 2022)

I worked on Reddit's User Understanding team, whose main task was to build features for use in models, primarily recommendation models. I created specific features and established patterns for aggregating content features to users and for creating user embeddings. This work covered both batch and streaming pipelines.

**User embeddings:** Built user embeddings from user history using collaborative filtering. Designed a pipeline to address the cold-start problem. Demonstrated their predictive value in recommendation models. *Tags: Embeddings, Streaming, Workflow Manager, Collaborative Filtering, SVD.*

**User interests:** Aggregated content labels to the user level, including filtering NSFW content, grouping labels, and applying decay. I designed and implemented the feature: the batch version used an Airflow scheduler calling BigQuery scripts, and the streaming version was built with Flink. I also built a user-to-subreddit mapping using Annoy approximate nearest neighbors. *Tags: Streaming, Nearest Neighbor, SQL / Database, Clustering, Workflow Manager, Design / Arch.*
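This is a generic illustration of time-decayed label aggregation. It is not the production pipeline, which used Airflow, BigQuery, and Flink. It is a minimal Python sketch assuming exponential decay with a configurable half-life. The function name, the `(timestamp_in_days, label, weight)` event format, the grouping dict, and the 30-day default are all hypothetical.

```python
import math
from collections import defaultdict


def decayed_interests(events, now, half_life_days=30.0,
                      label_groups=None, blocked_labels=frozenset()):
    """Aggregate one user's content-label events into time-decayed scores.

    Hypothetical interface. events: iterable of (timestamp_in_days, label,
    weight). label_groups: optional dict mapping raw labels to coarser groups.
    blocked_labels: labels dropped before aggregating (e.g. NSFW).
    """
    label_groups = label_groups or {}
    decay_rate = math.log(2) / half_life_days  # a weight halves every half-life
    scores = defaultdict(float)
    for ts, label, weight in events:
        if label in blocked_labels:
            continue  # filtering step
        group = label_groups.get(label, label)  # grouping step
        age = now - ts
        scores[group] += weight * math.exp(-decay_rate * age)  # decay step
    return dict(scores)


events = [
    (0, "hockey", 1.0),   # 30 days old at now=30, so it counts about half
    (30, "nhl", 1.0),     # fresh event, full weight
    (30, "nsfw", 1.0),    # filtered out entirely
]
result = decayed_interests(events, now=30,
                           label_groups={"hockey": "sports", "nhl": "sports"},
                           blocked_labels={"nsfw"})
print(result)
```

Exponential decay is only one choice. A sliding window or a linear falloff would slot into the same loop.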

**Subreddit depth:** Designed and implemented a bespoke Markov chain approach to compute the average time-to-discover for subreddits. Optimized and parallelized an expensive matrix computation for a >99% speedup. *Tags: Multitasking (CS), Markov Chain, Matrices, PageRank.*
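The bullet does not spell out the bespoke method. What follows is only a generic illustration of a standard Markov chain quantity that a time-to-discover metric could build on: the expected number of steps for a random walk to first reach a target state. Reading states as subreddits, the transition matrix, and the function name are all hypothetical. The sketch solves $(I - Q)h = \mathbf{1}$, where $Q$ is the transition matrix restricted to non-target states. If the target is unreachable from some state, the system is singular, and this sketch does not handle that case.

```python
def expected_hitting_times(P, target):
    """Expected number of steps for a Markov chain to first reach `target`.

    P is a row-stochastic transition matrix given as a list of lists.
    For every non-target state i, h[i] = 1 + sum_j P[i][j] * h[j], and
    h[target] = 0. This is the linear system (I - Q) h = 1, where Q is P
    restricted to the non-target states.
    """
    n = len(P)
    others = [i for i in range(n) if i != target]
    m = len(others)

    # Augmented matrix [I - Q | 1] over the non-target states.
    A = []
    for r, i in enumerate(others):
        row = [(1.0 if r == c else 0.0) - P[i][j] for c, j in enumerate(others)]
        row.append(1.0)
        A.append(row)

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [x / scale for x in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0.0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]

    h = [0.0] * n
    for r, i in enumerate(others):
        h[i] = A[r][m]
    return h


# Hypothetical 3-state chain: from each state, move to either other state with prob 1/2.
P = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
h = expected_hitting_times(P, target=2)
print(h)  # prints [2.0, 2.0, 0.0]
```

An "average" time-to-discover could then be taken as a mean of `h` over some distribution of starting states. Which distribution to use is a modeling choice the bullet leaves open.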

**Brand safety analysis:** Built an analytics dashboard. Helped change the serving pattern to increase ad slots by ~8%. *Tags: Programming, Tableau / Mode.*

**User covariates:** Covariates are user variables that we control for when analyzing the impact of A/B tests. I identified impactful covariates and wrote a script to compute them. *Tags: A/B Testing, SQL / Database.*

Some substantial projects may be excluded due to proprietary information.

##### Software Engineer - Google (May 2018 - June 2021)

I worked on a team called FameBit at YouTube, which facilitated organic ads in YouTube videos. More specifically, I was in a group that matched creators to brands.

**Audience sentiment aggregation:** For various back-end projects, I aggregated audience sentiment data to the channel level. *Tags: Multitasking (CS), Big data.*

**YouTube channel recommendation:** Designed and implemented a model to find the best YouTube channels for a client, given a URL and keywords for the brand or campaign. *Tags: Transfer Learning, Nearest Neighbor, Modeling, Design / Arch, Networking (CS), Programming, Embeddings.*

**Video view predictor:** Used Monte Carlo simulations in a patent-pending application to predict views for a pool of channels. *Tags: Regression, Monte Carlo, Modeling, Networking (CS), Stats, Programming.*

**Video review pipeline:** Developed a UI pipeline to facilitate and automate video reviews. *Tags: Design / Arch, Networking (CS), Programming.*

**Payments database:** Implemented a server for CRUD operations (plus export and reverse operations) on a payments table. *Tags: SQL / Database, Multitasking (CS), Networking (CS), Programming.*

**Contract processing:** I did some side work on a project that attempted to process contracts automatically. My part was scraping the SEC EDGAR database for contracts to be used by human and machine labelers. *Tags: Data scraping, Big data.*

Some substantial projects may be excluded due to proprietary information.

##### Gaming Consultant - The Innovation Group (Apr 2018 - May 2018)

For a brief period, I consulted with The Innovation Group. During that time I worked on:

**Oceans marketing:** Leading up to Oceans Casino's relaunch, I researched its market segments and designed its loyalty program. *Tags: Marketing.*

**Sports betting market sizing:** I conducted surveys and matched the results to Census data to model demand. I then used this in a gravity model to estimate the size of the sports betting market in states that were considering legalization. *Tags: Modeling, Data scraping, Regression.*

Some substantial projects may be excluded due to proprietary information.

##### Manager, Marketing Analytics - Pinnacle Entertainment (Apr 2016 - Apr 2018)

I managed a group that analyzed about $500M of marketing budget, and I set the group's direction and workflow. We covered many branches of marketing, including direct mail, the host program, events/promotions, the loyalty program, and advertising. I worked on a wide range of projects, including ad hoc analyses, building reports, A/B split testing of DM campaigns, goal-setting for casino hosts, market segmentation, advertising impacts, and text analysis of survey results.

Here are some project highlights with limited details:

**Ad hoc analyses and reporting tools:** Analyses of the effectiveness of digital direct mail, Asian play trends, cross-property marketing, and more. *Tags: SQL / Database, Tableau / Mode, Marketing.*

**Data work:** Aggregated player data and joined it with marketing data from many sources and systems. *Tags: SQL / Database, Programming.*

**A/B testing of DM campaigns:** Advised on statistical methodology and developed software to quickly run A/B tests and reporting, cutting process time by 80%. Reporting included visualizations for drill-down decision-making. Created a results repository for long-term trend analysis. *Tags: A/B Testing, Marketing, Stats, Programming.*

**Casino host target model:** Modeled the expected play of hosted players, a significant improvement over the existing methodology. *Tags: Modeling, Time Series, Regression.*

**Marketing segmentation:** Designed and aligned market segments across 15 different casinos. *Tags: SQL / Database, Marketing.*

**Event post forma dashboard:** Built a Tableau dashboard to show event KPIs versus benchmarks. Spearheaded the initiative and achieved a wide rollout to about 60 users at 15 casinos, with thousands of uses per month, making it the most-used workbook in the company. *Tags: SQL / Database, Tableau / Mode, Networking (CS).*

**Survey analysis:** Performed sentiment analysis on year-end survey results and communicated the findings to the company. *Tags: NLP, LDA, Tableau / Mode, Marketing.*

**License bids:** Conducted a live game theory experiment to help decide how to bid for gaming licenses. *Tags: Game Theory.*

Some substantial projects may be excluded due to proprietary information.

##### Actuary, Commercial Lines Analytics - Auto-Owners Insurance (Sep 2014 - Apr 2016)

My team built the models for Auto-Owners' commercial lines products, including TTP, commercial auto, workers' comp, and others. My work was split roughly equally among three tasks: data work, modeling, and research. Data work meant writing SQL to pull data for our models, and the models were large generalized linear models (GLMs). Specific projects I worked on include:

**Fraud model:** Used an SVM on claim-note text to predict fraud, which allowed us to automate the work of 15 FTEs. *Tags: Modeling, NLP, SVM.*

**Model packet automations:** Reverse-engineered pre-packaged GLM software, allowing us to produce modeling packets automatically. This reduced a day-long project to minutes. *Tags: Modeling, Game Theory, Programming.*

**Dimensionality reduction:** Researched dimensionality reduction for auto and credit datasets and advised analysts across the company on it. We looked into PCA, partial least squares, and lasso regression. *Tags: PCA, Stats.*

**Lifetime value model:** Built a customer lifetime value model on our commercial policy data. *Tags: Modeling, SQL / Database, Regression.*

**Commercial auto model:** Revamped our decade-old commercial auto model, combining two previous models. *Tags: Modeling, SQL / Database, Regression.*

**Presentations:** I presented research on Shapley values and family-wise error rates that broadly influenced our modeling techniques. *Tags: Game Theory.*

Some substantial projects may be excluded due to proprietary information.

##### Stats Lecturer - Davenport University (Jun 2015 - Aug 2015)

I taught Intro to Stats at night one summer while I was an actuary. That semester, I redesigned the term project.

##### Underwriting Actuary - Qualchoice Insurance (Feb 2014 - Sep 2014)

I was the company's only underwriting actuary. My main job was to create and maintain software to renew group insurance policies. I also worked on a number of small and ad hoc projects, including:

**Obamacare updates:** Researched how the new legislation affected the way we priced policies.

**ICD-9 to ICD-10 migration:** Wrote a web scraper to obtain an ICD-9 to ICD-10 crosswalk. *Tags: SQL / Database, Data scraping.*

Some substantial projects may be excluded due to proprietary information.

##### Teaching Assistant - Michigan State University (Aug 2011 - Aug 2013)

While in grad school, I taught a dozen classes over six semesters, including algebra, math for education majors, and Calculus 2 and 3. As the instructor, I taught the classes, met with students, wrote and graded tests, and reported grades. In my first year, I won a teaching award from the department.

## Education

##### Master's, Mathematics - Michigan State University (Fall 2011 - Summer 2013)

**GPA: 3.83**

Passed qualifying exams on geometry/topology, algebra, and analysis.

##### Bachelor's, Mathematics - University of Nevada (Fall 2007 - Spring 2011)

**GPA: 3.83**

Minored in computer science. Graduated magna cum laude.

## Open Source Contributions

##### Axelrod Python Library (2017-2020)

The Axelrod library in Python is a research tool for the Iterated Prisoner's Dilemma. I've contributed in a few ways, including:

**Strategy refactoring:** Refactored old Fortran strategies to build up the library's collection of strategies. *Tags: Game Theory, Programming.*

**Hidden Markov model:** Implemented a hidden Markov model framework for strategies, including an optimizer using an evolutionary algorithm and a particle swarm optimizer. *Tags: Game Theory, Genetic Algorithm, Programming.*

**Memory-length algorithm:** Created an algorithm to efficiently calculate the memory length of strategies represented as finite state machines. *Tags: Game Theory, Algorithm, Programming.*

##### Few Shot Text Classification (2020)

I provided some minor clean-ups/tests for a library that demonstrates zero-shot and few-shot classification of documents.

*Tags: Programming, NLP.*

## Research Papers

##### Memory of FSM strategies for iterated prisoner's dilemma (2019)

##### Reviving, reproducing and revisiting Axelrod's second tournament

##### Khovanov Homology of Symmetric Unions of 2-Bridge Knots (2011)

## Personal Projects

##### Project Titan Model Orchestrator (2022-)

Project Titan is a set of software I wrote to make sports predictions. It is a large project with web scraping, modeling, and architecture components. See the linked doc for many more details.

*Tags: Programming, Markov Chain, Workflow Manager, Regression, Design / Arch, PageRank, Stats, Data scraping, Agile development, Modeling, SQL / Database.*

##### Go Space (2021-)

Go Space is a program/model that attempts to embed similar tsumego close to each other. The ultimate goal is to use it as a study tool. As the user solves problems, we can explore and exploit to build a map of the parts of the go space the user has difficulty with. We can then serve problems from those difficult regions until the user improves.

The UI has not been built yet, but a V1 model has, and there is some evidence that it works well.

*Tags: CNN, Programming, Nearest Neighbor, Embeddings.*

##### StacksByStacks (2020)

This is a now-defunct website I made to track predictions for NHL games made by experts on the internet.

The front end was built with Angular and used a PHP handler to access the MySQL database, which was populated by a library of Python scrapers.

Note: The linked codebase is not very well documented. This was initially intended to be private.

*Tags: Data scraping, Agile development, PHP, SQL / Database, Design / Arch, Networking (CS), Bayes, Front-end.*

##### Cell-Link Table (2019)

For a project, I wanted to arrange data in a table-like object with these requirements: cells could depend on other cells, with the dependency given by any (user-provided) function, and updating a cell should also update its children. The data is also saved and loaded dynamically, so the entire table is never held in memory at once. The link contains a description of the project and the design decisions I made, along with a link to the code on GitHub.
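Below is a minimal in-memory sketch of the dependency-propagation idea. It is not the linked implementation. The class and method names are hypothetical, dependencies are assumed to be acyclic, and the dynamic save/load behavior is left out. When a cell changes, the sketch recomputes its descendants in topological order, so every derived cell sees its parents' new values.

```python
from typing import Any, Callable, Dict, List, Tuple


class CellTable:
    """Toy table whose cells may be derived from other cells by arbitrary functions."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self.formulas: Dict[str, Tuple[Callable[..., Any], List[str]]] = {}
        self.children: Dict[str, List[str]] = {}  # parent cell -> dependent cells

    def set_value(self, name: str, value: Any) -> None:
        """Set a plain cell and refresh everything downstream of it."""
        self.values[name] = value
        self._propagate(name)

    def set_formula(self, name: str, func: Callable[..., Any], parents: List[str]) -> None:
        """Define `name` as func(*parent values) and compute it immediately."""
        self.formulas[name] = (func, list(parents))
        for parent in parents:
            self.children.setdefault(parent, []).append(name)
        self._recompute(name)
        self._propagate(name)

    def _recompute(self, name: str) -> None:
        func, parents = self.formulas[name]
        self.values[name] = func(*(self.values[p] for p in parents))

    def _descendants_in_order(self, start: str) -> List[str]:
        # Reverse DFS post-order is a topological order of the descendants,
        # so each cell is recomputed only after all of its changed parents.
        visited, postorder = set(), []

        def visit(cell: str) -> None:
            for child in self.children.get(cell, []):
                if child not in visited:
                    visited.add(child)
                    visit(child)
                    postorder.append(child)

        visit(start)
        return postorder[::-1]

    def _propagate(self, name: str) -> None:
        for cell in self._descendants_in_order(name):
            self._recompute(cell)


table = CellTable()
table.set_value("a", 2)
table.set_value("b", 3)
table.set_formula("total", lambda x, y: x + y, ["a", "b"])
table.set_formula("double", lambda t: 2 * t, ["total"])
table.set_value("a", 10)
print(table.values["total"], table.values["double"])  # prints 13 26
```

Eager propagation keeps reads cheap. A lazy variant would instead mark descendants dirty and recompute on read, which fits better when cells are paged in and out of memory.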

*Tags: Algorithm, Design / Arch, Programming.*

##### r/Borrow (2016)

There's a subreddit where people ask for high-risk, high-yield, short-term loans, and various lenders fulfill them. A friend and I tried to make money this way by underwriting users based on their borrowing and user history. We scraped a ton of data from the site, built models on it, and set up a service to notify us of good risks to lend to.

*Tags: Modeling, Data scraping, Random Forest, Programming.*

##### Santa Kaggle Project (2015)

A friend and I signed up for a Kaggle competition about Santa's sleigh. The problem was to find Santa's shortest path under a weight restriction, essentially a travelling salesperson problem on top of a knapsack problem. We approached it with a modified k-means and a hybrid of standard TSP algorithms.

*Tags: Modeling, Clustering, TSP, Genetic Algorithm, Programming.*

##### March Madness (2013-2019)

Over the years, I've used March Madness as a playground for some of my ideas on how to model two teams who have never played each other but have shared opponents or opponents of opponents.

*Tags: Modeling, Data scraping, Monte Carlo, Genetic Algorithm, Quadratic Programming, Programming, Markov Chain.*

##### Modern Portfolio Theory with Costs (2013)

This was a small project that I worked on following a financial stats class I took in grad school. I attempted to account for transaction costs in a modern portfolio theory implementation.

*Tags: Modeling, Quadratic Programming, Time Series, Stats.*

##### Amazon scraper (2012)

I had a friend in grad school who looked for collectibles on eBay to buy and resell on Amazon at a higher price. Since fewer people shopped on eBay, there were sometimes price differences. Inspired by this, I got a list of movie UPCs and wrote a script that searched for them on both sites and notified me of discrepancies. For a while I made some money, but I lost it on a couple of purchases that weren't as advertised, so I abandoned the pursuit.

*Tags: Data scraping.*

##### Snakes on a Projective Plane (2010)

As an undergrad, I made a snake game for Windows and Linux where the snake is on a real projective plane. This was meant to be a little educational, but mostly for fun.

*Tags: Programming, Math.*

## Presentations, Blogs, and Writing

##### Linear Algebra for Those Who Know Linear Algebra (2021-)

A linear algebra book I’m working on, and will post online.

The book takes a novel approach to teaching linear algebra by centering on the singular value decomposition (SVD). This vantage point provides a deeper understanding of algorithms like PCA, quadratic programming, and regression, and helps the reader develop intuition for how to modify and apply these algorithms in real-world applications.

The book interweaves theory and application, beginning with SVD, from which PCA and collaborative filtering follow immediately. Regressions can be understood via projection matrices (matrices whose singular values are all 0 or 1) or via pseudo-inverses (which invert only the nonzero singular values). Quadratic programming can be understood via positive definite matrices (matrices with an SVD whose two isometries are inverses of each other and whose singular values are all positive). Recognizing that most applied linear algebra follows from a solid understanding of SVD makes these hard topics easier to understand.
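Here is a brief worked version of the regression claim. It uses standard results, not anything excerpted from the book. Take a design matrix $X \in \mathbb{R}^{n \times p}$ of rank $r$ and a response $y \in \mathbb{R}^n$, and write the compact SVD of $X$:

```latex
\begin{aligned}
X &= U \Sigma V^\top, \qquad U \in \mathbb{R}^{n \times r},\ V \in \mathbb{R}^{p \times r},\ U^\top U = V^\top V = I_r, \\
\Sigma &= \operatorname{diag}(\sigma_1, \dots, \sigma_r), \qquad \sigma_1 \ge \dots \ge \sigma_r > 0, \\
X^{+} &= V \Sigma^{-1} U^\top \quad \text{(invert only the nonzero singular values)}, \\
\hat{\beta} &= X^{+} y \quad \text{(minimum-norm least-squares solution)}, \\
\hat{y} &= X \hat{\beta} = U \Sigma V^\top V \Sigma^{-1} U^\top y = U U^\top y = H y, \\
H &= U U^\top, \qquad H^2 = H, \qquad H^\top = H.
\end{aligned}
```

Since $H = U I_r U^\top$, its singular values are $1$ (with multiplicity $r$) and $0$ (with multiplicity $n - r$). In this sense the fitted values are the orthogonal projection of $y$ onto the column space of $X$.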

After establishing a solid theoretical basis, the book dives into application. It discusses which model to use when, how to interpret model outputs, and what modeling choices are available for each algorithm. It grounds this discussion in concrete applications, like the Netflix Prize for collaborative filtering. Further, the book provides real-world exercises, such as, “You and a coworker are building an insurance model on credit data. The model will be a logistic regression to predict the probability of an insurance claim. You have historical claim data. You think you should use only PLS, and your coworker thinks you should only use PCA. How can you decide with data which model is better?”

*Tags: Math, PCA, SVD, Matrices, Collaborative Filtering, Quadratic Programming.*

##### Probability is in the Eye of the Beholder… Probably (2022)

We can talk about probability with a shared understanding, but what does it mean to say that the probability of rolling a 4 is 1/6? The exact meaning is a little difficult to pin down. In this blog, I discuss the formal interpretation of probability, why it matters, and why interpreting it is difficult.

*Tags: Math, Stats.*

##### Half Derivatives: An Operator Theory Perspective (2020)

##### Fun With Optimization (2018)

Presented at CMC3's 22nd Annual Recreational Mathematics Conference.

##### Abelian Categories (2012)

Presented as part of MSU's Graduate Student Colloquium Series.

##### Higher-Dimensional Pascal Simplices

Several times, I’ve given a general interest talk on higher-dimensional extensions to the Pascal triangle. There are no slides, but I can give this talk at any time if you’re interested.
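The talk has no slides. As a small illustration of the objects involved, based on standard combinatorics rather than material from the talk, here is a Python sketch. The entries of layer $n$ of the $d$-dimensional Pascal simplex are the multinomial coefficients $n!/(k_1! \cdots k_d!)$ with $k_1 + \dots + k_d = n$. Each entry equals the sum of the $d$ entries "above" it in layer $n - 1$. The function names are hypothetical.

```python
from itertools import product
from math import factorial


def pascal_simplex_layer(n, d):
    """Layer n of Pascal's simplex in d dimensions.

    Maps each tuple (k_1, ..., k_d) with k_1 + ... + k_d == n to the
    multinomial coefficient n! / (k_1! ... k_d!). d = 2 gives row n of
    Pascal's triangle; d = 3 gives layer n of Pascal's tetrahedron.
    """
    layer = {}
    for ks in product(range(n + 1), repeat=d):
        if sum(ks) == n:
            denom = 1
            for k in ks:
                denom *= factorial(k)
            layer[ks] = factorial(n) // denom
    return layer


def check_recurrence(n, d):
    """True if every entry of layer n is the sum of its parents in layer n - 1."""
    upper = pascal_simplex_layer(n - 1, d)
    for ks, value in pascal_simplex_layer(n, d).items():
        parents = [ks[:i] + (ks[i] - 1,) + ks[i + 1:] for i in range(d) if ks[i] > 0]
        if value != sum(upper[p] for p in parents):
            return False
    return True


print(pascal_simplex_layer(2, 3))
print(check_recurrence(4, 3))
```

For example, layer 2 of Pascal's tetrahedron has three corner entries equal to 1 and three edge entries equal to 2. By the multinomial theorem, layer $n$ in $d$ dimensions sums to $d^n$.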

*Tags: Math.*

##### Khovanov Homology of Selected Families of Symmetric Unions (2011)

Presentation of undergrad thesis.

## Miscellaneous

##### Actuarial Exams

I passed six exams in about two years and was one exam short of an ASA when I changed careers.

##### College Awards and Activities

As an undergraduate, I won first place at UNR in each of the Putnam exam, the Intermountain Mathematics Competition, and the university's Association for Computing Machinery programming competition. My team of three earned the Meritorious Winner designation in COMAP's international Mathematical Contest in Modeling. In grad school, I won an award for teaching.

I participated in a number of clubs and founded UNR's math club and go club.