Sr Machine Learning Engineer - Reddit (June 2021 - October 2022)
I worked on Reddit's User Understanding team, whose main task was to build features for use in models, primarily recommendation models. I created specific features and established patterns for aggregating content features to the user level and for creating user embeddings. This work covered both batch and streaming pipelines.
User embeddings: Built user embeddings from user history using collaborative filtering. Designed a pipeline to address the cold-start problem. Demonstrated predictive value in recommendation models.
Skills: Embeddings, Streaming, Workflow Manager, Collaborative Filtering, SVD
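As a rough illustration only (not the production code), collaborative-filtering embeddings of this kind can be sketched with a truncated SVD of a user-by-item interaction matrix. The toy matrix, the embedding dimension, and the function names `user_embeddings_svd` and `cosine` are all assumptions made up for this example.

```python
import numpy as np

def user_embeddings_svd(interactions, dim):
    """Return one `dim`-dimensional embedding per user (row) via truncated SVD."""
    # interactions: users x items matrix of implicit-feedback counts (hypothetical input)
    u, s, _vt = np.linalg.svd(interactions, full_matrices=False)
    # Keep the top `dim` left singular vectors, scaled by their singular values
    return u[:, :dim] * s[:dim]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy data: users 0 and 1 engage with the same items, user 2 with different ones
interactions = np.array([
    [5.0, 3.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 5.0],
])
emb = user_embeddings_svd(interactions, dim=2)

# Users with overlapping history should be closer than users without
sim_same = cosine(emb[0], emb[1])
sim_diff = cosine(emb[0], emb[2])
```

In this toy case, users 0 and 1 end up with nearly parallel embeddings while user 2 is close to orthogonal to both. That is the property a downstream recommendation model would rely on.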
User interests: Aggregates content labels to the user level. The project included filtering NSFW content, grouping labels, and applying decay. I designed and implemented the feature: the batch version ran as BigQuery scripts scheduled by Airflow, and the streaming version was built with Flink. I also built a User-to-Subreddit mapping using Annoy approximate nearest neighbors.
Skills: Streaming, Nearest Neighbor, SQL / Database, Clustering, Workflow Manager, Design / Arch
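A minimal sketch of the aggregation idea, assuming exponential time decay with a half-life and a simple blocklist for filtered labels. The actual decay function, label taxonomy, and event schema are not described here, so `user_interest_scores`, `half_life_days`, and the sample events are hypothetical.

```python
from collections import defaultdict

def user_interest_scores(events, now_day, half_life_days=7.0,
                         blocked=frozenset({"nsfw"})):
    """Aggregate (label, day) events for one user into decayed label scores."""
    scores = defaultdict(float)
    for label, day in events:
        if label in blocked:
            continue  # drop filtered labels before aggregating
        age = now_day - day
        # Each event's weight halves every `half_life_days` (assumed decay form)
        scores[label] += 0.5 ** (age / half_life_days)
    return dict(scores)

# Hypothetical history for one user: (content label, day of interaction)
events = [("gaming", 10), ("gaming", 3), ("cooking", 10), ("nsfw", 9)]
scores = user_interest_scores(events, now_day=10)
```

With a 7-day half-life, the week-old "gaming" event contributes half as much as the one from today, and the blocked label never reaches the output.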
Some substantial projects may be excluded due to proprietary information.