Thomas Pequegnot
Data Science & Computer Science Student @ NYU
📍 New York, NY | 📞 914-548-3920 | ✉️ tp2191@nyu.edu
🔗 LinkedIn
🎓 Education
New York University
Bachelor of Arts in Data Science and Computer Science
Minor: Mathematics
Expected Graduation: May 2026
Relevant Coursework: Machine Learning, Data Structures and Algorithms, Natural Language Processing, Causal Inference, Parallel Computing, Computer Systems, Deep Learning, Mathematics of Finance
💼 Experience
Nomura Securities — Software Developer Intern
Jun 2024 – Aug 2025
- Designed a clustering-based reflection strategy within a data lakehouse architecture, improving query execution times by 200%.
- Implemented unsupervised learning algorithms such as DBSCAN, KMeans, and Agglomerative clustering to group SQL query metadata and generate 7 optimized Apache Iceberg reflections, improving query performance and resource efficiency.
- Built an automated pipeline with a dedicated stage for difference-in-differences (DID) testing using Dremio’s workload tool.
- Deployed containerized machine learning pipelines using Kubernetes and Helm charts, enabling scalable orchestration.
Biokind Analytics — Director of Data Science
Feb 2024 – Present
- Led a team of 5 data scientists in a high-impact project for Alliance NYC, developing scalable ETL workflows to support multi-level automation, streamlining data ingestion and transformation across nested program layers for consistent downstream analytics.
- Built a data-driven report for the Damon Runyon Cancer Research Foundation assessing the effectiveness of its cancer research funding, using the Scopus API and pandas.
- Developed over 10 data visualizations using Seaborn to illustrate the demographic breakdown of Damon Runyon’s applicants.
NYU Motorsports — Lead Operations Product Manager
Sep 2023 – May 2025
- Executed A/B testing to evaluate the effectiveness of various marketing strategies, achieving a historic $50,000+ in fundraising.
- Personally secured over 30 partnerships with firms including Perplexity, Monster Energy, and Sigmatex, totaling over $30,000.
- Designed over 10 dashboards with Power BI, providing valuable insights into team demographics and budget allocation.
GBCS Group — Research Intern
May 2023 – Aug 2023
- Evaluated industry trends, best practices, and cutting-edge technologies to inform data-driven fleet management strategies.
- Researched sources of GHG emissions in aviation and land sectors across entire life cycles, covering over 200 diverse assets.
- Conducted thorough market research and analysis on over 50 prospective GSE assets and wrote detailed reports for each asset.
🧪 Projects
Real-Time Market Data Forecasting
Dec 2024 – Jan 2025
- Designed an object-oriented Market Predictor class encapsulating feature engineering, model training, and prediction to support efficient, reusable machine learning pipelines analyzing over 4 million data entries.
- Employed a LightGBM regressor, incorporating feature scaling and feature importance analysis, with optimized hyperparameters using RandomizedSearchCV, resulting in a 70% improvement in model accuracy over a baseline model.
- Implemented techniques such as moving averages, lag feature creation, and handling missing values to enhance performance.
- Designed and implemented a visual analysis framework, including plots for correlation matrices, feature group sizes, feature correlations with the target, and feature importance, enabling intuitive exploration and understanding of feature relationships.
Professor Ratings Analysis
Nov 2024 – Dec 2024
- Performed various statistical tests including t-tests, Mann-Whitney U tests, and Levene’s tests to analyze gender bias, teaching experience, and the impact of online classes on professor ratings.
- Built a logistic regression model to predict the likelihood of a professor receiving a “pepper,” addressing class imbalance through techniques such as undersampling and resampling. Evaluated model performance using the ROC curve and AUC.
- Developed a Ridge regression model to predict average professor rating, achieving an R² of 0.8 and an RMSE of 0.372.
FIFA Player Performance Predictions
Apr 2023
- Strategically engineered relevant features to optimize model input, significantly boosting predictive accuracy for player ranking.
- Applied KMeans clustering to categorize players into 5 distinct subgroups based on attacking and defending performance metrics.
- Developed a K-Nearest Neighbors model to predict players’ preferred foot based on their playstyle, achieving 72% accuracy.
- Visualized clustering and model outcomes to identify key patterns in player performance, highlighting critical distinctions.
🛠️ Skills
Languages: Python, Java, SQL, C, C++
Tools: Git, AWS, PyTorch, Linux, CUDA, MPI, OpenMP, Jira, Confluence, Excel
Frameworks: Apache Spark, Iceberg, Arrow
Last updated: August 2025