K. Merrick Olivier
As a passionate data scientist, I focus on building and optimizing machine learning models to gain valuable insights from raw data. Recent projects have involved building neural networks for image classification and natural language processing, as well as constructing various stacked and blended models.
Created by engineers from tech company Yandex, Practicum is a 30-46 week, online coding bootcamp offering tracks in Data Science, Data Analysis, and Web Development and aimed at sparking career change.
Prudential Associates is a cybersecurity and digital forensics company, offering a wide range of services within the top-level categories of cybersecurity, digital forensics, investigations, and network security.
The University of Aberdeen is one of Scotland’s ancient universities (founded in 1495), and is ranked 158th in the World University Rankings 2022.
The AU Philosophy Society aims to facilitate discussion and thought in an informal setting. They host weekly discussion nights on a variety of philosophical topics.
Nexus Student Association is a study association founded in 2009, dedicated to students studying the International and European Law LLB programme at the University of Groningen.
2021-2021 Machine Learning / Data Scientist Professional Certification
2021-2021 Tutor Academy Master Course
2013-2014 Masters in Philosophy (Epistemology)
2008-2012 Bachelors in Philosophy
Extracurricular Activities
The aim of this project was to develop a program that can provide meter readings from photos of analog meters. I was provided with 1244 meter images with readings and accompanying masks of the reading portion of each meter. To extract readings, I first built and trained a UNet model to supply masks outlining the area of each meter that contains the reading. I then cropped the images using the masks and manually annotated 200+ meter readings, creating bounding boxes around each digit and labeling them. Next, I built and trained a Faster RCNN model to detect digits in images. While the predictions from this model did not provide the detected digits in order, I was able to organize them based on the x-coordinates of each digit’s bounding box. The Faster RCNN model had a mAP score of around 90, which was better than expected for training on around 200 images. Finally, I created a meter-reading app and dockerized it along with its dependencies.
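The ordering step described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the detection format (a digit label paired with an `(x_min, y_min, x_max, y_max)` box) and the function name `read_digits_left_to_right` are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical detection format: (digit_label, (x_min, y_min, x_max, y_max)).
Box = Tuple[float, float, float, float]
Detection = Tuple[int, Box]


def read_digits_left_to_right(detections: List[Detection]) -> str:
    """Order detected digits by the left edge of their bounding boxes."""
    # Sorting on x_min restores reading order when the detector
    # returns digits in arbitrary (e.g. confidence-ranked) order.
    ordered = sorted(detections, key=lambda det: det[1][0])
    return "".join(str(label) for label, _ in ordered)


# Made-up detections, listed out of order on purpose.
example = [
    (7, (120.0, 10.0, 150.0, 60.0)),
    (0, (15.0, 12.0, 45.0, 61.0)),
    (3, (68.0, 11.0, 98.0, 60.0)),
]
print(read_digits_left_to_right(example))  # 037
```

Sorting on the left edge assumes a single horizontal row of digits; a reading spread over several rows would need an additional grouping step.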
Designed and implemented a custom stacking model, using gradient boosting base models, to predict customer churn, with a target AUC-ROC score of at least 0.88. The stacked model's final score was 0.963.
Created a notebook to demonstrate to colleagues how model stacking and hyperparameter tuning using Bayesian optimization are performed.
Created a notebook to demonstrate to colleagues how model blending is performed. Notebook includes exploratory data analysis, missing value imputation, and custom functions to create and test a blended model.
Created a custom ResNet50 deep learning model using Keras to predict the age of customers from images. Employed an image data generator to periodically load batches of images.
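As a rough illustration of blending, the sketch below combines the predicted probabilities of several models with a weighted average. This is only one simple form of blending and is not necessarily the approach used in the notebook; the function name `blend_predictions` and the sample numbers are assumptions.

```python
from typing import List, Sequence


def blend_predictions(
    model_preds: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Weighted average of per-model predictions for each sample."""
    if len(model_preds) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_samples = len(model_preds[0])
    # For each sample, average the models' predictions by weight.
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_preds)) / total
        for i in range(n_samples)
    ]


# Made-up probabilities from two hypothetical models on two samples.
preds_a = [0.2, 0.8]
preds_b = [0.4, 0.6]
blended = blend_predictions([preds_a, preds_b], weights=[1.0, 1.0])
```

In practice the weights would usually be chosen on a held-out validation set rather than fixed by hand.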
Trained and tested various models using NLTK, spaCy, TF-IDF, and BERT to determine whether IMDB movie reviews were positive or negative.
Conducted in-depth time-series analysis with the goal of developing a regression model to predict hourly taxi orders. This involved stationarization using differencing, feature engineering, and developing/testing several regression models.
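Differencing, the stationarization step mentioned above, replaces each value with its change from an earlier value. The following is a minimal sketch using plain Python lists; the function name `difference` and the sample series are illustrative assumptions, not data from the project.

```python
from typing import List, Sequence


def difference(series: Sequence[float], lag: int = 1) -> List[float]:
    """Return series[t] - series[t - lag] for every t where both exist."""
    if lag < 1:
        raise ValueError("lag must be a positive integer")
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# A made-up series whose increments grow steadily.
orders = [10, 12, 15, 19, 24]
first_diff = difference(orders)       # [2, 3, 4, 5]
second_diff = difference(first_diff)  # [1, 1, 1]
```

Here one round of differencing removes the level and a second round removes the remaining linear growth, leaving a constant series. A lag equal to the seasonal period (24 for a daily cycle in hourly data) would remove seasonality in the same way.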
Carried out extensive data preprocessing and exploratory data analysis to uncover trends in worldwide game sales that may be used to create lucrative targeted advertisements.
Trained and evaluated several regression models to predict the amount of gold recovered via a complex extraction process from raw gold ore. This required extensive exploratory data analysis and data preprocessing, which included imputing nulls using the KNN method.
Published (as single author in a top-rated philosophy journal) an in-depth research paper on acquiring inferential knowledge from non-knowledge. This possibility directly challenges the orthodox view that inferential knowledge can only be acquired from known premises.
Developed and tested various machine learning classification models to predict which plan a cellular service user should be placed on.
Project involved grouping similar customers using unsupervised learning, predicting whether a new customer is likely to receive an insurance benefit, predicting the number of insurance benefits a new customer is likely to receive, and masking clients' personal data using matrix transformation.
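One way to mask data with a matrix transformation is to multiply the feature matrix by an invertible matrix, so that the original values can only be recovered with that matrix's inverse. The sketch below assumes this interpretation and uses a tiny 2x2 example; the helper names, the feature values, and the key matrix are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def invert_2x2(m: Matrix) -> Matrix:
    """Invert a 2x2 matrix; raise if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("the key matrix must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


# Made-up client features (two columns) and a hypothetical invertible key.
features = [[41.0, 49600.0], [46.0, 38000.0]]
key = [[2.0, 1.0], [1.0, 1.0]]

masked = matmul(features, key)               # obfuscated values
recovered = matmul(masked, invert_2x2(key))  # original values restored
```

Because the transformation is linear and invertible, a linear regression fitted on the masked features can produce the same predictions as one fitted on the originals, which is what makes this kind of masking compatible with modeling.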
The goal of this project was to build a model for predicting the volume of oil well reserves in various regions and, on the basis of the results, determine which region offers the highest potential for profit.
Trained and tested various gradient-boosting algorithms to develop a model that predicts vehicle values with a low RMSE. The dataset required substantial preprocessing.
This certification demonstrates proficiency in working with Power BI, which includes designing and building scalable data models, cleaning and transforming data, and enabling advanced analytic capabilities that provide meaningful business value.
This certification demonstrates proficiency in developing and maintaining Power Apps applications and related components.
This certification demonstrates foundational knowledge of core data concepts and how they’re implemented using Azure data services.
This certification demonstrates knowledge of Microsoft Security, compliance, and identity (SCI) solutions.
This certification demonstrates foundational knowledge of machine learning and AI concepts, along with related Azure services.
This certification demonstrates proficiency in the business value and product capabilities of the Microsoft Power Platform.
Recommendation letter from the Practicum by Yandex administrators for the diligence with which I completed my degree and for my role as a Senior Student.
This is the first course of the Deep Learning Specialization. The course involves building deep neural networks by hand, through the creation of functions for forward- and back-propagation.
Explain what a GPU is, how it can speed up computation, and its advantages in comparison with CPUs. Implement deep learning networks on GPUs. Train and deploy deep learning networks for image and video classification as well as for object recognition.
Learn how to handle missing values, non-numeric values, data leakage, and other preprocessing necessities. Also learn how to build pipelines, perform cross-validation, and use gradient-boosting algorithms.
Learn the core ideas in machine learning. This course is designed to train individuals with limited background in machine learning to develop their first models and perform model validation.
Learn how to use Python for data science. Topics covered include building functions, data structures, loops and list comprehension, working with strings and dictionaries, and working with external libraries.
This master course was designed specifically for tutors and senior students at Yandex. The course covers all topics pertinent to providing students with the best learning environment and experience possible, with an emphasis on interacting with students in a supportive and productive manner.
Understand why version control is a fundamental tool for coding and collaboration. Use and interact with GitHub. Install and run Git on your local machine. Collaborate with others through remote repositories.
This course introduces the basics of Python 3, including conditional execution and iteration as control structures, and strings and lists as data structures.
This course introduces classes, instances, and inheritance. It teaches students how to use classes to represent data in concise and natural ways, as well as how to override built-in methods and how to create “inherited” classes that reuse functionality.
This capstone of the Python programming specialization requires students to build a series of applications to retrieve, process, and visualize data using Python.
Install Python and write your first program. Use variables to store, retrieve and calculate information. Describe the basics of the Python programming language. Utilize core programming tools such as functions and loops.
Explain the principles of data structures & how they are used. Store data as key/value pairs using Python dictionaries. Create programs that are able to read and write data from files. Accomplish multi-step tasks like sorting or looping using tuples.
Learn to program and analyze data with Python. Develop programs to gather, clean, analyze, and visualize data. Use variables to store, retrieve and calculate information while utilizing core programming tools such as functions and loops.
This course was designed to give students a primer in the fundamentals of SQL and working with data so that they can begin analyzing it for data science purposes. Course includes topics on filtering, sorting, and calculating data with SQL; subqueries and joins in SQL; and modifying and analyzing data with SQL.
This course introduces students to the basics of SQL, as well as basic database design for storing data as part of a multi-step data gathering, analysis, and processing effort. The course uses SQLite3 as its database. It also trains students to build web crawlers and multi-step data gathering and visualization processes.
This course demonstrates how one can treat the Internet as a source of data. Students are trained to scrape, parse, and read web data, as well as access data using web APIs. Students work with HTML, XML, and JSON data formats in Python.
The Professional Scrum Master I (PSM I) certification demonstrates a fundamental level of Scrum mastery. PSM I certificate holders prove that they understand Scrum as described in the Scrum Guide and how to apply Scrum in Scrum Teams.