Insights With Patalee

Read more about my research

Unpacking Consumer Choices: Socio-Economic and Demographic Drivers of Hemp Cereal Purchasers

View Presentation Here

Predicting International Trade Participation: A Multifaceted Analysis Using Classification Models

Abstract:

This study predicts whether a country engages in international trade using several classification models. The predictor variables are common language, legal system similarity, shared religion, border proximity, trading distance, free trade agreements, colonial ties, currency union, World Trade Organization (WTO) membership, island status, and landlocked status. Random Forest (RF) and Boosting serve as the primary models, supplemented by Linear Discriminant Analysis (LDA), Naïve Bayes, Logistic Regression, and a Generalized Additive Model (GAM). RF achieves the lowest testing error, followed by Boosting. Variable importance analysis identifies shared religion, WTO membership, and distance as influential predictors in RF, while Boosting emphasizes distance, shared religion, and WTO membership. Cross-validation confirms this ranking: RF has lower testing error than the other models, and Boosting performs well despite a slightly higher testing error. In conclusion, RF is the most effective model for predicting international trade participation in this study, combining high accuracy with good generalization. Boosting also performs strongly and could improve further with fine-tuning.
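The comparison of testing errors by cross-validation can be sketched in a few lines. This is a minimal illustration only, not the study's code: the data are synthetic, the feature names and coefficients in `make_toy_pairs` are hypothetical, and two deliberately simple stand-in classifiers (a majority-class baseline and a nearest-centroid rule) take the place of RF, Boosting, and the other models named in the abstract.

```python
import random

def k_fold_error(fit, X, y, k=5, seed=0):
    """Average misclassification rate of `fit` over k held-out folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(X[i]) != y[i] for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / k

def fit_majority(X, y):
    """Baseline stand-in: always predict the most common training label."""
    label = max(set(y), key=y.count)
    return lambda x: label

def fit_nearest_centroid(X, y):
    """Stand-in classifier: predict the class whose training mean is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
    return predict

def make_toy_pairs(n=400, seed=1):
    """Synthetic rows [distance, shared_religion, wto_member] -> trades (0/1).

    The coefficients below are invented for illustration only.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        distance = rng.uniform(0, 1)
        religion = rng.randint(0, 1)
        wto = rng.randint(0, 1)
        score = 0.6 * religion + 0.6 * wto - 1.2 * distance + rng.gauss(0, 0.3)
        X.append([distance, religion, wto])
        y.append(1 if score > 0 else 0)
    return X, y

X, y = make_toy_pairs()
for name, fit in [("majority", fit_majority), ("nearest centroid", fit_nearest_centroid)]:
    print(f"{name}: cross-validated error = {k_fold_error(fit, X, y):.3f}")
```

Any model that exposes the same `fit(X, y) -> predict` shape could be passed to `k_fold_error`, which is what makes a like-for-like comparison of testing error possible.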

Disclaimer: The analysis presented in this document is based on publicly available datasets, which have been previously published and are cited accordingly. The research applies different analytical techniques to address distinct research questions. This work is intended for demonstration and discussion and is not part of a formal publication.

Please Click Here for the Full Paper
GitHub Repository URL

Estimating Mean and Variance Functions of a Random Variable Dependent on Two Independent Variables

Abstract:

The mean and variance of a random variable play pivotal roles in statistical analysis and predictive modeling, offering insights into central tendency and data dispersion. In this study, we develop predictive models for estimating the mean and variance of a random variable Y that depends on two independent variables, X1 and X2. Using 200 realizations of Y across various (X1, X2) pairs, we apply data mining and machine learning techniques to predict the mean and variance as functions of X1 and X2. The methodology employs a range of non-linear predictive models: Generalized Additive Models (GAM), Support Vector Machines (SVM), Local Smoothing (LOESS), K-Nearest Neighbors (KNN), Neural Networks, Random Forest (RF), and Boosting. Model parameters are tuned via cross-validation and grid search, with KNN emerging as the top-performing model. Validation results show that KNN predicts both the mean and the variance more accurately than the other models, as evidenced by lower RMSE, MAE, and RSS values. In conclusion, the KNN model offers the most accurate estimates of the mean and variance functions among the models compared.
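The KNN approach described above can be sketched as follows, under one plausible reading of the setup: each (X1, X2) pair comes with repeated draws of Y, the draws are collapsed to a sample mean and sample variance per pair, and a new point is predicted by averaging those summaries over its nearest neighbors. The data-generating process in `make_toy_data`, the 11 by 11 grid, and the choice `k=4` are assumptions made for illustration and are not taken from the paper.

```python
import math
import random
import statistics

def summarize(replicates):
    """Collapse {(x1, x2): [y, ...]} into {(x1, x2): (sample mean, sample variance)}."""
    return {
        point: (statistics.mean(ys), statistics.variance(ys))
        for point, ys in replicates.items()
    }

def knn_predict(summary, query, k=4):
    """Average the mean and variance targets of the k grid points nearest to `query`."""
    nearest = sorted(summary, key=lambda point: math.dist(point, query))[:k]
    mean_hat = statistics.mean(summary[p][0] for p in nearest)
    var_hat = statistics.mean(summary[p][1] for p in nearest)
    return mean_hat, var_hat

def make_toy_data(n_rep=200, seed=0):
    """Hypothetical process: Y ~ Normal(mean = x1 + x2, sd = 1 + x1) on a grid."""
    rng = random.Random(seed)
    data = {}
    for i in range(11):
        for j in range(11):
            x1, x2 = i / 10, j / 10
            data[(x1, x2)] = [rng.gauss(x1 + x2, 1 + x1) for _ in range(n_rep)]
    return data

summary = summarize(make_toy_data())
query = (0.55, 0.45)
mean_hat, var_hat = knn_predict(summary, query)
true_mean = query[0] + query[1]
true_var = (1 + query[0]) ** 2
print(f"mean: estimated {mean_hat:.3f}, true {true_mean:.3f}")
print(f"variance: estimated {var_hat:.3f}, true {true_var:.3f}")
```

In practice `k` would be chosen by cross-validation, as the abstract describes, rather than fixed in advance.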

Please Click Here for the Full Paper
GitHub Repository URL

Chasing Votes: A Markov Chain Monte Carlo Forecast for 2024 US Presidential Election

Abstract:

Markov chain Monte Carlo (MCMC) methods are essential tools for solving many modern statistical and computational problems (Calderhead, 2014). The name “Monte Carlo” originally referred to the casino at Monte Carlo, but it soon became a technical term for the simulation of random processes (Geyer, 2011). A Markov chain is a stochastic process in which random variables transition between states according to specific probabilistic rules. The process satisfies the Markov property: the distribution of the next state depends only on the current state, not on the states that came before it. This makes Markov chains a powerful tool for modeling sequences in which each step depends on the past only through the most recent state (Lateef, 2019). This paper explains MCMC in detail and provides an exploratory analysis that uses it to simulate the 2024 US Presidential Election.
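To make the idea concrete, here is a minimal random-walk Metropolis sampler, one of the simplest MCMC algorithms. It is a sketch under stated assumptions and not the paper's forecasting model: the poll figures (520 of 1000 respondents favoring a candidate), the flat prior, and the step size are all hypothetical. Each new draw is proposed from the current one alone, which is the Markov property in action.

```python
import math
import random

def log_posterior(p, yes, n):
    """Log of the binomial likelihood times a flat prior on (0, 1), up to a constant."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return yes * math.log(p) + (n - yes) * math.log(1.0 - p)

def metropolis(yes, n, steps=20000, step_sd=0.02, start=0.5, seed=0):
    """Random-walk Metropolis chain targeting the posterior of the support share p."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, yes, n)
    chain = []
    for _ in range(steps):
        # The proposal depends only on the current state.
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_posterior(proposal, yes, n)
        # Symmetric proposal: accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
        chain.append(current)
    return chain

# Hypothetical poll: 520 of 1000 respondents favor the candidate.
chain = metropolis(yes=520, n=1000)
kept = chain[2000:]  # discard burn-in draws
post_mean = sum(kept) / len(kept)
prob_lead = sum(p > 0.5 for p in kept) / len(kept)
print(f"posterior mean support: {post_mean:.3f}")
print(f"estimated P(support > 50%): {prob_lead:.3f}")
```

With a flat prior the exact posterior here is Beta(521, 481), so the sampler's output can be checked against a known answer before the same machinery is applied to a model without a closed form.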

Please Click Here for the Full Paper
GitHub Repository URL