Modeling Default Probability in SMEs

STATS 243 Project

Small and medium-sized enterprises (SMEs) play an important role in the economy of many countries, and the new Basel Capital Accord places considerable attention on them. The financial variables in SME datasets can be converted to financial ratios, which have been used to analyze and predict firm bankruptcies. Altman (2001) developed a Z-score that is useful in predicting firm bankruptcies (a low score indicates a high probability of failure). The predictive model was based on a firm's working capital to assets, retained earnings to assets, EBIT to assets, market to book value of a share of stock, and revenues to assets.
In this project, I take an alternative approach. The goal is to find the relevant variables and predict the failure risk of each individual company in the testing set using several models, such as logistic regression and the dynamic empirical Bayes (EB) approach via GLMM discussed in Lai, Su and Sun (2014). I develop a specific model to estimate the one-year SME probability of default, illustrate the steps of my analysis, and compare the results obtained using different statistical instruments.


Dataset

The SME dataset contains n = 46595 records (after omitting observations with NA values) for 1226 companies, observed quarterly from 1994/01/31 to 2014/11/30. This covers about 20 years, or 84 periods per company if the company remains active throughout.
I use the first 80 periods (t = 1, …, 80) as the training set and the last 4 periods (1 year) as the testing set, because interest rates are generally adjusted once per year. In addition, k-step-ahead predictions become less accurate as k grows, so k should be kept small.
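
The split by period can be sketched as follows. This is only a minimal illustration, assuming each record is a dict with the hypothetical keys `company` and `period` (the period index t = 1, …, 84). The field names and the toy records are assumptions for the example, not part of the original SME dataset.

```python
TRAIN_END = 80   # last period in the training set (from the text)
N_PERIODS = 84   # total number of quarterly periods (from the text)

def split_by_period(records, train_end=TRAIN_END):
    """Split longitudinal records into training and testing sets by period."""
    train = [r for r in records if r["period"] <= train_end]
    test = [r for r in records if r["period"] > train_end]
    return train, test

# Toy records with hypothetical field names; not real SME data.
# Company "A" is active for all 84 periods, company "B" leaves after period 82.
toy_records = [
    {"company": "A", "period": t} for t in range(1, N_PERIODS + 1)
] + [
    {"company": "B", "period": t} for t in range(1, 83)
]

train_set, test_set = split_by_period(toy_records)
```

Splitting on the period index rather than at random keeps the testing set strictly in the future of the training set, which matches the k-step-ahead forecasting setup.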

Features and Preprocessing

The variables in the SME dataset are listed in Table 1, which includes 19 financial variables. I first summarize the SME data and check the quantiles to get a first intuitive understanding. To further process the SME dataset into longitudinal and cross-sectional form, I convert the dates to periods t = 1, 2, …, 84 for each company.
Table 2 (quantiles of the SME variables) shows that the financial status of the 1226 companies over the 20 years is complex.
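
The two preprocessing steps (date to period index, and a quantile summary) might look like the sketch below. It assumes that periods are calendar quarters counted from 1994/01, which is one mapping consistent with the 84 periods stated above but is not confirmed by the original analysis. The function names and the toy values are hypothetical.

```python
from datetime import date
from statistics import quantiles

START_YEAR, START_MONTH = 1994, 1  # first observation month (from the text)

def period_index(d):
    """Map a date to a quarterly period index, with 1994/01 as period 1.

    Assumption: every block of three months since 1994/01 is one period.
    """
    months = (d.year - START_YEAR) * 12 + (d.month - START_MONTH)
    return months // 3 + 1

def quartiles(values):
    """Return the 25%, 50% and 75% quantiles of a list of numbers."""
    return quantiles(values, n=4, method="inclusive")

first_period = period_index(date(1994, 1, 31))
last_period = period_index(date(2014, 11, 30))

# Toy values for a hypothetical financial variable; not real SME data.
toy_values = [1.0, 2.0, 3.0, 4.0, 5.0]
q1, q2, q3 = quartiles(toy_values)
```

Under this mapping the first and last dates in the sample fall in periods 1 and 84.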

Logistic Regression Model

\[
P(Y = 1 \mid \mathbf{x})
= \frac{\pi \exp\{-d_1(\mathbf{x})/2\}}
{\pi \exp\{-d_1(\mathbf{x})/2\} + (1-\pi)\exp\{-d_0(\mathbf{x})/2\}}
= \frac{1}{1+e^{-s(\mathbf{x})}}
\]
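
The equality between the ratio form and the logistic form can be checked numerically. Dividing the numerator and denominator of the ratio by the numerator gives s(x) = log(π/(1−π)) + (d_0(x) − d_1(x))/2. The sketch below treats π, d_1 and d_0 as given numbers. The function names and the input values are assumptions chosen for illustration, not estimates from the SME data.

```python
import math

def posterior_from_distances(pi, d1, d0):
    """Posterior probability of class 1 from the ratio form of the equation."""
    num = pi * math.exp(-d1 / 2)
    den = num + (1 - pi) * math.exp(-d0 / 2)
    return num / den

def score(pi, d1, d0):
    """s(x) = log(pi / (1 - pi)) + (d0 - d1) / 2."""
    return math.log(pi / (1 - pi)) + (d0 - d1) / 2

def sigmoid(s):
    """Logistic function 1 / (1 + exp(-s))."""
    return 1.0 / (1.0 + math.exp(-s))

# Arbitrary illustrative inputs (assumptions, not fitted values).
pi, d1, d0 = 0.1, 2.5, 4.0
p_ratio = posterior_from_distances(pi, d1, d0)
p_logistic = sigmoid(score(pi, d1, d0))

# The two forms agree up to floating-point rounding.
print(abs(p_ratio - p_logistic) < 1e-12)
```

Writing the posterior in the logistic form is what motivates modeling the default probability directly through a score s(x), as logistic regression does.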


Reference

  1. Tze Leung Lai and Haipeng Xing (2008). Statistical Models and Methods for Financial Markets.

  2. Tze Leung Lai and Haipeng Xing. Active Risk Management: Financial Models and Statistical Methods.

  3. Tze Leung Lai, Yong Su and Kevin Haoyu Sun (2014). Dynamic Empirical Bayes Models and Their Applications to Longitudinal Data Analysis and Prediction.

  4. Tze Leung Lai, Yong Su and Zhiyu Wang. Evaluation of Econometric Forecasts with Applications to Default Prediction in Small and Medium Sized Enterprises.

  5. Edward I. Altman and Gabriele Sabato. Modeling Credit Risk for SMEs: Evidence from the US Market.

  6. Jurg Schelldorfer, Lukas Meier and Peter Buhlmann. GLMMLasso: An Algorithm for High-Dimensional Generalized Linear Mixed Models Using $l_1$-Penalization.

Author: Eva W.
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated.