Using alternative data to build credit models
Posted: Sun Dec 22, 2024 5:42 am
Today we are presenting our first “use case”. In this series of posts, which we will expand over time, we intend to show the market how to make the best possible use of the data and products we provide to make better decisions, optimize processes and reduce costs.
In this first post, we will explore the challenge of credit modeling, in particular modeling with so-called “alternative data” (which we have talked about before). Throughout this series, we will follow a similar structure. First, we will talk about the objective: what the use case is, what problem is being solved, and why it is relevant to the market. Then we will go into the application details, that is, how you use BigDataCorp services to solve the problem. Finally, we will explore some results that our customers have obtained in practice, as well as the challenges they faced, which can help in future implementations.
So let's get down to business!
Objective
The big challenge that companies that are building credit models face – besides, of course, building the model itself – is how to create something that is truly unique to their business, something that can significantly surpass the off-the-shelf models offered by traditional credit bureaus. This is not a simple problem to solve. The bureaus have extensive experience in what they do and a significant history of information, so surpassing the models they offer is indeed difficult.
To address this problem, we offer our clients dozens of attributes that correlate with risk behavior, payment capacity, and access to credit for individuals and companies. What sets these attributes apart is that all of them are constructed from alternative data, such as the presence and participation of individuals and companies in the digital economy, the relationship networks of different entities, or even behavior inferred from advertisements and comments that a person posts on the Internet.
The data we deliver can be used independently, to build a model based solely on it, or integrated into existing models, with attributes coming from other sources or even with the client's own internal data.
Application
So how does building a credit model with this alternative data work? The first step, as in any modeling process, is to find your “response variable,” the outcome that your model is meant to predict. In the case of credit models, this response variable is usually default after 3, 6, or 9 months, that is, whether the customer is still paying the loan after this period. If you do not have this response variable in-house – because your company is still starting out, or because you do not have enough volume of operations for a representative sample – the recommendation is that you do not try to build your own model, but use an off-the-shelf model instead. We offer a few different models, built by partners with our alternative data and with an excellent cost-benefit ratio.
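As a minimal sketch of how such a response variable could be derived from in-house loan records, the Python function below labels a loan by whether a missed payment occurred within the chosen horizon. The definition of default used here (first missed payment on or before the horizon end), the function names, and the day-clamping shortcut are all assumptions made for illustration; your own default definition may differ.

```python
from datetime import date


def add_months(d, months):
    """Return the date `months` calendar months after `d`.

    The day is clamped to 28 so the result is always a valid date
    (a simplification made for this sketch).
    """
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, 28))


def default_flag(granted_on, first_missed_payment, horizon_months, observed_until):
    """Label one loan for a given horizon (for example 3, 6, or 9 months).

    Returns 1 if a payment was missed within the horizon, 0 if the customer
    was still paying at the end of it, and None if the loan is too recent
    to have been observed for the full horizon.
    """
    horizon_end = add_months(granted_on, horizon_months)
    if first_missed_payment is not None and first_missed_payment <= horizon_end:
        return 1
    if observed_until < horizon_end:
        return None  # not enough history yet: leave this loan out of the sample
    return 0
```

Loans labeled None have not matured yet, which is one reason a young portfolio may not produce a representative sample.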
Once you have found the response variable, you need to set aside a sample to train your model and collect the data that you will use as input attributes for your modeling process. If you have never worked with sampling, a general rule of thumb is to hold out about 15% of the records for validation and use the rest for training. As far as the data is concerned, you don't need to know in advance which attributes will be relevant. Most modern modeling tools offer mechanisms for feature selection, so focus on getting as much input data as possible and let the tool filter out the attributes that are not relevant.
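The sampling step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a plain random split, with the share of records assigned to training left as a parameter you choose. The function name is hypothetical, and a real project may prefer a split by time period or one stratified by the response variable.

```python
import random


def split_sample(records, train_fraction, seed=42):
    """Randomly split records into a training part and a validation part.

    `train_fraction` is the share of records used for training; the
    remainder is kept for validation. A fixed seed keeps the split
    reproducible between runs.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Keeping the validation records completely out of training and feature selection is what makes the later performance measurement trustworthy.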
This is where BigDataCorp comes in: collecting the data to build the model. After receiving your sample, we enrich it and return the attributes to feed your modeling tool. For this sample, it is important that you include, in addition to the person or company identifier, the reference date of the operation, that is, the date on which you analyzed that proposal. With this, we can return the data as it was on that date. This is essential, because today's information may be very different from what it was in the past, and a model trained on it would produce completely different results.
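To see why the reference date matters, consider a generic point-in-time lookup. The sketch below is not a description of how BigDataCorp implements its enrichment; it is a hypothetical illustration in Python of the general idea, using an assumed history of (effective date, value) pairs for one attribute.

```python
from bisect import bisect_right
from datetime import date


def value_as_of(history, reference_date):
    """Return the attribute value that was valid on `reference_date`.

    `history` is a list of (effective_date, value) pairs sorted by
    effective_date. The latest entry on or before the reference date wins;
    None is returned if nothing was known yet on that date.
    """
    dates = [effective for effective, _ in history]
    pos = bisect_right(dates, reference_date)
    return history[pos - 1][1] if pos else None


# Hypothetical history of one attribute for one company.
employees_band = [(date(2022, 1, 1), "1-10"), (date(2023, 6, 1), "11-50")]
print(value_as_of(employees_band, date(2023, 1, 1)))  # value known when the proposal was analyzed
```

Using the current value instead would leak information from the future into the training sample, which tends to make a model look better in development than it performs in production.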
Once you have the enriched sample back, you can work on the data in your own model-building environment, with the tool(s) of your choice, and evaluate which of the attributes we provide add the most value to your decision.
Results and challenges
The results naturally vary according to the nature of each client, but in general we have seen consistent gains in our clients' models when they combine our alternative data with the traditional information they are already using. This gain can show up in the KS (Kolmogorov-Smirnov) statistic, where we have seen improvements of 2 to 20 points across different clients, or in cost, where a similar result is maintained with attributes that cost much less than traditional data.
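For readers who have not worked with KS, the sketch below computes it in plain Python as the maximum distance between the cumulative score distributions of good and bad customers. It assumes labels of 0 (good) and 1 (bad) and reports the result multiplied by 100, which is how KS "points" are commonly quoted in credit scoring; the function name is our own.

```python
def ks_statistic(scores, labels):
    """Kolmogorov-Smirnov separation between goods (0) and bads (1), in points.

    At every distinct score value, compare the share of goods and the share
    of bads with a score at or below it; KS is the largest gap, times 100.
    """
    goods = sorted(s for s, y in zip(scores, labels) if y == 0)
    bads = sorted(s for s, y in zip(scores, labels) if y == 1)
    if not goods or not bads:
        raise ValueError("both classes must be present")

    ks = 0.0
    i = j = 0  # number of goods / bads with score <= current threshold
    for threshold in sorted(set(scores)):
        while i < len(goods) and goods[i] <= threshold:
            i += 1
        while j < len(bads) and bads[j] <= threshold:
            j += 1
        ks = max(ks, abs(i / len(goods) - j / len(bads)))
    return 100 * ks
```

To measure the gain from new attributes, compute KS on the same validation sample for the model with and without them and compare the two values.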
In the case of models built solely with our data, we know that it is possible to achieve results that are as good as, or even better than, the off-the-shelf models offered by credit bureaus. The great advantage we bring in this case is a huge cost reduction for customers.
Regardless of the results achieved, the challenges clients face when modeling with alternative data are always the same. First, there is the difficulty of processing the information itself. The great advantage of the information we provide is also the source of its complexity: the number of attributes related to each record and, in some cases, the multiple records associated with each individual entity. All this volume of data usually needs to be pre-processed to generate more structured attributes that can be fed directly into the models, which can be a challenge for those who are not experienced with this type of work.
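As a rough illustration of this pre-processing, the Python sketch below collapses several records per entity into one row of summary attributes. The record layout and field names are hypothetical and do not reflect the actual structure of any BigDataCorp dataset; the point is only the pattern of grouping by entity and summarizing.

```python
from collections import defaultdict


def aggregate_records(records, key, value):
    """Collapse many records per entity into one set of summary attributes.

    `records` is a list of dicts, `key` names the entity identifier field
    and `value` names a numeric field to summarize.
    """
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record[value])
    return {
        entity: {"count": len(values), "total": sum(values), "max": max(values)}
        for entity, values in grouped.items()
    }


# Hypothetical example: several records for entity "a", one for "b".
rows = [
    {"entity_id": "a", "amount": 10},
    {"entity_id": "a", "amount": 30},
    {"entity_id": "b", "amount": 5},
]
print(aggregate_records(rows, key="entity_id", value="amount"))
```

The resulting one-row-per-entity attributes can then be joined to the modeling sample like any other input column.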
The second major challenge is related to coverage. Alternative data, by definition, describes characteristics that traditional data does not cover. Some of this data can have a huge impact on models, but because it has low coverage, it ends up being discarded. Imagine, for example, knowing which college a person attended. This attribute can have great predictive power when we talk about financial capacity, but it only exists for a small percentage of the population. If your modeling process is not prepared to work with low-coverage attributes, the attribute is dropped and its predictive power is lost.
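One common way to keep a low-coverage attribute in play, sketched below in Python, is to measure its coverage and then encode "missing" explicitly instead of dropping the column. This is a generic technique rather than a BigDataCorp recommendation, and the field names and fill value are hypothetical.

```python
def coverage(rows, attribute):
    """Share of rows in which the attribute is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(attribute) is not None)
    return filled / len(rows)


def with_missing_flag(rows, attribute, fill_value):
    """Return copies of the rows with missing values filled in, plus a
    companion `<attribute>_known` flag (1 = present, 0 = missing), so the
    model can tell a real value apart from a filled-in one."""
    encoded = []
    for row in rows:
        present = row.get(attribute) is not None
        new_row = dict(row)
        new_row[attribute] = row[attribute] if present else fill_value
        new_row[f"{attribute}_known"] = 1 if present else 0
        encoded.append(new_row)
    return encoded
```

With the flag in place, the model can still learn from the records where the attribute exists, while the remaining records are treated as their own category.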