Data Mining | Scores and Models

What is scoring?

Scoring is a well-known concept in data mining. For a newcomer: data mining, in the sense used here, is the use of data from a data warehouse to support marketing. It involves applying complex logic to get the most out of the data you have on sales, guests, inventory and so on.

Coming back to our topic, scoring is one of the methodologies used in data mining. It can be applied to several dimensions, such as guests or items.

Example:

Suppose you are planning to send out a discount offer on bakery items. As a retail giant, you might have around 100 million customers, and sending the offer to all of them would be expensive. You would rather send it to customers who have a history of purchasing bakery items. That is where scoring comes in. Each guest is given a score that represents the probability that the guest will buy a bakery item again. Say a score of 0 means the guest is least likely to purchase a bakery item and 1 means most likely. In this example the score depends directly on the number of times the guest has already purchased bakery items.

What is the Scoring Process?

[Figure: scoring process]

The picture above summarizes the scoring process. Each of its sections is explained in detail below.

Segment

A segment is the category of data to which the scoring process is applied. An example explains it best.

Example:

Say I decide to offer the bakery discount to women aged 30-40. This group becomes the segment, that is, the candidate data for the scoring process.
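Picking a segment is essentially a filter over the guest data. Here is a minimal sketch of that idea in Python. The `Guest` record, its field names and the sample guests are all made up for illustration; in practice the selection would more likely be a view or query on the warehouse.

```python
from dataclasses import dataclass


@dataclass
class Guest:
    # Hypothetical guest record; the field names are assumptions.
    guest_id: int
    gender: str
    age: int


def select_segment(guests, gender="F", min_age=30, max_age=40):
    """Return the guests that fall into the segment (age bounds inclusive)."""
    return [
        g for g in guests
        if g.gender == gender and min_age <= g.age <= max_age
    ]


# Made-up sample data.
guests = [
    Guest(1, "F", 34),
    Guest(2, "M", 36),
    Guest(3, "F", 52),
    Guest(4, "F", 30),
]

segment = select_segment(guests)
segment_ids = [g.guest_id for g in segment]
print(segment_ids)
```

Only guests 1 and 4 match both conditions, so they are the ones that go on to be scored.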

Model

A model is the logic or rule applied to the data to decide what score it gets. Models are usually written in PMML (Predictive Model Markup Language), an XML-based language. Designing a model is itself a critical and complex process, and usually a dedicated team handles it.

Example:

The model decides how to interpret the data. An example of a model would be:

(Total amount spent on bakery Items by the guest)/(Total amount spent by the guest on purchase)

Basically, it is just the calculation that is applied to the data to generate the score.
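The ratio above is easy to write as a small function. This is only a sketch of that one example model, not how a real PMML model is built; the function name `bakery_share` and the choice to return 0 for a guest with no spend at all are my own assumptions.

```python
def bakery_share(bakery_spend, total_spend):
    """Share of a guest's total spend that went to bakery items.

    Returns a value between 0 and 1 when bakery_spend <= total_spend.
    Assumption: a guest with no spend at all gets a score of 0.
    """
    if total_spend <= 0:
        return 0.0
    return bakery_spend / total_spend


# A guest who spent 25 on bakery items out of 100 in total.
score = bakery_share(25.0, 100.0)
print(score)
```

A guest who spends everything on bakery items scores 1, and one who never buys them scores 0, which matches the 0 to 1 range used in the earlier example.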

Scoring Engine

The scoring engine is responsible for applying the model to the input segment of data. It could be a stored procedure, a Java-based application, and so on.
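To make the idea concrete, here is a toy scoring engine in Python: a loop that applies a model to every record in a segment. The record layout, the `run_scoring` name and the sample numbers are all hypothetical, and a real engine would read from and write to warehouse tables.

```python
def bakery_share(record):
    """Example model: bakery spend as a share of total spend."""
    if record["total_spend"] <= 0:
        return 0.0
    return record["bakery_spend"] / record["total_spend"]


def run_scoring(segment, model):
    """Apply the model to each record and return {guest_id: score}."""
    return {record["guest_id"]: model(record) for record in segment}


# Made-up segment data.
segment = [
    {"guest_id": 1, "bakery_spend": 30.0, "total_spend": 120.0},
    {"guest_id": 4, "bakery_spend": 0.0, "total_spend": 80.0},
]

scores = run_scoring(segment, bakery_share)
print(scores)
```

Because the model is passed in as a plain function, the same engine can run any model against any segment, which is the separation the process relies on.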

Score

The output of the whole scoring process is the score. The score need not always be between 0 and 1; it can use any range that suits the purpose.
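For instance, a score computed on a 0 to 1 scale can be mapped linearly onto another range. The sketch below assumes a linear mapping and a hypothetical `rescale` helper; the target ranges are only examples.

```python
def rescale(score, low=0.0, high=100.0):
    """Map a score in [0, 1] linearly onto [low, high]."""
    return low + score * (high - low)


percent_score = rescale(0.25)           # 0 to 100 scale
banded_score = rescale(0.5, 300, 850)   # an arbitrary custom range
print(percent_score, banded_score)
```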

Campaign

Having the scores does not complete the work. Next comes campaigning. Once the scores have been assigned, a selection is made on the scored data to decide which records are chosen and which are eliminated.

Example:

Once scoring is done for all the women aged 30-40 in the segment, we have a range of scores (0-1). The guests can be ordered by score, highest first, and the top 30% can be chosen to receive the offer.
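That selection step can be sketched in a few lines of Python. The guest ids and scores below are made up, and `select_top_percent` is a hypothetical helper; it rounds the cut-off down, so a very small list may select nobody.

```python
def select_top_percent(scores, percent=30):
    """Return the guest ids with the highest scores, best first.

    scores is a {guest_id: score} mapping. The number of guests kept is
    percent of the total, rounded down.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = len(ranked) * percent // 100
    return ranked[:keep]


# Made-up scores for ten guests in the segment.
scores = {
    1: 0.90, 2: 0.10, 3: 0.75, 4: 0.40, 5: 0.60,
    6: 0.05, 7: 0.30, 8: 0.85, 9: 0.20, 10: 0.50,
}

chosen = select_top_percent(scores)
print(chosen)
```

With ten guests, 30% means three guests, so the offer goes to the three with the highest scores.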

Summarizing everything

A segment of data is taken from a large data warehouse for scoring. The selection is usually decided by the business teams of the retail firm. Since there are several segments and several models, a view or table is usually created for each segment.

A model is then applied to the segment. The model is the statistical analysis that is run on the segment to assign the score, and it is usually built by a dedicated modeling team.

Once the scoring engine has generated the scores, they are saved in an ordered format. Again, because there are multiple models and segments, dedicated tables are usually created for them. The campaigning team then runs selection criteria on the data based on the scores and decides the audience for the discounted bakery offer.
