MANG 6054 Credit Scoring and Data Mining SAS Enterprise Miner OR SAS OR SPSS

Coursework 2015/16

Subject or discipline: Statistics
Title: SAS Enterprise Miner OR SAS OR SPSS
Number of sources: 2
Provide digital sources used: No
Paper format: MLA
# of pages: 2
Spacing: Double spaced
# of words: 550
# of slides: 0
# of charts: 0
Paper details:

Estimate a logistic regression classifier for the churn data set, which you will find in the attachment. You can use whichever software you like (e.g. SAS Enterprise Miner, SAS, SPSS), but I strongly suggest using SAS Enterprise Miner, possibly in combination with Microsoft Excel for some preprocessing. The churn indicator is the target variable. Carefully describe every step you undertake and indicate which software you used. Make sure you don’t forget to:
• Split the data into a training set (2/3 of the observations) and a test set (1/3 of the observations). Each student should do this individually and at random (using e.g. SAS Enterprise Miner, Weka or Microsoft Excel). Hence, it is very implausible that two students will arrive at the same parameter estimates! Special consideration will be given to students who come up with the same parameter estimates.
• Code the nominal variables using dummies or Weights of Evidence (note that some additional coarse classification might be needed).
• Do outlier detection and treatment (only univariate) as discussed in the lectures.
• Consider doing stepwise regression.

You should report the following:
• A short discussion of your data preprocessing steps
• Values of the estimated parameters
• A discussion of the most predictive inputs
• Classification accuracy, sensitivity and specificity on the training and test sets assuming a cut-off of 0.5
• The ROC curve and the Area Under the ROC Curve on the test set
• Accuracy Ratio on the test set

Please do not plagiarize; no plagiarism is tolerated. The paper has to pass the Turnitin check.

Guidelines 

  • All questions need to be answered!
  • The coursework should be handed in as 1 report by 4pm on Friday 6th May 2016.
  • You should submit your coursework to ALL of the above by 4pm on the above-mentioned date.
  • The first page of your coursework should be the front cover as made available on Blackboard.
  • The file you send me by email should contain your student ID, e.g. pdf!
  • The report should contain page numbers and your student ID as a header on each page of the report.
  • Only answer what is asked for, do not include any irrelevant material and/or appendices (marks will be deducted if you do)!
  • Make sure not to exceed 2000 words!
  • Respect the environment and print out your report on both sides of a sheet of paper.

Question 0 (5 marks)

These are marks assigned to the:

  • structure and lay-out of your report (including page number, header, see above)
  • language use
  • sending of email (only 1 mail with 1 pdf attachment + correct naming)
  • writing my name correctly

Question 1 (25 marks)

Estimate a logistic regression classifier for the churn data set, which you will find on Blackboard.  You can use whichever software you like (e.g. SAS Enterprise Miner, Weka, SAS, SPSS, Matlab, …).  Nevertheless, I would advise using SAS Enterprise Miner, possibly in combination with Microsoft Excel for some preprocessing.  The churn indicator is the target variable.  Carefully describe every step you undertake and indicate which software you used.  Make sure you don’t forget to:

  • Split the data into a training set (2/3 of the observations) and a test set (1/3 of the observations). Each student should do this individually and at random (using e.g. SAS Enterprise Miner, Weka or Microsoft Excel).  Hence, it is very implausible that two students will arrive at the same parameter estimates!  Special consideration will be given to students who come up with the same parameter estimates.
  • Code the nominal variables using dummies or Weights of Evidence (note that some additional coarse classification might be needed).
  • Do outlier detection and treatment (only univariate) as discussed in the lectures.
  • Consider doing stepwise regression.
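
As a rough illustration of the first two steps in the list above, the following Python sketch performs a seeded random 2/3 vs. 1/3 split and computes Weights of Evidence for one nominal variable. It is not part of the brief and not a prescribed method: the column names `plan` and `churn`, the toy rows, the seed and the `smoothing` constant are all hypothetical, and the sign convention (log of the non-churner share over the churner share) is one common choice that may differ from the one used in the lectures.

```python
import math
import random
from collections import defaultdict


def train_test_split(rows, train_fraction=2 / 3, seed=42):
    """Shuffle a copy of rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def weights_of_evidence(rows, feature, target="churn", smoothing=0.5):
    """Return {category: WoE}, with WoE = ln(non-churner share / churner share).

    `smoothing` is an assumed small constant that avoids log(0) for
    categories containing only churners or only non-churners.
    """
    counts = defaultdict(lambda: [0, 0])  # category -> [non-churners, churners]
    for row in rows:
        counts[row[feature]][row[target]] += 1  # target must be coded 0/1
    total_good = sum(c[0] for c in counts.values())
    total_bad = sum(c[1] for c in counts.values())
    k = len(counts)
    woe = {}
    for category, (good, bad) in counts.items():
        good_share = (good + smoothing) / (total_good + smoothing * k)
        bad_share = (bad + smoothing) / (total_bad + smoothing * k)
        woe[category] = math.log(good_share / bad_share)
    return woe


# Hypothetical toy data: nine customers, one nominal input, binary target.
toy_rows = (
    [{"plan": "A", "churn": c} for c in (0, 0, 0, 1)]
    + [{"plan": "B", "churn": c} for c in (1, 1, 0)]
    + [{"plan": "C", "churn": c} for c in (0, 0)]
)

train, test = train_test_split(toy_rows)
# Estimate the WoE values on the training set only, then apply them to both sets.
woe_from_train = weights_of_evidence(train, "plan")
```

Estimating the WoE mapping on the training set alone keeps information from the test set out of the model, which is why the sketch splits first and codes afterwards.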

You should report the following:

  • A short discussion of your data preprocessing steps
  • Values of the estimated parameters
  • A discussion of the most predictive inputs
  • Classification accuracy, sensitivity and specificity on the training and test sets assuming a cut-off of 0.5
  • The ROC curve and the Area Under the ROC Curve on the test set
  • Accuracy Ratio on the test set
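
For orientation, the performance measures listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a required implementation: the labels and scores are made-up toy values, a score equal to the cut-off is classified as a churner, the AUC is computed by pairwise comparison (ties count one half), and the Accuracy Ratio is taken as 2 × AUC − 1, i.e. the Gini coefficient.

```python
def classification_metrics(y_true, scores, cutoff=0.5):
    """Accuracy, sensitivity and specificity at a given cut-off."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= cutoff else 0  # assumption: ties go to class 1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of churners correctly flagged
        "specificity": tn / (tn + fp),  # share of non-churners correctly cleared
    }


def auc(y_true, scores):
    """Probability that a random churner outscores a random non-churner."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy_ratio(y_true, scores):
    """Accuracy Ratio (Gini coefficient) derived from the AUC."""
    return 2 * auc(y_true, scores) - 1


# Hypothetical test-set labels (1 = churn) and predicted churn probabilities.
y_test = [1, 1, 1, 0, 0, 0, 0, 1]
p_test = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

metrics = classification_metrics(y_test, p_test)
test_auc = auc(y_test, p_test)
test_ar = accuracy_ratio(y_test, p_test)
```

The pairwise AUC is quadratic in the number of observations, which is fine for a data set of coursework size; a rank-based formula would be the usual choice for larger samples.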

Question 2 (25 marks)

Find an academic or business paper published in 2015 or later discussing a real-life application of data mining or credit scoring.  It is important that the case considered is a real-life case and not an artificial one.  You can consult the following websites and journals to find an appropriate paper:

  • Informs (http://www.informs.org/), e.g.
    • Informs Journal on Computing
    • Informs Management Science
    • Informs Operations Research
  • Elsevier (elsevier.com), e.g.
    • European Journal of Operational Research
    • Journal of the Operational Research Society
    • Omega
    • Computers and Operations Research
    • Machine Learning
    • Expert Systems with Applications
  • Oxford University Press (http://www.oxfordjournals.org/), e.g.
    • IMA Journal of Management Mathematics
  • Springer
    • Data Mining and Knowledge Discovery

However, feel free to use other literature sources as well, as long as they are scientific, academic papers.  Once you have found an appropriate paper, report the following in separate sections:

  • Title, authors and complete citation (journal name, book title, issue, year, …)
  • The data mining problem considered
  • The data mining techniques used
  • The results reported
  • A critical discussion of the model and results (assumptions made, shortcomings, limitations, …)

Make sure you demonstrate that you understand what the article is all about!

Do not copy and paste from the article.  Turnitin will easily detect this!

Question 3 (25 marks)

The Internet of Things (IoT) refers to the network of interconnected things, such as electronic devices, sensors, software and IT infrastructure, which create and add value by exchanging data with various stakeholders such as manufacturers, service providers, customers, other devices, etc., thereby using the World Wide Web technology stack (e.g. Wi-Fi, IPv6, …).  In terms of devices, you can think of heartbeat monitors; motion, noise or temperature sensors; smart meters measuring utility (e.g. electricity, water) consumption; etc.  Some examples of applications are:

  • Smart parking: automatically monitoring free parking spaces in a city;
  • Smart lighting: automatically adjusting street lights to weather conditions;
  • Smart traffic: optimizing driving and walking routes based upon traffic and congestion;
  • Smart grid: automatically monitoring energy consumption;
  • Smart supply chains: automatically monitoring goods as they move through the supply chain;
  • Telematics: automatically monitoring driving behavior and linking it to insurance risk and premiums.

It goes without saying that the amount of data generated is enormous and offers unprecedented potential for analytical applications.

Pick one particular type of application of IoT and discuss the following:

  • how to use both predictive and descriptive analytics;
  • how to evaluate the performance of the analytical models;
  • key issues in post-processing and implementing the analytical models;
  • important challenges and opportunities.

Question 4 (20 marks)

Explain the following concepts (don’t copy and paste from the Internet or Wikipedia):

  • Information Value of a variable
  • Validation data set in a decision tree context
  • Outlier truncation
  • LGD in the Basel context
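
Two of the concepts above lend themselves to a short numerical illustration. The Python sketch below computes the Information Value of a coarse-classified variable as the sum over categories of (good share − bad share) × ln(good share / bad share), and truncates outliers by capping values at a fixed number of standard deviations from the mean. Both functions are illustrative only: the category counts and usage values are hypothetical, every category is assumed to contain at least one good and one bad, and the mean ± 3 standard deviations limits are one common convention that may differ from the rule discussed in the lectures.

```python
import math
import statistics


def information_value(counts):
    """Information Value of a variable.

    `counts` maps each category to (number of goods, number of bads);
    every category is assumed to have non-zero counts in both classes.
    """
    total_good = sum(good for good, _ in counts.values())
    total_bad = sum(bad for _, bad in counts.values())
    iv = 0.0
    for good, bad in counts.values():
        good_share = good / total_good
        bad_share = bad / total_bad
        iv += (good_share - bad_share) * math.log(good_share / bad_share)
    return iv


def truncate_outliers(values, n_std=3.0):
    """Cap values at mean +/- n_std standard deviations (assumed limits)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    lower, upper = mean - n_std * std, mean + n_std * std
    return [min(max(v, lower), upper) for v in values]


# Hypothetical counts: a variable that separates the classes well ...
iv_strong = information_value({"A": (30, 10), "B": (10, 30)})
# ... and one whose categories all have the same good/bad mix.
iv_flat = information_value({"A": (20, 10), "B": (20, 10)})

# Hypothetical monthly usage with one extreme observation.
usage = [10] * 20 + [100]
usage_capped = truncate_outliers(usage)
```

Truncation keeps the extreme observation in the data set but pulls it back to the limit, which is what distinguishes it from simply deleting outliers.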