Logistic Regression Example

 Introduction

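  Logistic regression is a machine learning classification algorithm used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is binary, coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y = 1) as a function of X.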
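
  To make "P(Y = 1) as a function of X" concrete, here is a minimal sketch of how a logistic model turns a linear score into a probability. The weights `w`, the intercept `b`, and the two example rows are made-up values for illustration only; they are not fitted to the bank data.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """Return P(Y = 1 | X) for each row of X under a logistic model."""
    return sigmoid(X @ w + b)

# Hypothetical coefficients and inputs (illustration only).
w = np.array([0.8, -0.5])
b = 0.1
X = np.array([[1.0, 2.0],
              [3.0, 0.5]])

probs = predict_proba(X, w, b)
labels = (probs >= 0.5).astype(int)  # threshold at 0.5 to get class labels
```

  The 0.5 threshold is a common default, not a requirement; a different cut-off may suit an imbalanced target better.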

  Data

  The dataset comes from

import pandas as pd
import numpy as np
from sklearn import preprocessing
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection.
from sklearn.model_selection import train_test_split
import seaborn as sns

# Plot styling
plt.rc("font", size=14)
sns.set(style="white")
sns.set(style="whitegrid", color_codes=True)
# The CSV file is semicolon-delimited.
data = pd.read_csv('F:/wd.jupyter/datasets/log_data/bank.csv', delimiter=';')
data = data.dropna()  # drop rows with missing values
print(data.shape)
print(list(data.columns))

data.head()
(41188, 21)
['age', 'job', 'marital', 'education', 'default', 'housing', 'loan', 'contact', 'month', 'day_of_week', 'duration', 'campaign', 'pdays', 'previous', 'poutcome', 'emp.var.rate', 'cons.price.idx', 'cons.conf.idx', 'euribor3m', 'nr.employed', 'y']

  The dataset provides information about bank customers. It contains 41,188 records and 21 fields.

  Variables

  • age (numeric)
  • job : type of job (categorical: “admin”, “blue-collar”, “entrepreneur”, “housemaid”, “management”, “retired”, “self-employed”, “services”, “student”, “technician”, “unemployed”, “unknown”)
  • marital : marital status (categorical: “divorced”, “married”, “single”, “unknown”)
  • education (categorical: “basic.4y”, “basic.6y”, “basic.9y”, “high.school”, “illiterate”, “professional.course”, “university.degree”, “unknown”)
  • default: has credit in default? (categorical: “no”, “yes”, “unknown”)
  • housing: has housing loan? (categorical: “no”, “yes”, “unknown”)
  • loan: has personal loan? (categorical: “no”, “yes”, “unknown”)
  • contact: contact communication type (categorical: “cellular”, “telephone”)
  • month: last contact month of year (categorical: “jan”, “feb”, “mar”, …, “nov”, “dec”)
  • day_of_week: last contact day of the week (categorical: “mon”, “tue”, “wed”, “thu”, “fri”)
  • duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y=’no’). The duration is not known before a call is performed, also, after the end of the call, y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model
  • campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
  • pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
  • previous: number of contacts performed before this campaign and for this client (numeric)
  • poutcome: outcome of the previous marketing campaign (categorical: “failure”, “nonexistent”, “success”)
  • emp.var.rate: employment variation rate (numeric)
  • cons.price.idx: consumer price index (numeric)
  • cons.conf.idx: consumer confidence index (numeric)
  • euribor3m: euribor 3 month rate (numeric)
  • nr.employed: number of employees (numeric)
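
  Following the note on `duration` in the list above, a realistic model would drop that column before training. The sketch below does so on a tiny made-up frame: the rows are hypothetical and only the column names come from the dataset. It also makes the `pdays` sentinel value 999 explicit. The helper name `prepare_features` and the new column `previously_contacted` are assumptions introduced for illustration.

```python
import pandas as pd

def prepare_features(df):
    """Drop the leaky `duration` column and flag the pdays sentinel."""
    out = df.drop(columns=["duration"])
    # pdays == 999 means the client was not previously contacted.
    out["previously_contacted"] = (out["pdays"] != 999).astype(int)
    return out

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "age": [35, 52, 41],
    "duration": [210, 0, 95],
    "pdays": [999, 6, 999],
})

features = prepare_features(toy)
```

  `drop` returns a new frame, so the original `toy` keeps its `duration` column for benchmark runs.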

  Target variable

  y: has the client subscribed to a term deposit? (binary: “1” means “yes”, “0” means “no”)
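
  If your copy of the file stores the target as the strings "yes" and "no" rather than 1 and 0 (an assumption here; inspect `data['y'].unique()` to check), it can be mapped to the 1/0 encoding described above. The small Series below is made up for illustration.

```python
import pandas as pd

# Hypothetical target values, assuming a "yes"/"no" string encoding.
y_raw = pd.Series(["no", "yes", "no", "no"], name="y")

# Map to the binary coding used by logistic regression: 1 = yes, 0 = no.
y = y_raw.map({"yes": 1, "no": 0})

# Share of positive cases; useful for spotting class imbalance.
positive_rate = y.mean()
```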
