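Project Overview
In the article "CNN大战验证码" (CNN vs. CAPTCHA), we used TensorFlow to build a simple CNN model to crack the CAPTCHA of a certain website. The CAPTCHA looks like this: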

In this article, we use Keras to build a slightly more complex CNN model to crack the same CAPTCHA.
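Dataset
The preprocessing of the CAPTCHA images is not described in detail here; interested readers can refer to the article "CNN大战验证码".
The project currently has 1668 samples in total. Each sample is an image of a single character, 16*20 pixels in size. The features of a sample are the pixels of its character image, where 0 stands for white and 1 for black, so each sample has 320 features with values 0 or 1, named v1 through v320. The class label of a sample is the character itself. Part of the dataset is shown below: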
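As a minimal sketch of this feature layout, the snippet below reshapes one row of 320 binary values into a 20-row by 16-column image and prints it as a crude picture. The row is randomly generated, not real data. The row-by-row pixel order is an assumption, chosen to match the (20, 16) reshape in the training code.

```python
import numpy as np

# Hypothetical sample: 320 binary pixels (v1..v320), 0 = white, 1 = black.
# This row is made up for illustration; real rows come from the dataset.
rng = np.random.default_rng(0)
features = rng.integers(0, 2, size=320)

# Assumption: pixels are stored row by row, so the 320 values
# become 20 rows x 16 columns (a 16*20 character image).
image = features.reshape(20, 16)

# Print the image as text: '#' for black pixels, '.' for white ones
for row in image:
    print(''.join('#' if pixel else '.' for pixel in row))
```

With a real sample, the printed pattern should resemble the character given by its label.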

CNN Model
Keras makes it quick and convenient to build a CNN model. The CNN model built in this article is as follows:
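As a rough check on the feature-map sizes in such a model, the sketch below tracks the spatial size of a 20x16 input through a 3x3 convolution and a 2x2 max pooling, both with padding='same', as used in the training code. The helper `same_padding_output` is a hypothetical name introduced for illustration. The sketch assumes that the output length under 'same' padding is ceil(size / stride) and that the pooling stride equals the pool size, which is the Keras default.

```python
import math

def same_padding_output(size, stride):
    """Output length along one axis under 'same' padding: ceil(size / stride)."""
    return math.ceil(size / stride)

height, width = 20, 16  # input character image: 20 rows x 16 columns

# 3x3 convolution, stride 1, padding='same': spatial size is unchanged
height, width = same_padding_output(height, 1), same_padding_output(width, 1)

# 2x2 max pooling, padding='same', stride assumed equal to the pool size
height, width = same_padding_output(height, 2), same_padding_output(width, 2)

print(height, width)  # 10 8
```

The number of channels is set by the filter count of each convolution layer and is not affected by this calculation.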

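The dataset is split into a training set and a test set in a 7:3 ratio (test_size=0.3, which gives 1167 training samples and 501 test samples). The code for training the model is as follows: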
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt

from keras.utils import np_utils, plot_model
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.callbacks import EarlyStopping
from keras.layers import Conv2D, MaxPooling2D

# Read the data
df = pd.read_csv('F://verifycode_data/data.csv')

# Label values: map each of the 31 characters to an integer class index
vals = range(31)
keys = ['1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','J','K','L','N','P','Q','R','S','T','U','V','X','Y','Z']
label_dict = dict(zip(keys, vals))

x_data = df[['v'+str(i+1) for i in range(320)]]
y_data = pd.DataFrame({'label':df['label']})
y_data['class'] = y_data['label'].apply(lambda x: label_dict[x])

# Split the data into a training set and a test set
X_train, X_test, Y_train, Y_test = train_test_split(x_data, y_data['class'], test_size=0.3, random_state=42)
# Reshape each 320-feature row into a 20x16 single-channel image
x_train = np.array(X_train).reshape((1167, 20, 16, 1))
x_test = np.array(X_test).reshape((501, 20, 16, 1))

# One-hot encode the labels
n_classes = 31
y_train = np_utils.to_categorical(Y_train, n_classes)
y_val = np_utils.to_categorical(Y_test, n_classes)

input_shape = x_train[0].shape

# CNN model
model = Sequential()

# Convolution and pooling layers
model.add(Conv2D(32, kernel_size=(3, 3), input_shape=input_shape, padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(32, kernel_size=(3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

# Dropout layer
model.add(Dropout(0.25))

model.add(Conv2D(64, kernel_size=(3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, kernel_size=(