Pandas | 17 Handling Missing Data - Study Notes

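Missing data is a persistent problem in real-world datasets. In fields such as machine learning and data mining, poor data quality caused by missing values can seriously degrade the accuracy of model predictions. Handling missing values is therefore a key step in making models more accurate and effective.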

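The examples in these notes use reindexing to create a DataFrame that contains missing values. In the output, NaN stands for "Not a Number", the marker pandas uses for a missing value.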
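As a minimal sketch of how reindexing produces missing values, the snippet below uses a tiny frame with fixed integer values (illustrative data, not taken from these notes). Asking for a label that is absent from the original index adds a row of NaN, and because NaN is a floating-point value, the integer columns are upcast to float64.

```python
import pandas as pd

# Illustrative data: two integer columns indexed by 'a' and 'c'
df = pd.DataFrame({"one": [1, 2], "two": [3, 4]}, index=["a", "c"])

# 'b' does not exist in the original index, so its row is filled with NaN
df2 = df.reindex(["a", "b", "c"])

print(df2)
print(df2.dtypes)  # integer columns become float64 to hold NaN
```

The original rows keep their values; only the newly requested label is empty.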

1. Checking for Missing Values

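To make missing values easier to detect (across different array dtypes), pandas provides the isnull() and notnull() functions, which are also available as methods on Series and DataFrame objects.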
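The point about different dtypes can be sketched as follows. The three Series here are made-up examples: floats use NaN, strings use None, and datetimes use NaT as the missing marker, yet the same isnull() / notnull() calls work for all of them, both as top-level functions and as methods.

```python
import numpy as np
import pandas as pd

# Illustrative Series with a missing entry in each dtype
floats = pd.Series([1.0, np.nan, 3.0])
strings = pd.Series(["x", None, "z"])
dates = pd.Series(pd.to_datetime(["2020-01-01", None]))  # None becomes NaT

# Function form and method form return the same boolean mask
float_mask = pd.isnull(floats)
same_result = float_mask.equals(floats.isnull())

print(float_mask.tolist())
print(strings.isnull().tolist())
print(dates.isnull().tolist())

# notnull() is the element-wise inverse of isnull()
print(floats.notnull().tolist())
```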

Example 1

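import pandas as pd
import numpy as np

# Build a 5x3 frame of random values with a sparse index
df = pd.DataFrame(np.random.randn(5, 3),
                  index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])

# Labels 'b', 'd' and 'g' are not in the original index, so their rows become NaN
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])

print(df)
print('\n')

# Boolean mask: True where column 'one' is missing
print(df['one'].isnull())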

Output:

        one       two     three
a  0.036297 -0.615260 -1.341327
b       NaN       NaN       NaN
c -1.908168 -0.779304  0.212467
d       NaN       NaN       NaN
e  0.527409 -2.432343  0.190436
f  1.428975 -0.364970  1.084148
g       NaN       NaN       NaN
h  0.763328 -0.818729  0.240498


a    False
b     True
c    False
d     True
e    False
f    False
g     True
h    False
Name: one, dtype: bool
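A boolean mask like the one above is usually a starting point rather than an end result. The sketch below, using small made-up data, shows a few common follow-ups: counting missing entries (True counts as 1 when summed), listing the affected index labels, and keeping only the rows where a column is present.

```python
import numpy as np
import pandas as pd

# Illustrative data with missing entries in both columns
df = pd.DataFrame(
    {"one": [0.5, np.nan, 1.5, np.nan], "two": [1.0, np.nan, np.nan, 4.0]},
    index=["a", "b", "c", "d"],
)

mask = df["one"].isnull()

n_missing = int(mask.sum())              # number of missing values in 'one'
missing_labels = df.index[mask].tolist() # index labels where 'one' is missing
per_column = df.isnull().sum()           # missing count for every column
present_rows = df[df["one"].notnull()]   # rows where 'one' has a value

print(n_missing)
print(missing_labels)
print(per_column)
print(present_rows)
```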

Example 2

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(5, 3), index=['a