title: Learning pandas [02]: Binary Operations
slug: er-jin-zhi-cao-zuo
date: 2021-1-23
tags:

Pandas
Jupyter
python

category: Data Analysis
link:
description:
type: text

Binary operations

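When performing binary operations between pandas data structures, two points deserve attention:

Broadcasting between a DataFrame and a Series;
Handling of missing values in the computation.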
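To make the second point concrete, here is a minimal sketch. The Series s1 and s2 are made-up illustrations, not data used elsewhere in this post. Labels that exist on only one side produce NaN with the plain operator, while the method form can substitute a default through fill_value.

```python
import pandas as pd

# Two hypothetical Series whose labels only partly overlap.
s1 = pd.Series([1.0, 2.0], index=['a', 'b'])
s2 = pd.Series([10.0, 20.0], index=['b', 'c'])

# The operator aligns on labels; 'a' and 'c' exist on one side only,
# so the result is NaN there.
plain = s1 + s2

# The method form can treat the missing side as 0 instead.
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```

Only label 'b' is present on both sides, so it is the only position where the plain sum has a value.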

Addition, subtraction, multiplication, division, and modulo

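DataFrame provides the following arithmetic methods, each of which can broadcast a Series across the frame:

add(): addition
sub(): subtraction
mul(): multiplication
div(): division
mod(): modulo
radd(): reflected addition (other + df)
rsub(): reflected subtraction (other - df)

With any of these methods, the axis parameter specifies whether the Series is matched against the index or against the columns when it is broadcast.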
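The reflected methods are easiest to understand next to their forward counterparts. The sketch below uses a small hypothetical frame (df_demo) and Series (s_demo) with integer values so the results can be checked by hand.

```python
import pandas as pd

# Hypothetical toy data, chosen so the arithmetic is easy to verify.
df_demo = pd.DataFrame({'x': [1, 2], 'y': [10, 20]}, index=['r0', 'r1'])
s_demo = pd.Series({'x': 100, 'y': 200})

# sub(): df_demo - s_demo, with the Series labels matched against the columns.
forward = df_demo.sub(s_demo, axis='columns')

# rsub(): the reflected form, s_demo - df_demo.
reflected = df_demo.rsub(s_demo, axis='columns')

print(forward)
print(reflected)
```

For subtraction the reflected result is simply the negation of the forward one; for add() and radd() the two results are the same because addition is commutative.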

Take sub() as an example.

df.sub(other, axis='columns', level=None, fill_value=None)

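other: the value to subtract: a scalar, a sequence, a Series, or a DataFrame. A Series is broadcast according to axis, so it is subtracted from every row (or every column) of df.
axis: 'columns' (the default) or 'index'. With 'columns', the labels of the Series are matched against the column labels of df, so the Series is subtracted from each row; with 'index', they are matched against the row labels, so it is subtracted from each column.
level: an integer or a label. When df has a MultiIndex, this names the index level on which the Series is aligned.
fill_value: a value substituted for NaN in df and other before the computation. It is mainly useful when other is a DataFrame. If df and other are both NaN at the same position, the result there is still NaN.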

python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
    'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
    'three': pd.Series(np.random.randn(3), index=['b', 'c', 'd'])
})

df

	one	two	three
a	-0.062689	1.465328	NaN
b	-2.187806	-0.299947	0.670050
c	-0.795105	0.803948	-0.016265
d	NaN	-2.975800	0.280831

python
row = df.iloc[1]
df1 = df.sub(row, axis='columns')
df1

	one	two	three
a	2.125118	1.765275	NaN
b	0.000000	0.000000	0.000000
c	1.392701	1.103896	-0.686316
d	NaN	-2.675852	-0.389220
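The call above matches the Series against the columns. The complementary case, axis='index', is sketched below on a small hypothetical frame (frame and col are illustrative names). It also shows what happens when the axis does not fit the labels of the Series: nothing lines up, and the result is all NaN.

```python
import pandas as pd

# Hypothetical frame with fixed values so the result is predictable.
frame = pd.DataFrame({'one': [1.0, 2.0, 3.0], 'two': [10.0, 20.0, 30.0]},
                     index=['a', 'b', 'c'])

# A column taken as a Series is labeled by the row labels a, b, c.
col = frame['one']

# axis='index' matches those labels against the row index, so the column
# is subtracted from every column of the frame.
by_index = frame.sub(col, axis='index')

# With axis='columns' the labels a, b, c are matched against the column
# labels one, two. No label matches, so every value is NaN.
by_columns = frame.sub(col, axis='columns')

print(by_index)
print(by_columns)
```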

python
df2 = pd.DataFrame({
    'one': pd.Series(np.random.randn(3), index=['a', 'b', 'c']),
    'two': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd']),
    'three': pd.Series(np.random.randn(4), index=['a', 'b', 'c', 'd'])
})
df2

	one	two	three
a	-0.930814	0.543193	-3.300306
b	-0.756836	1.236183	-0.116324
c	1.270964	0.958356	0.802346
d	NaN	0.818930	-0.072648

python
df3 = df.sub(df2, fill_value=10)
df3

	one	two	three
a	0.868126	0.922134	13.300306
b	-1.430970	-1.536130	0.786375
c	-2.066069	-0.154408	-0.818611
d	NaN	-3.794730	0.353479

Comparison operations

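Series and DataFrame support the following comparison methods:

eq(): equal to
ne(): not equal to
lt(): less than
gt(): greater than
le(): less than or equal to
ge(): greater than or equal to
equals(): tests whether two Series or two DataFrames are equal as a whole. This method treats NaN values in the same position as equal.

The element-wise methods are used in the same way as the arithmetic methods. Take lt() as an example.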

df.lt(other, axis='columns', level=None)

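other: the value to compare against: a scalar, a sequence, a Series, or a DataFrame. A Series is broadcast according to axis, so every row (or every column) of df is compared with it.
axis: 'columns' or 'index', choosing whether the Series is matched against the column labels or the row labels of df.
level: an integer or a label. When df has a MultiIndex, this names the index level on which the Series is aligned.

equals() returns a single boolean. The other methods return a DataFrame or Series of booleans with the same shape and labels as df.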
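As a quick illustration of the element-wise methods, the sketch below compares two small hypothetical frames (left and right) that share the same labels. It shows that the method and the operator agree in this case, and how NaN behaves under eq() and ne().

```python
import numpy as np
import pandas as pd

# Hypothetical frames with identical labels and one NaN in the same position.
left = pd.DataFrame({'p': [1.0, np.nan], 'q': [3.0, 4.0]})
right = pd.DataFrame({'p': [1.0, np.nan], 'q': [5.0, 4.0]})

# For identically labeled frames the method and the operator agree.
same_result = left.lt(right).equals(left < right)

# Element-wise, NaN is never equal to NaN, so eq() gives False there
# and ne() gives True.
eq_mask = left.eq(right)
ne_mask = left.ne(right)

print(same_result)
print(eq_mask)
print(ne_mask)
```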

python
df.lt(df2)

	one	two	three
a	False	False	False
b	True	True	False
c	True	True	True
d	False	True	False

python
df.lt(row, axis='columns')

	one	two	three
a	False	False	False
b	False	False	False
c	False	False	True
d	False	True	True

python
df.lt(0)

	one	two	three
a	True	False	False
b	True	True	False
c	True	False	True
d	False	True	False

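Checking df == df element by element does not give all True here, because df contains NaN and a NaN never compares equal to another NaN. To test whether two objects hold the same data, use equals() instead.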

python
(df == df).all().all()


False

python
df.equals(df)


True
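The difference between the two checks can be reproduced on a tiny example. The sketch below uses hypothetical names (nan_frame, ints, floats) and also shows one thing to keep in mind: equals() compares dtypes as well as values, so it can be stricter than the element-wise comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical one-column frame containing a missing value.
nan_frame = pd.DataFrame({'v': [1.0, np.nan]})

# NaN is not equal to itself, which is why the element-wise check fails.
nan_is_equal = (np.nan == np.nan)

# A manual NaN-aware comparison: positions match if the values are equal
# or if both sides are missing.
elementwise = nan_frame == nan_frame
both_missing = nan_frame.isna() & nan_frame.isna()
same_data = bool((elementwise | both_missing).all().all())

# equals() also requires matching dtypes, unlike element-wise ==.
ints = pd.Series([1, 2])
floats = pd.Series([1.0, 2.0])
same_values = bool((ints == floats).all())
strictly_equal = ints.equals(floats)

print(nan_is_equal, same_data, same_values, strictly_equal)
```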

Boolean reductions

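A comparison result can be reduced to a single boolean value.

empty: a property (not a method) of DataFrame and Series that reports whether the object is empty.
any(): true if at least one element is true. Each call reduces the result by one dimension.
all(): true only if every element is true. Each call reduces the result by one dimension.
bool(): returns the boolean value of a pandas object that holds exactly one element.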

python
df.empty


False

python
pd.Series([True]).bool()


True
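A note for readers on newer pandas: as far as I know, bool() was deprecated in pandas 2.1 and is no longer available in later major releases. A minimal alternative, assuming the object holds exactly one element, is item(), sketched below with a hypothetical Series named single.

```python
import pandas as pd

single = pd.Series([True])

# item() returns the only element as a Python scalar.
flag = single.item()
print(flag)

# With more than one element there is no single value to return,
# so item() raises ValueError.
try:
    pd.Series([True, False]).item()
    raised = False
except ValueError:
    raised = True
print(raised)
```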

any() and all() are used in the same way. Take any() as an example.

df.any(axis=0, bool_only=None, skipna=True, level=None)

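axis: 0 or 'index', 1 or 'columns', or None; the default is 0. It selects the dimension to reduce. With None, all dimensions are reduced and a scalar is returned.
bool_only: a boolean, default None. If True, only boolean columns are used; with None, pandas tries to use every column. It has no effect on a Series.
skipna: a boolean, default True, controlling whether NA/null values are skipped. With True, a row or column that is entirely NA is treated like an empty one, so any() returns False. With False, NA values count as true, because they are not equal to zero.
level: an integer or a label. When df has a MultiIndex, the reduction is done along the given level. As far as I know, pandas 2.0 and later no longer accept this parameter.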
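The effect of axis and skipna is easier to see on fixed data. The sketch below uses a hypothetical boolean frame (flags) and an all-NaN Series (all_missing); neither appears elsewhere in this post.

```python
import numpy as np
import pandas as pd

# Hypothetical boolean frame with known values.
flags = pd.DataFrame({'u': [True, False], 'w': [False, False]},
                     index=['r0', 'r1'])

per_column = flags.any()        # axis=0: one result per column
per_row = flags.any(axis=1)     # axis=1: one result per row
overall = flags.any(axis=None)  # all dimensions reduced to a scalar

# An all-NaN Series: skipped NaN leaves nothing to test, so any() is False;
# with skipna=False the NaN values count as true.
all_missing = pd.Series([np.nan, np.nan])
skipped = all_missing.any()
not_skipped = all_missing.any(skipna=False)

print(per_column)
print(per_row)
print(overall, skipped, not_skipped)
```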

python
df.lt(0).any()


one      True
two      True
three    True
dtype: bool

python
df.lt(0).all()


one      False
two      False
three    False
dtype: bool

python
df.lt(0).any().all()


True

Combining overlapping data

df5.combine_first(df6)

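Fills the NaN values in df5, as well as any rows or columns that df5 lacks, with the values at the corresponding positions in df6.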

python
df5 = pd.DataFrame({'A': [1., np.nan, 3., 5., np.nan],
                    'B': [np.nan, 2., 3., np.nan, 6.]})

df6 = pd.DataFrame({'A': [5., 2., 4., np.nan, 3., 7.],
                    'B': [np.nan, np.nan, 3., 4., 6., 8.],
                    'C': [2., 3., 9., 10., 11., 12.]})

df5

	A	B
0	1.0	NaN
1	NaN	2.0
2	3.0	3.0
3	5.0	NaN
4	NaN	6.0

python
df6

	A	B	C
0	5.0	NaN	2.0
1	2.0	NaN	3.0
2	4.0	3.0	9.0
3	NaN	4.0	10.0
4	3.0	6.0	11.0
5	7.0	8.0	12.0

python
df5.combine_first(df6)

	A	B	C
0	1.0	NaN	2.0
1	2.0	2.0	3.0
2	3.0	3.0	9.0
3	5.0	4.0	10.0
4	3.0	6.0	11.0
5	7.0	8.0	12.0
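One way to think about combine_first() is as a reindex onto the union of both sets of labels followed by fillna(). The sketch below spells that out on two small hypothetical frames (first and second). It is meant as a mental model for simple float data, not as a description of how pandas implements the method.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: 'second' has an extra row and an extra column.
first = pd.DataFrame({'A': [1.0, np.nan]}, index=[0, 1])
second = pd.DataFrame({'A': [9.0, 2.0, 3.0], 'B': [7.0, 8.0, 9.0]},
                      index=[0, 1, 2])

combined = first.combine_first(second)

# Manual version: put both frames on the union of their labels,
# then fill the gaps in 'first' from 'second'.
rows = first.index.union(second.index)
cols = first.columns.union(second.columns)
manual = (first.reindex(index=rows, columns=cols)
               .fillna(second.reindex(index=rows, columns=cols)))

print(combined)
print(manual)
```

Values already present in first (here the 1.0 in column A) are kept; second only contributes where first has nothing.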
