Generators in the Python 3 Standard Library: built-ins, itertools, and functools

shuibo, published 2019-07-30 18:06 / 2660 views

Summary: Python 3 implements many generator functions. This article surveys the ones found among the built-ins and in the itertools and functools modules.

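Introduction

Python 3 implements many generator functions. This article covers the ones in the built-ins and in the itertools and functools modules. Strictly speaking, most of them return lazy iterator objects rather than generators proper, but they are consumed in the same way, so the article calls them generators throughout.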

Filtering generators

The functions in this group take an iterable and, without modifying it, return a generator over a subset of its elements.

filter(predicate, iterable)

Each element of iterable is passed to predicate, and filter yields exactly those elements for which predicate returns a true value.

def is_vowel(c):
    return c.lower() in "aeiou"

word = "abcdefghijk"
print(list(filter(is_vowel, word)))
## output: ['a', 'e', 'i']

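Here filter drops every non-vowel letter in word and returns a generator over the remaining elements.
Note: list(generator) converts a generator into a list, but an infinite generator never finishes, so list keeps allocating elements until memory runs out.
filter is equivalent to the following generator expression.

(item for item in iterable if predicate(item))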

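If the first argument to filter is None, the identity function is used: falsy elements are dropped and truthy ones are kept. This is equivalent to the following generator expression.

(item for item in iterable if item)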
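As a quick check of the two equivalences above, the sketch below compares filter with the corresponding generator expressions. The list `values` is made-up sample data, not part of the original article.

```python
def is_vowel(c):
    return c.lower() in "aeiou"

word = "abcdefghijk"
values = [0, 1, "", "a", None, [], [0]]  # illustrative data only

# filter(predicate, iterable) versus a generator expression with a condition
by_filter = list(filter(is_vowel, word))
by_genexp = list(item for item in word if is_vowel(item))

# filter(None, iterable) keeps only the truthy elements
truthy_by_filter = list(filter(None, values))
truthy_by_genexp = list(item for item in values if item)

print(by_filter == by_genexp)  # True
print(truthy_by_filter)        # [1, 'a', [0]]
```

Note that `[0]` survives because a non-empty list is truthy, even though its only element is not.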

itertools.filterfalse(predicate, iterable)

This function is like filter, except that it drops the elements for which predicate returns True and yields the rest.

import itertools

print(list(itertools.filterfalse(is_vowel, word)))
## output: ['b', 'c', 'd', 'f', 'g', 'h', 'j', 'k']

itertools.takewhile(predicate, iterable)

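This function iterates over iterable and tests each element with predicate. It yields elements for as long as predicate returns True and stops at the first element for which predicate returns False, discarding that element and everything after it.

print(list(itertools.takewhile(is_vowel, word)))
## output: ['a']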

itertools.dropwhile(predicate, iterable)

This function is the opposite of itertools.takewhile: it skips the leading elements for which predicate returns True and yields everything from the first failing element onward.

print(list(itertools.dropwhile(is_vowel, word)))
## output: ['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
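To make the contrast concrete, here is a minimal sketch showing that takewhile and dropwhile split an iterable at the first failing element, whereas filter inspects every element. The list `numbers` and the helper `is_odd` are illustrative names, not from the original article.

```python
import itertools

numbers = [1, 3, 5, 8, 7, 9]  # illustrative data only

def is_odd(n):
    return n % 2 == 1

head = list(itertools.takewhile(is_odd, numbers))  # stops at the first even number
tail = list(itertools.dropwhile(is_odd, numbers))  # starts at the first even number
odd_only = list(filter(is_odd, numbers))           # tests every element

print(head)                    # [1, 3, 5]
print(tail)                    # [8, 7, 9]
print(odd_only)                # [1, 3, 5, 7, 9]
print(head + tail == numbers)  # True
```

The 7 and 9 after the 8 appear in `tail` even though they are odd, because dropwhile stops testing once the predicate has failed.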

itertools.compress(iterable, selectors)

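selectors is also an iterable. compress uses its values (0/1 or True/False) to decide whether to keep or drop the element at the same position in iterable.

print(list(itertools.compress(word, [1, 0, 1, 0])))
## output: ['a', 'c']

If selectors is shorter than iterable, all remaining elements of iterable are dropped.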
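In practice the selectors are often computed from a second sequence instead of being written out by hand. The following sketch assumes two parallel lists, `temperatures` and `labels`, invented for illustration.

```python
import itertools

temperatures = [18, 25, 31, 22, 35]           # illustrative data only
labels = ["Mon", "Tue", "Wed", "Thu", "Fri"]  # illustrative data only

# Build the selectors lazily from another sequence.
hot = (t > 24 for t in temperatures)
hot_days = list(itertools.compress(labels, hot))
print(hot_days)  # ['Tue', 'Wed', 'Fri']

# Selectors shorter than the data: everything past their end is dropped.
short = list(itertools.compress(labels, [1, 1]))
print(short)     # ['Mon', 'Tue']
```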

itertools.islice(iterable, stop)

Depending on how many arguments are passed, the function can also be called as itertools.islice(iterable, start, stop[, step]). islice works like Python's slice operation list[start:stop:step], except that it accepts any iterable and does not support negative values.

print(list(itertools.islice(word, 4)))
## output: ['a', 'b', 'c', 'd']
print(list(itertools.islice(word, 4, 8)))
## output: ['e', 'f', 'g', 'h']
print(list(itertools.islice(word, 4, 8, 2)))
## output: ['e', 'g']
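Because islice is lazy, it is also a safe way to take a finite piece of an infinite generator, which list() alone cannot do. The sketch below uses a small helper, `naturals`, written only for this example. It also shows that islice consumes the underlying iterator.

```python
import itertools

def naturals():
    """An infinite generator: 0, 1, 2, ... (helper written for this example)."""
    n = 0
    while True:
        yield n
        n += 1

gen = naturals()
first = list(itertools.islice(gen, 3))
# islice consumed 0, 1 and 2 from gen, so the next slice continues from 3.
second = list(itertools.islice(gen, 0, 6, 2))

print(first)   # [0, 1, 2]
print(second)  # [3, 5, 7]
```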

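Mapping generators

The functions in this group apply an operation to every element of one or more iterables and return a generator over the mapped results.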

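map(func, *iterables)

map is one of Python's most commonly used built-in generators. It passes each element of the iterable to func and yields the results. With n iterables, func must accept n arguments. The timeout and chunksize parameters that sometimes appear in a map signature belong to concurrent.futures.Executor.map, not to the built-in map, and are not covered here.

print(list(map(lambda x: x.upper(), word)))
print([x.upper() for x in word])
## output: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K']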

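The map call on the first line converts every element of word to upper case, much like the list comprehension on the second line.

print(list(map(lambda x, y: (x, y), word, word)))
print(list(zip(word, word)))
## output: [('a', 'a'), ('b', 'b'), ('c', 'c') ... ('k', 'k')]

When two iterables are passed, func must accept two arguments. The first line behaves much like zip.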

itertools.starmap(function, iterable)

When the elements of iterable are themselves iterables, map forces the function to unpack each element itself. starmap instead calls function(*item), so the unpacking happens in the function's parameter list. For example, to turn the sequence [(2, 5), (3, 2), (10, 3)] into the sequence of tuple sums [7, 5, 13], map needs a wrapper function, whereas starmap only needs operator.add: the two values unpacked from each tuple are passed to add as its arguments.

from operator import add
print(list(map(lambda x: add(x[0], x[1]), [(2, 5), (3, 2), (10, 3)])))
print(list(itertools.starmap(add, [(2, 5), (3, 2), (10, 3)])))
## output: [7, 5, 13]
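Put differently, starmap(function, iterable) behaves like a generator expression that unpacks each element with `*`. A minimal sketch using the built-in pow and the same sample pairs:

```python
import itertools

pairs = [(2, 5), (3, 2), (10, 3)]

by_starmap = list(itertools.starmap(pow, pairs))
by_genexp = list(pow(*args) for args in pairs)  # the unpacking starmap does for us

print(by_starmap)               # [32, 9, 1000]
print(by_starmap == by_genexp)  # True
```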

enumerate(iterable, start=0)

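enumerate is another common generator function. Its main use is to supply the index in a for-in loop. If start is given, the index begins at start and increases by 1 at each step.

for i, c in enumerate(word, 2):
    print(i, c)
## output: 2 a, 3 b, ... 12 k, one pair per line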

itertools.accumulate(iterable[, func])

accumulate yields running results computed with func, which defaults to operator.add. The examples below illustrate this.

from operator import mul

sample = [1, 2, 3, 4, 5]
print(list(itertools.accumulate(sample)))
## output: [1, 3, 6, 10, 15]
print(list(itertools.accumulate(sample, mul)))
## output: [1, 2, 6, 24, 120]
print(list(itertools.accumulate(sample, min)))
## output: [1, 1, 1, 1, 1]
print(list(itertools.accumulate(sample, max)))
## output: [1, 2, 3, 4, 5]
# Running average: the cumulative sum divided by the number of elements so far.
print(list(itertools.starmap(lambda x, y: y/x,
                             enumerate(itertools.accumulate(sample), 1))))
## output: [1.0, 1.5, 2.0, 2.5, 3.0]
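On Python 3.8 and later, accumulate also accepts a keyword-only `initial` argument that seeds the running total. The sketch below assumes that version; the variable names and figures are invented for illustration.

```python
import itertools

deposits = [100, -30, 45, -20]  # illustrative data only
opening_balance = 500           # illustrative starting value

# With initial=..., the output starts with the seed and has one extra element.
balances = list(itertools.accumulate(deposits, initial=opening_balance))
print(balances)  # [500, 600, 570, 615, 595]
```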

Merging generators

Merging generators take several iterables and combine them into a new generator.

itertools.chain(*iterables)

chain takes several iterables and yields their elements one iterable after another, in order.

print(list(itertools.chain(range(3), range(3, 7))))
## output: [0, 1, 2, 3, 4, 5, 6]

itertools.chain.from_iterable(iterable)

chain.from_iterable takes a single iterable whose elements are themselves iterables and yields the elements of each one in order. Each call removes exactly one level of nesting, so applying it repeatedly flattens nested iterables layer by layer.

a, b = [1, 2], [3, 4]
iterable = [[a, b], [a, b]]
print(iterable)
new_iterable = list(itertools.chain.from_iterable(iterable))
print(new_iterable)
print(list(itertools.chain.from_iterable(new_iterable)))
## output:
## [[[1, 2], [3, 4]], [[1, 2], [3, 4]]]
## [[1, 2], [3, 4], [1, 2], [3, 4]]
## [1, 2, 3, 4, 1, 2, 3, 4]

zip(*iterables)

zip takes several iterables, draws one element from each to form a tuple, and returns a generator over those tuples.

iterable1 = "abcd"
iterable2 = [1, 2, 3]
iterable3 = [10, 20, 30, 40]
print(list(zip(iterable1, iterable2, iterable3)))
## output:
## [('a', 1, 10), ('b', 2, 20), ('c', 3, 30)]

If the iterables differ in length, zip stops as soon as the shortest one is exhausted.
zip can also be used to build a dictionary.

keys = "abc"
values = [1, 2, 3]
print(dict(zip(keys, values)))
## output: {'a': 1, 'b': 2, 'c': 3}
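zip can also undo itself: unpacking a list of pairs with `*` and zipping the result splits the pairs back into columns. A small sketch with made-up data:

```python
pairs = [("a", 1), ("b", 2), ("c", 3)]  # illustrative data only

# zip(*pairs) is zip(("a", 1), ("b", 2), ("c", 3)): it regroups by position.
letters, numbers = zip(*pairs)
print(letters)  # ('a', 'b', 'c')
print(numbers)  # (1, 2, 3)

# Round trip: zipping the columns again restores the original pairs.
print(list(zip(letters, numbers)) == pairs)  # True
```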

itertools.zip_longest(*iterables, fillvalue=None)

zip_longest works like zip. Where zip stops as soon as any iterable is exhausted, zip_longest keeps going until the longest one is exhausted and substitutes fillvalue for the iterables that have run out.

iterable1 = "abcd"
iterable2 = [1, 2, 3]
iterable3 = [10, 20, 30, 40]
print(list(itertools.zip_longest(iterable1, iterable2, iterable3, fillvalue=0)))
## output: [('a', 1, 10), ('b', 2, 20), ('c', 3, 30), ('d', 0, 40)]

itertools.product(*iterables, repeat=1)

product computes the Cartesian product of its iterables, much like nested loops in a generator expression: product(a, b) is equivalent to ((x, y) for x in a for y in b).
repeat repeats the whole list of iterables: product(a, b, repeat=2) is equivalent to product(a, b, a, b).

a = (0, 1)
b = (2, 3)
print(list(itertools.product(a, b)))
print(list(itertools.product(a, repeat=2)))
## output:
## [(0, 2), (0, 3), (1, 2), (1, 3)]
## [(0, 0), (0, 1), (1, 0), (1, 1)]
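Both equivalences stated above can be checked directly. The sketch below reuses the same a and b.

```python
import itertools

a = (0, 1)
b = (2, 3)

# product(a, b) versus the nested-loop comprehension
nested = [(x, y) for x in a for y in b]
print(list(itertools.product(a, b)) == nested)  # True

# repeat=2 versus writing the iterables out twice
repeated = list(itertools.product(a, b, repeat=2))
expanded = list(itertools.product(a, b, a, b))
print(repeated == expanded)  # True
print(len(repeated))         # 16
```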

Expanding generators

Expanding generators take a single object and expand it into a generator of more elements.

itertools.repeat(object[, times])

repeat takes an object (it need not be iterable) and yields it the number of times given by the optional times argument. If times is omitted, the generator is infinite.

print(list(itertools.repeat(1, 3)))
print(list(itertools.repeat((1, 2), 3)))
print(list(zip(range(1, 4), itertools.repeat("a"))))
print([1, 2] * 3)
"""output:
[1, 1, 1]
[(1, 2), (1, 2), (1, 2)]
[(1, 'a'), (2, 'a'), (3, 'a')]
[1, 2, 1, 2, 1, 2]
"""

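Note the difference between repeat() and list multiplication: repeat yields the same object again and again, while list multiplication concatenates copies of the list's contents. Combining repeat with itertools.chain.from_iterable, described above, reproduces list multiplication.

lst = [1, 2, 3]
g = itertools.repeat(lst, 3)
print(list(itertools.chain.from_iterable(g)))
print(lst * 3)
"""output
[1, 2, 3, 1, 2, 3, 1, 2, 3]
[1, 2, 3, 1, 2, 3, 1, 2, 3]
"""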

itertools.cycle(iterable)

cycle joins the end of iterable back to its start and yields its elements in an endless loop.

# cycle("ABCD") --> A B C D A B C D A B C D ...
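Since cycle never ends, it has to be bounded by something else, such as islice or a finite partner in zip. A runnable sketch follows; the task and worker names are hypothetical.

```python
import itertools

# Bound the infinite cycle with islice.
first_ten = list(itertools.islice(itertools.cycle("ABCD"), 10))
print(first_ten)
# ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D', 'A', 'B']

# Round-robin assignment: zip stops when the finite list of tasks runs out.
tasks = ["t1", "t2", "t3", "t4", "t5"]  # hypothetical task names
workers = ["w1", "w2"]                  # hypothetical worker names
assignment = list(zip(tasks, itertools.cycle(workers)))
print(assignment)
# [('t1', 'w1'), ('t2', 'w2'), ('t3', 'w1'), ('t4', 'w2'), ('t5', 'w1')]
```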

itertools.count(start=0, step=1)

count is a counter. start and step may be floating-point numbers, as the example shows.

g = itertools.count(1.2, 2.5)
print(next(g))
print(next(g))
print(next(g))
"""output:
1.2
3.7
6.2
"""

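The enumerate generator function described above can be implemented with map and count.

for i, v in map(lambda x, y: (x, y), itertools.count(), range(3, 10)):
    print(i, v)

Changing the arguments to count makes the index i more flexible.
Python's range(start, stop[, step]) produces a sequence, but its arguments must be integers. count makes it easy to write a range-like generator that accepts floating-point numbers.

def range_new(start, stop, step):
    # Like range(), but the arguments may be floats (positive step only).
    for i in itertools.count(start, step):
        if i >= stop:
            break
        yield i

print(list(range_new(1, 5.5, 1.5)))
## output: [1, 2.5, 4.0]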
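One caveat: count adds step repeatedly, so floating-point rounding error can build up over many steps. The itertools documentation suggests computing each value multiplicatively as start + step * i instead. The sketch below applies that idea; the name `frange` is a hypothetical helper, and it assumes a positive step.

```python
import itertools

def frange(start, stop, step):
    """Float-friendly range for a positive step (hypothetical helper)."""
    for i in itertools.count():
        value = start + step * i  # one multiplication per value, no running sum
        if value >= stop:
            break
        yield value

print(list(frange(1, 5.5, 1.5)))  # [1.0, 2.5, 4.0]
print(list(frange(0, 1, 0.25)))   # [0.0, 0.25, 0.5, 0.75]
```

Unlike range_new, every value here is a float, including the first one.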

Permutation and combination generators

The following three functions generate permutations and combinations of an iterable.
itertools.combinations(iterable, r)
Combinations without repetition

print(list(itertools.combinations("ABC", 1)))
print(list(itertools.combinations("ABC", 2)))
print(list(itertools.combinations("ABC", 3)))
"""output:
[('A',), ('B',), ('C',)]
[('A', 'B'), ('A', 'C'), ('B', 'C')]
[('A', 'B', 'C')]
"""

itertools.combinations_with_replacement(iterable, r)
Combinations with repetition

print(list(itertools.combinations_with_replacement("ABC", 1)))
print(list(itertools.combinations_with_replacement("ABC", 2)))
print(list(itertools.combinations_with_replacement("ABC", 3)))
"""output:
[('A',), ('B',), ('C',)]
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
[('A', 'A', 'A'), ('A', 'A', 'B'), ('A', 'A', 'C'), ('A', 'B', 'B'), ('A', 'B', 'C'), ('A', 'C', 'C'), ('B', 'B', 'B'), ('B', 'B', 'C'), ('B', 'C', 'C'), ('C', 'C', 'C')]
"""

itertools.permutations(iterable, r=None)
Permutations of length r (of the whole iterable when r is None)

print(list(itertools.permutations("ABC", 1)))
print(list(itertools.permutations("ABC", 2)))
print(list(itertools.permutations("ABC", 3)))
"""output:
[('A',), ('B',), ('C',)]
[('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
[('A', 'B', 'C'), ('A', 'C', 'B'), ('B', 'A', 'C'), ('B', 'C', 'A'), ('C', 'A', 'B'), ('C', 'B', 'A')]
"""

For comparison, here is itertools.product(*iterables, repeat=1):

print(list(itertools.product("ABC", repeat=1)))
print(list(itertools.product("ABC", repeat=2)))
"""output:
[('A',), ('B',), ('C',)]
[('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'B'), ('B', 'C'), ('C', 'A'), ('C', 'B'), ('C', 'C')]
"""

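The four functions differ in whether order matters and whether an element may be reused, which shows up in how many tuples each one yields. The sketch below counts them for n = 3 and r = 2 and checks the counts against the usual closed formulas. It assumes Python 3.8 or later for math.comb and math.perm.

```python
import itertools
import math

items = "ABC"
n, r = len(items), 2

counts = {
    "combinations": len(list(itertools.combinations(items, r))),
    "combinations_with_replacement": len(
        list(itertools.combinations_with_replacement(items, r))),
    "permutations": len(list(itertools.permutations(items, r))),
    "product": len(list(itertools.product(items, repeat=r))),
}
print(counts)

# Order ignored, no reuse: C(n, r)
print(counts["combinations"] == math.comb(n, r))                           # True
# Order ignored, reuse allowed: C(n + r - 1, r)
print(counts["combinations_with_replacement"] == math.comb(n + r - 1, r))  # True
# Order matters, no reuse: P(n, r)
print(counts["permutations"] == math.perm(n, r))                           # True
# Order matters, reuse allowed: n ** r
print(counts["product"] == n ** r)                                         # True
```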
Rearranging generators

The generators in this group rearrange the iterable they are given and yield all of its elements.

itertools.groupby(iterable, key=None)

groupby splits iterable into groups by key and yields (key, group) tuples. It scans the sequence for runs of consecutive elements that are equal (or for which the key function returns equal values). On each iteration it yields the key value together with an iterator over all the elements of that run.

from operator import itemgetter

g = itertools.groupby("LLLLAAGGG")
for char, group in g:
    print(char, "->", list(group))
"""output:
L -> ['L', 'L', 'L', 'L']
A -> ['A', 'A']
G -> ['G', 'G', 'G']
"""
rows = [
    {"address": "5412 N CLARK", "date": "07/01/2012"},
    {"address": "5148 N CLARK", "date": "07/04/2012"},
    {"address": "5800 E 58TH", "date": "07/02/2012"},
    {"address": "2122 N CLARK", "date": "07/03/2012"},
    {"address": "5645 N RAVENSWOOD", "date": "07/02/2012"},
    {"address": "1060 W ADDISON", "date": "07/02/2012"},
    {"address": "4801 N BROADWAY", "date": "07/01/2012"},
    {"address": "1039 W GRANVILLE", "date": "07/04/2012"},
]
# groupby only merges adjacent rows, so sort by the grouping key first.
rows.sort(key=itemgetter("date"))
g = itertools.groupby(rows, itemgetter("date"))
for date, group in g:
    print(date, "->", list(group))
"""output:
07/01/2012 -> [{'address': '5412 N CLARK', 'date': '07/01/2012'}, {'address': '4801 N BROADWAY', 'date': '07/01/2012'}]
07/02/2012 -> [{'address': '5800 E 58TH', 'date': '07/02/2012'}, {'address': '5645 N RAVENSWOOD', 'date': '07/02/2012'}, {'address': '1060 W ADDISON', 'date': '07/02/2012'}]
07/03/2012 -> [{'address': '2122 N CLARK', 'date': '07/03/2012'}]
07/04/2012 -> [{'address': '5148 N CLARK', 'date': '07/04/2012'}, {'address': '1039 W GRANVILLE', 'date': '07/04/2012'}]
"""

groupby() only examines consecutive elements, so the data must be sorted by the grouping field before the call.
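A short sketch of what goes wrong without the sort: the same key shows up more than once, because each run of equal elements becomes its own group. The string `data` is made up for illustration.

```python
import itertools

data = "AABBA"  # illustrative data only

unsorted_keys = [key for key, group in itertools.groupby(data)]
sorted_keys = [key for key, group in itertools.groupby(sorted(data))]
print(unsorted_keys)  # ['A', 'B', 'A']
print(sorted_keys)    # ['A', 'B']

# The run-based behaviour is useful in its own right, e.g. run-length encoding.
run_lengths = [(key, len(list(group))) for key, group in itertools.groupby(data)]
print(run_lengths)    # [('A', 2), ('B', 2), ('A', 1)]
```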

reversed(seq)

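reversed takes a sequence (an object that implements the sequence protocol and therefore has a known length) and returns an iterator over its elements in reverse order.

print(list(reversed(range(5))))
## output: [4, 3, 2, 1, 0]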

itertools.tee(iterable, n=2)

tee returns n independent iterators over a single iterable.

g1, g2 = itertools.tee("ABC")
print(next(g1), next(g2))
print(next(g1), next(g2))
print(list(zip(*itertools.tee("ABC"))))
"""output
A A
B B
[('A', 'A'), ('B', 'B'), ('C', 'C')]
"""

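A common use of tee is to walk an iterable in overlapping pairs: make two iterators, advance one of them by a step, then zip them. The sketch below follows that well-known recipe. Python 3.10 and later also ship a built-in itertools.pairwise, so the helper here is only for illustration.

```python
import itertools

def pairwise(iterable):
    """Yield overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = itertools.tee(iterable)
    next(b, None)  # advance the second iterator by one step
    return zip(a, b)

pairs = list(pairwise("ABCD"))
print(pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

Once tee has been called, the original iterator should not be advanced anywhere else, or the tee iterators will miss the values it consumed.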
Reducing generators

These take an iterable and reduce it to a single value.

functools.reduce(function, iterable[, initializer])

function is a function of two arguments, function(x, y). reduce passes the previous return value of function as x and the next value from iterable as y. x starts out as initializer; when no initializer is given, it starts as the first element of iterable. The loop runs until iterable is exhausted, and the final value is returned.
The basic implementation of reduce is roughly the following:

def reduce(function, iterable, initializer=None):
    it = iter(iterable)
    if initializer is None:
        value = next(it)
    else:
        value = initializer
    for element in it:
        value = function(value, element)
    return value

import functools
from operator import add

print(functools.reduce(add, [1, 2, 3, 4, 5]))
## output: 15
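The initializer matters most when the iterable may be empty. As a small sketch: with an initializer, functools.reduce simply returns it for an empty input; without one, it raises TypeError.

```python
import functools
from operator import add

empty_sum = functools.reduce(add, [], 0)            # initializer returned as-is
seeded_sum = functools.reduce(add, [1, 2, 3], 10)   # ((10 + 1) + 2) + 3
print(empty_sum)   # 0
print(seeded_sum)  # 16

error = None
try:
    functools.reduce(add, [])  # no initializer and nothing to start from
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```

Passing an explicit initializer is therefore the safer choice whenever the input might be empty.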

The familiar min and max functions can both be implemented with reduce.

def min_reduce(iterable):
    return functools.reduce(lambda x, y: x if x < y else y, iterable)
def max_reduce(iterable):
    return functools.reduce(lambda x, y: x if x > y else y, iterable)
print(min_reduce([4, 6, 6, 78]))
print(max_reduce([4, 6, 6, 78]))
"""output
4
78
"""

any and all work on a similar principle and are not covered further here.
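For completeness, here is a minimal sketch of that idea. The names `any_reduce` and `all_reduce` are hypothetical helpers. One difference from the built-ins: any and all stop at the first decisive element, while these reduce versions always consume the whole iterable.

```python
import functools

def any_reduce(iterable):
    # Start from False; becomes True once any element is truthy.
    return functools.reduce(lambda x, y: x or bool(y), iterable, False)

def all_reduce(iterable):
    # Start from True; becomes False once any element is falsy.
    return functools.reduce(lambda x, y: x and bool(y), iterable, True)

print(any_reduce([0, "", 3]))          # True
print(all_reduce([1, "a", 0]))         # False
print(any_reduce([]), all_reduce([]))  # False True
```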

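Summary

This article has walked through some commonly used generators in the Python standard library, grouped by category. Pick the tool that fits the situation, and combine them freely.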

Related links

Iterators and generators in Python 3


Copyright belongs to the author. Please do not reproduce this article without permission.

When reposting, please cite the original address: https://www.ucloud.cn/yun/42621.html
