UU Blog

Word Frequency Counting in Python

#!/usr/bin/env python
#coding:utf-8

from random import randint

"""
Word frequency counting
Count how many times each item occurs and present the result as a dict
"""

data = [randint(0,10) for _ in xrange(0,20)]

print data

## Conventional approach
# Build a dict from the keys, with every count initialized to 0
ddata = dict.fromkeys(data,0)

print ddata
# Iterate over the data and increment the count for each item
for x in data:
    ddata[x] += 1

print ddata

## Using Counter from collections

from collections import Counter

ddata2 = Counter(data)

# The 4 most common items
print ddata2.most_common(4)

Output:

[1, 0, 5, 6, 6, 4, 8, 7, 3, 1, 2, 10, 9, 5, 9, 3, 10, 9, 5, 4]
{0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0, 10: 0}
{0: 1, 1: 2, 2: 1, 3: 2, 4: 2, 5: 3, 6: 2, 7: 1, 8: 1, 9: 3, 10: 2}
[(5, 3), (9, 3), (1, 2), (3, 2)]
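
The listing above counts random integers. As a rough sketch of how the same Counter approach might apply to actual words, the example below splits a string into words and counts them. The sample text, the `word_counts` helper, and the regular expression used for tokenizing are all assumptions made for illustration and are not part of the original post. It is written for Python 3, though it should also run on Python 2.7.

```python
import re
from collections import Counter

# Hypothetical sample text, made up for this illustration
text = "the quick brown fox jumps over the lazy dog the fox"


def word_counts(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # Assumption: a "word" is a run of ASCII letters or apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


counts = word_counts(text)
print(counts["the"])          # 3
print(counts["fox"])          # 2
print(counts["cat"])          # 0, a Counter returns 0 for missing keys
print(counts.most_common(1))  # [('the', 3)]
```

Unlike the `dict.fromkeys` version, Counter needs no pre-filled keys, and looking up a word that never appeared returns 0 instead of raising a `KeyError`.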