Counting how often high-frequency words occur in an article

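Using Python, count how many times each word occurs in an English article to find the words that occur most often or very rarely. To get a clearer view of the article's main idea, words that carry little meaning, such as the definite article and adverbs, can be excluded.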
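
As a minimal sketch of the idea, the snippet below counts words in an in-memory string with `collections.Counter` instead of reading a file. The sample text, the `STOPWORDS` set and the `count_words` helper are illustrative assumptions and are not part of the script that follows.

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration; extend it as needed.
STOPWORDS = {"the", "a", "an", "by", "is", "to", "if"}


def count_words(text, stopwords=STOPWORDS):
    """Return a Counter of the lowercase words in text, skipping stopwords."""
    # [a-z']+ keeps letters and apostrophes, so punctuation and digits are dropped.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


sample = "The cat sat. The cat ran to the mat, and the dog sat."
counts = count_words(sample)
print(counts.most_common(2))  # the two most frequent remaining words
```

`Counter.most_common(n)` returns exactly `n` pairs and does not add words tied with the n-th one, which is why the script handles ties by hand.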

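tokens = "\n-*,.!"  # punctuation characters (and newlines) to strip out
exclude = {"by", "is", "to", "if"}  # words to leave out of the count
# Open the article. It must sit in the same directory as this .py file;
# otherwise give the path to it.
with open("python.txt", "r") as fr:
    txt = fr.read()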
txt = txt.lower()
for i in tokens:
    txt = txt.replace(i, " ")  # replace every character in tokens with a space
wordlist = txt.split()  # split on runs of whitespace into a list of words
d = {}
for word in wordlist:
    if word not in exclude:  # skip the words in exclude
        d[word] = d.get(word, 0) + 1  # count every remaining word
toplist = list(d.items())  # list of (word, count) pairs
# The lambda picks the count out of each pair as the sort key.
# reverse=True sorts in descending order; reverse=False (the default) ascending.
toplist.sort(key=lambda x: x[1], reverse=True)

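top_n = 5  # change this value to print a different top x
count = 0
flag = 0
for i in toplist:  # print the x most frequent words
    if count < top_n:
        count += 1
        flag = i[1]  # remember the count of the last word printed
        print(i)
    elif i[1] == flag:  # also print words tied with the x-th one
        print(i)
    else:
        break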

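# The code below prints the x least frequent words; change bottom_n to adjust x,
# or comment this part out if it is not needed.
toplist.sort(key=lambda x: x[1], reverse=False)
bottom_n = 2
count = 0
flag = 0
for i in toplist:
    if count < bottom_n:
        count += 1
        flag = i[1]  # remember the count of the last word printed
        print(i)
    elif i[1] == flag:  # also print words tied with the x-th one
        print(i)
    else:
        break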
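
The two loops above differ only in sort order and cutoff, so the tie-aware selection could be wrapped in a single function. The sketch below is one possible way to do that; the helper name `top_with_ties` and the hand-made `demo` dictionary (standing in for `d`) are assumptions made for illustration.

```python
def top_with_ties(counts, n, reverse=True):
    """Return the first n (word, count) pairs by count, plus pairs tied with the n-th.

    reverse=True ranks the most frequent words first, reverse=False the least frequent.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=reverse)
    if n <= 0 or not ranked:
        return []
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # count of the n-th pair
    result = ranked[:n]
    for pair in ranked[n:]:
        if pair[1] != cutoff:  # ties are contiguous after sorting, so stop here
            break
        result.append(pair)
    return result


# Hypothetical counts used only to demonstrate the helper.
demo = {"python": 4, "code": 3, "list": 3, "loop": 1, "file": 1}
print(top_with_ties(demo, 2))                 # most frequent, with ties
print(top_with_ties(demo, 1, reverse=False))  # least frequent, with ties
```

With this helper, x is passed once as `n` instead of being edited in several places.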