
Dataset: College Graduate Income

Aklman


Download link. This article focuses mainly on plotting bar charts of the data.

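1. Field Descriptions

| Field | Type | Description |
|---|---|---|
| Major_code | integer | Major code. |
| Major | string | Name of the major. |
| Major_category | string | Category the major belongs to. |
| Total | integer | Total number of people. |
| Employed | integer | Number employed. |
| Employed_full_time_year_round | integer | Number employed full-time, year-round. |
| Unemployed | integer | Number unemployed. |
| Unemployment_rate | float | Unemployment rate. |
| Median | integer | Median income. |
| P25th | integer | 25th percentile of income. |
| P75th | float | 75th percentile of income. |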
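Several of these fields are related. Assuming, as is common in labor statistics, that `Unemployment_rate` is defined as `Unemployed / (Employed + Unemployed)` (an assumption; the field table does not state the formula), a minimal sketch can recompute the rate and report rows that disagree with the stored value. The function name `check_unemployment_rate` is hypothetical, and the sample rows are made up.

```python
import pandas as pd


def check_unemployment_rate(df: pd.DataFrame, tol: float = 1e-6) -> pd.DataFrame:
    """Return rows whose stored Unemployment_rate differs from a recomputed value.

    Assumes (hypothetically) Unemployment_rate = Unemployed / (Employed + Unemployed).
    """
    labor_force = df["Employed"] + df["Unemployed"]
    recomputed = df["Unemployed"] / labor_force
    mismatch = (recomputed - df["Unemployment_rate"]).abs() > tol
    result = df.loc[mismatch, ["Major", "Unemployment_rate"]].copy()
    result["recomputed"] = recomputed[mismatch]
    return result


# Tiny made-up sample (not rows from the real dataset)
sample = pd.DataFrame({
    "Major": ["A", "B"],
    "Employed": [90, 45],
    "Unemployed": [10, 5],
    "Unemployment_rate": [0.10, 0.20],  # B is deliberately wrong: 5 / 50 = 0.10
})
print(check_unemployment_rate(sample))
```

If the real data returns an empty frame, the assumed definition matches; otherwise the flagged rows show where it does not.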

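2. Data Preprocessing

2.1 Imports

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os
import warnings

warnings.filterwarnings("ignore")
```

2.2 Reading the Data

```python
# "大学毕业生收入数据集" means "college graduate income dataset"
df = pd.read_csv("大学毕业生收入数据集.csv")
```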
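If the CSV uses a different encoding or lacks a column, later steps fail with confusing errors. Here is a minimal sketch that loads the file and fails early when any of the 11 columns from section 1 is missing. It assumes the file is UTF-8 encoded (an assumption; change `encoding` if not). `load_majors` and `EXPECTED_COLUMNS` are hypothetical names.

```python
import io

import pandas as pd

# The 11 columns described in section 1
EXPECTED_COLUMNS = [
    "Major_code", "Major", "Major_category", "Total", "Employed",
    "Employed_full_time_year_round", "Unemployed", "Unemployment_rate",
    "Median", "P25th", "P75th",
]


def load_majors(source, encoding: str = "utf-8") -> pd.DataFrame:
    """Read the CSV (path or file-like object) and fail early if expected columns are missing."""
    df = pd.read_csv(source, encoding=encoding)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Usage with an in-memory one-row CSV (made-up values)
csv_text = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    + "1100,SAMPLE MAJOR,Engineering,100,80,60,5,0.0588,50000,34000,80000.0\n"
)
demo = load_majors(io.StringIO(csv_text))
print(demo.shape)  # (1, 11)
```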

3. Data Preview

3.1 Previewing the Data

```python
print(df.head())
```

Output

```
   Major_code                                  Major  ...  P25th    P75th
0        1100                    GENERAL AGRICULTURE  ...  34000  80000.0
1        1101  AGRICULTURE PRODUCTION AND MANAGEMENT  ...  36000  80000.0
2        1102                 AGRICULTURAL ECONOMICS  ...  40000  98000.0
3        1103                        ANIMAL SCIENCES  ...  30000  72000.0
4        1104                           FOOD SCIENCE  ...  38500  90000.0
```

3.2 Basic Information

```python
df.info()
```

Output

```
RangeIndex: 173 entries, 0 to 172
Data columns (total 11 columns):
 #   Column                         Non-Null Count  Dtype
---  ------                         --------------  -----
 0   Major_code                     173 non-null    int64
 1   Major                          173 non-null    object
 2   Major_category                 173 non-null    object
 3   Total                          173 non-null    int64
 4   Employed                       173 non-null    int64
 5   Employed_full_time_year_round  173 non-null    int64
 6   Unemployed                     173 non-null    int64
 7   Unemployment_rate              173 non-null    float64
 8   Median                         173 non-null    int64
 9   P25th                          173 non-null    int64
 10  P75th                          173 non-null    float64
dtypes: float64(2), int64(7), object(2)
```

3.3 Checking for Duplicate Rows

```python
print(df.duplicated().sum())
```

Output

```
0
```
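`df.duplicated()` only flags rows that are identical in every column, so two rows sharing a `Major_code` but differing in any count would not be caught. The sketch below is a stricter check on the key column. It assumes `Major_code` should uniquely identify a major (an assumption worth confirming), and the sample data is made up.

```python
import pandas as pd

# Made-up sample: rows 0 and 2 share a Major_code but differ in Total
sample = pd.DataFrame({
    "Major_code": [1100, 1101, 1100],
    "Total": [100, 200, 101],
})

full_row_dups = sample.duplicated().sum()                  # identical rows only
key_dups = sample.duplicated(subset=["Major_code"]).sum()  # repeated codes
print(full_row_dups, key_dups)  # 0 1

# keep=False marks every member of a duplicated group, which helps inspection
print(sample[sample.duplicated(subset=["Major_code"], keep=False)])
```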

3.4 Checking for Missing Values

```python
print(df.isnull().sum())
```

Output

```
Major_code                       0
Major                            0
Major_category                   0
Total                            0
Employed                         0
Employed_full_time_year_round    0
Unemployed                       0
Unemployment_rate                0
Median                           0
P25th                            0
P75th                            0
dtype: int64
```

4. Descriptive Statistics

```python
describe = df.describe()
print(describe)
```

Output

```
        Major_code         Total  ...         P25th          P75th
count   173.000000  1.730000e+02  ...    173.000000     173.000000
mean   3879.815029  2.302566e+05  ...  38697.109827   82506.358382
std    1687.753140  4.220685e+05  ...   9414.524761   20805.330126
min    1100.000000  2.396000e+03  ...  24900.000000   45800.000000
25%    2403.000000  2.428000e+04  ...  32000.000000   70000.000000
50%    3608.000000  7.579100e+04  ...  36000.000000   80000.000000
75%    5503.000000  2.057630e+05  ...  42000.000000   95000.000000
max    6403.000000  3.123510e+06  ...  78000.000000  210000.000000

[8 rows x 9 columns]
```

The printed table is truncated; the full `describe` DataFrame can be inspected in the IDE's variable viewer.
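Another way to see every column is to lift pandas' display limit temporarily with `pd.option_context`, or to call `to_string()`, which renders all columns without truncation. Here is a minimal sketch on made-up data:

```python
import pandas as pd

# Made-up sample with some of the dataset's numeric columns
sample = pd.DataFrame({
    "Total": [100, 200, 300],
    "Median": [40000, 50000, 60000],
    "P25th": [30000, 35000, 40000],
    "P75th": [70000.0, 80000.0, 90000.0],
})

# Temporarily show all columns and widen the console output
with pd.option_context("display.max_columns", None, "display.width", 200):
    print(sample.describe())

# Or render the whole table as a string, which is not column-truncated
full_text = sample.describe().to_string()
```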

5. Data Analysis

5.1 Number of Majors in Each Category (Major_category)

```python
# Number of majors in each category, sorted from most to fewest
Major_category_counts = df["Major_category"].value_counts()
print(Major_category_counts)

# Category names in the same order as the counts, reused as x-tick labels
interval = Major_category_counts.index.tolist()

x = range(1, len(interval) + 1)
rects = plt.bar(x, Major_category_counts)
for rect in rects:  # rects holds one bar per category (16 in total)
    height = rect.get_height()
    # Write the count just above the center of each bar
    plt.text(rect.get_x() + rect.get_width() / 2, height, str(int(height)),
             size=12, ha="center", va="bottom")
plt.xticks(x, interval, rotation=90)
plt.title("Number of Branches by Major Category")
plt.ylabel("Counts")
plt.tight_layout()  # keep the rotated category labels from being clipped
plt.show()
```

Output

```
Engineering                            29
Education                              16
Humanities & Liberal Arts              15
Biology & Life Science                 14
Business                               13
Health                                 12
Computers & Mathematics                11
Agriculture & Natural Resources        10
Physical Sciences                      10
Social Science                          9
Psychology & Social Work                9
Arts                                    8
Industrial Arts & Consumer Services     7
Law & Public Policy                     5
Communications & Journalism             4
Interdisciplinary                       1
Name: Major_category, dtype: int64
```

[Figure: Number of Branches by Major Category]

Conclusion
Engineering has the most majors (29), nearly twice as many as the next category, Education (16). This is likely because engineering is a long-established discipline that has split into many specialized branches.
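The manual `plt.text` loop in 5.1 writes a count above each bar. Assuming matplotlib 3.4 or later (a version assumption), `Axes.bar_label` does the same in one call. Here is a minimal sketch with made-up counts. It uses the non-interactive Agg backend so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line to show windows interactively
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts standing in for Major_category_counts
counts = pd.Series({"Engineering": 29, "Education": 16, "Interdisciplinary": 1})

fig, ax = plt.subplots()
bars = ax.bar(range(len(counts)), counts.to_numpy())
labels = ax.bar_label(bars)  # one text label per bar, placed at the bar's top edge
ax.set_xticks(range(len(counts)))
ax.set_xticklabels(counts.index, rotation=90)
ax.set_ylabel("Counts")
fig.tight_layout()
plt.close(fig)
```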

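5.2 Income by Major Category

```python
# Mean of the per-major median income within each category
# (each major counts equally), listed in the same order as `interval`
averageMoney = (
    df.groupby("Major_category")["Median"].mean().reindex(interval).tolist()
)
x = range(1, len(interval) + 1)
plt.bar(x, averageMoney)
plt.xticks(x, interval, rotation=90)
plt.title("Average Annual Salary by Major Category")
plt.ylabel("Median income")
plt.tight_layout()
plt.show()
```

[Figure: Average Annual Salary by Major Category]

Conclusion
Engineering has a relatively high average salary, plausibly because many of its majors feed into fields such as AI and automation. Computers & Mathematics also has strong prospects, but pay varies with the employer: it tends to be modest at small companies and higher at large ones (company size is not a field in this dataset).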
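The calculation in 5.2 gives every major equal weight, so a major with 2,396 people counts as much as one with over 3 million (the `Total` minimum and maximum from section 4). Weighting each major's `Median` by its `Total` is one alternative, not what the analysis above does. It gives a figure closer to the typical graduate in a category, though a weighted mean of medians is still not the true median of the pooled group. Here is a minimal sketch on made-up data:

```python
import pandas as pd

# Made-up sample: one small high-paying major, one large lower-paying major
sample = pd.DataFrame({
    "Major_category": ["Engineering", "Engineering", "Arts"],
    "Median": [90000, 60000, 40000],
    "Total": [1000, 9000, 5000],
})

# Equal weight per major (what 5.2 computes)
unweighted = sample.groupby("Major_category")["Median"].mean()

# Each major weighted by its number of graduates
sums = (
    sample.assign(weighted_income=sample["Median"] * sample["Total"])
    .groupby("Major_category")[["weighted_income", "Total"]]
    .sum()
)
weighted = sums["weighted_income"] / sums["Total"]
print(unweighted["Engineering"], weighted["Engineering"])  # 75000.0 63000.0
```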

5.3 Unemployment Rate by Major Category

```python
# Unweighted mean of the majors' unemployment rates within each category
averageUnemployRate = (
    df.groupby("Major_category")["Unemployment_rate"].mean().reindex(interval).tolist()
)
x = range(1, len(interval) + 1)
plt.bar(x, averageUnemployRate)
plt.xticks(x, interval, rotation=90)
plt.title("Average Unemployment Rate by Major Category")
plt.ylabel("Rate")
plt.tight_layout()
plt.show()
```

[Figure: Average Unemployment Rate by Major Category]

Conclusion
Arts has a relatively high unemployment rate, likely because work in the field is highly unstable and employers' requirements are comparatively demanding.

5.4 Employment Rate by Major Category

```python
# Employment rate of each major = Employed / Total, then averaged per category
employ_rate = df["Employed"] / df["Total"]
averageEmployRate = (
    employ_rate.groupby(df["Major_category"]).mean().reindex(interval).tolist()
)
x = range(1, len(interval) + 1)
plt.bar(x, averageEmployRate)
plt.xticks(x, interval, rotation=90)
plt.title("Average Employment Rate by Major Category")
plt.ylabel("Rate")
plt.tight_layout()
plt.show()
```

[Figure: Average Employment Rate by Major Category]

Conclusion
Computers & Mathematics has a relatively high employment rate, likely reflecting the strong growth prospects of computing.
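`Employed / Total`, used as the employment rate in 5.4, is not the same as `1 - Unemployment_rate`. That holds if `Total` also counts graduates who are neither employed nor looking for work, which is an assumption based on how such surveys usually define the population. The sketch below separates the ratios for one made-up major.

```python
# One made-up major: 100 people, 80 employed, 5 unemployed, 15 not in the labor force
major = {"Total": 100, "Employed": 80, "Unemployed": 5}

employment_to_population = major["Employed"] / major["Total"]  # 0.8, the ratio plotted in 5.4
labor_force = major["Employed"] + major["Unemployed"]          # 85
participation = labor_force / major["Total"]                   # 0.85
unemployment_rate = major["Unemployed"] / labor_force          # about 0.0588
print(employment_to_population, 1 - unemployment_rate)         # 0.8 vs about 0.941
```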

5.5 Full-Time, Year-Round Employment Rate by Major Category

```python
# Share of employed people working full-time, year-round, averaged per category
full_time_rate = df["Employed_full_time_year_round"] / df["Employed"]
averageFullTimeRate = (
    full_time_rate.groupby(df["Major_category"]).mean().reindex(interval).tolist()
)
x = range(1, len(interval) + 1)
plt.bar(x, averageFullTimeRate)
plt.xticks(x, interval, rotation=90)
plt.title("Average Full-Time Rate by Major Category")
plt.ylabel("Rate")
plt.tight_layout()
plt.show()
```

[Figure: Average Full-Time Rate by Major Category]
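The full-time rate divides by `Employed`. If a major had zero employed people, the division would give `inf` or `NaN` and distort the category mean. The real data may have no such row, so this guard is only defensive. Here is a minimal sketch on made-up data that turns zero denominators into `NaN`, which `groupby().mean()` then skips.

```python
import numpy as np
import pandas as pd

# Made-up sample; the second row has no employed respondents
sample = pd.DataFrame({
    "Major_category": ["Health", "Health", "Health"],
    "Employed": [100, 0, 50],
    "Employed_full_time_year_round": [70, 0, 40],
})

denominator = sample["Employed"].replace(0, np.nan)  # avoid dividing by zero
full_time_rate = sample["Employed_full_time_year_round"] / denominator
category_mean = full_time_rate.groupby(sample["Major_category"]).mean()  # NaN rows skipped
print(category_mean["Health"])  # about 0.75
```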

5.6 Number of People by Major Category (Average per Major)

```python
# Average Total per major within each category
averageNum = df.groupby("Major_category")["Total"].mean().reindex(interval).tolist()
x = range(1, len(interval) + 1)
plt.bar(x, averageNum)
plt.xticks(x, interval, rotation=90)
plt.title("Average Total Numbers by Major Category")
plt.ylabel("Counts")
plt.tight_layout()
plt.show()
```

[Figure: Average Total Numbers by Major Category]
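5.6 plots the average number of people per major. That average can look large for a category made of a few very big majors. Showing the count and the category total next to it adds context, and `groupby().agg` computes several statistics in one pass. Here is a minimal sketch on made-up data:

```python
import pandas as pd

# Made-up sample
sample = pd.DataFrame({
    "Major_category": ["Business", "Business", "Interdisciplinary"],
    "Total": [300000, 100000, 12000],
})

summary = sample.groupby("Major_category")["Total"].agg(["count", "mean", "sum"])
print(summary.sort_values("sum", ascending=False))
```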

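5.7 Employment-to-Unemployment Ratio

```python
# Ratio of the category averages computed in 5.4 and 5.3
EUratio = [e / u for e, u in zip(averageEmployRate, averageUnemployRate)]
x = range(1, len(interval) + 1)
plt.bar(x, EUratio)
plt.xticks(x, interval, rotation=90)
plt.title("Employment-Unemployment Ratio by Major Category")
plt.ylabel("Ratio")
plt.tight_layout()
plt.show()
```

[Figure: Employment-Unemployment Ratio by Major Category]

Conclusion
Agricultural jobs have a relatively low barrier to entry, so Agriculture & Natural Resources combines a high employment rate with a low unemployment rate, which gives it a high ratio.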
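The ratio in 5.7 divides two category-level averages. If those averages are kept as category-indexed Series, the division aligns on the labels rather than on position, and the result can be sorted for easier reading. Here is a minimal sketch; the values are made up and `employment_unemployment_ratio` is a hypothetical helper.

```python
import pandas as pd


def employment_unemployment_ratio(employ_rate: pd.Series, unemploy_rate: pd.Series) -> pd.Series:
    """Divide two category-indexed Series and sort the result from highest to lowest."""
    ratio = employ_rate / unemploy_rate  # aligned on category labels, not on position
    return ratio.sort_values(ascending=False)


# Made-up category averages, deliberately listed in different orders
employ = pd.Series({"Agriculture & Natural Resources": 0.75, "Arts": 0.70})
unemploy = pd.Series({"Arts": 0.09, "Agriculture & Natural Resources": 0.05})
print(employment_unemployment_ratio(employ, unemploy))
```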

6. Complete Code

The complete script is the code from sections 2 through 5, run in order. No column-renaming step is included, because the field names in this dataset are already clean.

Copyright belongs to the author. Do not reprint without permission.

When reprinting, please credit this article's address: https://www.ucloud.cn/yun/121287.html
