A Brief Introduce of Database Index（索引简介）

luqiuwen 发布于2019-08-06 14:01 / 2267人阅读

摘要：所以这篇文章希望帮助大家理解一下。我没有在算法上展开太多，因为很多算法相似，如果展开容易喧宾夺主。等过段时间我会加入一些实验数据和代码进这篇文章，最近比较懒不想安装数据库安装实在太烦了。

这是我写的一篇研究生申请的writing sample，关于数据库索引的，对关系型数据库的索引从数据结构到用途再到作用对象简单分析了一下，因为我发现在实际工作中，index是个好东西，但是很多DBA并不能找到合适的index使用，这样会使查询效率提高得不大，甚至影响查询效率。所以这篇文章希望帮助大家理解一下index。我没有在算法上展开太多，因为很多index算法相似，如果展开容易喧宾夺主。等过段时间我会加入一些实验数据和代码进这篇文章，最近比较懒不想安装数据库.DB2安装实在太烦了。

ABSTRACT

Index is an extremely important item for a database system. The purpose of this article is to explore the concept and function of the index, how the index is able to improve the speed of retrieve data significantly, and help database administrator distinct and choose the correct index in real life.

1. 1. How DBMS stores data on disk?

The relational database store data in the disk by pages, which is the minimum unit of storage. The page only stores three things: the data in the tables, indexes, and execution plans. No matter when the database receiving a query request, RDBMS must load the page to memory first. Once the page complete working, it won’t discard immediately. DBMS will move the page to buffer pool, which is cache table and index data from disk for next time.
As we know, data is stored on disk-based storage devices, and it is stored as blocks of data. The structure of disk block as much as linked list, both of them contain a data section and a pointer (or link) for the next block. The fact is multiple records can only be stored one field, if we use the linear search to retrieve a record will require N/2 block accesses on average. If the field doesn’t contain unique entries, then the entire time cost is Nm which means you have to search the entire disk [1].

2. What is index?

As we mentioned early, DBMS stores index in the page. Index is a data structure which could significantly improve the speed of data retrieval operations. Why the index improve the retrieve speed effectively? We have to talk about the index algorithm and data structure first [2].
Different DBMS provide different types of index, but majority database index designs exhibit logarithmic (O (log (N))) retrieval performance. Generally speaking, there are some different indexes based on the different data structures. Indexes could be implemented by different kinds of data structure such as balance trees, B+ trees hash table, R+ trees.

3. Index architecture

We can distinguish the index architecture by the rows’ order
1) Non-clustered index
In a non-clustered index, the physical order of the rows is not the same as the index order, which means the data located in the disk has the arbitrary order, but logical ordering is specified by the index order.
2) Clustered index
Clustered index will change the data block into a specific index order to match the index. As the result, the row data will be stored in the index order. Therefore, we can create only one clustered index in a specific database table.

4. Index Implementations

1) Bitmap Index

Bitmap index is designed by a B-tree data structure, a self-balancing binary search tree, to retrieve data from database. So many facts indicate bitmap index working perfect for low-cardinality columns, which include a meaningful quantity of distinct values, no matter absolutely or relative to the number of records which contain the data. [3] Bitmap index uses bit array, which segment only has two value, 0 and 1 (False or True). Due to the values of column are distinct, each value could be presented by 1 or 0, standing for the value included or not. Therefore, bit map index has a significant space and performance over other index structures for retrieving such data [4].

2) Dense index and sparse index

Dense index and sparse index are very similar. Dense index contains the search key value and pointer for each record in the data file, thus dense index record could be very large. While sparse index contains the search key value and pointer only for each data block in the data file, due to the data is sorted or ordered, so sparse index only need to point to the head of each data block. Relatively speaking, the sparse index record is smaller than dense index record [5].

3) Reverse index

Reverse index uses B-tree structure, which reverses the key value before inserting in the index. Reverse index is very effective for indexing sequence numbers data, because each key value is highly greater than the prior value. For instance, 29, 30, 31, all these three keys will be in the same block possible. If we use reverse index to query them as 92, 03, 13, they may be distributed at the different blocks. Due to the B+ tree structure, so we can query them faster than the sequence keys [6].
In addition, there are some other index methods such as R+ tree, which is used by Google map, to retrieve the new type data as times go by. Finally, it is good to research deeply about index to improve database performance.

Works Cited：

[1] Garcia-Molina, Hector, Jeffrey D. Ullman, and Jennifer Widom. Database Systems: The Complete Book. Upper Saddle River, NJ: Prentice Hall, 2002. Print.

[2] Fritchey, Grant. "Statistics, Data Distribution, and Cardinality." SQL Server Query Performance Tuning (2014): 193-235. Web.

[3] "Bitmap Index vs. B-tree Index: Which and When?" Bitmap Index vs. B-tree Index: Which and When? N.p., n.d. Web. 26 Nov. 2016.

[4] Fujioka, Kengo, Yukio Uematsu, and Makoto Onizuka. "Application of Bitmap Index to Information Retrieval." Proceeding of the 17th International Conference on World Wide Web - WWW "08 (2008): n. pag. Web.

[5] "Dense and Sparse Indices." Dense and Sparse Indices. http://www.cs.sfu.ca/CourseCe... Web. 30 Nov. 2016.

[6] "Introduction To Reverse Key Indexes: Part I." Richard Foote"s Oracle Blog. N.p., 2014. Web. 26 Nov. 2016.

超融合服务器服务器托管 Brief Introduce automation of com apache index

文章版权归作者所有，未经允许请勿转载,若此文章存在违规行为，您可以联系管理员删除。

转载请注明本文地址：https://www.ucloud.cn/yun/62293.html

A Brief Introduce of Database Index（索引简介）

摘要：所以这篇文章希望帮助大家理解一下。我没有在算法上展开太多，因为很多算法相似，如果展开容易喧宾夺主。等过段时间我会加入一些实验数据和代码进这篇文章，最近比较懒不想安装数据库安装实在太烦了。这是我写的一篇研究生申请的writing sample，关于数据库索引的，对关系型数据库的索引从数据结构到用途再到作用对象简单分析了一下，因为我发现在实际工作中，index是个好东西，但是很多DBA并...

marek 2019-06-25 18:23 评论0 收藏0
Neo4j索引笔记之SchemaIndex和LegacyIndex

摘要：从提供的接口来看，被称作的变量通常是指能够提供全文本检索的能力。这个功能并没有在中被提供，这也是版本保留的原因之一。 neo4j包含schema indexes 和 legacy indexes两种类型，两者理念不同且不可互换或兼容，实际应用中应明确检索需求后采用合适的索引。 schema index vs legacy index 参考neo4j index-confusion s...

_ipo 2019-07-25 12:11 评论0 收藏0
入门MySQL——基础语句篇

摘要：本文将从最基础的语句出发，为你展示出创建及修改不同对象应该使用的标准语句。自己也很少写这种系列的文章，打算多写几篇入门的基础文章，可能这几篇先后顺序也不太严谨，其他同学有什么好的想法，欢迎与我沟通哦前言：前面几篇文章，我们介绍了MySQL的基础概念及逻辑架构。相信你现在应该有了自己的一套MySQL环境，接下来我们就可以开始练习MySQL了。本文将从MySQL最基础的语句出发，为你...

lncwwn 2019-08-27 14:26 评论0 收藏0
保存数据到MySql数据库——我用scrapy写爬虫（二）

摘要：坦克大战上简介上的坦克大战相信大家都玩过有逃学玩坦克的可以自己默默的扣一个了我们现在长大了，学习游戏开发了。写在前面上一篇（https://www.tech1024.cn/origi... ）说了如何创建项目，并爬去网站内容，下面我们说一下如何保存爬去到的数据开始爬取创建Spider，上一篇我们已经创建了ImoocSpider，我们做一下修改，可以连续下一页爬取。scrapyD...

Kross 2019-07-30 14:53 评论0 收藏0
MongoDB学习笔记（1）- MongoDB简介、数据类型及帮助命令

摘要：数据模型取决于数据库类型。仅支持位浮点数，所以位整数会被自动转换为位浮点数。位浮点数中的数字都是这种类型。数字只能表示为双精度数位浮点数的另外一个问题是，有些位的整数并不能精确地表示为位浮点数。 MongoDB学习笔记（1）- MongoDB简介及数据类型本文所使用的MongoDB版本为 4.0.10 > db.version(); 4.0.10 一、MongoDB 介绍 1. Mo...

nihao 2019-08-26 10:27 评论0 收藏0