资讯专栏INFORMATION COLUMN

【04】把 Elasticsearch 当数据库使:按字段聚合

xfee / 3171人阅读

摘要:使用可以用进行的查询。最简单的聚合是把整个当作一个巨大的桶,然后去数这个桶里的数字。复杂一些的聚合我们先要能够把文档分到不同的桶里,然后去分别数每个桶的数字。

使用 https://github.com/taowen/es-monitor 可以用 SQL 进行 elasticsearch 的查询。最简单的聚合是把整个index当作一个巨大的bucket(桶),然后去数这个桶里的数字。复杂一些的聚合我们先要能够把文档分到不同的桶里,然后去分别数每个桶的数字。统计聚合需要 分桶 (GROUP BY XXX INTO BUCKET),也需要 指标(METRIC)

GROUP BY exchange => COUNT(*)

SQL

$ cat << EOF | ./es_query.py http://127.0.0.1:9200
select exchange, count(*) from symbol group by exchange                                                        
EOF
{"count(*)": 3240, "exchange": "nyse"}
{"count(*)": 3089, "exchange": "nasdaq"}
{"count(*)": 385, "exchange": "nyse mkt"}

Elasticsearch

{
  "aggs": {
    "exchange": {
      "terms": {
        "field": "exchange", 
        "size": 0
      }, 
      "aggs": {}
    }
  }, 
  "size": 0
}
{
  "hits": {
    "hits": [], 
    "total": 6714, 
    "max_score": 0.0
  }, 
  "_shards": {
    "successful": 1, 
    "failed": 0, 
    "total": 1
  }, 
  "took": 23, 
  "aggregations": {
    "exchange": {
      "buckets": [
        {
          "key": "nyse", 
          "doc_count": 3240
        }, 
        {
          "key": "nasdaq", 
          "doc_count": 3089
        }, 
        {
          "key": "nyse mkt", 
          "doc_count": 385
        }
      ], 
      "sum_other_doc_count": 0, 
      "doc_count_error_upper_bound": 0
    }
  }, 
  "timed_out": false
}

Profile

[
  {
    "query": [
      {
        "query_type": "MatchAllDocsQuery",
        "lucene": "*:*",
        "time": "0.2003150000ms",
        "breakdown": {
          "score": 0,
          "create_weight": 8029,
          "next_doc": 164079,
          "match": 0,
          "build_scorer": 28207,
          "advance": 0
        }
      }
    ],
    "rewrite_time": 2657,
    "collector": [
      {
        "name": "MultiCollector",
        "reason": "search_multi",
        "time": "1.245165000ms",
        "children": [
          {
            "name": "TotalHitCountCollector",
            "reason": "search_count",
            "time": "0.2540820000ms"
          },
          {
            "name": "GlobalOrdinalsStringTermsAggregator: [exchange]",
            "reason": "aggregation",
            "time": "0.3493420000ms"
          }
        ]
      }
    ]
  }
]

GlobalOrdinalsStringTermsAggregator 这个 GROUP BY 的方式非常快

GROUP BY ipo_year => MAX(market_cap)

MAX(market_cap)COUNT(*)的区别在于,COUNT(*)不需要添加额外的metric聚合,每个GROUP BY自带了doc_count的计算结果。
SQL

$ cat << EOF | ./es_query.py http://127.0.0.1:9200
select exchange, max(market_cap) from symbol group by exchange
EOF
{"max(market_cap)": 87022480321.0, "exchange": "nyse"}
{"max(market_cap)": 522690000000.0, "exchange": "nasdaq"}
{"max(market_cap)": 9586866593.0, "exchange": "nyse mkt"}

Elasticsearch

{
  "aggs": {
    "exchange": {
      "terms": {
        "field": "exchange", 
        "size": 0
      }, 
      "aggs": {
        "max(market_cap)": {
          "max": {
            "field": "market_cap"
          }
        }
      }
    }
  }, 
  "size": 0
}
{
  "hits": {
    "hits": [], 
    "total": 6714, 
    "max_score": 0.0
  }, 
  "_shards": {
    "successful": 1, 
    "failed": 0, 
    "total": 1
  }, 
  "took": 6, 
  "aggregations": {
    "exchange": {
      "buckets": [
        {
          "max(market_cap)": {
            "value": 87022480321.0
          }, 
          "key": "nyse", 
          "doc_count": 3240
        }, 
        {
          "max(market_cap)": {
            "value": 522690000000.0
          }, 
          "key": "nasdaq", 
          "doc_count": 3089
        }, 
        {
          "max(market_cap)": {
            "value": 9586866593.0
          }, 
          "key": "nyse mkt", 
          "doc_count": 385
        }
      ], 
      "sum_other_doc_count": 0, 
      "doc_count_error_upper_bound": 0
    }
  }, 
  "timed_out": false
}

Profile

[
  {
    "query": [
      {
        "query_type": "MatchAllDocsQuery",
        "lucene": "*:*",
        "time": "0.2007250000ms",
        "breakdown": {
          "score": 0,
          "create_weight": 8264,
          "next_doc": 164363,
          "match": 0,
          "build_scorer": 28098,
          "advance": 0
        }
      }
    ],
    "rewrite_time": 2678,
    "collector": [
      {
        "name": "MultiCollector",
        "reason": "search_multi",
        "time": "1.741733000ms",
        "children": [
          {
            "name": "TotalHitCountCollector",
            "reason": "search_count",
            "time": "0.2256530000ms"
          },
          {
            "name": "GlobalOrdinalsStringTermsAggregator: [exchange]",
            "reason": "aggregation",
            "time": "0.8099970000ms"
          }
        ]
      }
    ]
  }
]
下钻 GROUP BY exchange, sector

SQL里分桶的方式可以是多级的,先按exchange分,接着再按sector进一步细分

$ cat << EOF | ./es_query.py http://127.0.0.1:9200
select exchange, sector, max(market_cap) from symbol group by exchange, sector
EOF
{"sector": "n/a", "max(market_cap)": 1409695805.0, "exchange": "nyse"}
{"sector": "Consumer Services", "max(market_cap)": 46884855259.0, "exchange": "nyse"}
{"sector": "Finance", "max(market_cap)": 35827719192.0, "exchange": "nyse"}
{"sector": "Energy", "max(market_cap)": 86091463700.0, "exchange": "nyse"}
{"sector": "Public Utilities", "max(market_cap)": 24117360000.0, "exchange": "nyse"}
{"sector": "Basic Industries", "max(market_cap)": 27347618968.0, "exchange": "nyse"}
{"sector": "Capital Goods", "max(market_cap)": 82469796110.0, "exchange": "nyse"}
{"sector": "Technology", "max(market_cap)": 47882944000.0, "exchange": "nyse"}
{"sector": "Health Care", "max(market_cap)": 82041827564.0, "exchange": "nyse"}
{"sector": "Consumer Non-Durables", "max(market_cap)": 87022480321.0, "exchange": "nyse"}
{"sector": "Consumer Durables", "max(market_cap)": 8233713549.0, "exchange": "nyse"}
{"sector": "Transportation", "max(market_cap)": 9094527055.0, "exchange": "nyse"}
{"sector": "Miscellaneous", "max(market_cap)": 54171930444.0, "exchange": "nyse"}
{"sector": "Finance", "max(market_cap)": 30620000000.0, "exchange": "nasdaq"}
{"sector": "Health Care", "max(market_cap)": 126540000000.0, "exchange": "nasdaq"}
{"sector": "Technology", "max(market_cap)": 522690000000.0, "exchange": "nasdaq"}
{"sector": "Consumer Services", "max(market_cap)": 230940000000.0, "exchange": "nasdaq"}
{"sector": "n/a", "max(market_cap)": 34620000000.0, "exchange": "nasdaq"}
{"sector": "Capital Goods", "max(market_cap)": 20310000000.0, "exchange": "nasdaq"}
{"sector": "Consumer Non-Durables", "max(market_cap)": 87500000000.0, "exchange": "nasdaq"}
{"sector": "Miscellaneous", "max(market_cap)": 51420000000.0, "exchange": "nasdaq"}
{"sector": "Consumer Durables", "max(market_cap)": 7690000000.0, "exchange": "nasdaq"}
{"sector": "Basic Industries", "max(market_cap)": 9170000000.0, "exchange": "nasdaq"}
{"sector": "Energy", "max(market_cap)": 5760000000.0, "exchange": "nasdaq"}
{"sector": "Public Utilities", "max(market_cap)": 77810000000.0, "exchange": "nasdaq"}
{"sector": "Transportation", "max(market_cap)": 23530000000.0, "exchange": "nasdaq"}
{"sector": "n/a", "max(market_cap)": 971774087.0, "exchange": "nyse mkt"}
{"sector": "Basic Industries", "max(market_cap)": 424184478.0, "exchange": "nyse mkt"}
{"sector": "Health Care", "max(market_cap)": 93765452.0, "exchange": "nyse mkt"}
{"sector": "Energy", "max(market_cap)": 5199118597.0, "exchange": "nyse mkt"}
{"sector": "Consumer Services", "max(market_cap)": 99940496.0, "exchange": "nyse mkt"}
{"sector": "Capital Goods", "max(market_cap)": 49720054.0, "exchange": "nyse mkt"}
{"sector": "Technology", "max(market_cap)": 44979980.0, "exchange": "nyse mkt"}
{"sector": "Consumer Non-Durables", "max(market_cap)": 9586866593.0, "exchange": "nyse mkt"}
{"sector": "Finance", "max(market_cap)": 259074010.0, "exchange": "nyse mkt"}
{"sector": "Public Utilities", "max(market_cap)": 968077000.0, "exchange": "nyse mkt"}
{"sector": "Consumer Durables", "max(market_cap)": 50452938.0, "exchange": "nyse mkt"}
{"sector": "Miscellaneous", "max(market_cap)": 66549988.0, "exchange": "nyse mkt"}
{"sector": "Transportation", "max(market_cap)": 47179899.0, "exchange": "nyse mkt"}

Elasticsearch

{
  "aggs": {
    "exchange": {
      "terms": {
        "field": "exchange", 
        "size": 0
      }, 
      "aggs": {
        "sector": {
          "terms": {
            "field": "sector", 
            "size": 0
          }, 
          "aggs": {
            "max(market_cap)": {
              "max": {
                "field": "market_cap"
              }
            }
          }
        }
      }
    }
  }, 
  "size": 0
}
{
  "hits": {
    "hits": [], 
    "total": 6714, 
    "max_score": 0.0
  }, 
  "_shards": {
    "successful": 1, 
    "failed": 0, 
    "total": 1
  }, 
  "took": 11, 
  "aggregations": {
    "exchange": {
      "buckets": [
        {
          "sector": {
            "buckets": [
              {
                "max(market_cap)": {
                  "value": 1409695805.0
                }, 
                "key": "n/a", 
                "doc_count": 963
              }, 
              {
                "max(market_cap)": {
                  "value": 46884855259.0
                }, 
                "key": "Consumer Services", 
                "doc_count": 468
              }, 
              {
                "max(market_cap)": {
                  "value": 35827719192.0
                }, 
                "key": "Finance", 
                "doc_count": 384
              }, 
              {
                "max(market_cap)": {
                  "value": 86091463700.0
                }, 
                "key": "Energy", 
                "doc_count": 230
              }, 
              {
                "max(market_cap)": {
                  "value": 24117360000.0
                }, 
                "key": "Public Utilities", 
                "doc_count": 221
              }, 
              {
                "max(market_cap)": {
                  "value": 27347618968.0
                }, 
                "key": "Basic Industries", 
                "doc_count": 200
              }, 
              {
                "max(market_cap)": {
                  "value": 82469796110.0
                }, 
                "key": "Capital Goods", 
                "doc_count": 188
              }, 
              {
                "max(market_cap)": {
                  "value": 47882944000.0
                }, 
                "key": "Technology", 
                "doc_count": 186
              }, 
              {
                "max(market_cap)": {
                  "value": 82041827564.0
                }, 
                "key": "Health Care", 
                "doc_count": 113
              }, 
              {
                "max(market_cap)": {
                  "value": 87022480321.0
                }, 
                "key": "Consumer Non-Durables", 
                "doc_count": 111
              }, 
              {
                "max(market_cap)": {
                  "value": 8233713549.0
                }, 
                "key": "Consumer Durables", 
                "doc_count": 66
              }, 
              {
                "max(market_cap)": {
                  "value": 9094527055.0
                }, 
                "key": "Transportation", 
                "doc_count": 60
              }, 
              {
                "max(market_cap)": {
                  "value": 54171930444.0
                }, 
                "key": "Miscellaneous", 
                "doc_count": 50
              }
            ], 
            "sum_other_doc_count": 0, 
            "doc_count_error_upper_bound": 0
          }, 
          "key": "nyse", 
          "doc_count": 3240
        }, 
        {
          "sector": {
            "buckets": [
              {
                "max(market_cap)": {
                  "value": 30620000000.0
                }, 
                "key": "Finance", 
                "doc_count": 637
              }, 
              {
                "max(market_cap)": {
                  "value": 126540000000.0
                }, 
                "key": "Health Care", 
                "doc_count": 621
              }, 
              {
                "max(market_cap)": {
                  "value": 522690000000.0
                }, 
                "key": "Technology", 
                "doc_count": 449
              }, 
              {
                "max(market_cap)": {
                  "value": 230940000000.0
                }, 
                "key": "Consumer Services", 
                "doc_count": 354
              }, 
              {
                "max(market_cap)": {
                  "value": 34620000000.0
                }, 
                "key": "n/a", 
                "doc_count": 287
              }, 
              {
                "max(market_cap)": {
                  "value": 20310000000.0
                }, 
                "key": "Capital Goods", 
                "doc_count": 179
              }, 
              {
                "max(market_cap)": {
                  "value": 87500000000.0
                }, 
                "key": "Consumer Non-Durables", 
                "doc_count": 111
              }, 
              {
                "max(market_cap)": {
                  "value": 51420000000.0
                }, 
                "key": "Miscellaneous", 
                "doc_count": 97
              }, 
              {
                "max(market_cap)": {
                  "value": 7690000000.0
                }, 
                "key": "Consumer Durables", 
                "doc_count": 82
              }, 
              {
                "max(market_cap)": {
                  "value": 9170000000.0
                }, 
                "key": "Basic Industries", 
                "doc_count": 78
              }, 
              {
                "max(market_cap)": {
                  "value": 5760000000.0
                }, 
                "key": "Energy", 
                "doc_count": 70
              }, 
              {
                "max(market_cap)": {
                  "value": 77810000000.0
                }, 
                "key": "Public Utilities", 
                "doc_count": 68
              }, 
              {
                "max(market_cap)": {
                  "value": 23530000000.0
                }, 
                "key": "Transportation", 
                "doc_count": 56
              }
            ], 
            "sum_other_doc_count": 0, 
            "doc_count_error_upper_bound": 0
          }, 
          "key": "nasdaq", 
          "doc_count": 3089
        }, 
        {
          "sector": {
            "buckets": [
              {
                "max(market_cap)": {
                  "value": 971774087.0
                }, 
                "key": "n/a", 
                "doc_count": 123
              }, 
              {
                "max(market_cap)": {
                  "value": 424184478.0
                }, 
                "key": "Basic Industries", 
                "doc_count": 52
              }, 
              {
                "max(market_cap)": {
                  "value": 93765452.0
                }, 
                "key": "Health Care", 
                "doc_count": 52
              }, 
              {
                "max(market_cap)": {
                  "value": 5199118597.0
                }, 
                "key": "Energy", 
                "doc_count": 32
              }, 
              {
                "max(market_cap)": {
                  "value": 99940496.0
                }, 
                "key": "Consumer Services", 
                "doc_count": 28
              }, 
              {
                "max(market_cap)": {
                  "value": 49720054.0
                }, 
                "key": "Capital Goods", 
                "doc_count": 25
              }, 
              {
                "max(market_cap)": {
                  "value": 44979980.0
                }, 
                "key": "Technology", 
                "doc_count": 20
              }, 
              {
                "max(market_cap)": {
                  "value": 9586866593.0
                }, 
                "key": "Consumer Non-Durables", 
                "doc_count": 15
              }, 
              {
                "max(market_cap)": {
                  "value": 259074010.0
                }, 
                "key": "Finance", 
                "doc_count": 13
              }, 
              {
                "max(market_cap)": {
                  "value": 968077000.0
                }, 
                "key": "Public Utilities", 
                "doc_count": 12
              }, 
              {
                "max(market_cap)": {
                  "value": 50452938.0
                }, 
                "key": "Consumer Durables", 
                "doc_count": 5
              }, 
              {
                "max(market_cap)": {
                  "value": 66549988.0
                }, 
                "key": "Miscellaneous", 
                "doc_count": 5
              }, 
              {
                "max(market_cap)": {
                  "value": 47179899.0
                }, 
                "key": "Transportation", 
                "doc_count": 3
              }
            ], 
            "sum_other_doc_count": 0, 
            "doc_count_error_upper_bound": 0
          }, 
          "key": "nyse mkt", 
          "doc_count": 385
        }
      ], 
      "sum_other_doc_count": 0, 
      "doc_count_error_upper_bound": 0
    }
  }, 
  "timed_out": false
}

Profile

[
  {
    "query": [
      {
        "query_type": "MatchAllDocsQuery",
        "lucene": "*:*",
        "time": "0.2216380000ms",
        "breakdown": {
          "score": 0,
          "create_weight": 11316,
          "next_doc": 180546,
          "match": 0,
          "build_scorer": 29776,
          "advance": 0
        }
      }
    ],
    "rewrite_time": 2456,
    "collector": [
      {
        "name": "MultiCollector",
        "reason": "search_multi",
        "time": "3.587216000ms",
        "children": [
          {
            "name": "TotalHitCountCollector",
            "reason": "search_count",
            "time": "0.2345190000ms"
          },
          {
            "name": "GlobalOrdinalsStringTermsAggregator: [exchange]",
            "reason": "aggregation",
            "time": "2.605895000ms"
          }
        ]
      }
    ]
  }
]

文章版权归作者所有,未经允许请勿转载,若此文章存在违规行为,您可以联系管理员删除。

转载请注明本文地址:https://www.ucloud.cn/yun/34060.html

相关文章

  • 04 Elasticsearch 据库使字段聚合

    摘要:使用可以用进行的查询。最简单的聚合是把整个当作一个巨大的桶,然后去数这个桶里的数字。复杂一些的聚合我们先要能够把文档分到不同的桶里,然后去分别数每个桶的数字。 使用 https://github.com/taowen/es-monitor 可以用 SQL 进行 elasticsearch 的查询。最简单的聚合是把整个index当作一个巨大的bucket(桶),然后去数这个桶里的数字。复...

    vspiders 评论0 收藏0
  • Elasticsearch as Database

    摘要:开发者们请加入我们,滴滴出行平台技术部推销时间序列数据库的秘密介绍时间序列数据库的秘密索引时间序列数据库的秘密加载和分布式计算用查询把当数据库使表结构定义把当数据库使过滤和排序把当数据库使简单指标把当数据库使按字段聚合把当数据库使聚合把当 Go开发者们请加入我们,滴滴出行平台技术部 taowen@didichuxing.com 推销Elasticsearch 时间序列数据库的秘密(1)...

    hosition 评论0 收藏0
  • Elasticsearch as Database

    摘要:开发者们请加入我们,滴滴出行平台技术部推销时间序列数据库的秘密介绍时间序列数据库的秘密索引时间序列数据库的秘密加载和分布式计算用查询把当数据库使表结构定义把当数据库使过滤和排序把当数据库使简单指标把当数据库使按字段聚合把当数据库使聚合把当 Go开发者们请加入我们,滴滴出行平台技术部 taowen@didichuxing.com 推销Elasticsearch 时间序列数据库的秘密(1)...

    lansheng228 评论0 收藏0
  • Elasticsearch as Database

    摘要:开发者们请加入我们,滴滴出行平台技术部推销时间序列数据库的秘密介绍时间序列数据库的秘密索引时间序列数据库的秘密加载和分布式计算用查询把当数据库使表结构定义把当数据库使过滤和排序把当数据库使简单指标把当数据库使按字段聚合把当数据库使聚合把当 Go开发者们请加入我们,滴滴出行平台技术部 taowen@didichuxing.com 推销Elasticsearch 时间序列数据库的秘密(1)...

    fanux 评论0 收藏0
  • 【05】 Elasticsearch 据库使:HISTOGRAM聚合

    摘要:使用可以用进行的查询。按已有字段来分桶是最简单的一种分桶的方式。很多时候我们希望用于分桶的是需要先经过计算而来的。其中最简单的一种计算方式是按区间段来算。用于计算的字段可以是时间戳,也可能是数值。 使用 https://github.com/taowen/es-monitor 可以用 SQL 进行 elasticsearch 的查询。按已有字段来分桶是最简单的一种分桶的方式。很多时候我...

    suosuopuo 评论0 收藏0

发表评论

0条评论

xfee

|高级讲师

TA的文章

阅读更多
最新活动
阅读需要支付1元查看
<