
TRINI: an adaptive load balancing strategy

wudengzan / 847 reads


TRINI: an adaptive load balancing strategy based on garbage collection for clustered Java systems

1. Introduction

GC comes with a cost : Whenever it is triggered, GC has an impact on the system performance by pausing the involved programs.

major GC : usually causes the longest type of GC pauses

research shows that it is not possible to have a single "best-fit-for-all" GC strategy because the GC behavior is dependent on the application inputs and system configuration

GC is particularly sensitive to the heap size, and even small changes can alter its behavior

it is commonly agreed that the GC plays an important role in the performance of Java system

core line of thinking

question : what techniques can be deployed so that the occurrence of MaGC events in the application nodes does not affect the performance of the cluster ?

solution : enhance a load balancer so that it selects the nodes that are not expected to have a MaGC event in the immediate future

the behavior of a load balancing strategy is heavily influenced by the accuracy of its balancing decisions and the amount of resources it uses.

a deep understanding of these factors is key to comprehend the practicability of any load balancing strategy.

2. Background

Generational heap

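Objects are allocated into different memory regions, called generations, according to their age. New objects are created in the youngest generation, because the survival rate in younger generations is usually lower than in older ones. In other words, younger generations are more likely to contain garbage and are collected more often.

A GC in the younger generation is called a minor GC (MiGC). It is usually cheap and rarely causes performance problems. MiGC is also responsible for moving live objects that are old enough into the older generation, which means MiGC plays an important role in how memory is allocated in the older generation.

A GC in the older generation is called a major GC (MaGC). It is usually regarded as the most expensive type of GC because of its large impact on performance.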

Garbage Collection Strategies

Three GC strategies

Type        serial GC         parallel GC               concurrent GC
Threads     single-threaded   multi-threaded            multi-threaded
Suited to   client JVM        server JVM (throughput)   server JVM (response time)

Load balancing

Four load balancing strategies

round robin

random

weighted round robin

weighted random
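The four baseline strategies listed above can be sketched in a few lines. This is only an illustration: the node names, the integer weights, and the function names are assumptions, not anything taken from the paper.

```python
import itertools
import random


def round_robin(nodes):
    """Yield nodes in a fixed cyclic order."""
    return itertools.cycle(nodes)


def weighted_round_robin(nodes, weights):
    """Cycle through nodes, repeating each one `weight` times per round."""
    expanded = [n for n, w in zip(nodes, weights) for _ in range(w)]
    return itertools.cycle(expanded)


def pick_random(nodes, rng):
    """Pick one node uniformly at random."""
    return rng.choice(nodes)


def pick_weighted_random(nodes, weights, rng):
    """Pick one node with probability proportional to its weight."""
    return rng.choices(nodes, weights=weights, k=1)[0]


nodes = ["node-a", "node-b", "node-c"]   # hypothetical node names
weights = [2, 1, 1]                      # hypothetical integer weights

rr = round_robin(nodes)
rr_order = [next(rr) for _ in range(4)]
wrr = weighted_round_robin(nodes, weights)
wrr_order = [next(wrr) for _ in range(4)]
rng = random.Random(0)                   # seeded only to make runs repeatable
```

With these inputs, round robin visits a, b, c, a, while the weighted variant visits node-a twice per round before moving on.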

3. Related Work

3.1 Garbage collection optimisation

propose new concurrent and parallel algorithms that have less impact on performance

3.2 Memory forecasting

Goal of this paper

forecast the MaGC events and make the information available outside the JVM

Approaches proposed by others

look for ways to invoke a GC

present an approach to estimate the number of dead objects at any time, information that a JVM could use to decide when to trigger a MaGC.

3.3 Distributed system optimisation

our research work has enhanced a load balancer by considering the MaGC forecast in its decision layer. In such a case, the load balancer can obtain additional knowledge about the JVM in order to control the workload of the system.

4 Garbage collection-aware load balancing strategy

4.1 Overview

objective —— define a GC-aware load balancing strategy ( TRINI ) which is able to dynamically adjust to the specific GC characteristics of the underlying application

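The strategy lets the load balancer forecast MaGC events with sufficient accuracy

TRINI periodically retrieves information from the application nodes

it finds the most suitable policy based on the application's GC characteristics

it uses the selected policy to forecast MaGC events and balance the incoming load

To be self-adaptive, TRINI uses the MAPE-K model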

Monitoring element

obtain information

Analysis element

evaluate if any adaptation is required

Plan element

Execute element

Knowledge element

support other elements

is fulfilled by the set of program families

program family

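A program family contains a set of similar programs that share common GC characteristics

for example, program families divided by MaGC duration

each program family has two attributes

an evaluation criterion

decides whether an application's GC behavior qualifies it for that family

a policy

specifies the rules for GC forecasting and load balancing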
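One way to picture a program family is as a small record that pairs an evaluation criterion with a policy. The sketch below assumes families split by MaGC pause length; the field names, the 1.0 s cut-off, and the window sizes are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Policy:
    """Rules for GC forecasting and load balancing (fields are illustrative)."""
    forecast_window_size: int      # how many recent samples the forecast uses
    balancing_algorithm: str       # e.g. "round_robin"


@dataclass(frozen=True)
class ProgramFamily:
    name: str
    criterion: Callable[[float], bool]   # does the observed GC metric qualify?
    policy: Policy


# Hypothetical families split by MaGC pause length in seconds.
families = [
    ProgramFamily("short-pause", lambda pause: pause < 1.0, Policy(50, "round_robin")),
    ProgramFamily("long-pause", lambda pause: pause >= 1.0, Policy(10, "round_robin")),
]


def classify(pause_seconds):
    """Return the first family whose evaluation criterion accepts the metric."""
    for family in families:
        if family.criterion(pause_seconds):
            return family
    return None
```

For instance, `classify(0.2)` lands in the "short-pause" family and `classify(3.0)` in the "long-pause" one, each carrying its own policy.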

4.2 TRINI core process

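the core process coordinates TRINI's MAPE-K elements

The core process is triggered as soon as the load balancer starts.

Initially, the core process uses a default policy: the entire available MiGC history is used to forecast MaGC. The initial policy takes into account any extra configuration supplied at start-up, such as the load balancing algorithm, and it is applied to all nodes.

Next, the monitoring and analysis loops start in parallel for all nodes and keep running until load balancing finishes.

Data samples are collected according to the program's GC characteristics (the characteristics used to define the set of available program families).

Once collection is complete, the analysis process checks whether the current program family still fits the underlying GC characteristics. If it does not, the evaluation criteria of the other program families are evaluated to find a new program family.

The new program family stays in use until the next evaluation phase. These processes retrieve their configuration from the program family database.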
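The analysis step of the core process can be sketched roughly as follows. This is an assumption-laden toy: families are plain dictionaries, a single numeric GC metric stands in for the collected samples, and the default policy is modelled as a family that uses all history.

```python
def select_family(current, metric, families):
    """Keep the current family if it still fits, else search the others."""
    if current is not None and current["criterion"](metric):
        return current
    for family in families:
        if family["criterion"](metric):
            return family
    return current  # nothing matched: keep what we had


# Hypothetical family database keyed on one GC metric.
FAMILIES = [
    {"name": "stable", "criterion": lambda m: m < 0.5, "fws": None},  # None = all history
    {"name": "volatile", "criterion": lambda m: m >= 0.5, "fws": 10},
]
DEFAULT = FAMILIES[0]  # default policy: use all available history


def core_process(metric_per_cycle):
    """Run one analysis step per monitoring cycle and record the family used."""
    current = DEFAULT
    used = []
    for metric in metric_per_cycle:
        current = select_family(current, metric, FAMILIES)
        used.append(current["name"])
    return used


trace = core_process([0.1, 0.7, 0.8, 0.2])
```

The family only changes in cycles where the current criterion stops holding, which mirrors the "stay until the next evaluation phase" behavior described above.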

4.3 MaGA : a major garbage collection forecast algorithm

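TRINI's most important capability: forecasting MaGC events accurately

this is done with the MaGA algorithm, described in another paper by the same authors

What MaGA does:

periodically retrieves GC and memory samples from the JVM to record the memory allocation activity in the Young and Old generations

uses recent history, bounded by a configurable FWS (forecast window size), to forecast the next MaGC event

forecasts how much memory will be allocated in the Young Generation before the Old Generation is exhausted (exhausting the Old Generation triggers a MaGC)

The algorithm fits a linear regression model to the old generation history inside the FWS. The purpose is to predict the growth rate in YoungGen and, from it, to extrapolate the point at which OldGen will exceed its maximum threshold and trigger a GC. This yields the amount of YoungGen allocation at which the next MaGC will occur.

The algorithm passes this YoungGen threshold to a second linear regression model, which extrapolates the YoungGen memory time series and forecasts the time at which the next MaGC will occur.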
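The two-stage regression can be illustrated with a small sketch. It is one reading of the description above, not the authors' implementation: it assumes samples of cumulative YoungGen allocation and OldGen usage, treats the FWS as a sample count, and uses made-up data and units.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def forecast_magc_time(times, young_alloc, old_used, old_max, fws):
    """Forecast when OldGen reaches old_max, using the last `fws` samples."""
    t, y, o = times[-fws:], young_alloc[-fws:], old_used[-fws:]
    # Stage 1: OldGen usage as a function of cumulative YoungGen allocation.
    a1, b1 = fit_line(y, o)
    young_threshold = (old_max - b1) / a1
    # Stage 2: cumulative YoungGen allocation as a function of time.
    a2, b2 = fit_line(t, y)
    return (young_threshold - b2) / a2


# Synthetic samples (assumed units: seconds and MB).
times = [0, 1, 2, 3, 4]
young_alloc = [0, 100, 200, 300, 400]   # 100 MB allocated per second
old_used = [50, 60, 70, 80, 90]         # a tenth of each allocation is promoted
predicted = forecast_magc_time(times, young_alloc, old_used, old_max=200, fws=4)
```

On this perfectly linear data the first stage finds that OldGen fills up once about 1500 MB have been allocated in YoungGen, and the second stage maps that to roughly 15 seconds. Real samples are noisy, and the slopes could be zero, which this sketch does not guard against.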

4.4 Garbage collection-aware load balancing algorithms

The four basic algorithms are turned into GC-aware counterparts

round robin

random

weighted round robin

weighted random

the main difference of new algorithms (compared against their original counterparts)

perform an additional check in the selection of the next node

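if the preselected node is expected to undergo a MaGC within a very short time, the node is skipped and the other nodes are evaluated

when all nodes are expected to undergo a MaGC within that short time, the algorithm falls back to its original behavior, that is, the version without GC awareness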
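The extra check and the fallback can be sketched for round robin as below. The `horizon` parameter (how soon counts as "a very short time"), the timestamps, and the node names are assumptions made for illustration.

```python
def gc_aware_round_robin(nodes, start, predicted_magc, now, horizon):
    """Return (chosen_node, next_start).

    Skips nodes whose predicted MaGC falls within `horizon` seconds of `now`.
    If every node is about to collect, behaves like plain round robin.
    """
    n = len(nodes)
    for offset in range(n):
        idx = (start + offset) % n
        node = nodes[idx]
        if predicted_magc[node] - now > horizon:
            return node, (idx + 1) % n
    # Fallback: all nodes expect a MaGC soon, so use the original choice.
    return nodes[start % n], (start + 1) % n


nodes = ["node-a", "node-b", "node-c"]                             # hypothetical
predicted = {"node-a": 100.5, "node-b": 130.0, "node-c": 101.0}    # forecast times (s)
chosen, next_start = gc_aware_round_robin(nodes, 0, predicted, now=100.0, horizon=2.0)

all_soon = {name: 100.5 for name in nodes}
fallback = gc_aware_round_robin(nodes, 0, all_soon, now=100.0, horizon=2.0)
```

Here node-a is skipped because its MaGC is forecast half a second away, so node-b is chosen. When every node is that close, the call returns node-a, exactly what plain round robin would pick.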

4.5 MiGC-CV program families

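Automatic selection of the FWS

An earlier paper by the authors showed that the accuracy of MaGA is extremely sensitive to the FWS.

The FWS limits the amount of knowledge used to forecast a MaGC (that is, the size of the memory allocation history).

Experiments found no single FWS value that is optimal in all cases.

Another paper by the authors showed that MaGA tends to forecast more accurately as more history becomes available. The relationship is not monotonic, however. Instead, the optimal FWS also goes through troughs.

This behavior can be captured by MiGCCV.

MiGCCV is the coefficient of variation of the number of MiGCs that occur between MaGCs.

This makes MiGCCV a suitable classification criterion, one that can sort different program behaviors into different families.

For example, when the number of MiGCs between MaGCs varies widely (that is, MiGCCV is large), forecasting from the full history struggles, because the history cannot capture huge changes (several orders of magnitude) in memory behavior. If only the most recent history is used instead (that is, a smaller FWS), forecast accuracy improves markedly.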
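A MiGC-CV family table could map ranges of MiGCCV to a forecast window size, along these lines. The cut-offs and window sizes are invented for the sketch; the notes do not give the values used in the paper.

```python
# Hypothetical MiGC-CV families: (upper bound on MiGCCV, FWS to use).
# None means "use all available history". All numbers are made up.
MIGCCV_FAMILIES = [
    (0.10, None),
    (0.50, 20),
    (float("inf"), 5),
]


def choose_fws(migccv):
    """Pick the FWS of the first family whose MiGCCV range contains the value."""
    for upper_bound, fws in MIGCCV_FAMILIES:
        if migccv < upper_bound:
            return fws
    return MIGCCV_FAMILIES[-1][1]
```

The shape follows the reasoning above: a low MiGCCV keeps the full history, while a high MiGCCV shrinks the window to the most recent samples.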

5. Experimental evaluation

5.1 Experiment #1 Generality assessment

TRINI was applied to four load balancing algorithms to assess its generality

load balancing algorithms

original               developed
round robin            GC - round robin
weighted round robin   GC - weighted round robin
random                 GC - random
weighted random        GC - weighted random

test environment

52 virtual machines

50 application nodes

1 load balancer

1 load tester node (performance test -- Apache JMeter)

garbage collection strategies

The GC strategy is a major factor influencing GC behavior

serial GC

parallel GC

concurrent GC

evaluation criteria

performance

throughput

response time

overhead

CPU (%)

memory (MB)

FA (forecast accuracy): 3 metrics were calculated

FE (forecast error)

MiGCAVG (the average number of MiGCs that occurred between two MaGC events)

captures the relationship between the heap size and the memory allocation required by an application (major factors influencing the GC)

The smaller MiGCAVG is, the more often MaGC occurs; the program's old generation is then exhausted very frequently.

If MiGCAVG approaches 0, an out-of-memory error results.

MiGCCV (the coefficient of variation)

the standard deviation of the MiGC count between MaGC events, relative to MiGCAVG

used to compare how much memory usage varies across different programs
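As a worked example, MiGCAVG and MiGCCV can be computed from a chronological GC event log as sketched below. The event encoding is an assumption, and so is the use of the population standard deviation; the notes do not say which variant the paper uses. The sketch needs at least two MaGC events and a non-zero average.

```python
import statistics


def migc_metrics(events):
    """events: chronological list of "minor"/"major" GC events.

    Returns (MiGCAVG, MiGCCV) over the complete intervals between MaGCs.
    """
    counts = []
    current = None            # None until the first MaGC is seen
    for kind in events:
        if kind == "major":
            if current is not None:
                counts.append(current)
            current = 0
        elif current is not None:
            current += 1
    avg = statistics.mean(counts)
    cv = statistics.pstdev(counts) / avg
    return avg, cv


events = ["major", "minor", "minor", "major",
          "minor", "minor", "minor", "minor", "major"]
migc_avg, migc_cv = migc_metrics(events)
```

The log above has two complete intervals with 2 and 4 MiGCs, so MiGCAVG is 3 and MiGCCV is 1/3.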

performance improvement

TRINI worked well irrespective of GC strategies and load balancing algorithms

difference in memory behaviors across the tested application

analyse MiGCCV behaviors

the smaller MiGCCV is, the higher the forecast accuracy

overhead

in the application nodes

TRINI proved to be lightweight in terms of CPU and memory (the increase was very small)

by the data gathering process

in the load balancer node

relatively higher (compared to the application nodes)

overhead is independent of load balancing algorithms

5.2 Experiment #2 Scalability assessment

test environment

the cluster size is variable

covering the range of 5~50 application nodes in increments of 5

the number of concurrent users was increased proportionally to the cluster size

5-node --> 50 users

10-node --> 100 users

and so on
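The test matrix implied by these numbers can be spelled out as follows. The ratio of 10 users per node is inferred from the two examples given (5 nodes with 50 users, 10 nodes with 100 users); the variable names are illustrative.

```python
USERS_PER_NODE = 10   # inferred: 5 nodes -> 50 users, 10 nodes -> 100 users

# Cluster sizes from 5 to 50 application nodes in increments of 5.
cluster_sizes = list(range(5, 51, 5))

# One (nodes, concurrent users) pair per test configuration.
configs = [(nodes, nodes * USERS_PER_NODE) for nodes in cluster_sizes]
```

Under that assumption the largest configuration is 50 nodes with 500 concurrent users.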

performance

hypothesis (confirmed)

performance improvements should not degrade when the cluster size increases (not strict)

the differences in improvements among the tested programs were due to their diversity in memory/GC behavior

overhead

cost in the application nodes

was minimal and relatively constant and independent of the cluster size

cost in the load balancer node

was dependent on the cluster size

5.3 Experiment #3 Reliability assessment

test environment

50 nodes

duration of the test runs was increased from 1 to 24 hours

performance improvements

carry out a breakdown of the behavior of each experimental configuration on an hourly basis

remains stable through time

overhead

application node

minimal overhead ( relatively constant)

load balancer node

higher (but quite steady)

the main contribution to this increase is the number of forecast processes, which depends on the size of the cluster rather than on time

stability in the memory footprint

data older than the required FWS is automatically purged
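A bounded buffer is one simple way to get this purging behavior. The sketch assumes the FWS is expressed as a number of samples (it could equally be a time span) and uses a hypothetical `SampleWindow` class.

```python
from collections import deque


class SampleWindow:
    """Keeps only the most recent `fws` samples for one node."""

    def __init__(self, fws):
        self._samples = deque(maxlen=fws)

    def add(self, sample):
        # When the window is full, the oldest sample is dropped automatically.
        self._samples.append(sample)

    def samples(self):
        return list(self._samples)


window = SampleWindow(fws=3)
for value in range(5):
    window.add(value)
kept = window.samples()
```

Because the buffer never grows past the FWS, the memory held per node stays constant however long the test runs, which is consistent with the steady footprint reported above.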

5.4 Final discussion for practitioners

to estimate the FA, the MiGCCV has proven to be a useful metric

more GC intensive applications can benefit most from TRINI

in terms of the overhead introduced to the load balancer node, results have shown that the overhead usually follows a relatively linear growth with respect to the cluster size

Copyright belongs to the author; do not repost without permission.

When reposting, please cite this article's address: https://www.ucloud.cn/yun/66458.html
