
The impact of Linux keepalive probing on the application-layer socket API

layman / 2910 reads

Abstract: Most people know about TCP keepalive, and this article assumes the reader knows how it is triggered. What it discusses is the effect on the socket user after keepalive fires: how to set and verify the kernel parameters; how enabling SO_KEEPALIVE makes that end send probes toward the peer (with probes in both directions if both ends enable it); and how the application layer then observes the failure. Docker is used to simulate pulling the network cable.

Problem

Most people know about TCP keepalive. This article assumes the reader knows how keepalive is triggered, and instead discusses what the socket user observes once it fires.

Keepalive settings

Edit /etc/sysctl.conf, then reload it:

ubuntu# vim /etc/sysctl.conf 
ubuntu# sysctl -p
fs.file-max = 131072
net.ipv4.tcp_keepalive_time = 10
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3

Verification

ubuntu# sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 5
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_time = 10
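These sysctl values are system-wide defaults. For reference (not part of the original experiment), on Linux they can also be overridden per socket with the TCP-level options; a minimal sketch mirroring the values above:

```python
import socket

# Per-socket keepalive tuning (Linux-specific TCP socket options).
# These override the net.ipv4.tcp_keepalive_* defaults for this socket only.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 10)   # idle seconds before first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 5)   # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # unanswered probes before reset
```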

tcp_server.py

import socket

# Plain blocking server: accept one connection and read from it forever.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_address = ("localhost", 22345)
sock.bind(server_address)
sock.listen(1)
connection, client_address = sock.accept()
while True:
    data = connection.recv(1024)  # blocks; returns b"" once the peer closes
    print("data", data)

tcp_client.py

import socket
import time

# Client that enables SO_KEEPALIVE, connects, then goes idle forever,
# so keepalive probes are the only traffic on the connection.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
server_address = ("localhost", 22345)
sock.connect(server_address)

time.sleep(999999999)


As you can see, because tcp_client enabled SO_KEEPALIVE, tcp_client actively sends keepalive probes to tcp_server.
If tcp_server enables SO_KEEPALIVE, then tcp_server sends keepalive probes to tcp_client.
If both tcp_server and tcp_client enable it, probing is bidirectional.

Impact on the application-layer socket API

Preparation

To simulate keepalive taking effect, Docker is used to simulate pulling the network cable.

Prepare an Ubuntu image with docker, python, vim, and tcpdump installed, and create a docker network.

Start the containers and adjust the heartbeat settings.

ubuntu# sudo docker run -it \
  --volume=//home/enjolras/code_repo/python/keepalive_test://home/enjolras/code_repo/python/keepalive_test \
  --detach=true \
  --name=tcp_server \
  --privileged=true \
  --network=multi-host-network \
  ubuntu_with_python
08f89dcff3547bb15c7aed975dfa5a0821e4d0246d6d812e02fd1470f3cef6c3
ubuntu# sudo docker run -it \
  --volume=//home/enjolras/code_repo/python/keepalive_test://home/enjolras/code_repo/python/keepalive_test \
  --detach=true \
  --name=tcp_client \
  --privileged=true \
  --network=multi-host-network \
  ubuntu_with_python
Impact on blocking send/recv

tcp_server
import socket
import sys

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_address = ("0.0.0.0", 22345)
sock.bind(server_address)
sock.listen(1)
connection, client_address = sock.accept()
connection.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
data = connection.recv(1024)
print("data", data)
tcp_client
import socket
import sys
import time

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
server_address = ("tcp_server", 22345)
sock.connect(server_address)

time.sleep(999999999)
send/recv learn of a heartbeat-detected disconnection via an exception / error code.

As you can see, tcp_server and tcp_client exchange heartbeats.

root@0b3f1ee81446:/# tcpdump -i any port 22345    
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
12:29:34.491239 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [S], seq 2347845399, win 28200, options [mss 1410,sackOK,TS val 951128354 ecr 0,nop,wscale 7], length 0
12:29:34.491279 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [S.], seq 1169988006, ack 2347845400, win 27960, options [mss 1410,sackOK,TS val 2298965862 ecr 951128354,nop,wscale 7], length 0
12:29:34.491299 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951128354 ecr 2298965862], length 0
12:29:44.666952 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2298976038 ecr 951128354], length 0
12:29:44.666969 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951138530 ecr 2298965862], length 0
12:29:44.666978 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2298976038 ecr 951128354], length 0
12:29:44.666987 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951138530 ecr 2298976038], length 0
12:29:54.907019 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2298986278 ecr 951138530], length 0
12:29:54.907054 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951148770 ecr 2298976038], length 0
12:29:54.907059 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951148770 ecr 2298976038], length 0
12:29:54.907062 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2298986278 ecr 951138530], length 0

Disconnect tcp_client from the network:

ubuntu# docker network disconnect multi-host-network tcp_client

As you can see, after three consecutive probes go unanswered, tcp_server sends an RST to tcp_client. The timestamps match the settings: the first probe fires 10 s after the last activity (tcp_keepalive_time), the retries are 5 s apart (tcp_keepalive_intvl), and the RST follows the third unanswered probe (tcp_keepalive_probes).

12:31:47.547010 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951261408 ecr 2299088676], length 0
12:31:47.547019 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2299098916 ecr 951251168], length 0
12:31:47.547061 IP tcp_client.multi-host-network.57130 > 0b3f1ee81446.22345: Flags [.], ack 1, win 221, options [nop,nop,TS val 951261408 ecr 2299098916], length 0
12:31:57.787226 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2299109156 ecr 951261408], length 0
12:32:02.906612 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2299114276 ecr 951261408], length 0
12:32:08.026829 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [.], ack 1, win 219, options [nop,nop,TS val 2299119396 ecr 951261408], length 0
12:32:13.146776 IP 0b3f1ee81446.22345 > tcp_client.multi-host-network.57130: Flags [R.], seq 1, ack 1, win 219, options [nop,nop,TS val 2299124516 ecr 951261408], length 0

As you can see, once the heartbeat mechanism detects that the socket is in an abnormal state, the caller is notified via an exception / error code.

3f1ee81446:/home/enjolras/code_repo/python/keepalive_test# python tcp_serv
Traceback (most recent call last):
  File "tcp_server.py", line 11, in 
    data = connection.recv(1024)
socket.error: [Errno 110] Connection timed out
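One way the caller can handle this (a sketch, not from the original article) is to check for errno 110 (ETIMEDOUT), which is exactly what the keepalive-reset connection raises in the traceback above:

```python
import errno
import socket

def recv_or_none(connection):
    """Sketch: treat a keepalive-detected dead peer (ETIMEDOUT) as end-of-stream."""
    try:
        return connection.recv(1024)
    except socket.error as e:
        if e.errno == errno.ETIMEDOUT:  # [Errno 110], as in the traceback above
            connection.close()
            return None
        raise
```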
Impact on select

tcp_server
import socket
import sys
import select

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_address = ("0.0.0.0", 22345)
sock.bind(server_address)
sock.listen(1)
connection, client_address = sock.accept()
connection.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
readable, writable, exceptional = select.select([connection], [], [])
print("readable", readable, writable, exceptional)
data = connection.recv(1024)
print("data", data)
select returns a readable event for the socket.
3f1ee81446:/home/enjolras/code_repo/python/keepalive_test# python tcp_serv
("readable", [], [], [])
Traceback (most recent call last):
  File "tcp_server.py", line 14, in 
    data = connection.recv(1024)
socket.error: [Errno 110] Connection timed out
Impact on epoll

No experiment was run; it should behave the same as select.
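For reference, a minimal epoll sketch (not from the original article): closing one end of a socketpair stands in for the keepalive-triggered reset, and the dead connection shows up as a readable event, just as with select. In the real keepalive case the subsequent recv() raises [Errno 110] instead of returning EOF.

```python
import select
import socket

# A dead connection surfaces as a readable epoll event.
a, b = socket.socketpair()
ep = select.epoll()
ep.register(a.fileno(), select.EPOLLIN)
b.close()                                    # "peer disappeared"
events = ep.poll(1)                          # list of (fd, event_mask) pairs
assert any(ev & select.EPOLLIN for _, ev in events)
print(a.recv(1024))                          # b"" here; Errno 110 in the keepalive case
```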

Conclusion

Once the heartbeat detects that the TCP link is broken, the application layer is notified via a readable event. Without a TCP heartbeat or an application-layer heartbeat, the application has no way to learn the true state of the link.
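As an illustration of the application-layer alternative mentioned above, here is a sketch of a hypothetical ping/pong heartbeat (the protocol and function are assumptions, not from the article): a recv timeout or send error is treated as "the link is dead", independent of TCP keepalive.

```python
import socket

def peer_alive(sock, timeout=5.0):
    """Hypothetical app-level liveness check: peer is assumed to echo b"pong\\n"."""
    sock.settimeout(timeout)
    try:
        sock.sendall(b"ping\n")
        return sock.recv(16) == b"pong\n"
    except (socket.timeout, socket.error):
        return False  # no answer in time, or send/recv failed: treat link as dead
```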

The copyright of this article belongs to the author; do not reproduce without permission.

Reprints must credit the original address: https://www.ucloud.cn/yun/63655.html
