Pitfalls I Ran Into While Installing Nezha Monitoring

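# Preface

This is a casual post written while I worked, though it has some reference value. It is not a tutorial, so please do not follow along step by step. Read the whole thing first, then decide what to keep. Feel free to skip the error output and pick out the useful parts.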

# Installing the dashboard

At first the installation failed no matter what I tried, reporting that there was no TencentOS branch:

Status code: 404 for http://mirrors.tencentyun.com/tlinux/3/TencentOS/x86_64/repodata/repomd.xml (IP: \*.\*.\*.\*)
Error: Failed to download metadata for repo 'TencentOS': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

After adding the Docker mirror repo myself, I got another error:

Errors during downloading metadata for repository 'docker-ce-stable':
  - Status code: 404 for https://mirrors.cloud.tencent.com/docker-ce/linux/centos/3.1/x86_64/stable/repodata/repomd.xml (IP: \*.\*.\*.\*)
Error: Failed to download metadata for repo 'docker-ce-stable': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

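Then I found an issue in the TencentOS repository that saved me: "docker 安装问题" (Docker installation problem).

Following it still produced errors, and only then did I realize that I had broken my yum repo configuration. I went to the yum repo configuration directory: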

cd /etc/yum.repos.d

Running ls confirmed that there was indeed a docker-ce-stable repo file in there, so I deleted it with rm.
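A gentler alternative to rm is to rename the offending file so yum ignores it but it can still be restored. The sketch below assumes that yum only loads files ending in `.repo` from the repo directory; the function name `disable_repo_files` and the `.bak` suffix are my own choices, not something from the issue.

```shell
# Rename every repo file whose name contains the pattern, so that yum
# no longer loads it. Hypothetical helper; adjust the names to your system.
disable_repo_files() {
    local repo_dir="$1" pattern="$2"
    local f
    for f in "$repo_dir"/*"$pattern"*.repo; do
        [ -e "$f" ] || continue        # the glob matched nothing
        mv -- "$f" "$f.bak"            # yum skips files not ending in .repo
        echo "disabled: $(basename "$f")"
    done
}

# Usage (as root on the affected machine):
#   disable_repo_files /etc/yum.repos.d docker-ce
#   yum clean all && yum makecache
```

Renaming instead of deleting keeps the original around in case the file turns out to be needed later.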

After running yum update, I followed the GitHub issue above:

yum -y install tencentos-release-docker-ce
yum -y install docker-ce

Then I ran the install script again:

sudo ./nezha.sh

It no longer complained that it could not connect to Docker.

[Image: 1637550967926.png]

To install the dashboard, choose option 1.

[Image: 1637551035963.png]

I followed the prompts, and up to this point nothing had gone wrong. I was happy.

[Image: 1637551077771.png]

Then it blew up.

I added the current user to the docker group and switched users to try again:

sudo gpasswd -a ${USER} docker
sudo su
su ${USER}
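Group changes only apply to sessions started after the change, so it helps to check whether the membership has actually been recorded. A minimal sketch, assuming the standard `id` utility; the helper name `user_in_group` is made up for illustration. Note that `id -nG <user>` reads the group database, while plain `id -nG` reports the groups of the current shell, which may still be stale.

```shell
# Succeed if the given user is listed in the given group
# according to the system's group database.
user_in_group() {
    local user="$1" group="$2"
    id -nG "$user" | tr ' ' '\n' | grep -qx -- "$group"
}

# Usage:
#   user_in_group "$USER" docker && echo "membership recorded"
#   id -nG | tr ' ' '\n' | grep -qx docker || echo "this shell has not picked it up yet"
```

If the first check passes and the second does not, logging out and back in (or starting a new login shell, as the su commands above do) should be enough.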

[Image: 1637551226908.png]

Still the same error.

After installing docker-compose I started the dashboard, and it failed with this error:

> 启动面板
Traceback (most recent call last):
  File "urllib3/connectionpool.py", line 677, in urlopen
  File "urllib3/connectionpool.py", line 392, in _make_request
  File "http/client.py", line 1277, in request
  File "http/client.py", line 1323, in _send_request
  File "http/client.py", line 1272, in endheaders
  File "http/client.py", line 1032, in _send_output
  File "http/client.py", line 972, in send
  File "docker/transport/unixconn.py", line 43, in connect
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "requests/adapters.py", line 449, in send
  File "urllib3/connectionpool.py", line 727, in urlopen
  File "urllib3/util/retry.py", line 410, in increment
  File "urllib3/packages/six.py", line 734, in reraise
  File "urllib3/connectionpool.py", line 677, in urlopen
  File "urllib3/connectionpool.py", line 392, in _make_request
  File "http/client.py", line 1277, in request
  File "http/client.py", line 1323, in _send_request
  File "http/client.py", line 1272, in endheaders
  File "http/client.py", line 1032, in _send_output
  File "http/client.py", line 972, in send
  File "docker/transport/unixconn.py", line 43, in connect
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "docker/api/client.py", line 214, in _retrieve_server_version
  File "docker/api/daemon.py", line 181, in version
  File "docker/utils/decorators.py", line 46, in inner
  File "docker/api/client.py", line 237, in _get
  File "requests/sessions.py", line 543, in get
  File "requests/sessions.py", line 530, in request
  File "requests/sessions.py", line 643, in send
  File "requests/adapters.py", line 498, in send
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "docker-compose", line 3, in <module>
  File "compose/cli/main.py", line 81, in main
  File "compose/cli/main.py", line 200, in perform_command
  File "compose/cli/command.py", line 70, in project_from_options
  File "compose/cli/command.py", line 153, in get_project
  File "compose/cli/docker_client.py", line 43, in get_client
  File "compose/cli/docker_client.py", line 170, in docker_client
  File "docker/api/client.py", line 197, in __init__
  File "docker/api/client.py", line 222, in _retrieve_server_version
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
[2064673] Failed to execute script docker-compose
启动失败,请稍后查看日志信息

I rebooted. Starting it again produced a different error:

Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

I had forgotten to enable Docker at boot:

sudo systemctl enable docker.service
sudo systemctl enable containerd.service

Then start Docker:

service docker start
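Both errors above (the `FileNotFoundError` in the docker-compose traceback and "Cannot connect to the Docker daemon") point at the same thing: nothing is serving `/var/run/docker.sock`. A small sketch of how one might check for that before launching the dashboard; `docker_socket_present` is a hypothetical helper, and the presence of the socket is a necessary condition, not proof that the daemon is healthy.

```shell
# Succeed only if a Unix socket exists at the given path
# (defaults to the socket path shown in the error message).
docker_socket_present() {
    local sock="${1:-/var/run/docker.sock}"
    [ -S "$sock" ]
}

# Usage on a systemd host:
#   sudo systemctl enable --now docker.service containerd.service
#   docker_socket_present && docker info >/dev/null && echo "daemon reachable"
```

`systemctl enable --now` enables the units at boot and starts them immediately, which covers the enable and start steps in one command.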

Finally, it started successfully.

[Image: 1637567938818.png]

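# Adding a reverse proxy

WebSocket traffic has to be proxied as well, otherwise real-time monitoring does not work.

I created a new site in BT Panel (宝塔) with the reverse proxy configuration below.

I worked this configuration out by trial and error. There is very likely a better one, but this is not an area I know well.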

location /
{
    proxy_pass http://127.0.0.1:8008;
    proxy_set_header Host $host;
}
location /ws
{
    proxy_pass http://127.0.0.1:8008;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
}
location /terminal
{
    proxy_pass http://127.0.0.1:8008;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_set_header Host $host;
}
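One way to check whether the proxy really forwards the upgrade headers is to send a WebSocket handshake by hand and look at the status code. This is only a sketch: the URL is a placeholder, the helper names are invented, the key is the sample value from the WebSocket RFC, and the dashboard may reject the request for its own reasons (authentication, origin checks), so a code other than 101 is not conclusive.

```shell
# Print the HTTP status code from the first line of a header dump on stdin.
status_from_headers() {
    awk 'NR == 1 { print $2; exit }'
}

# Send a WebSocket upgrade request to the URL and print the status code.
# 101 means the upgrade went through the proxy.
ws_handshake_status() {
    local url="$1"
    curl --silent --http1.1 --max-time 5 --output /dev/null --dump-header - \
        -H 'Connection: Upgrade' \
        -H 'Upgrade: websocket' \
        -H 'Sec-WebSocket-Version: 13' \
        -H 'Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==' \
        "$url" | status_from_headers
}

# Usage (hypothetical domain):
#   ws_handshake_status https://nezha.example.com/ws
```

After editing the configuration, `nginx -t` validates the syntax before reloading.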

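# Installing the agent

Installing the first agent was quick, because it simply runs on the same server as the dashboard.

When I installed the agent on the second server, it would not come online no matter what I did. Scanning with nmap showed that the port was actually closed.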

[Image: 1637575217400.png]

BT Panel showed the port as unused, which means no program was listening on it.

[Image: 1637575372251.png]

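The logs said the agent had started successfully.

When in doubt, reboot. That did not solve the problem.

So I suspected SELinux, but Ubuntu does not come with it. Never mind: I installed it and then disabled it to see whether that would help. I started the agent again, and it still did not work.

So I ran it by hand: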

/opt/nezha/agent/nezha-agent -s <server2-IP>:5555 -p <agent-secret> -d

It still failed, but at least there was now a clear error.

➜  admin /opt/nezha/agent/nezha-agent -s <server2-IP>:5555 -p <agent-secret> -d
NEZHA@2021-11-22 19:39:29>> 检查更新: 0.11.6
NEZHA@2021-11-22 19:39:29>> 上报系统信息失败: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp <server2-IP>:5555: connect: connection refused"
NEZHA@2021-11-22 19:39:29>> Error to close connection ...
NEZHA@2021-11-22 19:39:39>> Try to reconnect ...
NEZHA@2021-11-22 19:39:39>> 上报系统信息失败: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp <server2-IP>:5555: connect: connection refused"
NEZHA@2021-11-22 19:39:39>> Error to close connection ...

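A Baidu search turned up similar problems that were caused by a firewall and fixed by turning it off. So on the server that was working correctly, I ran: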

systemctl status firewalld.service

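The firewall there was running normally, which puzzled me. The failing server runs Ubuntu, and sudo ufw status verbose showed that port 5555 was open.

The next day I had a sudden idea: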

➜  admin docker    
zsh: command not found: docker

Docker was not installed. (In fact, the agent does not need Docker.) I installed it and ran the agent again, with the same error.

➜  ~ /opt/nezha/agent/nezha-agent -s <server2-IP>:5555 -p <agent-secret> -d
NEZHA@2021-11-23 18:53:14>> 检查更新: 0.11.6
NEZHA@2021-11-23 18:53:14>> 上报系统信息失败: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp <server2-IP>:5555: connect: connection refused"
NEZHA@2021-11-23 18:53:14>> Error to close connection ...

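Eventually I realized what was wrong: the address is supposed to point at the dashboard! The command should not be /opt/nezha/agent/nezha-agent -s <server2-IP>:5555 -p <agent-secret> -d but /opt/nezha/agent/nezha-agent -s <dashboard-IP>:5555 -p <agent-secret> -d. That also explains the "connection refused" and the closed port: the agent dials out to the dashboard, so nothing on server2 ever listens on 5555. Sure enough, it ran normally. I felt silly for wasting so much time on something that should never have gone wrong.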
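A quick reachability check against the address passed to `-s` would have caught this much earlier. The sketch below assumes bash (for its `/dev/tcp` pseudo-device) and the coreutils `timeout` command; `port_open` is a name I made up, and the IP in the usage line is a placeholder.

```shell
# Succeed if a TCP connection to host:port can be opened within 3 seconds.
port_open() {
    local host="$1" port="$2"
    # $0 and $1 inside the inner script are the host and port passed below.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage, run on the machine where the agent will be installed:
#   port_open <dashboard-IP> 5555 && echo "dashboard reachable" || echo "nothing answering there"
```

If this fails for the dashboard's address, the problem is the network or the dashboard itself; if it fails only for the agent machine's own address, that is expected.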

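# Summary

# Installing the dashboard

  • Install Docker beforehand (if your system is not a mainstream distribution).
  • Follow the prompts; it is quick and easy.

# Installing the agent

  • Follow the prompts; it is simple.
  • The domain/IP must be that of the server running the dashboard, and it must not sit behind a CDN.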