# CephFS performance issues - problem description
## Summary
University of Helsinki has a Ceph cluster that can be accessed in four
ways: CephFS, CephFS+NFS-Ganesha, RBD, and S3. The users accessing it
over CephFS have reported significant performance issues. The key
detail is that the tool they use to access their data seeks within a
file before reading it.
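Since the exact tool is not named, the sketch below only illustrates the kind of access pattern meant here: seek to an offset inside a large file on CephFS and read a slice, instead of streaming the file sequentially. The path, offset and slice size are made-up example values, not taken from the real workload.

```python
# Hypothetical illustration of the access pattern described above: the tool
# seeks to an offset inside a large file on CephFS and reads only a slice.
# Path, offset and slice size are made-up example values.
PATH = "/mnt/cephfs/data/sample.bin"   # example CephFS path, not the real one
OFFSET = 90 * 1024 * 1024              # seek deep into a file of >= 100 MiB
LENGTH = 4 * 1024 * 1024               # read a 4 MiB slice

with open(PATH, "rb") as f:
    f.seek(OFFSET)          # seek before reading; this is the pattern that hurts
    chunk = f.read(LENGTH)
    # ... the real tool would process `chunk` further here
```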
* Ceph version 18.2.1 (reef)
* Ceph is managed by cephadm
* 15 HPE ProLiant XL420 Gen10 Plus servers, each of which contains
  * 512 GiB of memory
  * 24 × 18 TiB 7200 RPM HDDs
  * 2 × 1.5 TiB NVMe drives
  * 1 × 3 TiB NVMe drive
  * 2 × 25 GbE network interfaces
* The Ganesha containers run on libvirt virtual machines hosted on
ProLiant DL380 Gen10 Plus servers
* All nodes run RHEL 9.4
## The problem
The clients use a tool that reads a part of a file and processes it
further. When the tool issues a seek system call, CephFS performs very
poorly if the opened file is at least 100 MiB in size. The same
behaviour can be observed with tools such as tac or ddrescue, both of
which can read files in reverse. The performance hit is huge: reading
a 100 MiB file from beginning to end completes in less than 10
seconds, including the time needed to start the processes, while
reading the same file from end to beginning takes 10 minutes.
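Without the original tool, the difference can be reproduced with a short script that reads the same file in fixed-size blocks first forwards and then backwards and times both passes. This is only a sketch; the file path and block size are assumptions, and for a fair comparison the reverse pass should run against a cold client cache (for example, a freshly written file or after dropping caches).

```python
import os
import time

PATH = "/mnt/cephfs/data/testfile-100M.bin"  # assumed test file of at least 100 MiB
BLOCK = 128 * 1024                           # assumed block size

def read_forward(path):
    # Stream the file from beginning to end in BLOCK-sized chunks.
    with open(path, "rb") as f:
        while f.read(BLOCK):
            pass

def read_reverse(path):
    # Read the file back to front in BLOCK-sized chunks, roughly what
    # tac or ddrescue do when reading in reverse.
    with open(path, "rb") as f:
        end = os.fstat(f.fileno()).st_size
        while end > 0:
            begin = max(0, end - BLOCK)
            f.seek(begin)
            f.read(end - begin)
            end = begin

for name, fn in (("forward", read_forward), ("reverse", read_reverse)):
    t0 = time.monotonic()
    fn(PATH)
    print(f"{name}: {time.monotonic() - t0:.1f} s")
```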
Initially we thought the culprit was Ganesha, but we were wrong. When
we let the client hosts connect to the cluster (almost) directly, the
performance didn't improve. "Almost" here means that an intermediate
host routes the traffic between the client and the Ceph public
network. If we run iperf between the client and a cluster host, we get
nearly line-speed performance.
The confusing thing is that if we set up a libvirt virtual machine
that connects to the cluster's public network directly, we don't see
any performance hit.
```mermaid
flowchart TD
  subgraph "Laboratory hosts"
    Lab(Analysis host)
    Node1[Node 1] --> Node2[Node 2]
    Node2 --> SubGraph1[Jump to SubGraph1]
    SubGraph1 --> FinalThing[Final Thing]
  end