Today, while building a new Grafana dashboard for Ceph, I found that the OSDs on one host perform much better than those on the others. After a round of debugging I found out that the controllers were configured differently.
The two hosts with the much slower apply and commit latency used the JBOD mode of the controller; the faster one used single-disk RAID 0 (with a write-back cache) to provide the disks to Ceph.
Now, the headline for this topic in the docs is "avoid RAID", which, as usual, isn't a good fit for the recommendation they actually give in the paragraph:
"Red Hat recommends that each hard drive be exported separately from the RAID controller as a single volume with write-back caching enabled."
So they do indeed recommend RAID, but only as a separate RAID 0 for every single disk, with a battery-backed controller.
JBOD, the other option, they recommend for a much more specific use case: "Using Just a Bunch of Drives (JBOD) in independent drive mode with Ceph is supported when using all Solid State Drives (SSDs), or for configurations with high numbers of drives per controller, for example, 60 drives attached to one controller."
What fun! :)
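For anyone who wants to reproduce the per-host comparison without a dashboard, here is a minimal sketch. It assumes you already have a mapping from OSD id to host and one latency value per OSD (for example read off `ceph osd tree` and `ceph osd perf`); the function name, host names and numbers below are made up for illustration and are not measurements from this cluster.

```python
from collections import defaultdict
from statistics import mean


def mean_latency_by_host(osd_to_host, latencies_ms):
    """Average the per-OSD latency (in ms) for each host."""
    per_host = defaultdict(list)
    for osd_id, latency in latencies_ms.items():
        per_host[osd_to_host[osd_id]].append(latency)
    return {host: mean(values) for host, values in per_host.items()}


# Hypothetical example data, not real measurements.
osd_to_host = {0: "host-a", 1: "host-a", 2: "host-b", 3: "host-b"}
commit_latency_ms = {0: 4, 1: 6, 2: 30, 3: 34}

result = mean_latency_by_host(osd_to_host, commit_latency_ms)
print(result)
```

A host whose average sits far away from the others is the one to look at more closely, which is how the controller difference showed up here.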
@leah RAID 0 can simply distribute the writes across more disks. Two disks with half the writes each are faster than one with all of them. Or does every controller only have one disk?
@Toasterson Nope, I think you are getting something wrong here. The controller has multiple disks, and every single disk is in its own RAID 0; there are no two disks in one RAID 0. That would be a problematic configuration because it would work against some of the functions of Ceph. Oh, and the question also doesn't have much in common with the problem described above.
@leah Ah, an interesting combination then. Firmware difference maybe? But thanks for clarifying.
@Toasterson I don't get the point of your question, sorry.
@leah No need. I understand your thread now. Curious to see how this turns out.
@leah Maybe the RAID 0 setting enables some caching on the controller, so you measure your cache instead of the disk. Just a guess.
@wasserpanther That wouldn't explain it, because two of three hosts have no caching enabled.