Today, while building a new Grafana dashboard for Ceph, I found that the OSDs on one host perform much better than those on the others. After this observation and a round of debugging, I found out that the controllers were configured differently.

The two hosts with the much slower apply and commit latency used the JBOD mode of the controller; the faster one used single-disk RAID 0 (with a write-back cache) to provide the disks to Ceph.
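Spotting such an outlier host doesn't need a dashboard; `ceph osd perf` reports the apply/commit latency per OSD. Here's a minimal sketch that flags slow OSDs from that output — the JSON field names (`osd_perf_infos`, `perf_stats`, `commit_latency_ms`) are assumed from Luminous-era releases and the sample numbers are made up, so check against `ceph osd perf -f json` on your own version:

```python
import json

# Made-up sample in the (assumed) shape of `ceph osd perf -f json`.
sample = json.loads("""
{"osd_perf_infos": [
  {"id": 0, "perf_stats": {"commit_latency_ms": 12,  "apply_latency_ms": 15}},
  {"id": 1, "perf_stats": {"commit_latency_ms": 118, "apply_latency_ms": 140}},
  {"id": 2, "perf_stats": {"commit_latency_ms": 11,  "apply_latency_ms": 13}}
]}
""")

def slow_osds(perf, factor=3.0):
    """Return the IDs of OSDs whose commit latency exceeds
    `factor` times the median commit latency across all OSDs."""
    stats = perf["osd_perf_infos"]
    lats = sorted(o["perf_stats"]["commit_latency_ms"] for o in stats)
    median = lats[len(lats) // 2]
    return [o["id"] for o in stats
            if o["perf_stats"]["commit_latency_ms"] > factor * median]

print(slow_osds(sample))  # → [1]
```

Mapping the flagged OSD IDs back to hosts (e.g. via `ceph osd tree`) is what pointed to the per-host controller configuration here.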

After that I searched for an answer to the, apparently religious, question of whether it is recommended to use single-disk RAID 0 or the controller's JBOD mode underneath a Ceph cluster's OSDs.

With no clear result, I decided to go by the official Red Hat docs, as they seem to be the best source of truth for Ceph and are a lot better than the Ceph project's official docs. Well... maybe a result of Red Hat buying Ceph a while ago.

Now, the headline for this topic in the docs is "avoid RAID" and, as usual, isn't a good fit for the recommendation they actually give in the paragraph:

"Red Hat recommends that each hard drive be exported separately from the RAID controller as a single volume with write-back caching enabled."

So they do indeed recommend RAID, but only in the one-RAID-0-per-disk configuration, with a battery-backed controller.

JBOD, the other option, they recommend only for a much more specific use case: "Using Just a Bunch of Drives (JBOD) in independent drive mode with Ceph is supported when using all Solid State Drives (SSDs), or for configurations with high numbers of drives per controller, for example, 60 drives attached to one controller."

What fun! :)

The only open question is: why is the read/write operation latency so much better? Do they measure a different part of the operation than apply+commit, or...?

@leah RAID 0 can simply distribute between more disks. Two disks with half the writes each are faster than one with all of them. Or does every controller only have one disk?

@Toasterson nope, you are getting something wrong here, I think. The controller has multiple disks, but every single disk is in its own RAID 0; there are never two disks in one RAID 0. That would be a problematic configuration because it would undermine some of Ceph's functions. Oh, and the question doesn't have much in common with the problem described above anyway.

@leah Ah, an interesting combination then. Firmware difference maybe? But thanks for clarifying.

@Toasterson I don't get the point of your question, sorry.

@leah no need. I understand your thread now. Curious to see how this turns out.

@leah maybe the RAID 0 setting enables some caching on the controller, so you measure your cache instead of the disk. Just a guess.

@wasserpanther that wouldn't explain it, because two of the three hosts have no caching enabled.

@dwardoric thanks for the push, it was also posted a few days ago in our channel but I haven't had time to read it in the last few days.

@leah De nada. The downside of reading papers is that after one you have a list of references to n others you have to read. ;-)

@leah the controllers in our servers at work (HPE Smart Array whatever) disable their battery-backed cache in JBOD mode but don't do that in single-disk RAID setups. Maybe the controllers you use have similar behaviour.

@bsod jep, that's the case for our controllers too.

@leah @bsod but with a cache, Ceph doesn't know whether the bits are safely on disk or not?

@Wageck @bsod as the controller's cache is battery-backed, this is not a problem.

@leah @bsod
What controllers do you use? Do both systems have the same driver and driver version?
It is definitely worthwhile investigating any funny thing you notice, keeping an eye on performance, and especially checking after each update.

@leah @bsod
We have had fun with LSI-based controllers, when a kernel update halved the speed of the controller. Eventually it was fixed by an LSI firmware update that only became available ~10 months after the kernel change.

@benno @bsod we use some AVAGO MegaRAID controllers, and of course they all run the same version.
