- A VTOL aircraft company concluded that running its CFD jobs in the public cloud would become prohibitively expensive over time.
- Moving from the public cloud to on-premises AMD EPYC systems from Supermicro made financial and performance sense, but an equally fast data storage solution remained a question mark.
- In the end, running the RozoFS file system on the same EPYC compute nodes in a hyper-converged (HCI) model delivered low latency with little impact on compute, an all-around win.
VTOL Aircraft company
EPYC Launch with RozoFS
In late 2020, Daystrom was brought in to assist with a ‘Cloud Liberation’ project for Computational Fluid Dynamics (CFD) jobs. Those jobs had reached useful productivity on public cloud resources but were consuming budget faster than they were producing results.
A better ROI was needed, and bringing the CFD jobs back in-house to run on power-efficient AMD EPYC cores housed in the excellent Supermicro 820C chassis was an appealing path. The loose end: where to store the source data and results?
Available budget was unsurprisingly focused on acquiring as many EPYC cores and TBs of RAM as possible to churn through the CFD jobs, leaving little for storage, which still had to be reliable, fast, and cost-effective. Fortunately, the aircraft team was open to new ideas, and they were intrigued by an innovative file system we have worked with for a number of years: RozoFS.
- The key to the magic of Rozo is in the math: RozoFS is an integrated file system and volume manager that encodes and decodes data using the Mojette Transform, a discrete form of the Radon transform from the world of integral geometry.
- The result of the Rozo approach is a significant reduction in the computation needed to read or write data in a fully resilient manner, locally or across a distributed set of nodes in a storage cluster. When running in distributed mode, which covers about 97% of our deployments, each node adds not only storage but also the computational power to read and write at effectively ‘network speeds.’
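To make the Mojette idea more concrete, here is a minimal Python sketch. It is not RozoFS's implementation: it assumes a block of data laid out as a small integer grid and uses the textbook definition with integer addition, and the function names `forward` and `inverse`, the 2-row grid, and the three directions are hypothetical choices for illustration. Each projection sums the values along a family of discrete lines (moving down one row shifts the column by an amount set by the integer p), so each projection could be stored on a different node. For directions of this (p, 1) form, the Katz criterion says a grid with R rows can be rebuilt from any R distinct projections, so here any two of the three suffice. The sketch rebuilds the grid by repeatedly solving any bin that still holds exactly one unknown value.

```python
from itertools import combinations


def forward(grid, directions):
    """Mojette forward transform of an R x C integer grid (illustrative sketch).

    For direction p, value (r, c) is added to bin c + p*r, shifted so the
    smallest bin index is 0; bins are constant along lines where moving
    down one row shifts the column by -p.
    """
    rows, cols = len(grid), len(grid[0])
    projections = {}
    for p in directions:
        offset = min(0, p * (rows - 1))
        bins = [0] * (cols + abs(p) * (rows - 1))
        for r in range(rows):
            for c in range(cols):
                bins[c + p * r - offset] += grid[r][c]
        projections[p] = bins
    return projections


def inverse(projections, rows, cols):
    """Rebuild the grid by peeling: solve any bin with one unknown value."""
    grid = [[None] * cols for _ in range(rows)]
    # For each projection, record which grid cells fall into each bin.
    layout = []
    for p, bins in projections.items():
        offset = min(0, p * (rows - 1))
        members = [[] for _ in bins]
        for r in range(rows):
            for c in range(cols):
                members[c + p * r - offset].append((r, c))
        layout.append((bins, members))

    unknown, progress = rows * cols, True
    while unknown and progress:
        progress = False
        for bins, members in layout:
            for b, cells in enumerate(members):
                missing = [(r, c) for r, c in cells if grid[r][c] is None]
                if len(missing) != 1:
                    continue
                # Bin total minus the already-known values gives the missing one.
                known = sum(grid[r][c] for r, c in cells if grid[r][c] is not None)
                r, c = missing[0]
                grid[r][c] = bins[b] - known
                unknown -= 1
                progress = True
    if unknown:
        raise ValueError("projections are insufficient to rebuild the grid")
    return grid


# Hypothetical 2-row data block encoded into three projections (e.g. one per node).
data = [[3, 1, 4, 1], [5, 9, 2, 6]]
encoded = forward(data, directions=[-1, 0, 1])

# Any two of the three projections rebuild the block exactly.
for pair in combinations(encoded, 2):
    assert inverse({p: encoded[p] for p in pair}, rows=2, cols=4) == data
```

In this toy layout, losing any one projection (for example, because a node is down) still leaves enough information to rebuild the block, while a single projection on its own is not enough and `inverse` raises an error.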
It became clear that RozoFS was a perfect fit for the CFD workloads. Deploying RozoFS in a hyper-converged (HCI) model on the 820C required only a very small time slice from the EPYC processors to service data I/O, and it pooled the NVMe-connected SSDs into a storage grid whose members were all equally important, and also equally unimportant, since no single node was critical. The performance impact on the jobs was essentially nil, yet when called upon, the file system could deliver 100 Gb/s network performance to any node or set of nodes.
- Data integrity was guaranteed by RozoFS's transform-based encoding.
- Storage management would be easier thanks to support for tiering older data to lower-performance SSD or disk within the file system namespace, saving the team the hours otherwise spent managing data on the high-performance storage layers.
- Hardware: Two Supermicro 820C chassis with 40 blades, 64-core AMD EPYC CPUs, NVMe SSDs
- Network: Integrated 25G/100G Ethernet and 100G EDR InfiniBand
- Operating System: Debian Linux
- Storage Solution: RozoFS Storage/Client on all 40 blades
- Workloads: CFD Simulations
- ROI: 4x over public cloud (2x performance at 1/2 the price)
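Reading the ROI line as performance delivered per unit of cost (an interpretation, since only the two factors are given), the two gains multiply rather than add:

```latex
\frac{\text{performance}_{\text{on-prem}} / \text{performance}_{\text{cloud}}}{\text{price}_{\text{on-prem}} / \text{price}_{\text{cloud}}} = \frac{2}{1/2} = 4
```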