Sanger has become Cray–Seagate's first Lustre customer! The system has arrived and is undergoing acceptance testing.
We intend to complete testing and introduce the system by the end of November 2017. This is a very aggressive timeline, with a view to retiring the older filesystem areas by March 2018.
Our secure Lustre guide is now available in the presentations area.
Our 2017 financial year end has brought with it a number of procurements. Over the next few months we will be installing:
2PB of Cray/Seagate Lustre, 2PB iRODS, 1PB Ceph, a new OpenStack zone, an updated SciaaS platform, iRODS performance server units, SQL servers and a full OpenStack test environment.
We are also preparing for our new DC quadrant, which is expected to come online in 2018.
It should be fun, and we will keep our internal customers updated through the IT project portal in the usual way.
We are happy to announce that the final instalment of the requested additional 3.5PB usable Ceph/S3 storage is fully installed and now available for use by our Human Genetics groups.
The turnaround time from delivery to handover has been less than a month for all storage sets, and despite some last-mile hardware issues, the service is running well.
Following the success stories from our @scale customers, we have been asked to provide an additional 1PB of usable capacity for our internal flexible compute environment.
The order has been placed and we expect BIOS-IT to have the hardware on site and acceptance tested by the end of September 2017.
We continue to be impressed by the resilience of both the Ceph and S3 services that the current platform has provided since January 2017, and we look forward to seeing the performance of the infrastructure continue to scale as additional units are added.
After many years of robust service, our Lustre scratch112 and 113 systems are approaching end of life. We are delighted to say that following significant testing and a procurement bakeoff, we have selected Seagate-Cray as the vendor of choice for our replacement system.
The system will be based around Seagate's Nitro SSD cache configuration, the best fit for our mixed workloads. We look forward to receiving the new hardware by the end of September 2017.
Two new Red Hat hosts have been installed in farm3 and are available for testing through the retest queue.
We are looking for feedback on this updated operating system, as we propose to move our clusters to Red Hat throughout later this financial year.
So please check now with your development teams to ensure that your software is ready to go and your software stacks continue to run as intended.
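As a starting point, a quick compatibility check on one of the new hosts might look like the sketch below. LSF is assumed for job submission, and the validation script name is purely illustrative:

```shell
# On one of the new Red Hat hosts, confirm the OS release and the
# core library version your pipelines link against.
head -n2 /etc/os-release     # distribution name and version
ldd --version | head -n1     # glibc version

# Then submit your own validation job to the retest queue
# (LSF assumed; the script name is a placeholder):
# bsub -q retest -o retest_%J.out ./validate_stack.sh
```

Running your full test suite through the retest queue is the surest way to catch library or kernel differences before the cluster-wide move.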
Two new teramem systems are now available through the teramem queue on farm3. These hosts are quad-socket units with 20 cores per socket (80 cores in all) and 3TB of memory each. This is a significant boost to our existing hugemem environment, where we continue to provide 256GB, 512GB and 1.5TB systems.
In addition to the high core count and memory these new hosts provide, they also have approximately 2TB of NVMe storage mounted under /local/scratch01. This is a very fast, high-IOPS local storage area that is ideal for creating graph indexes or other small-file workloads.
To access these hosts, jobs must be submitted to the teramem queue (-q teramem), and only jobs requesting more than 750GB of memory will currently be accepted into this queue. The maximum job length is currently set to 15 days, and the new kernel required to support these systems does not yet support BLCR checkpointing, so please be aware that this restriction exists.
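As a sketch, a teramem submission might look like the following. The LSF resource flags and MB memory units are assumptions based on typical farm configuration, and the job script name is a placeholder; the command is echoed rather than submitted so you can review it first:

```shell
# Request 800GB - above the 750GB floor for the teramem queue.
# Memory units here are MB, which is an assumption; check your
# site's LSF configuration before submitting.
mem_mb=800000
cmd="bsub -q teramem -M $mem_mb -R 'select[mem>$mem_mb] rusage[mem=$mem_mb]' ./assemble.sh"
echo "$cmd"    # review, then submit with: eval "$cmd"

# Inside the job, the fast NVMe area is available under /local/scratch01,
# e.g. point temporary graph-index files there:
#   export TMPDIR=/local/scratch01/$USER
```

Remember that anything written to /local/scratch01 is local to the host and should be treated as transient.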
As always, any questions or comments, please let us know in the usual fashion.
Intel have kindly donated a full reference evaluation system so we can see what improvements, if any, our bioinformatics pipelines may realise on this new hardware platform. The model is:
Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz
Intel’s ARK details are available here:
and of course, the Wikipedia page is here:
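To check whether a job has landed on the evaluation node, the CPU model string is the simplest tell (x86 Linux and a /proc filesystem are assumed):

```shell
# Report the CPU model and logical core count of the current host;
# on the evaluation node expect "Platinum 8170" in the model string.
grep -m1 "model name" /proc/cpuinfo
nproc
```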
We are also looking to gauge interest in Intel's combined Skylake + FPGA offering. If anyone is interested, we can gain access to a test system.
AMD systems are also looking very interesting:
We are looking to start evaluating their hardware very soon. Finally, a decent bakeoff opportunity!
After working with Mellanox we have managed to realise a 300% network performance improvement across our flexible compute environment! This has a huge impact on the service and dramatically improves VM-to-VM communication.
A kernel upgrade was required for most of the uplift; it appears that the drivers and firmware provided in the default Red Hat kernel are sub-optimal.
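To see what your own hosts are running, the kernel version and in-kernel Mellanox driver details can be read as below. The mlx5_core module name is an assumption (older ConnectX generations use mlx4_core), and the interface name varies by host:

```shell
uname -r    # running kernel - most of the uplift came from upgrading this

# Driver version details for the Mellanox module, if present:
modinfo mlx5_core 2>/dev/null | grep -E '^(version|vermagic)' || true

# Per-interface driver and firmware detail:
#   ethtool -i <interface>
```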