A novel task-driven approach to scalable and distributed I/O services.
DTIO (DataTask I/O) is a groundbreaking framework that introduces a new I/O paradigm to support a wide variety of conflicting I/O workloads under a single scalable platform, ready for the convergence of high-performance computing (HPC), AI, and big data. As scientific applications become increasingly data-intensive, with diverse and often conflicting I/O requirements, DTIO provides a unified solution through an innovative task-based I/O approach.
What makes DTIO special?
DTIO acts as an intelligent mediator between different I/O workloads. The core innovation is the concept of DataTasks (DTs): a tuple of an operation and a pointer to data that allows applications to treat data as active objects capable of performing operations on themselves or on other DTs. This enables seamless integration of compute-centric and data-centric environments while providing robust I/O optimization capabilities.
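To make the abstraction concrete, here is a minimal sketch of how a DT might be represented in code. The names and fields (DataTask, Operation, the dependency list) are assumptions made for illustration and are not DTIO's actual API.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Illustrative sketch only: the real DTIO interfaces may differ.
// A DataTask (DT) pairs an operation with a pointer to the data it acts on,
// so the data itself can "carry" the work to be performed on it.
enum class Operation { kWrite, kRead, kTransform };

struct DataTask {
  Operation op;                       // what to do
  void* data;                         // pointer to the data the task operates on
  std::size_t size;                   // size of the data in bytes
  std::function<void(DataTask&)> fn;  // optional user-defined transformation
  std::vector<DataTask*> deps;        // other DTs this task composes with
};

int main() {
  // Composing DTs: a transform task that feeds a write task.
  double buffer[1024] = {};
  DataTask transform{Operation::kTransform, buffer, sizeof(buffer),
                     [](DataTask& dt) { (void)dt; /* e.g., filter or reduce in place */ }, {}};
  DataTask write{Operation::kWrite, buffer, sizeof(buffer), nullptr, {&transform}};
  // A DTIO-like runtime would schedule `transform` before `write`.
  (void)write;
  return 0;
}
```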
Behind the innovation
DTIO emerged from the recognition that modern scientific workflows require a diverse and often conflicting set of storage features and semantics. Our research shows that a unified approach is essential to address the challenges of data-intensive computing. The project, funded by the Department of Energy's Advanced Scientific Computing Research (DOE ASCR) program, aims to create a novel data movement infrastructure that can efficiently handle the convergence of HPC, big data, and AI workloads.
Key innovations
DataTask abstraction: a novel concept enabling atomic, distributed, and composable data transformations
QoS-aware scheduling: intelligent scheduling with asynchronous I/O capabilities (see the sketch after this list)
Unified I/O solution: seamless support for both legacy and modern I/O interfaces
High performance: achieves significant speedup on real scientific applications
Resilience: built-in fault tolerance and data lineage tracking
Active storage: programmable, locality-aware storage capabilities
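Below is a small, hypothetical sketch of what QoS-aware, asynchronous submission could look like from an application's perspective; the QoSHint structure and submit() call are invented for this example rather than taken from DTIO.

```cpp
#include <chrono>
#include <future>
#include <iostream>
#include <string>

// Illustrative sketch of QoS-aware, asynchronous task submission.
// The QoSHint fields and submit() are assumptions, not DTIO's interface.
struct QoSHint {
  int priority;                        // higher = more urgent
  std::chrono::milliseconds deadline;  // soft latency target
};

// Stand-in for handing a task off to an I/O scheduler.
std::future<bool> submit(const std::string& task_name, const QoSHint& hint) {
  return std::async(std::launch::async, [task_name, hint] {
    std::cout << "running " << task_name << " (priority " << hint.priority
              << ", deadline " << hint.deadline.count() << " ms)\n";
    return true;  // pretend the I/O completed successfully
  });
}

int main() {
  // A latency-sensitive checkpoint and a background analysis flush;
  // a QoS-aware scheduler would order them by priority and deadline.
  auto checkpoint = submit("checkpoint_write", {10, std::chrono::milliseconds(50)});
  auto flush = submit("analysis_flush", {1, std::chrono::milliseconds(5000)});
  checkpoint.get();
  flush.get();
  return 0;
}
```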
Real-world impact
DTIO is making significant contributions across various scientific domains:
Climate modeling: supporting applications like CM1 with efficient data analysis integration
Molecular dynamics: enabling seamless I/O for applications like LAMMPS
Astronomical data processing: accelerating workflows like Montage
Weather forecasting: enhancing I/O performance for WRF simulations
High-performance data analytics: bridging the gap between computing and data processing
Technical architecture
DTIO consists of several key components, sketched together after this list:
DataTask manager: handles DT specification and composition
QoS scheduler: manages DT scheduling and resource allocation
Asynchronous I/O engine: enables efficient overlapping of I/O operations
Resilience manager: provides fault tolerance and data lineage
Active storage layer: supports programmable storage capabilities
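The following sketch wires a few of these components together to show the intended flow of a request; every class and method name is a placeholder chosen to mirror the list above, not DTIO's real implementation.

```cpp
#include <iostream>
#include <string>

// Hypothetical component interfaces, for illustration only.
class DataTaskManager {  // handles DT specification and composition
 public:
  std::string compose(const std::string& spec) { return "dt:" + spec; }
};

class QoSScheduler {  // decides when and where a DT runs
 public:
  void schedule(const std::string& dt) { std::cout << "scheduling " << dt << "\n"; }
};

class AsyncIOEngine {  // overlaps I/O with computation
 public:
  void issue(const std::string& dt) { std::cout << "issuing async I/O for " << dt << "\n"; }
};

class ResilienceManager {  // records lineage so failed DTs can be replayed
 public:
  void record(const std::string& dt) { std::cout << "recording lineage of " << dt << "\n"; }
};

int main() {
  DataTaskManager manager;
  QoSScheduler scheduler;
  AsyncIOEngine engine;
  ResilienceManager resilience;

  // A write request flows through the stack as a DataTask.
  std::string dt = manager.compose("write(simulation_output)");
  scheduler.schedule(dt);
  engine.issue(dt);
  resilience.record(dt);
  return 0;
}
```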
Looking forward
DTIO continues to evolve, with exciting developments in:
Extended support for various I/O interfaces and patterns
Enhanced QoS-aware scheduling techniques
Advanced resilience and fault tolerance capabilities
Deeper integration with active storage technologies
Join the DTIO community
DTIO is an open-source project that welcomes contributions from both academic and industrial researchers.
This material is based upon work supported by the U.S. Department of Energy, Office of Science, under Award Number DE-SC0023386. This project is a collaborative effort between Illinois Institute of Technology and Argonne National Laboratory. I am grateful to my partners at ANL and the broader DOE community, whose expertise has been instrumental in advancing this project.
Interested in learning more about DTIO or discussing potential collaborations? Feel free to reach out!
Jie Ye, Jaime Cernuda, Neeraj Rajesh, Keith Bateman, Orcun Yildiz, Tom Peterka, Arnur Nigmetov, Dmitriy Morozov, Xian-He Sun, Anthony Kougkas, and Bogdan Nicolae
In Proceedings of the 53rd International Conference on Parallel Processing, Aug 2024
Scientific workflows increasingly need to train a DNN model in real-time during an experiment (e.g., using ground truth from a simulation), while using it at the same time for inferences. Instead of sharing the same model instance, the training (producer) and inference server (consumer) often use different model replicas that are kept synchronized. In addition to efficient I/O techniques to keep the model replica of the producer and consumer synchronized, there is another important trade-off: frequent model updates enhance inference quality but may slow down training; infrequent updates may lead to less precise inference results. To address these challenges, we introduce Viper: a new I/O framework designed to determine a near-optimal checkpoint schedule and accelerate the delivery of the latest model updates. Viper builds an inference performance predictor to identify the optimal checkpoint schedule to balance the trade-off between training slowdown and inference quality improvement. It also creates a memory-first model transfer engine to accelerate model delivery through direct memory-to-memory communication. Our experiments show that Viper can reduce the model update latency by ~9x using the GPU-to-GPU data transfer engine and ~3x using the DRAM-to-DRAM host data transfer. The checkpoint schedule obtained from Viper's predictor also demonstrates improved cumulative inference accuracy compared to the baseline of epoch-based solutions.
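As a rough illustration of the trade-off the paper describes (not Viper's actual predictor), the toy model below picks an update frequency by balancing checkpoint overhead against model staleness, using made-up costs:

```cpp
#include <iostream>

// Toy illustration of the checkpoint-frequency trade-off: more frequent
// model updates reduce staleness but add training overhead. All numbers
// below are invented for illustration only.
int main() {
  const double epoch_time_s = 100.0;     // assumed training time per epoch
  const double checkpoint_cost_s = 5.0;  // assumed cost per model update
  const double staleness_penalty = 2.0;  // assumed penalty per second of staleness

  double best_score = -1e18;
  int best_updates_per_epoch = 1;
  for (int updates = 1; updates <= 20; ++updates) {
    double slowdown = updates * checkpoint_cost_s;          // training slowdown
    double avg_staleness = epoch_time_s / (2.0 * updates);  // average model staleness
    double score = -(slowdown + staleness_penalty * avg_staleness);
    if (score > best_score) {
      best_score = score;
      best_updates_per_epoch = updates;
    }
  }
  std::cout << "toy model picks " << best_updates_per_epoch
            << " model updates per epoch\n";
  return 0;
}
```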
Traditionally, I/O systems have been developed within the confines of a centralized OS kernel. This led to monolithic and rigid storage systems that are limited by low development speed, expressiveness, and performance. Various assumptions are imposed, including reliance on the UNIX-file abstraction, the POSIX standard, and a narrow set of I/O policies. However, this monolithic design philosophy makes it difficult to develop and deploy new I/O approaches to satisfy the rapidly evolving I/O requirements of modern scientific applications. To this end, we propose LabStor: a modular and extensible platform for developing high-performance, customized I/O stacks. Single-purpose I/O modules (e.g., I/O schedulers) can be developed in the comfort of userspace and released as plug-ins, while end-users can compose these modules to form workload- and hardware-specific I/O stacks. Evaluations show that by switching to a fully modular design, tailored I/O stacks can yield performance improvements of up to 60% in various applications.
As the diversity of big data applications increases, their requirements diverge and often conflict with one another. Managing this diversity in any supercomputer or data center is a major challenge for system designers. Data replication is a popular approach to meet several of these requirements, such as low latency, read availability, and durability. This approach can be enhanced using modern heterogeneous hardware and software techniques such as data compression. However, these two enhancements typically work in isolation, to the detriment of both. In this work, we present HReplica: a dynamic data replication engine that harmoniously leverages data compression and hierarchical storage to increase the effectiveness of data replication. We have developed a novel dynamic selection algorithm that facilitates the optimal matching of replication schemes, compression libraries, and tiered storage. Our evaluation shows that HReplica can improve scientific and cloud application performance by 5.2x when compared to other state-of-the-art replication schemes.
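To illustrate the kind of matching HReplica automates, a selection step could score candidate combinations of storage tier, compressor, and replication factor; the options and weights below are invented for this sketch and are not the paper's actual model.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Toy scoring of (tier, compressor, replication) combinations against a
// workload that weights write bandwidth vs. durability. Illustrative only.
struct Option {
  std::string tier;
  std::string compressor;
  int replicas;
  double write_bw_gbps;  // assumed effective write bandwidth after compression
  double durability;     // assumed durability score in [0, 1]
};

int main() {
  std::vector<Option> options = {
      {"NVMe", "lz4", 2, 5.0, 0.90},
      {"SSD", "zstd", 3, 2.5, 0.99},
      {"HDD", "none", 3, 0.8, 0.99},
  };
  const double w_bw = 0.6, w_dur = 0.4;  // arbitrary workload preference
  const Option* best = nullptr;
  double best_score = -1.0;
  for (const auto& o : options) {
    double score = w_bw * (o.write_bw_gbps / 5.0) + w_dur * o.durability;
    if (score > best_score) { best_score = score; best = &o; }
  }
  std::cout << "selected: " << best->tier << " + " << best->compressor
            << " x" << best->replicas << "\n";
  return 0;
}
```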
Most parallel programs use irregular control flow and data structures, which are a natural fit for one-sided communication paradigms such as MPI or PGAS programming languages. However, these environments lack efficient function-based application libraries that can utilize popular communication fabrics such as TCP, InfiniBand (IB), and RDMA over Converged Ethernet (RoCE). Additionally, there is a lack of high-performance data structure interfaces. We present the Hermes Container Library (HCL), a high-performance distributed data structures library that offers high-level abstractions including hash maps, sets, and queues. HCL uses an RPC-over-RDMA technology that implements a novel procedural programming paradigm. In this paper, we argue that an RPC-over-RDMA technology can serve as a high-performance, flexible, and coordination-free backend for implementing complex data structures. Evaluation results from testing real workloads show that HCL programs are 2x to 12x faster compared to BCL, a state-of-the-art distributed data structure library.
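The sketch below shows the style of programming such a library enables; DistributedMap is a local stand-in written for this example, not HCL's actual class or API, and in HCL the put/get calls would become RPC-over-RDMA operations against the node owning the key.

```cpp
#include <iostream>
#include <string>
#include <unordered_map>

// Generic sketch of a distributed hash-map programming model.
// A real implementation would shard keys across nodes instead of
// storing everything in a local std::unordered_map.
template <typename K, typename V>
class DistributedMap {
 public:
  void Put(const K& key, const V& value) { local_[key] = value; }
  bool Get(const K& key, V* out) const {
    auto it = local_.find(key);
    if (it == local_.end()) return false;
    *out = it->second;
    return true;
  }

 private:
  std::unordered_map<K, V> local_;  // placeholder for remote shards
};

int main() {
  DistributedMap<std::string, int> particle_counts;
  particle_counts.Put("rank_0", 1024);  // would target the owning server
  int value = 0;
  if (particle_counts.Get("rank_0", &value)) {
    std::cout << "rank_0 -> " << value << "\n";
  }
  return 0;
}
```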
In the age of data-driven computing, integrating High Performance Computing (HPC) and Big Data (BD) environments may be the key to increasing productivity and to driving scientific discovery forward. Scientific workflows consist of diverse applications (i.e., HPC simulations and BD analysis), each with distinct representations of data that introduce a semantic barrier between the two environments. To solve scientific problems at scale, accessing semantically different data from different storage resources is the biggest unsolved challenge. In this work, we aim to address a critical question: "How can we exploit the existing resources and efficiently provide transparent access to data from/to both environments?" We propose the iNtelligent I/O Bridging Engine (NIOBE), a new data integration framework that enables integrated data access for scientific workflows with asynchronous I/O and data aggregation. NIOBE performs the data integration using available I/O resources, in contrast to existing optimizations that ignore the I/O nodes present on the data path. In NIOBE, data access is optimized to consider both the ongoing production and the future consumption of the data. Experimental results show that with NIOBE, an integrated scientific workflow can be accelerated by up to 10x when compared to a no-integration baseline and by up to 133% compared to other state-of-the-art integration solutions.
The data explosion phenomenon in modern applications causes tremendous stress on storage systems. Developers use data compression, a size-reduction technique, to address this issue. However, each compression library exhibits different strengths and weaknesses when considering the input data type and format. We present Ares, an intelligent, adaptive, and flexible compression framework which can dynamically choose a compression library for a given input based on the type of the workload, and which provides an appropriate infrastructure for users to fine-tune the chosen library. Ares is a modular framework which unifies several compression libraries while allowing the addition of more compression libraries by the user. Ares is a unified compression engine that abstracts the complexity of using different compression libraries for each workload. Evaluation results show that under real-world applications, from both scientific and cloud domains, Ares performed 2-6x faster than competitive solutions with a low cost of additional data analysis (i.e., overheads of around 10%) and up to 10x faster than a baseline of no compression at all.
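The core idea can be illustrated with a simple dispatcher that maps a workload's data type to a compression library; the mapping below is hard-coded for illustration, whereas Ares selects and tunes libraries dynamically.

```cpp
#include <iostream>
#include <string>

// Illustrative type-to-library dispatcher; the choices are examples only.
std::string choose_compressor(const std::string& data_type) {
  if (data_type == "float_array") return "zfp";         // scientific floating-point data
  if (data_type == "text_log") return "zstd";           // general-purpose text
  if (data_type == "already_compressed") return "none"; // skip recompression
  return "lz4";                                          // fast default
}

int main() {
  for (const std::string t : {"float_array", "text_log", "already_compressed"}) {
    std::cout << t << " -> " << choose_compressor(t) << "\n";
  }
  return 0;
}
```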
Understanding, characterizing, and tuning scientific applications' I/O behavior is an increasingly complicated process in HPC systems. Existing tools use either offline profiling or online analysis to gain insights into applications' I/O patterns. However, there is a lack of a clear formula to characterize applications' I/O. Moreover, these tools are application-specific and do not account for multi-tenant systems. This paper presents Vidya, an I/O profiling framework which can predict an application's I/O intensity using a new formula called Code-Block I/O Characterization (CIOC). Using CIOC, developers and system architects can tune an application's I/O behavior and better match the underlying storage system to maximize performance. Evaluation results show that Vidya can predict an application's I/O intensity with a variance of 0.05%. Vidya can profile applications with a high accuracy of 98% while reducing profiling time by 9x. We further show how Vidya can optimize an application's I/O time by 3.7x.
There is an ocean of available storage solutions in modern high-performance and distributed systems. These solutions consist of Parallel File Systems (PFS) for the more traditional high-performance computing (HPC) systems and of Object Stores for emerging cloud environments. More often than not, these storage solutions are tied to specific APIs and data models and thus bind developers, applications, and entire computing facilities to certain interfaces. Each storage system is designed and optimized for certain applications but does not perform well for others. Furthermore, modern applications have become more and more complex, consisting of a collection of phases with different computation and I/O requirements. In this paper, we propose a unified storage access system, called IRIS (i.e., I/O Redirection via Integrated Storage). IRIS enables unified data access and seamlessly bridges the semantic gap between file systems and object stores. With IRIS, emerging high-performance data analytics software has capable and diverse I/O support. IRIS can bring us closer to the convergence of HPC and Cloud environments by combining the best storage subsystems from both worlds. Experimental results show that IRIS can deliver more than a 7x improvement in performance over existing solutions.
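Conceptually, IRIS translates between file-style calls and object operations; the placeholder classes below sketch that mapping under assumed names and are not IRIS's implementation.

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Placeholder object store: a map from object keys to byte buffers.
class ObjectStore {
 public:
  void put(const std::string& key, const std::vector<char>& value) { objects_[key] = value; }
  std::vector<char> get(const std::string& key) const { return objects_.at(key); }

 private:
  std::map<std::string, std::vector<char>> objects_;
};

// Placeholder unified layer: file-style write/read translated into object
// put/get keyed by (path, offset), illustrating the semantic bridging idea.
class UnifiedIO {
 public:
  void write(const std::string& path, std::size_t offset, const std::vector<char>& buf) {
    store_.put(path + ":" + std::to_string(offset), buf);
  }
  std::vector<char> read(const std::string& path, std::size_t offset) {
    return store_.get(path + ":" + std::to_string(offset));
  }

 private:
  ObjectStore store_;
};

int main() {
  UnifiedIO io;
  io.write("/results/run1.dat", 0, {'h', 'p', 'c'});
  auto data = io.read("/results/run1.dat", 0);
  std::cout << "read " << data.size() << " bytes back\n";
  return 0;
}
```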