A unified data access framework for HPC and Big Data storage
IRIS (I/O Redirection via Integrated Storage) is a framework that bridges the gap between high-performance computing (HPC) and Big Data storage systems. As scientific applications become increasingly data-intensive and high-performance data analytics (HPDA) requires more computing power, IRIS provides a unified solution that seamlessly integrates compute-centric and data-centric storage environments.
What makes IRIS special? 💡
IRIS acts as an intelligent mediator between different storage worlds. Just as a translator helps people who speak different languages communicate, IRIS lets applications access data across different storage systems seamlessly. It unifies parallel file systems (PFS) and object stores under one cohesive framework, eliminating the traditional barriers between HPC and Big Data environments. A minimal sketch of this idea follows.
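To make the mediator idea concrete, here is a small C++ sketch of a single storage facade over two interchangeable back ends. All names (Backend, FileBackend, ObjectBackend) are hypothetical illustrations, not the actual IRIS API, and the in-memory maps stand in for a real PFS and object store.

```cpp
// Minimal sketch of the "mediator" idea: applications code against one
// interface while the concrete storage back end stays a deployment choice.
// All names here are hypothetical illustrations, not the real IRIS API.
#include <iostream>
#include <map>
#include <memory>
#include <string>

// The single interface an application sees, regardless of back end.
struct Backend {
    virtual ~Backend() = default;
    virtual void put(const std::string& key, const std::string& data) = 0;
    virtual std::string get(const std::string& key) = 0;
};

// Stand-in for a parallel file system (an in-memory map for the sketch).
struct FileBackend : Backend {
    std::map<std::string, std::string> files;
    void put(const std::string& k, const std::string& d) override { files[k] = d; }
    std::string get(const std::string& k) override { return files.at(k); }
};

// Stand-in for an object store (real code would talk to S3, Ceph, etc.).
struct ObjectBackend : Backend {
    std::map<std::string, std::string> objects;
    void put(const std::string& k, const std::string& d) override { objects[k] = d; }
    std::string get(const std::string& k) override { return objects.at(k); }
};

int main() {
    // The application holds one handle; swapping FileBackend for
    // ObjectBackend changes nothing in the calling code.
    std::unique_ptr<Backend> store = std::make_unique<ObjectBackend>();
    store->put("/sim/output/step42.dat", "temperature field bytes");
    std::cout << store->get("/sim/output/step42.dat") << "\n";
}
```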
Behind the innovation
IRIS emerged from the recognition that the tools and cultures of HPC and HPDA have diverged, to the detriment of both. Our research shows that unifying them is essential to serve a wide spectrum of major research domains. The project, funded by the National Science Foundation, aims to create a unified storage interface that bridges two very different camps: compute-centric and data-centric data storage.
Key innovations
Cross-system data access: enables MPI applications to directly access object stores, and Big Data applications to access parallel file systems
Virtual files and objects: novel abstractions that overcome the semantic gaps between different storage systems
Unified storage interface: seamless integration of compute-centric and data-centric storage systems
High performance: achieves up to 12x speedup on real scientific applications
Transparency: existing applications run with no modification (one common technique for this, system-call interposition, is sketched below)
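The page does not say how IRIS achieves transparency internally; one standard way to serve unmodified applications is to interpose on POSIX calls with an LD_PRELOAD shim. The following is a generic sketch of that technique, not IRIS source; the /iris/ path prefix and the redirect rule are invented for illustration.

```cpp
// Generic LD_PRELOAD interposition shim: intercepts open(2) so a framework
// can route selected paths to a different back end. Hypothetical example,
// not IRIS code. Build: g++ -shared -fPIC shim.cpp -o libshim.so -ldl
#include <dlfcn.h>
#include <fcntl.h>
#include <cstdarg>
#include <cstdio>
#include <cstring>

// Pointer to the real open(2), resolved lazily via the dynamic linker.
static int (*real_open)(const char*, int, ...) = nullptr;

extern "C" int open(const char* path, int flags, ...) {
    if (!real_open)
        real_open = reinterpret_cast<int (*)(const char*, int, ...)>(
            dlsym(RTLD_NEXT, "open"));

    // Invented rule: paths under /iris/ would be routed to the object store.
    // Here we only log, to show where the redirection logic would live.
    if (std::strncmp(path, "/iris/", 6) == 0)
        std::fprintf(stderr, "[shim] would redirect %s\n", path);

    // open() carries a mode argument only when O_CREAT is set.
    mode_t mode = 0;
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    return real_open(path, flags, mode);
}
```

Running `LD_PRELOAD=./libshim.so ./app` then routes every open() in the unmodified application through the shim first.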
Real-world impact 🌍
IRIS is making significant contributions across various scientific domains:
Climate modeling: supporting applications like CM1 with efficient data analysis integration
Scientific computing: enabling seamless data sharing between simulation and analysis phases
High-performance data analytics: bridging the gap between computing and data processing
Big Data applications: providing efficient access to both file-based and object-based storage
Technical architecture
IRIS consists of several key components working together:
Mappers: bridge the semantic gaps between different storage interfaces
Storage modules: handle interactions with the underlying storage systems
Metadata manager: maintains consistency and handles metadata operations
Performance optimizer: includes prefetching, caching, and request aggregation (aggregation is sketched after this list)
Unified storage server: provides deep integration at the disk level
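As one concrete example of the optimizer's job, here is a minimal sketch of request aggregation: sorting queued byte-range requests and coalescing contiguous or overlapping ones before they reach a storage module. The data structure and merge policy are illustrative assumptions, not IRIS internals.

```cpp
// Request aggregation sketch: many small, scattered requests become fewer,
// larger ones, cutting round trips to the storage system.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

struct Request { std::size_t offset, size; };

// Sort by offset, then merge requests that touch or overlap.
std::vector<Request> aggregate(std::vector<Request> reqs) {
    std::sort(reqs.begin(), reqs.end(),
              [](const Request& a, const Request& b) { return a.offset < b.offset; });
    std::vector<Request> merged;
    for (const Request& r : reqs) {
        if (!merged.empty() &&
            r.offset <= merged.back().offset + merged.back().size) {
            std::size_t end = std::max(merged.back().offset + merged.back().size,
                                       r.offset + r.size);
            merged.back().size = end - merged.back().offset;
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}

int main() {
    // Three 1 MiB reads collapse into a single 3 MiB request.
    auto out = aggregate({{0, 1 << 20}, {2 << 20, 1 << 20}, {1 << 20, 1 << 20}});
    for (const auto& r : out)
        std::cout << "offset=" << r.offset << " size=" << r.size << "\n";
}
```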
Looking forward
IRIS continues to evolve with exciting developments in:
Extended support for various high-level I/O libraries
Enhanced performance optimization techniques
Deeper integration of storage systems at the disk level
Expanded application support across different domains
Join the IRIS community 🤝
IRIS is an open-source project welcoming contributions from both academic and industrial researchers:
Documentation: comprehensive guides and technical details
Research papers: latest findings and technical innovations
Key publications 📚
IRIS: I/O Redirection via Integrated Storage (ICS 2018)
Foundational paper introducing the IRIS framework and its core concepts
Demonstrates significant performance improvements in real-world applications
Syndesis: Mapping Objects to Files for a Unified Data Access System (MTAGS 2017)
Explores novel mapping strategies between file-based and object-based storage (one such mapping is sketched after this list)
Enosis: Bridging the Semantic Gap Between File-Based and Object-Based Data Models (DataCloud 2017)
Addresses fundamental challenges in unifying different data models
Rethinking Key-Value Store for Parallel I/O Optimization (IJHPCA 2017)
Investigates optimization strategies for key-value stores in parallel environments
NIOBE: An Intelligent I/O Bridging Engine for Complex and Distributed Workflows (IEEE Big Data 2019)
Extends IRIS concepts to support complex workflow scenarios
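The Syndesis and Enosis papers study several mapping strategies; the sketch below shows only the most basic one, a file byte range striped across fixed-size objects keyed by path and chunk index. The 4 MiB object size and the "path#index" key scheme are assumptions for illustration, not the papers' actual design.

```cpp
// Naive file-to-object mapping sketch: translate (file, offset, size) into
// the object slices that cover it. Illustrative only.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

constexpr std::size_t kObjectSize = 4 * 1024 * 1024;  // assumed 4 MiB objects

struct ObjectSlice {
    std::string key;     // object key in the store, e.g. "/data/out.bin#1"
    std::size_t offset;  // offset inside that object
    std::size_t length;  // bytes to read/write in that object
};

std::vector<ObjectSlice> map_range(const std::string& path,
                                   std::size_t offset, std::size_t size) {
    std::vector<ObjectSlice> slices;
    while (size > 0) {
        std::size_t chunk = offset / kObjectSize;
        std::size_t within = offset % kObjectSize;
        std::size_t len = std::min(size, kObjectSize - within);
        slices.push_back({path + "#" + std::to_string(chunk), within, len});
        offset += len;
        size -= len;
    }
    return slices;
}

int main() {
    // An 8 MiB write at offset 2 MiB spans objects #0, #1, and #2.
    for (const auto& s : map_range("/data/out.bin", 2u << 20, 8u << 20))
        std::cout << s.key << " +" << s.offset << " len=" << s.length << "\n";
}
```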
Acknowledgements 🙏
The development of IRIS has been made possible through the support of the National Science Foundation (NSF). We're grateful to our collaborators at the Illinois Institute of Technology and the various research institutions whose expertise has been instrumental in advancing this project.
Interested in learning more about IRIS or discussing potential collaborations? Feel free to reach out!
Paper abstracts
NIOBE: An Intelligent I/O Bridging Engine for Complex and Distributed Workflows (IEEE Big Data 2019)
In the age of data-driven computing, integrating High-Performance Computing (HPC) and Big Data (BD) environments may be the key to increasing productivity and to driving scientific discovery forward. Scientific workflows consist of diverse applications (i.e., HPC simulations and BD analysis), each with distinct representations of data that introduce a semantic barrier between the two environments. To solve scientific problems at scale, accessing semantically different data from different storage resources is the biggest unsolved challenge. In this work, we aim to address a critical question: "How can we exploit the existing resources and efficiently provide transparent access to data from/to both environments?" We propose the iNtelligent I/O Bridging Engine (NIOBE), a new data integration framework that enables integrated data access for scientific workflows with asynchronous I/O and data aggregation. NIOBE performs the data integration using available I/O resources, in contrast to existing optimizations that ignore the I/O nodes present on the data path. In NIOBE, data access is optimized to consider both the ongoing production of the data and its future consumption. Experimental results show that with NIOBE, an integrated scientific workflow can be accelerated by up to 10x compared to a no-integration baseline and by up to 133% compared to other state-of-the-art integration solutions.
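The abstract names asynchronous I/O and data aggregation as NIOBE's two levers. Below is a minimal, self-contained sketch of that combined pattern under assumptions of my own (one background thread, an in-memory queue, stdout standing in for the consumer-side store); it is not the NIOBE implementation.

```cpp
// Asynchronous, aggregating writes: producers enqueue small buffers and
// return immediately; a background "bridging" thread batches queued buffers
// and flushes them in one operation. Illustrative sketch only.
#include <condition_variable>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class AsyncAggregator {
public:
    // Called on the producer's critical path; never touches storage.
    void write_async(std::string data) {
        std::lock_guard<std::mutex> lk(m_);
        pending_.push_back(std::move(data));
        cv_.notify_one();
    }
    void stop() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
private:
    void run() {
        for (;;) {
            std::vector<std::string> batch;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [&] { return done_ || !pending_.empty(); });
                if (pending_.empty() && done_) return;
                batch.swap(pending_);  // take everything queued so far
            }
            // One aggregated flush instead of many small ones.
            std::size_t bytes = 0;
            for (const auto& d : batch) bytes += d.size();
            std::cout << "flushed " << batch.size() << " buffers, "
                      << bytes << " bytes in one operation\n";
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<std::string> pending_;
    bool done_ = false;
    std::thread worker_{&AsyncAggregator::run, this};  // started last
};

int main() {
    AsyncAggregator agg;
    for (int step = 0; step < 8; ++step)
        agg.write_async("simulation step " + std::to_string(step));
    agg.stop();
}
```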
IRIS: I/O Redirection via Integrated Storage (ICS 2018)
There is an ocean of available storage solutions in modern high-performance and distributed systems. These solutions consist of Parallel File Systems (PFS) for the more traditional high-performance computing (HPC) systems and of Object Stores for emerging cloud environments. More often than not, these storage solutions are tied to specific APIs and data models and thus bind developers, applications, and entire computing facilities to certain interfaces. Each storage system is designed and optimized for certain applications but does not perform well for others. Furthermore, modern applications have become more and more complex, consisting of a collection of phases with different computation and I/O requirements. In this paper, we propose a unified storage access system called IRIS (i.e., I/O Redirection via Integrated Storage). IRIS enables unified data access and seamlessly bridges the semantic gap between file systems and object stores. With IRIS, emerging High-Performance Data Analytics software gains capable and diverse I/O support. IRIS can bring us closer to the convergence of HPC and Cloud environments by combining the best storage subsystems from both worlds. Experimental results show that IRIS can deliver more than 7x better performance than existing solutions.
Rethinking Key-Value Store for Parallel I/O Optimization (IJHPCA 2017)
Key-value stores are widely used as the storage system for large-scale internet services and cloud storage systems. However, they are rarely used in HPC systems, where parallel file systems are the dominant storage solution. In this study, we examine the architectural differences and performance characteristics of parallel file systems and key-value stores. We propose using key-value stores to optimize overall Input/Output (I/O) performance, especially for workloads that parallel file systems cannot handle well, such as those with intense data synchronization or heavy metadata operations. We conducted experiments with several synthetic benchmarks, an I/O benchmark, and a real application. We modeled the performance of the two systems using data collected from our experiments, and we provide a predictive method to identify which system offers better I/O performance for a given workload. The results show that I/O performance in HPC systems can be optimized by utilizing key-value stores.
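The abstract describes the predictive method only at a high level. The sketch below shows just the decision structure such a method implies, a per-system cost model evaluated against a workload profile. All parameter values are invented placeholders; the paper fits its models to measured data.

```cpp
// Toy version of "pick the better storage system for a workload": model each
// system's cost as bulk-transfer time plus per-operation latency, then take
// the minimum. Parameters are made up for illustration.
#include <iostream>
#include <string>

struct Workload {
    double bytes;         // total data moved
    double metadata_ops;  // opens, stats, syncs, ...
};

struct SystemModel {
    std::string name;
    double bandwidth;   // bytes/second for bulk transfer
    double op_latency;  // seconds per metadata operation
    double cost(const Workload& w) const {
        return w.bytes / bandwidth + w.metadata_ops * op_latency;
    }
};

int main() {
    // Placeholder models: a PFS with high bandwidth but expensive metadata,
    // and a KV store with lower bandwidth but cheap small operations.
    SystemModel pfs{"parallel file system", 5e9, 1e-3};
    SystemModel kv{"key-value store", 1e9, 1e-5};

    Workload metadata_heavy{1e8, 1e6};  // 100 MB moved, a million small ops
    Workload bulk{1e12, 1e3};           // 1 TB moved, few ops

    for (const Workload& w : {metadata_heavy, bulk}) {
        const SystemModel& best = pfs.cost(w) < kv.cost(w) ? pfs : kv;
        std::cout << "bytes=" << w.bytes << " ops=" << w.metadata_ops
                  << " -> " << best.name << "\n";
    }
}
```

With these placeholder numbers, the metadata-heavy workload is routed to the key-value store and the bulk workload to the parallel file system, matching the paper's qualitative finding.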