IRIS
A Unified Data Access Framework for HPC and Big Data Storage
IRIS (I/O Redirection via Integrated Storage) is a framework that bridges the gap between high-performance computing (HPC) and big data storage systems. As scientific applications become increasingly data-intensive and high-performance data analytics (HPDA) requires more computing power, IRIS provides a unified way to integrate compute-centric and data-centric storage environments.
What makes IRIS special?
IRIS acts as an intelligent mediator between different storage worlds. Just as a translator helps people speaking different languages communicate, IRIS enables applications to access data across different storage systems seamlessly. It unifies parallel file systems (PFS) and object stores under one cohesive framework, eliminating the traditional barriers between HPC and big data environments.
Behind the innovation
IRIS emerged from the recognition that the tools and cultures of HPC and HPDA have diverged, to the detriment of both. Our research shows that unification is essential to address a spectrum of major research domains. The project, funded by the National Science Foundation, aims to create a unified storage interface that bridges the compute-centric and data-centric storage camps.
Key innovations
- Cross-system data access: enables MPI applications to directly access object stores and big data applications to access parallel file systems
- Virtual files and objects: novel abstractions that overcome semantic gaps between different storage systems
- Unified storage interface: seamless integration of compute-centric and data-centric storage systems
- High performance: achieves up to 12x speedup on real scientific applications
- Transparency: no modification needed to existing applications
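To make the "virtual files" idea more concrete, here is a minimal sketch of how a file-style read/write interface could be layered over an object store by splitting the byte stream into fixed-size chunks. It is an illustration only, not the IRIS implementation: the class names `InMemoryObjectStore` and `VirtualFile`, the `name#index` key scheme, and the chunk size are all hypothetical choices made for this example.

```python
from typing import Dict


class InMemoryObjectStore:
    """Hypothetical stand-in for an object store: a flat key -> bytes map."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._objects[key] = value

    def get(self, key: str) -> bytes:
        # Missing objects read as empty, which lets sparse files work.
        return self._objects.get(key, b"")

    def keys(self):
        return sorted(self._objects)


class VirtualFile:
    """File-like view over an object store (assumed design, for illustration)."""

    def __init__(self, store: InMemoryObjectStore, name: str, chunk_size: int = 4) -> None:
        self.store = store
        self.name = name
        self.chunk_size = chunk_size
        self.size = 0  # logical file size in bytes

    def _key(self, index: int) -> str:
        # Assumed key scheme: one object per fixed-size chunk of the file.
        return f"{self.name}#{index}"

    def write(self, offset: int, data: bytes) -> None:
        pos, end = offset, offset + len(data)
        while pos < end:
            index, start = divmod(pos, self.chunk_size)
            take = min(self.chunk_size - start, end - pos)
            chunk = bytearray(self.store.get(self._key(index)))
            if len(chunk) < start + take:
                # Zero-fill any hole before the region being written.
                chunk.extend(b"\x00" * (start + take - len(chunk)))
            chunk[start:start + take] = data[pos - offset:pos - offset + take]
            self.store.put(self._key(index), bytes(chunk))
            pos += take
        self.size = max(self.size, end)

    def read(self, offset: int, length: int) -> bytes:
        pos, end = offset, min(offset + length, self.size)
        out = bytearray()
        while pos < end:
            index, start = divmod(pos, self.chunk_size)
            take = min(self.chunk_size - start, end - pos)
            piece = self.store.get(self._key(index))[start:start + take]
            out += piece.ljust(take, b"\x00")  # unwritten bytes read as zeros
            pos += take
        return bytes(out)


store = InMemoryObjectStore()
vf = VirtualFile(store, "out.dat", chunk_size=4)
vf.write(0, b"hello world")
print(vf.read(3, 5))
print(store.keys())
```

The application only sees offsets and lengths, while the store only sees keys and values. A mapping in the opposite direction (objects exposed on top of a file) would follow the same pattern with an index from key to file offset.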
Real-world impact
IRIS is making significant contributions across various scientific domains:
- Climate modeling: supporting applications like CM1 with efficient data analysis integration
- Scientific computing: enabling seamless data sharing between simulation and analysis phases
- High-performance data analytics: bridging the gap between computing and data processing
- Big data applications: providing efficient access to both file-based and object-based storage
Technical architecture
IRIS consists of several key components working together:
- Mappers: bridge semantic gaps between different storage interfaces
- Storage modules: handle interactions with underlying storage systems
- Metadata manager: maintains consistency and handles metadata operations
- Performance optimizer: includes prefetching, caching, and request aggregation
- Unified storage server: provides deep integration at the disk level
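Of the optimizations listed above, request aggregation is the easiest to show in a few lines. The sketch below merges adjacent or overlapping byte-range requests into fewer, larger ones before they reach a storage module. It illustrates the general technique and is not IRIS's actual optimizer: the function name `aggregate_requests`, the `(offset, length)` request shape, and the `max_gap` parameter are assumptions made for this example.

```python
from typing import List, Tuple

Request = Tuple[int, int]  # (offset, length) in bytes; an assumed request shape


def aggregate_requests(requests: List[Request], max_gap: int = 0) -> List[Request]:
    """Coalesce byte ranges that overlap or sit within max_gap bytes of each other."""
    merged: List[Request] = []
    for offset, length in sorted(requests):
        if length <= 0:
            continue  # ignore empty or malformed requests
        if merged:
            last_offset, last_length = merged[-1]
            last_end = last_offset + last_length
            if offset <= last_end + max_gap:
                # Extend the previous range instead of issuing a new request.
                new_end = max(last_end, offset + length)
                merged[-1] = (last_offset, new_end - last_offset)
                continue
        merged.append((offset, length))
    return merged


small_reads = [(8, 4), (0, 4), (4, 4), (20, 4)]
print(aggregate_requests(small_reads))
print(aggregate_requests(small_reads, max_gap=8))
```

A larger `max_gap` trades some extra bytes read for fewer round trips to storage, which tends to pay off when per-request latency dominates.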
Looking forward
IRIS continues to evolve with developments in:
- Extended support for various high-level I/O libraries
- Enhanced performance optimization techniques
- Deeper integration of storage systems at the disk level
- Expanded application support across different domains
Join the IRIS community
IRIS is an open-source project welcoming contributions from both academic and industrial researchers:
- Repository: GitHub - IRIS project
- Documentation: comprehensive guides and technical details
- Research papers: latest findings and technical innovations
Key publications
- IRIS: I/O Redirection via Integrated Storage (ICS 2018)
  - Foundational paper introducing the IRIS framework and its core concepts
  - Demonstrates significant performance improvements in real-world applications
- Syndesis: Mapping Objects to Files for a Unified Data Access System (MTAGS 2017)
  - Explores novel mapping strategies between file and object-based storage
- Enosis: Bridging the Semantic Gap between File-based and Object-based Data Models (DataCloud 2017)
  - Addresses fundamental challenges in unifying different data models
- Rethinking Key-Value Store for Parallel I/O Optimization (IJHPCA 2017)
  - Investigates optimization strategies for key-value stores in parallel environments
- NIOBE: An Intelligent I/O Bridging Engine for Complex and Distributed Workflows (IEEE Big Data 2019)
  - Extends IRIS concepts to support complex workflow scenarios
Acknowledgements
The development of IRIS has been made possible through the support of the National Science Foundation (NSF). We're grateful to our collaborators at the Illinois Institute of Technology and various research institutions whose expertise has been instrumental in advancing this project.
Interested in learning more about IRIS or discussing potential collaborations? Feel free to reach out!