Introduction
Today’s applications produce volumes of data too large to be stored, processed, or transferred efficiently. Data reduction is becoming an indispensable technique in many domains because it can reduce data size by one or even two orders of magnitude, significantly saving memory/storage space, mitigating the I/O burden, reducing communication time, and improving energy/power efficiency in various parallel and distributed environments, such as high-performance computing (HPC), cloud computing, edge computing, and the Internet of Things (IoT). An HPC system, for instance, is expected to deliver an enormous number of floating-point operations per second, and large-scale HPC scientific applications may generate vast volumes of data (several orders of magnitude larger than the available storage space) for post-analysis. Moreover, runtime memory footprint and communication can be non-negligible bottlenecks of current HPC systems.
Tackling big data reduction research requires expertise from computer science, mathematics, and application domains to study the problem holistically, develop solutions, and harden software tools that can be used by production applications. Specifically, the big-data computing community needs to understand the complex relationships among application design, data analysis and reduction methods, programming models, system software, hardware, and other elements of a next-generation large-scale computing infrastructure, especially given constraints on applicability, fidelity, performance portability, and energy efficiency. New data reduction techniques also need to be explored and developed continuously to suit emerging applications and diverse use cases.
There are at least three significant research questions that the community is striving to answer: (1) whether several orders of magnitude of data reduction is possible for extreme-scale sciences; (2) how to understand the trade-off between the performance and accuracy of data reduction; and (3) how to effectively reduce data size while preserving the information inside big datasets.
The goal of this workshop is to provide a focused venue for researchers in all aspects of data reduction in all related communities to present their research results, exchange ideas, identify new research directions, and foster new collaborations within the community.
Please note that this year’s IEEE BigData conference and the IWBDR workshop will be held virtually. Proceedings of the workshop will be published as planned. More details on how to attend the workshop virtually will be provided soon.
Submissions
Topics of Interest
The focus areas for this workshop include, but are not limited to:
- Data reduction techniques for big data issues in high-performance computing (HPC), cloud computing, Internet-of-Things (IoT), edge computing, machine learning and deep learning, and other big data areas:
  - Lossy and lossless compression methods
  - Approximate computation methods
  - Compressive/compressed sensing methods
  - Tensor decomposition methods
  - Data deduplication methods
  - Domain-specific methods, e.g., structured/unstructured meshes, particles, tensors
- Accuracy-guarantee data reduction methods
- Optimal design of data reduction methods
- Data reduction challenges and solutions in observational and experimental environments
- Mathematical methods with robustly estimable or provable error bounds for both data and quantities of interest
- Metrics and infrastructures to evaluate reduction methods and assess quality/fidelity of reduced data
- Uncertainty quantification for reduction methods/models/representations
- Benchmark applications and datasets for big data reduction
- Data analysis and visualization techniques leveraging reduced data
- Characterizing the impact of data reduction techniques on applications
- Hardware-software co-design of data reduction
- Trade-offs between accuracy and performance on emerging computing hardware and platforms
- Resource-constrained and/or time-constrained data reduction methods
- Software, tools, and programming models for managing reduced data
- Runtime systems and supports for data reduction
- Development of composable data reduction pipelines/workflows
- Automation of data reduction in scientific workflows
Proceedings
All papers accepted for this workshop will be published in the Workshop Proceedings of the IEEE Big Data Conference and made available in the IEEE Xplore digital library.
Submission Instructions
- Camera-ready version of accepted papers must be compliant with the IEEE Xplore format for publication.
- Submissions must be in PDF format.
- Submissions must be within 6 pages for a short paper or 10 pages for a full paper (including references).
- Submissions must be single-spaced, 2-column pages in IEEE Xplore format.
- Submissions are NOT double-blind.
- Only web-based submissions are allowed.
- All submission deadlines are Anywhere on Earth (AoE).
- Please submit your paper via the submission system.
- Submission link: Cyberchair submissions website.
Important Dates
- Paper Submission: November 8, 2021 (extended from November 5, 2021)
- Paper Acceptance Notification: November 12, 2021
- Camera-ready Deadline: November 19, 2021
- Workshop: December 17, 2021
Organizers
Program Chairs
- Dingwen Tao, Washington State University
- Xin Liang, Missouri S&T
- Sheng Di, Argonne National Laboratory
Web Chair
- Jiannan Tian, Washington State University
Program Committee
- Allison Baker, National Center for Atmospheric Research
- Mehmet Belviranli, Colorado School of Mines
- Martin Burtscher, Texas State University
- Franck Cappello, Argonne National Laboratory
- Jon Calhoun, Clemson University
- Jieyang Chen, Oak Ridge National Laboratory
- Yimin Chen, Lawrence Berkeley National Laboratory
- Soumya Dutta, Los Alamos National Laboratory
- William Godoy, Oak Ridge National Laboratory
- Pascal Grosset, Los Alamos National Laboratory
- Hanqi Guo, Argonne National Laboratory
- Muhammad Asif Khan, Qatar University
- Beiyu Lin, University of Nevada, Las Vegas
- Shaomeng Li, National Center for Atmospheric Research
- Habib Rehman, Khalifa University
- Tao Lu, Marvell Technology Group
- Panruo Wu, University of Houston
- Wen Xia, Harbin Institute of Technology, Shenzhen
Program Schedule
Timezone: Eastern Time (ET/EST), UTC-5
- 1:00 pm – 5:10 pm ET
- 12:00 pm – 4:10 pm CT
- 11:00 am – 3:10 pm MT
- 10:00 am – 2:10 pm PT
Time | Title | Speakers/Authors
---|---|---
1:00 – 1:05 pm ET | Opening Remarks and Welcome | Dingwen Tao, Sheng Di, Xin Liang
1:05 – 1:50 pm ET | Keynote Speech: High Ratio, Speed and Accuracy Customizable Scientific Data Compression with SZ | Franck Cappello, Argonne National Laboratory
1:50 – 2:15 pm ET | S15202: Efficient loading of reduced data ensembles produced at ORNL SNS/HFIR neutron time-of-flight facilities | William Godoy, Andrei Savici, Steven Hahn, and Peter Peterson
2:15 – 2:40 pm ET | BigD302: LCTL: Lightweight Compression Template Library | Juliana Hildebrandt, André Berthold, Dirk Habich, and Wolfgang Lehner
2:40 – 3:05 pm ET | S15205: On Large-Scale Matrix-Matrix Multiplication on Compressed Structures | Sudhindra Gopal Krishna, Aditya Narasimhan, Sridhar Radhakrishnan, and Richard Veras
3:05 – 3:25 pm ET | S15206: Tuning Parallel Data Compression and I/O for Large-scale Earthquake Simulation | Houjun Tang, Suren Byna, N. Anders Petersson, and David McCallen
3:25 – 3:30 pm ET | Coffee Break |
3:30 – 3:55 pm ET | S15207: Using Neural Networks for Two Dimensional Scientific Data Compression | Lucas Hayne, John Clyne, and Shaomeng Li
3:55 – 4:20 pm ET | BigD312: Prototyping: Sample Selection for Imbalanced Data | Edward Schwalb
4:20 – 4:45 pm ET | S15204: Fast Machine Learning in Data Science with a Comprehensive Data Summarization | Sikder Tahsin Al-Amin and Carlos Ordonez
4:45 – 5:05 pm ET | S15203: Improving Lossy Compression for SZ by Exploring the Best-Fit Lossless Compression Techniques | Jinyang Liu, Sihuan Li, Sheng Di, Xin Liang, Kai Zhao, Dingwen Tao, Zizhong Chen, and Franck Cappello
5:05 – 5:10 pm ET | Closing Remarks |
Participation
Participants can find the Zoom link to join the workshop through Underline (https://underline.io/events/222/sessions?eventSessionId=9588).