Mechanical MNIST – Distribution Shift

Yuan, Lingxiao; Park, Harold S.; Lejeune, Emma (2022-06-01). Handle: https://hdl.handle.net/2144/44485

More details about the data description can be found in the paper "Towards out of distribution generalization for problems in mechanics" (link forthcoming). All code necessary to reproduce the metamodels demonstrated in the manuscript is available on GitHub (https://github.com/lingxiaoyuan/ood_mechanics). For questions, please contact Emma Lejeune (elejeune@bu.edu).

The Mechanical MNIST – Distribution Shift dataset contains the results of finite element simulations of heterogeneous material subject to large deformation due to equibiaxial extension at a fixed boundary displacement of d = 7.0. The quantity provided in this dataset is the change in strain energy after this equibiaxial extension. Mechanical MNIST is generated by converting the MNIST bitmap images (28x28 pixels, pixel values 0 to 255) into 2D heterogeneous blocks of material (28x28 unit square) whose modulus varies in the range 1 to s. The original bitmap images are sourced from the MNIST Digits dataset (http://www.pymvpa.org/datadb/mnist.html), which corresponds to Mechanical MNIST – MNIST, and from the EMNIST Letters dataset (https://www.nist.gov/itl/products-and-services/emnist-dataset), which corresponds to Mechanical MNIST – EMNIST Letters. The Mechanical MNIST – Distribution Shift dataset is specifically designed to demonstrate three types of data distribution shift: (1) covariate shift, (2) mechanism shift, and (3) sampling bias. In each case, the training and testing environments are drawn from different distributions. For each type of distribution shift, we provide one dataset generated from the Mechanical MNIST – MNIST bitmaps and one from the Mechanical MNIST – EMNIST Letters bitmaps.
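As an illustration of the bitmap-to-material conversion, the sketch below maps pixel values 0 to 255 onto moduli between 1 and s. It assumes a linear mapping, which this record does not state explicitly. The function name `bitmap_to_modulus` is hypothetical, and the authoritative conversion is the simulation code linked from this page.

```python
# Hypothetical helper: map an MNIST-style bitmap (pixel values 0-255) to a
# modulus field in [1, s]. The LINEAR form used here is an ASSUMPTION; see the
# linked GitHub simulation code for the mapping actually used.
def bitmap_to_modulus(bitmap, s):
    """Return a grid of moduli with the same shape as `bitmap`.

    Pixel value 0 maps to modulus 1 and pixel value 255 maps to modulus s.
    """
    if s < 1:
        raise ValueError("s must be >= 1")
    return [[1.0 + (s - 1.0) * px / 255.0 for px in row] for row in bitmap]


# Example: a 28x28 bitmap that is blank except for one fully "on" pixel.
bitmap = [[0] * 28 for _ in range(28)]
bitmap[14][14] = 255
moduli = bitmap_to_modulus(bitmap, s=100)
print(moduli[0][0], moduli[14][14])  # 1.0 100.0
```

Under this assumption, changing s only rescales the stiffness contrast between "off" and "on" pixels, which is the parameter varied across the covariate and mechanism shift environments.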
For the covariate shift dataset, the training data is collected from two environments (2500 samples from s = 100 and 2500 samples from s = 90), and the test data is collected from two additional environments (2000 samples from s = 75 and 2000 samples from s = 50). For the mechanism shift dataset, the training data is identical to that of the covariate shift dataset (i.e., 2500 samples from s = 100 and 2500 samples from s = 90), and the test data is collected from two additional environments (2000 samples from s = 25 and 2000 samples from s = 10). For the sampling bias dataset, each datapoint is selected from the broader pool of MNIST and EMNIST input bitmaps with a probability controlled by a parameter r. The training data is collected from two environments (9800 samples from r = 15 and 200 samples from r = -2), and the test data is collected from three additional environments (2000 samples each from r = -5, r = -10, and r = 1). In total, there are 6 benchmark datasets, each with multiple training and testing environments. The enclosed document “folder_description.pdf” shows the organization of each zipped folder provided on this page. The code to reproduce these simulations is available on GitHub (https://github.com/elejeune11/Mechanical-MNIST/blob/master/generate_dataset/Equibiaxial_Extension_FEA_test_FEniCS.py).

This dataset is distributed under the terms of the Creative Commons Attribution-ShareAlike 4.0 License. The original MNIST bitmaps are from Yann LeCun (Courant Institute, NYU) and Corinna Cortes (Google Labs, New York) on PyMVPA (http://www.pymvpa.org/datadb/mnist.html) and are licensed under https://creativecommons.org/licenses/by-sa/4.0. The EMNIST bitmaps are from Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik (The MARCS Institute for Brain, Western Sydney University) (https://www.nist.gov/itl/products-and-services/emnist-dataset, https://arxiv.org/abs/1702.05373).
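The environment structure of the three distribution-shift benchmarks can be summarized as a small lookup table. This is a hypothetical Python sketch. The names `SHIFTS`, `SOURCES`, `split_sizes`, and `s_test_below_training` are illustrative only and do not reflect the file layout described in folder_description.pdf. The sample counts are transcribed from the dataset description.

```python
# Environment sizes per benchmark, transcribed from the dataset description.
# The dictionary layout is HYPOTHETICAL, not the on-disk folder organization.
SHIFTS = {
    # Keys inside "train"/"test" are the stiffness parameter s.
    "covariate": {"train": {100: 2500, 90: 2500}, "test": {75: 2000, 50: 2000}},
    "mechanism": {"train": {100: 2500, 90: 2500}, "test": {25: 2000, 10: 2000}},
    # Keys for sampling bias are values of the selection parameter r, not s.
    "sampling_bias": {"train": {15: 9800, -2: 200},
                      "test": {-5: 2000, -10: 2000, 1: 2000}},
}

# Each shift type is provided once per bitmap source.
SOURCES = ("MNIST", "EMNIST Letters")


def split_sizes(shift):
    """Return (number of training samples, number of test samples)."""
    envs = SHIFTS[shift]
    return sum(envs["train"].values()), sum(envs["test"].values())


def s_test_below_training(shift):
    """True if every test value of s is below every training value of s."""
    if shift == "sampling_bias":
        raise ValueError("sampling-bias environments are indexed by r, not s")
    envs = SHIFTS[shift]
    return max(envs["test"]) < min(envs["train"])


for name in SHIFTS:
    print(name, split_sizes(name))
# covariate (5000, 4000)
# mechanism (5000, 4000)
# sampling_bias (10000, 6000)

print(len(SOURCES) * len(SHIFTS))  # 6 benchmark datasets
print(s_test_below_training("covariate"), s_test_below_training("mechanism"))
# True True
```

The last check shows that in both the covariate and mechanism shift benchmarks every test value of s lies outside (below) the training values s = 90 and s = 100. It does not apply to the sampling bias benchmark, whose environments are indexed by r.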
The finite element simulations were conducted by Lingxiao Yuan using the open-source software FEniCS (https://fenicsproject.org).

License: http://creativecommons.org/licenses/by-sa/4.0/
Subjects: MNIST; Mechanical engineering; Heterogeneous material
Type: Dataset