Distributed Statistical Inference for Massive Data

Abstract: This project studies distributed statistical inference for a general class of statistics
that encompasses U-statistics and M-estimators in the context of massive data. When the data are
stored on multiple platforms, communication between them is usually expensive and slow. To deal
with this issue, we formulate distributed statistics that can be computed in a distributed fashion
and hence reduce computational time significantly. We investigate properties of the distributed
statistics in terms of the mean squared error of estimation and their asymptotic distributions. In
addition, we propose two distributed bootstrap algorithms that are computationally effective and
theoretically consistent. Applications and numerical studies are provided to support the proposed
approaches.
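
As a rough illustration of the divide-and-aggregate idea, the sketch below assumes that a distributed statistic is a block-size-weighted average of the same statistic computed separately on each data block. That form, the choice of the variance kernel as the example U-statistic, and all function names are illustrative assumptions and are not taken from the abstract.

```python
import random
from itertools import combinations


def u_statistic(sample, kernel):
    """Average a symmetric two-argument kernel over all unordered pairs."""
    n = len(sample)
    n_pairs = n * (n - 1) // 2
    return sum(kernel(x, y) for x, y in combinations(sample, 2)) / n_pairs


def split_into_blocks(data, k):
    """Partition data into k contiguous blocks of near-equal size."""
    n = len(data)
    bounds = [(i * n) // k for i in range(k + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(k)]


def distributed_statistic(blocks, kernel):
    """Assumed form: block-size-weighted average of per-block U-statistics.

    Each block could be processed on its own machine; only one number per
    block has to be communicated for the final aggregation.
    """
    total = sum(len(b) for b in blocks)
    return sum(len(b) * u_statistic(b, kernel) for b in blocks) / total


def variance_kernel(x, y):
    """Kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


rng = random.Random(0)
data = [rng.gauss(0.0, 2.0) for _ in range(600)]  # true variance is 4
blocks = split_into_blocks(data, k=6)

full = u_statistic(data, variance_kernel)            # uses all 179,700 pairs
dist = distributed_statistic(blocks, variance_kernel)  # uses 6 * 4,950 pairs
print(f"full-sample: {full:.4f}, distributed: {dist:.4f}")
```

In this sketch the distributed version evaluates the kernel on far fewer pairs than the full-sample version, which is the source of the computational saving; how much statistical accuracy is lost in exchange is the kind of question the mean squared error analysis addresses.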