Large streaming data sets are an essential part of computational modeling and network communications, yet such data are generally intractable to store, process, search, and retrieve in full. This dynamic data reduction algorithm detects redundant patterns and reduces data size by exploiting the exchangeability of measurements: it removes both redundancy within a time series and redundancy in the data distribution. The Berkeley Lab technology can be used on high-frequency streaming data as well as stored data.
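The abstract does not disclose the algorithm itself. For orientation only, below is a minimal sketch of one way exchangeability can be exploited to shrink a stream: measurements are quantized into bins, runs of exchangeable (same-bin) values are collapsed into (value, count) pairs, and a running histogram summarizes the distribution. The class name `StreamReducer`, the `bin_width` parameter, and the quantization scheme are illustrative assumptions, not Berkeley Lab's method.

```python
from collections import Counter

class StreamReducer:
    """Hypothetical sketch: collapse runs of exchangeable (quantized)
    measurements into (value, count) pairs and keep a running histogram
    as a reduced summary of the data distribution."""

    def __init__(self, bin_width=0.5):
        self.bin_width = bin_width   # quantization step (assumed parameter)
        self.runs = []               # reduced time series: (bin, count)
        self.histogram = Counter()   # reduced distribution

    def _quantize(self, x):
        # Map a raw measurement to a discrete bin; values landing in the
        # same bin are treated as exchangeable, hence redundant.
        return round(x / self.bin_width) * self.bin_width

    def push(self, x):
        b = self._quantize(x)
        self.histogram[b] += 1
        if self.runs and self.runs[-1][0] == b:
            self.runs[-1] = (b, self.runs[-1][1] + 1)  # extend current run
        else:
            self.runs.append((b, 1))                   # start a new run

reducer = StreamReducer(bin_width=0.5)
for x in [1.02, 1.01, 0.98, 5.2, 5.21, 1.0]:
    reducer.push(x)
print(reducer.runs)       # [(1.0, 3), (5.0, 2), (1.0, 1)]
print(reducer.histogram)  # Counter({1.0: 4, 5.0: 2})
```

Because stored runs grow only when the quantized value changes, long stretches of exchangeable measurements cost a single pair rather than one entry per sample, while the histogram preserves the distributional summary.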
A common technique in network monitoring and similar practices for reducing the size of collected measurements is to store a random sample, such as one out of every 1,000 network packets. The drawbacks of this approach are that it does not scale to high-frequency streaming data and that the sample carries no guarantee of reflecting the underlying data distribution. Another method is exact or approximate data compression, such as spectral analysis. However, current compression methods require either the whole data set or data chunks of a designated size, which makes them impractical for large, high-frequency streams. Berkeley Lab's algorithm resolves the drawbacks of both approaches.
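For contrast, the sampling baseline described above is simple to state precisely. The sketch below implements uniform 1-in-1,000 sampling (the rate comes from the example in the text; the function name and use of Python's random module are illustrative): each packet is kept independently with probability 1/1,000, so the expected sample size shrinks a thousandfold, but nothing constrains the sample to mirror the source distribution.

```python
import random

SAMPLE_RATE = 1 / 1000  # keep one out of every 1,000 packets, as in the text

def sample_stream(packets):
    """Uniform random sampling: cheap per packet, but the retained sample
    has no guarantee of reflecting the underlying data distribution."""
    return [p for p in packets if random.random() < SAMPLE_RATE]

# Example: from one million packets, roughly 1,000 survive on average.
kept = sample_stream(range(1_000_000))
print(len(kept))
```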
Applications and Industries
Measurement collection mechanisms for network communications and routers
Statistical analysis, e.g., financial markets, energy use, social media networks
Modeling, e.g., environmental studies, nuclear fusion simulations
Science and engineering experiments
Advantages
Efficient data size reduction: 47-80% in tests, with potential for much greater reduction
Retention of data accuracy
Effective on streaming or stored (offline) data