Data Helix: Bringing New Technologies and Approaches to Data Generation
Generating representative test and simulation data is a challenging and time-consuming task. With ever-increasing regulations and controls, it is becoming harder to use aged production data to test applications thoroughly. Data Helix is a new open source project designed to streamline the generation of large volumes of data for simulation and test purposes.
Data Helix consists of two components, both of which are under active development. The Generator (in beta) creates data using various generation strategies, based on a JSON profile that describes the shape of the data. The Profiler (still in design) examines source data and applies a number of machine learning techniques to derive a profile, which can then be supplied to the Generator.
In this talk, we'll present both Data Helix components and discuss the project's journey through the FINOS contribution process. We are actively looking for participants!