OpenCDISC Validator Performance and Scalability Guide
While our developers work continuously to improve performance in the OpenCDISC codebase, there is also much a user can do to get the most out of running the OpenCDISC Validator in their environment. To make this process easier to understand, we'll cover some of these customizations and the benefits they can provide to the validation process.
Using Multiple Processors/Cores
OpenCDISC Validator supports multicore dataset processing, where more than one dataset can be validated simultaneously on computers that have multiple processors or logical cores. However, because most user machines have only one or two cores and the validation process is very processor-intensive, the default configuration specifies that only one dataset should be processed at a time.
Users with more powerful operating environments can easily change this setting to take full advantage of their hardware by going to Help -> Preferences -> Performance and adjusting the fields there. Modifying the value of Thread Count changes how many cores the program uses; it can be set to a numerical value representing a fixed number of logical threads to use, or to auto to allow the program to automatically determine and use the maximum number of available threads.
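Conceptually, this kind of multicore processing behaves like a fixed-size worker pool: each dataset is handed to its own worker thread, and the pool size plays the role of Thread Count. The Java sketch below is our own illustration of that idea, not OpenCDISC code; validateDataset() and the dataset file names are hypothetical placeholders. It also shows how an auto-style setting can be derived from the number of cores the JVM reports.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Sketch of multicore dataset processing with a fixed worker pool.
    public class ParallelValidationSketch {

        public static void main(String[] args) throws InterruptedException {
            // "auto"-style behavior: use every logical core the JVM can see.
            int threadCount = Runtime.getRuntime().availableProcessors();
            ExecutorService pool = Executors.newFixedThreadPool(threadCount);

            // Hypothetical dataset files; in practice these come from the study.
            List<String> datasets = List.of("dm.xpt", "ae.xpt", "lb.xpt");

            // Each dataset is validated on its own worker thread.
            for (String dataset : datasets) {
                pool.submit(() -> validateDataset(dataset));
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }

        private static void validateDataset(String path) {
            // Placeholder for the actual validation work.
            System.out.println("Validating " + path + " on "
                    + Thread.currentThread().getName());
        }
    }

Capping the pool size below the full core count leaves headroom for other applications, which is the trade-off the note below describes.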
Note: Processing datasets can be extremely processor-intensive, so we recommend against using all cores in situations where other critical applications' performance may be compromised.
Increasing Available Memory
Given that the OpenCDISC Validator does all of its processing without the help of databases or temporary files, the memory demands for very large datasets can be high. Currently, the default memory limit for a validation run is 1024 MB (1 GB), a fairly safe "standard" value which allows the Validator to run on most modern workstations and laptops. However, this setting can limit the size of the datasets that can be processed.
Our development team has performed tests suggesting that the approximate maximum size of a single dataset that can be handled with this memory limit is around two million records. The total number of datasets that can be processed is not affected by available memory, although more memory is required if you choose to process more than one dataset at once using the multithreading technique described in the previous section.
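Those figures imply a rough rule of thumb: 1024 MB for about two million records works out to roughly half a kilobyte of heap per record. The back-of-envelope estimator below is our own arithmetic, not a published OpenCDISC formula, and it assumes memory use scales roughly linearly with record count.

    // Back-of-envelope heap estimate derived from the figures above:
    // ~1024 MB for ~2,000,000 records, i.e. roughly 0.5 KB per record.
    public class HeapEstimate {
        private static final double MB_PER_RECORD = 1024.0 / 2_000_000;

        public static void main(String[] args) {
            long records = 5_000_000; // hypothetical dataset size
            double estimatedMb = records * MB_PER_RECORD;
            System.out.printf("Estimated heap for %,d records: about %.0f MB%n",
                    records, estimatedMb);
        }
    }

For five million records, this estimate suggests a limit of roughly 2560 MB, which is the kind of value the next paragraphs show how to configure.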
Users with machines with several gigabytes of RAM installed may find it useful to increase this memory limit to support studies containing large datasets. Note, however, that if you are running a 32-bit version of Windows® as your operating system, this setting can only be increased to about 1500 MB. On a 64-bit OS, it is possible to increase this value to about 75 percent of available RAM.
To make this change, set the Maximum Memory field to a higher value. For instance, to increase the maximum available memory to three gigabytes, we would replace the 1024 with 3072.
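Because the Validator is a Java application, a memory limit like this typically corresponds to the JVM's maximum heap size (the value set with the standard -Xmx option). If you want to confirm what ceiling a Java process actually received, the small standalone utility below (our own sketch, not part of OpenCDISC Validator) reads the limit back from the runtime.

    // Standalone sketch: prints the maximum heap the current JVM may use,
    // which is what a maximum-memory setting ultimately controls.
    public class MaxMemoryCheck {
        public static void main(String[] args) {
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.printf("Maximum heap for this JVM: %d MB%n",
                    maxBytes / (1024L * 1024L));
        }
    }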
Important: Keep in mind that these settings are read when the program is first launched, so changes to the settings mentioned in this document require a restart of the Validator to take effect.