Hi Fugui,
Defining requirements for variable lengths is a very interesting and controversial topic. This week we had a productive discussion during the FDA/PhUSE Data Validation workgroup meeting. There are several basic and mutually exclusive needs or risks.
1. Datasets should be re-sized (not compressed!) to minimal size because … (very long list of reasons, e.g., data transfer, archiving storage, analysis tools limitations, hardware limitations, etc.)
2. Variables in SAS XPORT datasets should have consistent/pre-defined lengths to avoid data truncation during data integration. There are many imperfect and incomplete recommendations on how to achieve this. E.g., --TESTCD variable lengths should always be 8 characters.
3. Variable length should be defined by the data collection process. E.g., it can be set to the maximum length of a value in your data collection or controlled terminology codelist.
Note that the variable length issue is not an SDTM compliance issue. It’s very specific to the SAS format used for regulatory submission.
The decision we made:
There are too many use cases with different requirements. FDA wants to receive re-sized data. They can handle data truncation risk during data integration.
The OpenCDISC validator will remove all current Variable Length related checks. Only one new Rule will be introduced: “Before sending data to FDA you need to re-size your data, setting each variable to the actual maximum length of its values.”
Note that this is needed only for the actual data transfer to FDA. You can do whatever you want before this event. It’s very easy to do.
Profiles will be updated soon. Watch for the v1.4.1 release in the coming weeks.
Please let us know if you have any questions.
Kind Regards,
Sergiy
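As a rough illustration of the re-sizing step described in the post above, here is a minimal Python sketch. It assumes a dataset is already loaded as a list of row dictionaries; the function name `minimal_lengths` and the sample AE rows are hypothetical, and lengths are counted in bytes after encoding, on the assumption that this is how the transport file stores character values.

```python
def minimal_lengths(rows, encoding="ascii"):
    """Return {variable: longest observed value length in bytes}.

    Only character (str) values are measured; numeric values are skipped.
    Trailing blanks are ignored because they are padding, not data.
    """
    lengths = {}
    for row in rows:
        for name, value in row.items():
            if isinstance(value, str):
                size = len(value.rstrip().encode(encoding))
                # Assumption: keep a floor of 1 so an all-blank variable
                # still gets a usable length.
                lengths[name] = max(lengths.get(name, 1), size)
    return lengths


# Hypothetical sample data, not taken from any real study.
ae = [
    {"USUBJID": "STUDY1-001", "AETERM": "HEADACHE", "AESEQ": 1},
    {"USUBJID": "STUDY1-002", "AETERM": "NAUSEA AND VOMITING", "AESEQ": 1},
]

resized = minimal_lengths(ae)
print(resized)
```

The resulting mapping could then drive whatever tool actually rewrites the variable lengths before the transfer.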
Note: A rant, but not directed at Sergiy and OpenCDISC
I have to say this is a poor suggestion by the agency. I am a strong proponent for producing a define upfront because I equate it to a data dictionary.
From the statement above, I deduce that I will be forced to circle back to my prospective define and update it to a different length.
For example, my global standard AE CRF allows 200 characters for the verbatim term. So, the define defaults to length 200. I will then, for each study for submission, have to update the define file to adjust for the maximum length.
The agency should investigate something else instead of making us do the silly adjustment suggested. As a matter of fact, SAS V5 XPORT files often have a >90% compression ratio. The SAS dataset compression option may be proprietary. However, there is plenty of open source zip archive software out there that they could consider accepting.
I hope you see the trouble.
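The compression point in the post above can be illustrated with a small Python sketch. It builds synthetic fixed-width records (short values padded with blanks to 200 bytes, which only mimics the padding pattern and is not a real XPORT file) and compresses them with the standard `zlib` module. The function name `compression_ratio` and the sample values are hypothetical.

```python
import zlib


def compression_ratio(records, width=200):
    """Fraction of space saved when blank-padded records are compressed."""
    # Pad every value with spaces to a fixed width, as a padded
    # character variable would be stored.
    raw = b"".join(r.encode("ascii").ljust(width) for r in records)
    packed = zlib.compress(raw)
    return 1 - len(packed) / len(raw)


# Hypothetical short values standing in for verbatim terms.
terms = [f"EVENT {i}" for i in range(1000)]
ratio = compression_ratio(terms)
print(f"space saved by compression: {ratio:.1%}")
```

Real datasets will compress less uniformly than this synthetic case, so the figure is only indicative of why heavily padded files shrink so much.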
Hi Sergiy,
Thank you very much for the quick response on the issue.
The minimum length requirement from FDA is certainly a very strange one. I guess we should all go back to floppy disks instead of using 64-bit machines and the newer Windows OS/software, for which space is no longer an issue.
Do we know when the FDA will enforce this rule? Is there an FDA reference document on this that you can share? As you have indicated that this is not an SDTM compliance issue, does the FDA requirement apply to all fields?
Thank you again for the quick response.
Best Regards,
Fugui
Hi Fugui,
There was a recent discussion during the FDA/PhUSE CSC event on variable length requirements. The major concern among FDA staff is that many sponsors still use 200 characters for most variables. As a result, there are many submission datasets with 80-90% wasted space and sizes above 2-4 GB. Due to regulatory compliance requirements, the FDA team cannot modify sponsors' submission data and has to work with huge datasets. This includes archiving the “air-pumped” data.
The huge size of datasets has a direct impact on review process efficiency. In my experience, reading large xpt files may take more time than the calculation itself. It’s very annoying. Some tools have limitations on dataset file size. Some FDA reviewers may not have the latest laptops with unlimited RAM. Transferring huge files over a slow network can be a real nightmare.
“CDER Common Data Standards Issues Document v1.1 2011-12” was published a year ago and is considered by the FDA team as Guidance. It has a special “File Size Issues” section. The document will be updated soon.
Dataset and variable length issues are definitely not an SDTM compliance issue. They are a specific FDA submission requirement. You do not need to worry about this until the actual data transfer to the FDA. For your internal data life-cycle processes you can define your own business rules based on company internal policy, SOPs, etc.
Kind Regards,
Sergiy
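To make the wasted-space figure above concrete, here is a minimal Python sketch of the arithmetic. It assumes fixed-width storage, so every record occupies the declared length of each character variable regardless of content. The function name `wasted_fraction` and all lengths in the example are hypothetical.

```python
def wasted_fraction(declared, actual):
    """Share of per-record character storage that holds only padding.

    declared: {variable: declared length}
    actual:   {variable: longest value actually present}
    """
    total = sum(declared.values())
    # A variable never needs more than its declared length; if no actual
    # length is known, assume the declared length is fully used.
    needed = sum(min(actual.get(name, size), size) for name, size in declared.items())
    return (total - needed) / total


# Hypothetical example: three variables all declared with 200 characters.
declared = {"USUBJID": 200, "AETERM": 200, "AEDECOD": 200}
actual = {"USUBJID": 10, "AETERM": 40, "AEDECOD": 30}

wasted = wasted_fraction(declared, actual)
print(f"wasted space: {wasted:.0%}")
```

With these made-up lengths, 520 of every 600 bytes per record are padding, which is in the range mentioned in the post.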
Glad to hear that this will be changing soon. This rule being classified as an error for the AE domain does not make much sense!
I can understand that the FDA may want to minimize the size of these datasets, but I think there is another way to do this that is more global and avoids the error flag on the validation report.
Hi David,
What is your vision for handling the wasted space issue in datasets?
Thanks,
Sergiy
In OpenCDISC version 1.4, quite a few rules have been added. One of them is SD1082.
SD1082: Variable length is too long for actual data. The severity of SD1082 is Error.
When generating the SDTM datasets for submission, which version of OpenCDISC should we use?
How should we deal with the new rules introduced?