Which variable (and which datasets) are you talking about in your case?
The Pinnacle Community edition can throw this error for most SDTM variables that aren't tied to the 8-, 20-, or 40-character length restrictions. But the variables where this would be at odds with the TCG recommendation are, for example, IDVAR and IDVARVAL, which appear in more than one domain.
The TCG states:
"The allotted length for each column containing character (text) data should be set to the maximum length of the variable used across all datasets in the study except for suppqual datasets. For suppqual datasets, the allotted length for each column containing character (text) data should be set to the maximum length of the variable used in the individual dataset"
IDVAR and IDVARVAL are variables of the SUPPQUAL datasets (SUPPLB, SUPPVS, ...) so in those cases the rule applies to the individual dataset, not to the whole study.
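The TCG rule can be sketched in a few lines. This is a minimal illustration, not production code: the dataset names and values below are invented, and datasets are assumed to be loaded as lists of row dicts.

```python
# Sketch of the TCG length rule: for variables in regular domains, the
# allotted length is the max value length across ALL datasets in the
# study; SUPP-- datasets are excluded and sized individually.
# All dataset names and values here are illustrative only.

def allotted_length(var, datasets):
    """datasets: dict mapping dataset name -> list of row dicts."""
    lengths = []
    for name, rows in datasets.items():
        if name.startswith("SUPP"):
            continue  # suppqual datasets use their own per-dataset max
        lengths += [len(r[var]) for r in rows if var in r]
    return max(lengths, default=0)

study = {
    "LB":     [{"VISIT": "SCREENING"}, {"VISIT": "WEEK 4"}],
    "VS":     [{"VISIT": "WEEK 12"}],
    "SUPPLB": [{"IDVAR": "LBSEQ"}],
}

# VISIT gets length 9 (len("SCREENING")) in every domain, even though
# no value in VS is longer than 7 characters.
print(allotted_length("VISIT", study))  # -> 9
```

Under this rule, a validator that only looks at a single file (e.g. VS, where the longest VISIT value is 7) would see an "over-allotted" length of 9 and flag it, even though the study-wide maximum is exactly 9.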
Also see: http://cdiscguru.blogspot.com/2016/12/how-sas-xpt-works-well-inefficient.html
They are also in CO & RELREC
Thanks for the reference to the blog post.
I agree. Also, I do not see a reason to coordinate the IDVAR/IDVARVAL lengths between CO and RELREC, although the TCG doesn't explicitly mention such an exception. There are more statements in the TCG that are not very accurate, like "... if the define.xml cannot be printed" - I can guarantee you that EVERY define.xml is printable, as XML is text and any text can be printed (of course they mean something else ...)
What about in the case of EPOCH or VISIT which are common across multiple parent domains and may not always have the same max length? Based on the TCG, we are setting the length of the variables to the max length found in all domains but this is showing up as an error in Pinnacle.
Text from FDA TCG:
The allotted length for each column containing character (text) data should be set to the
maximum length of the variable used across all datasets in the study except for suppqual
datasets.
Hi Priyanka,
In the 21st century, this rule is ridiculous! It exists because the software of some regulatory authorities is unable to trim values when loading them. Even IBM mainframes in the era of punch cards did better.
Also, in the case of VISIT and EPOCH, I would set the length in all domains/datasets equal to the maximum across the datasets. This also makes sense for the define.xml, as one can then define a single "VISIT" ItemDef and a single "EPOCH" ItemDef, since the "length" is set at the level of the ItemDef. It also makes sense when combining datasets with primitive software.
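For what it's worth, this is roughly what such a shared ItemDef looks like in define.xml 2.0; the OID and the Length of 9 are only illustrative:

```xml
<!-- One ItemDef reused by every domain that contains VISIT;
     Length is declared once, at the ItemDef level. -->
<ItemDef OID="IT.VISIT" Name="VISIT" DataType="text" Length="9">
  <Description>
    <TranslatedText xml:lang="en">Visit Name</TranslatedText>
  </Description>
</ItemDef>
```

If the length differed per dataset, one would instead need a separate ItemDef per domain, which defeats the purpose of defining the variable once.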
So, if validating software throws an error because in one of the datasets the longest actual value is shorter than the allotted length (simply because not all possible values are represented in that single dataset), I would consider that a false positive.
The rule itself has its origin in the use of the outdated SAS Transport format. It was introduced because sponsors were setting the length to 200 for every variable, meaning that SAS Transport files then contained huge amounts of blanks: in SAS Transport, each value is padded with blanks up to the maximum length declared in the header of the file. That led to very large files which seemingly could not be handled by the regulatory authorities. Twenty years ago, when I was working in bioinformatics, we had files of terabyte size, and we did not have trouble with them.
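The waste from blank-padding is easy to quantify. This is a sketch of the concept only (real XPT headers and 80-byte record blocking differ), with made-up values:

```python
# Why fixed declared lengths inflate SAS Transport files: every value
# is blank-padded up to the declared column length, so a length of 200
# stores 200 bytes per value regardless of the actual content.
# (Conceptual sketch only; real XPT record layout is more involved.)

values = ["Y", "HEADACHE", "MILD"]

def padded_bytes(values, declared_len):
    """Total bytes stored for one character column at a given length."""
    return sum(len(v.ljust(declared_len)) for v in values)

print(padded_bytes(values, 200))                          # 600 bytes
print(padded_bytes(values, max(len(v) for v in values)))  # 24 bytes
```

Three short values occupy 600 bytes at a declared length of 200, versus 24 bytes when the length is set to the actual maximum, which is exactly the bloat the TCG rule was meant to curb.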
Essentially, the only difference from punch cards, besides the use of cardboard, is that for punch cards this maximum was 80 in all cases.
We really should move to XML, JSON and/or RDF for transport, these are formats that also allow API usage and RESTful web services. XPT doesn't.
Thanks for the context on this Jozef.
Until the technology catches up, is there any update that could be made to the reports to adjust for this and avoid all these errors? I understand deeming them false positives for now, but since they are categorized as errors, they really shouldn't be showing up when the FDA guidance is being followed.
I cannot answer this Priyanka, I am not working for Pinnacle21. This is something that needs to be answered / fixed by Pinnacle21.
The Pinnacle Community edition version 2.2.0 throws an error for the rule SD1082, ‘Variable length is too long for actual data’, when the variable length is larger than the max data length within a single file. This is not consistent with the TCG (version 3.3) expectation that, ‘The allotted length for each column containing character (text) data should be set to the maximum length of the variable used across all datasets’. Is there a plan to update the behavior of the software to address this inconsistency and flag errors only when the variable length is greater than the max data length across all datasets?