Hi Ajay,
A common reason for such a case is the presence of invisible characters (such as a leading <space> or a trailing <new line>) in the USUBJID variable value.
Regards,
Sergiy
This is an intrinsic XPT format problem: in some cases a trailing space in USUBJID is unavoidable.
For example, suppose USUBJID takes the values MyStudy-Subject-1, MyStudy-Subject-2, ... MyStudy-Subject-11, ...
The length for USUBJID must then be 18. The values for the first 9 subjects (MyStudy-Subject-1, ...) have only 17 characters, however, and IT WILL BE SAS-XPT ITSELF that adds a space after them, because XPT uses a fixed-field-length format. In such cases, the space at the end is unavoidable.
Or should SDTM have an additional (again stupid) rule that all USUBJID values must be of the same length?
This problem can easily be avoided by adding "trim()" in the Java code, which is what every good Java programmer does first when comparing strings that come from input fields.
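To illustrate the trim() suggestion above, here is a minimal sketch of a lookup that trims both sides before comparing. The class and method names (UsubjidLookup, trimmedKeys, presentInDm) are made up for this example; this is not the validator's actual code, only an assumption about how such a check could be written. Note that String.trim() removes leading as well as trailing whitespace, and also control characters up to U+0020.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not taken from any validator's source code.
final class UsubjidLookup {

    /** Collects the DM USUBJID values into a set, trimming each one first. */
    static Set<String> trimmedKeys(List<String> dmValues) {
        Set<String> keys = new HashSet<>();
        for (String value : dmValues) {
            keys.add(value.trim());
        }
        return keys;
    }

    /** Checks a USUBJID from another domain against DM, ignoring padding. */
    static boolean presentInDm(Set<String> dmKeys, String usubjid) {
        return dmKeys.contains(usubjid.trim());
    }

    public static void main(String[] args) {
        // The first DM value carries the trailing blank added by the fixed-length field.
        List<String> dm = Arrays.asList("MyStudy-Subject-1 ", "MyStudy-Subject-11");
        Set<String> keys = trimmedKeys(dm);

        // A raw comparison treats the padded and unpadded values as different.
        System.out.println("MyStudy-Subject-1 ".equals("MyStudy-Subject-1"));

        // The trimmed lookup finds the subject either way.
        System.out.println(presentInDm(keys, "MyStudy-Subject-1"));
        System.out.println(presentInDm(keys, "MyStudy-Subject-1 "));
    }
}
```

If only trailing blanks should be ignored, String.stripTrailing() (Java 11 and later) would be a narrower alternative to trim().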
Time to move away from XPT completely anyway. It is a scandal that the FDA still requires us to use such a 30-year-old, completely outdated format.
Jozef, we cannot blame everything on XPT. This issue has nothing to do with the XPT format. SAS has a trim() function too!
Hi Lex, I did not want to blame XPT for that (solely). I blame the software developers who wrote the code for not recognizing this possible problem, and for apparently not using the trim() method on the strings they read in before doing string comparison - basic Java programming practice ...
What I meant is that in Transport-5 (XPT) format, a variable value always takes the same number of bytes, which is defined in the header. For example, if one has defined that USUBJID has length 18 (as needed in the example), and you put "MyStudy-Subject-1" as the value, which takes 17 places, your software (whatever it is) still needs to add a space after the 17th character because of the fixed-length fields in XPT. So even when the software did a "trim()" before, the 18th character, a blank, still needs to be added to complete the 18-length field (see the TS-140 specification for those wanting to find out the details). So having that blank at the end is unavoidable in such a case.
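The padding described above can be sketched in a few lines of Java. This is a simplified illustration of a fixed-length character field only, not an XPT writer: it ignores the headers and record layout that TS-140 defines, and the class and method names (FixedWidthField, write, read) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified illustration of one fixed-length character field (assumed names).
final class FixedWidthField {

    /** Writes a value into a field of the declared length, blank-padded on the right. */
    static byte[] write(String value, int length) {
        byte[] raw = value.getBytes(StandardCharsets.US_ASCII);
        if (raw.length > length) {
            throw new IllegalArgumentException("value is longer than the declared field length");
        }
        byte[] field = new byte[length];
        Arrays.fill(field, (byte) ' ');                 // fill the whole field with blanks
        System.arraycopy(raw, 0, field, 0, raw.length); // then copy the value over the start
        return field;
    }

    /** Reads the field back exactly as stored, padding included. */
    static String read(byte[] field) {
        return new String(field, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Writing a trimmed 17-character value into an 18-byte field re-adds the blank.
        String stored = read(write("MyStudy-Subject-1".trim(), 18));
        System.out.println("[" + stored + "] length " + stored.length());

        // Only trimming on the reading side recovers the original value.
        System.out.println("[" + stored.trim() + "] length " + stored.trim().length());
    }
}
```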
One can imagine what that does to the file size in the case of 1,000,000 VS records where one VSORRES value is a text 200 characters long, and all other VSORRES values have fewer than 10 characters. In that case, you (or your software) will still need to set the length of VSORRES to 200 (the maximum), and every VSORRES value except the one with the long text will contain lots of blanks, because each of those fields needs to be filled up with blanks to the full 200 characters.
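As a rough back-of-the-envelope check of the numbers in this example, the sketch below computes how many bytes the VSORRES column alone occupies and how many of them are padding. It assumes one byte per character and takes 9 characters as the upper bound for the short values ("fewer than 10"); the class and method names are made up for illustration.

```java
// Rough arithmetic for the VSORRES example; names and assumptions are illustrative only.
final class VsorresPadding {

    /** Bytes occupied by one fixed-length column across all records. */
    static long allocatedBytes(long records, int fieldLength) {
        return records * fieldLength;
    }

    /** Upper bound on real content: one long value plus short values of at most maxShort characters. */
    static long maxContentBytes(long records, int longest, int maxShort) {
        return longest + (records - 1) * maxShort;
    }

    public static void main(String[] args) {
        long records = 1_000_000L;
        long allocated = allocatedBytes(records, 200);   // 200,000,000 bytes
        long content = maxContentBytes(records, 200, 9); // 9,000,191 bytes at most
        long padding = allocated - content;              // at least 190,999,809 bytes of blanks

        System.out.println("allocated: " + allocated);
        System.out.println("content (upper bound): " + content);
        System.out.println("padding (lower bound): " + padding);
    }
}
```

Under these assumptions, more than 95% of the bytes stored for VSORRES are blanks.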
So when I read XPT files into my software programs, the first thing I do on each variable value is ... trim().
The error below is from a P21 v2.2.0 report. The USUBJID values seem to be identical in DM and the other dataset. Is there any particular reason for it?
USUBJID, SUB:SITEID XXXX-XXXXX, null SD0064 FDAC040 Subject is not present in DM domain