Forums: Define.xml
OD0010 (Missing XML declaration) is a complete OVERINTERPRETATION of both the Define-XML standard and the W3C XML standard itself.
There is no such rule in the XML world.
Of course having the XML declaration is a good habit, but it is not absolutely necessary, i.e. it is optional.
It is nothing more than an INDICATION to the parser (the software that processes the XML) that what follows is XML, with the "encoding" attribute indicating which encoding was used for the characters (UTF-8 is the default anyway).
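To illustrate that the declaration is optional, here is a minimal Python sketch (not part of the original post). It assumes Python's standard `xml.etree.ElementTree` parser and a made-up two-element document rather than a real define.xml. It only shows that the parser accepts the same content with and without the declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal document; not a real define.xml.
WITHOUT_DECL = b'<ODM><Study OID="S1"/></ODM>'
WITH_DECL = b'<?xml version="1.0" encoding="UTF-8"?>\n' + WITHOUT_DECL

# Both variants parse; the declaration does not change the resulting tree.
root_without = ET.fromstring(WITHOUT_DECL)
root_with = ET.fromstring(WITH_DECL)

print(root_without.tag, root_with.tag)
print(root_without[0].get("OID"), root_with[0].get("OID"))
```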
See e.g. https://stackoverflow.com/questions/7007427/does-a-valid-xml-file-require-an-xml-declaration/7007781
As the article explains, the confusion may come from a misinterpretation of the word "should" in the English language. "Should" means an expectation in the standards (W3C) world, not an obligation.
I have been teaching XML for 25 years now (I have been a professor in medical informatics), and had hoped that such basic knowledge was generally present among software and standards developers ...
Hi Vitalij,
Please see CDISC Define-XML v2.0 documentation on page 51:
"5.3. Define-XML Specification Details
5.3.1. XML Header
All XML files must begin with an XML header, so the first line of a define.xml file must be an XML header. The XML header indicates to applications that the remainder of the file is XML and specifies character encoding it uses.
5.3.1.1. Example XML Header
<?xml version="1.0" encoding="UTF-8"?>
This example shows a define.xml using the "UTF-8" character encoding."
Note that the two most common reasons for an OD0010 validation message are:
1. Missing XML declaration (header)
2. Invisible characters in front of it
Kind Regards,
Sergiy
P.S. Note that the OD0010 rule is a Reject issue for PMDA submissions.
I think, technically, and formally looking at the XML specification, Jozef is right. Having said that, I don’t see a good reason to omit the XML declaration. Maybe that is why we required it in the Define-XML specifications 2.0 and also still in 2.1. Or maybe we did not realize that, formally, it is not required.
Anyway, I like the following article, and completely agree with it: https://www.ibm.com/developerworks/library/x-tipdecl/index.html
This is just my personal view.
Best, Lex
Thanks Lex! And indeed, a very good article!
So my proposal is to make it a "warning" - it should not cause panic or stop people/machines from parsing the define.xml file.
Invisible characters before the XML declaration: XML is all about characters, and one can indeed have e.g. a carriage return before the XML declaration, which makes the XML essentially "not well formed". This can however easily be detected by opening the define.xml in an XML editor (not a bad idea ...) and checking that the XML declaration starts on the first line and is completely to the left.
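For those without an XML editor at hand, the same "first line, completely to the left" check can be sketched in a few lines of Python. This is only an illustration, not anything the validator does. The function name `bytes_before_declaration` is made up, and the sketch assumes an ASCII-compatible encoding such as UTF-8.

```python
def bytes_before_declaration(data: bytes):
    """Return the bytes that precede the XML declaration.

    Returns b"" if the declaration is at the very start, and None if the
    data contains no declaration at all. Assumes an ASCII-compatible
    encoding (e.g. UTF-8); a UTF-16 file would need a different pattern.
    """
    index = data.find(b"<?xml ")
    if index == -1:
        return None
    return data[:index]

# Made-up samples: a clean file and one with a stray CR/LF in front.
clean = b'<?xml version="1.0" encoding="UTF-8"?><ODM/>'
stray = b'\r\n<?xml version="1.0" encoding="UTF-8"?><ODM/>'

print(repr(bytes_before_declaration(clean)))
print(repr(bytes_before_declaration(stray)))
```

Printing the result with `repr` makes otherwise invisible characters such as `\r\n` visible.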
The most serious problem with XML files I have seen in the last 30 years (yes, that's my experience with XML) is that people set encoding="UTF-8" in the header, while the real encoding of the characters is a different one (e.g. Latin). My experience is that this often happens when people start from an Excel or Word or similar file.
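A rough way to catch this kind of mismatch is to try decoding the raw bytes with the encoding that the header claims. The Python sketch below is an illustration only. The helper names `declared_encoding` and `decodes_as_declared` are made up, the sample text is invented, and the regular expression assumes a simple declaration with no spaces around `=`.

```python
import re

def declared_encoding(data: bytes) -> str:
    """Read the encoding pseudo-attribute from the XML declaration.

    Falls back to UTF-8 when there is no declaration or no encoding given.
    """
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return match.group(1).decode("ascii") if match else "UTF-8"

def decodes_as_declared(data: bytes) -> bool:
    """True if the bytes can be decoded with the declared encoding."""
    try:
        data.decode(declared_encoding(data))
    except (UnicodeDecodeError, LookupError):
        return False
    return True

# Invented sample containing one non-ASCII character (e with acute accent).
text = '<?xml version="1.0" encoding="UTF-8"?><Description>caf\u00e9</Description>'
saved_as_utf8 = text.encode("utf-8")
saved_as_latin1 = text.encode("latin-1")  # header still claims UTF-8

print(decodes_as_declared(saved_as_utf8))
print(decodes_as_declared(saved_as_latin1))
```

A successful decode does not prove that the declared encoding is the right one (single-byte encodings accept almost any byte sequence), but a failed decode is a clear sign that the header and the actual bytes disagree.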
I am getting this message when there is a proper XML declaration that is the first entry in the file and with no leading characters.
<?xml version="1.0" encoding="UTF-8"?>
I also ran the same define.xml through v2.2.0 of the define.xml validator and didn't get this message.
thoughts?
Hi Jennifer,
As always, a good way to find out whether something is really wrong in your define.xml, or whether it is a bug in the software, is to open the define.xml in an XML editor. Every XML editor that I know has a button "check whether valid XML" or "check whether well-formed". Additionally, you can of course always do an XML-Schema validation.
If there are any invalid characters, even hidden ones, the XML editor will find them and show you where.
I was just using MS XML Notepad, no errors detected, and nothing I could see in simple Notepad either. Out of frustration, I ended up deleting the text and retyping it back in, and now no error from Pinnacle, so I guess there was something "hidden" that the simple editors I used don't display. Thanks very much for your reply.
Ok - solved! But of course curious people also want to know why ... ;-)
Does MS XML Notepad still exist? I used it many many years ago ...
If you need some information about good XML editors (at an affordable price), just drop me a mail.
But one can also check XML validity programmatically (Java, C#, SAS, ...). For the latter, Lex can probably tell you more ...
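As one hedged example of such a programmatic check, here is a small Python sketch that tests well-formedness only (not XML-Schema validity). It uses the standard `xml.etree.ElementTree` module. The function name `check_well_formed` and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

def check_well_formed(data: bytes):
    """Return None if the data parses, else (line, column, message)."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

# Made-up samples: one good document, one with a missing end tag, and one
# with a line break in front of the XML declaration.
good = b'<?xml version="1.0" encoding="UTF-8"?><ODM><Study/></ODM>'
unclosed = b"<ODM><Study></ODM>"
leading_newline = b'\n<?xml version="1.0" encoding="UTF-8"?><ODM/>'

for sample in (good, unclosed, leading_newline):
    print(check_well_formed(sample))
```

For a real file one would pass the bytes read with `open(path, "rb").read()`. The reported line and column point to where the parser gave up, which is usually close to the actual problem.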
Hi Jennifer,
OD0010 is a new issue which has just been found and fixed recently. I am not a Java programmer, so here is my best interpretation:
Most likely, your Define.xml file contains a BOM (Byte Order Mark) character immediately before the XML header. You cannot see this character in an XML or text editor; you need to use a hex editor instead. This is an uncommon but valid scenario, and the BOM is being incorrectly flagged as illegal due to a bug in the Java stream implementation (https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4508058).
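A hex editor is one way to look at those first bytes; a few lines of Python are another. The sketch below is not from the original post and is not how the P21 engine handles the BOM. It assumes a UTF-8 file, and the helper names `has_utf8_bom` and `strip_utf8_bom` are made up for illustration.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 BOM bytes EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM (unchanged if none)."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data

# Made-up sample: a BOM followed by an XML declaration and a tiny document.
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><ODM/>'

print(sample[:8].hex(" "))         # first eight bytes as hex pairs
print(has_utf8_bom(sample))
print(strip_utf8_bom(sample)[:5])  # what the file starts with after stripping
```

On a real file one would read the bytes with `open(path, "rb").read()` and, after making a backup, write the stripped bytes back in binary mode.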
The P21 validation engine is Java-based and inherits this old Java bug. We had a special procedure to work around it. When we recently changed the application packaging process, one of those changes, related to the Java libraries used to open and parse XML files, was missed.
A bug fix will be available in the next patch or major release, whichever comes first.
Meanwhile, you can either explain this OD0010 validation message as a bug or remove the BOM character.
Sorry for the inconvenience.
Kind regards,
Sergiy
As a Java and XML specialist with over 25 years of experience in both, I must disagree with Sergiy about the statement that an (always hidden) BOM character cannot be detected by an XML editor. See e.g. https://www.oxygenxml.com/doc/versions/22.0/ug-editor/topics/preferences-encoding.html.
Also be aware that the bug in Java has been resolved and closed for many years. The link that Sergiy provides states "This bug is not available". I don't know exactly which Java libraries or version the validator uses, but the one blamed as the cause of OD0010 is surely outdated. For parsing in Java, I would strongly recommend Saxon-HE or higher.
Furthermore, I would STRONGLY discourage people from using MS Notepad for editing XML files, as my long experience is that it easily introduces BOM or encoding errors. Use a real XML editor or at least Notepad++. An XML editor such as oXygen, Liquid XML Studio or any other is always the better choice.
Here is a good overview: https://en.wikipedia.org/wiki/Comparison_of_XML_editors.
Hi P21 team,
Could you please clarify why the ERROR message named in the subject line has been added? I did not find anything about it in the CDISC Define-XML specification.
Thanks,
Vitalij