Introduction
Anyone who transfers data or exchanges documents within or across organizations that run heterogeneous platforms will appreciate the need for, and the power of, XML. I am not going to delve into the merits of XML. I will, however, address a simple but powerful schema language called XSD, or XML Schema Definition. This article answers the following questions:
- What is XSD Schema?
- What are the advantages of XSD Schema?
- What is important in XSD Schema?
What Is a Schema?
A schema is a "Structure", and the actual document or data that is represented through the schema is called "Document Instance". Those who are familiar with relational databases can map a schema to a Table Structure and a Document Instance to a record in a Table. And those who are familiar with object-oriented technology can map a schema to a Class Definition and map a Document Instance to an Object Instance.
The structure of an XML document can be defined with any of the following:
- Document Type Definition (DTD)
- XML Schema Definition (XSD)
- XML-Data Reduced (XDR), which is proprietary to Microsoft
We are specifically going to work with XML Schema Definitions (XSD).
What Is XSD?
XSD provides the syntax for defining which elements and attributes may appear in an XML document. It also lets you require that the document follow a specific structure and that its values be of specific data types.
XSD is a W3C Recommendation and therefore a standard way of defining the structure of an XML document. For the latest information on XSD, please refer to the W3C site (www.w3.org).
Advantages of XSD
So what are the benefits of an XSD Schema?
- An XSD Schema is itself an XML document, so unlike DTDs there is no separate syntax to learn.
- XSD Schema supports inheritance: one type can be derived from another type. This is a great feature because it provides the opportunity for reuse.
- XSD Schema provides the ability to define your own data types from the existing data types.
- XSD Schema provides the ability to specify data types for both elements and attributes.
Case Study
ABC Corp., a fictitious software consultancy firm that employs around 25 people, has been asked by its payroll company to submit employee information for both its full-time and part-time consultants electronically, in XML format, to expedite payroll processing.
The payroll company told ABC Corp. that the following information is needed for each full-time and part-time employee.
Employee Information
- SSN
- Name
- DateOfBirth
- EmployeeType
- Salary
Here is the actual XML document for the above information.
<?xml version="1.0" ?>
<Employees xmlns="http://www.abccorp.com"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://www.abccorp.com employee.xsd">
  <Employee>
    <SSN>737333333</SSN>
    <Name>ED HARRIS</Name>
    <DateOfBirth>1960-01-01</DateOfBirth>
    <EmployeeType>FULLTIME</EmployeeType>
    <Salary>4000</Salary>
  </Employee>
</Employees>
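Here is the XML Schema for the above employee information. It declares http://www.abccorp.com as its target namespace to match the document. It also declares the Employees root element, because minOccurs and maxOccurs are only permitted on elements declared inside a content model, not directly under <xsd:schema>.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.abccorp.com"
            xmlns="http://www.abccorp.com"
            elementFormDefault="qualified">
  <xsd:element name="Employees">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Employee"
                     minOccurs="0"
                     maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="SSN" type="xsd:string"/>
              <xsd:element name="Name" type="xsd:string"/>
              <xsd:element name="DateOfBirth" type="xsd:date"/>
              <xsd:element name="EmployeeType" type="xsd:string"/>
              <xsd:element name="Salary" type="xsd:int"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>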
Let's examine each line to understand the XSD Schema.
Schema Declaration
For an XSD Schema, the root element is <schema>. The XSD namespace declaration on the <schema> element tells the XML parser that the document is an XSD Schema.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
The namespace http://www.w3.org/2001/XMLSchema identifies the W3C Recommendation version of XSD. The "xsd:" prefix marks the elements that belong to that namespace, but any prefix can be used as long as it is bound to the same namespace.
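To illustrate the point about prefixes, here is a minimal sketch of a schema that binds the XSD namespace to the prefix "xs" instead of "xsd". The "Note" element is a hypothetical name used only for this illustration and is not part of the Employee example.

```xml
<!-- The prefix is arbitrary; only the namespace URI identifies XSD. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- "Note" is a hypothetical element used only for illustration. -->
  <xs:element name="Note" type="xs:string"/>
</xs:schema>
```

A validator should treat this schema exactly as it would the same schema written with the "xsd:" prefix.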
Element Declaration
The element declaration is the most important part of the schema because it specifies what kind of information the document carries. The following element declaration in our example deals with Employee information.
<xsd:element name="Employee"
minOccurs="0"
maxOccurs="unbounded">
An "element" declaration in XSD can carry the following attributes.
name: This attribute specifies the name of the element, which is "Employee" in our example.
type: This attribute refers to a Simple Type or a Complex Type, both of which are explained later in this article.
minOccurs: This attribute specifies the minimum number of times the element may appear. The default is "1"; a value of "0" makes the element optional.
Assume that the minOccurs attribute carries a value of "1". This would mean the "Employee" element must be specified at least once in the XML document.
maxOccurs: This attribute specifies the maximum number of times the element may appear in an XML document. The default is "1". Assume that the maxOccurs attribute carries a value of "2". This would mean the "Employee" element must NOT be specified more than twice.
To summarize, let's say minOccurs is "1" and maxOccurs is "2" for the "Employee" element. This means there must be at least one instance of the "Employee" element in the XML document, but the total number of instances must not exceed two.
If you pass three instances of the "Employee" element in the XML document, a validating XML parser will report an error.
To allow the "Employee" element to be specified an unlimited number of times in an XML document, set the maxOccurs attribute to "unbounded".
The following example states that the "Employee" element can occur any number of times, including zero, in an XML document.
<xsd:element name="Employee"
minOccurs="0"
maxOccurs="unbounded">
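The "at least one, at most two" case described above could be written roughly as follows. This is a simplified sketch: the "Employees" wrapper is taken from the sample document, and "Employee" is given the type xsd:string here only to keep the example short.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The wrapper gives the occurrence constraints a content model to apply to. -->
  <xsd:element name="Employees">
    <xsd:complexType>
      <xsd:sequence>
        <!-- One or two Employee elements are valid; zero or three are not. -->
        <xsd:element name="Employee" type="xsd:string"
                     minOccurs="1" maxOccurs="2"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```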
Complex Type
An XSD Schema element can be of one of the following types:
- Simple Type
- Complex Type
In an XSD Schema, if an element contains one or more child elements, or if it carries attributes, then the element must be of a Complex Type.
<xsd:complexType>
  <xsd:sequence>
    <xsd:element name="SSN" type="xsd:string"/>
    <xsd:element name="Name" type="xsd:string"/>
    <xsd:element name="DateOfBirth" type="xsd:date"/>
    <xsd:element name="EmployeeType" type="xsd:string"/>
    <xsd:element name="Salary" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
The Employee element has SSN, Name, DateOfBirth, EmployeeType and Salary as child elements. Because it contains child elements, the Employee element must be defined as a Complex Type.
xsd:sequence
The <xsd:sequence> element specifies the order in which the child elements must appear in the XML document. This means the elements SSN, Name, DateOfBirth, EmployeeType and Salary must appear in exactly that order. If the order is changed, a validating XML parser will report an error.
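As an illustration of the ordering rule, the two fragments below show one Employee element that follows the declared sequence and one that does not. They are separate fragments shown side by side for comparison, not a single document.

```xml
<!-- Valid: the children follow the order given in xsd:sequence. -->
<Employee>
  <SSN>737333333</SSN>
  <Name>ED HARRIS</Name>
  <DateOfBirth>1960-01-01</DateOfBirth>
  <EmployeeType>FULLTIME</EmployeeType>
  <Salary>4000</Salary>
</Employee>

<!-- Invalid: Name appears before SSN, so validation fails. -->
<Employee>
  <Name>ED HARRIS</Name>
  <SSN>737333333</SSN>
  <DateOfBirth>1960-01-01</DateOfBirth>
  <EmployeeType>FULLTIME</EmployeeType>
  <Salary>4000</Salary>
</Employee>
```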
Simple Type
In an XSD Schema, a Simple Type describes a value that contains only text, with no child elements or attributes. You declare your own Simple Type when you want to create a user-defined type from one of the given base data types.
Before going further into Simple Types, I would like to mention that XSD provides a wide variety of base data types that can be used in a schema. A complete description of the data types is beyond the scope of this article.
I would suggest visiting the following Web sites to learn more about data types.
- http://www.w3.org
- http://msdn.microsoft.com (search for XSD)
Some of the base data types, which we used in the "Employee" element examples, are:
- xsd:string
- xsd:int
- xsd:date
Knowing that Simple Types let us specify user-defined data types in an XML Schema, the real question is: how do we know where to use a specific Simple Type?
Let's take a look at the schema again.
<xsd:complexType>
  <xsd:sequence>
    <xsd:element name="SSN" type="xsd:string"/>
    <xsd:element name="Name" type="xsd:string"/>
    <xsd:element name="DateOfBirth" type="xsd:date"/>
    <xsd:element name="EmployeeType" type="xsd:string"/>
    <xsd:element name="Salary" type="xsd:int"/>
  </xsd:sequence>
</xsd:complexType>
Assume the Payroll processing company wants the social security number of the employees formatted as "123-11-1233".
For this we will create a new data type called "ssnumber".
The following is the code to accomplish the above requirement.
<xsd:simpleType name="ssnumber">
  <xsd:restriction base="xsd:string">
    <xsd:length value="11"/>
    <xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
  </xsd:restriction>
</xsd:simpleType>
To start with, we provide the name of the Simple Type, which is "ssnumber".
The restriction base specifies the base data type from which the user-defined data type is derived, which is the "string" data type in the above example.
To restrict the social security number to 11 characters, we set the length value to "11".
To ensure the social security number appears in the "123-11-1233" format, the pattern is specified as follows.
<xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
To explain the pattern:
\d{3} specifies that the value must start with three digits. \-\d{2} requires the first "-" followed by two digits, and \-\d{4} requires the second "-" followed by four digits.
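To make the effect of the two facets concrete, the fragments below show values that a validator would be expected to accept or reject, assuming the SSN element is declared with the "ssnumber" type. The sample values are illustrative only.

```xml
<!-- Accepted: 11 characters in the form ddd-dd-dddd. -->
<SSN>123-11-1233</SSN>

<!-- Rejected: 9 characters and no hyphens. -->
<SSN>737333333</SSN>

<!-- Rejected: letters where digits are required. -->
<SSN>ABC-11-1233</SSN>
```

Note that the unformatted value used in the sample document at the start of the case study falls into the second category, so it would need to be reformatted once this type is applied.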
Incorporating Simple Types into Schema
Now that we know what Simple Type means, let us learn how to effectively incorporate Simple Type into an XSD Schema.
First of all, Simple Types can be global or local.
Let's first look at the global usage of the Simple Type "ssnumber".
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.abccorp.com"
            xmlns="http://www.abccorp.com"
            elementFormDefault="qualified">
  <xsd:element name="Employees">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Employee"
                     minOccurs="0"
                     maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <!-- ssnumber is our own type, so it takes no xsd: prefix -->
              <xsd:element name="SSN" type="ssnumber"/>
              <xsd:element name="Name" type="xsd:string"/>
              <xsd:element name="DateOfBirth" type="xsd:date"/>
              <xsd:element name="EmployeeType" type="xsd:string"/>
              <xsd:element name="Salary" type="xsd:int"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="ssnumber">
    <xsd:restriction base="xsd:string">
      <xsd:length value="11"/>
      <xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
In the above example, the child element "SSN" of the parent element "Employee" is defined with the user-defined data type "ssnumber".
The Simple Type "ssnumber" is declared outside the Employee element, directly under <xsd:schema>, which makes it global. If we define another element inside the <xsd:schema>, for example "Employer", that element can also make use of the "ssnumber" data type if needed.
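A minimal sketch of that kind of reuse might look like the following. It is deliberately reduced to two top-level elements; the element names "EmployeeSSN" and "EmployerID" are assumptions chosen for this illustration, and the schema has no target namespace to keep it short.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Global type: declared directly under xsd:schema. -->
  <xsd:simpleType name="ssnumber">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- Both (hypothetical) elements reuse the same global type. -->
  <xsd:element name="EmployeeSSN" type="ssnumber"/>
  <xsd:element name="EmployerID" type="ssnumber"/>
</xsd:schema>
```

Because the format rule lives in one place, a later change to the pattern would apply to every element that refers to the type.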
Let's examine a different case where the Simple Type is specific to Employee and is not going to be needed elsewhere at all. The following schema structure accomplishes this.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.abccorp.com"
            xmlns="http://www.abccorp.com"
            elementFormDefault="qualified">
  <xsd:element name="Employees">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Employee"
                     minOccurs="0"
                     maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <!-- Local type: anonymous and nested inside the SSN element -->
              <xsd:element name="SSN">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:length value="11"/>
                    <xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element name="Name" type="xsd:string"/>
              <xsd:element name="DateOfBirth" type="xsd:date"/>
              <xsd:element name="EmployeeType" type="xsd:string"/>
              <xsd:element name="Salary" type="xsd:int"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
In the above example, all the restrictions, such as length, pattern and base data type, are declared inline, inside the SSN element declaration itself.
Understanding Schema Flexibility
So far we have seen how to create a schema by:
a. Declaring and using a Complex Type
b. Declaring and using a Simple Type
c. Understanding the global scope and local scope of a given type
I will now explain the flexibility XSD Schema can provide by extending our schema example.
Let's validate further by restricting the EmployeeType element to one of two values, "fulltime" or "parttime".
To provide the validation, we should create a Simple Type called "emptype".
<xsd:simpleType name="emptype">
<xsd:restriction base="xsd:string">
<xsd:enumeration value="fulltime"/>
<xsd:enumeration value="parttime"/>
</xsd:restriction>
</xsd:simpleType>
The above schema creates a Simple Type with "string" as its base data type. The allowed values are listed in enumeration facets, which restrict the type to exactly two values. Enumeration facets in an XSD Schema let you define a type whose value must be one of the values given in the enumeration list.
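The fragments below sketch how a validator would be expected to treat a few sample values, assuming the EmployeeType element is declared with the "emptype" type. The value "contract" is a hypothetical example of a value outside the list.

```xml
<!-- Accepted: each matches one of the enumerated values. -->
<EmployeeType>fulltime</EmployeeType>
<EmployeeType>parttime</EmployeeType>

<!-- Rejected: the comparison is case-sensitive, so this is not "fulltime". -->
<EmployeeType>FULLTIME</EmployeeType>

<!-- Rejected: not in the enumeration list. -->
<EmployeeType>contract</EmployeeType>
```

The upper-case value used in the sample document at the start of the case study would therefore have to be changed to lower case, or the enumeration values changed to upper case, once this type is applied.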
Let us incorporate the "emptype" Simple Type into our main Employee schema.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.abccorp.com"
            xmlns="http://www.abccorp.com"
            elementFormDefault="qualified">
  <xsd:element name="Employees">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Employee"
                     minOccurs="0"
                     maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <!-- ssnumber and emptype are our own types, so no xsd: prefix -->
              <xsd:element name="SSN" type="ssnumber"/>
              <xsd:element name="Name" type="xsd:string"/>
              <xsd:element name="DateOfBirth" type="xsd:date"/>
              <xsd:element name="EmployeeType" type="emptype"/>
              <xsd:element name="Salary" type="xsd:int"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="ssnumber">
    <xsd:restriction base="xsd:string">
      <xsd:length value="11"/>
      <xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="emptype">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="fulltime"/>
      <xsd:enumeration value="parttime"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
In the above example, the child element "EmployeeType" of the parent element "Employee" is defined with the user-defined data type "emptype".
The Simple Type "emptype" is declared outside the Employee element, which makes it global to all elements that fall under <xsd:schema>.
So we have defined two simple types. One is "emptype", which restricts the value to one of two enumerated values, "fulltime" or "parttime". The other is "ssnumber", which restricts the length of the value to 11 and requires it to match the "111-11-1111" pattern.
The elements "enumeration", "length" and "pattern" used in these definitions are referred to as facets.
In XSD Schema, facets provide the flexibility to add restrictions to a given base data type. In our examples above, the base data type specified is "string".
Similarly, facets are available for other data types like "int". A few facets available for the "int" data type are enumeration, minExclusive, minInclusive, and pattern.
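As a sketch of facets applied to "int", the schema below restricts a salary to a numeric range. The type name "salaryamount" and both bounds are hypothetical, chosen only for illustration; maxInclusive is the upper-bound counterpart of the minInclusive facet mentioned above.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- "salaryamount" and its bounds are hypothetical. -->
  <xsd:simpleType name="salaryamount">
    <xsd:restriction base="xsd:int">
      <!-- Smallest allowed value, bound included. -->
      <xsd:minInclusive value="0"/>
      <!-- Largest allowed value, bound included. -->
      <xsd:maxInclusive value="1000000"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:element name="Salary" type="salaryamount"/>
</xsd:schema>
```

With this type, a Salary of 4000 would be accepted, while -1 or a non-numeric value would be rejected.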
Let's consider a new requirement. The payroll company wants to process tax information, so it wants the employee information listed above plus each employee's state and zip code. All of this information should appear under a separate header, "EmployeeTax".
The above requirement compels us to restructure the schema.
First we will break down the requirement to make it simple.
a. The payroll company wants all the information listed under the "Employee" element.
b. The payroll company wants the state and zip code of the given employee.
c. The payroll company wants (a) and (b) under a separate header, "EmployeeTax".
Fortunately, XSD Schema supports object-oriented principles such as inheritance hierarchies. This principle comes in handy for our requirement.
The following structure can quite easily accomplish the above requirement. An extension base must name a Complex Type rather than an element, so the structure assumes that the Employee content model has been declared as a named type, <xsd:complexType name="Employee">.
<xsd:complexType name="EmployeeTax">
  <xsd:complexContent>
    <xsd:extension base="Employee">
      <xsd:sequence>
        <xsd:element name="state" type="xsd:string"/>
        <xsd:element name="zipcode" type="xsd:string"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>
In the above code, we started with the name of the Complex Type, which is "EmployeeTax".
The following line is very important. It tells the XML parser that the content of EmployeeTax is derived from another Complex Type.
<xsd:complexContent>
To refresh our memory, in simple type definitions we used a restriction base, which mapped to a base data type. In the same way, we need to specify the extension base for the complex content. The extension base for complex content must be another Complex Type. In our example it is "Employee".
<xsd:extension base="Employee">
Once the extension base has been defined, all the elements of the "Employee" type are automatically inherited by the EmployeeTax type.
As part of the business requirement, the state and zip code elements are then specified, which completes the structure of EmployeeTax.
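Putting the pieces together, a self-contained sketch of the derivation could look like this. It assumes the Employee content model is declared as a named Complex Type (an extension base has to be a type, not an element) and shortens that type to two child elements for brevity; the schema has no target namespace, so the type names are referenced without a prefix.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Named base type, shortened to two children for this sketch. -->
  <xsd:complexType name="Employee">
    <xsd:sequence>
      <xsd:element name="SSN" type="xsd:string"/>
      <xsd:element name="Name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- Derived type: inherits SSN and Name, then appends state and zipcode. -->
  <xsd:complexType name="EmployeeTax">
    <xsd:complexContent>
      <xsd:extension base="Employee">
        <xsd:sequence>
          <xsd:element name="state" type="xsd:string"/>
          <xsd:element name="zipcode" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <!-- Element that uses the derived type. -->
  <xsd:element name="EmployeeTax" type="EmployeeTax"/>
</xsd:schema>
```

In a document validated against this sketch, an EmployeeTax element would contain SSN and Name first, followed by state and zipcode, because the inherited content always precedes the content added by the extension.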
Referencing External Schema
This feature is very useful in situations where one schema has functionality that another schema wants to utilize.
Take a case where the payroll company for ABC Corp. needs some information about the Employers, such as EmployerID and PrimaryContact in a separate XML document.
Assume EmployerID format is the same as Employee SSN format. Since we have already defined the validation for Employee SSN, there exists a valid case for using the Employee schema.
The first step in using a different schema is to "include" it.
To include a schema, make sure its target namespace is the same as the target namespace of the schema that includes it.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.abccorp.com"
            xmlns="http://www.abccorp.com"
            elementFormDefault="qualified">
  <xsd:include schemaLocation="employee.xsd"/>
  <xsd:element name="Employer">
    <xsd:complexType>
      <xsd:sequence>
        <!-- ssnumber is a type from employee.xsd, so it is used via type= -->
        <xsd:element name="EmployerID" type="ssnumber"/>
        <xsd:element name="PrimaryContact"
                     type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Please note the include statement, which references the schema location. Make sure the employee.xsd file exists at that location and declares the same target namespace.
Once the schema is included, the "EmployerID" element inside "Employer" uses the ssnumber global type as if it had been declared within the including schema itself. An additional PrimaryContact element, which is specific to the "Employer" element, is defined after the EmployerID element.
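For illustration, a document that conforms to such an Employer schema might look like the sketch below. It assumes the schema is saved as employer.xsd (a hypothetical file name), that it declares an EmployerID element of type ssnumber followed by PrimaryContact, and that it uses elementFormDefault="qualified"; the contact name is made up.

```xml
<?xml version="1.0"?>
<!-- employer.xsd is a hypothetical file name for the Employer schema. -->
<Employer xmlns="http://www.abccorp.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.abccorp.com employer.xsd">
  <!-- Must satisfy the ssnumber pattern inherited through the include. -->
  <EmployerID>123-11-1233</EmployerID>
  <PrimaryContact>JANE DOE</PrimaryContact>
</Employer>
```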
Annotations
Comments have always been considered a good coding convention. XSD Schema provides a commenting feature through the <annotation> element.
The <annotation> element has two child elements.
- <documentation>
- <appinfo>
The <documentation> element holds text for human readers that describes the purpose and functionality of the schema component.
The <appinfo> element holds information intended for the tools and applications that process the schema.
The following schema shows the usage of the annotation element.
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://www.abccorp.com"
            xmlns="http://www.abccorp.com"
            elementFormDefault="qualified">
  <xsd:element name="Employees">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Employee"
                     minOccurs="0"
                     maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:sequence>
              <!-- The annotation is nested inside the element it describes -->
              <xsd:element name="SSN" type="ssnumber">
                <xsd:annotation>
                  <xsd:documentation>
                    The SSN number identifies each employee of ABC CORP
                  </xsd:documentation>
                  <xsd:appinfo>
                    Employee Payroll Info
                  </xsd:appinfo>
                </xsd:annotation>
              </xsd:element>
              <xsd:element name="Name" type="xsd:string"/>
              <xsd:element name="DateOfBirth" type="xsd:date"/>
              <xsd:element name="EmployeeType" type="emptype"/>
              <xsd:element name="Salary" type="xsd:int"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <xsd:simpleType name="ssnumber">
    <xsd:restriction base="xsd:string">
      <xsd:length value="11"/>
      <xsd:pattern value="\d{3}\-\d{2}\-\d{4}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <xsd:simpleType name="emptype">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="fulltime"/>
      <xsd:enumeration value="parttime"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>
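Annotations are not limited to element declarations. As a further sketch, the schema below places an annotation directly under the schema element to describe the schema as a whole. The wording of the documentation text is an assumption made for illustration, and the single "Name" element is there only to give the schema some content.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <!-- For human readers; xml:lang states the language of the text. -->
    <xsd:documentation xml:lang="en">
      Employee payroll schema for ABC Corp.
    </xsd:documentation>
    <!-- For tools and applications that process the schema. -->
    <xsd:appinfo>
      Employee Payroll Info
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:element name="Name" type="xsd:string"/>
</xsd:schema>
```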
Conclusion
I tried to cover a few interesting features of XSD Schema. But there is a whole lot of information about XSD schema in