zyint's blog
Easy Java/XML integration with JDOM, Part 1
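Creating a document
In JDOM, an XML document is represented by the org.jdom.Document class. Document is a simple class that can hold a DocType, any number of ProcessingInstruction objects, a root Element, and comments. To create one from scratch: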
- Document doc = new Document(new Element("rootElement"));
To load an existing document instead:
- SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(url);
A Document can also be built from a data source. The builders that do this live in the org.jdom.input package, and there are two of them: SAXBuilder and DOMBuilder.
SAXBuilder uses a SAX parser to build the Document from a file: the SAXBuilder listens for the SAX events and builds a corresponding Document in memory. That approach is very fast (basically as fast as SAX), and it is the approach we recommend. DOMBuilder is another alternative that builds a JDOM Document from an existing org.w3c.dom.Document object. It allows JDOM to interface easily with tools that construct DOM trees.
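To make the "listen for SAX events, build a tree in memory" idea concrete, here is a minimal sketch. JDOM is not part of the JDK, so the sketch uses only the JDK's own SAX parser; Node, TreeHandler, and build() are hypothetical names invented for illustration and are not JDOM classes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxTreeSketch {
    /** Hypothetical minimal node type; JDOM's Element is far richer. */
    static final class Node {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Receives SAX callbacks and grows the tree one event at a time. */
    static final class TreeHandler extends DefaultHandler {
        private final Deque<Node> open = new ArrayDeque<>(); // elements not yet closed
        Node root;

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            Node node = new Node(qName);
            if (open.isEmpty()) {
                root = node;                      // first element seen is the root
            } else {
                open.peek().children.add(node);   // attach to the innermost open element
            }
            open.push(node);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!open.isEmpty()) {
                open.peek().text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            open.pop();
        }
    }

    /** Parses the XML string and returns the root of the in-memory tree. */
    static Node build(String xml) {
        try {
            TreeHandler handler = new TreeHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.root;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Node root = build("<gui><name>Enlightenment</name></gui>");
        Node first = root.children.get(0);
        System.out.println(root.name + " / " + first.name + " = " + first.text);
    }
}
```

The handler never looks ahead or back in the stream; it only keeps a stack of open elements, which is why a SAX-driven builder can stay close to raw SAX speed.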
JDOM's speed has the potential to improve significantly upon completion of a deferred builder that scans the XML data source but doesn't fully parse it until the information is requested. For example, element attributes don't need to be parsed until their value is requested.
Builders are also being developed that construct JDOM Document objects from SQL queries, LDAP queries, and other data formats. So, once in memory, documents are not tied to their build tool.
The SAXBuilder and DOMBuilder constructors let the user specify if validation should be turned on, as well as which parser class should perform the actual parsing duties.
- public SAXBuilder(String parserClass, boolean validation);
- public DOMBuilder(String adapterClass, boolean validation);
The defaults are to use Apache's open source Xerces parser and to turn off validation. Notice that the DOMBuilder doesn't take a parserClass but rather an adapterClass. That is because not all DOM parsers have the same API. To still allow user-pluggable parsers, JDOM uses an adapter class that has a common API for all DOM parsers. Adapters have been written for all the popular DOM parsers, including Apache's Xerces, Crimson, IBM's XML4J, Sun's Project X, and Oracle's parsers V1 and V2. Each one implements that standard interface by making the right method calls on the backend parser. That works somewhat like JAXP, except that it supports newer parsers that JAXP does not yet support.
Outputting a document
You can output a Document using an output tool, of which several standard ones are available. The org.jdom.output.XMLOutputter tool is probably the most commonly used. It writes the document as XML to a specified OutputStream.
The SAXOutputter tool is another alternative. It generates SAX events based on the JDOM document, which you can then send to an application component that expects SAX events. In a similar manner, DOMOutputter creates a DOM document, which you can then supply to a DOM-receiving application component. The code to output a Document as XML looks like this:
- XMLOutputter outputter = new XMLOutputter();
- outputter.output(doc, System.out);
XMLOutputter takes parameters to customize the output. The first parameter is the indentation string; the second indicates whether to write new lines. For machine-to-machine communication, you can skip the niceties of indentation and new lines for the sake of speed:
- XMLOutputter outputter = new XMLOutputter("", false);
- outputter.output(doc, System.out);
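The same compact-versus-readable trade-off can be tried without JDOM on the classpath. The sketch below is an assumption-laden stand-in that uses the JDK's Transformer rather than XMLOutputter; OutputSketch and serialize() are hypothetical names, and the exact indentation of the readable form depends on the JDK in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class OutputSketch {
    /** Serializes the XML either compactly (indent == false) or with line breaks. */
    static String serialize(String xml, boolean indent) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            transformer.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a><b>x</b></a>";
        System.out.println(serialize(xml, false)); // compact, for machines
        System.out.println(serialize(xml, true));  // with line breaks, for people
    }
}
```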
Here's a class that reads an XML document and prints it in a nice, readable form:
- import java.io.*;
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;

public class PrettyPrinter {
    public static void main(String[] args) {
        // Assume filename argument
        String filename = args[0];
        try {
            // Build the document with SAX and Xerces, no validation
            SAXBuilder builder = new SAXBuilder();
            // Create the document
            Document doc = builder.build(new File(filename));
            // Output the document, use standard formatter
            XMLOutputter fmt = new XMLOutputter();
            fmt.output(doc, System.out);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Reading the DocType
Now let's look at how to read the details of a Document. Many XML documents have a document type, represented in JDOM by the DocType class. In case you're not an XML guru (hey, don't feel bad, you're our target audience), a document type declaration looks like this:
- <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The first word after DOCTYPE indicates the name of the element being constrained, the quoted string after PUBLIC is the document type's public identifier, and the last quoted string is the document type's system identifier. The DocType is available by calling getDocType() on a Document, and the DocType class has methods to get the individual pieces of the DOCTYPE declaration.
- DocType docType = doc.getDocType();
System.out.println("Element: " + docType.getElementName());
System.out.println("Public ID: " + docType.getPublicID());
System.out.println("System ID: " + docType.getSystemID());
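For readers who want to see the three pieces of a DOCTYPE without JDOM installed, here is a rough equivalent using the JDK's DOM API, where the counterpart of DocType is org.w3c.dom.DocumentType. DocTypeSketch and readDocType() are hypothetical names. The sketch assumes you do not want the parser to download the DTD, so it resolves every external entity to an empty stream.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class DocTypeSketch {
    static final String XHTML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
        + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
        + "<html/>";

    /** Parses the XML and returns its document type declaration (null if none). */
    static DocumentType readDocType(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Hand back an empty entity instead of fetching the DTD over the network.
            builder.setEntityResolver(
                (publicId, systemId) -> new InputSource(new StringReader("")));
            return builder.parse(new InputSource(new StringReader(xml))).getDoctype();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        DocumentType docType = readDocType(XHTML);
        System.out.println("Element: " + docType.getName());
        System.out.println("Public ID: " + docType.getPublicId());
        System.out.println("System ID: " + docType.getSystemId());
    }
}
```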
Reading the element data
Every XML document must have a root element. That element is the starting point for accessing all the information within the document. For example, the following snippet has <web-app> as the root:
- <web-app id="demo">
<description>Gotta fit servlets in somewhere!</description>
<distributable/>
</web-app>
The root Element instance is available on a Document directly:
- Element webapp = doc.getRootElement();
You can then access that Element's attributes (for example, the id above), content, and child Elements.
Playing with children
XML documents are tree structures, and any Element may contain any number of child Elements. For example, the <web-app> element has <description> and <distributable> tags as children. You can obtain an Element's children with various methods. getChild() returns null if no child by that name exists.
- List getChildren(); // return all children
List getChildren(String name); // return all children by name
Element getChild(String name); // return first child by name
To demonstrate:
- // Get a List of all direct children as Element objects
List allChildren = element.getChildren();
out.println("First kid: " + ((Element)allChildren.get(0)).getName());
// Get a list of all direct children with a given name
List namedChildren = element.getChildren("name");
// Get the first kid with a given name
Element kid = element.getChild("name");
Using getChild() makes it easy to quickly access nested elements when the structure of the XML document is known in advance. Given this XML:
- <?xml version="1.0"?>
<linux:config>
<gui>
<window-manager>
<name>Enlightenment</name>
<version>0.16.2</version>
</window-manager>
<!-- etc -->
</gui>
</linux:config>
The following code directly retrieves the current window manager name:
- String windowManager = rootElement.getChild("gui")
.getChild("window-manager")
.getChild("name")
.getText();
Just be careful about NullPointerExceptions if the document has not been validated: any getChild() call in the chain returns null when the named child is missing. For simpler document navigation, future JDOM versions are likely to support XPath references. Children can get their parent using getParent().
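One way to keep the convenience of chained lookups while avoiding the NullPointerException is to wrap each step in an Optional. The sketch below is only an illustration of that pattern and uses the JDK's DOM API, not JDOM; SafeChildSketch, child(), and windowManagerName() are hypothetical names, and the sample root element is simplified to <config>.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SafeChildSketch {
    /** First direct child element with the given name, or empty if there is none. */
    static Optional<Element> child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && n.getNodeName().equals(name)) {
                return Optional.of((Element) n);
            }
        }
        return Optional.empty();
    }

    /** Walks gui / window-manager / name; a missing step yields empty, not an exception. */
    static Optional<String> windowManagerName(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return child(root, "gui")
            .flatMap(gui -> child(gui, "window-manager"))
            .flatMap(wm -> child(wm, "name"))
            .map(Element::getTextContent);
    }

    public static void main(String[] args) {
        String xml = "<config><gui><window-manager><name>Enlightenment</name>"
            + "</window-manager></gui></config>";
        System.out.println(windowManagerName(xml).orElse("(not configured)"));
        System.out.println(windowManagerName("<config/>").orElse("(not configured)"));
    }
}
```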
Getting element attributes
Attributes are another piece of information that elements hold. They're familiar to any HTML programmer. The following <table> element has width and border attributes.
- <table width="100%" border="0"> </table>
Those attributes are directly available on an Element.
- String width = table.getAttributeValue("width");
You can also retrieve the attribute as an Attribute instance. That ability helps JDOM support advanced concepts such as Attributes residing in a namespace. (See the section Namespaces later in the article for more information.)
- Attribute widthAttrib = table.getAttribute("width");
String width = widthAttrib.getValue();
You can also retrieve the attribute value as a primitive type:
- int border = table.getAttribute("border").getIntValue();
You can retrieve the value as any Java primitive type. If the attribute cannot be converted to the primitive type, a DataConversionException is thrown. If the attribute does not exist, then the getAttribute() call returns null.
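The two failure modes described above (attribute missing versus attribute present but not convertible) are easy to mix up, so here is a small sketch that keeps them apart. It uses the JDK's DOM API as a stand-in for JDOM; AttributeSketch, parse(), and intAttribute() are hypothetical names, and NumberFormatException plays the role that DataConversionException plays in JDOM.

```java
import java.io.StringReader;
import java.util.OptionalInt;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeSketch {
    /** Parses the XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /**
     * Returns the attribute as an int, or empty when the attribute is absent.
     * Throws NumberFormatException when the attribute exists but is not an int.
     */
    static OptionalInt intAttribute(Element element, String name) {
        Attr attr = element.getAttributeNode(name); // null when the attribute is absent
        if (attr == null) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(attr.getValue().trim()));
    }

    public static void main(String[] args) {
        Element table = parse("<table width=\"100%\" border=\"0\"/>");
        System.out.println("border: " + intAttribute(table, "border"));
        System.out.println("height: " + intAttribute(table, "height"));
        try {
            intAttribute(table, "width");
        } catch (NumberFormatException e) {
            System.out.println("width is not an int: " + e.getMessage());
        }
    }
}
```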
Extracting mixed element content
We touched on getting element content earlier, and showed how easy it is to extract an element's text content using element.getText(). That is the standard case, useful for elements that look like this:
- <name>Enlightenment</name>
But sometimes an element can contain comments, text content, and child elements. It may even contain, in advanced documents, a processing instruction:
- <table>
<!-- Some comment -->
Some text
<tr>Some child</tr>
<?pi Some processing instruction?>
</table>
This isn't a big deal. You can retrieve text and children as always:
- String text = table.getText(); // "Some text"
Element tr = table.getChild("tr"); // <tr> child
That keeps the standard uses simple. Sometimes, as when writing output, it's important to get all the content of an Element in the right order. For that you can use a special method on Element called getMixedContent(). It returns a List of content that may contain instances of Comment, String, Element, and ProcessingInstruction. Java programmers can use instanceof to determine what's what and act accordingly. The following code prints out a summary of an element's content:
- List mixedContent = table.getMixedContent();
Iterator i = mixedContent.iterator();
while (i.hasNext()) {
    Object o = i.next();
    if (o instanceof Comment) {
        // Comment has a toString()
        out.println("Comment: " + o);
    }
    else if (o instanceof String) {
        out.println("String: " + o);
    }
    else if (o instanceof ProcessingInstruction) {
        out.println("PI: " + ((ProcessingInstruction)o).getTarget());
    }
    else if (o instanceof Element) {
        out.println("Element: " + ((Element)o).getName());
    }
}
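The same in-order walk can be reproduced with nothing but the JDK, which may help when comparing JDOM with DOM. In this sketch the dispatch is on the DOM node type rather than on instanceof; MixedContentSketch and summarize() are hypothetical names, and whitespace-only text nodes are skipped as an assumption about what a summary should show.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class MixedContentSketch {
    /** Lists the root element's direct content, in document order. */
    static List<String> summarize(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        List<String> summary = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            switch (n.getNodeType()) {
                case Node.COMMENT_NODE:
                    summary.add("Comment: " + n.getNodeValue().trim());
                    break;
                case Node.TEXT_NODE:
                    String text = n.getNodeValue().trim();
                    if (!text.isEmpty()) {          // ignore whitespace between tags
                        summary.add("Text: " + text);
                    }
                    break;
                case Node.PROCESSING_INSTRUCTION_NODE:
                    summary.add("PI: " + ((ProcessingInstruction) n).getTarget());
                    break;
                case Node.ELEMENT_NODE:
                    summary.add("Element: " + n.getNodeName());
                    break;
                default:
                    break;                          // other node types are not summarized
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        String xml = "<table><!-- Some comment -->Some text<tr>Some child</tr>"
            + "<?pi Some processing instruction?></table>";
        for (String line : summarize(xml)) {
            System.out.println(line);
        }
    }
}
```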
Dealing with processing instructions
Processing instructions (often called PIs for short) are something that certain XML documents have in order to control the tool that's processing them. For example, with the Cocoon Web content creation library, the XML files may have cocoon processing instructions that look like this:
- <?cocoon-process type="xslt"?>
Each ProcessingInstruction instance has a target and data. The target is the first word, the data is everything afterward, and they're retrieved by using getTarget() and getData().
- String target = pi.getTarget(); // cocoon-process
String data = pi.getData(); // type="xslt"
Since the data often appears like a list of attributes, the ProcessingInstruction class internally parses the data and supports getting data attribute values directly with getValue(String name):
- String type = pi.getValue("type"); // xslt
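The article does not show how the data string is split into name/value pairs, so the following is only a guess at the general technique, not JDOM's implementation: a regular expression that picks out name="value" or name='value' pairs. PiDataSketch and parse() are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PiDataSketch {
    // Matches name="value" or name='value'; group 2 or group 3 holds the value.
    private static final Pattern PAIR =
        Pattern.compile("([\\w:.-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Splits PI data such as type="xslt" into an ordered name-to-value map. */
    static Map<String, String> parse(String data) {
        Map<String, String> values = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(data);
        while (m.find()) {
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            values.put(m.group(1), value);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> values = parse("type=\"xslt\"");
        System.out.println(values.get("type"));
        System.out.println(values.get("missing")); // null when no such pair exists
    }
}
```

Data that does not look like attribute pairs simply produces an empty map here, which mirrors the idea that the pair view is a convenience layered on top of the raw data string.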
You can find PIs anywhere in the document, just like Comment objects, and can retrieve them the same way as Comments, using getMixedContent():
- List mixed = element.getMixedContent(); // List may contain PIs
PIs may reside outside the root Element, in which case they're available using the getMixedContent() method on Document:
- List mixed = doc.getMixedContent();
It's actually very common for PIs to be placed outside the root element, so for convenience, the Document class has several methods that help retrieve all the Document-level PIs, either by name or as one large bunch:
- List allOfThem = doc.getProcessingInstructions();
List someOfThem = doc.getProcessingInstructions("cocoon-process");
ProcessingInstruction oneOfThem =
doc.getProcessingInstruction("cocoon-process");
That allows the Cocoon parser to read the first cocoon-process type with code like this:
- String type =
doc.getProcessingInstruction("cocoon-process").getValue("type");
As you probably expect, getProcessingInstruction(String) will return null if no such PI exists.
Namespaces
Namespaces are an advanced XML concept that has been gaining in importance. Namespaces allow elements with the same local name to be treated differently because they're in different namespaces. It works similarly to Java packages and helps avoid name collisions.
Namespaces are supported in JDOM using the helper class org.jdom.Namespace. You retrieve namespaces using the Namespace.getNamespace(String prefix, String uri) method. In XML the following code declares the xhtml prefix to correspond to the URI "http://www.w3.org/1999/xhtml". Then <xhtml:title> is treated as a title in the "http://www.w3.org/1999/xhtml" namespace.
- <html xmlns:xhtml="http://www.w3.org/1999/xhtml">
When a child is in a namespace, you can retrieve it using overloaded versions of getChild() and getChildren() that take a second Namespace argument.
- Namespace ns =
Namespace.getNamespace("xhtml", "http://www.w3.org/1999/xhtml");
List kids = element.getChildren("p", ns);
Element kid = element.getChild("title", ns);
If a Namespace is not given, the element is assumed to be in the default namespace, which lets Java programmers ignore namespaces if they so desire.
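The key point about namespaces is that the URI identifies the element and the prefix is just a local alias. A short sketch with the JDK's DOM API (not JDOM) shows this; NamespaceSketch and countInNamespace() are hypothetical names, and note that the JDK parser has to be told to be namespace aware.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Counts elements with the given local name inside the given namespace URI. */
    static int countInNamespace(String xml, String uri, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // namespace processing is off by default
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(uri, localName).getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html xmlns:xhtml=\"" + XHTML_NS + "\">"
            + "<xhtml:title>In the namespace</xhtml:title>"
            + "<title>Not in the namespace</title></html>";
        System.out.println(countInNamespace(xml, XHTML_NS, "title"));
    }
}
```

Renaming the prefix from xhtml to anything else leaves the count unchanged, because the lookup is keyed on the URI.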
Making a list, checking it twice
JDOM has been designed using the List and Map interfaces from the Java 2 Collections API. The Collections API provides JDOM with great power and flexibility through standard APIs. It does mean that to use JDOM, you either have to use Java 2 (JDK 1.2) or use JDK 1.1 with the Collections library installed.
All the List and Map objects are mutable, meaning their contents can be changed, reordered, added to, or deleted, and the change will affect the Document itself, unless you explicitly copy the List or Map first. We'll get deeper into that in Part 2 of the article.
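The difference between a live list and a copied one is easy to demonstrate with plain collections. The Node class below is a hypothetical stand-in for an element that hands out its child list directly; it is not a JDOM class, and the sketch only illustrates the aliasing behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

class LiveListSketch {
    /** Hypothetical stand-in for an element that exposes its children. */
    static final class Node {
        private final List<String> children = new ArrayList<>();

        /** Live view: changes made through this list change the node. */
        List<String> getChildren() {
            return children;
        }

        /** Detached snapshot: changes made to the copy stay in the copy. */
        List<String> copyOfChildren() {
            return new ArrayList<>(children);
        }
    }

    public static void main(String[] args) {
        Node node = new Node();
        node.getChildren().add("servlet");          // edits the node itself

        List<String> copy = node.copyOfChildren();
        copy.add("security-role");                  // edits only the copy

        System.out.println("node: " + node.getChildren());
        System.out.println("copy: " + copy);
    }
}
```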
Exceptions
As you probably noticed, several exception classes in the JDOM library can be thrown to indicate various error situations. As a convenience, all of those exceptions extend the same base class, JDOMException. That allows you the flexibility to catch specific exceptions or all JDOM exceptions with a single try/catch block. JDOMException itself is usually thrown to indicate the occurrence of an underlying exception such as a parse error; in that case, you can retrieve the root cause exception using the getRootCause() method. That is similar to how RemoteException behaves in RMI code and how ServletException behaves in servlet code. However, the underlying exception isn't often needed because the JDOMException message contains information such as the parse problem and line number.
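The wrap-and-expose-the-cause pattern can be sketched with the JDK alone. Here XmlReadException is a hypothetical class that plays the role the text assigns to JDOMException, and the standard getCause() stands in for getRootCause(); RootCauseSketch, read(), and rootCauseOf() are likewise invented for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class RootCauseSketch {
    /** Hypothetical wrapper exception; carries the underlying failure as its cause. */
    static final class XmlReadException extends Exception {
        XmlReadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Parses the XML, translating any failure into a single exception type. */
    static void read(String xml) throws XmlReadException {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors quietly
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Put the useful details in the message so callers rarely need the cause.
            throw new XmlReadException(
                "parse error at line " + e.getLineNumber() + ": " + e.getMessage(), e);
        } catch (Exception e) {
            throw new XmlReadException("could not read document", e);
        }
    }

    /** Returns the underlying exception for a bad document, or null if it parses. */
    static Throwable rootCauseOf(String xml) {
        try {
            read(xml);
            return null;
        } catch (XmlReadException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        try {
            read("<a><b></a>");
        } catch (XmlReadException e) {
            System.out.println(e.getMessage());
            System.out.println("root cause type: " + e.getCause().getClass().getName());
        }
    }
}
```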
Using JDOM to read a web.xml file
Now let's see JDOM in action by looking at how you could use it to parse a web.xml file, the Web application deployment descriptor from Servlet API 2.2. Let's assume that you want to look at the Web application to see which servlets have been registered, how many init parameters each servlet has, what security roles are defined, and whether or not the Web application is marked as distributed.
Here's a sample web.xml file:
- <?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE web-app
PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"
"http://java.sun.com/j2ee/dtds/web-app_2.2.dtd">
<web-app>
<servlet>
<servlet-name>snoop</servlet-name>
<servlet-class>SnoopServlet</servlet-class>
</servlet>
<servlet>
<servlet-name>file</servlet-name>
<servlet-class>ViewFile</servlet-class>
<init-param>
<param-name>initial</param-name>
<param-value>1000</param-value>
<description>
The initial value for the counter <!-- optional -->
</description>
</init-param>
</servlet>
<servlet-mapping>
<servlet-name>mv</servlet-name>
<url-pattern>*.wm</url-pattern>
</servlet-mapping>
<distributed/>
<security-role>
<role-name>manager</role-name>
<role-name>director</role-name>
<role-name>president</role-name>
</security-role>
</web-app>
On processing that file, you'd want to get output that looks like this:
This WAR has 2 registered servlets:
snoop for SnoopServlet (it has 0 init params)
file for ViewFile (it has 1 init params)
This WAR contains 3 roles:
manager
director
president
This WAR is distributed
With JDOM, achieving that output is easy. The following example reads the WAR file, builds a JDOM document representation in memory, then extracts the pertinent information from it:
- import java.io.*;
import java.util.*;
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.output.*;

public class WarReader {
    public static void main(String[] args) {
        PrintStream out = System.out;
        // Exactly one argument is expected: the path to web.xml
        if (args.length != 1) {
            out.println("Usage: WarReader [web.xml]");
            return;
        }
        try {
            // Request document building without validation
            SAXBuilder builder = new SAXBuilder(false);
            Document doc = builder.build(new File(args[0]));
            // Get the root element
            Element root = doc.getRootElement();
            // Print servlet information
            List servlets = root.getChildren("servlet");
            out.println("This WAR has " + servlets.size() + " registered servlets:");
            Iterator i = servlets.iterator();
            while (i.hasNext()) {
                Element servlet = (Element) i.next();
                out.print("\t" + servlet.getChild("servlet-name")
                                        .getText() +
                          " for " + servlet.getChild("servlet-class")
                                           .getText());
                List initParams = servlet.getChildren("init-param");
                out.println(" (it has " + initParams.size() + " init params)");
            }
            // Print security role information
            List securityRoles = root.getChildren("security-role");
            if (securityRoles.size() == 0) {
                out.println("This WAR contains no roles");
            }
            else {
                Element securityRole = (Element) securityRoles.get(0);
                List roleNames = securityRole.getChildren("role-name");
                out.println("This WAR contains " + roleNames.size() + " roles:");
                i = roleNames.iterator();
                while (i.hasNext()) {
                    Element e = (Element) i.next();
                    out.println("\t" + e.getText());
                }
            }
            // Print distributed information (notice this is out of order)
            List distrib = root.getChildren("distributed");
            if (distrib.size() == 0) {
                out.println("This WAR is not distributed");
            } else {
                out.println("This WAR is distributed");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
http://www.javaworld.com/javaworld/jw-05-2000/jw-0518-jdom.html