Parsing XML with dom4j
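Recently at work I needed to parse BPMN 2.0 XML files, which called for an XML parsing library. In the Java ecosystem, XML is usually parsed with dom4j, JDOM, SAX, or JAXB. Several articles I read said that dom4j performs somewhat better, and it also supports XPath, so I ended up choosing dom4j.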
Steps

Importing the dependencies

    <dependency>
        <groupId>org.dom4j</groupId>
        <artifactId>dom4j</artifactId>
        <version>2.1.3</version>
    </dependency>
    <dependency>
        <groupId>jaxen</groupId>
        <artifactId>jaxen</artifactId>
        <version>1.2.0</version>
    </dependency>
How to get a Document object

Reading from a file

    SAXReader reader = new SAXReader();
    Document document = reader.read(new File("D:\\demo1.bpmn"));

Parsing a string directly

    Document document = DocumentHelper.parseText(xml);
How to get the root element

The root element is the outermost element of an XML document, that is, the first element after the XML declaration line. In the XML fragment below, for example, definitions is the root element:

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_0opbk7q" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="3.7.3">
    ... ...
    </bpmn:definitions>

The root element is obtained from the Document:

    Element root = document.getRootElement();
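As a small self-contained sketch of the step above, the following parses a shortened, hypothetical BPMN string (not the full file from this article) and inspects the root element. The class name RootElementSketch and the helper rootOf are made up for illustration; it assumes dom4j 2.1.3 is on the classpath.

```java
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

class RootElementSketch {
    // Shortened, hypothetical BPMN document used only for illustration.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<bpmn:definitions xmlns:bpmn=\"http://www.omg.org/spec/BPMN/20100524/MODEL\""
        + " id=\"Definitions_0opbk7q\">"
        + "<bpmn:process id=\"Process_0pk1kx0\" isExecutable=\"true\"/>"
        + "</bpmn:definitions>";

    // Parses the string and returns the root element. The checked
    // DocumentException is wrapped so callers need no throws clause.
    static Element rootOf(String xml) {
        try {
            Document document = DocumentHelper.parseText(xml);
            return document.getRootElement();
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        Element root = rootOf(XML);
        System.out.println(root.getName());          // local name, without the prefix
        System.out.println(root.getQualifiedName()); // prefix plus local name
        System.out.println(root.getNamespaceURI());  // URI bound to the bpmn prefix
    }
}
```

Note that getName() returns only the local name (definitions), while getQualifiedName() includes the prefix (bpmn:definitions).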
How to get the child elements of an element

First obtain the element, then call its Element element(String name) method with the name of a direct child to get that child element, or call List<Element> elements() to get all of its direct child elements.

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_0opbk7q" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="3.7.3">
      <bpmn:process id="Process_0pk1kx0" name="请假申请演示" isExecutable="true">
      ... ...
      </bpmn:process>
    </bpmn:definitions>

The corresponding code:

    Element root = document.getRootElement();
    // A single direct child, looked up by name
    Element process = root.element("process");
    // All direct children of the process element
    List<Element> elements = process.elements();
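The sketch below walks the direct children of the process element and collects their id attributes. It also uses elements(String name), which returns only the direct children with the given name. The sample XML, the Task_1 and Task_2 ids, and the helper childIds are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

class ChildElementsSketch {
    // Shortened, hypothetical BPMN document used only for illustration.
    static final String XML =
        "<bpmn:definitions xmlns:bpmn=\"http://www.omg.org/spec/BPMN/20100524/MODEL\">"
        + "<bpmn:process id=\"Process_0pk1kx0\">"
        + "<bpmn:startEvent id=\"StartEvent_1\"/>"
        + "<bpmn:userTask id=\"Task_1\"/>"
        + "<bpmn:userTask id=\"Task_2\"/>"
        + "</bpmn:process>"
        + "</bpmn:definitions>";

    // Returns the id attribute of every direct child of <process>
    // whose name equals childName.
    static List<String> childIds(String xml, String childName) {
        Element process;
        try {
            process = DocumentHelper.parseText(xml).getRootElement().element("process");
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
        if (process == null) {
            // element(name) returns null when there is no such child
            return Collections.emptyList();
        }
        List<String> ids = new ArrayList<>();
        for (Element child : process.elements(childName)) {
            ids.add(child.attributeValue("id"));
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(childIds(XML, "userTask"));
        System.out.println(childIds(XML, "endEvent"));
    }
}
```

Because element(name) returns null for a missing child, checking the result before calling further methods avoids a NullPointerException.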
How to get an attribute of an element

First obtain the element, then call its Attribute attribute(String name) method, passing the name of the attribute:

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_0opbk7q" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="3.7.3">
      <bpmn:process id="Process_0pk1kx0" name="请假申请演示" isExecutable="true">
      ... ...
      </bpmn:process>
    </bpmn:definitions>

The corresponding code:

    Element root = document.getRootElement();
    Element process = root.element("process");
    // Look up the attribute by name, then read its text value
    Attribute attribute = process.attribute("name");
    String attributeText = attribute.getText();
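One thing to watch for: attribute(name) returns null when the attribute is missing, so calling getText() on the result can fail. A hedged sketch of a null-safe variant uses attributeValue(String name, String defaultValue) from the Element interface. The sample XML, its English name value, and the helper attributeOrDefault are assumptions made for this example.

```java
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

class AttributeSketch {
    // Shortened, hypothetical BPMN document used only for illustration.
    static final String XML =
        "<bpmn:definitions xmlns:bpmn=\"http://www.omg.org/spec/BPMN/20100524/MODEL\">"
        + "<bpmn:process id=\"Process_0pk1kx0\" name=\"Leave request demo\" isExecutable=\"true\"/>"
        + "</bpmn:definitions>";

    // Reads an attribute of <process>, falling back to a default value
    // when the attribute is not present.
    static String attributeOrDefault(String xml, String attributeName, String fallback) {
        try {
            Element process = DocumentHelper.parseText(xml).getRootElement().element("process");
            return process.attributeValue(attributeName, fallback);
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(attributeOrDefault(XML, "name", "(unnamed)"));
        System.out.println(attributeOrDefault(XML, "missing", "(unnamed)"));
    }
}
```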
How to find elements with XPath

dom4j provides two methods for XPath searches: List<Node> selectNodes(String expr) returns all nodes that match the expression, and Node selectSingleNode(String expr) returns a single matching node.

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_0opbk7q" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="3.7.3">
      <bpmn:process id="Process_0pk1kx0" name="请假申请演示" isExecutable="true" camunda:versionTag="this is a draft">
      ... ...
      </bpmn:process>
    </bpmn:definitions>

The corresponding code:

    // Find an element that has a camunda:versionTag attribute
    XPath xPath = document.createXPath("//*[@camunda:versionTag]");
    Node node = xPath.selectSingleNode(document);
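To make the difference between the two methods concrete, here is a minimal sketch that runs both against a shortened, hypothetical document. It assumes dom4j 2.1.3 and jaxen are on the classpath. The EndEvent_1 id and the helpers countMatches and firstMatchName are invented for the example.

```java
import java.util.List;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Node;

class XPathSearchSketch {
    // Shortened, hypothetical BPMN document used only for illustration.
    static final String XML =
        "<bpmn:definitions xmlns:bpmn=\"http://www.omg.org/spec/BPMN/20100524/MODEL\""
        + " id=\"Definitions_0opbk7q\">"
        + "<bpmn:process id=\"Process_0pk1kx0\">"
        + "<bpmn:startEvent id=\"StartEvent_1\"/>"
        + "<bpmn:endEvent id=\"EndEvent_1\"/>"
        + "</bpmn:process>"
        + "</bpmn:definitions>";

    static Document parse(String xml) {
        try {
            return DocumentHelper.parseText(xml);
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
    }

    // selectNodes returns every node matched by the expression.
    static int countMatches(String xml, String expr) {
        List<Node> nodes = parse(xml).selectNodes(expr);
        return nodes.size();
    }

    // selectSingleNode returns one matching node, or null if nothing matches.
    static String firstMatchName(String xml, String expr) {
        Node node = parse(xml).selectSingleNode(expr);
        return node == null ? null : node.getName();
    }

    public static void main(String[] args) {
        System.out.println(countMatches(XML, "//*[@id]"));
        System.out.println(firstMatchName(XML, "//*[@id='StartEvent_1']"));
        System.out.println(firstMatchName(XML, "//*[@id='NoSuchId']"));
    }
}
```

Since selectSingleNode returns null when nothing matches, the result should be checked before it is cast or dereferenced.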
How to output an element as XML

    String xmlString = process.asXML();
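How to handle namespaces

The XPath search described above only deals with the case where the element name carries no namespace prefix. Things get a little more involved when a namespace is present, as in bpmn:process.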
Getting namespaces

Getting the namespace of an element

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_0opbk7q" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="3.7.3">
      <bpmn:process id="Process_0pk1kx0" name="请假申请演示" isExecutable="true" camunda:versionTag="this is a draft">
      ... ...
      </bpmn:process>
    </bpmn:definitions>

The corresponding code:

    Element root = document.getRootElement();
    Element process = root.element("process");

    // Print the prefix and URI of the namespace the process element belongs to
    System.err.println(process.getNamespace().getPrefix() + " ---> " + process.getNamespace().getURI());
Getting all namespaces of the XML document through an element

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_0opbk7q" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="3.7.3">
      <bpmn:process id="Process_0pk1kx0" name="请假申请演示" isExecutable="true" camunda:versionTag="this is a draft">
        <bpmn:startEvent id="StartEvent_1" name="开始">
          <bpmn:documentation>这是一个开始事件</bpmn:documentation>
          <bpmn:outgoing>SequenceFlow_1agvqx6</bpmn:outgoing>
        </bpmn:startEvent>
        ... ...
      </bpmn:process>
    </bpmn:definitions>

The corresponding code:

    Element root = document.getRootElement();
    Element process = root.element("process");
    Element startEvent = process.element("startEvent");

    // From any element, go back to the root and list the namespaces declared there
    List<Namespace> namespaces = startEvent.getDocument().getRootElement().declaredNamespaces();
    for (Namespace namespace : namespaces) {
        System.err.println(namespace.getPrefix() + " ---> " + namespace.getURI());
    }

    // Look up a single namespace by its prefix
    Namespace di = startEvent.getDocument().getRootElement().getNamespaceForPrefix("di");
    System.err.println(di.getPrefix() + " ===> " + di.getURI());
Getting elements using a namespace

By default dom4j does not recognize nodes that carry a namespace, so when reading namespaced XML, every node name in an XPath expression has to be qualified with a namespace prefix.

Suppose we want the BPMNShape element whose bpmnElement attribute is StartEvent_1. XPath locates it directly, without having to walk the tree level by level:

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_0opbk7q" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="3.7.3">
      <bpmn:process id="Process_0pk1kx0" name="请假申请演示" isExecutable="true" camunda:versionTag="TestVersion">
        <bpmn:startEvent id="StartEvent_1" name="开始">
          <bpmn:documentation>这是一个开始事件</bpmn:documentation>
          <bpmn:outgoing>SequenceFlow_1agvqx6</bpmn:outgoing>
        </bpmn:startEvent>
      </bpmn:process>
      <bpmndi:BPMNDiagram id="BPMNDiagram_1">
        <bpmndi:BPMNPlane id="BPMNPlane_1" bpmnElement="Process_0pk1kx0">
          <bpmndi:BPMNShape id="_BPMNShape_StartEvent_2" bpmnElement="StartEvent_1">
            <dc:Bounds x="152" y="159" width="36" height="36" />
            <bpmndi:BPMNLabel>
              <dc:Bounds x="159" y="202" width="22" height="14" />
            </bpmndi:BPMNLabel>
          </bpmndi:BPMNShape>
        </bpmndi:BPMNPlane>
      </bpmndi:BPMNDiagram>
    </bpmn:definitions>

The corresponding code:

    SAXReader reader = new SAXReader();
    Document document = reader.read(new File("D:\\demo.bpmn"));

    // "ns" is a prefix chosen for this expression; it is bound to the BPMN DI namespace URI below
    XPath xPath = document.createXPath("//ns:*[@bpmnElement='StartEvent_1']");
    // ImmutableMap comes from Guava (com.google.common.collect), which is not one of the dependencies listed earlier
    xPath.setNamespaceURIs(ImmutableMap.of("ns", "http://www.omg.org/spec/BPMN/20100524/DI"));

    System.out.println(" --> " + xPath.selectSingleNode(document));
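The same lookup can be sketched without Guava by passing a java.util.Collections.singletonMap to setNamespaceURIs. This is a minimal example under stated assumptions: the XML is a shortened, hypothetical version of the file above, the prefix diagram is an arbitrary name chosen for the expression, and the helper shapeIdFor is invented for illustration.

```java
import java.util.Collections;
import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.XPath;

class NamespaceXPathSketch {
    static final String BPMNDI_URI = "http://www.omg.org/spec/BPMN/20100524/DI";

    // Shortened, hypothetical BPMN document used only for illustration.
    static final String XML =
        "<bpmn:definitions xmlns:bpmn=\"http://www.omg.org/spec/BPMN/20100524/MODEL\""
        + " xmlns:bpmndi=\"" + BPMNDI_URI + "\">"
        + "<bpmn:process id=\"Process_0pk1kx0\">"
        + "<bpmn:startEvent id=\"StartEvent_1\"/>"
        + "</bpmn:process>"
        + "<bpmndi:BPMNDiagram id=\"BPMNDiagram_1\">"
        + "<bpmndi:BPMNPlane id=\"BPMNPlane_1\" bpmnElement=\"Process_0pk1kx0\">"
        + "<bpmndi:BPMNShape id=\"_BPMNShape_StartEvent_2\" bpmnElement=\"StartEvent_1\"/>"
        + "</bpmndi:BPMNPlane>"
        + "</bpmndi:BPMNDiagram>"
        + "</bpmn:definitions>";

    // Returns the id of the BPMNShape that draws the given BPMN element,
    // or null if there is none. Assumes bpmnElementId contains no quotes.
    static String shapeIdFor(String xml, String bpmnElementId) {
        Document document;
        try {
            document = DocumentHelper.parseText(xml);
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
        XPath xPath = document.createXPath(
            "//diagram:BPMNShape[@bpmnElement='" + bpmnElementId + "']");
        // Bind the expression's own prefix to the namespace URI.
        xPath.setNamespaceURIs(Collections.singletonMap("diagram", BPMNDI_URI));
        Node node = xPath.selectSingleNode(document);
        return node == null ? null : ((Element) node).attributeValue("id");
    }

    public static void main(String[] args) {
        System.out.println(shapeIdFor(XML, "StartEvent_1"));
    }
}
```

The prefix used in the expression does not have to be the one written in the file; what matters is the namespace URI it is bound to.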
Getting an attribute of an element using a namespace

Once the target element has been located, use the Attribute attribute(org.dom4j.QName qName) method (note: not Attribute attribute(String name)) to specify both the attribute name and the attribute's namespace.

For example, suppose we want to read the versionTag attribute of the process element, and this attribute is namespaced: camunda:versionTag.

    <?xml version="1.0" encoding="UTF-8"?>
    <bpmn:definitions xmlns:bpmn="http://www.omg.org/spec/BPMN/20100524/MODEL" xmlns:bpmndi="http://www.omg.org/spec/BPMN/20100524/DI" xmlns:dc="http://www.omg.org/spec/DD/20100524/DC" xmlns:camunda="http://camunda.org/schema/1.0/bpmn" xmlns:di="http://www.omg.org/spec/DD/20100524/DI" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="Definitions_0opbk7q" targetNamespace="http://bpmn.io/schema/bpmn" exporter="Camunda Modeler" exporterVersion="3.7.3">
      <bpmn:process id="Process_0pk1kx0" name="请假申请演示" isExecutable="true" camunda:versionTag="TestVersion">
        <bpmn:startEvent id="StartEvent_1" name="开始">
          <bpmn:documentation>这是一个开始事件</bpmn:documentation>
          <bpmn:outgoing>SequenceFlow_1agvqx6</bpmn:outgoing>
        </bpmn:startEvent>
        ... ...
      </bpmn:process>
    </bpmn:definitions>

The corresponding code:

    // Locate the element that carries the camunda:versionTag attribute
    XPath xPath = document.createXPath("//*[@camunda:versionTag]");
    Node node = xPath.selectSingleNode(document);

    Element element = (Element) node;

    // Qualified name = local attribute name + namespace (prefix and URI)
    QName qName = new QName("versionTag", new Namespace("camunda", "http://camunda.org/schema/1.0/bpmn"));
    Attribute attribute = element.attribute(qName);

    System.err.println(attribute.getText());
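A compact variant of this lookup, sketched under the assumption that dom4j 2.1.3 is on the classpath, builds the qualified name with the QName.get factory and reads the value with attributeValue(QName), which returns null when the attribute is absent. The two shortened XML strings and the helper versionTag are hypothetical.

```java
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.QName;

class QNameAttributeSketch {
    static final String CAMUNDA_URI = "http://camunda.org/schema/1.0/bpmn";

    // Shortened, hypothetical documents: one with the attribute, one without.
    static final String WITH_TAG =
        "<bpmn:definitions xmlns:bpmn=\"http://www.omg.org/spec/BPMN/20100524/MODEL\""
        + " xmlns:camunda=\"" + CAMUNDA_URI + "\">"
        + "<bpmn:process id=\"Process_0pk1kx0\" camunda:versionTag=\"TestVersion\"/>"
        + "</bpmn:definitions>";
    static final String WITHOUT_TAG =
        "<bpmn:definitions xmlns:bpmn=\"http://www.omg.org/spec/BPMN/20100524/MODEL\">"
        + "<bpmn:process id=\"Process_0pk1kx0\"/>"
        + "</bpmn:definitions>";

    // Reads camunda:versionTag from <process>; null if it is not set.
    static String versionTag(String xml) {
        Element process;
        try {
            process = DocumentHelper.parseText(xml).getRootElement().element("process");
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
        // Local name, prefix, namespace URI
        QName qName = QName.get("versionTag", "camunda", CAMUNDA_URI);
        return process.attributeValue(qName);
    }

    public static void main(String[] args) {
        System.out.println(versionTag(WITH_TAG));
        System.out.println(versionTag(WITHOUT_TAG));
    }
}
```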
Appendix

XPath dependency

Without the jaxen dependency, dom4j reports that the class org/jaxen/JaxenException cannot be found, and XPath cannot be used.

    Exception in thread "main" java.lang.NoClassDefFoundError: org/jaxen/JaxenException
        at org.dom4j.DocumentFactory.createXPath(DocumentFactory.java:222)
        at org.dom4j.tree.AbstractNode.createXPath(AbstractNode.java:202)
        at me.ningyu.demo.BPMNDemo.main(BPMNDemo.java:44)
    Caused by: java.lang.ClassNotFoundException: org.jaxen.JaxenException
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 3 more
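Common XPath usage

If the XPath expression starts with "/", the search is relative to the whole document. If it starts with a node name, the search is relative to the node on which the search method is called.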
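The difference can be seen in a small sketch. It uses a deliberately plain, hypothetical XML document without namespaces (library, shelf, book), so that only the absolute versus relative behaviour is on display. The helper countFromFirstShelf is invented for the example, and dom4j plus jaxen are assumed to be on the classpath.

```java
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

class XPathContextSketch {
    // Hypothetical document without namespaces, used only for illustration.
    static final String XML =
        "<library>"
        + "<shelf id=\"a\"><book/><book/></shelf>"
        + "<shelf id=\"b\"><book/></shelf>"
        + "</library>";

    // Runs the expression with the FIRST <shelf> element as the calling node
    // and returns how many nodes it selects.
    static int countFromFirstShelf(String xml, String expr) {
        Element firstShelf;
        try {
            firstShelf = DocumentHelper.parseText(xml).getRootElement().element("shelf");
        } catch (DocumentException e) {
            throw new IllegalArgumentException("XML could not be parsed", e);
        }
        return firstShelf.selectNodes(expr).size();
    }

    public static void main(String[] args) {
        // Relative: only the books that are children of the calling shelf
        System.out.println(countFromFirstShelf(XML, "book"));
        // Absolute: starts at the document root, regardless of the calling node
        System.out.println(countFromFirstShelf(XML, "/library/shelf/book"));
    }
}
```

Called on the first shelf, the relative expression sees only that shelf's two books, while the absolute expression selects all three books in the document.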