Nov 19 2010

docx4j v2.6.0 released

I published docx4j 2.6.0 yesterday.

For details, see the forum. This post introduces TraversalUtil, which makes it easier for you to find and change the bits of a docx you want to manipulate.

If you are working with an existing docx, you often need to get a particular bit of the document, and change it somehow.

If you know you want to change the 6th paragraph, say, that’s easy.

But what if you want to find all occurrences of some item, which could occur at various levels of the hierarchy? For example, paragraphs can appear not just in the document body, but also within table cells and in content controls.

docx4j offers a couple of different tools to make this easy.

XPath

XPath is a succinct way to select the things you need to change.

Happily, from docx4j 2.5.0, you can use XPath to select JAXB nodes:

MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

String xpath = "//w:p";

List<Object> list = documentPart.getJAXBNodesViaXPath(xpath, false);

These JAXB nodes are live, in the sense that if you change them, your document changes.
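To see why "//w:p" is handy here, the following is a minimal sketch that uses only the JDK's own XPath engine on a DOM, not docx4j. The XML fragment is hand-written for illustration (it is not a complete document part), and the class name XPathParagraphDemo and the helper count are made up for this example. It shows that "//w:p" matches paragraphs at any depth, whereas an absolute path only matches those directly in the body.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class XPathParagraphDemo {

    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Illustrative fragment: one paragraph in the body, one in a table cell,
    // and one inside a content control (w:sdt).
    static final String XML =
        "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
        + "<w:p/>"
        + "<w:tbl><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>"
        + "<w:sdt><w:sdtContent><w:p/></w:sdtContent></w:sdt>"
        + "</w:body></w:document>";

    // Counts the nodes matched by the given XPath expression.
    static int count(String expression) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the w: prefix to resolve
        Document doc = dbf.newDocumentBuilder().parse(
            new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return W_NS.equals(uri) ? "w" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return W_NS.equals(uri)
                    ? Collections.singletonList("w").iterator()
                    : Collections.<String>emptyList().iterator();
            }
        });

        NodeList nodes =
            (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("//w:p"));                  // prints 3
        System.out.println(count("/w:document/w:body/w:p")); // prints 1
    }
}
```

With docx4j you get JAXB objects back rather than DOM nodes, but the selection logic of the expression is the same.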

There is a limitation, however: the XPath expressions are evaluated against the XML document as it was when first opened in docx4j. You can update the associated XML document once only, by passing true into getJAXBNodesViaXPath. Updating it again (with current JAXB 2.1.x or 2.2.x) will cause an error.

To work around this bug in JAXB, you can marshal the document, and then unmarshal the result using either:

    public org.docx4j.wml.Document unmarshal( java.io.InputStream is ) 

    public org.docx4j.wml.Document unmarshal(org.w3c.dom.Element el) 

Both of those will re-create the binder.

That is not the most efficient approach, so consider voting for JAXB bug 459.

But now we have an alternative…

TraversalUtil

New to docx4j 2.6.0 is a class TraversalUtil, which is a general approach for traversing the JAXB object tree in the main document part (though it can also be applied to headers, footers etc).

For example, to get a list of hyperlinks, you can do something like:

PHyperlinkFinder finder = new PHyperlinkFinder();
new TraversalUtil(paragraphs, finder);

// Collects every P.Hyperlink encountered during the traversal
static class PHyperlinkFinder extends CallbackImpl {

    List<P.Hyperlink> links = new ArrayList<P.Hyperlink>();

    @Override
    public List<Object> apply(Object o) {

        if (o instanceof P.Hyperlink)
            links.add((P.Hyperlink) o);

        return null;
    }
}

This approach is used extensively in the MergeDocx extension I discussed in my previous post. It is now also the basis of the OpenMainDocumentAndTraverse sample, so see that for another example of how to use it.

The example above simply finds relevant bits of the docx; you could also modify the objects encountered if you want.
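To make the "find, then modify" idea concrete without depending on docx4j, here is a small self-contained model of a callback-driven traversal. Everything in it (Container, Link, Callback, walk, renameAll) is a hypothetical stand-in invented for this sketch; none of it is the docx4j API. The point is only the shape of the technique: the walker visits every object in the tree, and the callback decides what to collect or change.

```java
import java.util.ArrayList;
import java.util.List;

public class TraversalSketch {

    // Hypothetical stand-in for any object that holds child content
    // (body, paragraph, table, table cell, ...). Not a docx4j class.
    static class Container {
        final String kind;
        final List<Object> content = new ArrayList<>();

        Container(String kind, Object... children) {
            this.kind = kind;
            for (Object child : children) {
                content.add(child);
            }
        }
    }

    // Hypothetical stand-in for the item we are looking for.
    static class Link {
        String anchor;

        Link(String anchor) {
            this.anchor = anchor;
        }
    }

    interface Callback {
        void apply(Object o);
    }

    // Depth-first walk: visit the node, then recurse into its children.
    static void walk(Object node, Callback cb) {
        cb.apply(node);
        if (node instanceof Container) {
            for (Object child : ((Container) node).content) {
                walk(child, cb);
            }
        }
    }

    // One link directly in a body paragraph, one nested inside a table cell.
    static Container sampleBody() {
        return new Container("body",
            new Container("p", new Link("intro")),
            new Container("tbl",
                new Container("tc",
                    new Container("p", new Link("cell")))));
    }

    // Modifies every Link in place and returns the new anchors in visit order.
    static List<String> renameAll(Container root, String prefix) {
        List<String> seen = new ArrayList<>();
        walk(root, o -> {
            if (o instanceof Link) {
                Link link = (Link) o;
                link.anchor = prefix + link.anchor; // change the live object
                seen.add(link.anchor);
            }
        });
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(renameAll(sampleBody(), "x_")); // prints [x_intro, x_cell]
    }
}
```

Because the callback receives the live objects, any change made inside it is a change to the tree itself, which is the same property that lets a TraversalUtil callback modify the document.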
