Exploring XPath: A Flexible Selector Language for JSON Documents
Today, I stumbled upon an interesting discussion on my Twitter timeline about the JSON Pointer draft proposal and an Erlang JSON Pointer implementation on GitHub. I couldn't help but feel a surge of excitement, hoping to discover a solution that would revolutionize JSON document querying.
However, my enthusiasm quickly waned as I delved into the JSON Pointer spec draft. It became apparent that it lacked a crucial feature: the ability to retrieve all matching nodes in the JSON document representation tree for a given "pointer". This limitation posed a significant obstacle when attempting to extract multiple values efficiently. Let's take a closer look at a few examples to illustrate the issue.
JSON Pointer Examples
Imagine we have a JSON document representing a hilarious mock movie called "Java 4-ever." It contains details of the production, including a list of actors and their corresponding characters, like so.
{
  "title": "Java 4-ever",
  "url": "http://www.youtube.com/watch?v=H7QVITAWdBQ",
  "actors": [
    {
      "name": "Scala Johansson",
      "character": "A"
    },
    {
      "name": "William Windows",
      "character": "B"
    },
    {
      "name": "Eddie Larrison",
      "character": "C"
    },
    {
      "name": "Mona Lisa Harddrive",
      "character": "D"
    },
    {
      "name": "Lenny Linux",
      "character": "C (Young)"
    }
  ]
}
Suppose we want to extract the names of all the actors from this movie. With JSON Pointer, we would need to address each name individually by its zero-based array index:
/actors/0/name
/actors/1/name
/actors/2/name
/actors/3/name
/actors/4/name
This approach is undeniably cumbersome, verbose, and even inefficient. It hardly seems practical to employ JSON Pointers in this manner to retrieve a list of names.
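To make the one-pointer-one-value limitation concrete, here is a minimal sketch of a pointer resolver in Python. The helper name resolve_pointer is my own invention, not part of any library, and the sketch assumes the rules of the JSON Pointer specification as later published in RFC 6901 (zero-based array indices, "~0" and "~1" escapes). It skips error handling and the special "-" array token.

```python
# Trimmed copy of the movie document from above.
MOVIE = {
    "title": "Java 4-ever",
    "actors": [
        {"name": "Scala Johansson", "character": "A"},
        {"name": "William Windows", "character": "B"},
        {"name": "Eddie Larrison", "character": "C"},
        {"name": "Mona Lisa Harddrive", "character": "D"},
        {"name": "Lenny Linux", "character": "C (Young)"},
    ],
}


def resolve_pointer(doc, pointer):
    """Hypothetical helper: return the single value a JSON Pointer addresses."""
    if pointer == "":
        return doc  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    node = doc
    for token in pointer[1:].split("/"):
        # Unescape "~1" to "/" first, then "~0" to "~".
        token = token.replace("~1", "/").replace("~0", "~")
        # Arrays are indexed by number, objects by key.
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# One pointer, one value: collecting every name needs a loop on the caller's side.
names = [
    resolve_pointer(MOVIE, f"/actors/{i}/name")
    for i in range(len(MOVIE["actors"]))
]
print(names)
```

The caller has to know how many actors there are before it can even build the pointers, which is exactly the awkwardness described above.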
Now, consider a scenario where the JSON document contains not just one movie's details but an entire category of movies. In such a case, a JSON Pointer ending in /actors/X/name (where X is a valid index) would still address only one actor's name in one movie, because the spec draft resolves each pointer to a single value. There is no way to ask for the names across every movie at once. This limitation can quickly become tiresome and impractical.
Enter XPath
While XML may no longer be the trendiest technology around, some of the ideas that grew up around it remain incredibly useful. One such idea, in my humble opinion, is XPath.
I first encountered XPath approximately 7-8 years ago when constructing a canonical object-oriented model for a wide range of financial products spanning multiple asset classes. The goal was to represent every financial product, regardless of complexity, using the same fundamental modeling building blocks. It was a challenging endeavor, but XPath proved invaluable when querying the rich object model, often represented as XML at the system's integration endpoints.
XPath's value manifested in several ways:
- Flexibility: A single XPath expression can describe many paths through the data, rather than one fixed location.
- Descriptiveness: Almost anyone can read an XPath expression and grasp the data it intends to access.
- Representation agnosticism: XPath allows clients to utilize it with either a fully realized object representation or an unparsed XML document. This flexibility enables laziness in the runtime where appropriate.
You might be wondering how XPath achieves flexibility. While the JSON Pointer spec draft I mentioned earlier was concise and easy to read, the XPath specification goes further: it provides the means to start matching at any depth in the document (with the '//' prefix), select the parent of a matching node (with '..'), select attributes (by prefixing the attribute name with '@'), and apply various predicates (e.g., last(), position(), index numbers, >, <). However, the most significant distinction between XPath and JSON Pointer lies in the result set. XPath returns all elements or values that match the given selector, whereas a JSON Pointer addresses at most one value.
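For readers who have not used XPath, the sketch below runs a few expressions against XML with Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. The XML rendering of the movie (an actor element with a character attribute) is my own assumption, made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the movie document, for illustration only.
MOVIE_XML = """
<movie title="Java 4-ever">
  <actor character="A"><name>Scala Johansson</name></actor>
  <actor character="B"><name>William Windows</name></actor>
  <actor character="C"><name>Eddie Larrison</name></actor>
  <actor character="D"><name>Mona Lisa Harddrive</name></actor>
  <actor character="C (Young)"><name>Lenny Linux</name></actor>
</movie>
"""

root = ET.fromstring(MOVIE_XML)


def names(path):
    """Return the text of every element matched by an ElementTree XPath."""
    return [element.text for element in root.findall(path)]


all_names = names(".//actor/name")               # every match, at any depth
second = names(".//actor[2]/name")               # positions are 1-based
last = names(".//actor[last()]/name")            # last() predicate
by_attr = names(".//actor[@character='D']/name")  # attribute predicate

print(all_names, second, last, by_attr)
```

Every call returns a list, even when only one node matches, which is the result-set behavior that sets XPath apart from JSON Pointer.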
To better understand this distinction, let's examine the aforementioned movie JSON document and how it would be selected using XPath:
XPath Expression: //actors/name
Result: ["Scala Johansson", "William Windows", ..., "Lenny Linux"]
XPath Expression: title
Result: ["Java 4-ever"]
XPath Expression: //actors/character
Result: ["A", "B", "C", "D", "C (Young)"]
XPath Expression: //actors[2]/character
Result: ["B"] (XPath positions are 1-based, so [2] selects the second actor)
As you can see, XPath provides a great deal of flexibility. Even if our JSON document contained multiple movie documents nested under a subtree, the expressions above that begin with // would still match inside every movie, without any change to the selector.
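To show that such a selector is not hard to imagine, here is a rough sketch of a tiny XPath-like subset for JSON in Python. Everything in it is an assumption of mine rather than an existing library or spec: the select function, the rule that an array value behaves like a run of repeated elements, and the choice to keep XPath's 1-based positions for [n]. It supports only child steps, the // prefix, and a numeric position.

```python
import re

MOVIE = {
    "title": "Java 4-ever",
    "actors": [
        {"name": "Scala Johansson", "character": "A"},
        {"name": "William Windows", "character": "B"},
        {"name": "Eddie Larrison", "character": "C"},
        {"name": "Mona Lisa Harddrive", "character": "D"},
        {"name": "Lenny Linux", "character": "C (Young)"},
    ],
}

# One step: optional "/" or "//", a key name, optional 1-based [n] position.
STEP = re.compile(r"(//|/)?([^/\[\]]+)(?:\[(\d+)\])?")


def _descendants(node):
    """Yield the node itself and everything nested beneath it, depth-first."""
    yield node
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        children = ()
    for child in children:
        yield from _descendants(child)


def _children(node, name, index):
    """Values stored under `name`; an array counts as repeated elements."""
    if not isinstance(node, dict) or name not in node:
        return []
    value = node[name]
    items = value if isinstance(value, list) else [value]
    if index is None:
        return items
    return items[index - 1:index] if index >= 1 else []


def select(doc, path):
    """Hypothetical selector: return ALL values matching an XPath-like path."""
    nodes = [doc]
    for sep, name, idx in STEP.findall(path):
        index = int(idx) if idx else None
        # "//" widens the search to every descendant of the current nodes.
        scope = [d for n in nodes for d in _descendants(n)] if sep == "//" else nodes
        nodes = [match for n in scope for match in _children(n, name, index)]
    return nodes


# A made-up second movie, nested with the first under a "movies" subtree.
CATALOG = {"movies": [MOVIE, {"title": "Sequel", "actors": [{"name": "Ada Example", "character": "E"}]}]}

print(select(MOVIE, "//actors/name"))
print(select(MOVIE, "title"))
print(select(CATALOG, "//actors/name"))
```

Note that the relative expression title matches nothing at the top of CATALOG, while the // expressions keep working unchanged. A real proposal would need to settle many more questions, such as escaping, predicates on values, and how arrays of arrays behave.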
A Glaring Gap in JSON Document Selectors
At this point, you might be wondering, "So, what's the point of this post, Susan?" Admittedly, this article primarily serves as an outlet for my frustration and longing for a robust selector language specifically tailored for JSON documents. While XPath has proven its worth and versatility in the XML world, JSON currently lacks a similarly powerful and expressive selector language.
However, if you share my sentiments and believe that a subset of XPath, customized for JSON documents, could provide a viable solution, I invite you to collaborate with me. Let's work together to distill a practical and usable subset of XPath, specifically designed for JSON documents. Whether it's through a draft proposal or any other means, let's join forces to address this pressing need. Even HTML boasts a capable element selection mechanism in the DOM for applying styles and behaviors. Shouldn't JSON documents have a comparable solution?
If you're interested in contributing or discussing this further, please feel free to contact me. Let's kickstart the initiative and explore the possibilities together. Who knows? We might just revolutionize JSON document querying and make the lives of developers and data enthusiasts a whole lot easier.