Category: Hadoop XML Processing

Technology Blog

Iterating a JSON using Jackson-Databind Library like JDOM for XML

I recently came across a situation that required me to be able to iterate over a JSON message payload similar to what can be done with JDOM in regards to XML similar to what I do within my Stax XML Mapreduce InputFormat.  So basically in this case you need to treat JSONArray’s similar to XML…
Read more

Xml Processing with MapReduce/Spark using an Xml StaX Parser

XmlStaxInputFormat / XmlStaxFileRecordReader Github Project – https://github.com/gss2002/xml-stax-mr After some time it seemed like a gap that existed with Hadoop MapReduce and Spark that the existing XmlInputFormat classes from Mahout were using fseek and searching for strings as the file is read in from HDFS. The ability to break up a large Xml file becomes extremely important…
Read more