Hadoop MapReduce Archives

Xml Processing with MapReduce/Spark using an Xml StaX Parser

by GS
in Apache, Apache Hadoop, Apache Spark, Hadoop, Hadoop Mapreduce, Hadoop XML Processing, HDFS, Hortonworks, java, StaX XML Parser, XmlInputFormat, XmlStaxInputFormat
on November 20, 2018

XmlStaxInputFormat / XmlStaxFileRecordReader GitHub project: https://github.com/gss2002/xml-stax-mr

Over time, a gap became apparent in Hadoop MapReduce and Spark: the existing XmlInputFormat classes from Mahout use fseek and search for strings as the file is read in from HDFS. The ability to break up a large XML file becomes extremely important…
