Apache Spark Archives - GS Tech Blog

Building an RPM for Spark 2.x for Vendor Hadoop Distributions

by GS
in Apache Hadoop, Apache Spark, Hadoop, HDFS, Hortonworks, Pyspark
on October 15, 2018

It may be necessary to produce an alternate packaged version of Spark for use in a vendor-provided Hadoop distribution. This became apparent to me many times when loading Hortonworks HDP into an enterprise environment where update/upgrade cycles do not allow for an upgrade of HDFS…
Read more

How to use the Native IBM MQ Client Receiver with Spark Streaming

by GS
in Apache Hadoop, Apache Spark, Apache Spark Streaming, Hadoop, Hortonworks, IBM, IBM MQ, Messaging, MQ, Nifi
on October 14, 2018

After using Apache NiFi and IBM MQ, I noticed that NiFi could not easily guarantee the order of incoming messages, as failover can occur at any time. This becomes a problem specifically with database and table replication when the replicating software puts messages to a…
Read more

Apache Ranger Audit Logs stored in HDFS parsed with Apache Spark

by GS
in Apache, Apache Hadoop, Apache Hive, Apache Ranger, Apache Spark, audit, Hadoop, HDFS, Pyspark
on August 31, 2018

Using Apache Spark to parse a large HDFS archive of Apache Ranger audit logs to find and verify whether a user attempted to access files in HDFS, Hive or HBase. This eliminates the need to use a Hive SerDe to read these Apache Ranger JSON files and to create an external…
Read more

Integrating Apache Hadoop and Apache Flume with IBM MQ

by GS
in Apache, Flume, Hadoop, IBM, java, Linux, Messaging, MQ
on May 30, 2015

Over the past 2 years of working with Apache Hadoop, a few things have come up, such as folks wanting to use Apache Kafka, which definitely has its place in the Hadoop, Big Data and next-generation technology spheres. But there is also the need to integrate with…
Read more

Why now?

by GS
in Apache, gempak, General, grads, Hadoop, IBM, Linux, tcp, Tuning, weather, Websphere
on January 15, 2014

What is the GS Tech Blog? It’s a place for me to rant and share my thoughts about technology I’ve worked with over many years. After working as a Technology Systems Engineer for almost 20 years, I decided it’s time to create a blog to publish some of my ideas and…
Read more

Tag: Apache Spark

Building an RPM for Spark 2.x for Vendor Hadoop Distributions

How to use the Native IBM MQ Client Receiver with Spark Streaming

Apache Ranger Audit Logs stored in HDFS parsed with Apache Spark

Integrating Apache Hadoop and Apache Flume with IBM MQ

Why now?

Links
