GS Tech Blog

Apache Ranger Audit Logs stored in HDFS parsed with Apache Spark

by GS
in Apache, Apache Hadoop, Apache Hive, Apache Ranger, Apache Spark, audit, Hadoop, HDFS, Pyspark
on August 31, 2018

Using Apache Spark to parse a large HDFS archive of Ranger audit logs to find and verify whether a user attempted to access files in HDFS, Hive or HBase. This eliminates the need to use a Hive SerDe to read these Apache Ranger JSON files and to create an external…

Asus Router with Merlin Firmware External Network Interface Alias How To

by GS
in asus, asus-merlin, cable modem, iptables, router, tcp
on May 13, 2018

The external network interface alias is used to get direct access to the cable modem’s management interface when the modem is in bridge mode. One reason to do this is when the cable provider’s network has some sort of issue that causes…

Site to Site VPN using Asus Merlin Router and Unifi USG-Pro4

by GS
in asus, asus-merlin, openvpn, router, tcp, ubiquiti, ubnt, Uncategorized, unifi, usg, VPN
on May 11, 2018

I recently decided to replace my Asus RT-N66U. It served me well for many years, but I had become frustrated that Asus had stopped patching and maintaining the firmware. I also noticed that, over time, strange things would occasionally occur with the Asus router.…

Integrating Apache Nifi with IBM MQ

by GS
in Apache, Hadoop, HDF, Hortonworks, IBM, Linux, Messaging, MQ, Nifi
on May 10, 2018

This is a continuation of the IBM MQ and Hadoop integration article I first posted a few years ago. It explains how to integrate IBM MQ with Apache Nifi or Hortonworks HDF. IBM MQ is extremely important when attempting to integrate new technologies with legacy environments, specifically mainframe environments…

CIFS SMB to HDFS and FTP to HDFS

by GS
in Apache, cifs2hdfs, ftp2hdfs, Hadoop, Hortonworks
on March 21, 2018

Over the past few years of working with Hadoop and HDFS, two types of requests came up pretty regularly. One was whether we could move files from a Windows SMB/CIFS file share into Hadoop/HDFS, usually containing 1000s of CSV or XLSX/XLS files. The other use case was…

Router to Router VPN Tunnel using Asus Routers

by GS
in asus, asus-merlin, iptables, openvpn, tcp, VPN
on March 12, 2018

Over the past few years I’ve tried a few times to configure a router-to-router VPN tunnel using Asus routers. In all the articles online, something was always missing, so I figured this would be a good article to write. I currently have a tunnel…

Apache SolrCloud Kerberos Configuration

by GS
in Apache, Apache Solr, Hadoop, java, Kerberos, Linux, SolrCloud
on March 4, 2017

I’ve been working on securing Apache SolrCloud with Kerberos. This includes configuring Zookeeper. After struggling and lots of searching, I came up with a working kerberized solution for SolrCloud, with Zookeeper, and Apache Ranger for authorization. First I tried to secure a standalone Solr instance by updating to the Solr 6x branch, which is a SNAPSHOT…

Benefits of using IBM Java and JDK features

by GS
in IBM, IBM HealthCenter, IBM HeapAnalyzer, IBM Java, IBM Javacore, IBM JDK, IBM System Dump, java, Linux, Tuning, Websphere
on March 17, 2016

After working many years with IBM WebSphere Application Server on Solaris, Linux on pSeries, xSeries and zSeries, and z/OS, I came to realize that the IBM version of Java has much better tools and documentation for debugging and performance tuning. Examples of these features are the IBM AOT (Ahead of Time) compiler, which…

Hadoop, Java and HTTPD and /etc/security/limits.d/ nproc/pid-max

by GS
in Apache, Hadoop, java, limits.d, Linux, nproc, pid, tid, Tuning
on March 1, 2016

After successfully running a large Hadoop cluster for a period of time, I started to notice strange things occurring, initially with the MapReduce PI example job, where tasks would be marked as failed. When looking more closely and attempting to logon/su/ssh to a machine with the userid that was running the job, sshd/su would return: -bash:…

Hadoop and ip_conntrack: table full, dropping packet

by GS
in Hadoop, iptables, kernel tuning, Linux, tcp, Tuning
on February 29, 2016

I’m pretty sure many folks have seen this specific error across multiple different Linux systems, specifically when iptables is enabled and the OS has thousands of connections coming in per second. In my case I ran into this with the Hadoop NameNode. Someone accidentally executed iptables -L to try to get a list…
