Hadoop and Redhat System Tuning /etc/sysctl.conf

By | February 28, 2016

One of the most overlooked things after setting up a Hadoop cluster is probably OS System Tuning. This entry will cover /etc/sysctl.conf aka the Linux Kernel Params that can be tuned.

/etc/sysctl.conf:

## ALWAYS INCREASE KERNEL SEMAPHORES especially IF using IBM JDK with SharedClassCache also a separate discussion
# Controls the default maxmimum size of a mesage queue
kernel.msgmnb = 65536

# Controls the maximum size of a message, in bytes
kernel.msgmax = 65536

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

## BEGIN INCREASE ALL NETWORK TUNING OPTIONS TO HANDLE ALL HADOOPS NETWORK TRAFFIC ##
#set TCP time-wait buckets pool size, default 180000
net.ipv4.tcp_max_tw_buckets = 4000000

#maximum receive socket buffer size, default 135168
net.core.rmem_max = 67108864

#maximum send socket buffer size, default 135168
net.core.wmem_max = 67108864

#maximum amount of option memory buffers, default 20480
net.core.optmem_max = 67108864

#maximum TCP read-buffer space allocatable, default 4096 87380 174760
net.ipv4.tcp_rmem = 4096 16777216 67108864

#maximum TCP write-buffer space allocatable, default 4096 16384 131072
net.ipv4.tcp_wmem = 4096 16777216 67108864

#maximum TCP buffer space allocatable, default 786432 1048576 1572864
net.ipv4.tcp_mem = 67108864 67108864 67108864

# size of listen queue for accepting new TCP connections, default 128
net.core.somaxconn = 640000
# number of unprocessed input packets before kernel starts dropping them, default 300
net.core.netdev_max_backlog = 250000

# max number of remembered connection requests, default 1024
net.ipv4.tcp_max_syn_backlog = 200000

## END INCREASE ALL NETWORK TUNING OPTIONS TO HANDLE ALL HADOOPS NETWORK TRAFFIC ##

# turns DSACK support off not needed on internal networks, default on
net.ipv4.tcp_dsack = 0

# turns SACK support off not needed on internal networks, default on
net.ipv4.tcp_sack = 0

#turns TCP window scaling support on, default is on for RHEL but adding to be safe as great throughput increase are seen when it’s enabled
net.ipv4.tcp_window_scaling = 1

#increase number of ports available for ephemeral ports for all of the Yarn things like MR/Spark/Tez etc
net.ipv4.ip_local_port_range = 8196 65535

# decrease tcp retries
net.ipv4.tcp_retries2 = 10

# Enable a fix for RFC1337 – time-wait assassination hazards in TCP
net.ipv4.tcp_rfc1337 = 1

# Reduce FIN TIMEOUT and Keepalives to sane levels returns ports back to OS faster. This is a known issue on Linux I first ran into it when I moved a Websphere App Server environment from Solaris to Linux way back in 2005/2006
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_keepalive_probes = 5

# VM Memory Tuning
vm.min_free_kbytes = 204800
vm.lower_zone_protection = 1024
vm.page-cluster = 20
vm.swappiness = 10
vm.vm_vfs_scan_ratio = 2

# Increase Max File Size
fs.file-max=5049800

# Hadoop OpenFiles Problem
fs.epoll.max_user_instances = 4096

# Java performs poor with CFS Scheduler This option came from IBM Websphere Java Tuning Docy
## http://www-01.ibm.com/support/docview.wss?uid=swg21372909
kernel.sched_compat_yield = 1

# Increase conntracks only needed if someone runs iptables -L. This is a whole separate discussion
net.nf_conntrack_max = 10000000

# This is needed for all Hadoop Daemon processes that listen on a port especially on DataNodes or NodeManagers or HBase RegionServers.. You will eventually end up with some process using an ephemeral port on one of these critical ports say for a NodeManager.
net.ipv4.ip_local_reserved_ports = 7337,8000-8088,8141,8188,8440-8485,8651-8670,8788,8983,9083,9898,10000-10033,10200,11000,13562,15000,19888,45454,50010,50020,50030,50060,50070,50075,50090,50091,50470,50475,50100,50105,50111,60010-60030

Leave a Reply

Your email address will not be published. Required fields are marked *