What’s new
Compared with the previous BD|CESGA platform, the main improvements are:
Hadoop has been upgraded to Hadoop 3.
Spark 2.4 is now the default version.
HUE 4 is now available.
HDFS erasure coding: reduces storage overhead compared with the default 3x replication.
Impala is now available as an alternative to Hive for interactive SQL queries.
The HOME filesystem has been migrated from GlusterFS to the new NetApp storage system, which has greatly reduced its latency.
Improved reliability:
The HDFS NameNode now runs in a high-availability (HA) configuration.
The YARN ResourceManager now runs in a high-availability (HA) configuration.
Improved security:
SSL/TLS is now enabled for more secure communications.
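
As a rough illustration of the erasure coding item above, the sketch below compares the raw storage needed under 3x replication with that needed under a Reed-Solomon layout of 6 data units plus 3 parity units. RS(6,3) is one of the policies that ships with Hadoop 3, but it is used here only as an assumption: the policy actually enabled on the platform is not stated in this document. The function names and the 100 TB figure are hypothetical and serve only the example.

```python
def storage_factor_replication(replicas: int) -> float:
    """Raw bytes stored per byte of user data with N-way replication."""
    return float(replicas)


def storage_factor_erasure(data_units: int, parity_units: int) -> float:
    """Raw bytes stored per byte of user data with a (data, parity) erasure code."""
    return (data_units + parity_units) / data_units


# Hypothetical dataset size, chosen only for illustration.
user_data_tb = 100

# Default HDFS behaviour: three full copies of every block.
replicated_tb = user_data_tb * storage_factor_replication(3)

# Assumed policy: Reed-Solomon with 6 data units and 3 parity units.
erasure_tb = user_data_tb * storage_factor_erasure(6, 3)

print(f"3x replication: {replicated_tb:.1f} TB raw")
print(f"RS(6,3):        {erasure_tb:.1f} TB raw")
print(f"Raw storage saved: {replicated_tb - erasure_tb:.1f} TB")
```

Under these assumptions the same data occupies half the raw space, while an RS(6,3) stripe can still be rebuilt after losing any three of its nine units. The trade-off is extra CPU and network work when data has to be reconstructed, so erasure coding tends to suit colder, larger datasets.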