This section helps you get started with the platform quickly. For more details, see the rest of this guide, the Tutorials that we have prepared, and the Want to know more section.
We recommend that you always start the VPN before connecting. Without it, you will not have access to some services.
If for some reason you are not using the VPN, one alternative is to launch a remote desktop from the visualization platform and connect from there.
By far, the most common way to connect is by establishing an SSH session:
Once connected, you will notice that there are two main filesystems:
HOME: The standard filesystem when you log in
HDFS: The distributed Hadoop filesystem
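As a rough illustration of the difference between the two, the sketch below looks at each filesystem in turn. It assumes that the standard hdfs client is on your PATH after login, which this guide does not state explicitly; HDFS_CLIENT is only an illustrative variable name.

```shell
# HOME: an ordinary directory, reachable with the usual tools (ls, cp, ...).
echo "HOME filesystem: $HOME"

# HDFS: reached through the Hadoop client rather than plain ls/cp.
# HDFS_CLIENT is a hypothetical helper variable, empty if no client is found.
HDFS_CLIENT=$(command -v hdfs || true)

if [ -n "$HDFS_CLIENT" ]; then
    hdfs dfs -ls        # with no path, lists your HDFS home directory
    hdfs dfs -df -h     # capacity and usage in human-readable units
else
    echo "hdfs client not found; run this on the platform itself" >&2
fi
```

Files move between the two filesystems with hdfs dfs -put (HOME to HDFS) and hdfs dfs -get (HDFS to HOME).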
To migrate your HDFS data from the old platform to the new one, you can use a command similar to the following:
hadoop distcp -i -pat -update hdfs://10.121.13.19:8020/user/uscfajlc/wcresult hdfs://nameservice1/user/uscfajlc/wcresult
We recommend launching the distcp command inside a screen session so that the copy keeps running even if your SSH connection is closed.
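One possible way to do this is sketched below. It reuses the source and destination from the example command above; the session name and the log file path are hypothetical choices, and the sketch assumes that both screen and hadoop are installed on the node you log in to.

```shell
# Hypothetical session name and log file; pick whatever suits you.
SESSION=distcp-migration
LOG="$HOME/distcp.log"

# Source and destination taken from the example command above.
SRC=hdfs://10.121.13.19:8020/user/uscfajlc/wcresult
DST=hdfs://nameservice1/user/uscfajlc/wcresult

# Build the command once so it can be inspected before launching it.
CMD="hadoop distcp -i -pat -update $SRC $DST"
echo "$CMD"

# Launch only where both tools are actually available.
if command -v screen >/dev/null 2>&1 && command -v hadoop >/dev/null 2>&1; then
    # -d -m starts the session detached; -S gives it a name.
    # Output goes to the log file because the session ends with the command.
    screen -dmS "$SESSION" sh -c "$CMD > '$LOG' 2>&1"
fi
```

You can check on the copy with screen -ls, reattach with screen -r distcp-migration, and detach again with Ctrl-a d. If you prefer to watch the output live, start an interactive session with screen -S distcp-migration and run the command by hand inside it.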
See the Migrating Data section for more details about how to migrate your data from the previous platform.
The default version of Spark is 2.4.0. If you plan to run code written for Spark 1.6, keep in mind that it may need changes to work with this version.
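If you want a job script to confirm which Spark version it is running against, a minimal sketch could look like the following. It assumes that spark-submit is on your PATH; spark_major is a hypothetical helper written for this example, not part of Spark.

```shell
# Hypothetical helper: return the major component of a version string,
# for example "2.4.0" gives "2".
spark_major() {
    printf '%s\n' "${1%%.*}"
}

if command -v spark-submit >/dev/null 2>&1; then
    # spark-submit prints its version banner on stderr, hence 2>&1.
    version=$(spark-submit --version 2>&1 |
        sed -n 's/.*version \([0-9][0-9.]*\).*/\1/p' | head -n 1)
    # Warn when a version was found and it is not a 1.x release.
    if [ -n "$version" ] && [ "$(spark_major "$version")" != "1" ]; then
        echo "Spark $version detected: review code written for Spark 1.6" >&2
    fi
fi
```

Among the usual differences to look for, Spark 2 introduced SparkSession as the entry point that supersedes SQLContext and HiveContext.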