Hive

Hive lets you query data stored in HDFS using SQL, which makes it a convenient tool for anyone already familiar with SQL.

Under the hood, Hive translates SQL queries into MapReduce jobs that run on YARN.
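To see how a query is broken into execution stages, Hive's explain statement prints the query plan without running the query. The following is a minimal sketch: the table visits and its country column are hypothetical names used only for illustration, so substitute a table that exists in your database.

```sql
-- Hypothetical table and column names, for illustration only.
-- explain prints the execution plan; it does not launch the job.
explain
select country, count(*) as n
from visits
group by country;
```

With the MapReduce engine, the output typically lists the stage dependencies followed by a plan for each stage, including its map and reduce operator trees.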

There are several ways to start running queries with Hive:

  • Using the old and now deprecated hive CLI:

    hive
    
  • Using the new beeline client:

    beeline> !connect jdbc:hive2://c14-19.bd.cluster.cesga.es:10000/default;ssl=true;sslTrustStore=/opt/cesga/cdh61/hiveserver2.jks;trustStorePassword=notsecret
    
  • Using the HUE web interface, available at:

    https://bigdata.cesga.es
    

If you are just trying out Hive, we recommend using the testing database instead of the default one:

use testing;

We do not recommend creating tables in the default database. Instead, if you have tables that you want to keep, create a database named after your username and create your tables there. For example, if your username is uscfajlc, create a database with that name and switch to it:

create database if not exists uscfajlc;
use uscfajlc;
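Once your database is selected, any table you create is stored inside it. As a minimal sketch, the statements below create a small comma-delimited text table and then inspect it. The table name visits and its columns are hypothetical and chosen only for illustration; the database name is the one from the example above.

```sql
-- Database name taken from the example above.
use uscfajlc;

-- Hypothetical table and columns, for illustration only.
create table if not exists visits (
  visit_date string,
  country    string,
  hits       int
)
row format delimited
fields terminated by ','
stored as textfile;

-- Check that the table exists and review its columns.
show tables;
describe visits;
```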

For additional privacy, you can restrict access to the data in your database to your own user:

hdfs dfs -chmod go-rwx /user/hive/warehouse/uscfajlc.db

You can also use HDFS ACLs to fine-tune the permissions to fit your needs.

For further information on how to use Hive, see the Hive Tutorial that we have prepared to get you started and the Hive Guide in the CDH documentation.