What is Hive

Hive was open-sourced in 2008 and became a top-level Apache project in 2010.
Apache Hive is an open-source data warehouse system built on top of Hadoop for querying and analyzing large datasets stored in HDFS.
Hive uses a query language called HiveQL (HQL), which is similar to SQL.
Hive automatically translates HiveQL queries into MapReduce jobs.
Hive is not a database, but it uses a database (called the metastore) to store the definitions of the tables that you create.
Hive generally runs on your workstation and converts your SQL query into a series of jobs for execution on a Hadoop cluster.
Hive abstracts away the complexity of Hadoop. In particular, there is no need to learn Java to use Hive.
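
To make the translation step more concrete, the sketch below mimics by hand what a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept roughly breaks down into: a map phase that emits a key for every row, a shuffle that brings equal keys together, and a reduce phase that aggregates each group. The file employees.csv, its columns and the query are made up for illustration, and the pipeline is only an analogy built from standard Unix tools, not the plan Hive actually generates.

```shell
#!/bin/sh
# Hypothetical sample data in the form name,dept (not from the original post).
cat > employees.csv <<'EOF'
alice,engineering
bob,sales
carol,engineering
dave,sales
erin,engineering
EOF

# Analogy for: SELECT dept, COUNT(*) FROM employees GROUP BY dept;
# map:     emit the grouping key (dept) for every row
# shuffle: sort so that identical keys end up next to each other
# reduce:  count the rows in each run of identical keys
result=$(cut -d, -f2 employees.csv | sort | uniq -c | awk '{print $2 "\t" $1}')

echo "$result"
```

On a real cluster the map and reduce steps run in parallel on many machines, which is the part Hive and Hadoop take care of for you.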

History of Hive

Hive was developed by the Data Infrastructure Team at Facebook.
It is one of the technologies Facebook uses to meet its data requirements.
It is very popular with users inside Facebook.
It is used to run thousands of jobs on the cluster, by hundreds of users, for a wide variety of applications.
The Hive-Hadoop cluster at Facebook stores more than 2 PB of raw data and regularly loads 15 TB of data per day.
Hive is now used and developed by a number of other companies, such as Amazon, IBM, Yahoo, Netflix and the Financial Industry Regulatory Authority (FINRA).

Hive Architecture

Why Hive

Writing MapReduce code is hard and cannot be expected of every engineer. Facebook struggled with the same issue and created the Hive SQL language, which runs MapReduce jobs in the background.

Hive-SQL:

Queries are written in Hive SQL (HiveQL). Most of its syntax is similar to SQL.
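
As a rough illustration of how close the syntax is to SQL, the sketch below writes a small HiveQL script to a file. The file name employees.hql, the table and its columns are hypothetical. The ROW FORMAT and STORED AS clauses are Hive-specific additions that describe how the underlying data files are laid out; the rest should read like ordinary SQL.

```shell
#!/bin/sh
# Write a small HiveQL script; employees.hql and the table are hypothetical.
cat > employees.hql <<'EOF'
-- Table definition: standard SQL plus Hive-specific storage clauses.
CREATE TABLE IF NOT EXISTS employees (
  name   STRING,
  dept   STRING,
  salary DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- A query that reads like ordinary SQL.
SELECT dept, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM employees
GROUP BY dept
ORDER BY headcount DESC;
EOF

# Print the script so it can be inspected before it is run against Hive.
cat employees.hql
```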

Hive CLI:

The Hive CLI is a command-line interface that connects to a Hive Server instance.
$ hive
hive >

Beeline:

Beeline is a newer command-line client that connects to a HiveServer2 instance.
$ beeline
Hive version 0.11.0-SNAPSHOT by Apache
beeline>

How to Run Hive queries

The Hive CLI shell is started using the hive executable:

$ hive -h hostname
hive>

Use the -f flag to specify a file that contains a Hive script:
$ hive -f myquery.hive
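
A minimal end-to-end sketch of the -f workflow might look like the following. The contents of myquery.hive are an assumption (any HiveQL statements would do), and the guard around the hive call is only there so the snippet behaves sensibly on a machine without Hive installed.

```shell
#!/bin/sh
# Hypothetical contents for myquery.hive; replace with your own statements.
cat > myquery.hive <<'EOF'
SHOW TABLES;
EOF

# Run the script only if the hive executable is on the PATH;
# otherwise just report what would have been run.
if command -v hive >/dev/null 2>&1; then
  hive -f myquery.hive
else
  echo "hive not found; would run: hive -f myquery.hive"
fi
```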

Beeline is started using the beeline executable:

$ beeline
Hive version 0.11.0-SNAPSHOT by Apache
beeline>
Once Beeline is started, you issue a connect command to connect to a specific HiveServer2 instance:

beeline> !connect jdbc:hive2://hostname:10000 username password org.apache.hive.jdbc.HiveDriver
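
For scripted use, the same connection can also be made without the interactive prompt. The sketch below relies on Beeline's -u (JDBC URL), -n (user name), -p (password) and -e (query) options. The hostname, username and password are the same placeholders as in the !connect example, and the query is just an example.

```shell
#!/bin/sh
# Placeholders carried over from the !connect example; replace with real values.
HS2_URL="jdbc:hive2://hostname:10000"
HS2_USER="username"
HS2_PASS="password"
QUERY="SHOW TABLES;"

# Connect and run one query if beeline is installed; otherwise only
# print the command (with the password masked) as a dry run.
if command -v beeline >/dev/null 2>&1; then
  beeline -u "$HS2_URL" -n "$HS2_USER" -p "$HS2_PASS" -e "$QUERY"
else
  echo "beeline not found; would run: beeline -u $HS2_URL -n $HS2_USER -p ***** -e \"$QUERY\""
fi
```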
