How to do Partitioning in Hive - Demo code

 


 Implement Partitioning in Hive




Partitioning in Hive means dividing a table into parts based on the values of a particular column, such as date, course, city, or country.

Let's assume we have data on 10 million students studying at an institute, and we need to fetch the students of a particular course. With a traditional approach, we have to read the entire data set, which degrades performance. The better approach is to partition the table in Hive so that the data is divided into separate slices based on the values of particular columns. Hive stores each partition in its own directory, so a query that filters on the partition column reads only the matching slices, and the query response time becomes faster.
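As a minimal sketch of the student example above, the HiveQL below partitions a table on course and then filters on that column. The table name students_by_course and its columns are hypothetical and are not taken from the original post.

```sql
-- Hypothetical table: Hive keeps one directory per distinct course value.
CREATE TABLE IF NOT EXISTS students_by_course (
  id   INT,
  name STRING
)
PARTITIONED BY (course STRING);

-- Because the filter is on the partition column, Hive only needs to read
-- the course=hadoop directory instead of scanning all the student rows.
SELECT id, name
FROM students_by_course
WHERE course = 'hadoop';
```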


Types of Partitioning


There are two types of partitioning in Hive:

Static Partitioning

  • The values of the partition columns must be passed manually while loading the data into the table.
  • Inserting input data files individually into each partition of the table is known as static partitioning.

Dynamic Partitioning

  • A single insert into the partitioned table that fills all partitions in one go is known as dynamic partitioning.
  • Usually, dynamic partitioning loads the data from a non-partitioned table.
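A dynamic-partition load might look like the sketch below. It assumes a partitioned table named student and a non-partitioned source table named student_staging that has a gender column; both names are hypothetical. The two SET properties are standard Hive settings for enabling dynamic partitioning.

```sql
-- Allow Hive to create partitions from the values found in the data.
SET hive.exec.dynamic.partition = true;
-- nonstrict mode allows every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode = nonstrict;

-- student_staging is a hypothetical non-partitioned table.
-- The partition column (gender) must come last in the SELECT list;
-- Hive creates one partition per distinct gender value.
INSERT INTO TABLE student PARTITION (gender)
SELECT id, name, gender
FROM student_staging;
```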



Partitioning Demo - Static



Step 1. Create 2 files

  • Stud_M (for male students)
  • Stud_F (for female students)




Step 2. Copy both files to HDFS
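The original screenshot for this step is not available, so the following is only a sketch. It assumes the two files sit in /tmp on the local machine and uses a hypothetical HDFS directory /user/demo/students. The dfs commands are run from inside the Hive CLI.

```sql
-- Hypothetical paths; adjust to your own local and HDFS locations.
dfs -mkdir -p /user/demo/students;
dfs -put /tmp/Stud_M /user/demo/students/;
dfs -put /tmp/Stud_F /user/demo/students/;

-- Confirm that both files arrived.
dfs -ls /user/demo/students;
```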



Step 3. Create a Hive table partitioned on gender
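A possible table definition is sketched below. The table name student, the columns id and name, and the comma delimiter are assumptions, since the original post does not show the file layout. Note that gender appears only in the PARTITIONED BY clause, not in the regular column list.

```sql
-- Hypothetical schema: each input file is assumed to hold "id,name" rows.
CREATE TABLE IF NOT EXISTS student (
  id   INT,
  name STRING
)
PARTITIONED BY (gender STRING)   -- partition column, stored as a directory name
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```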



Step 4. Load Stud_M into partition gender = m
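With static partitioning the partition value is written out in the statement itself. The sketch below assumes the hypothetical table student and the hypothetical HDFS path /user/demo/students/Stud_M.

```sql
-- LOAD DATA INPATH moves the HDFS file into the partition's directory.
-- The partition value 'm' is supplied by hand, not read from the file.
LOAD DATA INPATH '/user/demo/students/Stud_M'
INTO TABLE student
PARTITION (gender = 'm');
```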



Step 5. Load Stud_F into partition gender = f
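The second file goes into its own partition in the same way. The table name and path are again hypothetical.

```sql
-- Same statement shape as for the male students, with a different partition value.
LOAD DATA INPATH '/user/demo/students/Stud_F'
INTO TABLE student
PARTITION (gender = 'f');
```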



Step 6. Validate the physical location of the file
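One way to check where the data landed is sketched below, assuming the hypothetical table student lives in the default database and the warehouse directory is the common default /user/hive/warehouse. If your installation uses a different location, the Location field printed by DESCRIBE FORMATTED shows the actual path.

```sql
-- List the partitions Hive knows about for this table.
SHOW PARTITIONS student;

-- Print the partition's metadata, including its HDFS location.
DESCRIBE FORMATTED student PARTITION (gender = 'm');

-- Assumed default warehouse path: one subdirectory per partition value,
-- for example gender=m and gender=f.
dfs -ls /user/hive/warehouse/student;
```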




Step 7. Validate the data
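The loaded rows can be checked with ordinary queries. This sketch uses the hypothetical table student; the partition column gender can be selected and filtered like any other column.

```sql
-- Rows from a single partition; only the gender=m directory is read.
SELECT id, name, gender
FROM student
WHERE gender = 'm';

-- Row count per partition, to confirm both files were loaded.
SELECT gender, COUNT(*) AS students
FROM student
GROUP BY gender;
```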




