Implement Partitioning in Hive
Partitioning in Hive means dividing a table into parts based on the values of a particular column, such as date, course, city, or country.
Let's assume we have data on 10 million students studying at an institute, and we have to fetch the students of a particular course. With a traditional approach, we have to read the entire dataset, which leads to performance degradation. A better approach is to partition the table in Hive so that the data is divided into separate slices based on the values of particular columns. The advantage of partitioning is that, since the data is stored in slices, a query that filters on the partition column reads only the matching slices, so the query response time improves.
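As a rough illustration of the idea above, the following HiveQL sketch defines a student table partitioned by course and then queries a single course. The table name `student_part`, its columns, and the course value 'hadoop' are assumptions made up for this example; they are not taken from the original demo.

```sql
-- Hypothetical table for illustration: one partition per course value.
-- The partition column (course) is declared in PARTITIONED BY,
-- not in the regular column list.
CREATE TABLE IF NOT EXISTS student_part (
  id   INT,
  name STRING,
  age  INT
)
PARTITIONED BY (course STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Filtering on the partition column lets Hive read only the
-- matching partition instead of scanning the whole table.
SELECT id, name
FROM student_part
WHERE course = 'hadoop';

-- List the partitions that currently exist for the table.
SHOW PARTITIONS student_part;
```

Each partition is stored as its own subdirectory under the table's location (for example `course=hadoop`), which is what allows the filter on `course` to skip the other slices.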
Types of Partitioning
There are two types of partitioning in Hive:
Static Partitioning
- The values of the partition columns must be passed manually when loading data into the table.
- In static partitioning, input data files are inserted individually into each partition of the table.
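A minimal sketch of a static partition load might look like the following. The table `student_part`, the file paths under `/tmp`, and the course values are hypothetical and only serve to show the shape of the statements.

```sql
-- Hypothetical partitioned table (same layout assumed throughout this sketch).
CREATE TABLE IF NOT EXISTS student_part (
  id   INT,
  name STRING,
  age  INT
)
PARTITIONED BY (course STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Static partitioning: the partition value is written by hand
-- in the PARTITION clause, one load per input file.
LOAD DATA LOCAL INPATH '/tmp/students_java.csv'
INTO TABLE student_part
PARTITION (course = 'java');

LOAD DATA LOCAL INPATH '/tmp/students_hadoop.csv'
INTO TABLE student_part
PARTITION (course = 'hadoop');
```

Note that with `LOAD DATA` Hive places the file in the named partition as-is, without checking that the rows really belong to that course, so each input file should already contain only the matching records.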
Dynamic Partitioning
- A single insert into the partitioned table that populates all partitions in one go is known as dynamic partitioning; Hive derives the partition values from the data itself.
- Usually, a dynamic partition insert loads the data from a non-partitioned table.
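The dynamic case can be sketched as follows, assuming a non-partitioned staging table named `student_raw` and a partitioned target named `student_part`; both names and their columns are hypothetical. Depending on the Hive version and cluster configuration, the two `SET` statements may already be in effect.

```sql
-- Hypothetical non-partitioned source table: course is an ordinary column here.
CREATE TABLE IF NOT EXISTS student_raw (
  id     INT,
  name   STRING,
  age    INT,
  course STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical partitioned target table.
CREATE TABLE IF NOT EXISTS student_part (
  id   INT,
  name STRING,
  age  INT
)
PARTITIONED BY (course STRING);

-- Enable dynamic partitioning; nonstrict mode allows every
-- partition column to be determined dynamically.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- A single insert fills all partitions. The partition column
-- must come last in the SELECT list.
INSERT INTO TABLE student_part PARTITION (course)
SELECT id, name, age, course
FROM student_raw;
```

Hive creates one partition for each distinct `course` value it finds in `student_raw`, so no partition values have to be typed by hand.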
Partitioning Demo - Static
Step 1. Create 2 files
- Stud_M (for male students)
- Stud_F (for female students)