The Hive Data Model

Source: 图艺博知识网

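I. Data storage

1) Data is stored on HDFS.

2) There is no dedicated storage format. (By default, Hive text tables separate columns with the Ctrl-A character, \001, not a tab; to use a tab or a comma, specify the delimiter when creating the table.)

3) The main storage structures are databases, files, tables, and views.

4) Text files (.txt files and the like) can be loaded directly.

5) The column delimiter and the row delimiter can be specified when a table is created.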
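To make points 2) and 5) more concrete, here is a minimal Python sketch of how one line of a delimited text file maps onto typed columns. It is a toy model, not Hive's actual reader; the names `SCHEMA` and `parse_row` are assumptions made for this illustration.

```python
# Toy model of reading one line of a delimited text table.
# SCHEMA and parse_row are hypothetical names, not part of Hive.

SCHEMA = [("tid", int), ("tname", str), ("age", int)]

def parse_row(line, field_delim="\x01"):
    """Split a line on field_delim (Hive's default is Ctrl-A, \\x01)
    and cast each field to the type declared in SCHEMA."""
    fields = line.rstrip("\n").split(field_delim)
    row = {}
    for i, (name, cast) in enumerate(SCHEMA):
        try:
            row[name] = cast(fields[i])
        except (IndexError, ValueError):
            # A missing or unparseable field is modeled as NULL (None)
            row[name] = None
    return row

print(parse_row("1,Tom,23", field_delim=","))
print(parse_row("2\x01Mary\x01abc"))
```

The second call shows why the delimiter matters: the file is simply text, and the schema is applied only when the line is read.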

II. Tables

1) Table: internal (managed) table

a. Create a table without specifying a location

CREATE TABLE t1 (tid INT, tname STRING, age INT);

b. Create a table and specify its location

CREATE TABLE t2 (tid INT, tname STRING, age INT)

LOCATION '/mytable/hive/t2';

c. Create a table and specify the column (field) delimiter

CREATE TABLE t3 (tid INT, tname STRING, age INT)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

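d. Create a table from a query (Hive converts the statement into a MapReduce job that builds the table)

CREATE TABLE t4 AS SELECT * FROM students;

To view the resulting file on HDFS from Linux (the output file is typically named 000000_0): hdfs dfs -cat /user/hive/warehouse/t4/000000_0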

e. Create a table from a query and specify the delimiter

CREATE TABLE t5

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

AS SELECT * FROM students;

f. Add a column to a table

ALTER TABLE t1 ADD COLUMNS (english INT);

g. Rename a column in a table

ALTER TABLE test_change CHANGE a a1 INT;

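2) Partition: partitioned table (to improve query efficiency)

Note: what is a partition? A partition is analogous to a dense index on the partition column in a database.

In Hive, each partition of a table corresponds to one directory under the table's directory, and all data belonging to a partition is stored in that directory.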
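The directory-per-partition layout can be sketched in a few lines of Python. This is only a toy model of the idea: `/user/hive/warehouse` is Hive's default warehouse path, while `partition_dir` and `prune` are hypothetical helper names invented for this example.

```python
# Toy model of Hive's partition directory layout and partition pruning.
# partition_dir and prune are hypothetical helpers, not Hive APIs.
import posixpath

WAREHOUSE = "/user/hive/warehouse"  # Hive's default warehouse directory

def partition_dir(table, spec):
    """Map a partition spec such as {'gender': 'M'} to its directory."""
    parts = [f"{col}={val}" for col, val in spec.items()]
    return posixpath.join(WAREHOUSE, table, *parts)

def prune(partitions, column, value):
    """Keep only the partitions whose value for `column` matches the filter."""
    return [p for p in partitions if p.get(column) == value]

all_parts = [{"gender": "M"}, {"gender": "F"}]
to_scan = [partition_dir("partition_table", p)
           for p in prune(all_parts, "gender", "M")]
print(to_scan)
```

A filter on the partition column lets the query read one directory instead of the whole table, which is where the efficiency gain comes from.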

a. Table operations

1. Create a partitioned table

CREATE TABLE partition_table (pid INT, pname STRING)

PARTITIONED BY (gender STRING)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

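2. Insert data into the new table

INSERT INTO TABLE partition_table PARTITION (gender='M') SELECT * FROM student

WHERE gender='M';

Note: with a static partition such as gender='M', the SELECT list must supply only the non-partition columns (pid, pname). If student has additional columns, list the required columns explicitly instead of using SELECT *.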

3. How to check whether efficiency has improved: compare the execution plans of the two queries

EXPLAIN SELECT * FROM student WHERE gender='M';

EXPLAIN SELECT * FROM partition_table WHERE gender='M';

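3) External Table: external table

Note: an external table points to data that already exists in HDFS; it can also have partitions.

Its metadata is organized in the same way as that of an internal table, but the actual data storage differs considerably.

An external table involves only one step: creating the table and loading the data happen at the same time, and the data is not moved into the data warehouse directory; the table is merely a link to the external data. Dropping an external table removes only that link and leaves the data in place.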
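The difference in drop behavior can be illustrated with a small Python sketch. It is a toy model under stated assumptions: a local temporary directory stands in for HDFS, a dictionary stands in for the metastore, and `create_table` and `drop_table` are hypothetical names.

```python
# Toy model of dropping a managed table vs. an external table.
# A temp directory stands in for HDFS; all names are hypothetical.
import os
import shutil
import tempfile

metastore = {}  # table name -> {"location": path, "external": bool}

def create_table(name, location, external):
    os.makedirs(location, exist_ok=True)
    metastore[name] = {"location": location, "external": external}

def drop_table(name):
    meta = metastore.pop(name)           # the metadata is always removed
    if not meta["external"]:
        shutil.rmtree(meta["location"])  # managed table: the data is removed too

root = tempfile.mkdtemp()
managed_dir = os.path.join(root, "warehouse", "t1")
external_dir = os.path.join(root, "input")
create_table("t1", managed_dir, external=False)
create_table("external_table", external_dir, external=True)

drop_table("t1")
drop_table("external_table")
print(os.path.exists(managed_dir), os.path.exists(external_dir))
```

After both drops the metadata is gone for both tables, but only the managed table's directory has been deleted.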

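a. Create an external table

Step 1. Create three data files on the Hadoop machine (for example student01, student02, student03)

Step 2. Put the three files into the /input directory on Hadoop

Use the command (once per file): hdfs dfs -put student01.txt /input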

Step 3. Create the table

CREATE EXTERNAL TABLE external_table (eid INT, ename STRING, age INT)

ROW FORMAT DELIMITED FIELDS TERMINATED BY ','

LOCATION '/input';

Step 4. Delete one file, then query again; the rows from the deleted file no longer appear in the result

hdfs dfs -rm /input/student03.txt

SELECT * FROM external_table;
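Step 4 works because the table's contents are simply whatever files sit in its directory at query time. The following Python sketch models that behavior; a local temporary directory stands in for /input on HDFS, `select_all` is a hypothetical helper, and the sample rows are made up for the illustration.

```python
# Toy model: an external table's rows are the lines of the files
# currently present in its directory. Sample data is invented.
import os
import tempfile

def select_all(location, field_delim=","):
    """Read every file in the directory and split each line into fields."""
    rows = []
    for fname in sorted(os.listdir(location)):
        with open(os.path.join(location, fname), encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    rows.append(line.rstrip("\n").split(field_delim))
    return rows

input_dir = tempfile.mkdtemp()  # stands in for /input on HDFS
samples = {
    "student01.txt": "1,Tom,23\n",
    "student02.txt": "2,Mary,20\n",
    "student03.txt": "3,Mike,25\n",
}
for fname, content in samples.items():
    with open(os.path.join(input_dir, fname), "w", encoding="utf-8") as f:
        f.write(content)

before = select_all(input_dir)
os.remove(os.path.join(input_dir, "student03.txt"))  # like hdfs dfs -rm
after = select_all(input_dir)
print(len(before), len(after))
```

No reload step is needed after the file is removed, since nothing was copied into the warehouse in the first place.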

4) Bucket: bucketed table

III. Views (similar to tables)
