      A data warehouse integrates historical, subject-oriented data, analyzes the relationships among data across multidimensional hierarchies, and provides data mining functions that turn data into knowledge in support of decision-making. With the growth of the Internet and the emergence of the Internet of Things, data volumes have grown geometrically, and big data has become a focus of research.
Against this background, the key problem in building a data warehouse in the big data era is to connect data inside the enterprise with data outside it, so that "full data" can be mined and applied; this is the essence of big data. Full-data analysis makes it possible to locate problems more comprehensively and propose solutions, and to forecast more precisely by using data mining algorithms to support decisions. The precision of an analysis algorithm depends on the diversity and accuracy of the variables that influence the predicted result. Users should be able to analyze and query any question, which means the analysis system must offer a friendlier operating experience and finer data granularity while keeping the analysis efficient at large data volumes. Data mashups and data sharing emphasize the combined analysis of internal and external enterprise data, as well as the monetization of data. A traditional data warehouse, by contrast, focuses only on connecting the isolated business systems within the enterprise, so it captures only internal data and therefore only the internal factors bearing on a problem. The causes of a problem are often complex: besides the enterprise's own factors, external macro and social factors are also an indispensable part of the analysis. The goals of building a data warehouse in a big data environment are therefore to share data across systems, eliminate information silos, improve data quality, support decision analysis, and provide a unified data service.
This project was completed against this background and studies a construction scheme for a data warehouse in a big data environment.
Keywords: big data environment; warehouse construction; scheme design

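                   Contents

I. Building the data warehouse database structure and setting up data sources in a big data environment
1. Task description
2. Building the data warehouse database in a big data environment
3. Setting up data sources
II. HBase database analysis of sales data
1. Task description
2. Designing a star-schema cube (Sales)
3. Designing storage and processing the cube
4. HBase database analysis
III. HBase database analysis of human resources data
1. Task description
2. Designing a cube with a parent-child dimension (HR)
3. Modifying the structure of the cube (HR)
4. Designing storage and processing the cube
5. HBase database analysis
IV. Other operations on the data warehouse and cubes
1. Task description
2. Setting roles and permissions for the data warehouse and cubes
3. Viewing metadata
4. Creating countermeasures
5. Drill-through
6. Establishing a remote Internet connection
V. Advanced data warehouse operations
1. Task description
2. Creating partitions
3. Creating a virtual cube
4. Scheduling cube processing with DTS
5. Backing up and restoring the data warehouse
VI. Data mining
1. Task description
2. Creating a decision tree mining model that reveals customer patterns
3. Analyzing the decision tree mining results
4. Creating a clustering mining model
5. Analyzing the clustering mining results
6. Creating a decision tree mining model based on relational tables
7. Browsing the "Dependency Network" view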

