1.What is a DataWareHouse?Explain The Characteristics?
Integ : Data from separate sources must be put into a consistent format thru Data Transformations and Data Cleansing.
Non-Volatile : Data Once entered into the warehouse can’t be changed.
Time-Variant : Creates a time dimension to the Data ware house. Can view patterns and trends over time.
Online Transactional Processing | Online Analytical Processing |
Functional : Day to Day Operations | Decision Support |
Db Design : Application Oriented | Subject Oriented. |
Data : Current Up to date | Historical Data |
Detailed, Flat,Relational | Summarized, Isolated |
Higly Normalized | Highly De-Normailized |
Unit of work : Short, Simple, Transaction | Complex Query |
3.Diff between Datawarehouse and DataMart?
A data warehouse is for very large databases (VLDBs) | A data mart is for smaller databases. |
A data warehouse is a database designed to support a broad range of decision tasks in a specific organization. | A data mart is a selected part of the data warehouse which supports specific decision support application requirements of a company’s department Is a Smaller Subset of Data ware House. |
Has More Subject-Oriented Areas | Has Less Subject – Oriented Areas |
Size is > 100 GB | Size is <> |
Running Ad hoc(unplanned) Queries from a huge data warehouse takes long time. | Running ad hoc queries in data marts allow the efficient execution of predicted queries. |
It is Enterprise Level One | It is Smaller Scale One. |
4. What are Surrogate Keys?
A unique Identifier or Number for each row , which is used as a Substitute for Natural Key such as Customer no in Customer table.
Used as a primary key because normal natural key will be changing or updating.
5.What is Star Schema, Snow Flake Schema,Fact Constellation Schema/(Galaxy Schema)?
A Star Schema is a Central Fact table is Surrounded by De-Normalized Dimensions.
A Snow Flake Schema is a Central Fact table is surrounded by Normalized Dimensions, split Dimension table data into other tables.
A Fact Constellation Schema is the one in which multiple fact tables share the dimension tables. Can also be called collection of stars (Galaxy Schema).
6.Advantages Of Star Schema Over Snow Flake Schema, Drawbacks of Snow Flake Schema?
ADVANTAGES OF STAR SCHEMA | DRAWBACKS OF SNOW FLAKE |
Easy to Understand | Time Consuming Joins |
Easy to define Hierarchies | Report Generation Slow |
Reduces No: of Physical Joins |
7.What is Dimensional Modelling?
The way to identify which information belongs to a central fact table and which information belongs to its associated dimension tables.
8.What is a Junk Dimension?
It is a convenient way to separate the random attributes in a dimension into another dimension.
A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.
The fact would contain several metrics (principal amount, net amount, price per share, commission, margin amount, etc.) and would be related to several dimensions such as account, date, rep, office, exchange, etc. This fact would also contain several codes and flags that were related to the transaction rather than any of the dimensions ... such as origin code (that indicates whether the trade was initiated with a phone call or via the Web), a reinvest flag (that indicates whether or not this trade as was the result of the reinvestment of a dividend payout) and a comment field for storing special instructions from the customer. These three attributes would normally be removed from the fact table and stored in a junk dimension ... perhaps called the trade dimension. In this way, the number of indexes on the fact table would be reduced, and performance (not to mention ease of use) would be enhanced. Hope this helps.
9.How do u load data into Date Dimension?
By using the Maxdate and MinDate.,with a stored procedure.
10.What are Additive Facts,Semi-Additive Facts,Non-Additive Facts?
Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
11. What is Staging Area?
Provides a place and an area with a set of functions to clean, change, combine, convert, deduplicate and prepare source data for storage and use in the data warehouse.
12.What is Factless Fact? What are different types?
A Fact table which do not have any facts.They may consist of nothing but keys.
Types :
*. Event Tracking. : It is a table that records an event. Eg: Student Tracking System that detects each student attendance event each day.
*. Coverage Tables : These are frequently needed when a primary fact table in a dimensional data warehouse is sparse(small).
13.What is ETL & CDC?
ETL-(Extraction,Transaction & Loading)Data Extraction takes data from source systems, necessary transformations takes place and data load takes extracted data and loads it into the data warehouse.(where the transformation takes place)
CDC-(Change Data Capture) Captures the data which is changing and will be used for transformations.(Which data should be picked up)
14.Explain the Datawarehouse LifeCycle?
Extraction - As a first step, heterogeneous data from different online transaction processing systems is extracted. This data becomes the data source for the data warehouse.
Cleansing/transformation - The source data is sent into the populating systems where the data is cleansed, integrated, consolidated, secured and stored in the corporate or central data warehouse.
Distribution - From the central data warehouse, data is distributed to independent data marts specifically designed for the end user.
Analysis - From these data marts, data is sent to the end users who access the data stored in the data mart depending upon their requirement.
15. What is the aim/objective of having a data warehouse? And who needs a data warehouse? Or what is the use of Data Warehousing?
Data warehousing technology comprises a set of new concepts and tools which support the executives, managers and analysts with information material for decision making.
The fundamental reason for building a data warehouse is to improve the quality of information in the organization.
The main goal of data warehouse is to report and present the information in a very user friendly form.
16. What is Confirmed Dimension ,Confirmed Fact , Stoved Pipe?
- Dimensions which are 100% shared across Star Schema is called Confirmed Dimension.
- Keeping the consistency of same unit of measurement and same background calculation across schema is called Confirmed Fact.
- Any Data ware house that are not having Confirmed Fact/Dimension are called Stoved Pipes.
- 100% shared---Tables Structure should be consistent.
17.What is S.C.D? What are different Types?
Slowly changing dimensions are dimension tables that have slowly increasing dimension data, as well as updates to existing dimensions.
Use the Type 3 Dimension mapping to update a slowly changing dimension table when you want to keep only current and previous versions of column data in the table. Both versions of the specified column or columns are saved in the same row.
18. What is S.P.T and S.G.D?
Inserts all source rows. Use the truncate target table option in the session properties, or use a pre-session shell command to drop or truncate the target before each session run. | |||
A dimension that has a key only, no additional attributes.
I have a fact table that stores insurance contracts and one important dimension is the year signed. So the fact table does have many columns, like CUSTOMER_ID, CONTRACT_ID, etc and one column YEAR_SIGNED as varchar(4). The CUSTOMER_ID is the foreign key column to the DIM_CUSTOMER with all the customer date, name address, .... CONTRACT_ID relates to the DIM_CONTRACT with all the contract specific information. Any YEAR_SIGNED? Should I really have a DIM_YEAR_SIGNED and it will have one column only. What other attributes should a year have?
Therefore, we do not create an explicit dimension table and call that YEAR_SIGNED column a degenerated dimension.