Structure of databases/architecture


#1

OLAP vs. OLTP

The OLAP (Online Analytics Processing) approach typically satisfies use cases such as report building, forecasting, and predictive analytics by enabling the user to perform operations such as rollup, drill-down, and slicing. There is also ROLAP. ROLAP’s main differentiator is that it doesn’t require the pre-computation and storage of information. Instead, it uses the mechanics of SQL and relational set theory to perform aggregation and computation at query time. This jives with an “in-database”, “no summation”, and “late-binding schema”.

Classic OLAP Operation: Query on “how much did we make off of jean sales in August 2012”.

OLAP-Oriented DBs: Redshift, Vertica, Teradata

In contrast with OLAP, OLTP (Online Transaction Processing) are operations that are tailor made for quick data entry and retrieval. It is usually characterized as high concurrency, small I/O operations. Because of this, it satisfies use cases such as order entry, retail sales, and financial transactions. Databases that are optimized for these use cases are likely going to focus on redundancy, atomicity of operations, and high-throughput. They typically are not optimized for joins and are therefore not preferred for analytics processing.

Classic OLTP Operation: Log a deposit from an ATM and print out a receipt with the balance.

OLTP-Oriented DBs: Oracle, SQL Server

Big Picture: In the ideal enterprise ecosystem, the OLTP systems serve as the source systems built to handle raw data logging and gathering, which then feed the OLAP systems which are designed to perform analytical querying on the data.

Columnar database, column-oriented DBMS

as opposed to row-oriented or row-store DBMS

The difference is in the way that the table data is serialized on disk. In a columnar database, the table is stored contiguously by column. In a row-store database, they are stored by row.

See: http://stackoverflow.com/questions/2133017/what-is-a-columnar-database

Advantages of a columnar database:

  • More compressible
  • Aggregate functions and range queries are faster
  • Wide tables impact performance less
  • More efficient at updating a column or columns for a table at once
  • Better suited for OLAP-like (Online analytic processing) workloads with complex queries over all data. Not suited for OLTP.

Advantages of a row-store database include:

  • Quicker to retrieve the entire row, especially when there are many columns
  • More efficient at writing (insert/update/delete) a row
  • Better for OLTP-like (Online transaction processing) workloads with more interactive transactions, okay at OLAP as well.

The SQL-on-Hadoop file formats (Parquet, ORC) are columnar storage formats.

JDBC

Java Database Connectivity (JDBC) is a standard Java API that defines how a client can access a database. Most BI tools use JDBC to connect to databases.

MySQL specific: InnoDB vs. MyISAM

See this Stack Overflow article: http://stackoverflow.com/questions/20148/myisam-versus-innodb


unlisted #2

listed #3