Types of Databases


MPP (Massively Parallel Processing)

MPP databases are relational databases that store data and perform processing across multiple machines (nodes) rather than on a single server. The benefits of MPP databases usually include:

  • Shared-nothing architecture, which translates into a “no single point of failure” characteristic that is desirable for enterprise and high-availability solutions
  • Linear scaling with the addition of nodes for processing and storage
  • Higher data ingestion rates due to parallelized data movement
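To make the shared-nothing idea concrete, here is a minimal sketch in plain Python rather than any vendor's implementation. Each row is assigned to a node by hashing a key, each node aggregates only its own slice, and a coordinator merges the partial results. The names `node_for`, `distribute`, `local_totals`, and `gather`, the sample rows, and the choice of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def node_for(key, num_nodes):
    """Pick a node from a stable hash of the key (CRC32, not Python's salted hash())."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes):
    """Split rows into one slice per node; rows with the same key land on the same node."""
    slices = [[] for _ in range(num_nodes)]
    for customer, amount in rows:
        slices[node_for(customer, num_nodes)].append((customer, amount))
    return slices


def local_totals(slice_rows):
    """Work done by a single node: aggregate only the rows it owns."""
    totals = defaultdict(int)
    for customer, amount in slice_rows:
        totals[customer] += amount
    return dict(totals)


def gather(partials):
    """Work done by the coordinator: merge the per-node partial results."""
    merged = defaultdict(int)
    for partial in partials:
        for customer, total in partial.items():
            merged[customer] += total
    return dict(merged)


# Hypothetical sample data: (customer, order amount)
rows = [("ann", 10), ("bob", 5), ("ann", 30), ("cid", 20), ("bob", 7)]

slices = distribute(rows, num_nodes=3)
# In a real cluster each slice would be processed on its own machine, in parallel.
partials = [local_totals(s) for s in slices]
totals = gather(partials)
print(totals)
```

Because no node needs another node's rows to compute its partial totals, adding nodes adds both storage and processing capacity, which is where the scaling and ingestion benefits listed above come from.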

In general, their SQL dialects share the following properties:

  • They either do not support correlated subqueries or execute them very inefficiently
  • They use a sort key and distribution key rather than an index for speeding up queries (see below for more detail)
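A common workaround for the correlated-subquery limitation is to rewrite the query as a join against a pre-aggregated derived table. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in to show that the two query shapes return the same rows; SQLite is not an MPP database, so it says nothing about performance. The `orders` table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer TEXT, amount INTEGER);
INSERT INTO orders VALUES
    ('ann', 10), ('ann', 30), ('bob', 5), ('bob', 7), ('cid', 20);
""")

# Correlated form: the inner query is re-evaluated for each outer row.
CORRELATED = """
SELECT o.customer, o.amount
FROM orders AS o
WHERE o.amount > (SELECT AVG(i.amount)
                  FROM orders AS i
                  WHERE i.customer = o.customer)
ORDER BY o.customer, o.amount
"""

# Rewritten form: aggregate once per customer, then join back to the rows.
REWRITTEN = """
SELECT o.customer, o.amount
FROM orders AS o
JOIN (SELECT customer, AVG(amount) AS avg_amount
      FROM orders
      GROUP BY customer) AS a
  ON a.customer = o.customer
WHERE o.amount > a.avg_amount
ORDER BY o.customer, o.amount
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
rewritten_rows = conn.execute(REWRITTEN).fetchall()
print(correlated_rows == rewritten_rows)  # True
```

The join form tends to suit MPP engines better because the aggregation and the join can each be executed as a single set-based step across the nodes, instead of one lookup per outer row.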

Redshift

  • Amazon-hosted, cloud-based, data warehouse product
  • Announced November 2012
  • Popular with e-commerce companies

HP Vertica Analytics Platform

  • Cluster-based, column-oriented data warehouse product
  • Claims petabyte scalability on commodity hardware
  • Acquired by HP in March 2011
  • Expensive: one quote I found on the Internet was ~$150k/TB, vs. $1k/TB/year on Redshift

Teradata Aster

  • Teradata acquired the remaining part of Aster in March 2011
  • Considered their “Data Discovery” Platform
  • Runs their brand of SQL-MapReduce via a Queen/Worker hardware architecture

Teradata

  • Standard Teradata data warehouse offering with an ANSI-compliant SQL dialect
  • Natively a row store with an MPP, shared-nothing architecture
  • Expensive; usually found at large enterprise clients
  • Features an advanced query optimizer and workload management system

Greenplum

  • Shared-nothing, massively parallel data warehouse built on PostgreSQL
  • Acquired by EMC in July 2010, became part of Pivotal in 2012

IBM Netezza

  • Data warehouse appliance with shared-nothing architecture
  • Processing is done by “S-blades” that use custom chips to speed up queries
  • Founded 1999, acquired by IBM in 2010

Infobright?

  • Not sure if it belongs in MPP
  • Focused on machine-generated data (“internet of things” in marketing speak)

Snowflake Computing

  • Based on Amazon Web Services, it spins up a data warehouse quickly and cheaply over data stored in S3
  • Compute and storage are separate functions
