PostgreSQL10搭建时间序列数据库TimescaleDB

说明

这里只记录搭建和简单测试过程

不保证ctrl+cctrl+v能完整跑起来

操作系统使用的CentOS-7.6.1810 x86_64

PostgreSQL版本号10.8

TimescaleDB版本号1.3.0

虚拟机配置1CPU 2G内存 20G系统盘

postgres用户默认情况下PGDATA变量是/var/lib/pgsql/10/data

这里没有使用数据盘,有需要的可以自行调整!

TimescaleDB简介

这段介绍是来自德哥的Github文档TimescaleDB介绍

TimescaleDB是基于PostgreSQL数据库打造的一款时序数据库,插件化的形式,随着PostgreSQL的版本升级而升级。

架构图

环境准备

安装PostgreSQL

这个参考PostgreSQL10安装部署和初始化

TimescaleDB安装

TimescaleDB的YUM包已经被集成到PostgreSQL社区源里面,所以直接装就是了

安装软件包

1
yum install -y timescaledb_10

配置PostgreSQL

切换到postgres用户

1
su - postgres

$PGDATA/postgresql.conf添加配置

1
shared_preload_libraries = 'timescaledb'

重启PostgreSQL

切换到postgres用户

1
2
su - postgres
pg_ctl restart -D $PGDATA

验证TimescaleDB功能

登录PostgreSQL

1
psql -U postgres

创建名为tutorial的数据库

1
CREATE database tutorial;

切换到tutorial数据库

1
\c tutorial

加载TimescaleDB的extensions

1
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

示例输出

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
WARNING:
WELCOME TO
_____ _ _ ____________
|_ _(_) | | | _ \ ___ \
| | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
| | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
| | | | | | | | | __/\__ \ (_| (_| | | __/ |/ /| |_/ /
|_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
Running version 1.3.0
For more information on TimescaleDB, please visit the following links:

1. Getting started: https://docs.timescale.com/getting-started
2. API reference documentation: https://docs.timescale.com/api
3. How TimescaleDB is designed: https://docs.timescale.com/introduction/architecture

Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescaledb.com/using-timescaledb/telemetry.

CREATE EXTENSION

创建一个普通SQL标准的表

1
2
3
4
5
6
CREATE TABLE conditions (
time TIMESTAMPTZ NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION NULL,
humidity DOUBLE PRECISION NULL
);

查看表结构

1
2
3
4
5
6
7
8
tutorial=# \d conditions
Table "public.conditions"
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+---------
time | timestamp with time zone | | not null |
location | text | | not null |
temperature | double precision | | |
humidity | double precision | | |

使用create_hypertable创建hypertable

1
2
3
4
5
SELECT create_hypertable('conditions', 'time');
create_hypertable
-------------------------
(1,public,conditions,t)
(1 row)

这时再看表结构的时候,会发现不一样了

1
2
3
4
5
6
7
8
9
10
11
12
tutorial=# \d conditions
Table "public.conditions"
Column | Type | Collation | Nullable | Default
-------------+--------------------------+-----------+----------+---------
time | timestamp with time zone | | not null |
location | text | | not null |
temperature | double precision | | |
humidity | double precision | | |
Indexes:
"conditions_time_idx" btree ("time" DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON conditions FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()

插入数据

1
tutorial=# INSERT INTO conditions(time, location, temperature, humidity) VALUES (NOW(), 'office', 70.0, 50.0);

查询数据

1
2
3
4
5
tutorial=# SELECT * FROM conditions ORDER BY time DESC LIMIT 100;
time | location | temperature | humidity
-------------------------------+----------+-------------+----------
YYYY-MM-DD HH:mm:SS.354351+08 | office | 70 | 50
(1 row)

TimescaleDB Tutorials

这里使用TimescaleDB官方的测试样例

地址在这里

纽约TAXI数据透视分析

测试数据

这里直接下载的同时解压压缩包

1
wget -O - https://timescaledata.blob.core.windows.net/datasets/nyc_data.tar.gz | tar xz

解压出来有三个文件

  • nyc_data_contagg.sql
  • nyc_data.sql
  • nyc_data_rides.csv

创建数据库

1
2
3
CREATE DATABASE nyc_data;
\c nyc_data
CREATE EXTENSION IF NOT EXISTS timescaledb CASCADE;

创建表结构

1
psql -U postgres -d nyc_data -h localhost < nyc_data.sql

导入数据

1
psql -U postgres -d nyc_data -h localhost -c "\COPY rides FROM nyc_data_rides.csv CSV"

查看表

1
2
3
4
5
6
7
8
nyc_data=# \dt
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+----------
public | payment_types | table | postgres
public | rates | table | postgres
public | rides | table | postgres
(3 rows)

查看表结构

  • payments
1
2
3
4
5
6
nyc_data=# \d payment_types
Table "public.payment_types"
Column | Type | Collation | Nullable | Default
--------------+---------+-----------+----------+---------
payment_type | integer | | |
description | text | | |
  • rates
1
2
3
4
5
6
nyc_data=# \d rates
Table "public.rates"
Column | Type | Collation | Nullable | Default
-------------+---------+-----------+----------+---------
rate_code | integer | | |
description | text | | |
  • rides
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
postgres=# \d+ rides;
Table "public.rides"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
vendor_id | text | | | | extended | |
pickup_datetime | timestamp without time zone | | not null | | plain | |
dropoff_datetime | timestamp without time zone | | not null | | plain | |
passenger_count | numeric | | | | main | |
trip_distance | numeric | | | | main | |
pickup_longitude | numeric | | | | main | |
pickup_latitude | numeric | | | | main | |
rate_code | integer | | | | plain | |
dropoff_longitude | numeric | | | | main | |
dropoff_latitude | numeric | | | | main | |
payment_type | integer | | | | plain | |
fare_amount | numeric | | | | main | |
extra | numeric | | | | main | |
mta_tax | numeric | | | | main | |
tip_amount | numeric | | | | main | |
tolls_amount | numeric | | | | main | |
improvement_surcharge | numeric | | | | main | |
total_amount | numeric | | | | main | |
Indexes:
"rides_passenger_count_pickup_datetime_idx" btree (passenger_count, pickup_datetime DESC)
"rides_pickup_datetime_vendor_id_idx" btree (pickup_datetime DESC, vendor_id)
"rides_rate_code_pickup_datetime_idx" btree (rate_code, pickup_datetime DESC)
"rides_vendor_id_pickup_datetime_idx" btree (vendor_id, pickup_datetime DESC)
Triggers:
ts_insert_blocker BEFORE INSERT ON rides FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.insert_blocker()
Child tables: _timescaledb_internal._hyper_2_10_chunk,
_timescaledb_internal._hyper_2_11_chunk,
_timescaledb_internal._hyper_2_12_chunk,
_timescaledb_internal._hyper_2_1_chunk,
_timescaledb_internal._hyper_2_2_chunk,
_timescaledb_internal._hyper_2_3_chunk,
_timescaledb_internal._hyper_2_4_chunk,
_timescaledb_internal._hyper_2_5_chunk,
_timescaledb_internal._hyper_2_6_chunk,
_timescaledb_internal._hyper_2_7_chunk,
_timescaledb_internal._hyper_2_8_chunk,
_timescaledb_internal._hyper_2_9_chunk

查询数据

普通查询

1月1日至1月10日,同车超过两人,每天平均计费是多少

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
nyc_data=# SELECT date_trunc('day', pickup_datetime) as day, avg(fare_amount)
nyc_data-# FROM rides
nyc_data-# WHERE passenger_count > 1 AND (pickup_datetime >= '2016-01-01' AND pickup_datetime <= '2016-01-10')
nyc_data-# GROUP BY day ORDER BY day;
day | avg
---------------------+---------------------
2016-01-01 00:00:00 | 13.3990821679715529
2016-01-02 00:00:00 | 13.0224687415181399
2016-01-03 00:00:00 | 13.5382068607068607
2016-01-04 00:00:00 | 12.9618895561740149
2016-01-05 00:00:00 | 12.6614611935518309
2016-01-06 00:00:00 | 12.5775245695086098
2016-01-07 00:00:00 | 12.5868802584437019
2016-01-08 00:00:00 | 12.4288630909742120
2016-01-09 00:00:00 | 11.8049625897430078
2016-01-10 00:00:00 | 27.2500000000000000
(10 rows)

查询每天交易了多少笔,只显示头五个记录

1
2
3
4
5
6
7
8
9
10
11
nyc_data=# SELECT date_trunc('day', pickup_datetime) as day, COUNT(*) FROM rides
nyc_data-# GROUP BY day ORDER BY day
nyc_data-# LIMIT 5;
day | count
---------------------+--------
2016-01-01 00:00:00 | 345037
2016-01-02 00:00:00 | 312831
2016-01-03 00:00:00 | 302878
2016-01-04 00:00:00 | 316171
2016-01-05 00:00:00 | 343251
(5 rows)

使用TimescaleDB内置函数

每5分钟间隔为一个BUCKET,输出每个间隔产生了多少笔订单

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
nyc_data=# SELECT time_bucket('5 minute', pickup_datetime) as five_min, count(*)  
nyc_data=# FROM rides
nyc_data=# WHERE pickup_datetime < '2016-01-01 02:00'
nyc_data=# GROUP BY five_min ORDER BY five_min;

five_min | count
---------------------+-------
2016-01-01 00:00:00 | 703
2016-01-01 00:05:00 | 1482
2016-01-01 00:10:00 | 1959
2016-01-01 00:15:00 | 2200
2016-01-01 00:20:00 | 2285
2016-01-01 00:25:00 | 2291
2016-01-01 00:30:00 | 2349
2016-01-01 00:35:00 | 2328
2016-01-01 00:40:00 | 2440
2016-01-01 00:45:00 | 2372
2016-01-01 00:50:00 | 2388
2016-01-01 00:55:00 | 2473
2016-01-01 01:00:00 | 2395
2016-01-01 01:05:00 | 2510
2016-01-01 01:10:00 | 2412
2016-01-01 01:15:00 | 2482
2016-01-01 01:20:00 | 2428
2016-01-01 01:25:00 | 2433
2016-01-01 01:30:00 | 2337
2016-01-01 01:35:00 | 2366
2016-01-01 01:40:00 | 2325
2016-01-01 01:45:00 | 2257
2016-01-01 01:50:00 | 2316
2016-01-01 01:55:00 | 2250
(24 rows)

每个城市的TAXI交易量

1
2
3
4
5
6
7
8
9
10
11
12
13
nyc_data=# SELECT rates.description, COUNT(vendor_id) as num_trips FROM rides
nyc_data-# JOIN rates on rides.rate_code = rates.rate_code
nyc_data-# WHERE pickup_datetime < '2016-01-08'
nyc_data-# GROUP BY rates.description ORDER BY rates.description;
description | num_trips
-----------------------+-----------
group ride | 17
JFK | 54832
Nassau or Westchester | 967
negotiated fare | 7193
Newark | 4126
standard rate | 2266401
(6 rows)

查看select执行计划

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
postgres=# EXPLAIN SELECT * FROM rides;
QUERY PLAN
----------------------------------------------------------------------------------
Append (cost=0.00..282354.03 rows=5963403 width=258)
-> Seq Scan on _hyper_2_1_chunk (cost=0.00..23485.78 rows=781178 width=113)
-> Seq Scan on _hyper_2_2_chunk (cost=0.00..36322.06 rows=1187506 width=115)
-> Seq Scan on _hyper_2_3_chunk (cost=0.00..32909.67 rows=1075467 width=116)
-> Seq Scan on _hyper_2_4_chunk (cost=0.00..15601.72 rows=518772 width=112)
-> Seq Scan on _hyper_2_5_chunk (cost=0.00..20396.28 rows=281328 width=472)
-> Seq Scan on _hyper_2_6_chunk (cost=0.00..41670.68 rows=574768 width=472)
-> Seq Scan on _hyper_2_7_chunk (cost=0.00..20921.76 rows=288576 width=472)
-> Seq Scan on _hyper_2_8_chunk (cost=0.00..42966.40 rows=592640 width=472)
-> Seq Scan on _hyper_2_9_chunk (cost=0.00..15160.04 rows=209104 width=472)
-> Seq Scan on _hyper_2_10_chunk (cost=0.00..32896.44 rows=453744 width=472)
-> Seq Scan on _hyper_2_11_chunk (cost=0.00..11.60 rows=160 width=472)
-> Seq Scan on _hyper_2_12_chunk (cost=0.00..11.60 rows=160 width=472)
(13 rows)

配合PostGIS插件实现时间+空间数据库

安装PostGIS

1
yum install -y postgis25_10

加载PostGIS

1
2
postgres=# create extension postgis;
CREATE EXTENSION

修改表结构

1
2
3
4
5
6
7
8
postgres=# ALTER TABLE rides ADD COLUMN pickup_geom geometry(POINT,2163);
ALTER TABLE
postgres=# ALTER TABLE rides ADD COLUMN dropoff_geom geometry(POINT,2163);
ALTER TABLE
postgres=# UPDATE rides SET pickup_geom = ST_Transform(ST_SetSRID(ST_MakePoint(pickup_longitude,pickup_latitude),4326),2163);
UPDATE 10906860
postgres=# UPDATE rides SET dropoff_geom = ST_Transform(ST_SetSRID(ST_MakePoint(dropoff_longitude,dropoff_latitude),4326),2163);
UPDATE 10906860

查询数据

查询在(lat, long) (40.7589,-73.9851)附近400米范围内,每30分钟有多少辆的士被乘坐

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
postgres=# SELECT time_bucket('30 minutes', pickup_datetime) AS thirty_min, COUNT(*) AS near_times_sq
postgres-# FROM rides
postgres-# WHERE ST_Distance(pickup_geom, ST_Transform(ST_SetSRID(ST_MakePoint(-73.9851,40.7589),4326),2163)) < 400
postgres-# AND pickup_datetime < '2016-01-01 14:00'
postgres-# GROUP BY thirty_min ORDER BY thirty_min;
thirty_min | near_times_sq
---------------------+---------------
2016-01-01 00:00:00 | 74
2016-01-01 00:30:00 | 102
2016-01-01 01:00:00 | 120
2016-01-01 01:30:00 | 98
2016-01-01 02:00:00 | 112
2016-01-01 02:30:00 | 109
2016-01-01 03:00:00 | 163
2016-01-01 03:30:00 | 181
2016-01-01 04:00:00 | 214
2016-01-01 04:30:00 | 185
2016-01-01 05:00:00 | 158
2016-01-01 05:30:00 | 113
2016-01-01 06:00:00 | 102
2016-01-01 06:30:00 | 91
2016-01-01 07:00:00 | 88
2016-01-01 07:30:00 | 58
2016-01-01 08:00:00 | 72
2016-01-01 08:30:00 | 94
2016-01-01 09:00:00 | 115
2016-01-01 09:30:00 | 118
2016-01-01 10:00:00 | 135
2016-01-01 10:30:00 | 160
2016-01-01 11:00:00 | 212
2016-01-01 11:30:00 | 229
2016-01-01 12:00:00 | 244
2016-01-01 12:30:00 | 230
2016-01-01 13:00:00 | 235
2016-01-01 13:30:00 | 238
(28 rows)

TimescaleDB集群

由于TimescaleDB是PostgreSQL的一个插件,因此可以借助PostgreSQL自身的特性实现集群功能

  • 基于流复制实现主从数据同步
  • 基于PGPool实现读写分离+主从自动切换
  • 基于patroni实现高可用集群
  • 基于PostgreSQL-X2实现多主多读+横向扩展的分布式集群