在 Github Pages 或任何静态主机上托管 SQLite 数据库

译者:这实在是太强了

原文标题:Hosting SQLite databases on Github Pages (or any static file hoster)

原文地址:https://phiresky.github.io/blog/2021/hosting-sqlite-databases-on-github-pages/

原文许可:GPLv3

原文发布时间:2021.4.17 翻译时的原文最终修改时间:2021.5.3

译文最后修改时间: 2023.3.27

注1:原文可能在本篇翻译发布后进行了更新,本篇译文根据译者当时能获取到的最新版本而翻译。译者水平一般,如有错误还恳请读者指正。

注2:下文中的代码在原博客是可以运行的,由于译者水平不足,代码并不能在本文中运行,故贴出正常运行后的输出。您可以点击原文链接去自行感受。


在我发觉我经常写一个网站来充当一个从数据库中获取一些数据,并以图标形式展示的小工具时,我正在写一个小网站,它用来展示某位 Youtube 创作者在一段时间内所收到的赞助。但是一旦你想使用数据库,你要么得写一个后端,这意味这你要一直托管并维护它;要么将整个数据集都下载到浏览器内,而让数据集大于 10 MB 时,这样做并不美好。

之前在我为这些小型附属项目使用后端服务器的时候,有时会遇到一些外部API故障,或者密钥过期了,又或者是我忘记了这玩意儿的后端然后停止为运行它的某台服务器续费。多年后当我重新回来看这些小东西时,它已经没了。这让我很是懊恼,并咒骂自己依赖外部服务,或者责备自己应当要对这些东西上心的时间更久。

托管静态网页比搞“真”服务器来的简单的多 —— 市面上有许多免费且可靠的选择 (例如GitHub、GitLab Pages, Netlify 等等),并且它能拓展到几乎无穷大而不费吹灰之力。

所以我写了一个工具,它能在一个静态托管的网站中使用真正的 SQL 数据库!

下面是使用 世界发展指标数据集(World Development Indicators dataset) 的演示——这个数据集包含6张表,超过800万行数据(共计670MiB)。

SQL 示例

1
select country_code, long_name from wdi_country limit 3;

运行输出

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
[
  {
    "country_code": "ABW",
    "long_name": "Aruba"
  },
  {
    "country_code": "AFG",
    "long_name": "Islamic State of Afghanistan"
  },
  {
    "country_code": "AGO",
    "long_name": "People's Republic of Angola"
  }
]

Sqlite 统计数据

1
fetched 1.0KB in 1 requests (DB size: 668.8MB)

你可以看到, 我们可以在查询 wdi_country 表时仅需获取 1kB 的数据!

这是一个完整的查询引擎。因此我们可以使用下例的 SQLite Json 函数

SQL 示例

1
2
select json_extract(arr.value, '$.foo.bar') as bar
  from json_each('[{"foo": {"bar": 123}}, {"foo": {"bar": "baz"}}]') as arr

运行输出

1
2
3
4
5
6
7
8
[
  {
    "bar": 123
  },
  {
    "bar": "baz"
  }
]

我们也可以注册 JS 函数以便在查询中调用。下面是一个 getFlag 函数示例,它能够获得国家或地区的旗帜emoji。

JS 示例

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
function getFlag(country_code) {
  // just some unicode magic
  return String.fromCodePoint(...Array.from(country_code||"")
    .map(c => 127397 + c.codePointAt()));
}

await db.create_function("get_flag", getFlag)
return await db.query(`
  select long_name, get_flag("2-alpha_code") as flag from wdi_country
    where region is not null and currency_unit = 'Euro';
`)

运行输出

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
[
  {
    "long_name": "Principality of Andorra",
    "flag": "🇦🇩"
  },
  {
    "long_name": "Republic of Austria",
    "flag": "🇦🇹"
  },
  {
    "long_name": "Kingdom of Belgium",
    "flag": "🇧🇪"
  },
  {
    "long_name": "Republic of Cyprus",
    "flag": "🇨🇾"
  },
  {
    "long_name": "Federal Republic of Germany",
    "flag": "🇩🇪"
  },
  {
    "long_name": "Kingdom of Spain",
    "flag": "🇪🇸"
  },
  {
    "long_name": "Republic of Estonia",
    "flag": "🇪🇪"
  },
  {
    "long_name": "Republic of Finland",
    "flag": "🇫🇮"
  },
  {
    "long_name": "French Republic",
    "flag": "🇫🇷"
  },
  {
    "long_name": "Hellenic Republic",
    "flag": "🇬🇷"
  },
  {
    "long_name": "Ireland",
    "flag": "🇮🇪"
  },
  {
    "long_name": "Italian Republic",
    "flag": "🇮🇹"
  },
  {
    "long_name": "Republic of Lithuania",
    "flag": "🇱🇹"
  },
  {
    "long_name": "Grand Duchy of Luxembourg",
    "flag": "🇱🇺"
  },
  {
    "long_name": "Republic of Latvia",
    "flag": "🇱🇻"
  },
  {
    "long_name": "St. Martin (French part)",
    "flag": "🇲🇫"
  },
  {
    "long_name": "Principality of Monaco",
    "flag": "🇲🇨"
  },
  {
    "long_name": "Republic of Malta",
    "flag": "🇲🇹"
  },
  {
    "long_name": "Montenegro",
    "flag": "🇲🇪"
  },
  {
    "long_name": "Kingdom of the Netherlands",
    "flag": "🇳🇱"
  },
  {
    "long_name": "Portuguese Republic",
    "flag": "🇵🇹"
  },
  {
    "long_name": "Republic of San Marino",
    "flag": "🇸🇲"
  },
  {
    "long_name": "Slovak Republic",
    "flag": "🇸🇰"
  },
  {
    "long_name": "Republic of Slovenia",
    "flag": "🇸🇮"
  },
  {
    "long_name": "Republic of Kosovo",
    "flag": "🇽🇰"
  }
]

注意,本站是100%托管在托管在静态文件托管(GitHub Pages)上的。

那么,你要如何才能在静态文件托管上使用数据库呢?首先,(使用C语音编写的)SQLite 是能被编译为 WebAssembly 的。SQLite 可以在没有任何修改的情况下使用 emscripten 进行编译,并且 sql.js 库是围绕 WebAssembly 代码的瘦JS打包工具。

但是 sql.js 只允许你去创建和读取完全存储在内存中的数据库——所以我实现了一个虚拟文件系统 sql.js-httpvfs,它能够在 SQLite 尝试从文件系统中读取时使用HTTP范围请求来获取数据库的 chunk。从 SQLite 的角度来看,这看起来是它自己在一台普通电脑上运行,除了里面有个可以读取的名叫 /wdi.sqlite3 的文件外,其他的文件系统就是空的。当然它并不能写入这个文件,但是一个只读的数据库依然十分有用。

由于通过HTTP来获取所有数据有相当大的开销,所以我们要以块的方式来获取数据,并期以在请求数与使用的带宽之间寻找一定的平衡。值得庆幸的是,SQLite 已经将其数据库组织进用户定义大小(默认为 4 KiB)的“页”中。我已经将此数据库的页大小设置为 1 KiB。

下面是一个简单的索引查询示例:

SQL 示例

1
2
select indicator_code, long_definition from wdi_series where indicator_name
    = 'Literacy rate, youth total (% of people ages 15-24)'

运行输出

1
2
3
4
5
6
[
  {
    "indicator_code": "SE.ADT.1524.LT.ZS",
    "long_definition": "Youth literacy rate is the percentage of people ages 15-24 who can both read and write with understanding a short simple statement about their everyday life."
  }
]

SQLite 页读取请求

PageCacheAccess patternTable/IndexPage Type
0hit[system]
17hitsqlite_schemaleaf
57458missrandom (1 KiB XHR)index on wdi_series (indicator_name)leaf
57464missrandom (1 KiB XHR)index on wdi_series (indicator_name)leaf
6hitwdi_footnoteinternal
55684missrandom (1 KiB XHR)wdi_seriesleaf
55774missrandom (1 KiB XHR)wdi_seriesoverflow

运行上述查询并检查日志。SQLite 为该查询执行了7页的读取。

  • 三页的读取只是为了获取架构信息,这些已被缓存。
  • 两页的读取是查询 on wdi_series (indicator_name) 中的索引。
  • 两页的读取是在 wdi_series 表上,第一张是为了通过主键查找行值,第二张是在溢出页中获取文本数据。

索引和表的读取都是 B 树查找。

来一个更复杂的问题:根据 2010 年以后的最新数据,青年识字率最低的国家是哪个?

SQL 示例

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
with newest_datapoints as (
  select country_code, indicator_code, max(year) as year from wdi_data
  join wdi_series using (indicator_code)
  where
    indicator_name = 'Literacy rate, youth total (% of people ages 15-24)'
    and year > 2010
  group by country_code
)
select c.short_name as country, printf('%.1f %%', value) as "Youth Literacy Rate"
from wdi_data
  join wdi_country c using (country_code)
  join newest_datapoints using (indicator_code, country_code, year)
order by value asc limit 10

运行输出

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
[
  {
    "country": "Chad",
    "Youth Literacy Rate": "30.8 %"
  },
  {
    "country": "Central African Republic",
    "Youth Literacy Rate": "38.3 %"
  },
  {
    "country": "Niger",
    "Youth Literacy Rate": "43.5 %"
  },
  {
    "country": "Guinea",
    "Youth Literacy Rate": "46.3 %"
  },
  {
    "country": "South Sudan",
    "Youth Literacy Rate": "47.9 %"
  },
  {
    "country": "Mali",
    "Youth Literacy Rate": "50.1 %"
  },
  {
    "country": "Liberia",
    "Youth Literacy Rate": "55.4 %"
  },
  {
    "country": "Burkina Faso",
    "Youth Literacy Rate": "58.3 %"
  },
  {
    "country": "Côte d'Ivoire",
    "Youth Literacy Rate": "58.4 %"
  },
  {
    "country": "Guinea-Bissau",
    "Youth Literacy Rate": "60.4 %"
  }
]

SQLite 页读取请求 (太大了,不放)

上面的查询应该会做 10 ~ 20 个 GET 请求,总共获取 130 ~ 270 KiB 数据。请注意,它只需要执行 20 个请求,而不是 270 个请求(对于一个 270 KiB 的文件一次获取 1 KiB 时预期的数量)。这是因为我实现了一个预取系统,该系统试图通过三个单独的虚拟读取头来检测访问模式,并以指数方式增加顺序读取的请求大小。这意味着读取超过数 KiB 数据的索引扫描或表扫描只会导致在扫描的总字节长度上成对数的请求数。您可以通过查看上面页面读取日志中的 “Access pattern” 列来查看其效果。

只有当数据库中的索引与查询匹配良好时,所有这些方法才能很好地工作。例如,上述查询中使用的索引是 INDEX ON wdi_data (indicator_code, country_code, year, value) 。如果该索引不包含值列,则 SQLite 引擎必须执行另一次随机访问(不可预测)读取,从而使 HTTP 请求检索每个数据点的实际值。如果索引是以 country_code, indicator_code, ... 排序的,那么我们将能够快速获取单个国家或地区的所有指标,但不是单个指标的所有国家或地区值。

我们还可以利用 SQLite FTS模块,这样我们就可以对数据库中文本较多的信息进行全文搜索 —— 在下面的例子中,数据库中有超过 1000 个人类发展指标,并且具有更长的描述。

SQL 示例

1
2
3
select * from indicator_search
where indicator_search match 'educatio* femal*'
order by rank limit 10

运行输出

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
[
  {
    "indicator_code": "SL.UEM.BASC.FE.ZS",
    "topic": "Social Protection & Labor: Unemployment",
    "indicator_name": "Unemployment with basic education, female (% of female labor force with basic education)",
    "short_definition": null,
    "long_definition": "The percentage of the labor force with a basic level of education who are unemployed. Basic education comprises primary education or lower secondary education according to the International Standard Classification of Education 2011 (ISCED 2011).",
    "statistical_concept_and_methodology": null,
    "development_relevance": null
  },
  {
    "indicator_code": "SL.UEM.INTM.FE.ZS",
    "topic": "Social Protection & Labor: Unemployment",
    "indicator_name": "Unemployment with intermediate education, female (% of female labor force with intermediate education)",
    "short_definition": null,
    "long_definition": "The percentage of the labor force with an intermediate level of education who are unemployed. Intermediate education comprises upper secondary or post-secondary non tertiary education according to the International Standard Classification of Education 2011 (ISCED 2011).",
    "statistical_concept_and_methodology": null,
    "development_relevance": null
  },
  {
    "indicator_code": "SL.TLF.BASC.FE.ZS",
    "topic": "Social Protection & Labor: Labor force structure",
    "indicator_name": "Labor force with basic education, female (% of female working-age population with basic education)",
    "short_definition": null,
    "long_definition": "The percentage of the working age population with a basic level of education who are in the labor force. Basic education comprises primary education or lower secondary education according to the International Standard Classification of Education 2011 (ISCED 2011).",
    "statistical_concept_and_methodology": null,
    "development_relevance": null
  },
  {
    "indicator_code": "SL.TLF.INTM.FE.ZS",
    "topic": "Social Protection & Labor: Labor force structure",
    "indicator_name": "Labor force with intermediate education, female (% of female working-age population with intermediate education)",
    "short_definition": null,
    "long_definition": "The percentage of the working age population with an intermediate level of education who are in the labor force. Intermediate education comprises upper secondary or post-secondary non tertiary education according to the International Standard Classification of Education 2011 (ISCED 2011).",
    "statistical_concept_and_methodology": null,
    "development_relevance": null
  },
  {
    "indicator_code": "SL.UEM.ADVN.FE.ZS",
    "topic": "Social Protection & Labor: Unemployment",
    "indicator_name": "Unemployment with advanced education, female (% of female labor force with advanced education)",
    "short_definition": null,
    "long_definition": "The percentage of the labor force with an advanced level of education who are unemployed. Advanced education comprises short-cycle tertiary education, a bachelor’s degree or equivalent education level, a master’s degree or equivalent education level, or doctoral degree or equivalent education level according to the International Standard Classification of Education 2011 (ISCED 2011).",
    "statistical_concept_and_methodology": null,
    "development_relevance": null
  },
  {
    "indicator_code": "SL.TLF.ADVN.FE.ZS",
    "topic": "Social Protection & Labor: Labor force structure",
    "indicator_name": "Labor force with advanced education, female (% of female working-age population with advanced education)",
    "short_definition": null,
    "long_definition": "The percentage of the working age population with an advanced level of education who are in the labor force. Advanced education comprises short-cycle tertiary education, a bachelor’s degree or equivalent education level, a master’s degree or equivalent education level, or doctoral degree or equivalent education level according to the International Standard Classification of Education 2011 (ISCED 2011).",
    "statistical_concept_and_methodology": null,
    "development_relevance": null
  },
  {
    "indicator_code": "SE.PRE.TCAQ.FE.ZS",
    "topic": "Education: Inputs",
    "indicator_name": "Trained teachers in preprimary education, female (% of female teachers)",
    "short_definition": null,
    "long_definition": "Trained teachers in preprimary education are the percentage of preprimary school teachers who have received the minimum organized teacher training (pre-service or in-service) required for teaching in a given country.",
    "statistical_concept_and_methodology": "Share of trained teachers is calculated by dividing the number of trained teachers of the specified level of education by total number of teachers at the same level of education, and multiplying by 100. \n\nData on education are collected by the UNESCO Institute for Statistics from official responses to its annual education survey. All the data are mapped to the International Standard Classification of Education (ISCED) to ensure the comparability of education programs at the international level. The current version was formally adopted by UNESCO Member States in 2011. \n\nThe reference years reflect the school year for which the data are presented. In some countries the school year spans two calendar years (for example, from September 2010 to June 2011); in these cases the reference year refers to the year in which the school year ended (2011 in the example).",
    "development_relevance": "Trained teachers refer to teaching force with the necessary pedagogical skills to teach and use teaching materials in an effective manner. The share of trained teachers reveals a country's commitment to investing in the development of its human capital engaged in teaching.\n\nTeachers are important resource, especially for children who are the first-generation of receiving education in their families and heavily rely on teachers in acquiring basic literacy skills. However, rapid increase in enrollments may cause the shortage of trained teachers. Education finance is a key for appropriate teacher allocations, since teacher salaries account for a large share of education budgets. The shortage of trained teacher may result in low qualified teachers in more disadvantaged area."
  },
  {
    "indicator_code": "SE.PRM.TCAQ.FE.ZS",
    "topic": "Education: Inputs",
    "indicator_name": "Trained teachers in primary education, female (% of female teachers)",
    "short_definition": null,
    "long_definition": "Trained teachers in primary education are the percentage of primary school teachers who have received the minimum organized teacher training (pre-service or in-service) required for teaching in a given country.",
    "statistical_concept_and_methodology": "Share of trained teachers is calculated by dividing the number of trained teachers of the specified level of education by total number of teachers at the same level of education, and multiplying by 100. \n\nData on education are collected by the UNESCO Institute for Statistics from official responses to its annual education survey. All the data are mapped to the International Standard Classification of Education (ISCED) to ensure the comparability of education programs at the international level. The current version was formally adopted by UNESCO Member States in 2011. \n\nThe reference years reflect the school year for which the data are presented. In some countries the school year spans two calendar years (for example, from September 2010 to June 2011); in these cases the reference year refers to the year in which the school year ended (2011 in the example).",
    "development_relevance": "Trained teachers refer to teaching force with the necessary pedagogical skills to teach and use teaching materials in an effective manner. The share of trained teachers reveals a country's commitment to investing in the development of its human capital engaged in teaching.\n\nTeachers are important resource, especially for children who are the first-generation of receiving education in their families and heavily rely on teachers in acquiring basic literacy skills. However, rapid increase in enrollments may cause the shortage of trained teachers. Education finance is a key for appropriate teacher allocations, since teacher salaries account for a large share of education budgets. The shortage of trained teacher may result in low qualified teachers in more disadvantaged area."
  },
  {
    "indicator_code": "SE.SEC.TCAQ.FE.ZS",
    "topic": "Education: Inputs",
    "indicator_name": "Trained teachers in secondary education, female (% of female teachers)",
    "short_definition": null,
    "long_definition": "Trained teachers in secondary education are the percentage of secondary school teachers who have received the minimum organized teacher training (pre-service or in-service) required for teaching in a given country.",
    "statistical_concept_and_methodology": "Share of trained teachers is calculated by dividing the number of trained teachers of the specified level of education by total number of teachers at the same level of education, and multiplying by 100. \n\nData on education are collected by the UNESCO Institute for Statistics from official responses to its annual education survey. All the data are mapped to the International Standard Classification of Education (ISCED) to ensure the comparability of education programs at the international level. The current version was formally adopted by UNESCO Member States in 2011. \n\nThe reference years reflect the school year for which the data are presented. In some countries the school year spans two calendar years (for example, from September 2010 to June 2011); in these cases the reference year refers to the year in which the school year ended (2011 in the example).",
    "development_relevance": "Trained teachers refer to teaching force with the necessary pedagogical skills to teach and use teaching materials in an effective manner. The share of trained teachers reveals a country's commitment to investing in the development of its human capital engaged in teaching.\n\nTeachers are important resource, especially for children who are the first-generation of receiving education in their families and heavily rely on teachers in acquiring basic literacy skills. However, rapid increase in enrollments may cause the shortage of trained teachers. Education finance is a key for appropriate teacher allocations, since teacher salaries account for a large share of education budgets. The shortage of trained teacher may result in low qualified teachers in more disadvantaged area."
  },
  {
    "indicator_code": "SE.SEC.TCAQ.LO.FE.ZS",
    "topic": "Education: Inputs",
    "indicator_name": "Trained teachers in lower secondary education, female (% of female teachers)",
    "short_definition": null,
    "long_definition": "Trained teachers in lower secondary education are the percentage of lower secondary school teachers who have received the minimum organized teacher training (pre-service or in-service) required for teaching in a given country.",
    "statistical_concept_and_methodology": "Share of trained teachers is calculated by dividing the number of trained teachers of the specified level of education by total number of teachers at the same level of education, and multiplying by 100. \n\nData on education are collected by the UNESCO Institute for Statistics from official responses to its annual education survey. All the data are mapped to the International Standard Classification of Education (ISCED) to ensure the comparability of education programs at the international level. The current version was formally adopted by UNESCO Member States in 2011. \n\nThe reference years reflect the school year for which the data are presented. In some countries the school year spans two calendar years (for example, from September 2010 to June 2011); in these cases the reference year refers to the year in which the school year ended (2011 in the example).",
    "development_relevance": "Trained teachers refer to teaching force with the necessary pedagogical skills to teach and use teaching materials in an effective manner. The share of trained teachers reveals a country's commitment to investing in the development of its human capital engaged in teaching.\n\nTeachers are important resource, especially for children who are the first-generation of receiving education in their families and heavily rely on teachers in acquiring basic literacy skills. However, rapid increase in enrollments may cause the shortage of trained teachers. Education finance is a key for appropriate teacher allocations, since teacher salaries account for a large share of education budgets. The shortage of trained teacher may result in low qualified teachers in more disadvantaged area."
  }
]

indicator_search FTS 表中的数据总量约为 8 MB。上面的查询应该只获取了大约 70 KiB 的数据。您可以在此处查看其构造方式。

最后,这里更完整地演示了这个系统的实用性 —— 这里有一个交互式图表,显示了一些国家或地区随时间的发展情况,适用于在数据库中的任何指标的任何国家/地区,只要你想:

示例图片
示例图片

注意,许多指标只适用于某些国家,例如,指标“认为丈夫有理由在妻子焚烧食物时殴打妻子的妇女”仅基于在较欠发达国家或地区进行的调查。

扩展: DOM 数据库

既然我们已经在浏览器中运行了一个数据库,为什么不使用一个虚拟表 dom 来使我们的浏览器作为数据库?

SQL 示例

1
2
3
4
select count(*) as number_of_demos from dom
  where selector match '.content div.sqlite-httpvfs-demo';
select count(*) as sqlite_mentions from dom
  where selector match '.content p' and textContent like '%SQLite%';

运行输出

1
2
3
4
5
6
7
8
[
  {
    "number_of_demos": 10
  },
  {
    "sqlite_mentions": 8
  }
]

我们甚至可以将元素直接插入到 DOM 中:

SQL 示例

1
2
3
insert into dom (parent, tagName, textContent)
    select 'ul#outtable1', 'li', short_name
    from wdi_country where currency_unit = 'Euro'

运行之前
运行之前
运行之后
运行之后

并且更新 DOM 中的元素:

SQL 示例

1
2
3
4
5
update dom set textContent =
  get_flag("2-alpha_code") || ' ' || textContent
from wdi_country
where selector match 'ul#outtable1 > li'
  and textContent = wdi_country.short_name

运行之前
运行之前
运行之后
运行之后


当然,文章里的一切都是开源的。sqlite 打包工具的主要实现是 sql.js-httpvfs。这篇博客文章的源代码是一个 pandoc markdown 文件,演示是一个自定义的“受限代码块”React组件

使用 Hugo 构建
主题 StackJimmy 设计