Postgres synchronization solution P&L: -8 (≃ -678 CNY)
Data synchronization is a challenging computer problem, but I have an idea.
I plan to use an inefficient rolling hash and sorting to solve the synchronization problem. This is similar to rsync.
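For reference, the sketch below shows the weak rolling checksum that rsync uses, where sliding a window one byte costs O(1) instead of rehashing the whole block. It assumes byte-oriented input and rsync's modulus of 2^16. It illustrates the rsync technique and is not the hash described in this note; the function names are hypothetical.

```python
M = 1 << 16  # rsync's weak checksum works modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the (a, b) pair for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte is weighted n, the last byte 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide a length-n window one byte to the right in O(1)."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def rolling_checksums(data: bytes, n: int) -> list[int]:
    """32-bit checksum of every length-n window of data, in O(len(data)) total."""
    if len(data) < n:
        return []
    a, b = weak_checksum(data[:n])
    sums = [a | (b << 16)]
    for k in range(len(data) - n):
        # Drop data[k] from the left edge, add data[k + n] at the right edge.
        a, b = roll(a, b, data[k], data[k + n], n)
        sums.append(a | (b << 16))
    return sums
```

In rsync, a match on this cheap checksum is only a candidate; it is then confirmed with a strong hash before a block is treated as identical.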
I implemented a chained hash: each step takes the previous data, the current column and the previous hash, so the final value is a hash of the entire database.
Once I write the synchronizer part, this should let us synchronize with minimal data transfer: the synchronizer rehashes all of its own data, then retrieves the other side's hashes over the sorted data, binary-searching for where the two sides diverge.
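A minimal sketch of the chained hash plus binary search, assuming both sides sort rows by the same key and serialize them identically (`repr()` stands in for that serialization). Because a chained hash stays different once two sequences diverge, "prefix i agrees" is monotone in i and can be binary-searched. `remote_hash_at` is a hypothetical stand-in for a network request to the other side.

```python
import hashlib


def chain_hashes(rows):
    """Return prefix hashes: hashes[i] covers rows[0:i] (hashes[0] is the empty chain)."""
    h = hashlib.sha256(b"").digest()
    hashes = [h]
    for row in rows:
        # Each step hashes the previous hash together with the current row.
        # repr() stands in for a canonical row serialization (assumption).
        h = hashlib.sha256(h + repr(row).encode("utf-8")).digest()
        hashes.append(h)
    return hashes


def first_difference(local_hashes, remote_hash_at, remote_rows):
    """Return how many leading rows agree on both sides.

    remote_hash_at(i) is a hypothetical call that fetches the remote side's
    prefix hash over its first i sorted rows; each probe is one round trip.
    """
    lo = 0  # the empty prefix always agrees
    hi = min(len(local_hashes) - 1, remote_rows)
    if local_hashes[hi] == remote_hash_at(hi):
        return hi  # the shorter side is a prefix of the longer (or both are equal)
    # Invariant: prefix lo agrees, prefix hi differs.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if local_hashes[mid] == remote_hash_at(mid):
            lo = mid
        else:
            hi = mid
    return lo


local = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]
remote = [(1, "alice"), (2, "bobby"), (3, "carol"), (4, "dave")]
rh = chain_hashes(remote)
print(first_difference(chain_hashes(local), lambda i: rh[i], len(remote)))  # 1
```

This locates only the first divergent row in O(log n) round trips; finding every changed row means repeating the search after that point, or hashing ranges in a tree (a Merkle tree) instead of a single chain.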
I have an idea for solving the "winning" copy problem.
Keep a separate table that hashes every row and column field and assigns each one a version.
This version is what gets compared.
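A minimal in-memory sketch of that separate table, assuming it is keyed by (primary key, column), with column `None` holding the whole-row hash. `VersionTable`, `observe` and `winner` are hypothetical names; in practice this would be a real Postgres table kept outside the application schema.

```python
import hashlib


def field_hash(value) -> str:
    # repr() stands in for a canonical serialization of a column value (assumption).
    return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()


class VersionTable:
    """Hypothetical side table mapping (pk, column) -> (hash, version).

    column None holds the hash of the whole row. Because this lives in its own
    table, the application's tables gain no new columns.
    """

    def __init__(self):
        self.entries = {}

    def observe(self, pk, row: dict) -> None:
        """Record a row's current values, bumping the version of anything that changed."""
        fields = list(row.items()) + [(None, sorted(row.items()))]
        for column, value in fields:
            h = field_hash(value)
            old = self.entries.get((pk, column))
            if old is None:
                self.entries[(pk, column)] = (h, 1)
            elif old[0] != h:
                self.entries[(pk, column)] = (h, old[1] + 1)

    def version(self, pk, column=None) -> int:
        return self.entries.get((pk, column), (None, 0))[1]


def winner(local: VersionTable, remote: VersionTable, pk, column=None) -> str:
    """Compare versions: the higher one wins; equal versions are a tie."""
    lv, rv = local.version(pk, column), remote.version(pk, column)
    if lv > rv:
        return "local"
    if rv > lv:
        return "remote"
    return "tie"
```

Each side's counter only advances when the sync job observes it, so several edits between runs count as one, and two sides that each changed a field once will tie; ties still need a rule.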
My Vagrant setup uses persistent disks and uses Ansible to deploy the cron job and sync script, configured by a YAML file. I've also installed psycopg2 and found documentation on how to list the tables in a Postgres database. Now it's just a matter of writing the sync algorithm.
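A sketch of listing the tables through `information_schema`, together with each table's primary-key columns, since sorting rows by primary key is one way (an assumption here) to give both sides the same order before hashing. It accepts any psycopg2 connection; the `public` schema and the DSN in the usage comment are placeholders.

```python
TABLES_SQL = """
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = %s AND table_type = 'BASE TABLE'
    ORDER BY table_name
"""

PK_SQL = """
    SELECT kcu.column_name
    FROM information_schema.table_constraints AS tc
    JOIN information_schema.key_column_usage AS kcu
      ON tc.constraint_name = kcu.constraint_name
     AND tc.table_schema = kcu.table_schema
     AND tc.table_name = kcu.table_name
    WHERE tc.constraint_type = 'PRIMARY KEY'
      AND tc.table_schema = %s
      AND tc.table_name = %s
    ORDER BY kcu.ordinal_position
"""


def describe_tables(conn, schema="public"):
    """Map each base table in `schema` to its primary-key columns, in key order."""
    with conn.cursor() as cur:  # psycopg2 cursors close on leaving the block
        cur.execute(TABLES_SQL, (schema,))
        tables = [name for (name,) in cur.fetchall()]
        result = {}
        for table in tables:
            cur.execute(PK_SQL, (schema, table))
            result[table] = [col for (col,) in cur.fetchall()]
    return result


# Usage (placeholder DSN):
# import psycopg2
# conn = psycopg2.connect("dbname=app user=sync")
# print(describe_tables(conn))
```

Tables without a primary key come back with an empty list and would need some other stable ordering before they can be hashed.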
My problem is detecting which side is the winning copy.
When one side changes the data, its hash will differ and the changed rows can be detected. This part I understand.
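A minimal sketch of that detection step, assuming each side can produce a map from primary key to row hash (`row_hashes` and `diff_rows` are hypothetical helpers, and `repr()` stands in for a canonical row serialization).

```python
import hashlib


def row_hashes(rows, pk_index=0):
    """Map primary key -> hash of the full row (repr() serialization is an assumption)."""
    return {
        row[pk_index]: hashlib.sha256(repr(row).encode("utf-8")).hexdigest()
        for row in rows
    }


def diff_rows(local, remote):
    """Split primary keys into only-local, only-remote, and present-but-different."""
    only_local = sorted(local.keys() - remote.keys())
    only_remote = sorted(remote.keys() - local.keys())
    changed = sorted(pk for pk in local.keys() & remote.keys() if local[pk] != remote[pk])
    return only_local, only_remote, changed


a = row_hashes([(1, "alice"), (2, "bob"), (3, "carol")])
b = row_hashes([(1, "alice"), (2, "bobby"), (4, "dave")])
print(diff_rows(a, b))  # ([3], [4], [2])
```

This says what differs, not who changed it: key 3 could be a local insert or a remote delete, and key 2 could have been edited on either side.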
The problem is detecting which side has the latest change and should therefore win. I might need to introduce a version column.
If I had a last-updated timestamp field, I could use that, or a version column, but I am expressly trying to avoid adding new columns to the schema, which makes this much harder.
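One possible way to pick a winner without touching the schema, sketched here as an alternative and not something this note settles on: store each row's hash in the separate side table at the end of every successful sync, then compare both sides against that stored base (the three-way comparison two-way file synchronizers commonly use). `resolve` and `Action` are hypothetical names; `None` means the row is absent.

```python
from enum import Enum


class Action(Enum):
    NONE = "in sync"
    PUSH = "local changed: copy local -> remote"
    PULL = "remote changed: copy remote -> local"
    CONFLICT = "both changed since last sync"


def resolve(base, local, remote):
    """Decide the winner for one row from three hashes.

    base:   row hash recorded at the last successful sync (None if never seen)
    local:  current local row hash (None if the row is absent locally)
    remote: current remote row hash (None if absent remotely)
    """
    if local == remote:
        return Action.NONE  # unchanged, or both sides made the same change
    if local == base:
        return Action.PULL  # only the remote side moved away from the base
    if remote == base:
        return Action.PUSH  # only the local side moved away from the base
    return Action.CONFLICT
```

Inserts and deletes fall out of the same rule (a PUSH of `None` means delete on the remote). A CONFLICT still needs a policy, such as always preferring one designated side, because without timestamps there is no way to tell which edit came later.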