문제 설명
SQL 테이블에서 누락된 날짜를 채우는 방법 (How to fill the missing date in a sql table)
CREATE TABLE IF NOT EXISTS date_test1 ( items CHAR ( 8 ), trade_date date );
INSERT INTO `date_test1` VALUES ( 'a', '2020‑03‑20');
INSERT INTO `date_test1` VALUES ( 'b', '2020‑03‑20');
INSERT INTO `date_test1` VALUES ('a', '2020‑03‑21');
INSERT INTO `date_test1` VALUES ( 'c', '2020‑03‑22');
INSERT INTO `date_test1` VALUES ( 'd', '2020‑03‑22');
INSERT INTO `date_test1` VALUES ('a', '2020‑03‑25');
INSERT INTO `date_test1` VALUES ( 'e', '2020‑03‑26');
이 표에서 '2020‑03‑23' 및 '2020‑03 ‑24'가 빠졌습니다. 이 테이블의 '2020‑03‑22' 데이터인 이전 데이터로 채우고 싶습니다. 예상 결과:
연속 누락된 날짜의 수와 하루의 기록 수는 모두 불확실합니다. mysql에서 이것을 어떻게 합니까?
참조 솔루션
방법 1:
This solution uses Python and assumes that there aren't so many rows that they cannot be read into memory. I do not warrant this code free from defects; use at your own risk. So I suggest you run this against a copy of your table or make a backup first.
This code uses the pymysql
driver.
import pymysql
from datetime import date, timedelta
from itertools import groupby
import sys
conn = pymysql.connect(db='x', user='x', password='x', charset='utf8mb4', use_unicode=True)
cursor = conn.cursor()
# must be sorted by date:
cursor.execute('select items, trade_date from date_test1 order by trade_date, items')
rows = cursor.fetchall() # tuples: (datetime.date, str)
if len(rows) == 0:
sys.exit(0)
groups = []
for k, g in groupby(rows, key=lambda row: row[1]):
groups.append(list(g))
one_day = timedelta(days=1)
previous_group = groups.pop(0)
next_date = previous_group[0][1]
for group in groups:
next_date = next_date + one_day
while group[0][1] != next_date:
# missing date
for tuple in previous_group:
cursor.execute('insert into date_test1(items, trade_date) values(%s, %s)', (tuple[0], next_date))
print('inserting', tuple[0], next_date)
conn.commit()
next_date = next_date + one_day
previous_group = group
Prints:
inserting c 2020‑03‑23
inserting d 2020‑03‑23
inserting c 2020‑03‑24
inserting d 2020‑03‑24
Discussion
With your sample data, after the rows are fetched, rows
is:
(('a', datetime.date(2020, 3, 20)), ('b', datetime.date(2020, 3, 20)), ('a', datetime.date(2020, 3, 21)), ('c', datetime.date(2020, 3, 22)), ('d', datetime.date(2020, 3, 22)), ('a', datetime.date(2020, 3, 25)), ('e', datetime.date(2020, 3, 26)))
After the following is run:
groups = []
for k, g in groupby(rows, key=lambda row: row[1]):
groups.append(list(g))
groups
is:
[[('a', datetime.date(2020, 3, 20)), ('b', datetime.date(2020, 3, 20))], [('a', datetime.date(2020, 3, 21))], [('c', datetime.date(2020, 3, 22)), ('d', datetime.date(2020, 3, 22))], [('a', datetime.date(2020, 3, 25))], [('e', datetime.date(2020, 3, 26))]]
That is, all the tuples with the same date are grouped together in a list so it becomes to easier to detect missing dates.