modify scripts

2025-07-12 13:59:28 +08:00
parent 96790a8365
commit 83d0745695
5 changed files with 436 additions and 0 deletions

@@ -0,0 +1,63 @@
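The files I have provided are the key tables of paperless's SQLite database. We now want to write its PAPERLESS_POST_CONSUME_SCRIPT. The requirements are as follows:
1. The PDF files we provide are named {publish_date}_{report_type}_{org_sname}_{industry_name}_{stock_name}_{title}.pdf
2. We extract each of the fields above, and then:
   1) report_type maps to documents_documenttype.name, so we query the documents_documenttype table; if no record with that name exists, we insert one, then obtain the corresponding documents_documenttype.id
   2) org_sname maps to documents_correspondent.name, so we query the documents_correspondent table; if no record with that name exists, we insert one, then obtain the corresponding documents_correspondent.id
   3) Check whether the documents_customfield table contains the fields '行业' (industry) and '股票名称' (stock name); create whichever does not exist. Look up their documents_customfield.id values and call them industry_id and stockname_id respectively
3. We then update the tables:
   1) Update the corresponding record in documents_document: created = publish_date, correspondent_id = documents_correspondent.id, document_type_id = documents_documenttype.id, title = {title}
   2) Insert two records into documents_customfieldinstance: (document_id, stockname_id, stock_name) and (document_id, industry_id, industry_name)
Please write this Python script according to the requirements above. Take care to handle exceptional cases and to write log output. If a filename does not match the format above, ignore it and do nothing.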
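As a starting point for the filename-matching step, here is a minimal sketch of the parsing logic. It is not the finished post-consume script. The names `ReportFields`, `FILENAME_RE` and `parse_report_filename` are hypothetical. The sketch assumes that publish_date is written as YYYY-MM-DD (the requirements do not state the date format) and that only the title may contain underscores.

```python
import re
from datetime import date, datetime
from pathlib import Path
from typing import NamedTuple, Optional


class ReportFields(NamedTuple):
    """Fields encoded in the PDF filename (hypothetical container)."""
    publish_date: date
    report_type: str
    org_sname: str
    industry_name: str
    stock_name: str
    title: str


# Assumption: the date is YYYY-MM-DD and the first five fields contain no
# underscore; the title takes everything that remains before ".pdf".
FILENAME_RE = re.compile(
    r"^(?P<publish_date>\d{4}-\d{2}-\d{2})_"
    r"(?P<report_type>[^_]+)_"
    r"(?P<org_sname>[^_]+)_"
    r"(?P<industry_name>[^_]+)_"
    r"(?P<stock_name>[^_]+)_"
    r"(?P<title>.+)\.pdf$",
    re.IGNORECASE,
)


def parse_report_filename(filename: str) -> Optional[ReportFields]:
    """Return the parsed fields, or None when the name does not match."""
    match = FILENAME_RE.match(Path(filename).name)
    if match is None:
        return None
    try:
        publish_date = datetime.strptime(match["publish_date"], "%Y-%m-%d").date()
    except ValueError:
        # Digits in the right shape but not a real calendar date.
        return None
    return ReportFields(
        publish_date=publish_date,
        report_type=match["report_type"],
        org_sname=match["org_sname"],
        industry_name=match["industry_name"],
        stock_name=match["stock_name"],
        title=match["title"],
    )
```

Returning None for a non-matching name lets the caller log the skip and exit without touching the database, which is the behaviour the requirements ask for.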
Paperless makes use of the Django REST Framework standard API interface. It provides a browsable API for most of its endpoints, which you can inspect at http://<paperless-host>:<port>/api/. This also documents most of the available filters and ordering fields.
The API provides the following main endpoints:
/api/correspondents/: Full CRUD support.
/api/custom_fields/: Full CRUD support.
/api/documents/: Full CRUD support, except POSTing new documents. See below.
/api/document_types/: Full CRUD support.
/api/groups/: Full CRUD support.
/api/logs/: Read-Only.
/api/mail_accounts/: Full CRUD support.
/api/mail_rules/: Full CRUD support.
/api/profile/: GET, PATCH
/api/share_links/: Full CRUD support.
/api/storage_paths/: Full CRUD support.
/api/tags/: Full CRUD support.
/api/tasks/: Read-only.
/api/users/: Full CRUD support.
/api/workflows/: Full CRUD support.
/api/search/: GET, see below.
All of these endpoints except for the logging endpoint allow you to fetch (and edit and delete where appropriate) individual objects by appending their primary key to the path, e.g. /api/documents/454/.
The objects served by the document endpoint contain the following fields:
id: ID of the document. Read-only.
title: Title of the document.
content: Plain text content of the document.
tags: List of IDs of tags assigned to this document, or empty list.
document_type: Document type of this document, or null.
correspondent: Correspondent of this document or null.
created: The date time at which this document was created.
created_date: The date (YYYY-MM-DD) at which this document was created. Optional. If also passed with created, this is ignored.
modified: The date at which this document was last edited in paperless. Read-only.
added: The date at which this document was added to paperless. Read-only.
archive_serial_number: The identifier of this document in a physical document archive.
original_file_name: Verbose filename of the original document. Read-only.
archived_file_name: Verbose filename of the archived document. Read-only. Null if no archived document is available.
notes: Array of notes associated with the document.
page_count: Number of pages.
set_permissions: Allows setting document permissions. Optional, write-only. See below.
custom_fields: Array of custom fields & values, specified as { field: CUSTOM_FIELD_ID, value: VALUE }
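To illustrate how the writable fields listed above could be combined into one update body, here is a small sketch. The function name `build_document_patch` and its parameters are hypothetical. It uses `created_date` because the source value is a plain date, and it formats `custom_fields` as the `{ field, value }` objects described in the list.

```python
import json
from datetime import date
from typing import Any, Dict


def build_document_patch(
    title: str,
    created: date,
    correspondent_id: int,
    document_type_id: int,
    custom_field_values: Dict[int, Any],
) -> Dict[str, Any]:
    """Build a JSON-serialisable body for updating one document.

    custom_field_values maps a custom field ID to the value to store.
    """
    return {
        "title": title,
        # created_date takes a YYYY-MM-DD string, per the field list.
        "created_date": created.isoformat(),
        "correspondent": correspondent_id,
        "document_type": document_type_id,
        "custom_fields": [
            {"field": field_id, "value": value}
            for field_id, value in custom_field_values.items()
        ],
    }


# Example with made-up IDs: 3 = correspondent, 5 = document type,
# 7 and 8 = custom fields.
payload = build_document_patch("Q2 outlook", date(2025, 7, 1), 3, 5, {7: "Acme", 8: "Banking"})
body = json.dumps(payload, ensure_ascii=False)
```

The IDs in the example are placeholders; real values would come from the correspondent, document type and custom field endpoints.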
The above is the API that paperless provides. We currently access it at http://localhost:8000. How should I write the Python code to query and update the document with ID 19?
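One possible shape for the query and update, sketched with only the standard library. Several things here are assumptions rather than facts from the excerpt above: the token-based `Authorization: Token ...` header, the placeholder value of `API_TOKEN`, and all helper names (`document_url`, `build_request`, `send`, `get_document`, `update_document`, `main`). The document endpoint and the object-by-primary-key URL pattern do come from the excerpt.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://localhost:8000"
API_TOKEN = "replace-with-your-token"  # placeholder, not a real credential


def document_url(doc_id: int) -> str:
    """URL of a single document, e.g. /api/documents/19/."""
    return f"{BASE_URL}/api/documents/{doc_id}/"


def build_request(
    method: str, url: str, payload: Optional[Dict[str, Any]] = None
) -> urllib.request.Request:
    """Prepare a request; nothing is sent until send() is called."""
    # Assumption: the instance accepts token authentication.
    headers = {"Authorization": f"Token {API_TOKEN}", "Accept": "application/json"}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def get_document(doc_id: int) -> Dict[str, Any]:
    return send(build_request("GET", document_url(doc_id)))


def update_document(doc_id: int, changes: Dict[str, Any]) -> Dict[str, Any]:
    # PATCH sends only the fields that should change.
    return send(build_request("PATCH", document_url(doc_id), changes))


def main() -> None:
    doc = get_document(19)
    print(doc["id"], doc["title"], doc["created"])
    updated = update_document(19, {"title": "New title"})
    print(updated["title"])
```

Calling `main()` requires a running instance at `BASE_URL` and a valid token. PATCH is used rather than PUT so that fields not mentioned in `changes` are left alone.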