modify scripts
@@ -34,7 +34,23 @@ environment:
PAPERLESS_POST_CONSUME_SCRIPT: "/usr/src/paperless/scripts/parse_filename.py"
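By default, paperless does not delete duplicate files. If the same file is added repeatedly, it is scanned, loaded, and rejected with an error over and over. No configuration option was found, so this is solved by modifying the source code directly: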
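For documents whose PDF text cannot be read directly, paperless starts an OCR scan, and in complex cases runs it twice, which is very slow and resource-intensive. This can only be solved by modifying the source code: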
/usr/src/paperless/src/paperless_tesseract/parsers.py :
# force skip ocr process.
if not original_has_text:
    original_has_text = True
    text_original = "this is default content, as we skipped ocr process..."
    self.log.warning("Cannot read text from Document, use default message.")

if skip_archive_for_text and original_has_text:
    self.log.debug("Document has text, skipping OCRmyPDF entirely.")
    self.text = text_original
    return
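
The patched branch can be read in isolation as the sketch below. It is not the paperless source: `choose_text`, `PLACEHOLDER_TEXT`, and the logger name are hypothetical names introduced for illustration, and the way `original_has_text` is computed here (any non-blank text) is an assumption. Only `original_has_text`, `text_original`, `skip_archive_for_text`, and the two log messages come from the snippet.

```python
import logging
from typing import Optional

log = logging.getLogger("paperless.parsing.sketch")  # hypothetical logger name

# Placeholder stored instead of OCR output (string taken from the patch).
PLACEHOLDER_TEXT = "this is default content, as we skipped ocr process..."


def choose_text(text_original: Optional[str], skip_archive_for_text: bool) -> Optional[str]:
    """Return the text to store, or None if the OCR path would still run.

    Hypothetical helper mirroring the patched branch; not a paperless API.
    """
    # Assumption: "has text" means any non-blank extracted text.
    original_has_text = bool(text_original and text_original.strip())

    # Force-skip OCR: treat the document as if it had text and use a placeholder.
    if not original_has_text:
        original_has_text = True
        text_original = PLACEHOLDER_TEXT
        log.warning("Cannot read text from Document, use default message.")

    if skip_archive_for_text and original_has_text:
        log.debug("Document has text, skipping OCRmyPDF entirely.")
        return text_original

    return None  # the caller would fall through to the OCR path
```

In this sketch the early return still depends on `skip_archive_for_text`: when it is false, the function returns `None` even though the placeholder was set, so the patch only avoids OCR if that flag is enabled.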
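By default, paperless does not delete duplicate files. If the same file is added repeatedly, it is scanned, loaded, and rejected with an error over and over. No configuration option was found, so this is solved by modifying the source code directly: (A configuration option already exists; see docker-compose.yml.)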
/usr/src/paperless/src/documents/consumer.py
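
The duplicate handling described above can be illustrated with a minimal sketch. It is an assumption-laden example, not the code in `consumer.py`: `file_checksum`, `consume`, the in-memory `seen` set, and the choice of SHA-256 are all hypothetical stand-ins for whatever paperless does internally.

```python
import hashlib
from pathlib import Path
from typing import Set


def file_checksum(path: Path) -> str:
    """Hash the file contents so identical files map to the same key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def consume(path: Path, seen: Set[str], delete_duplicates: bool = True) -> bool:
    """Return True if the file is new; optionally delete it if it is a duplicate."""
    digest = file_checksum(path)
    if digest in seen:
        if delete_duplicates:
            path.unlink()  # remove the duplicate so it is not picked up again
        return False
    seen.add(digest)
    return True
```

Deleting the duplicate is what stops the repeated scan-and-fail cycle: once the file is gone from the watched directory, nothing is left to trigger another attempt.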