How does the fulltext processing work?
After a document is stored in a file cabinet with fulltext support enabled, the fulltext for this document is created as follows:
Data is entered into the TASK_PROCESS table (DocuWare Version 6.0 - 6.6) or the DWTASKS table (DocuWare Version 6.7 - 7)
DocuWare Version 6.0 - 6.6: Data is entered into the TASK_PROCESS table
All document sections (parts of a document) that have to be indexed for fulltext are first entered into the TASK_PROCESS table in the dwdata database with task type 0.
Overview of STATUS (fulltext status) in the TASK_PROCESS table:
TASK 0: Create fulltext for one document section (the document can be identified through the SECTIONID in the file_cabinet_SECT table)
TASK 1: Upgrade task from an older DocuWare version
TASK 2: Delete fulltext data
TASK 3: Create new or reset fulltext data for a whole file cabinet (the file cabinet can be identified through the FCGUID)
DocuWare Version 6.7 - 7: Data is entered into the DWTASKS table
All document sections (parts of a document) that have to be indexed for fulltext are first entered into the DWTASKS table in the dwsystem database with TASK_TYPE 0 or TASK_TYPE 2.
Overview of STATUS (fulltext status) in the DWTASKS table:
TASK_TYPE 0: OCR is performed for the document
TASK_TYPE 1: The textshot for Intelligent Indexing is created
TASK_TYPE 2: Fulltext information is processed for the document
TASK_TYPE 3: When a document is deleted, its fulltext information is deleted from SOLR
TASK_TYPE 4: Upgrade task from an older DocuWare version (if this task occurs in a DocuWare version later than 6.5, always delete it manually, because it is no longer needed and can cause problems. Make a backup of the dwsystem database first!)
TASK_TYPE 5: Reset of the fulltext information in a file cabinet
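The two task overviews above can be written down as a small lookup, which may help when reading raw values from the task tables. This is only an illustrative sketch: the enum member names and the helper task_table_for are hypothetical and not part of DocuWare. The numeric values, table names, and database names are taken from the lists above.

```python
from enum import IntEnum
from typing import Tuple


class TaskProcessType(IntEnum):
    """Task types in TASK_PROCESS (DocuWare 6.0 - 6.6). Member names are made up."""
    CREATE_SECTION_FULLTEXT = 0
    UPGRADE = 1
    DELETE_FULLTEXT = 2
    RESET_FILE_CABINET = 3


class DwTaskType(IntEnum):
    """TASK_TYPE values in DWTASKS (DocuWare 6.7 - 7). Member names are made up."""
    OCR = 0
    INTELLIGENT_INDEXING_TEXTSHOT = 1
    FULLTEXT = 2
    DELETE_FROM_SOLR = 3
    UPGRADE = 4
    RESET_FILE_CABINET = 5


def task_table_for(major: int, minor: int) -> Tuple[str, str]:
    """Return (table, database) that holds the fulltext tasks for a version.

    Hypothetical helper: only the version ranges named in this article
    (6.0 - 6.6 and 6.7 - 7) are handled.
    """
    version = (major, minor)
    if version < (6, 0) or major > 7:
        raise ValueError("version is not covered by this article")
    if version <= (6, 6):
        return ("TASK_PROCESS", "dwdata")
    return ("DWTASKS", "dwsystem")


if __name__ == "__main__":
    print(task_table_for(6, 5))
    print(task_table_for(7, 0))
    print(DwTaskType(4).name)
```

Keeping one enum per table avoids mixing up the numbering, since the same number means different things in TASK_PROCESS and DWTASKS (for example, 2 is a delete task in one and fulltext processing in the other).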
Data is written into the following tables:
- _PAGE and _SECT tables: The document sections are processed further. The corresponding data records are deleted from the TASK_PROCESS or DWTASKS table, and the created information is written into the corresponding _SECT and _PAGE tables.
- _SECT table: During fulltext processing, all documents with their sections are written into the file_cabinet_SECT table. A document can have one or more sections. An email with several attachments, for instance, has several sections. In addition, a section can have several document pages (for instance, a PDF with several pages).
- file_cabinet_PAGE table: The document pages are first written into the file_cabinet_PAGE table with FTSTATUS 0. Then the textshot is created (FTSTATUS 1) and the fulltext is transferred to SOLR (FTSTATUS 3).
Overview of FTSTATUS in the _PAGE table:
0 = New
1 = Textshot successfully created
2 = Error during creation of the Textshot
3 = Textshot successfully transferred to SOLR
4 = Error during transfer to SOLR
A document can be found via the fulltext search only after its fulltext data has been completely transferred to the fulltext index path and SOLR.
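As a rough illustration of how the FTSTATUS values of a document's pages could be interpreted, the sketch below classifies a list of page statuses. The enum member names and the function summarize_fulltext_state are hypothetical. The sketch also assumes that "completely transferred" means every page has FTSTATUS 3. It only looks at FTSTATUS and cannot verify the fulltext index path.

```python
from enum import IntEnum
from typing import Iterable


class FtStatus(IntEnum):
    """FTSTATUS values of the _PAGE table. Member names are made up."""
    NEW = 0
    TEXTSHOT_CREATED = 1
    TEXTSHOT_ERROR = 2
    TRANSFERRED_TO_SOLR = 3
    SOLR_TRANSFER_ERROR = 4


ERROR_STATES = (FtStatus.TEXTSHOT_ERROR, FtStatus.SOLR_TRANSFER_ERROR)


def summarize_fulltext_state(page_statuses: Iterable[int]) -> str:
    """Classify a document from the FTSTATUS values of all its pages.

    Assumption: the document counts as searchable only when every page
    has reached FTSTATUS 3.
    """
    statuses = [FtStatus(value) for value in page_statuses]
    if not statuses:
        return "no pages"
    if any(status in ERROR_STATES for status in statuses):
        return "error"
    if all(status == FtStatus.TRANSFERRED_TO_SOLR for status in statuses):
        return "searchable"
    return "in progress"


if __name__ == "__main__":
    print(summarize_fulltext_state([3, 3, 3]))
    print(summarize_fulltext_state([0, 1, 3]))
    print(summarize_fulltext_state([3, 4]))
```

Errors are reported before anything else so that a single failed page is not hidden by other pages that were transferred successfully.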
Check the fulltext status of a document: KBA-35310
Fulltext and SOLR: KBA-35311
Check Fulltext textshot of a document: KBA-34944