https://bugs.kde.org/show_bug.cgi?id=479180
            Bug ID: 479180
           Summary: baloo sometimes fails to index content of new files;
                    'Mzerosize'
    Classification: Frameworks and Libraries
           Product: frameworks-baloo
           Version: 5.111.0
          Platform: Fedora RPMs
                OS: Linux
            Status: REPORTED
          Severity: major
          Priority: NOR
         Component: Baloo File Daemon
          Assignee: baloo-bugs-n...@kde.org
          Reporter: skierp...@gmail.com
  Target Milestone: ---

SUMMARY
In https://discuss.kde.org/t/how-do-i-troubleshoot-baloo/2830/3? , a user
reported Baloo did not index a .flac file as a music file, commenting
> I found that balooshow -x <file> exists. When I run this for files that are
> shown in Elisa I get the line Property Terms: Maudio Mflac T2 whereas
> the files that aren’t showing in Elisa have the line Property Terms:
> Mapplication Moctet Mstream.

I have not experienced that, but I have noticed that baloo sometimes does not
index the contents of text files. A few times after creating a text file (on
both a btrfs file system and an NTFS partition mounted in Linux with the
ntfs-3g FUSE file system), I noticed that its contents were not indexed, and
`balooshow -x <file>` showed
  XAttr Terms: 
  Plain Text Terms: 
  Property Terms: Mapplication Mx Mzerosize
i.e. no words in the file were indexed, and note the Mzerosize. The latter
comes from baloo/src/file/basicindexingjob.cpp when filePathToStat() believes
statBuf.st_size == 0. But the file is definitely non-zero length and baloo
should have indexed its words.

These two cases may be unrelated, but in both it seems that baloo sometimes
indexes a file when its contents aren't fully present. And in the second case,
it seems `balooctl index <file>` fails to fix the problem and index the file
contents; you have to `balooctl clear <file>` first.
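
One way the "contents aren't fully present" situation could arise (an
assumption, not confirmed against the baloo source): a shell redirection
creates the target file with zero length before the command writes anything,
so a stat() at that instant sees st_size == 0. A minimal sketch, where the
temporary file and the deliberately slow writer are hypothetical stand-ins for
`wl-paste > /path/to/file.txt`:

```shell
# Illustration only: the temporary file and the delayed printf are stand-ins
# for `wl-paste > /path/to/file.txt`; nothing here talks to baloo.
dir=$(mktemp -d)
f="$dir/file.txt"

# The redirection creates "$f" with length 0 before the subshell's
# delayed printf writes anything into it.
( sleep 1; printf 'flamablama\n' ) > "$f" &

# Wait until the file exists, then measure it while the writer is still asleep.
while [ ! -e "$f" ]; do sleep 0.1; done
early_size=$(wc -c < "$f" | tr -d ' ')

wait    # let the writer finish
final_size=$(wc -c < "$f" | tr -d ' ')

echo "size just after creation: $early_size"       # 0
echo "size after the writer finished: $final_size" # 11
rm -rf "$dir"
```

If a file watcher reacts to the creation event rather than to the close of the
file, it could plausibly record the zero-length state shown by the first
measurement.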

STEPS TO REPRODUCE
0. Run `balooctl monitor` in a second terminal window
1. Somewhere in a GUI, enter a unique word like "flamablama", and copy it.
2. Paste it into a new text file from the terminal. I used the Wayland
command-line utility wl-paste: `wl-paste > /path/to/file.txt`
3. Run the terminal command `balooshow -x /path/to/file.txt`
4. Run the terminal command `baloosearch flamablama`
5. Run the terminal command `balooctl index /path/to/file.txt`
6. Repeat steps 3 and 4.
7. Repeat the steps but instead create the text file in a text editor like vim.

OBSERVED RESULT
Sometimes, balooctl monitor shows
  Indexing new files
  Idle
without displaying "Indexing: /path/to/file.txt: Ok", and balooshow output
includes
  Internal Info
  File Name Terms: Ffile Ftxt 
  XAttr Terms: 
  Plain Text Terms: 
  Property Terms: Mapplication Mx Mzerosize
and baloosearch does not return the new file.
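
To check other files for the same symptom, the comparison can be scripted. This
is a rough sketch that only relies on the `balooshow -x` output shown in this
report; the function name and the BALOOSHOW override variable are hypothetical
and exist only to make the helper testable without baloo:

```shell
# Hypothetical helper; BALOOSHOW is an assumed override hook, not a baloo feature.
BALOOSHOW=${BALOOSHOW:-balooshow}

# Succeeds (exit status 0) when the file is non-empty on disk but the
# "Property Terms:" line reported by `balooshow -x` still contains Mzerosize.
has_stale_zerosize() {
    [ -s "$1" ] || return 1
    "$BALOOSHOW" -x "$1" 2>/dev/null | grep 'Property Terms:' | grep -q 'Mzerosize'
}
```

Usage: `has_stale_zerosize /path/to/file.txt && echo "affected"`.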

When indexing fails in this way, the manual forced indexing step,
`balooctl index /path/to/file.txt`, prints
  Indexing /path/to/file.txt
  File(s) indexed
but the file's contents remain unknown to baloo. The second `balooshow -x
/path/to/file.txt` prints slightly different metadata:
  Plain Text Terms: 
  Property Terms: Mplain Mtext T5 T8
The meta attribute Mzerosize is gone and baloo detected the mime type correctly
(now text/plain, not application/x-zerosize), but baloo still did not index the
file contents.

When I create new files in the `vim` file editor, baloo seems to reliably index
their contents.

EXPECTED RESULT
Baloo should reliably index new files.
`balooctl index` should actually index a file's current contents, even if you
don't clear it from the index first.

SOFTWARE/OS VERSIONS
Linux/KDE Plasma:
KDE Plasma Version: 5.27.0
KDE Frameworks Version: 5.111.0
Qt Version: 5.15.11 on Wayland

ADDITIONAL INFORMATION
_IF_ you notice this, the workaround is to run `balooctl clear
/path/to/file.txt` and then `balooctl index /path/to/file.txt`.
I turned on kf.baloo and kf.filemetadata debug output and did not see anything
useful in `journalctl` output.
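
For several affected files, the clear-then-index workaround can be wrapped in a
small shell function. This is only a sketch: the function name and the BALOOCTL
override variable are hypothetical, and it uses nothing beyond the two
`balooctl` subcommands mentioned above.

```shell
# Hypothetical wrapper; BALOOCTL is an assumed override hook, not a baloo feature.
BALOOCTL=${BALOOCTL:-balooctl}

# Clear each given file from the index, then index it again.
# Stops and returns 1 at the first command that fails.
baloo_reindex() {
    for f in "$@"; do
        "$BALOOCTL" clear "$f" && "$BALOOCTL" index "$f" || return 1
    done
}
```

Usage: `baloo_reindex /path/to/file.txt /path/to/other.txt`.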

I don't know if this is a file system issue; maybe Qt's filePathToStat() is
caching file info.
