Re: Nepomuk in 4.13 and beyond

Vishesh Handa Thu, 19 Dec 2013 05:50:14 -0800

On Wednesday 18 Dec 2013 12:09:02 François K. wrote:
> Hi Vishesh, hi guys,
> 
> I'm sorry to short-circuit the thread. I deleted Vishesh's original email by
> mistake...
> 
> Well, that sounds really exciting ! Thanks again for your work.
> 
> Here are a few thoughts/questions I have since you've made the announcement.
> They might be a bit technical, I hope that's not a problem (should I start
> a new thread ?).
> 
> * What are the plans to store tags? On OSX, tags are stored in file xattrs,
> which is, IMHO, very nice:
>   - Metadata lives and dies with the file;
>   - No "store" query when you move or copy a file;
>   - You don't rely on a "store" to tag files;
>   - You don't end up with a huge store full of useless things, as used to
>     happen with Nepomuk some time ago (no offense);
>   - You can easily back up the metadata (at least file metadata): you just
>     need a decent backup tool that handles xattrs;
>   - It's CLI-friendly;
>   - ...


+1

I'm leaning towards this as well.
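
For readers who want to see what xattr-based tagging could look like, here is
a minimal Python sketch. It is not code from this thread or from any KDE
project: the attribute name `user.xdg.tags`, the comma-separated encoding and
all function names are assumptions made for illustration. `os.setxattr` and
`os.getxattr` are only available on Linux, and the filesystem must support
user extended attributes.

```python
import os

# Assumption: all tags live in one attribute as a comma-separated list.
# The attribute name is a hypothetical choice, not confirmed by this thread.
TAG_ATTR = "user.xdg.tags"


def encode_tags(tags):
    """Serialise tags to bytes: strip whitespace, drop empties and duplicates, sort."""
    cleaned = sorted({t.strip() for t in tags if t.strip()})
    return ",".join(cleaned).encode("utf-8")


def decode_tags(raw):
    """Parse the attribute value back into a list of tags."""
    text = raw.decode("utf-8")
    return [t for t in text.split(",") if t]


def write_tags(path, tags):
    """Store the tags on the file itself (Linux only)."""
    os.setxattr(path, TAG_ATTR, encode_tags(tags))


def read_tags(path):
    """Return the file's tags, or [] if the attribute is missing or unreadable."""
    try:
        return decode_tags(os.getxattr(path, TAG_ATTR))
    except OSError:
        return []
```

With this encoding a tag cannot itself contain a comma; a real implementation
would need an escaping rule or a different separator.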

> 
> * What are the plans to store indexes? Again, with OSX (sorry, I work a lot
> with Macs, maybe too much), the system builds an index per volume. This is
> quite nice because when you connect a volume that has already been indexed,
> the system gets the information and can immediately search the volume index.
> Let's take an example: say you have some remote storage (NAS or
> whatever) at home with your media. You mount this remote volume and let
> the indexers do their stuff. Then you mount the volume from another device
> and *tadaaa*, you're able to query the previously built index. Wouldn't
> that be awesome? If you disconnect the volume, the index for this volume
> isn't available anymore and you don't get results for it. This also means
> that if one index gets corrupted, you don't have to scan and index every
> volume again. I think this would also solve Ignacio's issue.
>

This is exactly what I'm aiming for. We're currently using Xapian to store the 
indexes. Its engine allows multiple databases to be queried easily.
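
To make the per-volume idea concrete, here is a toy model in plain Python. It
deliberately does not use Xapian's API; `VolumeIndex` and `CombinedSearcher`
are hypothetical names, and the whitespace tokenizer is a stand-in for real
indexing. The sketch only shows the behaviour described above: each volume
carries its own index, and a query sees exactly the indexes that are attached
at that moment.

```python
class VolumeIndex:
    """Toy inverted index for one volume: term -> set of document ids."""

    def __init__(self, volume):
        self.volume = volume
        self.postings = {}

    def add(self, doc_id, text):
        # Naive tokenizer: lowercase and split on whitespace.
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, term):
        return self.postings.get(term.lower(), set())


class CombinedSearcher:
    """Queries whichever volume indexes are attached right now."""

    def __init__(self):
        self.mounted = {}

    def attach(self, index):
        self.mounted[index.volume] = index

    def detach(self, volume):
        self.mounted.pop(volume, None)

    def search(self, term):
        # Results are (volume, doc_id) pairs, so ids from different
        # volumes cannot collide.
        return sorted(
            (vol, doc_id)
            for vol, idx in self.mounted.items()
            for doc_id in idx.search(term)
        )
```

Detaching a volume removes its results without touching the other indexes,
and a corrupted index would only require re-indexing that one volume.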
 
> * You probably already know it, but SQLite DB might have some problems when
> stored on remote filesystems (see: http://www.sqlite.org/wal.html and
> especially "All processes using a database must be on the same host
> computer; WAL does not work over a network filesystem."). So if you plan to
> store each index on its volume (as previously suggested), SQLite might not
> be the (best) solution.
> 

Nah.

SQLite is used to map file URLs to unique identifiers. We need unique
identifiers for files since the URL can change on rename/move.

This unique identifier (an unsigned integer) is then used in Xapian to uniquely
identify the file.
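
A minimal sketch of such a URL-to-identifier mapping, using Python's standard
`sqlite3` module. The table layout and the function names (`open_store`,
`id_for_url`, `rename`) are assumptions for illustration, not the actual
schema used by the project.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the mapping table. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " url TEXT NOT NULL UNIQUE)"
    )
    return conn


def id_for_url(conn, url):
    """Return the id for url, allocating a new one the first time it is seen."""
    row = conn.execute("SELECT id FROM files WHERE url = ?", (url,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO files (url) VALUES (?)", (url,))
    conn.commit()
    return cur.lastrowid


def rename(conn, old_url, new_url):
    """Point the existing id at the new url; the id itself never changes."""
    conn.execute("UPDATE files SET url = ? WHERE url = ?", (new_url, old_url))
    conn.commit()
```

Because the search index would refer to files only by the integer id, a
rename or move touches a single row here and leaves the indexed data alone.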

> * Will there be several separated indexers (one for PDF files, one for video
> files, ...) or just one that takes care of everything ? I was thinking
> about the ability to add indexers that could retrieve stuff from the
> Internet. For example, have an indexer that could retrieve movie
> information from TheMovieDB.org.
> 

There are separate indexers for each file format, as was the case with Nepomuk. 
Please have a look at kfilemetadata [1].

For web extractors, I still haven't figured out how we would approach that. 
Another Nepomuk developer, Jorg, has similar ideas. Maybe we should start a 
thread about it and discuss it?

> * I hope there will be a nice query API? Dealing with SPARQL was a
> nightmare for me!
> 

There is one right now. Perhaps you could take a look and give some feedback?

> * Will it come with a QML DataEngine ?
> 

Can't say. It will have QML bindings, but I'm not sure about a DataEngine.
Let's see.

-- 
Vishesh Handa

[1] https://projects.kde.org/projects/playground/base/kfilemetadata

