Monday, February 1, 2010

6 of the best desktop search tools for Linux

Tools such as grep, find and awk have often come to the rescue of gleeful Bashmongers searching for files buried beneath gigabytes of other items.

But when a typical Linux distro takes up a couple of gigs of disk space, it's not hard to imagine that finding your files will only become trickier over time.

Like their internet brethren, today's desktop search tools can not only look for the names of files on your disk, but also perform context-sensitive searches within email archives, images, videos and music.

Some tools take it a bit further and even index your browser history and bookmarks.

Desktop search tools work by creating an index of all the files on your system. When you search for a file, instead of looking through the entire disk, the tools only search the index.
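
The idea can be sketched in a few lines of Python. This is only a minimal illustration of an inverted index, not how any of the tools reviewed here is actually implemented, and the file names and contents are made up for the example.

```python
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of files that contain it."""
    index = defaultdict(set)
    for filename, text in documents.items():
        for word in text.lower().split():
            index[word].add(filename)
    return index

def search(index, term):
    """Look the term up in the index; the files themselves are never reopened."""
    return sorted(index.get(term.lower(), set()))

# Hypothetical file contents, standing in for a real crawl of the disk.
documents = {
    "notes.txt": "Beagle indexes the home directory",
    "todo.txt": "install Recoll and rebuild the index",
    "song.txt": "lyrics for the road",
}

index = build_index(documents)
print(search(index, "index"))
print(search(index, "the"))
```

The expensive step is build_index, which runs once; every later search is a single dictionary lookup, which is why the index grows with the number of files but queries stay quick.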

Since the reason for the existence of desktop search tools is to enable you to find files faster and more conveniently than conventional tools such as find and grep, they need to be fast and reliable, and offer as much information as possible to help you identify items.

Most tools can therefore read the metadata on files, provide you with an excerpt from text documents, show you the resolution of images (along with a thumbnail) and provide other details for each file.

Stemming, which means that a search for 'beat' will also match related word forms such as 'beats' and 'beating', is standard.
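
To make the idea concrete, here's a toy sketch in Python. The suffix list is an assumption chosen for the example (real indexers use proper algorithms such as Porter stemming), and the second function shows plain substring matching for contrast, since that is what catches words like 'Beatles' and 'deadbeat'.

```python
def naive_stem(word):
    """Strip a few common English suffixes (toy rule set, not a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "es", "ed", "s"):
        # Only strip when at least three letters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_match(query, candidates):
    """Return the candidates that share a stem with the query."""
    target = naive_stem(query)
    return [w for w in candidates if naive_stem(w) == target]

def substring_match(query, candidates):
    """Return the candidates that contain the query anywhere inside them."""
    q = query.lower()
    return [w for w in candidates if q in w.lower()]

words = ["beats", "beating", "Beatles", "deadbeat", "beat"]
print(stem_match("beat", words))
print(substring_match("beat", words))
```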

Convenient graphical interfaces are a common feature of most desktop search tools, but many also come with a suite of command line applications to help you index and search your system for files.

How we tested...
Installation hiccups and large memory footprints are no concern for the desktop search tools in our list, with 512MB RAM sufficing for most of them.

Since the size of the index grows with the number of files, you want a tool that can find items quickly and accurately.

So, how vague can you be and still end up with the file you want? And, on the other hand, how much information can you fill in to be as specific as possible? Can you use wildcards?

Bonus points go to any tool that keeps the size of the index small and offers numerous search options, such as limiting the search to certain MIME types.

Features such as stemming, reading metadata, reporting on exotic formats and searching text within files make for a great tool.


It's still lumbered with the memory-hog tag, but Beagle's becoming a classic
Having been around since the glory days of Richard Burton, discussions about Beagle's memory-hogging habit should really have come to an end by now.

If anything, however, it's become a prime example of an application that just can't drop an old tag, however inaccurate.

Not surprisingly then, a large number of potential new users are scared away from Beagle because of the numerous forum and blog posts trashing it for its insatiable appetite for memory.

The latest version, 0.3.9, is available in the repositories of just about all distros.

Controlling what directories to index and what paths to ignore has become standard in most desktop search tools, and Beagle doesn't disappoint.

Unlike most other tools, however, it also enables you to index your emails, IM, RSS readers, address book and more, in addition to the browsing history and bookmarks from your browser.

And, with its built-in Inotify support, Beagle updates the index as soon as it detects any changes to files or directories.

By default, it indexes everything in your home directory, excluding the default *~, ~.tmp and other such paths.

To change this behaviour, start Beagle, listed as Search under the Applications > Accessories menu, and click Search > Preferences.

From the Indexing tab of the Search Preferences window you can specify the directories to index, as well as the paths to exclude.

In addition to the graphical interface, Beagle has an extensive suite of command-line tools that you can use to create an index and search files.

The command beagle-search .txt launches the graphical interface and shows you search results for .txt. The alternative is to use beagle-query, which prints out the search results on the terminal itself.


Browser interface
Beagle is equally at home on Gnome and KDE, but if you prefer a neutral environment, it can easily be arranged.

To enable the web interface, open a terminal window and type the following:


beagle-config Networking WebInterface true

You can access the experimental web interface at http://localhost:4000. This is supposed to be accessible from other machines on the network too, but hey, it's still experimental.

When using these desktop search tools, remember that few of them can differentiate between filenames and file types, so searching for "mp3" and ".mp3" gives you very different results.

Beagle can extract text and metadata from a host of filetypes including Office documents, plain text, HTML, DocBook, various image and audio formats, and more.

When looking for files, you can refine your search to one of the 14 available categories, such as Pictures, Media, Files, Archives, Mails and so on.

Select a type from the Find In drop‑down list to narrow the criteria. Beagle displays only eight items per page, so it's best to refine the search as much as possible, or you'll be clicking away at the tiny blue navigation arrow until your mouse gives up.

If you don't specify a category when searching, Beagle will still break up the results into different categories, such as Images, Documents or Folders.

You can then scroll through several pages of each of these if there are many results.

Beagle can also look for search terms within files. If your search returns files that contain the query term, clicking on the file will reveal a partial sentence or matching text within that file.

This isn't true for PDFs, for which Beagle only gives you a thumbnail.

The interface isn't very aesthetically pleasing, but the real beauty of Beagle lies in its complex search options.

You can, for instance, prefix terms with a minus sign to exclude them from your search, or use the OR operator to define your query, or the date operator to limit the search within a date range.
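
For a feel of how such operators combine, here's a rough Python sketch. It is not Beagle's parser: the function names and sample documents are invented for illustration, and the date operator is left out to keep it short.

```python
def parse_query(query):
    """Split a query into required OR-groups and excluded terms."""
    tokens = query.split()
    required, excluded = [], set()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("-") and len(token) > 1:
            # A leading minus sign excludes the term.
            excluded.add(token[1:].lower())
            i += 1
            continue
        group = {token.lower()}
        # Fold "a OR b OR c" into a single group of alternatives.
        while i + 2 < len(tokens) and tokens[i + 1] == "OR":
            group.add(tokens[i + 2].lower())
            i += 2
        required.append(group)
        i += 1
    return required, excluded

def matches(text, required, excluded):
    """True if every group has a hit and no excluded term appears."""
    words = set(text.lower().split())
    return all(group & words for group in required) and not (excluded & words)

def run_query(documents, query):
    required, excluded = parse_query(query)
    return sorted(name for name, text in documents.items()
                  if matches(text, required, excluded))

# Hypothetical documents for the example.
documents = {
    "a.txt": "linux desktop search",
    "b.txt": "windows desktop search",
    "c.txt": "linux kernel notes",
}
print(run_query(documents, "desktop OR kernel -windows"))
```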


Verdict
Version: 0.3.9
Website: http://beagle-project.org
Price: Free under GPL
Surviving despite the bad press, a little interface redesign will spell victory for Beagle.
Rating: 9/10


Familiarity breeds confidence. But can it deliver?
Release cycles for most distros being what they are, it seems aeons ago that Google Desktop was available via the online software repositories.

Nowadays, all you've got to do is head to the project's homepage and download the latest release. There's even a 64-bit version.

Additional points go to Google Desktop for providing RPM and Deb binaries. Once installed, Google Desktop sits pretty in the system tray and begins indexing immediately.

While indexing the disk doesn't tax any of the system resources, it does take its own sweet time.

Configure it to your liking from the get-go and define what directories to index and what types of files to ignore.

Google Desktop can also keep tabs on your browsing history and all of your email accounts, thanks to the incredible Thunderbird support.

You can configure it to index your Gmail account too, even if you only access it via the browser.

The convenient search applet can be accessed by pressing the Ctrl key twice – and you can put it somewhere on your desktop for quick access.

As you start typing into the search applet, matching results will be displayed there. You can then use the arrow keys to move down to the file you were looking for and hit Enter to open it.

You can also click on See All Results In A Browser to open a new Firefox tab, if it's running. If not, Google Desktop launches Firefox for you.

Google offers many diverse services, such as Groups, Maps and News, and you can configure the applet to search for the specified term in any one of them.

Right-click the system tray button and choose the Default Search Type from Web, Desktop, News, Groups, Images and even I'm Feeling Lucky.

When you search your desktop, the results are displayed in the browser, much like a traditional Google search.

A small icon to the left of each result reflects the type of file, so you can never confuse your MP3s with your emails.

For text and PDF files, it'll also show you a small excerpt under the results. Clicking on the files opens them in the relevant associated application.

Emails are opened in the browser itself, but you do get the Reply With Gmail and the Read In Gmail options when applicable.


Feeling lucky
To perform an advanced search, right-click the system tray button and click Show Home Page. Next, click the Advanced Search link.

You can now limit your search to certain file types. For instance, if you only wish to search for ODT files, click the Files radio button and select OpenOffice.org Writer from the File Type drop-down list.

Despite the clever design, Advanced Search has one problem: you can't use it to see a list of all files of a particular type.

Despite all the categories and file types to choose from, you must always specify a search term. So, if you want a list of all the PNG or MP3 files on your machine, just search for ".png" or ".mp3" without going into the Advanced section.

There's one Google Desktop feature that's not offered by the competition: Google calls it file versioning, and we call it a brilliant idea. Each time you edit a file, Google Desktop creates and stores a cached copy that you can access later if needed.

Click the Cached link at the bottom-right of the result item whose copies you wish to review. All cached items are displayed, with the most recent at the top.
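
The versioning behaviour can be pictured with a short Python sketch. This is a guess at the general shape of such a cache, not Google Desktop's actual mechanism; the class and method names are hypothetical.

```python
class VersionCache:
    """Keep every distinct saved state of a file, newest first."""

    def __init__(self):
        self._versions = {}  # path -> list of contents, oldest first

    def record(self, path, content):
        """Store a snapshot each time the indexer sees the file change."""
        history = self._versions.setdefault(path, [])
        # Skip the snapshot if nothing changed since the last one.
        if not history or history[-1] != content:
            history.append(content)

    def cached(self, path):
        """Return all cached copies, most recent at the top."""
        return list(reversed(self._versions.get(path, [])))

cache = VersionCache()
cache.record("letter.odt", "draft one")
cache.record("letter.odt", "draft two")
cache.record("letter.odt", "draft two")  # unchanged, so not stored again
print(cache.cached("letter.odt"))
```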

You can configure Google Desktop to no longer index deleted files, but if you do that, you'll also lose the ability to rescue files you've deleted accidentally.

Since Google Desktop is also available on Windows and Mac, it's natural to compare the features available on each platform.

While the Linux variant works flawlessly, has a slick interface and can index your emails as well as browsing history, we do feel cheated to find that the bling of the proprietary versions is missing from the Linux version, even two years after Google Desktop 1.0 was released.


Verdict
Version: 1.2.0
Website: http://desktop.google.com/linux/index.html
Price: Free
Fast and reliable with impressive features, but bare-bones compared with the Windows version.
Rating: 8/10



Those Gnome fellas have everything
Here's another desktop search tool with a fetish for your system tray, but only if you run Gnome. KDE users can feel free to hunt for it in the beloved K menu.

Tracker, like most others in this test, has yet to reach the pivotal 1.0 release, and yet it too is robust, elegant and effective.

Available from the software repositories of most distros, Tracker is a bit more fussy about getting started.

How else would you define a file indexer where indexing is disabled by default?

Use the tracker-preferences command to launch the Preferences window and enable indexing.

Unlike the other tools, you can configure Tracker to index certain directories, but not actively watch them.

That is, if you change the files within a directory that Tracker isn't watching, the changes won't be reflected in the index.

The Preferences window comprises several tabs, each of which deals with different aspects of Tracker.

You can specify the paths and patterns that you want Tracker to ignore from the Ignored Files tab and enable Evolution email indexing from the Email tab.

Future versions will support indexing browser bookmarks, history, notes, tasks etc.

You can't change the default Tracker behaviour of displaying 10 results per page, so keep an eye out for the Next and Previous buttons to scroll through pages.

When you click on a result, Tracker shows you details about the file, such as its dimensions if it's an image. Tracker also lets you add tags to each of the indexed files.

You can use the same tags for different files and thus create a collection of grouped content that can be accessed easily.

Unfortunately, this isn't very reliable because Tracker sometimes fails to display all files assigned the same tag.

Verdict
Version: 0.6.93
Website: projects.gnome.org/tracker
Price: Free under GPL
A definite podium contender if it can improve its launch time and fix the broken tagging feature.
Rating: 7/10


There's nothing to see here, folks
The fastest and smallest desktop search tool (as the Strigi developers claim) disappointed us, never once returning the correct search results for a query. Here's hoping that your mileage varies…

It can be installed easily enough from the software repositories of most distributions. Designed to be a graphical replacement for grep and find, its many shortcomings make Strigi our least favourite tool on this list.

You can launch Strigi using the strigiclient command or the Alt+F2 quick launcher. By default, it doesn't create an entry in any of the menus.

It begins indexing the directories listed as soon as you click the Start Indexing button.

Depending on the number of files involved, the size of the index can be very large, so you might have to keep a careful watch on that number.

When you're ready to use Strigi, begin typing into the text bar at the bottom of the window.

Matching results are displayed pretty much for each keystroke, so it does deserve the title of being the fastest search tool. But while Strigi claims to be able to search inside archives, all tests point to the contrary.

The biggest problem with Strigi is that it'll only show you the first 10 results for your query, so even if it finds 55 matching items, you can't browse this list.

Being forced to refine your search so the file you want is listed in the first 10 matched items defies the entire premise of desktop search tools.

Even command line noobs will probably have more luck using find and grep.

For a tool that racks up a 200MB index in half an hour without breaking a sweat, the fact that it can't find the file you're looking for is beyond baffling.


Verdict
Version: 0.6.3
Website: http://strigi.sourceforge.net
Price: Free under GPL
Not even usable enough to test the claims made on its website. Never shows any file you're looking for.
Rating: 1/10


It's fast and reliable, but Recoll isn't without a few shortcomings
With the might of the Xapian search engine at its command, the lightweight Recoll might just be the tool for you, if you can get it up and running.

Almost no distro carries Recoll in its software repositories, and the tool's dependency list might put off a few users.

To begin, you need xapian-core, plus Qmake and Qt. Fortunately, these are readily available from the software repositories.

This would translate to a 10-place penalty on the starting grid were it not for the packaged binaries for Ubuntu, Fedora, Mandriva and other distros.

Recoll starts indexing as soon as you click File > Update. It stores the index in the ~/.recoll/xapiandb/ directory. By default, it'll begin indexing from your home directory, including any mounted partitions or SMB shares.

Distros that use gvfs, the replacement tool for Gnome Virtual File System, mount shares under the ~/.gvfs/ directory, so when indexing begins at the user's home directory, it also engulfs the mounted shares.

You can configure Recoll to avoid certain paths and directories in the Prefs window.

Along with the search bar at the top of the interface, you can use the All Terms drop‑down list to select one of Any Term, All Terms, File Name and Query Language.

You can then limit your search to text files, or any other MIME type, by clicking the relevant radio button from under the search bar. For example, when you're looking for emails, select Messages.

You can use the *, ? and square bracket wildcards when searching for files in the index.

This, along with the auto-complete feature (accessed by pressing Esc+Space) gives Recoll a slight edge over the other tools. For example, typing "pyt" and pressing Esc+Space displays a list of possible terms such as Python, pytype etc.
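
Both features boil down to matching a pattern against the terms stored in the index. Here's a minimal Python sketch using the standard fnmatch module; the term list is invented, and this isn't Recoll's own code.

```python
from fnmatch import fnmatchcase

def wildcard_search(terms, pattern):
    """Match indexed terms against a pattern using *, ? and [...] wildcards."""
    pattern = pattern.lower()
    return [t for t in terms if fnmatchcase(t.lower(), pattern)]

def complete(terms, prefix):
    """Suggest indexed terms that start with what has been typed so far."""
    prefix = prefix.lower()
    return sorted(t for t in terms if t.lower().startswith(prefix))

# Hypothetical terms pulled from an index.
terms = ["Python", "pytype", "perl", "pylon", "ruby"]

print(wildcard_search(terms, "py[lt]*"))
print(wildcard_search(terms, "p?r*"))
print(complete(terms, "pyt"))
```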


Go fetch
Depending on your search term, and how many results turn up, you might have to browse through several pages to find what you're looking for – simply click the Next Page link at the top-right of the results panel.

When displaying the results, Recoll prints a small excerpt with each, but this might not be enough to decide if the result is what you're looking for.

Helpfully, you can click the Preview link next to the entries in the results list to read the contents of the file in Recoll's internal document viewer.

To help you refine your search, Recoll enables you to define keywords and then filter them by All Of These, None Of These, Any Of These and other such clauses from the Advanced Search dialog.

You can also limit your search to specific MIME types, such as PDF or spreadsheets.
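
As a rough picture of how these clauses stack up, here's a small Python sketch. The function and its parameter names are assumptions made for the example rather than anything taken from Recoll, and the MIME type is guessed from the file extension with Python's standard mimetypes module.

```python
import mimetypes

def advanced_search(documents, all_of=(), any_of=(), none_of=(), mime_type=None):
    """Filter {filename: text} by keyword clauses and an optional MIME type."""
    results = []
    for name, text in documents.items():
        words = set(text.lower().split())
        # Guess the type from the extension, e.g. .pdf -> application/pdf.
        if mime_type and mimetypes.guess_type(name)[0] != mime_type:
            continue
        if not all(term.lower() in words for term in all_of):
            continue
        if any_of and not any(term.lower() in words for term in any_of):
            continue
        if any(term.lower() in words for term in none_of):
            continue
        results.append(name)
    return sorted(results)

# Hypothetical documents for the example.
documents = {
    "report.pdf": "quarterly budget report",
    "budget.txt": "draft budget notes",
    "holiday.txt": "holiday budget and photos",
}

print(advanced_search(documents, all_of=["budget"], none_of=["holiday"]))
print(advanced_search(documents, all_of=["budget"], mime_type="application/pdf"))
```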

Finally, if you know the general location of the file, you can confine the search to defined subtrees.

While most other tools keep a constant eye on your disk and keep the index abreast of any changes, Recoll by default only creates a static index.

This means you must manually update the index by clicking File > Update if you wish to find up-to-date search results.
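
What a manual update has to work out is which files changed since the last run. The Python sketch below compares two snapshots of modification times; it's an illustration of the general approach under that assumption, not a description of how Recoll decides what to re-index.

```python
import os

def snapshot(root):
    """Record the modification time of every file under root."""
    state = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            state[path] = os.stat(path).st_mtime
    return state

def changes(old, new):
    """Compare two snapshots and report what a re-index would need to touch."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, modified
```

Take one snapshot when the index is built, keep it, and pass it to changes() along with a fresh snapshot at update time; only the files in the three returned lists need attention.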

You can, however, use Cron to configure periodic indexing. This is both a curse and an advantage: system resources aren't constantly manhandled in an attempt to keep the index up to date, but it requires users to take an extra step compared with the other tools.

But there's a ray of hope, at least for those who want to compile Recoll themselves. File Alteration Monitor (Fam) and Inotify are two tools that monitor the filesystem for any changes.

When compiling Recoll, you can enable support for either of these with the --with-fam or --with-inotify options.

Recoll isn't built to index all file types. To index PDF, MP3, RTF, MS Office and a few other exotic formats properly, you need to install additional packages, such as Antiword (for MS Word), and Catdoc (for MS Excel and PowerPoint).

Without these tools, only filenames will be indexed and Recoll won't offer an abstract or the Preview function.


Verdict
Version: 1.12.1
Website: lesbonscomptes.com/recoll
Price: Free under GPL
Faces tough competition from Beagle. The option to create static indexes is a real advantage.
Rating: 9/10


Winner: Recoll - 9/10
The most startling aspect of desktop search tools is that many are yet to reach the big version 1.0. Despite that, each program grows more impressive with every release. While the majority of their features are the same across the board, almost every application has something unique to offer.

As desktop search tools eat up enough disk space to shame the great mountain apes, you need to carefully choose the one that you want to implement on your system.

While most of these tools have celebrated a few birthdays already, desktop search as a whole (and these tools individually) doesn't have a large user base.

You'll still find more people discussing grep than you will Recoll. Quite possibly, this stems from a misconception that desktop search tools are resource-hungry. Nothing could be further from the truth.


Top three
Although we weren't exactly spoilt for choice, it wasn't easy deciding the finishing order because the three podium contenders, Google Desktop, Beagle and Recoll, made for some stiff competition.

Here's hoping that this also drives innovation.

But first, let's consider the ones that didn't make it to the top three. The worst thing about Strigi is its lack of page navigation.

You can't use a tool if it won't let you browse through the pages of search results.

Tracker, however, has all the makings of a top contender. The ability to specify which directories to watch and index translates to a smaller index size, which is why it gets 7/10 despite the broken tagging feature and slow launch time.

Coming in at number three is Google Desktop. This tool gets bonus points for using the browser to display results, but the advanced search section could still use a little refinement.

The auto-suggestion feature in the search applet is helpful, although the fact that it's limited to showing only the top six results in the applet by default is a definite design flaw in our book.

Meanwhile, the unfairly maligned Beagle comes in at number two. One of this tool's strong points is that it keeps the size of the index small, especially compared with Strigi and Google Desktop.

With a network-accessible browser interface in the offing, Beagle's future as one of the best is sealed.

With strong competition from Beagle, especially in the number of search options, Recoll just manages to cling on to the top spot. Forcing the user to index the system manually is actually a clever design.

For anyone who's convinced that desktop search tools eat up too many resources, this offers the opportunity to index the system at a convenient time.

It should therefore help Recoll to attract users who'd normally distance themselves from desktop search tools.

Defining the keywords to look for and avoid for each search helps Recoll accurately locate the correct file each time. Other tools should also consider adopting this feature.
