A blog about software development and other software related matters

Monday, April 28, 2008

Quick and dirty code search

Sometimes I need a quick and dirty way to search a folder of code on my hard drive.
Existing tools such as desktop search applications (Google Desktop, for example) usually take a while to index new folders, and they are also quite resource hungry.
The Unix find utility is another option; however, it works best on Unix machines (NTFS and Cygwin's find aren't the best pair for this use case).

A nice quick and dirty solution is to use Ferret, a Ruby text indexing framework inspired by Apache Lucene, to index the code and search it, all with two small snippets of Ruby code!
The following code is based on this entry, which in turn is based on this one. First is the indexing code (index.rb):

require 'rubygems'
require 'ferret'
require 'find'
include Ferret

# Create (or open) the on-disk index; queries hit the 'content' field by default.
index = Index::Index.new(:default_field => 'content', :path => '/tmp/index_folder')
ini = Time.now
num_files = 0
INDEXED_EXTS = ['.java', '.properties'] # the list of file extensions that we wish to index

Find.find('/code/to/index') do |path|
  # Skip directories and any file whose extension is not in the list.
  # File.extname compares the real extension, not any substring of the path.
  next unless FileTest.file?(path) && INDEXED_EXTS.include?(File.extname(path))

  puts "Indexing: #{path}"
  num_files += 1

  File.open(path) do |file|
    index.add_document(:file => path, :content => file.readlines)
  end
end

elapsed = Time.now - ini
puts "Files: #{num_files}"
puts "Elapsed time: #{elapsed} secs\n"
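
The extension filter in the loop above is the part that is easiest to get subtly wrong, so it can help to try it on its own. The following is a minimal sketch, not part of the original scripts: the helper name `indexable?` and the constant `SKETCH_EXTS` are made up for illustration, and the sketch assumes that extensions should be compared case-insensitively.

```ruby
# Hypothetical helper for illustration; the names are not from the original post.
SKETCH_EXTS = ['.java', '.properties'].freeze

# True when the path's real extension (the part after the last dot of the
# file name) is one of the extensions we want to index.
def indexable?(path, exts = SKETCH_EXTS)
  exts.include?(File.extname(path).downcase)
end

puts indexable?('/code/to/index/Foo.java')                # true
puts indexable?('/code/to/index/Foo.java.bak')            # false, extension is .bak
puts indexable?('/code/to/index/my.java.notes/todo.txt')  # false, extension is .txt
```

A plain substring check on the path would accept the last two paths as well, because ".java" appears somewhere inside them.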

This code is quite simple. First we create the index under /tmp/index_folder, then we use the Find module to walk every file under the '/code/to/index' folder. Files without a matching extension are skipped; the rest have their content added to the index. Next is the query code (search.rb):
require 'rubygems'
require 'ferret'

wot = ARGV[0]
if wot.nil?
  puts "use: search.rb <query>"
  exit
end

# Open the index created by index.rb; queries hit the 'content' field by default.
index = Ferret::Index::Index.new(:default_field => 'content', :path => '/tmp/index_folder')
ini = Time.now
puts "Searching.."
docs = 0

index.search_each(wot, :limit => :all) do |doc, score|
  # Wrap each hit in ->> and <<- so that it stands out in the terminal.
  excerpts = index.highlight(wot, doc, :field => :content, :pre_tag => "->>", :post_tag => "<<-")
  res = <<STRING_END
-------------------------------------------------------
#{File.basename(index[doc]['file'])} :
#{Array(excerpts).join("\n")}
STRING_END
  puts res
  docs += 1
end

elapsed = Time.now - ini
puts "Elapsed time: #{elapsed} secs\n"
puts "Documents found: #{docs}"

In this code we first open the index with content as the default query field. The query itself takes its value from the wot parameter (the first command-line argument). For each matching document, the block prints the file name followed by the matching text, with each hit highlighted between ->> and <<-.
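
To get a feel for what the highlighted output looks like without building an index first, here is a minimal pure-Ruby sketch. It is not Ferret's highlighter: `mark_matches` is a hypothetical helper that simply wraps every case-insensitive occurrence of a literal term in the same ->> and <<- markers, whereas Ferret works on analyzed terms and returns excerpts.

```ruby
# Hypothetical stand-in for illustration only; not part of Ferret.
def mark_matches(line, term, pre_tag = '->>', post_tag = '<<-')
  # Regexp.escape keeps characters such as '.' in the term literal.
  line.gsub(/#{Regexp.escape(term)}/i) { |match| "#{pre_tag}#{match}#{post_tag}" }
end

puts mark_matches('public class IndexWriter {', 'indexwriter')
# prints: public class ->>IndexWriter<<- {
```

The markers are plain text on purpose, so they stay readable in a terminal where HTML-style tags would only add noise.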
That's all there is to it, see ya.

1 comment:

Palma said...

You write very well.