net.sf.zekr.engine.search
Class SearchUtils

java.lang.Object
  extended by net.sf.zekr.engine.search.SearchUtils
All Implemented Interfaces:
ArabicCharacters

public class SearchUtils
extends java.lang.Object
implements ArabicCharacters

This class provides several public static methods for finding occurrences of one text in another. Because Arabic text carries diacritics, there are also functions that either ignore or match them.

Author:
Mohsen Saboorian

Field Summary
 
Fields inherited from interface net.sf.zekr.engine.search.ArabicCharacters
ALEF, ALEF_HAMZA_ABOVE, ALEF_HAMZA_BELOW, ALEF_MADDA, ALEF_MAKSURA, ALEF_WASLA, ARABIC_KAF, ARABIC_QUESION_MARK, ARABIC_YEH, BARREE_YEH, DAMMA, DAMMATAN, FARSI_KEHEH, FARSI_YEH, FATHA, FATHATAN, HAMZA, HAMZA_ABOVE, HAMZA_BELOW, KASRA, KASRATAN, MADDA, MADDAH_ABOVE, RUB_EL_HIZB, SAJDA_PLACE, SHADDA, SMALL_HIGH_MEEM, SMALL_LOW_SEEN, SMALL_ROUNDED_ZERO, SMALL_WAW, SMALL_YEH, SUKUN, SUPERSCRIPT_ALEF, SWASH_KEHEH, TATWEEL, TEH, TEH_MARBUTA, WAQF_HIGH_SEEN, WAQF_JEEM, WAQF_LA, WAQF_QALA, WAQF_SALA, WAQF_SMALL_MEEM, WAQF_THREE_DOT, WAW, WAW_HAMZA_ABOVE, YEH_HAMZA_ABOVE
 
Constructor Summary
SearchUtils()
           
 
Method Summary
static java.lang.String arabicSimplify(java.lang.String str)
          This method removes specific diacritics from the string.
static java.lang.String arabicSimplify4AdvancedSearch(java.lang.String str)
          This method removes specific diacritics from the string, and also replaces Hamza characters with their base character.
static Range indexOfIgnoreDiacritic(java.lang.String src, java.lang.String key, boolean matchCase, java.util.Locale locale)
          Will find a Range of the first occurrence of key in src.
static Range indexOfMatchDiacritic(java.lang.String src, java.lang.String key, boolean matchCase, java.util.Locale locale)
          Will find a range of the first occurrence of key in src.
static boolean isDiac(char ch)
          These characters are Arabic Harakat (diacritics): Sukun, Shadda, Fatha, Kasra, Damma, Fathatan, Kasratan, Dammatan, and Superscript Alef.
static java.lang.String replaceLayoutSimilarCharacters(java.lang.String str)
          Replaces the Farsi Unicode Yeh with the Arabic Yeh, and likewise the Farsi Keheh with the Arabic Kaf.
static java.lang.String replaceSimilarArabic(java.lang.String str)
          Replaces similar Arabic characters that are commonly used in place of others.
static java.lang.String simplifyAdvancedSearchQuery(java.lang.String query)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SearchUtils

public SearchUtils()
Method Detail

replaceLayoutSimilarCharacters

public static java.lang.String replaceLayoutSimilarCharacters(java.lang.String str)
Replaces the Farsi Unicode Yeh with the Arabic Yeh, and likewise the Farsi Keheh with the Arabic Kaf.

Parameters:
str - the string in which to replace characters
Returns:
updated String result
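
The replacement described above can be illustrated with a minimal, standalone sketch. It does not call SearchUtils; the class name LayoutNormalizeSketch and the method normalizeLayout are hypothetical, and the code points are the standard Unicode values for the four letters, written out here rather than taken from ArabicCharacters. The real method may handle more characters than these two pairs.

```java
class LayoutNormalizeSketch {
    // Standard Unicode code points, spelled out for this sketch.
    static final char FARSI_YEH = '\u06CC';
    static final char ARABIC_YEH = '\u064A';
    static final char FARSI_KEHEH = '\u06A9';
    static final char ARABIC_KAF = '\u0643';

    /** Maps the Farsi Yeh and Keheh to their Arabic look-alikes (hypothetical helper). */
    public static String normalizeLayout(String str) {
        return str.replace(FARSI_YEH, ARABIC_YEH)
                  .replace(FARSI_KEHEH, ARABIC_KAF);
    }

    public static void main(String[] args) {
        // A word typed on a Farsi keyboard layout: Keheh, Teh, Alef, Beh, Farsi Yeh.
        String typed = "\u06A9\u062A\u0627\u0628\u06CC";
        String normalized = normalizeLayout(typed);
        // After normalization the string uses Arabic Kaf and Arabic Yeh.
        System.out.println(normalized.equals("\u0643\u062A\u0627\u0628\u064A"));
    }
}
```

Normalizing both the query and the searched text this way lets a key typed on a Farsi layout match text stored with Arabic code points.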

replaceSimilarArabic

public static java.lang.String replaceSimilarArabic(java.lang.String str)
Replaces similar Arabic characters that are commonly used in place of others. This is a helper method that makes searching easier, and it should be applied to Quran text.
Characters which are replaced are listed below:

arabicSimplify

public static java.lang.String arabicSimplify(java.lang.String str)
This method removes specific diacritics from the string. It also replaces incorrect characters (which are present due to keyboard layout problems) using replaceLayoutSimilarCharacters().
This method removes or replaces characters that should not be matched exactly (for example, it replaces ALEF_MADDA with ALEF).

Parameters:
str - the string to be simplified
Returns:
simplified form of the str
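
As a rough illustration of the three steps described above (dropping diacritics, fixing layout look-alikes, and folding ALEF_MADDA to ALEF), here is a standalone sketch. It is not the library's implementation: the class ArabicSimplifySketch and the method simplify are hypothetical, the diacritic set is the one listed for isDiac, and the only fold shown is the ALEF_MADDA example from the description.

```java
class ArabicSimplifySketch {
    // Fathatan..Sukun (U+064B..U+0652) plus Superscript Alef (U+0670).
    static final String DIACRITICS =
            "\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652\u0670";

    /** Hypothetical helper: strips diacritics and folds a few look-alike letters. */
    public static String simplify(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (DIACRITICS.indexOf(ch) >= 0) {
                continue; // drop the diacritic entirely
            }
            switch (ch) {
                case '\u06CC': ch = '\u064A'; break; // Farsi Yeh -> Arabic Yeh
                case '\u06A9': ch = '\u0643'; break; // Farsi Keheh -> Arabic Kaf
                case '\u0622': ch = '\u0627'; break; // Alef with Madda -> Alef
                default: break;
            }
            out.append(ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "bismi" with Kasra and Sukun marks reduces to the three bare letters.
        String marked = "\u0628\u0650\u0633\u0652\u0645\u0650";
        System.out.println(simplify(marked).equals("\u0628\u0633\u0645"));
    }
}
```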

arabicSimplify4AdvancedSearch

public static java.lang.String arabicSimplify4AdvancedSearch(java.lang.String str)
This method removes specific diacritics from the string, and also replaces Hamza characters with their base character. It also replaces ARABIC_LETTER_TEH_MARBUTA with ARABIC_LETTER_TEH, and ARABIC_LETTER_ALEF_MAKSURA with ARABIC_LETTER_YEH.

Parameters:
str - string to be simplified
Returns:
simplified form of the str
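
The folding described above can be sketched with a small lookup table. This is an illustration under stated assumptions, not the library's code: the class AdvancedSimplifySketch and the method fold are hypothetical, and the choice of base letter for each Hamza form (Alef for the two Alef-with-Hamza letters, Waw for Waw-with-Hamza, Yeh for Yeh-with-Hamza) is an assumption about what "base character" means here.

```java
import java.util.HashMap;
import java.util.Map;

class AdvancedSimplifySketch {
    private static final Map<Character, Character> FOLD = new HashMap<>();
    static {
        FOLD.put('\u0623', '\u0627'); // Alef with Hamza above -> Alef (assumed base)
        FOLD.put('\u0625', '\u0627'); // Alef with Hamza below -> Alef (assumed base)
        FOLD.put('\u0624', '\u0648'); // Waw with Hamza above  -> Waw  (assumed base)
        FOLD.put('\u0626', '\u064A'); // Yeh with Hamza above  -> Yeh  (assumed base)
        FOLD.put('\u0629', '\u062A'); // Teh Marbuta -> Teh
        FOLD.put('\u0649', '\u064A'); // Alef Maksura -> Yeh
    }

    static boolean isDiacritic(char ch) {
        return (ch >= '\u064B' && ch <= '\u0652') || ch == '\u0670';
    }

    /** Hypothetical helper: drops diacritics, then folds letters through the table. */
    public static String fold(String str) {
        StringBuilder out = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (isDiacritic(ch)) {
                continue;
            }
            Character mapped = FOLD.get(ch);
            out.append(mapped == null ? ch : mapped.charValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A word ending in Teh Marbuta ends in plain Teh after folding.
        System.out.println(fold("\u0631\u064E\u062D\u0652\u0645\u064E\u0629")
                .equals("\u0631\u062D\u0645\u062A"));
    }
}
```

A table keeps the fold rules in one place, so a rule can be added or removed without touching the loop.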

simplifyAdvancedSearchQuery

public static java.lang.String simplifyAdvancedSearchQuery(java.lang.String query)

isDiac

public static boolean isDiac(char ch)
These characters are Arabic Harakat (diacritics): Sukun, Shadda, Fatha, Kasra, Damma, Fathatan, Kasratan, Dammatan, and Superscript Alef.

Parameters:
ch - the character to be examined
Returns:
true if ch is an Arabic Harakat, otherwise false
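
A check of this kind can be written as a range test, because eight of the nine listed marks occupy the contiguous Unicode range U+064B to U+0652 and the Superscript Alef sits at U+0670. The sketch below is standalone and hypothetical (class DiacriticCheckSketch, method isDiacritic); it mirrors the documented character list but is not taken from the library.

```java
class DiacriticCheckSketch {
    /**
     * Hypothetical equivalent of the documented check:
     * U+064B Fathatan, U+064C Dammatan, U+064D Kasratan, U+064E Fatha,
     * U+064F Damma, U+0650 Kasra, U+0651 Shadda, U+0652 Sukun,
     * and U+0670 Superscript Alef.
     */
    public static boolean isDiacritic(char ch) {
        return (ch >= '\u064B' && ch <= '\u0652') || ch == '\u0670';
    }

    public static void main(String[] args) {
        System.out.println(isDiacritic('\u064E')); // Fatha
        System.out.println(isDiacritic('\u0628')); // Beh, a base letter
    }
}
```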

indexOfIgnoreDiacritic

public static Range indexOfIgnoreDiacritic(java.lang.String src,
                                           java.lang.String key,
                                           boolean matchCase,
                                           java.util.Locale locale)
Finds a Range for the first occurrence of key in src. This method ignores diacritics in both the src and key strings.
This is a generic method: it can be used to search Quran text as well as translations.

Parameters:
src - source string to be searched on
key - non-null target string whose first occurrence in src is to be found
matchCase - specifies whether to search in a case sensitive manner or not
locale - the text locale (for casing conversion)
Returns:
a Range object from the previous space character just before the key (or start of the source string if no space found) to the first space just after the key in src (or end of src if no space found)
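
One way to implement a diacritic-insensitive search with the word-boundary result described above is to strip diacritics from src while recording where each kept character came from, search the stripped text, and then map the hit back. The sketch below is standalone and makes several assumptions: the class IgnoreDiacriticSearchSketch and the method find are hypothetical, an int pair {start, end} with an exclusive end stands in for Range, start is taken to be the index just after the preceding space, and the matchCase and locale parameters are left out to keep the focus on diacritics.

```java
import java.util.Arrays;

class IgnoreDiacriticSearchSketch {
    static boolean isDiacritic(char ch) {
        return (ch >= '\u064B' && ch <= '\u0652') || ch == '\u0670';
    }

    /** Returns {start, end} of the surrounding word(s), or null if key is not found. */
    public static int[] find(String src, String key) {
        // Strip diacritics from src, remembering the original index of each kept char.
        StringBuilder plain = new StringBuilder(src.length());
        int[] origIndex = new int[src.length()];
        for (int i = 0; i < src.length(); i++) {
            char ch = src.charAt(i);
            if (!isDiacritic(ch)) {
                origIndex[plain.length()] = i;
                plain.append(ch);
            }
        }
        // Strip diacritics from the key as well.
        StringBuilder plainKey = new StringBuilder(key.length());
        for (int i = 0; i < key.length(); i++) {
            if (!isDiacritic(key.charAt(i))) {
                plainKey.append(key.charAt(i));
            }
        }
        if (plainKey.length() == 0) {
            return null;
        }
        int hit = plain.indexOf(plainKey.toString());
        if (hit < 0) {
            return null;
        }
        // Map the first and last matched characters back to positions in src.
        int first = origIndex[hit];
        int last = origIndex[hit + plainKey.length() - 1];
        // Widen to the preceding space (or start) and the following space (or end).
        int start = src.lastIndexOf(' ', first) + 1;
        int end = src.indexOf(' ', last);
        if (end < 0) {
            end = src.length();
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        // Two fully marked words; the key is an unmarked fragment of the second word.
        String src = "\u0628\u0650\u0633\u0652\u0645\u0650 "
                + "\u0627\u0644\u0644\u0651\u064E\u0647\u0650";
        String key = "\u0644\u0644\u0647";
        System.out.println(Arrays.toString(find(src, key))); // [7, 14]
    }
}
```

The index map is what lets the result refer to positions in the original, fully marked text rather than in the stripped copy.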

indexOfMatchDiacritic

public static Range indexOfMatchDiacritic(java.lang.String src,
                                          java.lang.String key,
                                          boolean matchCase,
                                          java.util.Locale locale)
Will find a range of the first occurrence of key in src. This method will consider diacritics on both src and key.

Parameters:
src - source string to be searched on
key - target string whose first occurrence in src is to be found
matchCase - specifies whether to search in a case sensitive manner or not
locale - the text locale (for casing conversion)
Returns:
a Range object from the previous space character just before the key (or start of the source string if no space found) to the first space just after the key in src (or end of src if no space found)
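
When diacritics must match, the search reduces to an ordinary substring search followed by the same widening to word boundaries. The standalone sketch below shows that idea together with the matchCase and locale parameters. The class MatchDiacriticSearchSketch and the method find are hypothetical, an int pair {start, end} with an exclusive end stands in for Range, and the code assumes that lowercasing in the given locale does not change the length of src, so that indices in the lowercased copy are also valid in src.

```java
import java.util.Arrays;
import java.util.Locale;

class MatchDiacriticSearchSketch {
    /** Returns {start, end} of the surrounding word(s), or null if key is not found. */
    public static int[] find(String src, String key, boolean matchCase, Locale locale) {
        if (key.isEmpty()) {
            return null;
        }
        // For a case-insensitive search, compare lowercased copies.
        // Assumption: lowercasing keeps the string length unchanged.
        String haystack = matchCase ? src : src.toLowerCase(locale);
        String needle = matchCase ? key : key.toLowerCase(locale);

        int hit = haystack.indexOf(needle);
        if (hit < 0) {
            return null;
        }
        // Widen to the preceding space (or start) and the following space (or end).
        int start = haystack.lastIndexOf(' ', hit) + 1;
        int end = haystack.indexOf(' ', hit + needle.length() - 1);
        if (end < 0) {
            end = haystack.length();
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String src = "In the name of God";
        // The fragment "AM" matches inside "name"; the result covers the whole word.
        System.out.println(Arrays.toString(find(src, "AM", false, Locale.ENGLISH))); // [7, 11]
        // With matchCase set, the upper-case key no longer matches.
        System.out.println(find(src, "AM", true, Locale.ENGLISH) == null);
    }
}
```

Because no characters are stripped, a key without marks does not match a fully marked word here, which is the behavior that distinguishes this method from indexOfIgnoreDiacritic.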