Solr query for same keyword returning different results?
I am using the text_general field type to search a Solr index, with the configuration below.
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
    <filter class="org.apache.solr.analysis.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" stemEnglishPossessive="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
I have indexed a lot of content and searched with the keywords PLEASE, Please and please.
The query for the keyword PLEASE returns a very small result set:
q=%22PLEASE%22&q.op=OR&df=text&qt=%2Fselect&sort=CONTENT_NAME+desc&fq=content_source%3Asharepoint&AuthenticatedUserName=Fine
but please and Please return a large result set:
q=%22please%22&q.op=OR&df=text&qt=%2Fselect&sort=CONTENT_NAME+desc&fq=content_source%3Asharepoint&AuthenticatedUserName=Fine
q=%22Please%22&q.op=OR&df=text&qt=%2Fselect&sort=CONTENT_NAME+desc&fq=content_source%3Asharepoint&AuthenticatedUserName=Fine
Even though I am using WordDelimiterFilterFactory, shouldn't Solr treat PLEASE, please and Please as the same keyword?
Any ideas?
You have a fundamental conflict between your tokenizer and your filters. SnowballPorterFilterFactory requires lowercase input to work correctly. From the PorterStemFilter documentation:
public final class PorterStemFilter extends TokenFilter
Transforms the token stream as per the Porter stemming algorithm. Note: the input to the stemming filter must already be in lower case, so you will need to use LowerCaseFilter or LowerCaseTokenizer farther down the Tokenizer chain in order for this to work properly!
This means you need to run LowerCaseFilterFactory before the token stream reaches SnowballPorterFilterFactory.
You are also running WordDelimiterFilterFactory after stemming, which means the new tokens it generates are never stemmed.
The fix is not as simple as putting LowerCaseFilterFactory first: while that fixes the issue with SnowballPorterFilterFactory, it interferes with WordDelimiterFilterFactory, which creates new tokens on case changes.
I suggest trying the following order:
StandardTokenizerFactory
WordDelimiterFilterFactory
LowerCaseFilterFactory
SynonymFilterFactory
StopFilterFactory
SnowballPorterFilterFactory
Once you use this many filters, it becomes difficult to get the sequence right, but I believe this order will solve your current issues. As always, I suggest running several tests with common words from your own document set to see how well the output matches what you want.
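As a rough sketch, the suggested order could be written in schema.xml as shown below. It reuses the field type name, file names and WordDelimiterFilterFactory attributes from the question. Keeping synonyms at query time only, and adding WordDelimiterFilterFactory to the query analyzer, are assumptions based on the list above rather than tested settings.

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Split on case changes while the original casing is still present -->
    <filter class="org.apache.solr.analysis.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" stemEnglishPossessive="1"/>
    <!-- Lowercase before stemming, as the stemmer expects lowercase input -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Stem last, so tokens created by the earlier filters are stemmed too -->
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Assumption: mirrors the index chain; the question's query analyzer had no word delimiter filter -->
    <filter class="org.apache.solr.analysis.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.SnowballPorterFilterFactory"/>
  </analyzer>
</fieldType>
```

Changes to the index analyzer only affect documents indexed afterwards, so existing content has to be re-indexed before PLEASE, Please and please can match the same terms.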