Python regex: catch multi-caps words and adjacent words


I have a regex that does the following:

  1. Find a word that contains two or more adjacent capital letters A-Z (a "multi-caps word");
  2. When possible, extend the match to the left and right to further multi-caps words, as long as no more than three non-multi-caps words lie between each pair of multi-caps words; and
  3. Extend the match to the left and right to include a sequence of up to 5 and 3, respectively, non-multi-caps words.
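
Rules 1 and 2 can be prototyped on a plain word list before they go into a regex. A minimal sketch, assuming whitespace-separated words and an assumed sample string 'zzzz z11AA BB DD FFD gd df sdf ggf we AA ff df'; multi_caps_clusters is a hypothetical helper, not part of the re module:

```python
import re

# Rule 1: a "multi-caps word" contains two or more adjacent capitals A-Z.
MULTI_CAPS = re.compile(r'[A-Z]{2}')


def multi_caps_clusters(words, max_gap=3):
    """Group multi-caps words into runs (rule 2).

    Two multi-caps words join the same run when at most `max_gap`
    non-multi-caps words separate them. Returns (first, last) word indices.
    """
    clusters = []
    for i, word in enumerate(words):
        if not MULTI_CAPS.search(word):
            continue
        # i - last - 1 is the number of ordinary words in between.
        if clusters and i - clusters[-1][1] - 1 <= max_gap:
            clusters[-1] = (clusters[-1][0], i)
        else:
            clusters.append((i, i))
    return clusters


words = 'zzzz z11AA BB DD FFD gd df sdf ggf we AA ff df'.split()
print(multi_caps_clusters(words))  # [(1, 4), (10, 10)]
```

The first run covers z11AA BB DD FFD; AA starts a new run because five ordinary words separate it from FFD.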

My regex catches the desired pattern but returns too many overlapping matches when there are adjacent multi-caps words, like AA BB DD below. Please help me make my regex work as desired.

This is my draft code:

  import re

  str1 = 'zzzz z11AA BB DD FFD gd df sdf ggf we AA ff df'
  re.findall(r'(?=((?:[^\s]+\s){0,5}(?:[^A-Z\s]*[A-Z][A-Z]+(?:[^\s]*\s){1,3})*[^A-Z\s]*[A-Z][A-Z]+(?:\s[^\s]+){0,3}))', str1)

Actual output:

  match 1 - 'zzzz z11AA BB DD FFD'
  match 2 - 'z z11AA BB DD FFD'
  match 3 - '11AA BB DD FFD'
  match 4 - 'gd df sdf ggf we AA ff df'

Desired output:

  match 1 - 'zzzz z11AA BB DD FFD'
  match 2 - 'gd df sdf ggf we AA ff df'
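
Overlaps like these are what a lookahead wrapper such as (?=(...)) produces: the lookahead itself matches an empty string, so re.findall moves forward one character at a time and reports the captured group from every starting position, including positions in the middle of a word. A toy illustration with a simplified stand-in pattern (an assumption, not the draft regex itself):

```python
import re

text = 'zzzz AA'

# Zero-width lookahead with a capture group: findall retries at every
# character position, so every suffix that still reaches "AA" is reported.
overlapping = re.findall(r'(?=(\S*\s*AA))', text)
print(overlapping)  # ['zzzz AA', 'zzz AA', 'zz AA', 'z AA', ' AA', 'AA']

# The same pattern without the lookahead consumes the text it matches,
# so the next search starts after it and matches cannot overlap.
non_overlapping = re.findall(r'\S*\s*AA', text)
print(non_overlapping)  # ['zzzz AA']
```

Dropping the lookahead is the usual way to get non-overlapping matches from re.findall.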

Try this:

  >>> import re
  >>> str1 = 'zzzz z11AA BB DD FFD gd df sdf ggf we AA ff df'
  >>> pattern = r'(?:[a-z\d]+\s*){0,5}(?:[A-Z]+)(?:\s*[A-Z]+)*(?:\s*[a-z]+){0,3}'
  >>> re.findall(pattern, str1)
  ['zzzz z11AA BB DD FFD gd df sdf', 'ggf we AA ff df']
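
Deciding which phrase gets the plain words between two multi-caps runs is awkward to express in a single regex. A minimal sketch that applies the three rules to a word list instead, assuming whitespace-separated words and one extra rule the question does not state: when two phrases would share context words, the later phrase keeps them, which is the choice that matches the desired output listed in the question. The helper name caps_phrases and the sample string are assumptions:

```python
import re

# Rule 1: a "multi-caps word" contains two or more adjacent capitals A-Z.
MULTI_CAPS = re.compile(r'[A-Z]{2}')


def caps_phrases(text, max_gap=3, left=5, right=3):
    """Return phrases built around runs of multi-caps words.

    Assumption: when the right context of one phrase and the left
    context of the next would share words, the later phrase keeps them.
    """
    words = text.split()

    # Rule 2: merge multi-caps words separated by <= max_gap other words.
    clusters = []
    for i, word in enumerate(words):
        if not MULTI_CAPS.search(word):
            continue
        if clusters and i - clusters[-1][1] - 1 <= max_gap:
            clusters[-1][1] = i
        else:
            clusters.append([i, i])

    # Rule 3: widen each run by up to `left` and `right` plain words.
    phrases = []
    for n, (first, last) in enumerate(clusters):
        prev_last = clusters[n - 1][1] if n > 0 else -1
        start = max(first - left, prev_last + 1)
        end = min(last + right, len(words) - 1)
        if n + 1 < len(clusters):
            next_first = clusters[n + 1][0]
            next_start = max(next_first - left, last + 1)
            end = min(end, next_start - 1)  # the next phrase wins shared words
        phrases.append(' '.join(words[start:end + 1]))
    return phrases


str1 = 'zzzz z11AA BB DD FFD gd df sdf ggf we AA ff df'
print(caps_phrases(str1))
# ['zzzz z11AA BB DD FFD', 'gd df sdf ggf we AA ff df']
```

Because the phrases are cut from disjoint word ranges, they can never overlap; changing the tie-break (for example, letting the earlier phrase keep shared words) only means changing how end and start are clipped.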

