Python regex: catch multi-caps words and adjacent words
I have a regex that does the following:
- Find a word that contains two or more adjacent capital letters A-Z (a "multi-caps word");
- When possible, extend the match to another multi-caps word on the left and right sides, as long as no more than three non-multi-caps words lie between each pair of multi-caps words; and
- Extend the match to the left and right to include a sequence of 5 and 3, respectively, non-multi-caps words.
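The three rules can also be read as a word-level procedure rather than a single regex. The following is only a minimal sketch under stated assumptions: words are separated by whitespace, the function name `multi_caps_spans` and its parameters are hypothetical, and the sample string is modelled on the example in this question.

```python
import re

# A "multi-caps word" contains two or more adjacent capital letters A-Z.
MULTI_CAPS = re.compile(r'[A-Z]{2,}')

def multi_caps_spans(text, max_gap=3, left=5, right=3):
    """Hypothetical helper: return one string per cluster of multi-caps words.

    Assumes whitespace-separated words and right <= max_gap, so the right
    context of one cluster never reaches the next cluster.
    """
    words = text.split()
    hits = [i for i, w in enumerate(words) if MULTI_CAPS.search(w)]

    # Rule 2: join multi-caps words separated by at most max_gap other words.
    clusters = []
    for i in hits:
        if clusters and i - clusters[-1][-1] - 1 <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])

    # Rule 3: add up to `left` words before and `right` words after a cluster,
    # without reusing words already taken by the previous result.
    results = []
    prev_end = -1
    for cluster in clusters:
        start = max(cluster[0] - left, prev_end + 1)
        end = min(cluster[-1] + right, len(words) - 1)
        results.append(' '.join(words[start:end + 1]))
        prev_end = end
    return results

# Sample string modelled on the question (an assumption).
sample = 'z z z z z11 AA BB DD f f d d gd df sdf ggf we AA ff d f f'
for span in multi_caps_spans(sample):
    print(span)
```

Working on word indices makes the "no more than three words in between" rule a simple subtraction, which is awkward to express inside one pattern.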
My regex catches the desired pattern but returns too many overlapping matches when there are adjacent multi-caps words, like AA BB DD below. Please help me fix my regex so that it works as desired.
This is my draft code:
import re
str1 = 'z z z z z11 AA BB DD f f d d gd df sdf ggf we AA ff d f f'
re.findall(r'(?=((?:[^\s]+[\s]+){5}(?:[^A-Z\s]*[A-Z][A-Z]+(?:[^\s]*[\s]+){1,3})*[^A-Z]*[A-Z][A-Z]+(?:[\s]+[^\s]+){3}\s))', str1)
Actual output:
match 1 - 'z z z z z11 AA BB DD f'
match 2 - 'z z z z11 AA BB DD f f'
match 3 - 'z z z11 AA BB DD f f d'
match 4 - 'gd df sdf ggf we AA ff d f'
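Overlapping windows of this kind are what `re.findall` returns when the whole pattern sits inside a lookahead with a capturing group, as the draft pattern appears to do. A lookahead consumes no characters, so the scan advances one position at a time and captures again. A small illustration on a toy string (not the question's data):

```python
import re

text = 'abcd'

# A normal pattern consumes what it matches, so results cannot overlap.
consuming = re.findall(r'\w\w', text)

# Inside a lookahead the match itself is empty; the scan moves on by one
# character and the group captures an overlapping window each time.
overlapping = re.findall(r'(?=(\w\w))', text)

print(consuming)    # ['ab', 'cd']
print(overlapping)  # ['ab', 'bc', 'cd']
```

Dropping the lookahead wrapper, so that the pattern consumes each match, is therefore one way to get non-overlapping results.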
Desired output:
match 1 - 'z z z z z11 AA BB DD f f d'
match 2 - 'gd df sdf ggf we AA ff d f'
Try this:
>>> pattern = r'(?:[a-z\d]+\s*){0,5}(?:[A-Z]+)(?:\s*[A-Z]+)*(?:\s*[a-z]+){0,3}'
>>> re.findall(pattern, str1)
['z z z z z11 AA BB DD f f d', 'gd df sdf ggf we AA ff d f']
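The suggested pattern can be checked piece by piece. The sketch below rewrites it with comments and, as an assumption that goes beyond the answer, adds a stricter variant (the name `strict` is hypothetical) that requires at least two capitals and whole words. The sample string is a reconstruction of the one in the question. Note that both patterns join only directly adjacent multi-caps words; neither allows lowercase words in between.

```python
import re

# Sample string reconstructed from the question (an assumption).
str1 = 'z z z z z11 AA BB DD f f d d gd df sdf ggf we AA ff d f f'

# The answer's pattern, split into commented parts.
loose = re.compile(
    r'(?:[a-z\d]+\s*){0,5}'   # up to 5 lowercase/digit words on the left
    r'[A-Z]+'                 # a run of capitals (one is already enough)
    r'(?:\s*[A-Z]+)*'         # further runs of capitals right after it
    r'(?:\s*[a-z]+){0,3}'     # up to 3 lowercase words on the right
)

# Hypothetical stricter variant: two or more capitals, whole words only.
strict = re.compile(
    r'(?:\b[a-z\d]+\s+){0,5}'
    r'\b[A-Z]{2,}\b'
    r'(?:\s+[A-Z]{2,}\b)*'
    r'(?:\s+[a-z\d]+\b){0,3}'
)

print(loose.findall(str1))
print(strict.findall(str1))

# The two differ on a single capital letter.
print(loose.findall('a B c'))   # ['a B c']
print(strict.findall('a B c'))  # []
```

Because neither pattern uses a lookahead, each match consumes its text and `findall` returns non-overlapping results.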