Pandas Series str function extractall() in Python
The Pandas Series str function extractall() extracts all matches of the capture groups in a regular expression pattern from each element of a Series. It always returns a DataFrame with one row per match and one column per capture group. Capture group names in the regular expression pat are used as column names; otherwise the capture group numbers are used.
Note : Unlike extract(), which returns only the first match of each element, extractall() returns every match. The result has a MultiIndex whose last level, named match, numbers the matches within each element, and elements with no match are not included in the resulting object.
Syntax
Series.str.extractall(pat, flags = 0)
pat : A string containing a regular expression pattern with capturing groups, which is matched against each element.
flags : An integer value that specifies flags from the re module, e.g. re.IGNORECASE. The default 0 means no flags. If specified, the flags modify how the regular expression matches, such as case sensitivity or whitespace handling.
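To illustrate the flags parameter, here is a minimal sketch that runs the same pattern with and without re.IGNORECASE. The sample strings and the variable names strict and relaxed are made up for this illustration.

```python
import re
import pandas as pd

# Sample data chosen for illustration: some letters are lowercase.
ser = pd.Series(['a1A3', 'C2D', 'e5F'])

# Without flags, [AE] only matches the uppercase letters A and E.
strict = ser.str.extractall(r'([AE])(\d)')

# With re.IGNORECASE, lowercase 'a' and 'e' match as well.
relaxed = ser.str.extractall(r'([AE])(\d)', flags=re.IGNORECASE)

print(strict)
print(relaxed)
```

Here strict contains only the A3 match, while relaxed also picks up a1 and e5.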
extractall() function
import pandas as pd

idx = ['X', 'Y', 'Z']
ser = pd.Series(['A1A3', 'C2D', 'E5F'], index = idx)
print('The original Series object :')
print(ser)

res = ser.str.extractall(r'([AE])(\d)')
print('The resulting Series object :')
print(res)
In the above example, a Series object is created by passing a list of strings. The extractall() function is called with a regular expression that has two groups. It finds all matches of the regular expression in each element and returns a DataFrame with one column per group. The result is assigned to a variable and printed to the console.
Output
The original Series object :
X    A1A3
Y     C2D
Z     E5F
dtype: object
The resulting Series object :
         0  1
  match
X 0      A  1 # first element has multiple matches
  1      A  3
Z 0      E  5
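In the above output, the first column represents the first regex group and the second column represents the second group. The first index level is the original Series index, and the match level numbers the matches within each element. The element Y has no match, so it does not appear in the result.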
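The difference from extract() can be seen by running both functions on the same Series. This is a minimal sketch; the variable names first_only and every_match are chosen for this illustration.

```python
import pandas as pd

ser = pd.Series(['A1A3', 'C2D', 'E5F'], index=['X', 'Y', 'Z'])
pat = r'([AE])(\d)'

# extract(): one row per element, first match only,
# NaN in every column where the element does not match.
first_only = ser.str.extract(pat)

# extractall(): one row per match, indexed by (original label, match number);
# elements without a match are left out.
every_match = ser.str.extractall(pat)

print(first_only)
print(every_match)
```

With extract(), the row for Y is kept and filled with NaN, and the second match in X (A3) is lost. With extractall(), both matches of X appear and Y is absent.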

extractall() with optional group
import pandas as pd

ser = pd.Series(['A1A3', 'C2D', 'E5F'])
print('The original Series object :')
print(ser)

res = ser.str.extractall(r'([AE])?(\d)')
print('The resulting Series object :')
print(res)
In the above example, the extractall() function is called with a regular expression that includes an optional group, made optional by adding ? after the group. It returns all matches of each element, including matches where the optional group is absent.
Output
The original Series object :
0    A1A3
1     C2D
2     E5F
dtype: object
The resulting Series object :
           0  1
  match
0 0        A  1
  1        A  3
1 0      NaN  2
2 0        E  5
In the above output, the element C2D is included because the digit alone satisfies the pattern. The optional group did not match there, so its column holds NaN.
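Rows where an optional group did not take part in the match can be located or filled with the usual missing-value tools. The following is a small sketch of one way to do that; the names missing_letter and filled and the '-' placeholder are arbitrary choices made for this illustration.

```python
import pandas as pd

ser = pd.Series(['A1A3', 'C2D', 'E5F'])
res = ser.str.extractall(r'([AE])?(\d)')

# Rows where the optional first group did not match hold NaN in column 0.
missing_letter = res[res[0].isna()]

# Replace the missing letters with a placeholder of our choosing.
filled = res[0].fillna('-')

print(missing_letter)
print(filled)
```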

extractall() with named group
import pandas as pd

ser = pd.Series(['A1A3', 'C2D', 'E5F'])
print('The original Series object :')
print(ser)

res = ser.str.extractall(r'(?P<Char>[AE])(?P<digit>\d)')
print('The resulting Series object :')
print(res)
In the above example, the extractall() function is called with a regular expression that uses named groups. The group names become the column names in the resulting DataFrame.
Output
The original Series object :
0    A1A3
1     C2D
2     E5F
dtype: object
The resulting Series object :
        Char digit
  match
0 0        A     1
  1        A     3
2 0        E     5
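Because the result is an ordinary DataFrame with a MultiIndex, it can be processed further with standard pandas operations. The sketch below shows a few possibilities using the named groups; the variable names digits, matches_per_element and first_matches are made up for this illustration.

```python
import pandas as pd

ser = pd.Series(['A1A3', 'C2D', 'E5F'])
res = ser.str.extractall(r'(?P<Char>[AE])(?P<digit>\d)')

# Named groups are ordinary column labels; extracted values are strings,
# so convert the digits to integers explicitly.
digits = res['digit'].astype(int)

# Level 0 of the MultiIndex is the original index: count matches per element.
matches_per_element = res.groupby(level=0).size()

# Keep only the first match (match == 0) of every element.
first_matches = res.xs(0, level='match')

print(digits)
print(matches_per_element)
print(first_matches)
```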