import re
target_str = 'I live at 9-162 Malibeu. My phone number is +351911199911. I have 5.50 dollars with me, but I have a net income of -1.01 per day which is about -1 dollar a day with an error of +-.01. Also the earth has a mass of 5.972e24 kg or about 6e24 kg.'
regex_expressions = {
    # Raw strings keep backslashes such as \d and \. intact for the regex engine.
    'p_ints': r"\d+",
    'pn_ints': r"[-+]?\d+",
    'p_floats': r"\d*\.\d+",
    'pn_floats': r"[-+]?\d*\.\d+",
    'scientific_notation': r"[-+]?\d+(?:\.\d+)?e[-+]?\d+",
    'pn_floats_or_ints': r"(?:[-+]?)(?:\d*\.\d+|\d+)",
    'universal': r"(?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)"
}
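
# A small sketch of why the 'universal' pattern above lists scientific notation
# first: the regex engine tries alternatives left to right and keeps the first
# one that matches. The names ordered_pattern, reversed_pattern and sample_text
# are illustrative only and are not used by the rest of this script.
import re

ordered_pattern = r"[-+]?(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)"
reversed_pattern = r"[-+]?(?:\d+|\d*\.\d+|\d+(?:\.\d+)?e[-+]?\d+)"
sample_text = 'mass of 5.972e24 kg'

ordered_matches = re.findall(ordered_pattern, sample_text)
reversed_matches = re.findall(reversed_pattern, sample_text)
print(ordered_matches)   # ['5.972e24']
print(reversed_matches)  # ['5', '.972', '24'] (the integer branch wins too early)
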
regex_results = dict()
# Run every pattern against the target string and print what it finds.
for target_type, regex_expression in regex_expressions.items():
    regex_results[target_type] = re.findall(regex_expression, target_str)
    print(target_type, ':', regex_results[target_type])
print('\nThese results are still strings, but can easily be turned into floats or ints:')
for number in regex_results['universal']:
    print(float(number))
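
# A minimal sketch of a type-preserving conversion, as an alternative to calling
# float() on every match. to_number and sample_matches are hypothetical names,
# not part of the original script; the helper assumes its input is a string of
# the kind matched by the 'universal' pattern.
def to_number(text):
    # A dot or an exponent marker means the match is a float; otherwise an int.
    if '.' in text or 'e' in text.lower():
        return float(text)
    return int(text)

sample_matches = ['9', '-162', '+351911199911', '5.50', '-.01', '6e24']
converted = [to_number(match) for match in sample_matches]
print(converted)
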
"""
Used RegEx symbols:
[] : look for any character inside the brackets
\d : look for any digit
\. : look for a dot (.)
+ : look for one or more occurrences of the previous expression
* : look for zero or more occurrences of the previous expression
? : look for zero or one occurrence of the previous expression
(?:...) : create a non-capturing group
| : look for either the expression on its left or the one on its right (OR operator)
Short explanation of each regex:
-> positive integers: \d+
look for one or more digits
-> positive or negative integers: [-+]?\d+
look for one or more digits, potentially preceded by a '-' or a '+'
-> positive floats: \d*\.\d+
look for zero or more digits, followed by a dot, followed by one or more digits (a shorthand such as '.3' is matched, but '3.' is not). Scientific notation is not allowed.
-> positive or negative floats: [-+]?\d*\.\d+
look for zero or more digits, followed by a dot, followed by one or more digits, potentially preceded by a '-' or a '+'
-> scientific notation: [-+]?\d+(?:\.\d+)?e[-+]?\d+
look for an optional '+' or '-' sign. Look for one or more digits, potentially followed by a dot and a decimal part. Look for a lowercase 'e', followed by an optional sign and one or more digits
-> any number not in scientific notation: (?:[-+]?)(?:\d*\.\d+|\d+)
look for an optional '+' or '-' sign. Look for zero or more digits, followed by a dot, followed by one or more digits (float) OR look for one or more digits (integer).
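-> any number: (?:[-+]?)(?:\d+(?:\.\d+)?e[-+]?\d+|\d*\.\d+|\d+)
look for an optional '+' or '-' sign and then do an OR between the previous expressions using a non-capturing group. Scientific notation comes first because alternatives are tried left to right; otherwise '5.972e24' would be split into '5.972' and '24'.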
"""
"""
OUTPUT:
p_ints : ['9', '162', '351911199911', '5', '50', '1', '01', '1', '01', '5', '972', '24', '6', '24']
pn_ints : ['9', '-162', '+351911199911', '5', '50', '-1', '01', '-1', '01', '5', '972', '24', '6', '24']
p_floats : ['5.50', '1.01', '.01', '5.972']
pn_floats : ['5.50', '-1.01', '-.01', '5.972']
scientific_notation : ['5.972e24', '6e24']
pn_floats_or_ints : ['9', '-162', '+351911199911', '5.50', '-1.01', '-1', '-.01', '5.972', '24', '6', '24']
universal : ['9', '-162', '+351911199911', '5.50', '-1.01', '-1', '-.01', '5.972e24', '6e24']
These results are still strings, but can easily be turned into floats or ints:
9.0
-162.0
351911199911.0
5.5
-1.01
-1.0
-0.01
5.972e+24
6e+24
"""