python - Adding a calculated column to pandas dataframe
I am completely new to Python, pandas and programming in general, and I cannot figure out the following:
I have accessed a database with the help of pandas and put the data into a DataFrame, df. One of the columns contains birthdays, which can take the following forms:
- 01/25/1980 (string)
- 01/25 (string)
- None (NoneType)
Now I would like to add a new column to df that stores the ages of the people in the database. So I have done the following:
    from datetime import date

    def addAge(df):
        today = date.today()
        df["age"] = None
        for index, row in df.iterrows():
            if row["birthday"] is not None:
                if len(row["birthday"]) == 10:
                    birthday = row["birthday"]
                    birthdayDate = date(int(birthday[6:]), int(birthday[:2]), int(birthday[3:5]))
                    row["age"] = today.year - birthdayDate.year - ((today.month, today.day) < (birthdayDate.month, birthdayDate.day))
                    print(row["birthday"], row["age"])  # this is for testing

    addAge(df)
    print(df)

The test line that prints row["birthday"] and row["age"] shows the birthdays and ages correctly, but when I call print(df), the age column always contains None. Could you tell me what I am doing wrong? Thanks!
When you call iterrows() you are getting copies of each row, so assigning to them does not write back to the larger DataFrame. In general, you should try to use vectorized methods rather than iterating through the rows.
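To see the copy behaviour in isolation, here is a minimal sketch. The sample frame and its "id" column are hypothetical and not from the question. The frame deliberately mixes an integer column with string columns, since the pandas documentation notes that whether iterrows() yields a copy depends on the data types. It also shows that writing through df.loc with the row's index label does reach the DataFrame, although the vectorized route is usually preferable.

```python
import pandas as pd

# Hypothetical sample data (assumption, not from the original post).
# Mixed dtypes (int + object), so each yielded row is a copy.
df = pd.DataFrame({"id": [1, 2, 3],
                   "birthday": ["01/25/1980", "01/25", None]})
df["age"] = None

# Writing to the row object yielded by iterrows() does not reach df.
for index, row in df.iterrows():
    row["age"] = 1

untouched = df["age"].isnull().all()

# Writing through df.loc with the row's index label does update df.
for index, row in df.iterrows():
    df.loc[index, "age"] = 1

updated = (df["age"] == 1).all()
```

Here `untouched` records that the first loop left the age column empty, and `updated` records that the second loop filled it.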
So in this case, for example, to parse the 'birthday' column you could do something like the following. For rows whose string length is 10, the string is parsed into a datetime; otherwise the value is filled with a missing value.
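    import numpy as np
    import pandas as pd

    # Keep only full MM/DD/YYYY strings; anything else becomes None and parses to NaT.
    df['birthday'] = pd.to_datetime(
        np.where(df['birthday'].str.len() == 10, df['birthday'], None),
        format='%m/%d/%Y',
    )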
To calculate the ages, you can use .apply, which applies a function to each element of a Series.
So if you wrap your age calculation in a function:
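    def calculate_age(birthdayDate, today):
        # Missing birthdays (NaT/None) produce a missing age.
        if pd.isnull(birthdayDate):
            return np.nan
        else:
            # Subtract one year if this year's birthday has not happened yet.
            return today.year - birthdayDate.year - ((today.month, today.day) < (birthdayDate.month, birthdayDate.day))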
Then you can calculate the age column like this:
    from datetime import date

    today = date.today()
    df['age'] = df['birthday'].apply(lambda x: calculate_age(x, today))
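Since .apply still calls a Python function once per element, the same age rule can also be written with column-wise operations only. The following is a rough sketch of that variant, not part of the original answer. The sample data, the fixed reference date and the name birthday_pending are assumptions made for illustration, and the birthday column is assumed to have been parsed to datetimes already.

```python
from datetime import date

import pandas as pd

# Hypothetical sample data; NaT stands for a birthday that could not be parsed.
df = pd.DataFrame({"birthday": pd.to_datetime(["1980-01-25", None, "1990-12-31"])})

# Fixed reference date so the result is reproducible; use date.today() in practice.
today = date(2020, 6, 1)

month = df["birthday"].dt.month
day = df["birthday"].dt.day

# True where this year's birthday is still ahead of `today`.
birthday_pending = (month > today.month) | ((month == today.month) & (day > today.day))

# Missing birthdays propagate as NaN, so the column ends up float-valued.
df["age"] = today.year - df["birthday"].dt.year - birthday_pending.astype(int)
```

With the fixed reference date above, the first person comes out as 40, the last as 29, and the row without a birthday stays missing.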