python - Pandas - GroupBy and then Merge on original table
I am trying to write a function that aggregates and performs various statistical calculations on a dataframe in pandas and then merges the result back onto the original dataframe, but I'm running into issues. This is the equivalent code in SQL:
SELECT EID,
       PCODE,
       SUM(PVALUE) AS PVALUE,
       SUM(SQRT(SC * EXP(SC - 1))) AS SC,
       SUM(SI) AS SI,
       SUM(EE) AS EE
INTO foo_bar_grp
FROM foo_bar
GROUP BY EID, PCODE

And then join on the original table:

SELECT *
FROM foo_bar_grp
INNER JOIN foo_bar
    ON foo_bar.EID = foo_bar_grp.EID
    AND foo_bar.PCODE = foo_bar_grp.PCODE
Here are the steps. Step 1: Loading the data:
IN:>>
import numpy as np
import pandas as pd
from pandas import DataFrame

pol_dict = {
    'PID': [1, 1, 2, 2],
    'EID': [123, 123, 123, 123],
    'PCODE': ['GU', 'GR', 'GU', 'GR'],
    'PVALUE': [100, 50, 150, 300],
    'SI': [400, 40, 140, 140],
    'SC': [230, 23, 213, 213],
    'EE': [10000, 10000, 2000, 30000],
}
pol_df = DataFrame(pol_dict)
pol_df
OUT:>>
   EID     EE PCODE  PID  PVALUE   SC   SI
0  123  10000    GU    1     100  230  400
1  123  10000    GR    1      50   23   40
2  123   2000    GU    2     150  213  140
3  123  30000    GR    2     300  213  140
Step 2: Calculating and grouping the data:
My pandas code is as follows:
# create aggregation dataframe
# note: poagg_df is an alias of pol_df (not a copy), so PID is dropped from pol_df too
poagg_df = pol_df
del poagg_df['PID']
po_grouped_df = poagg_df.groupby(['EID', 'PCODE'])

# generate account-level aggregate
acc_df = po_grouped_df.agg({
    'PVALUE': np.sum,
    'SI': lambda x: np.sqrt(np.sum(x * np.exp(x - 1))),
    'SC': np.sum,
    'EE': np.sum,
})
This works fine until I want to join on the original table:
IN:>>
po_account_df = pd.merge(acc_df, pol_df, on=['EID', 'PCODE'], how='inner', suffixes=('_Acc', '_Po'))
OUT:>>
KeyError: 'no item named EID'
For some reason, the grouped dataframe can't be joined back to the original table. I have looked at ways of converting the groupby columns into actual columns, but that doesn't seem to work.
Please note that the end goal is to be able to find the percentage for each column (PVALUE, SI, SC, EE), i.e.:
po_account_df['PVALUE_PCT'] = np.round(po_account_df.PVALUE_Po / po_account_df.PVALUE_Acc, 4)
Thanks!
By default, groupby output has the grouping columns as indices, not as columns, which is why the merge is failing.
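To see where the keys end up, here is a minimal sketch using the question's sample data (without PID) and plain sums. The names `index_names`, `has_eid_column`, `flat_df` and `flat_columns` are made up for illustration. Newer pandas releases may resolve index level names passed to `on`, so the exact error may not reproduce there; the index layout shown is the same either way.

```python
import pandas as pd

pol_df = pd.DataFrame({
    'EID': [123, 123, 123, 123],
    'PCODE': ['GU', 'GR', 'GU', 'GR'],
    'PVALUE': [100, 50, 150, 300],
    'SI': [400, 40, 140, 140],
    'SC': [230, 23, 213, 213],
    'EE': [10000, 10000, 2000, 30000],
})

# Default groupby: the grouping keys become a MultiIndex on the result.
acc_df = pol_df.groupby(['EID', 'PCODE']).agg({'PVALUE': 'sum', 'EE': 'sum'})
index_names = list(acc_df.index.names)     # ['EID', 'PCODE']
has_eid_column = 'EID' in acc_df.columns   # False

# reset_index() moves the keys back into ordinary columns,
# which is one alternative to passing as_index=False up front.
flat_df = acc_df.reset_index()
flat_columns = list(flat_df.columns)       # ['EID', 'PCODE', 'PVALUE', 'EE']
```

Calling `reset_index()` on an already aggregated frame is handy when the groupby object was created elsewhere and cannot easily be changed.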
There are a couple of different ways to handle it; probably the easiest is to use the as_index parameter when you define the groupby object.
po_grouped_df = poagg_df.groupby(['EID', 'PCODE'], as_index=False)
Then your merge should work as expected.
In [356]: pd.merge(acc_df, pol_df, on=['EID', 'PCODE'], how='inner', suffixes=('_Acc', '_Po'))
Out[356]:
   EID PCODE  SC_Acc  EE_Acc        SI_Acc  PVALUE_Acc  EE_Po  PVALUE_Po  \
0  123    GR     236   40000  1.805222e+31         350  10000         50
1  123    GR     236   40000  1.805222e+31         350  30000        300
2  123    GU     443   12000  8.765549e+87         250  10000        100
3  123    GU     443   12000  8.765549e+87         250   2000        150

   SC_Po  SI_Po
0     23     40
1    213    140
2    230    400
3    213    140
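Putting the pieces together, the following is a rough end-to-end sketch of the percentage calculation the question is after, assuming the same sample data (without PID) and the `_Acc`/`_Po` suffixes. The name `merged_df` and the loop over column names are illustrative choices, not part of the original code, and string aggregator names such as `'sum'` are used in place of `np.sum`.

```python
import numpy as np
import pandas as pd

pol_df = pd.DataFrame({
    'EID': [123, 123, 123, 123],
    'PCODE': ['GU', 'GR', 'GU', 'GR'],
    'PVALUE': [100, 50, 150, 300],
    'SI': [400, 40, 140, 140],
    'SC': [230, 23, 213, 213],
    'EE': [10000, 10000, 2000, 30000],
})

# as_index=False keeps EID and PCODE as ordinary columns in the result.
acc_df = pol_df.groupby(['EID', 'PCODE'], as_index=False).agg({
    'PVALUE': 'sum',
    'SI': lambda x: np.sqrt(np.sum(x * np.exp(x - 1))),
    'SC': 'sum',
    'EE': 'sum',
})

# Join the aggregates back onto the row-level data.
merged_df = pd.merge(acc_df, pol_df, on=['EID', 'PCODE'],
                     how='inner', suffixes=('_Acc', '_Po'))

# Share of each row's value in its group total, rounded to 4 decimals.
for col in ['PVALUE', 'SI', 'SC', 'EE']:
    merged_df[col + '_PCT'] = np.round(
        merged_df[col + '_Po'] / merged_df[col + '_Acc'], 4)
```

For the sample data, PVALUE_PCT comes out as 0.1429 and 0.8571 for the two GR rows and 0.4 and 0.6 for the two GU rows.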