python - Access part of a row very quickly in pandas
I'm doing 20 billion calculations, and I have found that the step that is slower by two orders of magnitude is simply accessing the relevant rows of the pandas DataFrame.
    %timeit x = query_results.ix[i]
    10000 loops, best of 3: 155 μs per loop
How can I speed this up by one or two orders of magnitude?
The DataFrame has 200,000 rows and 11 columns, all strings of different lengths. Converting all the strings to floats, or to values of a fixed length (which is not feasible for my use), only cuts the time roughly in half.
Edit for more context: here is almost the full use case, following Brainburn's suggestion to use iloc instead of ix.
Note that we only use two rows at a time. The large number of calculations comes from comparing every row with every other row (200,000^2 / 2).
    import pandas as pd
    from numpy import arange

    test = pd.DataFrame(index=arange(200000), columns=arange(11))
    test.ix[:, :] = 'asdfasdf'
    i = 0
    j = 1
    %timeit x = set(test.iloc[i]).intersection(test.iloc[j])
    1000 loops, best of 3: 235 μs per loop
It would be great if this could be more like 5 μs.
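To put those per-pair timings in perspective, here is a rough back-of-the-envelope sketch. It assumes each unordered pair of distinct rows is compared exactly once (so n(n-1)/2 pairs, slightly fewer than the 200,000^2 / 2 quoted above) and that the per-pair cost is the only cost; the helper name `total_hours` is made up for illustration.

```python
from itertools import combinations

n_rows = 200_000

# Number of unordered pairs of distinct rows: n * (n - 1) / 2.
n_pairs = n_rows * (n_rows - 1) // 2


def total_hours(per_pair_us, pairs=n_pairs):
    """Estimated wall-clock hours if every pair costs `per_pair_us` microseconds."""
    return per_pair_us * 1e-6 * pairs / 3600


# Sanity check of the pair formula on a tiny case, using real pair enumeration.
assert len(list(combinations(range(5), 2))) == 5 * 4 // 2

# Per-pair timings taken from the question: measured iloc cost and the hoped-for cost.
for label, us in [("iloc, 235 us per pair", 235.0), ("target, 5 us per pair", 5.0)]:
    print(f"{label}: about {total_hours(us):,.0f} hours for {n_pairs:,} pairs")
```

On these assumptions the difference is between weeks and roughly a day of compute, which is why a few microseconds per row access matter here.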
Side note, as an example of why every μs counts: the data does not actually have a value in every cell, so I still need to remove the resulting missing values (nan), which will take more μs. Something like

    test.iloc[i].dropna()

is very slow for these purposes.

    In [22]: tx
    Out[22]:
    array([['asdfasdf', 'asdfasdf', 'asdfasdf', ..., 'asdfasdf', 'asdfasdf', 'asdfasdf'],
           ['asdfasdf', 'asdfasdf', 'asdfasdf', ..., 'asdfasdf', 'asdfasdf', 'asdfasdf'],
           ['asdfasdf', 'asdfasdf', 'asdfasdf', ..., 'asdfasdf', 'asdfasdf', 'asdfasdf'],
           ...,
           ['asdfasdf', 'asdfasdf', 'asdfasdf', ..., 'asdfasdf', 'asdfasdf', 'asdfasdf'],
           ['asdfasdf', 'asdfasdf', 'asdfasdf', ..., 'asdfasdf', 'asdfasdf', 'asdfasdf'],
           ['asdfasdf', 'asdfasdf', 'asdfasdf', ..., 'asdfasdf', 'asdfasdf', 'asdfasdf']], dtype=object)

    In [23]: %timeit x = set(tx[i]).intersection(tx[j])
    100000 loops, best of 3: 1.99 μs per loop
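The last timing suggests that indexing rows of a plain NumPy object array is far cheaper than building a pandas Series with test.iloc[i] on every access. The post does not show how tx was created, so the following is only a minimal sketch, assuming tx is the frame's data extracted once with `DataFrame.to_numpy`. The small frame size, the injected nan and 'other' values, and the `row_sets` name are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 200,000 x 11 frame in the question (size is illustrative).
test = pd.DataFrame(np.full((1000, 11), "asdfasdf", dtype=object))
test.iloc[0, 3] = np.nan    # simulate a missing cell (assumption for the demo)
test.iloc[1, 5] = "other"   # make one row differ slightly

# Extract the data once as a 2-D object array. Indexing tx[i] returns a NumPy
# row without constructing a pandas Series each time.
tx = test.to_numpy(dtype=object)

# Same operation as in the question, on the raw array rows.
i, j = 0, 1
common_raw = set(tx[i]).intersection(tx[j])

# One possible way to handle missing values: strip them once per row up front,
# instead of calling dropna() inside the pairwise loop.
row_sets = [set(row[pd.notna(row)]) for row in tx]
common = row_sets[i] & row_sets[j]

print(common_raw, common)
```

Precomputing one set per row trades memory for time: each row is converted once rather than once per comparison, so the pairwise loop is left with only the set intersection. Whether that fits in memory for 200,000 rows depends on the real data.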