data.table - Finding sequences in rows in R based on the rle function, conditional on a certain column
I'm trying to find a sequence of consecutive 0s within each row. The reproducible example below is my best effort, and it throws an error. I tried to use an apply loop but failed badly, and I really do not want to use a for loop, since my real dataset has nearly 800,000 rows. I searched for a few hours but could not find anything, so no luck so far. I have also included the desired output.
library(data.table)
TEST_DF <- data.table(INDEX = c(1,2,3,4), COL_1 = c(0,0,0,0), COL_2 = c(0,0,2,5), COL_3 = c(0,0,0,0), COL_4 = c(0,2,0,1), DAYS = c(4,4,2,2))
IN_FUN <- function(x, y) {
  x <- rle(x)
  if (max(as.numeric(x$lengths[x$values == 0])) >= y) {"Y"} else {"N"}
}
TEST_DF$DEFINITION <- apply(TEST_DF[, c(2:5), with = FALSE], 1, FUN = IN_FUN(TEST_DF[, c(2 [...]

Desired output:

[...] 0,0,0), COL_2 = c(0,0,2,5), COL_3 = c(0,0,0,0), COL_4 = c(0,2,0,1), DAYS = c(4,4,2,2), DEFINITION = c("Y", "N" [...]

For the first row I want to see whether there are four 0s in a row within COL_1 to COL_4, for row 2 four 0s in a row, and for rows 3 and 4 two 0s in a row. Basically, the required number of consecutive 0s is given by the value in the DAYS column. Since row 1 has four 0s in a row, DEFINITION gets the value "Y". Row 2 gets "N" because it has only three 0s in a row, row 3 should get "Y" because it has two 0s in a row, etc.

In addition, if possible, when DEFINITION is "Y" a new column should return the column index of the first occurrence of the desired sequence. For example, in row 1 the sequence of 0s starts in COL_1, so the index column should have the value 2. Row 2 should get NA because its DEFINITION is "N", and so on.
Feel free to make any edits that make this clearer for other users, and let me know if you need more information.
Cheers in advance :)
Edit: Below is a slightly expanded data table. Let me know if this is enough.
TEST_DF <- data.table(P_ID = c(1,2,3,4,5,6,7,8,10), COL_1 = c(0,0,0,0,0,0,0,5,5,5,90), COL_2 = c(0,0,0,0,0,3,3,6,6,6), COL_3 = c(0,0,0,0,0,0,7,5,0), COL_4 = c(0,0,0,0,0,5,0,2,0,0), COL_5 = c(0,0,0,0,07,2,0,0), COL_6 = c(0,0,0,0,0,9,0,0,5,5), COL_7 = c(0,0,0,0,0,0,1,0,0,6), COL_8 = c(0,0,0,0,0,0,0,1,1,8), COL_9 = c(0,0,0,0,1,1,6,1.0), COL_10 = c(0,0,0,0,0,0,7,1,0), COL_11 = c(0,0,0,0,0,0,8,3,0), COL_12 = c(0,0,0,0,0,0,9,6,7), DAYS = c(10,8,12,4,4,4,4,4,7))
Here the DEFINITION column for the rows would be c(1,1,1,1,1,0,1,0,0), where 1 is "Y" and 0 is "N". Either encoding is okay.
For the new data, the index column should have the values c(2,2,2,2,2,NA,7,NA,NA).
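For reference, the per-row rle idea from the question can probably be made to work by passing each row together with its own DAYS value, rather than handing the whole table to the function. The following is only a minimal sketch on the original four-row table. The helper name first_zero_run and the output column FIRST_COL are hypothetical names chosen for illustration (the table already has a column called INDEX), and looping over rows like this may be slow on 800,000 rows.

```r
library(data.table)

TEST_DF <- data.table(
  INDEX = c(1, 2, 3, 4),
  COL_1 = c(0, 0, 0, 0),
  COL_2 = c(0, 0, 2, 5),
  COL_3 = c(0, 0, 0, 0),
  COL_4 = c(0, 2, 0, 1),
  DAYS  = c(4, 4, 2, 2)
)

# Hypothetical helper: position within x where the first run of at least
# n consecutive zeros starts, or NA if there is no such run.
first_zero_run <- function(x, n) {
  r <- rle(x == 0)
  starts <- cumsum(c(1, head(r$lengths, -1)))  # start position of each run
  hit <- which(r$values & r$lengths >= n)      # runs of zeros that are long enough
  if (length(hit) == 0) NA_integer_ else as.integer(starts[hit[1]])
}

value_cols <- paste0("COL_", 1:4)
m <- as.matrix(TEST_DF[, value_cols, with = FALSE])

# One call per row, because the required run length (DAYS) differs by row.
pos <- vapply(
  seq_len(nrow(m)),
  function(i) first_zero_run(m[i, ], TEST_DF$DAYS[i]),
  integer(1)
)

TEST_DF[, DEFINITION := ifelse(is.na(pos), "N", "Y")]
# +1 because COL_1 is the second column of the table
TEST_DF[, FIRST_COL := pos + 1L]
```

Rows without a long enough run of 0s get "N" in DEFINITION and NA in FIRST_COL.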
I was able to do this with a bit of math. I first created a binary matrix in which an element is 1 if it was originally 0, and 0 otherwise. Then, for each row, I set the nth element equal to nth element * (n-1th element + nth element). In this transformed matrix, the value of an element equals the number of consecutive elements that originally contained 0, counting back from and including this element.
m [...] >= m[, ncol(m)], 1, function(x) match(TRUE, x))
TEST_DF$DEFINITION <- ifelse(is.na(indx), 0, 1)
TEST_DF$INDEX <- indx - TEST_DF$DAYS + 2
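Since the start of the snippet above is cut off, here is a rough sketch of how the running-count idea could be written out in full on the original four-row table. It is a reconstruction under assumptions, not the exact code of this answer: the names is_zero, run_len and FIRST_COL are hypothetical, and the running count is built with a loop over columns, which keeps the work vectorized across rows.

```r
library(data.table)

TEST_DF <- data.table(
  INDEX = c(1, 2, 3, 4),
  COL_1 = c(0, 0, 0, 0),
  COL_2 = c(0, 0, 2, 5),
  COL_3 = c(0, 0, 0, 0),
  COL_4 = c(0, 2, 0, 1),
  DAYS  = c(4, 4, 2, 2)
)

# Binary matrix: 1 where the original value was 0, otherwise 0
is_zero <- (as.matrix(TEST_DF[, paste0("COL_", 1:4), with = FALSE]) == 0) * 1

# Running count of consecutive zeros along each row:
# new[n] = old[n] * (new[n - 1] + old[n])
run_len <- is_zero
for (j in seq_len(ncol(run_len))[-1]) {
  run_len[, j] <- is_zero[, j] * (run_len[, j - 1] + is_zero[, j])
}

# First column in each row where the running count reaches DAYS.
# DAYS is recycled down the columns, so each row is compared with its own value.
indx <- apply(run_len >= TEST_DF$DAYS, 1, function(x) match(TRUE, x))

TEST_DF$DEFINITION <- ifelse(is.na(indx), 0, 1)
# indx marks the end of the run, so step back DAYS - 1 columns to its start,
# then add 1 because COL_1 is the second column of the table.
TEST_DF$FIRST_COL <- indx - TEST_DF$DAYS + 2
```

The loop runs once per value column rather than once per row, so it should scale reasonably to a table with many rows and few columns.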
Note: I stole some stuff from