Python: optimizing pairwise overlaps between intervals


I have a lot of intervals (about 5k to 10k). Each element has a start and an end position, such as (203, 405). The coordinates of the intervals are stored in a list.

I want to determine the coordinates and length of the overlapping part between each pair of intervals. This can be done as follows:

    # a small list for clarity; the real list usually has around 5000 elements
    cList = ((20, 54), (25, 48), (67, 133), (90, 152), (140, 211), (190, 230))
    for i, c1 in enumerate(cList[:-1]):  # linear pairwise combination
        for c2 in cList[i + 1:]:
            left = max(c1[0], c2[0])
            right = min(c1[1], c2[1])
            overlap = right - left
            if overlap > 0:
                print("left: %s, right: %s, length: %s" % (left, right, overlap))

This results in:

    left: 25, right: 48, length: 23
    left: 90, right: 133, length: 43
    left: 140, right: 152, length: 12
    left: 190, right: 211, length: 21

As can be seen, it works. However, because it can take some time (20 seconds), my question is: how do I optimize it? I tried to cut the second for loop short when the start position of the interval in the second loop exceeds the end position of the first:

    if c1[1] < c2[0]:
        break

This reduces the running time, but the resulting number of overlaps is almost three times lower than before, so the result is definitely not valid. This is caused by elements that are much longer than the elements preceding them.
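To make the failure concrete, here is a minimal sketch comparing the full pairwise scan with the early-break variant. The function names and the three-interval list are hypothetical and chosen only for illustration: a long interval appears in the list after a short one that starts further right, which is the situation described above.

```python
def all_overlaps(intervals):
    """Full pairwise scan: every pair is compared exactly once."""
    found = []
    for i, c1 in enumerate(intervals[:-1]):
        for c2 in intervals[i + 1:]:
            left, right = max(c1[0], c2[0]), min(c1[1], c2[1])
            if right - left > 0:
                found.append((left, right, right - left))
    return found


def overlaps_with_break(intervals):
    """Same scan, but leave the inner loop once c2 starts after c1 ends."""
    found = []
    for i, c1 in enumerate(intervals[:-1]):
        for c2 in intervals[i + 1:]:
            if c1[1] < c2[0]:
                break  # only safe if every later c2 starts even further right
            left, right = max(c1[0], c2[0]), min(c1[1], c2[1])
            if right - left > 0:
                found.append((left, right, right - left))
    return found


# Hypothetical unsorted input: (15, 200) is long and comes after (100, 110).
sample = [(10, 20), (100, 110), (15, 200)]
print(all_overlaps(sample))         # [(15, 20, 5), (100, 110, 10)]
print(overlaps_with_break(sample))  # [(100, 110, 10)]
```

The break fires at (100, 110) while scanning from (10, 20), so the later overlap with (15, 200) is never examined. The early exit is only valid when the list is ordered by start position.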

I'm sure there is some mathematical trick to solve this problem.

The algorithm you describe can be written as:

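    # overlap(c1, c2) is assumed to return (left, right, length),
    # or None when the two intervals do not overlap
    for i, c1 in enumerate(cList[:-1]):
        for c2 in cList[i + 1:]:
            o = overlap(c1, c2)
            if o is not None:
                print("left: %s, right: %s, length: %s" % o)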
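The `overlap` helper called here is not shown in the text. A minimal sketch of what it might look like, assuming intervals are `(start, end)` tuples and that only overlaps of positive length count (as in the question's `overlap > 0` test):

```python
def overlap(c1, c2):
    """Return (left, right, length) for two (start, end) tuples, or None.

    Assumption: intervals that merely touch, such as (10, 20) and (20, 30),
    are treated as non-overlapping, matching the question's `overlap > 0`.
    """
    left = max(c1[0], c2[0])
    right = min(c1[1], c2[1])
    length = right - left
    if length > 0:
        return (left, right, length)
    return None


print(overlap((20, 54), (25, 48)))   # (25, 48, 23)
print(overlap((20, 54), (67, 133)))  # None
```

Returning a tuple lets the result be passed straight to the `%` formatting used in the print line.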

If you sort the elements, you can "short-circuit" as soon as you get a non-overlapping segment, because you know that the segments further along the list will be even further "away":

    l = sorted(cList)
    for i, c1 in enumerate(l[:-1]):
        for c2 in l[i + 1:]:
            o = overlap(c1, c2)
            if o is None:
                break  # every later c2 starts even further right
            print("left: %s, right: %s, length: %s" % o)

Of course, if your input is already sorted (as it seems), you can skip that step.
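The sorted short-circuit can be sketched as a self-contained function. The names `overlap` and `sorted_overlaps` are illustrative, and the sketch assumes every interval has positive length (`end > start`); a zero-length interval could trigger the break too early. It also iterates by index rather than slicing, a small variation that avoids copying the tail of the list on every outer iteration.

```python
def overlap(c1, c2):
    """Return (left, right, length) or None if there is no positive overlap."""
    left = max(c1[0], c2[0])
    right = min(c1[1], c2[1])
    if right - left > 0:
        return (left, right, right - left)
    return None


def sorted_overlaps(intervals):
    """Collect pairwise overlaps, stopping each inner scan at the first miss."""
    ordered = sorted(intervals)  # tuples sort by start, then by end
    found = []
    for i in range(len(ordered) - 1):
        c1 = ordered[i]
        for j in range(i + 1, len(ordered)):
            o = overlap(c1, ordered[j])
            if o is None:
                break  # later intervals start at or after this one: no overlap
            found.append(o)
    return found


cList = ((20, 54), (25, 48), (67, 133), (90, 152), (140, 211), (190, 230))
for o in sorted_overlaps(cList):
    print("left: %s, right: %s, length: %s" % o)
```

On the question's sample list this prints the same four overlaps as the full pairwise scan. The work done is roughly proportional to the number of overlapping pairs plus the cost of sorting, so the gain depends on how sparse the overlaps are.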

Note that, in general, instead of the double for loop you can use the much clearer combinations from itertools, which guarantees the same kind of ordering. Unfortunately, it is not suitable for the optimized version of the algorithm, but your version could be written as:

    from itertools import combinations

    for c1, c2 in combinations(cList, 2):
        o = overlap(c1, c2)
        if o is not None:
            print("left: %s, right: %s, length: %s" % o)
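As a quick check of the ordering claim, the following sketch compares the pairs produced by the nested loops with those produced by `itertools.combinations` on the question's sample list. The variable names `nested` and `pairs` are only illustrative.

```python
from itertools import combinations

cList = ((20, 54), (25, 48), (67, 133), (90, 152), (140, 211), (190, 230))

# Pairs in the order the double for loop visits them.
nested = []
for i, c1 in enumerate(cList[:-1]):
    for c2 in cList[i + 1:]:
        nested.append((c1, c2))

# Pairs in the order combinations() emits them.
pairs = list(combinations(cList, 2))

print(pairs == nested)  # True
print(len(pairs))       # 15
```

Because `combinations` hands out complete pairs, there is no inner loop to break out of, which is why it does not fit the short-circuit version.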

Finally, if you ever want to perform general operations on intervals on the fly, you can also consider using a dedicated interval data structure.
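The text does not say which data structure is meant. One structure commonly used for this kind of query is an interval tree, so the following is a simplified sketch under that assumption. The class name `IntervalNode` and its methods are hypothetical, the tree is static (built once, no insertions), and each node scans its own intervals linearly, which full implementations avoid by keeping them sorted.

```python
class IntervalNode:
    """Node of a static, centered interval tree over (start, end) tuples.

    Assumes a non-empty list of intervals with start <= end.
    """

    def __init__(self, intervals):
        # The median endpoint is the center. The interval owning that
        # endpoint always lands in `here`, so both subtrees shrink.
        points = sorted(p for iv in intervals for p in iv)
        self.center = points[len(points) // 2]
        self.here = [iv for iv in intervals if iv[0] <= self.center <= iv[1]]
        left = [iv for iv in intervals if iv[1] < self.center]
        right = [iv for iv in intervals if iv[0] > self.center]
        self.left = IntervalNode(left) if left else None
        self.right = IntervalNode(right) if right else None

    def query(self, start, end):
        """Return stored intervals sharing a positive-length part with (start, end)."""
        found = [iv for iv in self.here
                 if min(iv[1], end) - max(iv[0], start) > 0]
        # Intervals in the left subtree end before center, so they can only
        # overlap the query if the query starts before center (and vice versa).
        if self.left is not None and start < self.center:
            found.extend(self.left.query(start, end))
        if self.right is not None and end > self.center:
            found.extend(self.right.query(start, end))
        return found


cList = [(20, 54), (25, 48), (67, 133), (90, 152), (140, 211), (190, 230)]
tree = IntervalNode(cList)
print(sorted(tree.query(100, 150)))  # [(67, 133), (90, 152), (140, 211)]
```

Once the tree is built, each query only descends into subtrees that can contain a match, which helps when many different ranges are queried against the same set of intervals.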
