벡터 병합 알고리즘 도움말 (Help with algorithm for merging vectors)

문제 설명

다음 작업에 매우 빠른 알고리즘이 필요합니다. 나는 이미 그것을 완성하는 여러 알고리즘을 구현했지만 내가 필요한 성능에 비해 너무 느립니다. 알고리즘이 최신 CPU에서 초당 100,000번 이상 실행될 수 있을 만큼 충분히 빨라야 합니다. C++로 구현될 것입니다.

저는 한 줄에 시작과 끝 좌표가 있는 구조인 범위/범위로 작업하고 있습니다.

두 개의 벡터(동적 배열 ) 범위를 병합해야 합니다. 하나의 벡터는 src이고 다른 하나는 dst입니다. 벡터는 범위 시작 좌표에 따라 정렬되며 범위는 한 벡터 내에서 겹치지 않습니다.

src 벡터의 범위는 결과 벡터가 그대로 유지되도록 dst 벡터의 범위와 병합되어야 합니다. 정렬되고 중복되지 않습니다. 즉. 병합 중에 겹침이 감지되면 두 범위가 하나로 병합됩니다. (두 범위를 병합하는 것은 구조의 좌표를 변경하는 문제일 뿐입니다.)

이제 한 가지 더 주의할 점이 있습니다. 병합하는 동안 src 벡터의 범위를 "확대"해야 합니다. 이것은 상수가 src의 모든 범위의 시작 좌표에 추가되고 다른 (더 큰) 상수가 끝 좌표에 추가됨을 의미합니다. 이것은 src 스팬이 확장된 후에 겹칠 수 있음을 의미합니다.

지금까지 내가 도달한 것은 완전히 제자리에서 수행할 수 없으며 일종의 임시 저장소가 필요하다는 것입니다. src 및 dst 합산된 요소 수에 대해 선형 시간에 수행할 수 있어야 한다고 생각합니다.

임시 저장소는 알고리즘의 여러 실행 간에 공유할 수 있습니다.

내가 시도한 두 가지 기본 접근 방식은 너무 느리지만 다음과 같습니다.

src의 모든 요소를 dst에 추가하고 각 요소를 추가하기 전에 확장합니다. 그런 다음 내부 정렬을 실행합니다. 마지막으로 "읽기" 및 "쓰기" 포인터를 사용하여 결과 벡터를 반복합니다. 읽기 포인터는 쓰기 포인터보다 앞서 실행하고 스팬을 병합합니다. 모든 요소가 병합되면(읽기 포인터가 끝에 도달) dst가 잘립니다.
임시 작업 벡터를 만듭니다. src 또는 dst에서 다음 요소를 반복적으로 선택하고 작업 벡터에 병합하여 위에서 설명한 대로 순진한 병합을 수행합니다. 완료되면 작업 벡터를 dst에 복사하여 교체합니다.

첫 번째 방법은 정렬이 O(m+n)이 아닌 O((m+n)*log(m+n))이고 약간의 오버헤드가 있다는 문제가 있습니다. 이것은 또한 dst 벡터가 실제 필요한 것보다 훨씬 더 커야 함을 의미합니다.

두 번째는 메모리의 할당/해제를 반복하고 복사하는 많은 일차적 문제가 있습니다.

스팬/벡터를 저장/관리하는 데 사용되는 데이터 구조는 필요하다고 생각되면 변경할 수 있습니다.

업데이트: 데이터 세트의 크기를 말하는 것을 잊었습니다. 가장 일반적인 경우는 두 벡터의 요소가 4~30개이고 dst가 비어 있거나 src와 dst의 범위가 많이 겹칩니다.

두 번째는 메모리 할당/해제를 반복적으로 반복하는 많은 복사의 주요 문제가 있습니다.

스팬/벡터를 저장/관리하는 데 사용되는 데이터 구조는 필요하다고 생각되면 변경할 수 있습니다.

두 번째는 메모리 할당/해제를 반복적으로 반복하는 많은 복사의 주요 문제가 있습니다.

스팬/벡터를 저장/관리하는 데 사용되는 데이터 구조는 필요하다고 생각되면 변경할 수 있습니다.

참조 솔루션

방법 1:

We know that the absolute best case runtime is O(m+n) this is due to the fact that you at least have to scan over all of the data in order to be able to merge the lists. Given this, your second method should give you that type of behavior.

Have you profiled your second method to find out what the bottlenecks are? It is quite possible that, depending on the amount of data you are talking about it is actually impossible to do what you want in the specified amount of time. One way to verify this is to do something simple like sum up all the start and end values of the spans in each vector in a loop, and time that. Basically here you are doing a minimal amount of work for each element in the vectors. This will provide you with a baseline for the best performance you can expect to get.

Beyond that you can avoid copying the vectors element by element by using the stl swap method, and you can preallocate the temp vector to a certain size in order to avoid triggering the expansion of the array when you are merging the elements.

You might consider using 2 vectors in your system and whenever you need to do a merge you merge into the unused vector, and then swap (this is similar to double buffering used in graphics). This way you don't have to reallocate the vectors every time you do the merge.

However, you are best off profiling first and finding out what your bottleneck is. If the allocations are minimal compared to the actual merging process than you need to figure out how to make that faster.

Some possible additional speedups could come from accessing the vectors raw data directly which avoids the bounds checks on each access the data.

방법 2:

How about the second method without repeated allocation‑‑in other words, allocate your temporary vector once, and never allocate it again? Or, if the input vectors are small enough (But not constant size), just use alloca instead of malloc.

Also, in terms of speed, you may want to make sure that your code is using CMOV for the sorting, since if the code is actually branching for every single iteration of the mergesort:

if(src1[x] < src2[x])
    dst[x] = src1[x];
else
    dst[x] = src2[x];

The branch prediction will fail 50% of the time, which will have an enormous hit on performance. A conditional move will likely do much much better, so make sure the compiler is doing that, and if not, try to coax it into doing so.

방법 3:

The sort you mention in Approach 1 can be reduced to linear time (from log‑linear as you describe it) because the two input lists are already sorted. Just perform the merge step of merge‑sort. With an appropriate representation for the input span vectors (for example singly‑linked lists) this can be done in‑place.

http://en.wikipedia.org/wiki/Merge_sort

방법 4:

i don't think a strictly linear solution is possible, because widening the src vector spans may in the worst‑case cause all of them to overlap (depending on the magnitude of the constant that you are adding)

the problem may be in the implementation, not in the algorithm; i would suggest profiling the code for your prior solutions to see where the time is spent

reasoning:

for a truly "modern" CPU like the Intel Core 2 Extreme QX9770 running at 3.2GHz, one can expect about 59,455 MIPS

for 100,000 vectors, you would have to process each vector in 594,550 instuctions. That's a LOT of instructions.

ref: wikipedia MIPS

in addition, note that adding a constant to the src vector spans does not de‑sort them, so you can normalize the src vector spans independently, then merge them with the dst vector spans; this should reduce the workload of your original algorithm

방법 5:

1 is right out ‑ a full sort is slower than merging two sorted lists.

So you're looking at tweaking 2 (or something entirely new).

If you change the data structures to doubly linked lists, then you can merge them in constant working space.

Use a fixed‑size heap allocator for the list nodes, both to reduce memory use per node and to improve the chance that the nodes are close together in memory, reducing page misses.

You might be able to find code online or in your favourite algorithm book to optimise a linked list merge. You'll want to customise this in order to do span coalescing at the same time as the list merge.

To optimise the merge, first note that for each run of values coming off the same side without one coming from the other side, you can insert the whole run into the dst list in one go, instead of inserting each node in turn. And you can save one write per insertion over a normal list operation, by leaving the end "dangling", knowing that you'll patch it up later. And provided that you don't do deletions anywhere else in your app, the list can be singly‑linked, which means one write per node.

As for 10 microsecond runtime ‑ kind of depends on n and m...

(by jfs、Ben Childs、Dark Shikari、jdb、Steven A. Lowe、Steve Jessop)

참조 문서

Help with algorithm for merging vectors (CC BY‑SA 2.5/3.0/4.0)

벡터 병합 알고리즘 도움말 (Help with algorithm for merging vectors)

문제 설명

참조 솔루션

방법 1:

방법 2:

방법 3:

방법 4:

방법 5:

참조 문서

관련 질문

코멘트

벡터 병합 알고리즘 도움말 (Help with algorithm for merging vectors)

문제 설명

참조 솔루션

방법 1:

방법 2:

방법 3:

방법 4:

방법 5:

참조 문서

관련 질문

세 개의 작은 숫자를 하나의 두 배에 저장하는 방법은 무엇입니까? (How to store three small numbers into one double?)

중첩 최적화에서 솔루션을 반환하는 방법은 무엇입니까? (How to return solutions in a nested optimization?)

C에서 "signed int"가 "unsigned int"보다 빠른 이유는 무엇입니까? (In C, why is "signed int" faster than "unsigned int"?)

값이 없는 경우 HTML5 로컬 저장소 개체는 무엇을 반환합니까? (What does the HTML5 Local Storage object return if there is no value?)

200K 게시물이 있는 Wordpress 사이트는 JOIN이 느린 SQL_CALC_FOUND_ROWS를 게시합니까? 속도 최적화? (Wordpress site with 200K posts SQL_CALC_FOUND_ROWS with JOINs slow? Optimize for speed?)

xlx의 각 시트에서 특정 셀을 추출하기 위해 이 R 코드를 개선하는 데 도움이 필요합니다. (I need help to improve this R code to extract specific cell from each sheet in xlxs)

이 for 루프에서 fminunc가 매번 동일한 결과를 제공하는 이유는 무엇입니까? (Why is fminunc in this for loop giving the same result each time?)

정수 스칼라 배열만 Hyperopt에서 스칼라 인덱스 오류로 변환될 수 있습니다. (only integer scalar arrays can be converted to a scalar index error in Hyperopt)

벡터 병합 알고리즘 도움말 (Help with algorithm for merging vectors)

이 MIPS 프로그램이 이렇게 오래 걸리나요? (파이에 가까운 프로그램) (Is this MIPS program supposed to take this long? (program that approximates pi))

힘을 최소화하기 위한 최적의 궤적, 최종 조건 문제 (Optimal trajectory to minimize force, issues with final conditions)

단일 svg 요소를 만들기 위해 여러 모양 병합 (Merging number of shapes to make a single svg element)

코멘트