Libraries for high performance, low footprint graph queries?


I am very close to implementing this myself, but before I do I want to make sure this wheel has not already been invented: what I need is a library that lets me represent a DAG (directed acyclic graph) and query directly or indirectly connected nodes with very high performance. I have tried two approaches so far.

The graph will be around 10-20 million edges over a few million nodes. Most nodes will have only one or two edges, but a few thousand nodes may have 10,000 edges or more.

The use case is this: it does not really matter how long it takes to build the graph, and once built it never (or only rarely and slowly) needs to be updated. However, it should be very fast to find direct connections and indirect connections of length 2 (one intermediate node), and it must be possible to attach labels (e.g. weight, count, etc.) to the edges. In addition, the memory footprint should be small and queries should be thread-safe.

I have tried some standard software packages for this, e.g. Neo4j and relational databases, but both are too slow for certain things: relational databases grind to a crawl when finding indirect relationships that involve highly connected nodes (huge intermediate sets), while Neo4j handles that situation better but is thousands of times slower than the relational solution at the basic task of finding direct connections. On my workstation, the relational database returns direct queries and most indirect queries in under 5 ms, but some indirect queries can take up to a minute; with Neo4j on the same system those indirect queries take a few seconds, but direct queries take more than 100 times as long. I want direct queries under 1 ms and the worst indirect ones under 1 second (on average).

I believe that, done cleverly, this can all be represented and served in memory with only a few GB of heap space, and even for larger graphs it could be kept fast with clever caching and a way to keep part of the graph on disk. But I could not find a solution or library (preferably open source) that offers this. Am I missing something?

With millions of nodes and tens of millions of edges, the graph will trivially fit in memory on any desktop computer built this century. I suggest using Fortran-style arrays:

  int ia[NVERT + 1]; int ja[NEDGE];

where the edges are sorted by their tail vertex, so that the edges out of vertex v are ia[v] through ia[v + 1] - 1, and ja[e] gives the head of edge e. Note that this takes about 4 * (NVERT + NEDGE + 1) bytes of memory, which is much less than "a few GB".
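
For concreteness, here is a tiny worked example of that layout in C. The DAG, the sizes, and the weight label array are made up for illustration; only the ia/ja naming comes from the answer above.

  /* Toy DAG with 4 vertices and 5 edges: 0->1, 0->2, 1->2, 1->3, 2->3.
   * Edges are sorted by tail vertex, so the out-edges of vertex v sit
   * at positions ia[v] .. ia[v+1]-1 of ja, and ja[e] is the head of
   * edge e.  A parallel array of length NEDGE can carry edge labels. */
  #define NVERT 4
  #define NEDGE 5

  static const int   ia[NVERT + 1] = { 0, 2, 4, 5, 5 };
  static const int   ja[NEDGE]     = { 1, 2, 2, 3, 3 };
  static const float weight[NEDGE] = { 1.0f, 0.5f, 2.0f, 1.5f, 0.25f };

  /* Adjacency storage is about 4 * (NVERT + NEDGE + 1) bytes; for a
   * few million vertices and ~20 million edges that is on the order
   * of 100 MB, well under the "few GB" budget in the question. */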

Checking whether one vertex is directly connected to another is simple: you just scan the edges going out of that vertex. Checking whether there is a two-step path from one vertex to another is also simple: you find all the neighbors of the first vertex and check whether any of them has an outgoing edge to the other vertex. At worst that scans through all of the edges, but doing it yourself is certainly less code than connecting to a database.
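
A minimal sketch of both checks in C, assuming the ia/ja layout above; the function names has_edge and has_two_step_path are invented for illustration. The arrays are read-only once the graph is built, so these queries can run from many threads without locking.

  /* Is there a direct edge u -> w?  Linear scan of u's out-edges. */
  int has_edge(const int *ia, const int *ja, int u, int w)
  {
      for (int e = ia[u]; e < ia[u + 1]; e++)
          if (ja[e] == w)
              return 1;
      return 0;
  }

  /* Is there a path u -> x -> w with one intermediate node?  Check
   * each neighbor x of u for an outgoing edge to w; at worst this
   * touches every edge once. */
  int has_two_step_path(const int *ia, const int *ja, int u, int w)
  {
      for (int e = ia[u]; e < ia[u + 1]; e++)
          if (has_edge(ia, ja, ja[e], w))
              return 1;
      return 0;
  }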

Any software that takes more than a few milliseconds for the kinds of query you describe is unsuitable for this purpose.

