Opened 12 years ago

Closed 11 years ago

Last modified 11 years ago

#21 closed enhancement (fixed)

crawler: add support to re-crawl zones

Reported by: wander
Owned by:
Priority: important
Component: crawler
Keywords:
Cc:

Description

When a zone has been completely crawled, we should re-crawl it from time to time to find new NSEC3 hashes. Unfortunately, we have to send all n queries to the server again, but at least we already have the query names and do not need to repeat the brute-force (BF) range computation.

Attachments (2)

vc-20130506.txt (4.7 KB) - added by wander 12 years ago.
vc-historic.txt (2.4 KB) - added by wander 12 years ago.


Change History (9)

comment:1 Changed 12 years ago by wander

Component: nsec3breaker → crawler

Changed 12 years ago by wander

Attachment: vc-20130506.txt added

Changed 12 years ago by wander

Attachment: vc-historic.txt added

comment:2 Changed 12 years ago by wander

See the attached dump files of the vc. zone: one from today and one from some point in the past. Both appear to be complete as far as the crawler without the re-crawl feature can tell.

comment:3 Changed 12 years ago by wander

sort by serial

comment:4 Changed 12 years ago by wander

(In [462]) write flags (OPTOUT) and types (in text presentation format) for each NSEC into database

closes #28

note: all currently existing NSEC3 records have these fields empty, need to implement re-crawl feature (references #21)

comment:5 Changed 11 years ago by wander

Resolution: fixed
Status: new → closed

(In [1017]) added re-crawl feature, closes #21

design choices:

  • when using re-crawl, all current NSEC entries are marked as obsolete, thus duplicating all database entries if nothing has changed
  • all obsolete NSEC entries are hashed at *EVERY STARTUP* with non-optimized Python code to identify gap cutters without brute-force (TODO: we should either move this into OpenCL or at least make this a CLI argument)
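The startup step in the second design choice could look roughly like the sketch below. It is not the code from [1017]: the function names, the representation of a gap as a (start, end) pair of lowercase base32hex hashes, and the assumption that the input is a list of already known query names are all hypothetical. Only the hash itself follows RFC 5155 (SHA-1 over the wire-format name plus salt, re-hashed `iterations` more times).

```python
import base64
import hashlib

# Translation table from the standard base32 alphabet to base32hex (RFC 4648).
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)


def name_to_wire(name):
    """Convert a domain name to lowercase DNS wire format."""
    labels = [l for l in name.lower().rstrip(".").split(".") if l]
    wire = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels)
    return wire + b"\x00"


def nsec3_hash(name, salt_hex, iterations):
    """Return the NSEC3 hash of name as a lowercase base32hex string."""
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(name_to_wire(name) + salt).digest()
    for _ in range(iterations):  # additional iterations, as in RFC 5155
        digest = hashlib.sha1(digest + salt).digest()
    encoded = base64.b32encode(digest).translate(_B32_TO_B32HEX)
    return encoded.decode("ascii").lower()


def in_gap(h, start, end):
    """True if hash h lies strictly inside the gap (start, end)."""
    if start < end:
        return start < h < end
    # The last gap of a zone wraps around the end of the hash space.
    return h > start or h < end


def find_gap_cutters(names, gaps, salt_hex, iterations):
    """Hypothetical helper: names whose hash falls into an uncovered gap."""
    cutters = []
    for name in names:
        h = nsec3_hash(name, salt_hex, iterations)
        if any(in_gap(h, start, end) for start, end in gaps):
            cutters.append((name, h))
    return cutters
```

Each call costs iterations + 1 SHA-1 invocations per name in pure Python, which illustrates why the TODO above suggests moving this work into OpenCL or making it optional.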

comment:6 Changed 11 years ago by wander

(In [1018]) commit during re-crawl, references #21

comment:7 Changed 11 years ago by wander

(In [1019]) withhold named cursor after commit, references #21
